When I was first learning about computer science in the early 1980s, much was made of the difference between “low-level” and “high-level” languages. Back then, the distinction seemed fairly clear:
- “low-level” languages require you to understand the mechanics of how a computer works, and to spell out every little step.
- “high-level” languages give you the ability to write programs using concepts from maths (variables, expressions, sets and so on) and from computer science (loops, functions, recursion, etc.).
By these rules, considering the languages of the time, machine code and assembly language (and pseudo-assembly languages such as CESIL and MIX) are “low level”, and pretty much everything else (FORTRAN, APL, COBOL, CORAL, ALGOL, BASIC, Pascal and so on) is “high level”.
Even back then, this approach to categorizing languages was showing the strain. Languages based on specialist concepts, such as LISP and Prolog, didn’t fit nicely with the model; system scripting and “job control” languages were quietly ignored; and the growth of task-specific languages, including PILOT and SQL, took the field off in another direction altogether. My personal favourite language at the time (FORTH) seemed to completely ignore such arbitrary distinctions, being at once very machine-specific, yet capable of higher-order abstractions syntactically indistinguishable from the core language.
These days the situation is much worse: a huge proliferation of languages and language environments with widely varying specialities and capabilities, the concepts of internal and external “domain-specific languages”, executable and linguistic file formats; the list goes on.
Does the concept of language “level” still have any value, or should it merely be considered a historical quirk, like the many types of sailing ships or horse-drawn carriages? I suggest it does still have value, but as a tool to compare languages and to help choose which one to use. There are probably as many ways of comparing languages as there are languages, but there is one particular axis along which I find comparison most useful. That axis is portability.
Let’s start at the “bottom”.
- “machine language” is probably the lowest level we can go. Raw bits and bytes at locations in memory are very tricky to create, edit, manage and reason about; they are completely application-agnostic, and they are specific not just to a particular kind of machine, but to the particular memory layout of a particular machine.
- “assembly language” is next up. While specific to a particular machine architecture and its capabilities, it can be assembled into machine code which will run on any compatible machine. It’s a bit easier to reason about and manage, as it can be stored, printed and edited as text.
- Above assembly language I place the relatively large class of “compiled languages”. Compilers of varying degrees of intelligence and flexibility can make these languages more or less portable. C is in this category and (as I found several days ago) can be used to write software which can be compiled to run on strikingly different architectures.
- Above compiled languages sits a class of languages which compile to a virtual machine. This includes all the JVM languages (Java, Clojure, Scala, Groovy, and so on), and many other popular languages take the same approach. The big benefit of these languages is that not only is the source code portable (as in the C case), but the compiled code is too. The output of the compiler can run on any physical machine which has the appropriate virtual machine and interface capabilities. If you were to believe the early hype about Java, this is the pinnacle of portability. They used to say “write once, run anywhere”, which was catchy, but kind of missed the point, as this was already a feature of compiled languages such as C and C++. The real benefit of a VM language is the ability to “compile once, run anywhere”.
- To my mind, though, there is still more headroom above VM languages. The next rung goes to “interpreted languages” (and to their close cousins, “compile-on-demand” languages). In these languages there is no separate compile step: the source code is the program. Code in these languages can be run on anything with an appropriate interpreter, with no machine-specific steps, not even virtual-machine-specific steps; no compilation, and no tools needed to manipulate, manage and load compiled output. This goes a long way toward explaining the popularity of these kinds of languages, which include perl, Python, PHP, shell scripts and batch files, Tcl and many others.
- As we climb even higher the distinction becomes slightly different. It is still portability, but now that we have escaped the need for machine-specific compilation the portability which becomes important is portability to new domains. Most languages below this point have been designed as ways of expressing concepts from mathematics and computer science. This is so common it’s often assumed and rarely questioned. Some early languages (COBOL, LISP, APL and Prolog spring to mind) were created for specific, slightly different contexts, but most modern languages still require every concept from the problem domain to be mentally translated (“compiled”, if you will) into the language of maths and computer science before it can be implemented. Luckily, some languages are beginning to break out of this trap. The most well-known of these is probably Ruby, famed for its use of “internal DSLs” which allow (within certain key constraints) the structure and syntax of the language to adapt to that of the business need.
- Above Ruby there is certainly room for languages with even more flexible syntax which might be able to adapt even more to the concepts and language of the domain, but these are hard to find, and experimental at best.
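To make the internal-DSL idea concrete, here is a minimal sketch in Python (all the names — `Recipe`, `add`, `then` — are hypothetical, invented purely for illustration). The trick is simply that each method returns the object itself, so a chain of calls reads almost like a sentence in the business domain while remaining ordinary host-language code:

```python
class Recipe:
    """A hypothetical internal DSL: each call returns self, so steps chain."""

    def __init__(self, name):
        self.name = name
        self.steps = []

    def add(self, amount, ingredient):
        self.steps.append(f"add {amount} {ingredient}")
        return self  # returning self is what lets the calls chain

    def then(self, action):
        self.steps.append(action)
        return self


pancakes = (Recipe("pancakes")
            .add("100g", "flour")
            .add("300ml", "milk")
            .then("whisk until smooth")
            .then("fry in a hot pan"))

print(pancakes.steps)
```

The constraint the text mentions is visible here: the “sentences” must still be legal Python expressions, just as Ruby internal DSLs must stay within Ruby’s syntax.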
Although I say that level 6 languages are rare, I could plausibly claim that this is where FORTH should sit. The central concept of FORTH is that it has hardly any rules of its own, but instead grows a syntax and vocabulary in which the important things in the problem domain can be expressed succinctly, naturally and unambiguously. It sounds like a research pipe dream, yet this is a language which has been in productive use since around 1970. However, all the implementations of FORTH and FORTH-like languages I have encountered so far have tended to miss this vital point. Instead of emphasising the minimal nature of the language and its flexibility to adapt to and mimic wildly different domains, most vendors have instead trudged along the well-worn path of emulating the same mathematical and computer science concepts as the languages at levels 3 and 4. This has not worked very well as a strategy: the hand-crafted, specific syntax offered by compiled languages is almost always better in that domain.
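The “tiny core that grows a vocabulary” idea can be sketched in a few lines — here in Python, purely for illustration, not as a real FORTH. The core knows only a handful of stack words plus the `:` … `;` defining form; everything else is vocabulary the programmer grows:

```python
def make_interpreter():
    """A toy FORTH-flavoured interpreter: a data stack plus a word dictionary."""
    stack = []
    words = {
        "+": lambda: stack.append(stack.pop() + stack.pop()),
        "*": lambda: stack.append(stack.pop() * stack.pop()),
        "dup": lambda: stack.append(stack[-1]),
    }

    def run(source):
        tokens = source.split()
        i = 0
        while i < len(tokens):
            tok = tokens[i]
            if tok == ":":                          # ": name body ;" grows the vocabulary
                name = tokens[i + 1]
                end = tokens.index(";", i)
                body = " ".join(tokens[i + 2:end])
                words[name] = lambda b=body: run(b)  # new word runs its recorded body
                i = end
            elif tok in words:
                words[tok]()                         # execute a known word
            else:
                stack.append(int(tok))               # anything else is a number
            i += 1
        return stack

    return run


run = make_interpreter()
run(": square dup * ;")     # extend the language itself
print(run("7 square"))      # → [49]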
The one shining example of a FORTH-like language really making the most of its flexibility and mimicry has done it so well that most users don’t even realise they are dealing with a programming language at all. I’m talking about PostScript (and by inference PDF, which is a kind of semi-compiled version of the same idea). PostScript can be thought of as a relative of FORTH which has adapted so well to the domain of document mark-up and production that programs are passed around and treated as documents themselves.
What have I learned by pondering all this? I want to see how well I can position the ELIUS system language in the truly flexible level 6 niche. If possible, the same elemental concept should serve for everything from low-level hardware control and operating system internals to clearly expressing business concepts and processes in applications. This part is beginning to seem even more of a challenge than writing the operating system itself!