More language thoughts

Following my post a few days ago about high and low level languages, I received an interesting email from Paul Hammant with some astute comments. After thinking about this for a bit I'll have another go.

Note that this is a follow-up, and probably makes a bit more sense if you read the original post first.

I'll admit I got caught up in the idea of slotting languages into arbitrary levels, and did not make my basic point as clear as I might. I also omitted any mention of the once-popular 1, 2, 3 and 4GL method of classification, and of the very interesting work being done in the area of Behaviour-Driven Development (BDD).

My point was really about the trade-off in languages between (at one end) the theory and practice of how computers work, and (at the other end) the needs of particular applications and domains. Particular languages don't often sit neatly at one spot on this continuum, but they do exhibit clusters of characteristics which could lead to a sort of characterisation. Following my original post:

Machine code is very tied down. It runs at one location in the memory of one particular system, it expresses hardly anything of the domain, and typically requires several stages of translation to create it from something more meaningful.

Assembly code is less tied to specific memory, but is tied to particular hardware and architecture. Even though it typically requires one less conversion step than pure machine code, it still expresses hardly anything about the domain, typically just a few memory labels.

Most compiled languages, although a bit more portable across hardware, are still very rigid in their syntax. Unlike machine code and assembly language, which embody a model of hardware registers and memory, languages in this group tend to embody a model of computer science and mathematical concepts: they map closely to algebraic expressions, flow charts, object-oriented models and the like. Although they provide a bit more tailoring to domain concepts and language, this is very much a "second-class" option. Syntax for method definitions and flow control is considered "part of the language", while named types, variables and functions are "part of the program" and have to fit in with the syntactical requirements of the theory.

A second group of compiled languages, and one which is becoming more popular, consists of languages which compile to an abstract, virtual machine architecture rather than a specific, physical one. This has the advantage that a single compilation step is all that is required to target multiple different deployment machines. Traditional compiled languages in the previous group would need a separate compilation step for each target.

Some languages push beyond the general class of compiled languages by allowing various kinds of extensions to the core language. Although it's not a requirement, many of this group are interpreted rather than compiled, as the additional flexibility can be tough to "bake in" to a compiled program. Interpreted languages can also help to remove yet another translation step, that of the build and compilation process used to make an eventual executable. An example of this kind of language is Ruby, which is relatively loose about its syntax, and so allows developers to build programs which look a bit more like domain ideas and a bit less like computer science theory. Even languages in this group are still bound by the same theoretical concepts as the rest of the compiled languages, though. A variable is a variable and a loop is a loop, however you spell it.
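As a rough illustration of that last point, here is a small sketch. It uses Python rather than Ruby, and every name in it (Days, day, week, every) is invented for this example rather than taken from any library. The calling code reads a little like scheduling vocabulary, but underneath it is still a variable and a loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Days:
    """A hypothetical domain type: a number of days."""
    count: int

    def __rmul__(self, n):
        # Lets us write "2 * day" so the code reads more like the domain.
        return Days(self.count * n)


day = Days(1)
week = 7 * day


def every(period, upto):
    # However it is spelled at the call site, this is an ordinary loop
    # over an ordinary range.
    return list(range(0, upto.count, period.count))


schedule = every(2 * day, upto=week)
print(schedule)  # [0, 2, 4, 6]
```

The domain wording is only a thin layer. The reader still has to understand multiplication, keyword arguments and function calls to follow it.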

My interest is in what comes beyond the rigid syntax group. Some languages have hardly any syntax of their own. Instead, the task of programming is more akin to building a new language in which the concepts, values and processes of the application can be described clearly and meaningfully. The key point of this group is that these domain languages are not fixed, but vary as the needs of the application vary; changing the language is as much a development task as naming a variable. Languages fully in this group are very rare, although some languages have aspects of this approach. Lisp, for example, has a syntax consisting of just brackets, spaces and quotation marks, and FORTH has conceptually no syntax at all, although pretty much every implementation I have seen misses this point and instead tries to standardize and document "the" language.
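To make the idea of growing the language a little more concrete, here is a minimal sketch of a FORTH-flavoured interpreter written in Python. It is a toy built on my own assumptions, not a faithful FORTH: the function name make_interpreter, the three built-in words and the integer-only stack are all choices made for this example. The interesting part is the ": name ... ;" form, which adds a new word to the vocabulary so that later lines can be written in terms of it.

```python
def make_interpreter():
    """Return a run(source) function with its own stack and vocabulary."""
    stack = []
    words = {
        "+": lambda: stack.append(stack.pop() + stack.pop()),
        "*": lambda: stack.append(stack.pop() * stack.pop()),
        "dup": lambda: stack.append(stack[-1]),
    }

    def run(source):
        tokens = source.split()
        i = 0
        while i < len(tokens):
            token = tokens[i]
            if token == ":":
                # ": name body ;" defines a new word from existing ones.
                end = tokens.index(";", i)
                name, body = tokens[i + 1], tokens[i + 2:end]
                words[name] = lambda body=body: run(" ".join(body))
                i = end + 1
                continue
            if token in words:
                words[token]()            # a known word: execute it
            else:
                stack.append(int(token))  # anything else: treat as a number
            i += 1
        return stack

    return run


run = make_interpreter()
run(": squared dup * ;")       # extend the vocabulary
run(": tile-area squared ;")   # a domain word built on top of it
result = run("7 tile-area")
print(result)  # [49]
```

After the two definitions, the last line is written entirely in words that did not exist when the interpreter started. Defining them was no more of an event than naming a variable.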

The importance of the BDD approach to this area of study is that BDD practitioners seem much more willing to start from domain concepts and language, and try to derive or grow a formalization, rather than start with a formalization of another domain and knock it vaguely into shape.

To recap, what is currently thought of as "computer programming" or "software development" typically consists of a sequence of conversion steps, many of them manual and unquestioned. These steps include: formalizing casual domain language into a set of defined terms, similarly formalizing tacit, casual and implicit processes into defined procedures, translating the formal terms into the name syntax of a language, translating formalized procedures into computer science abstractions, translating computer science abstractions into an implementation language, translating this "source code" into one or more compiled forms, building compiled objects and data into deployable artefacts, installing artefacts on target machines, loading compiled code into an executable memory image, etc. Particular development processes may include even more steps, translations, conversions, relocations and so on.

In my mind the further up this cascade of conversions we can climb, the better. The closer a development language can approach the natural, casual form of the domain, the more sense the "program" will make in domain terms.
