Basics of a FORTH-style system

I have been reading about the early days of FORTH recently, and learned a lot about the ideas and philosophy behind the language and the system. In the past, when I had tried to create my own FORTH-style languages I think I was missing something important. I had been confusing the "language" of available FORTH implementations with the core concepts of FORTH itself.

Anyone who knows even a little about FORTH has seen word definitions like

: COUNT 10 0 DO ." hello " LOOP ;

and it's natural to assume that this kind of construction is part of FORTH. What we see of FORTH is the collection of words which appear in all the tutorials: control flow words such as IF, ELSE, THEN, DO, LOOP, BEGIN, AGAIN; stack words such as DROP, SWAP, DUP, NIP, TUCK, ROLL, >R; maths words such as +, -, *, /, MOD; I/O words such as . and ." and so on. The current ANSI standard for FORTH has over 170 core words and a whole load more are defined but optional.

Almost all of these words have their equivalents in other languages: function definition, control flow, and arithmetic are almost universally "baked in" to the syntax of the language itself, and most languages include some sort of data manipulation (arrays, structures, records, lists, classes, objects and the like) and console I/O as standard, so it's not unexpected to encounter them here.

What I have come to realise, though, is that while this represents a particular evolutionary path of the "problem oriented language" (POL) concept, it also misses the point. In effect it is using the flexibility of POL to build a language tailored to the field of computer science as expressed in other languages, rather than starting from basics and building the most appropriate language for any particular application. None of the hundreds of "standard" ANSI FORTH words say anything about any real application.

This begins to explain why I have found it so hard to be satisfied with my own FORTH-like languages in the past - I was aiming at the wrong target. My experience of programming had led me to the same assumptions as so many other FORTH developers, and I had wandered away from Charles Moore's original POL concept. Rather than assuming that the implementation is somehow only complete when it has all (or even most) of the words in a standards document, I am now much more focussed on making the smallest possible language that I can.

I guess the idea of "smallest possible language" needs a bit of clarification. Let's start with an analogy. When I started with bare-metal programming on my Raspberry Pi I began by following Alex Chadwick's "Baking Pi" tutorials. However, I soon found it cumbersome working with raw assembly language, and wanted to drag myself up to a "higher-level" language where I could concentrate more on what I wanted to do, and less on how to fiddle with registers and memory. I wanted the part of the system written in assembly language to be as small as possible because it was so constraining. I quickly moved to C, and even though I ended up coding many of the same things that Alex does, I wrote them in C rather than in assembly language.

This is what I want to be able to do with the POL I am creating for CORNELIUS: to "climb up" as soon as possible from writing the language to writing in the language. This means taking an even harder look at my assumptions of what is part of the language and what is part of the application.

Charles Moore suggested back in 1970 that all a POL really needs is somewhere to store code and data (a dictionary), a way of passing information between bits of code (a stack), a way of invoking bits of code from other code (an evaluation function) and a way of knowing where to come back to when it's done (a return stack). Add a few primitive operations to read and write memory, and a few global variables so that code can find the stacks and dictionary, and that's it done.

The above paragraph is worth reading twice. That tiny set of things really is the complete set. Everything else (a way to define words, arithmetic, control flow, I/O, data structures) is considered part of the next layer up. These things are nice to have, and you may find you have favourites which you use many times, but they are optional, and most importantly they can be written in the language itself. This is a staggering insight. A true POL has no syntax of its own. Everything is aimed at getting, as directly and simply as possible, to the language of the problem domain.
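
To get a feel for how little Moore's list actually asks for, here is a rough sketch of those four pieces in Python. It is not e13 and not Moore's design: every name in it (Machine, cell, define, execute, build_machine) is made up for illustration, words are looked up by name at run time rather than by address, and a plain Python dict stands in for the dictionary's name lookup. Arithmetic is included as primitives only so the example has something to compute.

```python
# Illustrative sketch only: every name here is hypothetical, not taken from e13.

class Machine:
    def __init__(self):
        self.memory = []   # dictionary space: one flat store for code and data
        self.names = {}    # word name -> ("prim", function) or ("colon", address)
        self.stack = []    # data stack: passes values between words
        self.rstack = []   # return stack: where to resume after a word finishes

    def primitive(self, name, fn):
        self.names[name] = ("prim", fn)

    def cell(self, value=0):
        """Reserve one data cell in memory and return its address."""
        self.memory.append(value)
        return len(self.memory) - 1

    def define(self, name, body):
        """Store a word as a list of word names; ints in the body are literals."""
        address = len(self.memory)
        self.memory.extend(body)
        self.memory.append(None)          # None marks the end of the definition
        self.names[name] = ("colon", address)

    def execute(self, name):
        """The evaluation function: run one word to completion."""
        kind, target = self.names[name]
        if kind == "prim":
            target(self)
            return
        self.rstack.append(None)          # sentinel: returning here stops the loop
        ip = target
        while ip is not None:
            item = self.memory[ip]
            ip += 1
            if item is None:              # end of definition: return to the caller
                ip = self.rstack.pop()
            elif isinstance(item, int):   # literal: push it on the data stack
                self.stack.append(item)
            else:
                kind, target = self.names[item]
                if kind == "prim":
                    target(self)
                else:                     # nested word: remember where to come back
                    self.rstack.append(ip)
                    ip = target


def _dup(m):
    m.stack.append(m.stack[-1])

def _swap(m):
    m.stack[-1], m.stack[-2] = m.stack[-2], m.stack[-1]

def _add(m):
    b, a = m.stack.pop(), m.stack.pop()
    m.stack.append(a + b)

def _mul(m):
    b, a = m.stack.pop(), m.stack.pop()
    m.stack.append(a * b)

def _fetch(m):                            # ( addr -- value )
    m.stack.append(m.memory[m.stack.pop()])

def _store(m):                            # ( value addr -- )
    address = m.stack.pop()
    m.memory[address] = m.stack.pop()


def build_machine():
    m = Machine()
    for name, fn in [("DUP", _dup), ("SWAP", _swap), ("+", _add),
                     ("*", _mul), ("@", _fetch), ("!", _store)]:
        m.primitive(name, fn)
    return m


m = build_machine()
m.define("SQUARE", ["DUP", "*"])
m.define("SUM-OF-SQUARES", ["SQUARE", "SWAP", "SQUARE", "+"])
m.stack.extend([3, 4])
m.execute("SUM-OF-SQUARES")
print(m.stack)  # [25]
```

Note that SQUARE and SUM-OF-SQUARES are not part of the machine at all: they are just entries added to the dictionary using words that were already there, which is the sense in which the next layer up can be written in the language itself.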

I have been tinkering with just such a language for the last few weeks, which I am currently calling e13; you can see the progress of the code on GitHub. I'm now thinking that even the small amount I have coded may be too much. I have written things which initially seemed vital (a way of parsing literal numbers, a read-evaluate loop for user input, and so on) but now I am challenging myself to see how far I can push the concept. How much of this can I write in the language itself?
