Raspberry Alpha Omega

Raspberry Pi from start to finish

Literal strings in ELIUS

Jan 30, 2013 - 4 minute read

As you may recall, if you have read a few of these blogs, ELIUS is the name I am using for the stack-based "problem-oriented language" (POL) which I plan to include as a system language for the bare-metal operating system I am building for the Raspberry Pi.

Charles Moore's POL concept, and languages based on it, tend to have very simple parsers, sometimes as simple as just separating "words" with spaces. One downside of this is that some of the words need to be special, in that they affect or override the parsing process itself. Such words must themselves be delimited with spaces so that the basic parser can recognize and activate them. This can lead to some slightly odd constructs, for example FORTH's ." hello" which prints not " hello" but just "hello", without the leading space. The word ." is recognized by the simple parser and executed. Its effect is to switch to a different parsing approach until a " character is found, at which point it switches back to the regular parser.
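To make that hand-over concrete, here is a minimal sketch in Python of a space-delimited parser with a single parsing word. It is not real FORTH and not ELIUS code: the function name run, the rule that every other word is an integer, and the use of a list to collect printed text are all assumptions made for illustration.

```python
def run(source):
    """Space-delimited outer interpreter with one parsing word: ." (illustrative only)."""
    output = []   # text "printed" by ."
    stack = []    # data stack
    pos = 0
    while pos < len(source):
        # Basic parser: skip spaces, then take characters up to the next space.
        while pos < len(source) and source[pos] == " ":
            pos += 1
        if pos >= len(source):
            break
        end = source.find(" ", pos)
        if end == -1:
            end = len(source)
        word = source[pos:end]
        pos = end + 1  # consume the single space that delimits the word
        if word == '."':
            # Parsing word: take over and read raw text up to the closing quote.
            close = source.find('"', pos)
            if close == -1:
                raise SyntaxError('unterminated ." string')
            output.append(source[pos:close])
            pos = close + 1  # hand control back to the basic parser
        else:
            # Assumption for this sketch: anything else is an integer literal.
            stack.append(int(word))
    return stack, output
```

Because the delimiting space is consumed along with the word itself, run('1 ." hello" 2') collects hello with no leading space, and the numbers on either side are still handled by the basic parser.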

The effect of such parser choices can also be seen in the way new words are defined. The defining word : hijacks the parsing process and grabs the first following word as the name of the new word, and everything else up to ; as the body of the definition.

This is certainly a powerful technique, but it does mean that strings and string processing do not fit the same stack-based metaphor as used for numeric operations.

After musing about string processing and string pools a few days ago, I realised that it's also important to be able to get string literals into programs. I'm still not entirely comfortable with the syntax I have chosen for this, but it's an important step toward being able to define most of the language in terms of itself. For now, I have chosen to delimit literal strings with [ and ] characters. I have deliberately chosen not to use the more familiar single or double quotation marks, because they use the same character to both start and end a string. This makes it really difficult to deal with nested string literals, something which can be very important when defining new words.
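The advantage of distinct opening and closing characters can be sketched in a few lines of Python: a depth counter is enough to find the matching ] for any [. The function name read_literal and its return convention are hypothetical, chosen only for this example, and say nothing about how ELIUS itself is implemented.

```python
def read_literal(source, start):
    """Return (text, next_pos) for the bracketed literal that opens at source[start]."""
    if source[start] != "[":
        raise ValueError("expected '[' at start of literal")
    depth = 0
    for pos in range(start, len(source)):
        ch = source[pos]
        if ch == "[":
            depth += 1          # one level deeper
        elif ch == "]":
            depth -= 1          # one level back out
            if depth == 0:
                # Matching close found: return the text between the outer brackets.
                return source[start + 1:pos], pos + 1
    raise SyntaxError("unterminated literal")
```

With a single quote character used for both ends, the scanner could not tell whether the second quote it meets opens an inner string or closes the outer one, so no equivalent counter is possible.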

I am hoping that this choice of a literal text syntax will allow word definitions to be just as stack-oriented as any other operation, without needing to mess with the parsing process. For example [23 45] [ugh] D+ would define a new word ugh which, when executed, acts as if a user has typed 23 45, pushing 23, then 45 on to the data stack.

I'll admit I have not yet coded the D+ word in ELIUS, because that in turn needs words to read and write memory locations and some simple arithmetic.

As it stands at the moment, the first string [23 45] is recognized and looked up in the string pool (added if not found), and its address is pushed on the stack. The same happens for the second string [ugh]. When I have coded it, the word D+ will then construct a new dictionary entry by writing four 32-bit values to memory: the name from the top of the stack, the address of a general string evaluation function, the code string next on the stack, and a pointer to the previous dictionary entry. All it needs to do then is update the head of the dictionary to point to the new word, and the job is done.
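As a rough model of what D+ is meant to do, the following Python sketch writes the four 32-bit values into a byte array standing in for memory. Everything concrete here is an assumption: the Machine class, the here allocation pointer, the evaluator address 0x8000, little-endian byte order, and the use of 0 to mean "no previous entry" are all invented for illustration.

```python
import struct

ENTRY = struct.Struct("<4I")   # four little-endian 32-bit words (assumed byte order)
EVAL_STRING = 0x8000           # hypothetical address of the string evaluation function

class Machine:
    def __init__(self, size=256):
        self.memory = bytearray(size)
        self.here = 16    # next free byte; hypothetical allocation pointer
        self.head = 0     # address of the newest entry; 0 means the dictionary is empty
        self.stack = []   # data stack holding string-pool addresses

    def d_plus(self):
        """Build a dictionary entry from the two string addresses on the stack."""
        name = self.stack.pop()   # top of stack: address of the name string
        code = self.stack.pop()   # next on stack: address of the code string
        entry = self.here
        # Entry layout: name, evaluator, code string, link to previous entry.
        ENTRY.pack_into(self.memory, entry, name, EVAL_STRING, code, self.head)
        self.here += ENTRY.size
        self.head = entry         # the new word becomes the head of the dictionary
        return entry
```

Each entry links back to the one before it, so looking up a word amounts to walking the chain from the head until a matching name address is found or the link is 0.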

Once the new word is defined, entering ugh will cause the system to look up the word in the string pool, find it, look up its definition in the dictionary, find it, then call its evaluation function passing the address of the text 23 45. This will in turn be executed, resulting in 23 and 45 being pushed to the stack.
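The whole round trip can be simulated in a short Python sketch. This is a toy model, not the ELIUS implementation: the Elius class, the use of list indexes in place of string-pool addresses, a Python dict in place of the linked dictionary, and the fallback of treating unknown words as integers are all assumptions made to keep the example small.

```python
class Elius:
    def __init__(self):
        self.pool = []          # string pool; a list index stands in for an address
        self.dictionary = {}    # name address -> (evaluator, code address)
        self.stack = []
        self.primitives = {"D+": self.define}

    def intern(self, text):
        """Look the text up in the pool, adding it if not found."""
        if text not in self.pool:
            self.pool.append(text)
        return self.pool.index(text)

    def define(self):
        name = self.stack.pop()   # top of stack: name
        code = self.stack.pop()   # next on stack: code string
        self.dictionary[name] = (self.evaluate_string, code)

    def evaluate_string(self, address):
        self.run(self.pool[address])   # treat the stored text as fresh input

    def tokens(self, source):
        pos = 0
        while pos < len(source):
            if source[pos] == " ":
                pos += 1
            elif source[pos] == "[":
                start, depth = pos, 0
                while pos < len(source):
                    depth += {"[": 1, "]": -1}.get(source[pos], 0)
                    pos += 1
                    if depth == 0:
                        break
                if depth != 0:
                    raise SyntaxError("unterminated literal")
                yield "literal", source[start + 1:pos - 1]
            else:
                end = source.find(" ", pos)
                end = len(source) if end == -1 else end
                yield "word", source[pos:end]
                pos = end

    def run(self, source):
        for kind, text in self.tokens(source):
            if kind == "literal":
                self.stack.append(self.intern(text))
            elif text in self.primitives:
                self.primitives[text]()
            else:
                name = self.pool.index(text) if text in self.pool else None
                if name in self.dictionary:
                    evaluator, code = self.dictionary[name]
                    evaluator(code)
                else:
                    self.stack.append(int(text))   # assumption: otherwise a number
```

Running [23 45] [ugh] D+ ugh through this model leaves 23 and 45 on the stack. Because a literal can itself contain bracketed literals, a definition such as [[1 2] [pair] D+] [setup] D+ yields a word that defines another word when it is executed.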

The aim is that string literals of this form can be used interchangeably as code or data, so "code which writes code" becomes an obvious possibility.

I'm still not quite at the point where I can actually type in some code and have it do stuff, but it feels very close.



Comments

  1. Philip

    Hi, I just found your site and noted the similarities between ELIUS and a toy language I've been working on. I also ended up using brackets for nested strings, something that I soon learned that Tcl does as well (Tcl uses {} instead of [] though). This makes conditionals, loops and passing bits of code around painless in general.
