String handling in FORTH and ELIUS

FORTH, and similar languages inspired by Charles Moore's approach to software, have evolved in a particular direction. Like Donna Noble in The Doctor's Daughter, FORTH is "good with numbers", but traditionally very limited in its string-processing abilities.

In Programming A Problem-Oriented-Language (POL), Moore writes:

What can you do with a character string? I've only found 2 uses. They are very similar, but part of the frustration of implementing them is to take advantage of the similarity. You can type a string, or you can move it to a character field.

I may be missing something, but it seems that the computing landscape has changed somewhat since 1970, particularly in the context of communications. A lot of protocols and formats rely heavily on textual representations and intermediate languages. Consider HTML, JSON, email, and so on. It seems to me that a modern interpretation of the POL ideas ought to be able to deal with text in more interesting and useful ways.

So, how might this kind of ability fit with the razor-minimalist approach of POL? The basic operations of reading and writing memory work just as well for character data as they do for numeric or binary data, so once a character string is in memory (e.g. in an input buffer), simple operations such as copying text from one place to another, or looking for a particular character are a given. More complex operations, particularly ones involving multiple strings (for example matching against patterns, splitting and joining, and token substitution) are trickier. For this sort of stuff you can't really leave the data in the input buffer - it needs somewhere to live, and we need a way to find it later.

Following the lead of the data layout in POL, I am toying with the idea of a "string pool". This, like Moore's "dictionary", will be a linked chain of variable-length memory blocks. Basic tools will be provided to add some text as a new entry, and to look up an existing entry. The address of an entry will serve as its "handle", which can be pushed onto and popped off the stack, stored in simple variables, and used by any other code in the system.
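To make the idea a little more concrete, here is a rough sketch of such a pool, written in C rather than in ELIUS itself. Everything in it is an assumption made for illustration: the names (pool_add, pool_find and so on), the fixed 4096-byte pool, the two-byte back link and the one-byte length are all hypothetical, and the real layout may well differ. Entries are chained from newest to oldest, in the manner of Moore's dictionary, and the offset of an entry within the pool plays the role of its handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_SIZE 4096
#define NO_ENTRY  0xFFFFu   /* handle value meaning "no such entry" */

/* Hypothetical entry layout: 2-byte link to the previous entry,
   1-byte length, then the text itself (not NUL-terminated). */
static uint8_t  pool[POOL_SIZE];
static uint16_t pool_top  = 0;         /* offset of the next free byte  */
static uint16_t pool_last = NO_ENTRY;  /* handle of the newest entry    */

/* Read the back link stored in the first two bytes of an entry. */
uint16_t pool_link(uint16_t h)
{
    assert(h < pool_top);
    return (uint16_t)(pool[h] | (pool[h + 1] << 8));
}

/* Length of the text held by an entry. */
size_t pool_length(uint16_t h)
{
    assert(h < pool_top);
    return pool[h + 2];
}

/* Address of the first byte of text held by an entry. */
const uint8_t *pool_text(uint16_t h)
{
    assert(h < pool_top);
    return &pool[h + 3];
}

/* Append text as a new entry and return its handle,
   or NO_ENTRY if it is too long or the pool is full. */
uint16_t pool_add(const char *text, size_t len)
{
    if (len > 255 || (size_t)pool_top + 3 + len > POOL_SIZE)
        return NO_ENTRY;

    uint16_t h = pool_top;
    pool[h]     = (uint8_t)(pool_last & 0xFF);  /* link, low byte  */
    pool[h + 1] = (uint8_t)(pool_last >> 8);    /* link, high byte */
    pool[h + 2] = (uint8_t)len;
    memcpy(&pool[h + 3], text, len);

    pool_top  = (uint16_t)(h + 3 + len);
    pool_last = h;
    return h;
}

/* Walk the chain from the newest entry backwards, looking for
   an exact match. Returns its handle, or NO_ENTRY if not found. */
uint16_t pool_find(const char *text, size_t len)
{
    for (uint16_t h = pool_last; h != NO_ENTRY; h = pool_link(h)) {
        if (pool_length(h) == len && memcmp(pool_text(h), text, len) == 0)
            return h;
    }
    return NO_ENTRY;
}
```

A caller would add a string once, keep the returned handle on the stack or in a variable, and later hand it to pool_text and pool_length to get at the bytes. Because entries are only ever appended, a handle stays valid for as long as the pool exists.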

This "string pool" approach works well for text which is essentially immutable, and best of all for text which is re-used in several places in the code. It does not seem to be such a good fit for dynamically changing text, or text used and then thrown away during processing. For this I am considering the idea of one or more "scratchpads", each being a block of memory used for temporary byte storage, and usable by other code once its related processing has finished.
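A scratchpad could be as simple as a fixed block of bytes with a count of how much is in use. The sketch below is again in C and again purely illustrative: the scratchpad type, the 256-byte size and the pad_reset and pad_append names are assumptions, not part of any existing implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAD_SIZE 256   /* hypothetical size; pick whatever suits */

/* A block of temporary byte storage plus a fill count. */
typedef struct {
    uint8_t bytes[PAD_SIZE];
    size_t  used;
} scratchpad;

/* Discard the contents so other code can reuse the pad. */
void pad_reset(scratchpad *p)
{
    p->used = 0;
}

/* Append len bytes to the pad.
   Returns 1 on success, or 0 (leaving the pad unchanged)
   if the bytes would not fit. */
int pad_append(scratchpad *p, const void *src, size_t len)
{
    assert(p->used <= PAD_SIZE);
    if (len > PAD_SIZE - p->used)
        return 0;
    memcpy(&p->bytes[p->used], src, len);
    p->used += len;
    return 1;
}
```

Text being joined or substituted would be built up with pad_append, used, and then thrown away with pad_reset. Anything that turns out to be worth keeping would be copied into the string pool before the reset.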

I have not fully implemented this yet, but the basics of the string pool seem to be working, and have helped to make the dictionary structure a bit simpler as a side effect.
