Original article appeared in Forth Dimensions Volume V, Issue 2
Other articles in this series: Laxen meta-compiling one .. Laxen meta-compiling three
Meta Compiling II
Henry Laxen
In Volume IV, number 6 of Forth Dimensions we took a look at one of the underlying foundations of the meta-compiling process, namely that of mapping the address space of the target system into the address space of the host system. If you don’t understand the above sentence I suggest you reread the last article on meta-compiling. This article will use the technique of address mapping discussed last time to implement the “guts” of a meta-compiler. I must warn you that I am still leaving out a lot of details, which I will try to cover in the next article. This article will illustrate how to meta-compile code definitions and simple colon definitions.
First, let’s take a look at how we can meta-compile code definitions. What we need is an assembler which generates its code in the target system. We discussed that briefly last time, and showed how to define a JMP instruction so that its opcode and address would be assembled in the target system rather than in the host. But that is only half of the battle. When we define a code word, we are defining a name in the target system, which must appear in the target dictionary. Also, if a future colon definition references this code word, we want to compile the code field address of this word in the target dictionary. Let’s take this one step at a time. First we must define a name in the target dictionary. There is no magic in this; it depends heavily on the structure you decide on for names. The only thing you have to be careful about is to make sure the bytes are placed in the target system address space, and that you keep track of the link fields somehow. Figure One illustrates a word that will compile FIG-like headers into a target system. Notice that if we want to compile headerless code all we need to do is set WIDTH-T to zero. Headerless code is one of the fringe benefits of meta-compiling, not its main purpose.
[Figure One]
    \ Fig 1. Headers in Target System                      13MAY83HHL
    : S,-T   (S addr len -- )
       0 ?DO   DUP C@ C,-T   1+   LOOP   DROP   ;
    VARIABLE WIDTH-T
    VARIABLE LATEST-T
    : HEADER   (S -- )
       BL WORD C@ 1+ WIDTH-T @ MIN ?DUP IF
          HERE-T HERE ROT S,-T       ( Lay down name, save NFA )
          LATEST-T @ ,-T             ( Link Field )
          DUP LATEST-T !
          128 SWAP THERE SET   128 HERE-T 3 - THERE SET
          ( Set the high order bits at each end of the name )
       THEN   ;
[As originally printed, the second SET used HERE-T 1- , which sets the high order bit in the link field rather than in the last byte of the name. HERE-T 3 - steps back over the two-byte link field to the last name byte.]
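Figure One leans on THERE, HERE-T, C,-T and ,-T from the previous article, and on SET, none of which are defined here. The following is only a sketch of how they might look in standard Forth, assuming a 16-bit little-endian target whose image lives in a host buffer. TARGET-IMAGE, TARGET-SIZE, T-DP, C@-T and @-T are hypothetical names introduced for illustration, and the stack effect of SET is inferred from the way Figure One uses it.

```forth
\ Sketch only: a 16-bit, little-endian target image kept in a host buffer.
\ TARGET-IMAGE, TARGET-SIZE, T-DP, C@-T and @-T are hypothetical names.
4096 CONSTANT TARGET-SIZE
CREATE TARGET-IMAGE  TARGET-SIZE ALLOT
TARGET-IMAGE TARGET-SIZE ERASE
VARIABLE T-DP   0 T-DP !                  \ target dictionary pointer

: THERE   ( taddr -- addr )   TARGET-IMAGE + ;      \ map target address to host
: HERE-T  ( -- taddr )        T-DP @ ;               \ next free target address
: C,-T    ( char -- )         HERE-T THERE C!  1 T-DP +! ;
: ,-T     ( n -- )            DUP 255 AND C,-T  8 RSHIFT 255 AND C,-T ;
: C@-T    ( taddr -- char )   THERE C@ ;
: @-T     ( taddr -- n )      DUP C@-T  SWAP 1+ C@-T  8 LSHIFT OR ;
: SET     ( bits addr -- )    DUP C@ ROT OR SWAP C! ;  \ OR bits into a host byte

CHAR A C,-T               \ one byte at target address 0
HEX 1234 DECIMAL ,-T      \ one 16-bit cell at target addresses 1 and 2
128 0 THERE SET           \ set the high order bit of the byte at target 0
```

Note that SET works on a host address, which is why Figure One always applies THERE to a target address before calling it.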
Now that we can lay down headers in the target system, we need to think about what happens when a word defined in the target system is supposed to be compiled in a colon definition. For example, suppose we have defined DUP and + as code words, and have meta-compiled them so that their headers and their code bodies are resident in the target system. Now we want to define the word 2* as follows:
: 2* DUP + ;
What is supposed to happen? Well, we need to make the meta-compiler do the same thing that would ordinarily happen in a running system; namely, : should create a name in the dictionary and should compile the code field addresses of DUP and + and EXIT into the parameter field of 2* . Finally, compiling should be terminated by the ; . Notice that the behavior of DUP and + in the meta : context is totally different from their behavior in the normal Forth context, namely DUP should duplicate something and + should add something. In the meta : context they compile something.
Now that we know what should happen inside a : definition, it is just a matter of implementation. The approach at this point is to construct a symbol table of all the words that are defined, and when building a : definition, to look up each word in this symbol table and compile the code field address corresponding to this word into the target system. In Pascal or BASIC this would probably require twenty pages of code. The skeleton Forth code that does this is written in Figure Two. It probably requires a few words of explanation. First, >IN is the pointer into the input stream, which must be saved and restored because both HEADER and CREATE parse the next word from the input stream and so advance it. IN-SYMBOLS isn’t defined elsewhere, but I will describe its function. It is highly dependent on how your system implements vocabularies. It must make sure the word that CREATE creates is in a sealed and separate vocabulary. This is very important, since almost all of the Forth nucleus words will be defined during the meta-compiling process, and it would be deadly to redefine them all in the Forth vocabulary. Also, it guarantees that when a symbol is looked up, the meta one is found, not the corresponding Forth one. The idea is to use the existing vocabulary structure to implement a symbol table by placing all of the meta names into it and sealing it so it does not chain to any other vocabularies. If we adopt such a structure, the regular CREATE can be used to enter a symbol into the symbol table and the regular FIND can be used to search it. Never buy what you can steal! Similarly, IN-META restores things back to the way they were.
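Since IN-SYMBOLS and IN-META depend on the host's vocabulary mechanism, here is one possible sketch for a system with the standard search-order word set. It is an assumption, not the article's implementation: SYMBOLS, OLD-CURRENT and FIND-SYMBOL are hypothetical names, and the "sealing" is achieved simply by never putting SYMBOLS into the search order and looking it up only through SEARCH-WORDLIST.

```forth
\ Sketch only: a sealed symbol table built from a standard wordlist.
\ SYMBOLS, OLD-CURRENT and FIND-SYMBOL are hypothetical names.
WORDLIST CONSTANT SYMBOLS              \ holds nothing but meta names
VARIABLE OLD-CURRENT                   \ compilation wordlist to restore

: IN-SYMBOLS  ( -- )  GET-CURRENT OLD-CURRENT !  SYMBOLS SET-CURRENT ;
: IN-META     ( -- )  OLD-CURRENT @ SET-CURRENT ;

\ Look a name up in the symbol table only, never in the host vocabularies.
: FIND-SYMBOL  ( c-addr u -- 0 | xt 1 | xt -1 )  SYMBOLS SEARCH-WORDLIST ;

\ Enter a symbol called DUP whose body holds a made-up cfa of 111.
IN-SYMBOLS CREATE DUP IN-META   111 ,

5 DUP + .                               \ the host DUP is untouched
S" DUP" FIND-SYMBOL DROP >BODY @ .      \ the symbol's saved value
```

Because SYMBOLS never enters the search order, defining DUP there cannot shadow the host's DUP, which is the property the paragraph above asks for.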
[Figure Two]
    \ Fig 2. Create a Target Image and Symbol              13MAY83HHL
    : MAKE-CODE   (S addr -- )
       @ ,-T   ;
    : TARGET-CREATE   (S -- )
       >IN @ HEADER >IN !   ( Without moving input stream )
       IN-SYMBOLS CREATE IN-META
       HERE-T ,   ( Save cfa )
       DOES> MAKE-CODE   ;
    : CODE   (S -- )
       TARGET-CREATE HERE-T 2+ ,-T ASSEMBLER   ;
Now let’s look at the next phrase, namely HERE-T , (the trailing comma is the Forth word, not punctuation), and figure out what it is doing. It is saving the current address in the target system into the parameter field of the symbol that was just created. Since the header has already been created, HERE-T is the code field address of the word that has just been created. So by saving it in the parameter field of the symbol, we are remembering the code field address for future reference. This future reference takes place in the run-time portion of the definition. MAKE-CODE does nothing more than fetch the code field address we just saved and compile it into the target system. Thus, if we use the meta definition of CODE that appears, we see that what it does is create a header in the target system, as well as a symbol in the symbol table. Furthermore, it sets up the code field in the target system to point at its parameter field, just as every good code field should in an ITC (indirect threaded code) system. Finally, it switches to the ASSEMBLER vocabulary to allow compilation of machine-language opcodes. Later, if the word defined by CODE is executed, the DOES> portion comes into play, which simply compiles the word's own code field address into the target system. Nothing could be simpler or more devious!
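To see the CREATE ... DOES> trick in isolation, here is a small sketch in standard Forth. It is written under several assumptions: the target image is a tiny host buffer with 16-bit little-endian cells, headers are left out altogether, and SYMBOL, T-CODE, COMPILE-SYMBOL and BODY-OF-2* are hypothetical names (T-CODE stands in for the meta CODE without the assembler).

```forth
\ Sketch only: a symbol remembers a target cfa and compiles it when executed.
\ TARGET-IMAGE, T-DP, SYMBOL, T-CODE and COMPILE-SYMBOL are hypothetical.
CREATE TARGET-IMAGE 256 ALLOT   VARIABLE T-DP  0 T-DP !
: THERE   ( taddr -- addr )  TARGET-IMAGE + ;
: HERE-T  ( -- taddr )       T-DP @ ;
: C,-T    ( char -- )        HERE-T THERE C!  1 T-DP +! ;
: ,-T     ( n -- )           DUP 255 AND C,-T  8 RSHIFT 255 AND C,-T ;
: @-T     ( taddr -- n )     THERE DUP C@  SWAP 1+ C@ 8 LSHIFT OR ;

WORDLIST CONSTANT SYMBOLS                  \ the sealed symbol table
: MAKE-CODE  ( addr -- )  @ ,-T ;          \ compile the remembered cfa
: SYMBOL  ( "name" -- )                    \ TARGET-CREATE minus the header
   GET-CURRENT  SYMBOLS SET-CURRENT  CREATE  SET-CURRENT
   HERE-T ,                                \ save the cfa in the symbol
   DOES> MAKE-CODE ;
: T-CODE  ( "name" -- )                    \ code field points at its body
   SYMBOL  HERE-T 2 + ,-T ;
: COMPILE-SYMBOL  ( c-addr u -- )
   SYMBOLS SEARCH-WORDLIST 0= ABORT" not a symbol"  EXECUTE ;

T-CODE DUP   0 C,-T        \ cfa at target 0, one dummy "machine code" byte
T-CODE +     0 C,-T        \ cfa at target 3, one dummy byte

HERE-T CONSTANT BODY-OF-2*                 \ where a body for 2* could start
S" DUP" COMPILE-SYMBOL   S" +" COMPILE-SYMBOL
BODY-OF-2* @-T .   BODY-OF-2* 2 + @-T .    \ the two compiled cfas
```

Executing a symbol never touches the host data the way the real DUP or + would; it only appends two bytes to the target image.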
Now let’s finish this discussion by looking at what must take place when we meta-compile a : definition. Take a look at the code in Figure Three. First we must create a header and a symbol, just as we did for CODE words. Next we must lay down the address of the runtime for : in the code field of the word being defined. (I have assumed that NEST is a constant that returns that address for me.) Finally, we must enter a loop that looks words up in the symbol table and compiles their code fields into the target system. That function is performed by the meta version of ] . The function of compiling the code fields is cleverly performed by executing the words that are found in the symbol table. That is why the DOES> portion of TARGET-CREATE compiles the code field that was saved. I have not provided code for the definition of DEFINED and NUMBER-T but I leave to your imagination what it is that they do.
[Figure Three]
    \ Fig 3. High Level Meta Definitions                   13MAY83HHL
    : ]   (S -- )
       BEGIN   DEFINED IF   EXECUTE   ELSE   NUMBER-T   THEN   AGAIN   ;
    : :   (S -- )
       TARGET-CREATE NEST ,-T ]   ;
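For readers who would rather not rely on imagination, the following sketch shows one way DEFINED, NUMBER-T and a terminating compile loop might be written in standard Forth. All of it is assumption: the addresses given to DUP, +, EXIT, LIT-CFA and NEST are made up, headers are omitted, the loop is ended by a STATE-T flag that a ; entry in the symbol table clears, each definition must fit on one input line, and the words are named ]-T and :-T (hypothetical names) so that the host's own ] and : stay usable.

```forth
\ Sketch only: one possible DEFINED, NUMBER-T, and a ] that stops at ;
\ STATE-T, SYMBOL, LIT-CFA, ]-T, :-T and all cfa values are hypothetical.
CREATE TARGET-IMAGE 256 ALLOT   VARIABLE T-DP  0 T-DP !
: THERE   ( taddr -- addr )  TARGET-IMAGE + ;
: HERE-T  ( -- taddr )       T-DP @ ;
: C,-T    ( char -- )        HERE-T THERE C!  1 T-DP +! ;
: ,-T     ( n -- )           DUP 255 AND C,-T  8 RSHIFT 255 AND C,-T ;
: @-T     ( taddr -- n )     THERE DUP C@  SWAP 1+ C@ 8 LSHIFT OR ;

WORDLIST CONSTANT SYMBOLS                 \ sealed symbol table
VARIABLE STATE-T                          \ true while meta-compiling
: SYMBOL  ( cfa "name" -- )               \ symbol that compiles cfa
   GET-CURRENT SYMBOLS SET-CURRENT CREATE SET-CURRENT  ,
   DOES> @ ,-T ;

\ Pretend these code words were meta-compiled earlier at these addresses.
100 SYMBOL DUP   110 SYMBOL +   120 SYMBOL EXIT
130 CONSTANT LIT-CFA                      \ made-up runtime for literals
140 CONSTANT NEST                         \ made-up runtime for :

GET-CURRENT  SYMBOLS SET-CURRENT          \ a ; that lives in SYMBOLS only
: ;  ( -- )  120 ,-T  FALSE STATE-T ! ;   \ compile EXIT, stop the loop
SET-CURRENT

: DEFINED  ( "name" -- c-addr 0 | xt flag )   \ parse a word, search SYMBOLS
   BL WORD DUP COUNT SYMBOLS SEARCH-WORDLIST
   DUP IF  ROT DROP  THEN ;
: NUMBER-T  ( c-addr -- )                 \ compile the token as a literal
   0 0 ROT COUNT >NUMBER  ABORT" unknown word"  2DROP
   LIT-CFA ,-T  ,-T ;
: ]-T  ( -- )
   TRUE STATE-T !
   BEGIN  DEFINED IF EXECUTE ELSE NUMBER-T THEN  STATE-T @ 0= UNTIL ;
: :-T  ( "name" -- )
   HERE-T SYMBOL  NEST ,-T  ]-T ;

:-T 2* DUP + ;
:-T 2*1+ 2* 1 + ;
```

After these two lines the target image holds NEST, DUP, + and EXIT for 2* starting at target address 0, followed by NEST, the cfa of 2* , a literal 1, + and EXIT for the second word. The infinite BEGIN ... AGAIN of Figure Three has been replaced by a flag test purely to make the sketch stop; it says nothing about how the original meta-compiler leaves its loop.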
The code I have presented is very simplistic, and is not really adequate as is. However, it does contain the central ideas that are needed in order to implement a meta-compiler. Next time we will look at some of the subtler issues in meta-compiling such as how to handle IMMEDIATE words, and what about [COMPILE] ? Until then, good luck, and may the Forth be with you.
Copyright © 1983 by Henry Laxen. All rights reserved.