Original article appeared in Forth Dimensions Volume V, Issue 2

Other articles in this series: Laxen meta-compiling one .. Laxen meta-compiling three

Meta Compiling II

Henry Laxen

In Volume IV, number 6 of Forth Dimensions we took a look at one of the underlying foundations of the meta-compiling process, namely that of mapping the address space of the target system into the address space of the host system. If you don’t understand the above sentence I suggest you reread the last article on meta-compiling. This article will use the technique of address mapping discussed last time to implement the “guts” of a meta-compiler. I must warn you that I am still leaving out a lot of details, which I will try to cover in the next article. This article will illustrate how to meta-compile code definitions and simple colon definitions.

First, let’s take a look at how we can meta-compile code definitions. What we need is an assembler which generates its code in the target system. We discussed that briefly last time, and showed how to define a JMP instruction so that its opcode and address would be assembled in the target system rather than in the host. But that is only half of the battle. When we define a code word, we are defining a name in the target system, which must appear in the target dictionary. Also, if a future colon definition references this code word, we want to compile the code field address of this word in the target dictionary. Let’s take this one step at a time. First we must define a name in the target dictionary. There is no magic in this; it depends heavily on the structure you decide on for names. The only thing you have to be careful about is to make sure the bytes are placed in the target system address space, and that you keep track of the link fields somehow. Figure One illustrates a word that will compile FIG-like headers into a target system. Notice that if we want to compile headerless code all we need to do is set WIDTH-T to zero. Headerless code is one of the fringe benefits of meta-compiling, not its main purpose.

  [Figure One]
  \ Fig 1.   Headers in Target System            13MAY83HHL
  : S,-T (S addr len -- )
     0 ?DO   DUP C@ C,-T    1+   LOOP   DROP ;
  : HEADER (S -- )
     BL WORD   C@ 1+ WIDTH-T @ MIN   ?DUP IF
        HERE-T HERE ROT S,-T   ( Lay down name, save NFA )
        128 HERE-T 1- THERE SET   ( High bit on last byte of name )
        LATEST-T @ ,-T  ( Link Field)   DUP LATEST-T !
        128 SWAP THERE SET   ( High bit on first byte of name )
     THEN ;
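Figure One leans on the target-memory words from the previous article (HERE-T, C,-T, ,-T, THERE and SET), which are not repeated here. As a rough sketch only, assuming a 16-bit little-endian target kept in a host buffer, they might look something like this in a standard Forth. The bodies, and the names IMAGE, DP-T and C@-T, are assumptions made for illustration and are not the definitions used in this series.

```forth
\ Assumed: 16-bit little-endian target; IMAGE, DP-T and C@-T are hypothetical.
CREATE IMAGE  256 ALLOT   IMAGE 256 ERASE    \ host buffer holding the target
VARIABLE DP-T   0 DP-T !                     \ target dictionary pointer
: THERE  ( taddr -- addr )  IMAGE + ;        \ target address -> host address
: HERE-T ( -- taddr )  DP-T @ ;
: C,-T   ( c -- )  HERE-T THERE C!   1 DP-T +! ;
: ,-T    ( n -- )  DUP 255 AND C,-T   8 RSHIFT 255 AND C,-T ;
: C@-T   ( taddr -- c )  THERE C@ ;
: SET    ( bits addr -- )  DUP C@ ROT OR SWAP C! ;   \ OR bits into a host byte
: S,-T   ( addr len -- )  0 ?DO  DUP C@ C,-T  1+  LOOP  DROP ;

\ Lay down a FIG-like name field for DUP by hand, then its link field.
3 C,-T   S" DUP" S,-T        \ count byte followed by the three characters
128 0 THERE SET               \ high bit on the count byte (target address 0)
128 HERE-T 1- THERE SET       \ high bit on the last character, before the link
0 ,-T                         \ link field: no previous word
```

With these assumed definitions, 0 C@-T . prints 131 (the count 3 with the high bit set) and 3 C@-T . prints 208 (the letter P with the high bit set). Setting the bit on the last character before the link field is laid down keeps HERE-T 1- pointing at the name rather than at the link.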

Now that we can lay down headers in the target system, we need to think about what happens when a word defined in the target system is supposed to be compiled in a colon definition. For example, suppose we have defined DUP and + as code words, and have meta-compiled them so that their headers and their code bodies are resident in the target system. Now we want to define the word 2* as follows:

  : 2*   DUP + ;

What is supposed to happen? Well, we need to make the meta-compiler do the same thing that would ordinarily happen in a running system; namely, : should create a name in the dictionary and should compile the code field addresses of DUP and + and EXIT into the parameter field of 2*. Finally, compiling should be terminated by the ;. Notice that the behavior of DUP and + in the meta : context is totally different from their behavior in the normal Forth context, namely DUP should duplicate something and + should add something. In the meta : context they compile something.

Now that we know what should happen inside a : definition, it is just a matter of implementation. The approach at this point is to construct a symbol table of all the words that are defined, and when building a : definition, to look up each word in this symbol table and compile the code field address corresponding to this word into the target system. In Pascal or BASIC this would probably require twenty pages of code. The skeleton Forth code that does this is written in Figure Two. It probably requires a few words of explanation. First, >IN is the pointer into the input stream, which must be saved and restored because both HEADER and CREATE modify it. IN-SYMBOLS isn’t defined elsewhere, but I will describe its function. It is highly dependent on how your system implements vocabularies. It must make sure the word that CREATE creates is in a sealed and separate vocabulary. This is very important, since almost all of the Forth nucleus words will be defined during the meta-compiling process, and it would be deadly to redefine them all in the Forth vocabulary. Also, it guarantees that when a symbol is looked up, the meta one is found, not the corresponding Forth one. The idea is to use the existing vocabulary structure to implement a symbol table by placing all of the meta names into it and sealing it so it does not chain to any other vocabularies. If we adopt such a structure, the regular CREATE can be used to enter a symbol into the symbol table and the regular FIND can be used to search it. Never buy what you can steal! Similarly, IN-META restores things back to the way they were.

  [Figure Two]
  \ Fig 2.   Create a Target Image and Symbol    13MAY83HHL
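  : MAKE-CODE (S addr -- )
     @ ,-T ;
  : TARGET-CREATE (S -- )
     >IN @   HEADER   >IN !    ( Without moving input stream )
     IN-SYMBOLS CREATE IN-META   HERE-T , ( Save cfa )
     DOES>   MAKE-CODE ;
  : CODE (S -- )
     TARGET-CREATE   HERE-T 2+ ,-T   ASSEMBLER ;
     ( Code field points at parameter field; 2+ assumes 16-bit cells )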
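Since IN-SYMBOLS and IN-META depend on the host's vocabulary mechanism, here is one way the sealed symbol table might be sketched on a system with ANS-style wordlists. It is only an illustration: SYMBOLS, SYMBOL, FIND-SYMBOL and LAST-COMPILED are hypothetical names, and LAST-COMPILED merely stands in for compiling into the target with ,-T.

```forth
\ Sketch only; all names defined here are hypothetical.
WORDLIST CONSTANT SYMBOLS        \ separate wordlist, never put in the search order
VARIABLE LAST-COMPILED           \ stand-in for ",-T" into a target image

: SYMBOL ( cfa-t "name" -- )
   GET-CURRENT >R   SYMBOLS SET-CURRENT    \ plays the role of IN-SYMBOLS
   CREATE ,                                 \ remember the target cfa
   R> SET-CURRENT                           \ plays the role of IN-META
   DOES> ( -- )  @ LAST-COMPILED ! ;        \ "compile" the saved cfa

: FIND-SYMBOL ( "name" -- xt flag | 0 )    \ search the symbol table only
   BL WORD COUNT SYMBOLS SEARCH-WORDLIST ;

100 SYMBOL DUP                   \ pretend the target DUP has its cfa at 100
FIND-SYMBOL DUP  DROP EXECUTE    \ finds the symbol, not the host DUP, and runs it
```

After the last line LAST-COMPILED holds 100, while the host's own DUP is untouched because SYMBOLS is never searched by the ordinary interpreter.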

Now let’s look at the next phrase, namely HERE-T , and figure out what it is doing. It is saving the current address in the target system into the parameter field of the symbol that was just created. Since the header has already been created, HERE-T is the code field address of the word that has just been created. So by saving it in the parameter field of the symbol, we are remembering the code field address for future reference. This future reference takes place in the run-time portion of the definition. MAKE-CODE does nothing more than fetch the code field address we just saved and compile it into the target system. Thus, if we use the meta definition of CODE that appears, we see that what it does is create a header in the target system, as well as a symbol in the symbol table. Furthermore, it sets up the code field in the target system to point at its parameter field, just as every good code field should in an indirect-threaded code (ITC) system. Finally, it switches to the ASSEMBLER vocabulary to allow compilation of machine-language opcodes. Later, if the word defined by CODE is executed, the DOES> portion comes into play, which simply compiles the word’s own code field address into the target system. Nothing could be simpler or more devious!

Now let’s finish this discussion by looking at what must take place when we meta-compile a : definition. Take a look at the code in Figure Three. First we must create a header and a symbol, just as we did for CODE words. Next we must lay down the address of the runtime for : in the code field of the word being defined. (I have assumed that NEST is a constant that returns that address for me.) Finally, we must enter a loop that looks words up in the symbol table and compiles their code fields into the target system. That function is performed by the meta version of ]. The function of compiling the code fields is cleverly performed by executing the words that are found in the symbol table. That is why the DOES> portion of TARGET-CREATE compiles the code field that was saved. I have not provided code for the definitions of DEFINED and NUMBER-T, but I leave it to your imagination what it is that they do.

  [Figure Three]
  \ Fig 3.   High Level Meta Definitions         13MAY83HHL
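  : ] (S -- )
     BEGIN   DEFINED IF   EXECUTE   ELSE   NUMBER-T   THEN
     ( DEFINED and NUMBER-T are not shown; stack effects assumed )
     AGAIN ;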
  : : (S -- )
    TARGET-CREATE   NEST ,-T   ] ;
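To make the flow of Figure Three concrete, here is a small self-contained sketch for a system with ANS-style wordlists. It is not the code from this series: it is headerless, handles only one-line definitions, ignores numbers (the job of NUMBER-T), and ends the loop with a flag instead of looping forever. It assumes a 16-bit little-endian target, and IMAGE, DP-T, @-T, SYMBOLS, SYMBOL, FAKE-CODE, EXIT-CFA, COMPILING?, META-] and META-: are all hypothetical names.

```forth
CREATE IMAGE 256 ALLOT   IMAGE 256 ERASE    \ host buffer for the target
VARIABLE DP-T   16 DP-T !                   \ start at target address 16 (arbitrary)
: HERE-T ( -- taddr )  DP-T @ ;
: ,-T ( n -- )  DUP 255 AND  IMAGE HERE-T + C!
                8 RSHIFT 255 AND  IMAGE HERE-T + 1+ C!   2 DP-T +! ;
: @-T ( taddr -- n )  IMAGE + DUP C@  SWAP 1+ C@ 8 LSHIFT OR ;

WORDLIST CONSTANT SYMBOLS                   \ the sealed symbol table
VARIABLE COMPILING?
: SYMBOL ( cfa-t "name" -- )                \ a symbol compiles its saved cfa
   GET-CURRENT >R  SYMBOLS SET-CURRENT  CREATE ,  R> SET-CURRENT
   DOES> @ ,-T ;
: FAKE-CODE ( "name" -- )                   \ code word with no machine code:
   HERE-T SYMBOL   HERE-T 2 + ,-T ;         \ code field -> parameter field

HERE-T CONSTANT EXIT-CFA   FAKE-CODE EXIT
FAKE-CODE DUP   FAKE-CODE +
HERE-T CONSTANT NEST   0 ,-T                \ placeholder for the : runtime

GET-CURRENT   SYMBOLS SET-CURRENT
: ;  ( -- )  EXIT-CFA ,-T   FALSE COMPILING? ! ;   \ meta ; exists only in SYMBOLS
SET-CURRENT

: META-] ( -- )                             \ look up each word and execute it
   TRUE COMPILING? !
   BEGIN  COMPILING? @  WHILE
      BL WORD COUNT SYMBOLS SEARCH-WORDLIST
      IF  EXECUTE  ELSE  TRUE ABORT" not in symbol table"  THEN
   REPEAT ;
: META-: ( "name" -- )  HERE-T SYMBOL   NEST ,-T   META-] ;

META-: 2* DUP + ;
24 @-T .  26 @-T .  28 @-T .  30 @-T .      \ prints 22 18 20 16
```

The four cells printed are the body of 2* in the target: the address of NEST in its code field, then the code field addresses of DUP, + and EXIT. The host's DUP, + and ; were never involved, since every word after META-: was looked up in SYMBOLS alone.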

The code I have presented is very simplistic, and is not really adequate as is. However, it does contain the central ideas that are needed in order to implement a meta-compiler. Next time we will look at some of the subtler issues in meta-compiling, such as how to handle IMMEDIATE words and what to do about [COMPILE]. Until then, good luck, and may the Forth be with you.

Copyright © 1983 by Henry Laxen. All rights reserved.
