The Abstract Machine
by © Istvan Simon Department of Mathematics and Computer Science California State University, Hayward Hayward,CA 94542 - 3092 In this section we describe an ideal machine for a complex block structured language like Pascal. The machine we describe is ideal in several ways:
- It's architecture is well suited to the language, so that the compiler can generate code for it easily;
- It's instruction set and instruction formats are well suited to the programming language;
- The names of operations in its instruction set are easily remembered because they correspond to language structures;
- The semantics of the language is easily understood in terms of the instructions of the machine. This will permit us to describe a set of code generation rules which will permit us to write a correct code generator for the language, that can be proved to generate correct code for the language.
The Machine Architecture
As we have seen in the last chapter, recursive functions and procedures coupled with the block structure of a language like Pascal require variables and parameters to be allocated on the run-time stack in data structures called activation records. Our ideal machine's architecture reflects this situation so that it will facilitate the generation of code for the language. Like all machines, our ideal machine has a memory that consists of a large array of words. Each word in memory can store an integer value. For simplicity, we will assume that the ideal machine's instructions are coded in variable-length instructions of one, two, three or more words. Though we will describe the instructions in an ideal assembly language in which the names and parameters of an instruction remind us of their meaning, the machine language will consist of (n+1) words, where n is the number of parameters of the instruction. The first word of the instruction is an integer, the opcode of the instruction. The remaining n words are integers corresponding to the n parameters of the instruction.
The memory is subdivided always in three sections:
- Program code
- Run-time stack
- Unused area between the top of the run-time stack and the end of the memory array.
When the stack grows during execution of a program, the top of the stack increases, and the stack grows towards the end of the memory array, while the unused area correspondingly shrinks. Conversely, when something is popped from the stack, it shrinks, the top of the stack decreases, and the unused area correspondingly grows. Our ideal machine also has three machine registers:
- pc, is the program counter. It generally points to the next instruction to be executed in the program's code;
- arp, is the activation record pointer. During execution it always points to the base of the top activation record on the stack;
- top, always points to the the first free word in the unused area of memory. The top word on the run-time stack is always the word stored in memory position (top - 1).
Finally, we remark that all instructions manipulate the run-time stack. They find their arguments' values on the stack and replace them with their results on the stack. Variable Addressing
Because of recursive procedures and functions a variable may have multiple incarnations on the run-time stack. Because of the block structure of Pascal, the code of a block may have access to the variables and parameters of the currently executing block, or of any of the surrounding blocks. The activation records where these variables are located on the stack depend only on the sequence of activations that happened during execution, and are therefore unpredictable at compile-time. The result of all this is that the activation record of a surrounding block can be located anywhere below the activation record of the inner block on the run-time stack. This complication requires us to have to compute the address of a variable at run time. The compiler assigns a fixed relative address in the activation record of a block to each variable and parameter defined in that block. To compute the address of a variable or parameter, defined in a block X which is l levels away surrounding the current block Y, we must first compute the base address of the activation record of the most recent activation of X. We can do this by going down the static chain l times. Once we find the correct activation record of X by this process, we can determine the address of the desired variable or parameter by adding its relative address (called its displacement ) to the base address of X we just determined.
The Instruction Set
Simple assignment statements
We start by defining four instructions with which we can generate code for simple assignment statements. - Variable(Level,Displacemnent) Variable addressing computation: This instruction computes the absolute address of a variable or parameter defined in a surrounding block Level levels away and whose displacement within the activation record is Displacement and pushes this address on the run-time stack.
- Value(Length) Often we need not the address of a variable but its value. This instruction replaces the address of a variable it finds on the top of the stack by the value stored at that address. Because Pascal can have assignment statements for an arbitrarily complex type, values may be multiword values. Consequently, we define the instruction with a parameter Length which specifies the number of words in the multiword value.
- Constant(Value) Pushes the integer value Value on the run-time stack.
- Assign(Length) This instruction expects an absolute memory address and a multiword value of Length words on the top of the stack. It pops the address and the value and stores the value at the specified address.
Examples of code
Assume that we have a block X with variables x, y inside a block Y with variables u, v, w inside of the main block, which has variables a, b, and c.
Assume that all variables are integers, so that the displacements of these variables are respectively 3, 4, 3, 4, 5, 3, 4, 5. Assume further that the code
below is in block X. a := v; b := 10; x := c;
Under these assumptions the three assignments statements would generate the code:
Variable(2,3) pushes the address of a on the stack ^a
Variable(1,4) pushes the address of v on the stack ^a ^v
Value(1) replaces the address of v by v ^a v
Assign(1) stores the value v at the address of a
and pops ^a and v
Variable(2,4) pushes ^b ^b
Constant(10) pushes 10 ^b 10
Assign(1) stores 10 at ^b and pops both
Variable(0,3) Pushes ^x ^x
Variable(2,5) Pushes ^c ^x ^c
Value(1) Replaces ^c by c ^x c
Assign(1) Stores c at ^x and pops both.
Note how the stack is used for temporaries. Under the same assumptions the assignment statement w := c inside of Y would generate
Variable(0,5) Pushes ^w ^w
Variable(1,5) Pushes ^c ^w ^c
Value(1) Replaces ^c by c ^w c
Assign(1) Stores c at ^w and pops both
Selecting elements of arrays and structures
- Index(low, high, length, line) This instruction expects the base address of a variable of type array and an index value on the top of the stack, and replaces that address and index value by the address of the indexed element of the array. The instruction first checks if the index value is within the specified range of index values. If it is not, it reports that an indexing run-time error occurred on line number line and aborts execution of the program. length is the number of words allocated for each element of the array.
- Field(displacement) This instruction expects the base address of a record on the top of the stack, and replaces that address by the address of the field of the record whose displacement is displacement.
Evaluating Expressions
The ideal machine has an instruction for each operator of the language. For each such instruction it expects the appropriate number of operands on the top of the stack. The instruction pops the operands, performs the operation, and pushes the result back on the stack. A few examples follow for the operators of Pascal. This makes evaluating expresssions, subexpressions, etc. simple: the compiler essentially generates the reverse polish sequence of instructions for the expression. (See examples below.) All of these instructions have zero parameters, since they expect their operands to be on the stack. - Add Pops two integers, adds them, pushes the result on the stack.
- Subtract Pops two integers, subtracts the first popped from the second, and pushes the result on the stack.
- Multiply Pops two integers, multiplies them, pushes the result on the stack.
- Divide Pops two integers, divides the second popped by the first, and pushes the result back on the stack.
- Mod Pops two integers, computes the remainder of the division of the second popped by the first, and pushes the remainder on the stack.
- Minus This corresponds to the unary minus operator. It pops one integer, and pushes its value with the opposite sign n the stack.
- And Pops two boolean values, computes their short-circuit evaluation of and , and pushes the result back on the stack.
- Or Pops two boolean values, computes the short circuit evaluation of their or , and pushes the result back on the stack.
- Not This corresponds to the logic not operator. It pops a boolean value, computes its logic complement, and pushes it back on the stack.
- Less Pops two values, if the second popped is smaller than the first it pushes 1 on the stack; otherwise it pushes 0 on the stack.
- All six relational operators are treated similarly to Less.
Example
These instructions allow us to generate code for more complex assignment statements. Assume the same setup as before, but make y and c arrays of 10 integers each. Then the following assignment statement in block X, in line number 37 of tyhe source program would generate the code below. c[x] := y[5] + (u+v) mod 17 <- Line No. 37
Variable(2,5) ^c
Variable(0,3) ^c ^x
Value(1) ^c x
Index(1,10,1, 37) ^(c[x])
Variable(0,5) ^(c[x]) ^y
Constant(5) ^(c[x]) ^y 5
Index(1,10,1,37) ^(c[x]) ^y[5]
Value(1) ^(c[x]) y[5]
Variable(1,3) ^(c[x]) y[5] ^u
Value(1) ^(c[x]) y[5] u
Variable(1,4) ^(c[x]) y[5] u ^v
Value(1) ^(c[x]) y[5] u v
Add ^(c[x]) y[5] (u + v)
Constant(17) ^(c[x]) y[5] (u + v) 17
Mod ^(c[x]) y[5] ((u + v) mod 17)
Add ^( c[x]) (y [5] + (u + v) mod 17)
Assign(1)
Input/Output
The standard procedures read and write correspond to two other instructions of the abstract machine. - Read Expects an address on the top of the stack. It pops the address, reads an integer, and stores its value at the address.
- Write Expects an integer value on the stack. Pops the value and outputs it.
Jumps
There are two jump instructions. - Jump(Address) Jumps to the given address in the program code.
- JumpIfFalse(Address) Expects a boolean value on the top of the stack. Pops the value and jumps to the given address if it is false.
Activating Procedures
When a procedure is activated, before the code of its body can be executed, its activation record must be pushed on the stack. The activation record, on the other hand, consists of the parameter section, the context part, and space for the local variables. Thus three things must happen: - The parameters must be passed. Pascal can pass variables by value or by reference. If the parameter is passed by value, the actual parameter is an expression, its value must be evaluated and placed in the variable allocated for the formal parameter. If the variable is passed by reference, space is allocated for the formal parameter to contain an address, the actual parameter is a variable, and its address is passed to the procedure.
- The context part of the procedure's activation record must be constructed;
- Space needs to be allocated to the local variables of the procedure, and the code needs to jump to the beginning of the code for the block.
Code will be generated to pass the parameters. This code constructs the parameter section of the activation record of the called procedure. Next we
will generate a ProcedureCall(Level,Address) instruction. This will construct the context part of the activation record, and jump to the given address.
That address will contain a Procedure(VarLength, BeginAddress) instruction, which will allocate space for the local variables of the procedure, and jump to the code of the body of the procedure. When the procedure ends, the activation record of the procedure that just returned must be popped from the stack, the context for the calling code must be restored and we must jump back to the return address stored in the context
part of the returning procedure. This is done by the EndProc(ParameterLength) instruction. ParameterLength is the number of words of the parameter section of the activation record of the returning procedure.
Activating the main block
The activation of the main block is very similar to a procedure call. The only difference is that there are no parameters to pass, the static chain and the dynamic chain both terminate with the static link and the dynamic link of main, (both null), and there is no return address. Space must be allocated to the local variables of the main block. A Program(VarLength, BeginAddress) instruction, analogous to the Procedure instruction above, reserves space for the local variables, initializes the context part to three zeroes, and jumps to the given address, the beginning of the code for the main block . The EndProgram instruction halts the excution of the abstract machine. We terminate this Chapter by hand generating code for a simple program.
An Example of Code Generation for a Complete Program
1 program P; 0 Program( 2,46) 0 0 0 - -
2 var x,y:integer;
3 procedure xch( 3 Procedure(1,6) ^u ^v sl dl ra -
4 var u:integer;
5 var v:integer );
6 var t:integer;
7 begin 6 Variable(0,3) ^t
8 t:=u; 9 Variable(0,-2) ^t ^^u
9 u:=v; 12 Value(1) ^t ^u
10 v:=t 14 Value(1) ^t u
11 end; 16 Assign(1)
12 begin 18 Variable(0,-2) ^^u
13 x:=2; y:=3; 21 Value(1) ^u
14 xch(x,y); 23 Variable(0,-1) ^u ^^v
15 write x; 26 Value(1) ^u ^v
16 write y 28 Value(1) ^u v
17 end . 30 Assign(1)
32 Variable(0,-1) ^^v
35 Value(1) ^v
37 Variable(0,3) ^v ^t
40 Value(1) ^v t
42 Assign(1)
44 EndProc(2)
46 Variable(0,3) ^x
49 Constant(2) ^x 2
51 Assign(1)
53 Variable(0,4) ^y
56 Constant(3) ^y 3
58 Assign(1)
60 Variable(0,3) ^x
63 Variable(0,4) ^x ^y
66 ProcedureCall(0,3) ^x ^y sl dl 69
69 Variable(0,3) ^x
72 Value(1) x
74 Write
75 Variable(0,4) ^y
78 Value(1) y
80 Write
81 EndProg
Copyright © 1997, Istvan Simon. All rights reserved