Converting Atoms to Instructions in Compiler Design

Code generation is a crucial phase in the compilation process, where an intermediate representation of code is transformed into machine-level instructions. One key concept in this phase is atoms, which represent basic operations in an abstract form, independent of the target machine's architecture. A vital step in code generation is converting these atoms into actual machine instructions that the processor can execute.

In this chapter, we will explore the process of translating atoms into instructions, understand how different types of atoms are handled, and examine real-world examples of conditional branching, memory addressing, and control flow management for a clearer understanding.

What are Atoms in Compiler Design?

In compiler design, atoms are the smallest meaningful operations in an intermediate representation (IR). These operations are abstract: they are not tied to any particular machine architecture. Instead, they serve as a bridge between high-level programming languages and the low-level instructions that a CPU understands. Some common types of atoms include −

  • Arithmetic operations (ADD, SUB, MUL, DIV)
  • Data movement (MOV, LOAD, STORE)
  • Conditional branching (TST, JMP, BLE)
  • Labels for control flow (LBL)
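Inside a compiler, an atom of this kind is often modeled as a small record holding the operation and its operands. The following is a hypothetical sketch in Python; the field names and layout are illustrative assumptions, not a fixed standard:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Atom:
    """A hypothetical atom record: an abstract operation with up to
    two source operands, a destination, and an optional branch target."""
    op: str                       # e.g. "ADD", "MOV", "TST", "JMP", "LBL"
    left: Optional[str] = None    # first source operand
    right: Optional[str] = None   # second source operand
    dest: Optional[str] = None    # destination (or comparison code for TST)
    target: Optional[str] = None  # branch/label target, e.g. "L1"

# (ADD, a, b, T1)  corresponds to  T1 = a + b
add_atom = Atom("ADD", "a", "b", "T1")
print(add_atom)
```

Because the record carries no register names or addresses, the same atom can be lowered to different instruction sequences on different target machines.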

Each atom must be converted into one or more machine instructions, depending on the architecture of the target machine.

Translating an ADD Atom to Instructions

Let us consider the following atom −

(ADD, a, b, T1)

This represents the following operation −

T1 = a + b

If the target machine follows a Load/Store architecture (for example, MIPS), all arithmetic must be performed in registers. The corresponding machine instructions would be −

lw $t1, a   # Load value of 'a' into register $t1  
lw $t2, b   # Load value of 'b' into register $t2  
add $t3, $t1, $t2   # Perform addition: $t3 = $t1 + $t2  
sw $t3, T1   # Store the result in memory location T1  

Here, we first load the values of a and b into registers, then perform the addition, and finally store the result back into memory.
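This load–operate–store pattern can be sketched as a small translation routine. The register names and the textual instruction format below are illustrative assumptions, not the output of a real assembler:

```python
def translate_add(left, right, dest):
    """Expand an (ADD, left, right, dest) atom into a load/add/store
    sequence for a hypothetical Load/Store target."""
    return [
        f"lw $t1, {left}",    # load first operand into a register
        f"lw $t2, {right}",   # load second operand into a register
        "add $t3, $t1, $t2",  # register-to-register addition
        f"sw $t3, {dest}",    # store the result back to memory
    ]

for line in translate_add("a", "b", "T1"):
    print(line)
```

A real code generator would also choose registers via a register allocator instead of hard-coding $t1–$t3, but the expansion shape stays the same.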

Handling Conditional Branching

Branching instructions are more interesting. Consider the TST (Test) atom, which represents a conditional check −

(TST, a, b, , 4, L1)

It means −

If a <= b, jump to label L1

For a Load/Store architecture, this atom is translated into −

lw $t1, a   # Load 'a' into register $t1  
lw $t2, b   # Load 'b' into register $t2  
ble $t1, $t2, 136   # If $t1 <= $t2, branch to memory location 136 (L1)  

This translation uses the BLE (branch if less than or equal) instruction to perform the conditional jump.
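The numeric field in the TST atom (4 in this example) selects which comparison to perform. The examples in this chapter use code 4 for <= and code 3 for >; the full numbering below is an assumed convention that varies between compiler textbooks. A sketch of the mapping and the resulting branch sequence:

```python
# Assumed encoding of TST comparison codes; only codes 3 (>) and
# 4 (<=) are confirmed by the examples in this chapter.
BRANCH_FOR_CODE = {
    1: "beq",  # ==
    2: "blt",  # <
    3: "bgt",  # >
    4: "ble",  # <=
    5: "bge",  # >=
    6: "bne",  # !=
}

def translate_tst(left, right, code, label):
    """Expand a (TST, left, right,, code, label) atom into a
    load/load/branch sequence."""
    return [
        f"lw $t1, {left}",
        f"lw $t2, {right}",
        f"{BRANCH_FOR_CODE[code]} $t1, $t2, {label}",
    ]

print(translate_tst("a", "b", 4, "L1"))
```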

Example: Translating a Java If-Statement

Consider the following Java statement −

if (a > b)  
   a = b * c;

The intermediate representation in atom form −

(TST, a, b,, 4, L1)   # Branch to L1 if a <= b  
(MUL, b, c, T1)       # T1 = b * c  
(MOV, T1,, a)         # a = T1  
(LBL L1)              # Label L1  

Now, let us translate these atoms into MIPS instructions (assuming the memory locations of the variables are predefined) −

100  lw $t1, 200     # Load 'a' from memory (address 200)  
104  lw $t2, 204     # Load 'b' from memory (address 204)  
108  ble $t1, $t2, 128  # Branch to L1 if a <= b  
112  lw $t1, 204     # Load 'b' into register  
116  lw $t2, 208     # Load 'c' into register  
120  mul $t3, $t1, $t2  # Perform multiplication: T1 = b * c  
124  sw $t3, 200     # Store result back in 'a'  
128  LBL L1          # Label L1  

This sequence of instructions correctly implements the conditional check and the multiplication.
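The listing above assumed fixed memory addresses (200, 204, 208) for a, b, and c. A compiler typically builds such a mapping by giving each variable a consecutive word-sized slot; the base address and word size below are chosen only to match the example:

```python
def assign_addresses(variables, base=200, word_size=4):
    """Assign each variable a consecutive word-aligned memory address
    starting at `base` (values chosen to match the example above)."""
    return {name: base + i * word_size for i, name in enumerate(variables)}

addresses = assign_addresses(["a", "b", "c"])
print(addresses)   # {'a': 200, 'b': 204, 'c': 208}
```

The code generator then substitutes these addresses for the variable names when it emits lw and sw instructions.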

Addressing and Memory Management

When converting atoms to instructions, the compiler must also handle memory addressing. Many CPU architectures use a base register plus an offset for efficient memory access. Let us see an example.

If the memory location of a variable is 0x1E (hex) and the base register holds 0x10, the offset would be as follows −

Offset = 0x1E - 0x10 = 0x0E

Thus, the effective address would be computed as −

Base Register + Offset = Operand Address  

The compiler must then calculate and encode these offsets correctly before generating machine instructions.
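The offset arithmetic above can be checked in a few lines; the base-register value and operand address are the ones from the example:

```python
def effective_address(base_register, offset):
    """Compute the operand address as base register + offset."""
    return base_register + offset

base = 0x10              # value held in the base register
operand = 0x1E           # memory location of the variable
offset = operand - base  # offset the compiler must encode

print(hex(offset))                            # 0xe
print(hex(effective_address(base, offset)))   # 0x1e
```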

Handling Jumps and Labels

Control-flow changes such as jumps require careful handling during code generation. The JMP (Jump) atom is used for unconditional jumps, while the TST (Test) atom handles conditional branching.

For example, consider a simple while-loop in Java −

while (i <= x) {  
   x = x + 2;  
   i = i * 3;  
}

The atom representation −

(LBL, L1)  
(TST, i, x,, 3, L2)   # If i > x, jump to L2  
(ADD, x, 2, T1)       # T1 = x + 2  
(MOV, T1,, x)         # x = T1  
(MUL, i, 3, T2)       # T2 = i * 3  
(MOV, T2,, i)         # i = T2  
(JMP, L1)             # Repeat loop  
(LBL, L2)             # End loop  

Translation to MIPS Instructions −

100 LBL L1            # Label for loop start  
104 lw $t1, i  
108 lw $t2, x  
112 bgt $t1, $t2, 152  # Branch to L2 if i > x  

116 lw $t1, x  
120 li $t2, 2  
124 add $t3, $t1, $t2  # x = x + 2  
128 sw $t3, x  

132 lw $t1, i  
136 li $t2, 3  
140 mul $t3, $t1, $t2  # i = i * 3  
144 sw $t3, i  

148 j L1               # Repeat loop  
152 LBL L2             # End loop

This ensures that the loop updates the variables and repeats execution until the condition i <= x becomes false.
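Resolving label operands such as L1 and L2 into numeric instruction addresses is usually done in two passes: first record the address of every label, then patch the branch and jump targets. The sketch below assumes 4-byte instruction spacing and, like the listing above, lets a label entry occupy an address slot:

```python
def resolve_labels(instructions, start=100, word=4):
    """Two-pass label resolution: pass 1 records the address of each
    ("LBL", name) entry; pass 2 rewrites symbolic targets to addresses."""
    addresses, labels, addr = [], {}, start
    # Pass 1: assign an address to every entry and record label positions.
    for op, target in instructions:
        if op == "LBL":
            labels[target] = addr
        addresses.append(addr)
        addr += word
    # Pass 2: replace symbolic branch/jump targets with resolved addresses.
    resolved = []
    for (op, target), a in zip(instructions, addresses):
        if op in ("ble", "bgt", "j") and target in labels:
            target = labels[target]
        resolved.append((a, op, target))
    return resolved

# Simplified shape of the while-loop above: label, exit branch, back jump, label.
program = [("LBL", "L1"), ("bgt", "L2"), ("j", "L1"), ("LBL", "L2")]
print(resolve_labels(program))
```

When a label is referenced before it is defined (a forward branch, like the one to L2), the second pass is what makes the fix-up possible; single-pass assemblers handle the same problem with backpatching lists.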

From the code generation process above, we can note the following points −

  • Arithmetic operations like ADD and MUL are translated into a load-operation-store sequence.
  • Conditional checks use branching instructions to control execution flow.
  • Memory addressing requires computing offsets based on base registers.
  • Jumps and labels must be handled carefully to maintain control flow integrity.

Conclusion

In this chapter, we explained the concept of converting atoms to instructions in compiler design. Atoms act as an intermediate representation, allowing compilers to generate machine-specific code efficiently.

Through examples like ADD, TST, and MOV, we saw how arithmetic operations, conditional branching, and memory addressing are translated into actual machine instructions. Atom conversion is a central part of code generation, ensuring that high-level language constructs are mapped correctly onto CPU instructions.

With proper handling of memory, registers, and control flow, compilers can generate optimized and efficient machine code for different system architectures.