Complete compiler for the language COOL

    Tags:
  • Compiler
  • C++
  • Coursework
  • COOL
  • Language
  • Object Oriented
  • Lexing
  • Parsing
  • Semantic Analysis
  • Code Generation
  • x86 Assembly

NMT's capstone C.S. course is Compiler Writing. In it, students are formed into teams, and expected to produce a compiler that takes code from input source code to valid assembly. Our class was assigned the input code COOL, and instructed to produce x86-64 assembly code.

Code is first ran through Flex (a lexer) and then Bison (a parser). Each of these tools required configuration for the input language. Special language features, like COOL's nested comments, had to be dealt with carefully to ensure that the lexing and parsing correctly evaluated the input code and preserved the meaning as defined by the language. As tokens were passed to Bison from Flex, we constructed an AST on the input code. The output AST was then fed through to semantic analysis.

To perform semantic analysis, we used a visitor pattern and a series of visitors to perform a variety of passes. This included type checking (with type-inferencing and symbol-table insertions), checking that dispatches were valid (named an actual method, parameters were of right type, etc), and making sure that variables were declared before use. Data obtained during these passes were stored either in the AST, the symbol table, or a special class table that held class-specific information.

Upon passing semantic analysis, all aquired data was passed to code generation. Here, each node in the AST was used to create pieces of a Linear Intermediate Representation (LIR) that more closely represented the assembly we were outputting to. Our LIR was a quadruples LIR, and held instructions very similar to those available in x86-64 assembly. The produced LIR was then passed over to expand inline variables with runtime data locations. Then, the LIR was transformed into output assembly.

Code Generation also houses a small number of optimizations. These optimizations were

  • Constant Propagation and Folding
  • Dead/Unreachable Code Elimination
  • Loop Unswitching
  • Loop Unrolling
  • Strength Reduction

Special care was given to the order in which each optimization was performed. In addition, all optimizations (except Strength reduction) were performed before transforming the AST into an LIR.

One design decision we made as a team was to generate multiple smaller LIRs rather than one big one. Since COOL is made up entirely of methods (including initialization code of classes, if you wrap them into an 'init' method like we did), we decided each method body would be it's own LIR. We used a simple map data structure to map between methods and LIRs. Breaking the LIRs up like this helped make code generation much smoother. Rearranging method code placement was very straightforward. Evaluating an LIR at a procedural level was implicit. Adding regular procedure prefix and postfix code could be done systematically.

Because this project was done as coursework (and a similar assignment will be made in the future), the code is not available as open source. However, a copy is available upon request.