LLVM is a toolkit for building optimizing compilers. Building a programming language from scratch is hard: humans want to write code in a nice, simple syntax, while the machine needs to run it on all sorts of architectures. LLVM standardizes the extremely complex process of turning source code into machine code. It grew out of a research project by grad student Chris Lattner at the University of Illinois and was first released in 2003. Today it’s the magic behind Clang for C and C++ as well as languages like Rust, Swift, Julia, and many more. Most importantly, it represents high-level source code in a language-agnostic form called intermediate representation, or IR. This means vastly different languages like CUDA and Ruby can produce the same IR, allowing them to share tools for analysis and optimization before they are converted to machine code for a specific chip architecture.

How does the compiler work in LLVM?

A compiler can be broken down into three parts. The front end parses the source code text and converts it into IR. The middle end analyzes and optimizes this generated code. Finally, the back end converts the IR into native machine code.

Start writing code

Install LLVM, then create a C++ file. Now envision the programming language syntax of your dreams. To make that high-level code work, you’ll first need to write a lexer to scan the raw source code and break it into a collection of tokens, such as literals, identifiers, keywords, operators, and so forth. Second, you need to define an abstract syntax tree (AST) to represent the actual structure of the code and how different tokens relate to each other, which is accomplished by giving each kind of node its own class. Third, you need a parser to loop over each token and build out the abstract syntax tree. Once that works, the hard part is over.
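To make those three steps concrete, here is a minimal sketch for a toy language of arithmetic expressions. The token kinds, the class names (`NumberExpr`, `VariableExpr`, `BinaryExpr`, `Parser`) and the `dump` helper are all made up for illustration. Nothing here depends on LLVM yet, and a real language would need far more (statements, functions, error reporting).

```cpp
#include <cctype>
#include <cstdlib>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Step 1: the lexer turns raw text into tokens.
enum class TokenKind { Number, Identifier, Operator, LParen, RParen, End };
struct Token { TokenKind kind; std::string text; };

bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }
bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }
bool isAlpha(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }

std::vector<Token> lex(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        char c = src[i];
        std::size_t start = i;
        if (isSpace(c)) { ++i; continue; }
        if (isDigit(c)) {  // toy rule: malformed numbers like 1.2.3 are not rejected
            while (i < src.size() && (isDigit(src[i]) || src[i] == '.')) ++i;
            out.push_back({TokenKind::Number, src.substr(start, i - start)});
        } else if (isAlpha(c)) {
            while (i < src.size() && (isAlpha(src[i]) || isDigit(src[i]))) ++i;
            out.push_back({TokenKind::Identifier, src.substr(start, i - start)});
        } else if (c == '(' || c == ')') {
            out.push_back({c == '(' ? TokenKind::LParen : TokenKind::RParen, std::string(1, c)});
            ++i;
        } else if (std::string("+-*/").find(c) != std::string::npos) {
            out.push_back({TokenKind::Operator, std::string(1, c)});
            ++i;
        } else {
            throw std::runtime_error("unexpected character");
        }
    }
    out.push_back({TokenKind::End, ""});  // sentinel so the parser never runs off the end
    return out;
}

// Step 2: the AST, with one class per kind of node.
struct Expr {
    virtual ~Expr() = default;
    virtual std::string dump() const = 0;  // prefix notation, for inspection only
};
using ExprPtr = std::unique_ptr<Expr>;

struct NumberExpr : Expr {
    double value;
    explicit NumberExpr(double v) : value(v) {}
    std::string dump() const override { std::ostringstream os; os << value; return os.str(); }
};
struct VariableExpr : Expr {
    std::string name;
    explicit VariableExpr(std::string n) : name(std::move(n)) {}
    std::string dump() const override { return name; }
};
struct BinaryExpr : Expr {
    char op;
    ExprPtr lhs, rhs;
    BinaryExpr(char o, ExprPtr l, ExprPtr r) : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    std::string dump() const override {
        return "(" + std::string(1, op) + " " + lhs->dump() + " " + rhs->dump() + ")";
    }
};

// Step 3: a recursive-descent parser loops over the tokens and builds the tree.
// It expects the End-terminated token list produced by lex().
class Parser {
public:
    explicit Parser(std::vector<Token> tokens) : tokens_(std::move(tokens)) {}
    ExprPtr parse() {
        ExprPtr e = parseBinary("+-");
        if (tokens_[pos_].kind != TokenKind::End) throw std::runtime_error("trailing tokens");
        return e;
    }
private:
    // Parses a left-associative chain of operators; "+-" binds looser than "*/".
    ExprPtr parseBinary(const std::string& ops) {
        auto operand = [&] { return ops == "+-" ? parseBinary("*/") : parsePrimary(); };
        ExprPtr lhs = operand();
        while (tokens_[pos_].kind == TokenKind::Operator &&
               ops.find(tokens_[pos_].text) != std::string::npos) {
            char op = tokens_[pos_++].text[0];
            ExprPtr rhs = operand();
            lhs = std::make_unique<BinaryExpr>(op, std::move(lhs), std::move(rhs));
        }
        return lhs;
    }
    ExprPtr parsePrimary() {
        const Token t = tokens_[pos_];
        if (t.kind == TokenKind::End) throw std::runtime_error("unexpected end of input");
        ++pos_;
        if (t.kind == TokenKind::Number)
            return std::make_unique<NumberExpr>(std::strtod(t.text.c_str(), nullptr));
        if (t.kind == TokenKind::Identifier) return std::make_unique<VariableExpr>(t.text);
        if (t.kind != TokenKind::LParen) throw std::runtime_error("expected expression");
        ExprPtr inner = parseBinary("+-");
        if (tokens_[pos_].kind != TokenKind::RParen) throw std::runtime_error("expected ')'");
        ++pos_;
        return inner;
    }
    std::vector<Token> tokens_;
    std::size_t pos_ = 0;
};
```

With these definitions, `Parser(lex("a + 2 * (b - 3)")).parse()->dump()` yields `(+ a (* 2 (- b 3)))`, which shows that multiplication binds tighter than addition and that the parentheses were honored.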

Import files in LLVM

Now we can import a bunch of LLVM primitives to generate the intermediate representation. Each type of node in the abstract syntax tree is given a method called codegen, which always returns an LLVM Value object. A Value represents a static single assignment (SSA) register, which is a variable for the compiler that can only be assigned once.
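The shape of that codegen method can be sketched without pulling in LLVM at all. In a real front end, codegen would return an `llvm::Value*` and emit instructions through LLVM's `IRBuilder`. The version below is only an imitation under simplifying assumptions: `IRBuilderSketch` is a hypothetical stand-in that records IR-flavoured text, every value is a `double`, and every variable is assumed to already live in a register named after it.

```cpp
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for LLVM's instruction builder: it records each
// emitted instruction as text and hands out a fresh register name for it.
struct IRBuilderSketch {
    std::vector<std::string> lines;
    int nextId = 0;

    std::string emit(const std::string& opcode, const std::string& a, const std::string& b) {
        std::string reg = "%t" + std::to_string(nextId++);  // each register is assigned exactly once
        lines.push_back(reg + " = " + opcode + " double " + a + ", " + b);
        return reg;
    }
};

struct Expr {
    virtual ~Expr() = default;
    // Emits any instructions this node needs and returns the name of its value.
    virtual std::string codegen(IRBuilderSketch& builder) const = 0;
};
using ExprPtr = std::unique_ptr<Expr>;

struct NumberExpr : Expr {
    double value;
    explicit NumberExpr(double v) : value(v) {}
    std::string codegen(IRBuilderSketch&) const override {
        std::ostringstream os;
        os << std::scientific << value;  // a constant needs no instruction
        return os.str();
    }
};

struct VariableExpr : Expr {
    std::string name;
    explicit VariableExpr(std::string n) : name(std::move(n)) {}
    // Assumption: the variable is a function parameter already held in %name.
    std::string codegen(IRBuilderSketch&) const override { return "%" + name; }
};

struct BinaryExpr : Expr {
    char op;
    ExprPtr lhs, rhs;
    BinaryExpr(char o, ExprPtr l, ExprPtr r) : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    std::string codegen(IRBuilderSketch& builder) const override {
        // Generate both operands first, then one instruction combining them.
        std::string l = lhs->codegen(builder);
        std::string r = rhs->codegen(builder);
        switch (op) {
            case '+': return builder.emit("fadd", l, r);
            case '-': return builder.emit("fsub", l, r);
            case '*': return builder.emit("fmul", l, r);
            case '/': return builder.emit("fdiv", l, r);
            default: throw std::invalid_argument("unknown operator");
        }
    }
};
```

Calling codegen on the tree for `(a + 2) * b` records two lines, `%t0 = fadd double %a, 2.000000e+00` followed by `%t1 = fmul double %t0, %b`, and returns `%t1`. Notice that no register name ever appears twice on the left-hand side, which is the single-assignment property described above.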

The interesting thing about these IR primitives is that, unlike assembly, they are independent of any particular machine architecture. That makes things much simpler for language developers, who no longer need to match the output to a processor’s instruction set. Now that the front end can generate IR, the opt tool is used to analyze and optimize the generated code. It makes multiple passes over the IR and does things like dead code elimination and scalar replacement of aggregates. This gets us to the final step, the back end, where we write a module that takes IR as input and emits object code for the target, which can be any architecture LLVM supports.
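To get a feel for what a pass like dead code elimination does, here is a small sketch over a toy instruction list. It is not LLVM's implementation: the `Instr` struct and `eliminateDeadCode` function are hypothetical, and the sketch assumes a single straight-line block in SSA form whose instructions have no side effects.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// One SSA-style instruction: dest = op operands...
struct Instr {
    std::string dest;
    std::string op;
    std::vector<std::string> operands;
};

// Keeps only the instructions that contribute, directly or indirectly,
// to the returned value. Assumes no branches and no side effects.
std::vector<Instr> eliminateDeadCode(const std::vector<Instr>& body,
                                     const std::string& returned) {
    std::unordered_set<std::string> live{returned};
    std::vector<Instr> kept;

    // Walk backwards: in SSA form a value is defined before every use,
    // so by the time we reach a definition we have seen all of its uses.
    for (auto it = body.rbegin(); it != body.rend(); ++it) {
        if (live.count(it->dest) == 0) continue;  // result never used: drop it
        for (const std::string& operand : it->operands) live.insert(operand);
        kept.push_back(*it);
    }

    std::reverse(kept.begin(), kept.end());  // restore original order
    return kept;
}
```

If a block computes `%t0`, `%t1`, `%t2` and `%t3` but only `%t2` is returned and `%t2` depends only on `%t0`, the pass keeps `%t0` and `%t2` and drops the rest, including any chain of instructions that only fed other dead instructions.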

Now you are all set to create your own language and compiler using the LLVM toolkit.


So, did this blog help you get to know LLVM better? We tried to introduce LLVM and how it works. Its modern features and performance improvements make it all the more appealing. Tell us your thoughts in the comments.

By Tanmay
