Writing My Own Programming Language

4 Dec 2025

How Phase Started

Last summer, I was learning C. I liked the control it provided but I kept being frustrated by its lack of clarity; I frequently found myself having to write many lines of code just to accomplish something that was much simpler in Python.

But then I'd switch back to Python for some experiments. The clarity of the syntax was refreshing, but I became frustrated with the lack of control; I missed the certainty of static typing and performance awareness in C.

I was essentially stuck between Python and C, which inspired me to imagine what I really wanted in a language. That led me to create Phase -- a statically-typed, bytecode-interpreted programming language that combines the expressiveness of high-level languages (like Python) with the explicitness of lower-level languages (like C).

I chose the name 'Phase' because it represents the shift between language levels, and pretty much every other unique name was already taken.

Why Build From Scratch?

Despite the existence and popularity of tools like LLVM that could've made this a lot easier, every single part of Phase -- from the lexer to the virtual machine -- is fully handwritten by me.

This is because I wanted to understand each and every part of the pipeline, which allowed me to shape the language exactly how I envisioned it.

The Importance of Diagnostics

Something I put much thought into during Phase's planning was error messages. Since this was, in some way, my ideal language, I realized that it had to solve the issue of vague errors that I experienced in other languages.

So I created a system that's both informative and visually appealing.

Here's an example:

┏ Fatal Error [102]: Expected ')'.
┃ --> ../tests/missing_paren.phase:2:19-19
┃
┃ 2 |     out("a, b, c:"
┃   |                   ^
┃
┣ Help: Add ')' here.
┃ Suggestion:
┃ -     out("a, b, c:"
┃ +     out("a, b, c:")

This message tells us:

  • What went wrong, in just a few words.
  • Where the problem occurred, with visual markers too.
  • Why it's a problem, including context for what was expected.
  • How we could fix it, with a direct fix to our code suggested.

Which is everything we need to solve the issue.
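
To make the layout concrete, here is a minimal sketch of how a message in this style could be rendered. It's written in Python for brevity (Phase itself is written in C), it leaves out the Suggestion diff, and the function name render_diagnostic and its parameters are hypothetical rather than Phase's actual API:

```python
def render_diagnostic(code, message, path, line_no, col_start, col_end,
                      source_line, help_text):
    """Format an error in a boxed layout similar to the example above.

    Columns are 1-based and inclusive. Every name here is illustrative.
    """
    gutter = str(line_no)
    # Carets sit under columns col_start..col_end of the source line.
    carets = " " * (col_start - 1) + "^" * (col_end - col_start + 1)
    lines = [
        f"┏ Fatal Error [{code}]: {message}",
        f"┃ --> {path}:{line_no}:{col_start}-{col_end}",
        "┃",
        f"┃ {gutter} | {source_line}",
        f"┃ {' ' * len(gutter)} | {carets}",
        "┃",
        f"┣ Help: {help_text}",
    ]
    return "\n".join(lines)


report = render_diagnostic(
    code=102,
    message="Expected ')'.",
    path="../tests/missing_paren.phase",
    line_no=2,
    col_start=19,
    col_end=19,
    source_line='    out("a, b, c:"',
    help_text="Add ')' here.",
)
print(report)
```

The key detail is that the caret line is padded by the column number, so the marker lands exactly under the spot the parser complained about.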

The Interpreter Itself

Phase's interpreter has several stages:

Lexer -> Parser -> Type Checker -> Bytecode Generator -> VM

Let's go through each stage now, following this basic line of Phase code:

out("Hello world!")

From beginning to end.

1. Lexer

The lexer (short for lexical analyzer) is the very first stage, taking in raw source code -- an arbitrary character string -- and breaking it down into tokens that represent keywords, operators, types, and literals.

Our previous line of code will get tokenized into this form:

OUT
LPAREN
STRING_LIT 'Hello world!'
RPAREN
NEWLINE

These tokens are now organised and separated -- much easier to process than the original line of code.

The lexer also handles details like skipping whitespace, distinguishing between keywords and identifiers, and handling string literals with escape sequences.

It's simple in concept, or so I thought until I realized it was actually one of the most tedious and boring parts of the whole project to implement correctly.
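
As a rough illustration, a hand-written lexer for just this one line might look like the sketch below. It's in Python for brevity (Phase itself is written in C); the token names come from the listing above, while the tokenize function, the (kind, value) tuple layout, and the set of escape sequences are my assumptions:

```python
KEYWORDS = {"out": "OUT"}  # identifiers that are really keywords
ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}  # assumed escape set


def tokenize(source):
    """Turn source text into a list of (kind, value) tokens."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch in " \t":  # skip insignificant whitespace
            i += 1
        elif ch == "\n":
            tokens.append(("NEWLINE", None))
            i += 1
        elif ch == "(":
            tokens.append(("LPAREN", None))
            i += 1
        elif ch == ")":
            tokens.append(("RPAREN", None))
            i += 1
        elif ch == '"':
            i += 1  # consume the opening quote
            chars = []
            while i < len(source) and source[i] != '"':
                if source[i] == "\\" and i + 1 < len(source):
                    i += 1  # translate the character after the backslash
                    chars.append(ESCAPES.get(source[i], source[i]))
                else:
                    chars.append(source[i])
                i += 1
            if i >= len(source):
                raise SyntaxError("Unterminated string literal")
            i += 1  # consume the closing quote
            tokens.append(("STRING_LIT", "".join(chars)))
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            word = source[start:i]
            if word in KEYWORDS:
                tokens.append((KEYWORDS[word], None))
            else:
                tokens.append(("IDENT", word))
        else:
            raise SyntaxError(f"Unexpected character {ch!r} at index {i}")
    return tokens


for kind, value in tokenize('out("Hello world!")\n'):
    print(kind if value is None else f"{kind} {value!r}")
```

Even in this tiny version, most of the code is bookkeeping for strings and identifiers, which is where the tedium comes from.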

2. Parser

Next, the parser takes the stream of tokens the lexer produced and constructs an Abstract Syntax Tree (AST) -- a hierarchy of the structure of the program.

I specifically implemented a recursive-descent parser, in which each grammar rule is handled by its own function. The parser starts at the top (program) level and works down through each sublevel until there are none left.

So parsing our tokens gives us these nodes:

STATEMENT (OUT)
        ╰ EXPRESSION (STRING) ["Hello world!"]

Our flat list of tokens from before is now an organized tree structure that can be easily followed.

The parser is also where syntax errors are caught and raised. For example, if you forget a closing parenthesis in your code, the parser detects this because the token it finds isn't the ')' it expects after the preceding tokens.
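
Here is a minimal recursive-descent sketch covering only the out statement, written in Python for brevity (Phase itself is written in C). The three-rule grammar in the comments, the Parser class, and the tuple-based AST nodes are assumptions made for illustration, not Phase's real grammar or data structures:

```python
class Parser:
    """One method per grammar rule, each calling the rules beneath it."""

    def __init__(self, tokens):
        self.tokens = tokens  # list of (kind, value) pairs
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos][0]
        return "EOF"

    def expect(self, kind):
        # Syntax errors surface here, when the next token is not the one
        # the current grammar rule requires.
        if self.peek() != kind:
            raise SyntaxError(f"Expected {kind}, found {self.peek()}")
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    # program := statement*
    def parse_program(self):
        statements = []
        while self.peek() != "EOF":
            statements.append(self.parse_statement())
        return ("PROGRAM", statements)

    # statement := OUT LPAREN expression RPAREN NEWLINE
    def parse_statement(self):
        self.expect("OUT")
        self.expect("LPAREN")
        expr = self.parse_expression()
        self.expect("RPAREN")
        self.expect("NEWLINE")
        return ("OUT", expr)

    # expression := STRING_LIT
    def parse_expression(self):
        _, value = self.expect("STRING_LIT")
        return ("STRING", value)


tokens = [
    ("OUT", None),
    ("LPAREN", None),
    ("STRING_LIT", "Hello world!"),
    ("RPAREN", None),
    ("NEWLINE", None),
]
ast = Parser(tokens).parse_program()
print(ast)

# Dropping the RPAREN token reproduces the "forgot a parenthesis" case.
try:
    Parser(tokens[:3] + tokens[4:]).parse_program()
except SyntaxError as err:
    print(err)
```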

3. Type Checker

This is where Phase's static typing is enforced. The type checker walks along the AST and verifies that every operation uses compatible types, so you can't mismatch variable types in assignment or arithmetic.

This is a semantic check rather than a syntactic one: think of syntax as arranging words in a sentence, and semantics as what the sentence actually means.

To demonstrate, this line of code is syntactically correct:

let x: int = "Hi"

But we still get an error:

┏ Fatal Error [108]: Type mismatch.
┃ --> ../tests/type_mismatch.phase:3:5-14
┃
┃ 3 |     let x: int = "Hi"
┃   |     ^^^^^^^^^^
┃
┣ Help: Variable 'x' expects int but got str.
┃ Suggestion:
┃ -     let x: int = "Hi"
┃ +     let x: int = 0

That's because the code is semantically wrong: the declared type and the value's type don't match. Our 'hello world' code, however, is an out statement, which accepts any expression as an argument, so the type checker passes it through unchanged.
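
The check itself can be sketched in a few lines. This is Python for brevity (Phase itself is written in C), and the node shapes ("LET", name, declared_type, expr) and ("OUT", expr), along with the function names, are hypothetical:

```python
def type_of(expr):
    """Return the static type name of a literal expression node."""
    kind, _value = expr
    return {"INT": "int", "STRING": "str"}[kind]


def check(node):
    """Raise TypeError if a statement node is ill-typed."""
    kind = node[0]
    if kind == "LET":
        _, name, declared, expr = node
        actual = type_of(expr)
        if actual != declared:
            raise TypeError(
                f"Variable '{name}' expects {declared} but got {actual}."
            )
    elif kind == "OUT":
        type_of(node[1])  # any well-typed expression is accepted
    else:
        raise ValueError(f"Unknown node kind: {kind}")


check(("OUT", ("STRING", "Hello world!")))  # passes silently
check(("LET", "x", "int", ("INT", 0)))      # passes silently

try:
    check(("LET", "x", "int", ("STRING", "Hi")))
except TypeError as err:
    print(err)
```

The failing case compares the declared type against the type computed from the expression, which is the comparison the Help line in the error message describes.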

4. Bytecode Generator

The type-checked AST is now taken by the bytecode generator and compiled into bytecode: a custom, Assembly-like instruction set that's much simpler to execute than the AST itself.

For Phase, I specifically implemented a stack-based architecture, meaning that operations push and pop values from a storage 'stack' -- in the order of Last-In, First-Out (LIFO).

Our code's AST now compiles into this hexadecimal bytecode:

00 00 00
01
18

Which represents these opcodes:

OP_PUSH_CONST 0   ; Push 'Hello world!' onto the stack
OP_PRINT          ; Print it
OP_HALT           ; Stop the program

Phase's instruction set currently has about 25 opcodes. Bytecode generation was surprisingly interesting; in fact, it was one of my favourite aspects of creating Phase because of the total design control it provided.
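
Here's a small sketch of how an AST like ours could be turned into those five bytes. It's in Python for brevity (Phase itself is written in C). The opcode values are read off the hex dump above; the two-byte big-endian operand, the constant pool, and the tuple-based AST are assumptions on my part:

```python
OP_PUSH_CONST = 0x00
OP_PRINT = 0x01
OP_HALT = 0x18


def compile_program(ast):
    """Compile ("PROGRAM", [("OUT", ("STRING", text)), ...]) to bytecode.

    Returns the bytecode and the constant pool it indexes into.
    """
    _, statements = ast
    constants = []
    code = bytearray()
    for kind, expr in statements:
        if kind != "OUT":
            raise ValueError(f"Unsupported statement: {kind}")
        _, value = expr
        if value not in constants:
            constants.append(value)  # each distinct constant is stored once
        index = constants.index(value)
        code.append(OP_PUSH_CONST)
        code += index.to_bytes(2, "big")  # assumed operand width and order
        code.append(OP_PRINT)
    code.append(OP_HALT)
    return bytes(code), constants


ast = ("PROGRAM", [("OUT", ("STRING", "Hello world!"))])
code, constants = compile_program(ast)
print(" ".join(f"{byte:02x}" for byte in code))
```

Note that the string itself never appears in the instruction stream: the bytecode only carries an index, and the text lives in the constant pool.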

5. Virtual Machine

The pipeline ends with the virtual machine, which directly executes the bytecode. It maintains:

  • An instruction pointer tracking which instruction to execute next.
  • A stack for temporary values and computations.
  • A global environment for holding variables.

The VM functions like a very simple CPU running a fetch-decode-execute cycle: it reads an instruction, decides what operation to perform, runs it, and moves on to the next instruction.

So, we finally produce an output from our code:

Hello world!

Creating the VM was a great learning experience for interpreter design, and it was also quite satisfying to program the actual outputs for my source code after getting only debug info so far.
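
For a sense of how small the core loop is, here is a sketch of a fetch-decode-execute cycle that handles just the three opcodes from our example. It's in Python for brevity (Phase itself is written in C); the opcode values come from the hex dump earlier, while the two-byte big-endian operand and the constant pool are assumptions. The global environment is left out because this program has no variables:

```python
OP_PUSH_CONST = 0x00
OP_PRINT = 0x01
OP_HALT = 0x18


def run(code, constants):
    """Execute bytecode and return the list of lines it printed."""
    stack = []    # temporary values
    printed = []  # kept only so the result is easy to inspect
    ip = 0        # instruction pointer
    while True:
        opcode = code[ip]  # fetch
        ip += 1
        if opcode == OP_PUSH_CONST:  # decode, then execute
            index = int.from_bytes(code[ip:ip + 2], "big")
            ip += 2  # step over the operand bytes
            stack.append(constants[index])
        elif opcode == OP_PRINT:
            value = stack.pop()
            print(value)
            printed.append(str(value))
        elif opcode == OP_HALT:
            return printed
        else:
            raise ValueError(f"Unknown opcode 0x{opcode:02x} at {ip - 1}")


run(bytes.fromhex("0000000118"), ["Hello world!"])
```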

Design Decisions

I had to make a lot of tradeoffs when planning and creating Phase. Let's go through the most important ones I made:

Interpreter vs Transpiler

Originally, Phase was meant to be a transpiler that converted source code to C code, which would then be built and executed. That was actually the first functioning implementation of Phase that I wrote.

However, as I tested my first Phase programs, I realised that the build-run pipeline was very tedious, as I had to compile code twice in a row and then execute it. Even though I felt that a transpiler would demonstrate more low-level knowledge, I decided to convert Phase to an interpreter by switching out the backend for a bytecode generator and a VM, while keeping the same lexer and parser.

Static Typing vs Dynamic Typing

Static typing adds complexity through type checking and more sophisticated error handling, but I felt that its benefits outweighed the simplicity of dynamic typing, especially for catching bugs before the program runs.

I originally implemented C-style variable type declarations because they were simple and explicit. However, after some feedback from a person online, I replaced this with Rust-style declarations to better support the updates that came soon after.

Stack-Based VM vs Register-Based VM

I chose a stack-based architecture over a register-based one because it's much simpler to implement, even though it tends to be slower, since it needs more instructions to move values on and off the stack.

For Phase, I prioritized modularity over performance, and besides, there wouldn't be any noticeable difference between the two architectures for a language this small.
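
To illustrate where the extra stack operations come from, here's a toy comparison of a + b in both styles. It's in Python for brevity, and the instruction names and tuple encodings are made up for this sketch; they aren't Phase's opcodes:

```python
def run_stack(program, variables):
    """Stack style: operands are pushed first, then ADD pops them."""
    stack = []
    for op, *args in program:
        if op == "LOAD":
            stack.append(variables[args[0]])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"Unknown instruction: {op}")
    return stack.pop()


def run_register(program, registers):
    """Register style: ADD names its destination and sources directly."""
    for op, dest, src1, src2 in program:
        if op == "ADD":
            registers[dest] = registers[src1] + registers[src2]
        else:
            raise ValueError(f"Unknown instruction: {op}")
    return registers


stack_program = [("LOAD", "a"), ("LOAD", "b"), ("ADD",)]
register_program = [("ADD", 2, 0, 1)]  # r2 = r0 + r1

print(run_stack(stack_program, {"a": 2, "b": 3}))
print(run_register(register_program, [2, 3, 0])[2])
print(len(stack_program), "instructions vs", len(register_program))
```

The register version assumes a and b already live in registers, which is exactly the bookkeeping (register allocation) that makes that design harder to implement.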

Python vs C

I created the first prototype of Phase (called Luma at the time) in Python since I knew it much better than C, and got it working in a few days with extremely basic features.

However, writing it in Python made me feel as if I were cheating myself in a way, so I decided to challenge myself by writing it fully in C, which greatly accelerated my learning by forcing me to confront concepts I was unfamiliar with at the time.

What's Next

I consider Phase to be finished, with enough features to write a variety of small practical programs. For example, here is a Fibonacci sequence program:

func fibonacci(n: int): void {
    let (a, b): int = (0, 1)
    let (next, count): int = (b, 1)
    
    while count <= n {
        out(next)
        count += 1
        a = b
        b = next
        next = a + b
    }
}

entry {
    fibonacci(10)
}

But if I were to revisit it in the future, these would be the next things I would work on:

  • An arena memory allocator
  • JIT bytecode compilation
  • Standard input

What I Learned

Creating Phase taught me many things: how source code is converted to executable instructions in an interpreter, how to plan and write a large project in a low-level language like C, and how to think in a systems-oriented way, staying conscious of design decisions and the program's architecture.

It also demonstrated the effectiveness of project-based learning, which is something I really value; I couldn't have learned all these skills if I hadn't actually created a programming language.

You can check out Phase on GitHub.