
# Type-based alias analysis in the Toy Optimizer


A deep dive into implementing type-based alias analysis in a toy compiler optimizer, exploring how to use type information to improve load-store forwarding and reduce unnecessary memory operations.

February 16, 2026

Another entry in the Toy Optimizer series. Last time, we did load-store forwarding in the context of our Toy Optimizer. We managed to cache the results of both reads from and writes to the heap—at compile-time! We were careful to mind object aliasing: we separated our heap information into alias classes based on the offset that each read or write referenced. This way, even if we didn't know whether objects a and b aliased, we at least knew that accesses at different offsets would never alias (assuming our objects don't overlap and memory accesses are on word-sized slots).

This is a coarse-grained heuristic. Fortunately, we often have much more information available at compile-time than just the offset, so we should use it. I mentioned in a footnote that we could use type information, for example, to improve our alias analysis. In this post, we'll add a lightweight form of type-based alias analysis (TBAA) (PDF).

## Representing types

We return once again to Fil Pizlo land, specifically How I implement SSA form. We're going to use the hierarchical heap effect representation from that post in our implementation, but you can use your own type representation if you already have one. This representation divides the heap into disjoint regions by type. Consider, for example, that Array objects and String objects do not overlap: a LinkedList pointer is never going to alias an Integer pointer. They can therefore be reasoned about separately.

But sometimes you don't have perfect type information available. If your language has an Object base class of all objects, then the Object heap overlaps with, say, the Array heap. So you need some way to represent that too; a flat enum doesn't work cleanly.

Here is an example simplified type hierarchy:

```
Any
├── Object
│   ├── Array
│   └── String
└── Other
```

Where Other might represent different parts of the runtime's data structures, and could be further segmented into GC, Thread, etc. Fil's idea is that we can represent each node in that hierarchy with a tuple of integers [start, end) (inclusive, exclusive) that represent the pre- and post-order traversals of the tree. Or, if tree traversals are not engraved into your bones, they represent the range of all the nested objects within them.

```
Any                 [0, 3)
├── Object          [0, 2)
│   ├── Array       [0, 1)
│   └── String      [1, 2)
└── Other           [2, 3)
```

Then the "does this write interfere with this read" check—the aliasing check—is a range overlap query. Here's a perhaps over-engineered Python implementation of the range and heap hierarchy based on the Ruby generator and C++ runtime code from JavaScriptCore:

```python
class HeapRange:
    def __init__(self, start: int, end: int) -> None:
        self.start = start
        self.end = end

    def __repr__(self) -> str:
        return f"[{self.start}, {self.end})"

    def is_empty(self) -> bool:
        return self.start == self.end

    def overlaps(self, other: "HeapRange") -> bool:
        # Empty ranges interfere with nothing
        if self.is_empty() or other.is_empty():
            return False
        return self.end > other.start and other.end > self.start
```

```python
class AbstractHeap:
    def __init__(self, name: str) -> None:
        self.name = name
        self.parent: "AbstractHeap | None" = None
        self.children: "list[AbstractHeap]" = []
        self.range: "HeapRange | None" = None

    def add_child(self, name: str) -> "AbstractHeap":
        result = AbstractHeap(name)
        result.parent = self
        self.children.append(result)
        return result

    def compute(self, start: int) -> None:
        current = start
        if not self.children:
            self.range = HeapRange(start, current + 1)
            return
        for child in self.children:
            child.compute(current)
            current = child.range.end
        self.range = HeapRange(start, current)
```

```python
Any = AbstractHeap("Any")
Object = Any.add_child("Object")
Array = Object.add_child("Array")
String = Object.add_child("String")
Other = Any.add_child("Other")

Any.compute(0)
```

Where Any.compute(0) kicks off the tree-numbering scheme. Fil's implementation also covers a bunch of abstract heaps such as SSAState and Control, because his is used for code motion and whatnot. Those can be added later, but we will not do so in this post.
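To make the overlap queries concrete, here is a small standalone sketch. The plain (start, end) tuples stand in for HeapRange, hard-coded to the ranges that Any.compute(0) assigns to our hierarchy; this is an illustration, not part of the optimizer.

```python
# Standalone sketch of the aliasing check: plain (start, end) tuples stand
# in for HeapRange, hard-coded to the ranges Any.compute(0) assigns above.
def overlaps(a, b):
    (a_start, a_end), (b_start, b_end) = a, b
    # Empty ranges interfere with nothing.
    if a_start == a_end or b_start == b_end:
        return False
    return a_end > b_start and b_end > a_start

ANY, OBJECT, ARRAY, STRING, OTHER = (0, 3), (0, 2), (0, 1), (1, 2), (2, 3)

print(overlaps(ARRAY, STRING))  # False: disjoint sibling heaps
print(overlaps(OBJECT, ARRAY))  # True: a parent contains its children
print(overlaps(OTHER, OBJECT))  # False: disjoint subtrees of Any
print(overlaps(ANY, STRING))    # True: the root overlaps every non-empty heap
```

Note that a parent always overlaps its children: a write to the Object heap must be assumed to touch the Array heap, and vice versa.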

So there you have it: a type representation. Now we need to use it in our load-store forwarding. Recall that our load-store optimization pass looks like this:

```python
def optimize_load_store(bb: Block):
    opt_bb = Block()
    # Stores things we know about the heap at... compile-time.
    # Key: an object and an offset pair acting as a heap address
    # Value: a previous SSA value we know exists at that address
    compile_time_heap: Dict[Tuple[Value, int], Value] = {}
    for op in bb:
        if op.name == "store":
            obj = op.arg(0)
            offset = get_num(op, 1)
            store_info = (obj, offset)
            current_value = compile_time_heap.get(store_info)
            new_value = op.arg(2)
            if eq_value(current_value, new_value):
                continue
            compile_time_heap = {
                load_info: value
                for load_info, value in compile_time_heap.items()
                if load_info[1] != offset
            }
            compile_time_heap[store_info] = new_value
        elif op.name == "load":
            load_info = (op.arg(0), get_num(op, 1))
            if load_info in compile_time_heap:
                op.make_equal_to(compile_time_heap[load_info])
                continue
            compile_time_heap[load_info] = op
        opt_bb.append(op)
    return opt_bb
```

At its core, it iterates over the instructions, keeping a representation of the heap at compile-time. Reads get cached, writes get cached, and writes also invalidate compile-time information about fields that may alias. In this case, our may-alias check asks only whether the offsets are equal. This means that the following unit test will fail:

```python
def test_store_to_same_offset_different_heaps_does_not_invalidate_load():
    bb = Block()
    var0 = bb.getarg(0)
    var0.info = Array
    var1 = bb.getarg(1)
    var1.info = String
    var2 = bb.store(var0, 0, 3)
    var3 = bb.store(var1, 0, 4)
    var4 = bb.load(var0, 0)
    bb.escape(var4)
    opt_bb = optimize_load_store(bb)
    assert (
        bb_to_str(opt_bb)
        == """\
var0 = getarg(0)
var1 = getarg(1)
var2 = store(var0, 0, 3)
var3 = store(var1, 0, 4)
var4 = escape(3)"""
    )
```

This test expects the write to var0 to remain cached even though we wrote to the same offset in var1, because we have annotated var0 as an Array and var1 as a String. If we account for type information in our alias analysis, we can get this test to pass. After a bunch of fussing around with the load-store forwarding (many rewrites), I eventually got it down to a very short diff:

```diff
+def may_alias(left: Value, right: Value) -> bool:
+    return (left.info or Any).range.overlaps((right.info or Any).range)
+
+
 def optimize_load_store(bb: Block):
     opt_bb = Block()
     # Stores things we know about the heap at... compile-time.
@@ -138,6 +210,10 @@ def optimize_load_store(bb: Block):
                 load_info: value
                 for load_info, value in compile_time_heap.items()
                 if load_info[1] != offset
+                or not may_alias(load_info[0], obj)
             }
             compile_time_heap[store_info] = new_value
```

If we don't have any type/alias information, we default to "I know nothing" (Any) for each object, and then we check range overlap. The boolean logic in optimize_load_store may look a little weird, but we can also rewrite it (via De Morgan's law) as:

```python
{
    ...
    for ...
    if not (load_info[1] == offset and may_alias(load_info[0], obj))
}
```

So: we keep all the cached state about fields that are known, by offset and by type, not to alias. Maybe that is clearer (but not as nice a diff). Note that the exact type representation is not so important here! You could use a bitset version of the type information if you want. The important things are that you can cheaply construct types and cheaply check overlap between them. Nice, now our test passes! We can differentiate between memory accesses on objects of different types. But what if we knew more?
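To see the new invalidation rule in isolation, here is a toy standalone sketch (not the optimizer itself): tracked objects are hypothetical (name, range) tuples rather than real IR values, and a store to a String-typed object at offset 0 evicts only cache entries that match on both offset and type.

```python
# Toy standalone sketch of the store-invalidation filter. Tracked objects
# are hypothetical (name, (start, end)) tuples rather than real IR values.
def may_alias(left, right):
    (a_start, a_end), (b_start, b_end) = left[1], right[1]
    return a_end > b_start and b_end > a_start

ARRAY, STRING = (0, 1), (1, 2)
a = ("a", ARRAY)   # annotated as Array
s = ("s", STRING)  # annotated as String

compile_time_heap = {(a, 0): "3", (s, 0): "4", (a, 8): "5"}

# store(s, 0, ...) evicts only entries at the same offset whose type may alias:
obj, offset = s, 0
compile_time_heap = {
    info: value
    for info, value in compile_time_heap.items()
    if info[1] != offset or not may_alias(info[0], obj)
}
# (a, 0) survives even though the offsets match: Array and String are disjoint.
print(sorted(compile_time_heap.values()))  # ['3', '5']
```

In the real pass, the store would then re-cache its own (obj, offset) entry with the new value.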

## Object provenance / allocation site

Sometimes we know where an object came from. For example, we may have seen it get allocated in the trace. If we saw an object's allocation, we know that it does not alias (for example) any object that was passed in via a parameter. We can use this kind of information to our advantage. For example, in the following made-up IR snippet:

```
trace(arg0):
  v0 = malloc(8)
  v1 = malloc(16)
  ...
```

We know that (among other facts) v0 doesn't alias arg0 or v1, because we have seen its allocation site. I saw this in Hydrogen, the old V8 IR, which had a lightweight alias analysis¹:

```cpp
enum HAliasing { kMustAlias, kMayAlias, kNoAlias };

HAliasing Query(HValue* a, HValue* b) {
  // The same SSA value always references the same object.
  if (a == b) return kMustAlias;
  if (a->IsAllocate() || a->IsInnerAllocatedObject()) {
    // Two non-identical allocations can never be aliases.
    if (b->IsAllocate()) return kNoAlias;
    if (b->IsInnerAllocatedObject()) return kNoAlias;
    // An allocation can never alias a parameter or a constant.
    if (b->IsParameter()) return kNoAlias;
    if (b->IsConstant()) return kNoAlias;
  }
  if (b->IsAllocate() || b->IsInnerAllocatedObject()) {
    // An allocation can never alias a parameter or a constant.
    if (a->IsParameter()) return kNoAlias;
    if (a->IsConstant()) return kNoAlias;
  }
  // Constant objects can be distinguished statically.
  if (a->IsConstant() && b->IsConstant()) {
    return a->Equals(b) ? kMustAlias : kNoAlias;
  }
  return kMayAlias;
}
```
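Ported to our toy setting, the same provenance rules might look like the sketch below. The tagged tuples and helper predicates are hypothetical stand-ins, not part of the Toy Optimizer or of V8.

```python
# A Python sketch of Hydrogen-style provenance queries. Values here are
# hypothetical tagged tuples: ("alloc", id), ("param", i), ("const", k).
MUST_ALIAS, MAY_ALIAS, NO_ALIAS = "must", "may", "no"

def is_alloc(v): return v[0] == "alloc"
def is_param(v): return v[0] == "param"
def is_const(v): return v[0] == "const"

def alias_query(a, b):
    # The same SSA value always references the same object.
    if a == b:
        return MUST_ALIAS
    # A witnessed allocation never aliases a different allocation, a
    # parameter, or a constant.
    if is_alloc(a) and (is_alloc(b) or is_param(b) or is_const(b)):
        return NO_ALIAS
    if is_alloc(b) and (is_param(a) or is_const(a)):
        return NO_ALIAS
    # Distinct constant objects can be distinguished statically.
    if is_const(a) and is_const(b):
        return NO_ALIAS
    return MAY_ALIAS

print(alias_query(("alloc", 0), ("alloc", 1)))    # 'no'
print(alias_query(("alloc", 0), ("param", 0)))    # 'no'
print(alias_query(("param", 0), ("param", 1)))    # 'may'
print(alias_query(("const", "x"), ("const", "x")))  # 'must'
```

Two distinct parameters fall through to kMayAlias, just as in Hydrogen: the caller might have passed the same object twice.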

There is plenty of other useful information such as:

- If we know at compile-time that object A has 5 at offset 0 and object B has 7 at offset 0, then A and B don't alias (thanks, CF). In the RPython JIT in PyPy, this is used to determine that two user (Python) objects don't alias, because we know the contents of the user (Python) class field
- Object size (though perhaps that is a special case of the above bullet)
- Field size/type

- Deferring alias checks to run-time: have a branch `if (a == b) { ... } else { ... }`

… If you have other fun ones, please write in.

## Interacting with other instructions

We only handle loads and stores in our optimizer. Unfortunately, this means we may accidentally cache stale information. Consider: what happens if a function call (or any other opaque instruction) writes into an object we are tracking? The conservative approach is to invalidate all cached information at every function call. This is definitely correct, but it's a bummer for the optimizer. Can we do anything?

Well, perhaps we are calling a well-known function or a specific IR instruction. In that case, we can annotate it with effects in the same abstract heap model: if the instruction does not write at all, or only writes to some heaps, we can at least only partially invalidate our cached heap.

```python
known_builtin_functions = {
    "Array_length": Effects(reads=Array, writes=Empty()),
    "Object_setShape": Effects(reads=Empty(), writes=Object),
    "String_setEncoding": Effects(reads=Empty(), writes=String),
}
```
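Here is a minimal sketch of how such effect annotations might be consumed at a call site. Everything here is a hypothetical stand-in: the write effect is modeled as a plain (start, end) heap range (None meaning "unknown: may write anything"), and tracked objects are (name, heap_range) tuples as in the earlier sketches.

```python
# Minimal sketch of effect-based partial invalidation at a call site.
# Assumptions (not from the post): a callee's write effect is a plain
# (start, end) heap range, None means "unknown: may write anything", and
# tracked objects are hypothetical (name, heap_range) tuples.
def overlaps(a, b):
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start == a_end or b_start == b_end:
        return False
    return a_end > b_start and b_end > a_start

def invalidate_for_call(heap, writes):
    if writes is None:
        return {}  # opaque call: drop everything
    # Keep entries whose object lives in a heap the callee cannot write.
    return {info: v for info, v in heap.items()
            if not overlaps(info[0][1], writes)}

ARRAY, STRING = (0, 1), (1, 2)
heap = {(("a", ARRAY), 0): "3", (("s", STRING), 0): "4"}
# A call to something like String_setEncoding writes only the String heap:
heap = invalidate_for_call(heap, STRING)
print(list(heap.values()))  # ['3']: the Array entry survives
```

A call with an empty write set (like Array_length above) would keep the whole cache intact.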

However, if the function is unknown or otherwise opaque, we need at least more advanced alias information, and perhaps even (partial) escape analysis. Consider: even if an instruction takes no operands, we have no idea what state it has access to. If it writes to some object A, we cannot safely keep cached information about any other object B unless we know for sure that A and B do not alias. And we don't know what the instruction writes to. So we may only know we can keep information about B because it was allocated locally and has not escaped.

## Storing vs computing on the fly

Some runtimes, such as ART, pre-compute all of their alias information in a bit matrix. This makes more sense if you are using alias information in a full control-flow graph, where you might need to iterate over the graph a few times. In a trace context, you can do a lot in one single pass—no need to make a matrix.
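As a rough sketch of the precomputed flavor (hypothetical stand-ins again, with the same tuple ranges as before), the idea is one up-front pass that answers every pairwise query with a table lookup afterwards:

```python
# Toy sketch of precomputing pairwise may-alias answers into a matrix up
# front, the way a CFG-based compiler might. Tracked values carry
# hypothetical (name, (start, end)) heap ranges.
def overlaps(a, b):
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start == a_end or b_start == b_end:
        return False
    return a_end > b_start and b_end > a_start

values = [("arr", (0, 1)), ("str", (1, 2)), ("oth", (2, 3))]
may_alias = [[overlaps(a[1], b[1]) for b in values] for a in values]

print(may_alias[0][1])  # False: the Array and String entries never alias
print(may_alias[0][0])  # True: every value aliases itself
```

A real implementation would pack the rows into bitsets, but the lookup structure is the same.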

## When is this useful? How much?

As usual, this is a toy IR and a toy optimizer, so it's hard to say how much faster it makes its toy programs. In general, though, analysis and optimization sit on a dial between precision and speed. This is a happy point on that dial: only a tiny incremental analysis cost above offset-only invalidation, but higher precision. I like that tradeoff. It is also very useful in JIT compilers, where the managed language is generally a little better-behaved than a C-like language. Somewhere in your IR there will be a lot of duplicate loads and stores left over from a strength-reduction pass, and this can clean up the mess.

## Wrapping up

Thanks for joining as I work through a small use of type-based alias analysis for myself. I hope you enjoyed it.

## Thanks

Thank you to Chris Gregory for helpful feedback.

1. I made a fork of V8 to go spelunking around the Hydrogen IR. I reset the V8 repo to the last commit before they deleted Hydrogen in favor of their new sea-of-nodes IR, TurboFan. ↑
