The Portal-and-Cell Rendering Engine of Thief: The Dark Project

A deep dive into the software rendering technology behind the 1998 stealth classic, exploring how Looking Glass Studios solved visibility, sorting, and lighting without hardware acceleration, and why these techniques differed from contemporaries like Quake.

When Thief: The Dark Project shipped in 1998, it arrived at a pivotal moment in PC gaming. 3D hardware acceleration was just beginning to take off with cards like the 3dfx Voodoo, but Thief's development cycle meant it was built for a world that didn't yet rely on GPUs. The result was one of the most sophisticated software rendering engines of its era, built on a portal-and-cell visibility system that was radically different from the BSP trees used by id Software's Quake engine.

The core of Thief's renderer was born from a research project aimed at solving a problem that Looking Glass had been circling for years: how to build a seamless indoor-outdoor world. Previous Looking Glass titles like Ultima Underworld and System Shock were confined to grid-based dungeons, while Terra Nova had outdoor terrain. The vision was a game that could transition between these spaces without loading screens or artificial boundaries.

The solution came from Seth Teller's 1992 PhD dissertation on portal-based visibility culling. The Thief engine divided the world into convex polyhedra called "cells"—essentially rooms or open spaces. Each cell contained visible world polygons along its boundaries, and cells that connected to each other had special boundary polygons called "portals." The key insight was that if you could see into a cell, you could see into adjoining cells through their portals.

Unlike Quake, which precomputed a Potentially Visible Set (PVS) for every possible location in a level, Thief performed this visibility calculation at runtime. Starting from the player's viewpoint, the engine conducted a breadth-first traversal of portals and cells. For each portal, it performed a series of 2D bounding checks using what it called "bounding octagons"—a combination of axis-aligned and 45-degree rotated bounding boxes. If a portal's octagon overlapped with the visible region of the current cell, the engine would add the adjacent cell to the visible list and calculate the intersection of their octagons as the new visible region.

This approach had several advantages over Quake's precomputed PVS. In a long corridor with multiple side rooms, Quake would always render all side rooms because they were marked as visible from anywhere in the corridor. Thief, however, could cull entire rooms if the player's view through the corridor's portals didn't actually reach them. The trade-off was computational overhead—Thief's runtime portal traversal became increasingly expensive as polygon counts grew, especially when the engine was later adapted for hardware acceleration.

The Overdraw Problem and Clipping Strategy

Quake reduced overdraw (drawing the same pixel multiple times) using a span-buffering technique that drew surfaces front-to-back, ensuring each pixel was drawn exactly once. Thief, by contrast, drew polygons back-to-front, which naturally caused overdraw. However, Thief's portal-based visibility provided tighter bounds than Quake's naive approach, and the engine further reduced overdraw by clipping each world polygon to the bounding octagon of its cell's visibility.

This clipping was efficient because it was purely 2D along simple axes, and Thief used a clever texture mapping technique where texture coordinates were defined by a 3D vector basis independent of polygon vertices. This allowed texture coordinates for each pixel to be computed directly from the basis vectors, avoiding per-vertex texture coordinate storage and making clipping operations straightforward.

The clipping process often converted polygons into 8-sided shapes, but this wasn't a performance issue. More significantly, this approach required splitting any polygon that extended across multiple cells, whereas Quake could often preserve such polygons as single entities in its BSP tree. The impact of this splitting on performance and visual quality was likely minimal, as both engines also split polygons for surface caching purposes.

Object Culling and the Limits of Runtime Visibility

Objects in Thief were tracked as they moved through the level, with the engine maintaining which cells each object occupied. This incremental tracking allowed the engine to handle arbitrary topologies—spaces that folded back on themselves or had overlapping rooms—though the actual Thief game never exploited this capability.

Once the visible cells were determined, Thief would decide which objects needed rendering. Only objects in visible cells were considered, but the engine went further: it computed 2D bounding octagons for each potentially visible object and compared them against the visible regions of their containing cells. This additional culling step meant Thief could handle more object-dense worlds than a Quake-based renderer. While it couldn't necessarily render more visible objects simultaneously, it could place more objects in the overall level as long as they were only visible from limited viewpoints—resulting in well-stocked kitchens and detailed environments that were a hallmark of Looking Glass design.

The Sorting Challenge: Painter's Algorithm Without a Z-Buffer

Perhaps the most complex aspect of Thief's renderer was its object sorting system. Unlike Quake, which generated a z-buffer from world polygons and used it to test objects, Thief relied entirely on the painter's algorithm—drawing everything back-to-front. This approach was common in the software rendering era but prone to sorting artifacts where objects would appear through walls or disappear behind surfaces they shouldn't.

Thief's sorting algorithm was, by the author's admission, "the most complicated painter's algorithm sorter I've ever heard of." The engine initially sorted cells based on portal-traversal order, which provided a topological front-to-back sequence. However, this proved problematic, so the final game used a BSP tree to sort cells, which could produce counterintuitive draw orders—cells near the root of the BSP tree might be drawn before cells that were geometrically closer to the viewer.

To solve this, Thief assigned each cell a draw order number. An object in cell N would normally render after the inward-facing world polygons of cell N and before the walls of cell N+1. However, objects often spanned multiple cells, creating complex sorting scenarios. Consider a corridor (cell N) with a niche (cell M) containing a torch that protrudes into the corridor. If the viewpoint is in the corridor, the corridor walls are "closer" in back-to-front rendering, but parts of the torch might be occluded by corridor walls while other parts occlude the niche walls.

Thief's solution was to compute valid rendering ranges for each object. For each object, the engine determined the earliest and latest cells in the draw order where it could be rendered without causing sorting errors. It did this by comparing bounding octagons of objects and world polygons to see if a polygon that should be "in front" of an object actually intersected its bounding region. If so, the object couldn't be drawn after that polygon.

For objects spanning multiple cells, Thief attempted to find a rendering order that satisfied all constraints while keeping the cell draw order fixed. This was where the mathematical proof mentioned by the author came in—ensuring that a valid ordering always existed for the given constraints.

However, some cases were impossible to resolve. In the torch-in-niche example, one wall polygon in the corridor might be occluded by the torch, while another wall polygon in the same corridor might occlude part of the torch. No single rendering order could satisfy both conditions. Thief's fallback solution was to split objects dynamically: each object would be rendered multiple times, once for each cell it occupied, using user clip planes to clip the object to the appropriate region. This was expensive—especially for skinned characters, which required multiple skinning transformations—and could create t-junctions at portal boundaries, potentially causing visual cracks.

Light, Shadow, and the Surface Cache

Thief's lighting system was conceptually similar to Quake's. Both used light mapping and surface caching to store precomputed lit surfaces. The author recalls that Thief's team added this technology after seeing Quake's QTest release in February 1996, though the timeline suggests the engine was already in development.

The surface cache stored lit textures in 8x8 blocks, which aligned well with the engine's texture mapping architecture. Thief initially experimented with paletted lighting—a technique where lighting values were applied via color lookup tables rather than per-pixel calculations. This allowed the engine to achieve 20 fps at 640x400 resolution on a Pentium 90 in early demos, but was ultimately abandoned due to artistic limitations and the game's shadow-oriented design.

Objects could raycast to light sources to determine visibility, though it's unclear if this made it into the final game. The team also considered using lightmap values on the floor to determine if the player was in shadow, which could affect guard AI detection—a clever integration of rendering technology with gameplay systems.

Constructive Solid Geometry: A Temporal Approach

Thief's level editor used Constructive Solid Geometry (CSG), but with a fundamentally different model than Quake. Quake started with an empty space and placed solid brushes, using explicit subtraction operations. Thief started with a solid world and carved out space using operations that could change the medium (solid, air, water, or lava) in a given region.

The author describes Thief's CSG as "temporal," analogous to Photoshop layers. Each brush operation was processed in sequence, with later operations affecting the result of earlier ones. This allowed complex sequences like carving a room (air), then placing a pillar (solid), then filling part of it with water, then carving a hole in the water to create an air pocket.

While powerful, this temporal model was harder for designers to visualize than Quake's layer-based approach. The author admits to making "a terrible choice" by implementing the CSG system using a BSP tree. While this provided spatial acceleration, it introduced uncontrollable BSP split planes that could intersect unrelated brushes, causing epsilon problems and compilation failures. The editor became notorious for producing levels that failed to compile due to these geometric precision issues.

In hindsight, the author believes a direct brush-intersection approach—like Quake's—would have been more robust. The BSP-based CSG also complicated the generation of t-junction-free meshes, which Quake handled with a post-processing step that Thief avoided through more complex invariants.

Perspective Texture Mapping and the Pentium Optimization

Thief used a custom perspective texture mapper for all polygons, unlike earlier Looking Glass games that used affine mapping for distant surfaces. The mapper was heavily optimized for the Pentium processor, using a technique from Michael Abrash's Quake documentation: issuing a floating-point divide for perspective correction every 8 pixels, which would execute in parallel with integer instructions for the next 8 pixels.

The engine supported two texture formats: one restricted to 256x256 textures with power-of-two sizes and fixed stride, and another for arbitrary non-power-of-two textures without wrapping. The latter was used for surface-cached lighting to avoid wasting memory on padding textures to 256-pixel widths.

The 8-pixel unrolled inner loops achieved 4 cycles per pixel on the Pentium, using careful instruction scheduling to avoid AGI (Address Generation Unit) stalls. The code used self-modifying constants—a common trick on 486 and Pentium processors—to avoid the slower "and register, memory" instruction in favor of the single-cycle "and register, immediate."

Texture-Mapping Effects and the Chaining Architecture

Thief's texture mapping system was designed for flexibility, using a chaining architecture where the core mapper wrote pixels to a temporary buffer, then invoked a series of processing functions. These functions could apply lighting, color lookup tables, transparency, or other effects before writing to the framebuffer.

This architecture supported:

Paletted lighting with per-portal color lookup tables
Transparent water surfaces (rendered by coloring underwater surfaces with a CLUT rather than using expensive translucency operations)
Force field effects (via per-portal CLUTs)
Gouraud shading (by generating lighting values into a temporary buffer)

The CLUT system was particularly clever: instead of applying color transformations as a post-process (which would increase fill rate costs), Thief applied CLUTs during the texture mapping itself. This allowed efficient underwater fog effects, where the CLUT would become progressively more opaque the deeper the portal chain went. However, this revealed portal boundaries and was eventually replaced with fixed underwater fog.

A limitation emerged when a single cell was visible through multiple portal paths with different CLUT sequences—Thief would randomly choose one, though this scenario was never exercised in the final game.

What Wasn't Written by the Author

The author emphasizes that Thief's renderer was a collaborative effort. The 3D camera system, vertex transformation, and clipping were part of a shared technology library used across Looking Glass products. Thief batched vertices from entire cells for transformation, similar to modern compiled vertex arrays.

Character skinning with skeletal animation was handled by a separate system the author never touched. Object polygon sorting used a clever BSP-based system that reduced node decisions by recognizing that single-sided polygons often couldn't occlude each other from any angle.

Hardware acceleration support was added during the author's absence from Looking Glass, and he later contributed colored lighting support to the surface cache.

Legacy and Lessons

Thief's software renderer represents a fascinating alternative to the BSP-based approaches that dominated the era. Its runtime portal traversal offered tighter visibility culling than precomputed PVS, but at the cost of increased CPU overhead. The sorting system, while complex, solved many of the artifacts that plagued other software-rendered games.

The engine's greatest limitation was its scalability. As polygon counts increased, the portal traversal and sorting overhead became prohibitive, especially when adapted for hardware acceleration. The CSG system's BSP-based implementation created persistent stability issues in the editor.

Yet Thief's rendering technology enabled something remarkable: a game that felt more spatially coherent than its contemporaries, with seamless transitions between spaces and dense, believable environments. The portal-and-cell approach, while ultimately superseded by hardware-accelerated rendering and modern visibility systems, remains a testament to the creative problem-solving of the Looking Glass team.

For modern developers, Thief's renderer offers lessons in the trade-offs between runtime and precomputed visibility, the importance of robust CSG tools, and the value of designing rendering systems that align with artistic goals—in Thief's case, creating a shadowy, atmospheric world where visibility and occlusion were central to the gameplay experience.

#software rendering #portal culling #BSP #texture mapping #game engine