
printfasm: An Assembler That Compiles to printf

Tech Essays Reporter

A compact assembler written in fewer than a thousand lines translates a declarative source language into C code that relies exclusively on printf and its %n specifier, turning a familiar library routine into a Turing‑complete substrate for computation.

The project ~sebsite/printfasm demonstrates that a conventional library function can serve as the foundation for an assembly language. By generating C code that invokes only printf and exploits the %n specifier, the assembler creates a loop in which the side‑effects of printing become the sole mechanism for updating state. This approach challenges the usual separation between low‑level instruction sets and high‑level library calls, suggesting that the boundaries of what counts as an assembler are more fluid than commonly assumed.

The %n specifier, defined in the C standard for the printf family, prints nothing itself; instead it stores the number of characters written so far by the current call into the integer that its pointer argument refers to. The official specification of this behavior can be consulted at https://en.cppreference.com/w/c/io/fprintf. Because the specifier writes to memory, a program can use it to increment counters or toggle flags without any explicit assignment statements. The printfasm source language encodes such updates as outref statements, which the assembler translates into expressions that appear as arguments to printf. For example, the line ->fizz ([i] + 1) % 3 == 0 becomes a C expression that evaluates the comparison and stores the resulting 0 or 1 into the slot that corresponds to the fizz variable. The generated code therefore contains no traditional assembly instructions; instead, it consists of a while loop that repeatedly calls printf with a carefully constructed format string.
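To make the mechanism concrete, here is a minimal sketch of %n used as a store instruction. It is not taken from printfasm's output; the function names printed_length and store_flag are hypothetical, and the trick of printing zero or one character with %.*s is only one assumed way to get a 0 or 1 into a slot.

```c
#include <assert.h>
#include <stdio.h>

/* Prints `text` followed by a newline and returns the number of
   characters printf had written when it reached %n (the newline comes
   after %n, so it is not counted). */
int printed_length(const char *text)
{
    int count = 0;
    assert(text != NULL);
    /* %n consumes an int* argument and produces no output of its own. */
    printf("%s%n\n", text, &count);
    return count;
}

/* Hypothetical stand-in for an outref such as ->fizz ([i] + 1) % 3 == 0.
   "%.*s" prints at most `flag` characters of "x", so %n then records
   either 0 or 1 in *slot. No assignment to *slot appears in the code. */
void store_flag(int *slot, int i)
{
    int flag = (i + 1) % 3 == 0;
    assert(slot != NULL);
    printf("%.*s%n", flag, "x", slot);
}
```

Calling store_flag(&fizz, 2) leaves 1 in fizz, because (2 + 1) % 3 == 0 holds, at the cost of one visible character on standard output. How the real generated code keeps such bookkeeping characters off the screen is not something this sketch attempts to reproduce.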

Memory in printfasm is modeled as a static unsigned char array of size 65536, indexed by integer expressions. The assembler reserves a few slots for special purposes: the exit slot at index 65534, the accumulator slot at index 65533, and a few others that remain undocumented. The exit slot is checked at the start of each iteration, and a non‑zero value causes the loop to terminate. The accumulator slot is used internally to keep track of the byte count that %n writes, which in turn drives the next iteration of the loop. Because the format string relies on POSIX positional specifiers such as %2$d, the generated program requires a libc implementation that follows the POSIX specification. The documentation for these specifiers is available at https://pubs.opengroup.org/onlinepubs/9699919799/functions/printf.html. The author notes that glibc works reliably, while musl’s current implementation exhibits bugs that prevent the program from running correctly.
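The loop structure described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the assembler's actual output: COUNTER_SLOT, run_until_exit, and the use of two printf calls per iteration are all hypothetical, and only the array size and the exit slot index come from the description. It relies on POSIX numbered arguments (%2$*1$s takes its width from argument 1 and its string from argument 2), so it needs a POSIX-conforming libc such as glibc.

```c
#include <stdio.h>

/* MEM_SIZE and EXIT_SLOT follow the article; COUNTER_SLOT is an
   assumption made for this sketch. */
enum { MEM_SIZE = 65536, EXIT_SLOT = 65534, COUNTER_SLOT = 0 };

static unsigned char mem[MEM_SIZE];

/* Counts from 0 up to `limit` using only printf with %hhn to update
   memory. Returns the number of iterations executed, or -1 if `limit`
   does not fit in an unsigned char slot. */
int run_until_exit(int limit)
{
    int iterations = 0;

    if (limit < 1 || limit > 255)
        return -1;

    mem[EXIT_SLOT] = 0;
    mem[COUNTER_SLOT] = 0;

    /* The exit slot is checked at the start of every iteration. */
    while (!mem[EXIT_SLOT]) {
        int next = mem[COUNTER_SLOT] + 1;

        /* Pad the empty string to `next` characters, then let %hhn
           store the character count into the counter slot. */
        printf("%2$*1$s%3$hhn", next, "",
               (signed char *)&mem[COUNTER_SLOT]);

        /* Same idea for the exit flag: a width of 0 or 1 is stored. */
        printf("%2$*1$s%3$hhn", next >= limit, "",
               (signed char *)&mem[EXIT_SLOT]);

        iterations++;
    }
    return iterations;
}
```

The padding spaces do reach standard output, which hints at why the real tool has to think carefully about what ends up on the screen. The cast to signed char * is there because that is the pointer type the C standard names for %hhn.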

The source language of printfasm is deliberately declarative. Alias declarations give human‑readable names to integer slots, and the outref syntax (->foo expr) indicates that the result of expr should be stored into the slot named foo at the end of the current iteration. The inref syntax (<-foo) reads from a slot that may have been updated earlier in the same iteration, enabling constructs such as string printing that depend on previously stored data. The grammar governing these constructs is described in detail in grammar.txt, which resides in the repository at https://git.sr.ht/~sebsite/printfasm/blob/main/grammar.txt. The language also supports conditional printing through the if clause, where the expression following if determines whether the preceding value is emitted. Boolean logic is expressed using the bitwise operators & and |. By convention, !! normalizes any non‑zero operand to 1, and unary - applied to such a boolean yields -1, a value with every bit set in two's complement, so that -cond & x evaluates to x when cond holds and to 0 otherwise. The operators keep C’s precedence, including the low precedence of & and | relative to comparisons, which the author treats as a useful quirk rather than a historical mistake.
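The boolean conventions are easiest to see in plain C, whose semantics the language borrows. The helper names below are hypothetical and exist only for illustration; the expressions inside them are ordinary C.

```c
/* Normalizes any non-zero value to 1 and leaves 0 as 0. */
int to_bool(int x)
{
    return !!x;
}

/* Branch-free selection: -(!!cond) is -1 (all bits set) when cond is
   non-zero and 0 otherwise, so the & keeps either all of `value` or
   none of it. */
int select_if(int cond, int value)
{
    return -(!!cond) & value;
}

/* C parses the unparenthesized expression  a & b == c  exactly like
   this, because == binds more tightly than &. The parentheses are
   written out here to make the grouping visible. */
int mask_with_comparison(int a, int b, int c)
{
    return a & (b == c);
}
```

For example, select_if(5, 42) yields 42 and select_if(0, 42) yields 0, while mask_with_comparison(3, 2, 2) yields 1 because the comparison is evaluated first and its result, 1, is then masked against 3.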

Several constraints shape the practical experience of using printfasm. First, the generated code is extremely slow because each iteration invokes printf, which must parse the format string, format its arguments, and push the result through buffered output to the terminal. The cost of repeatedly calling a library routine outweighs any benefit of a minimal instruction set. Second, the reliance on signed integer arithmetic introduces undefined behavior when overflow occurs, as noted in the project’s README at https://git.sr.ht/~sebsite/printfasm/blob/main/README.md. The assembler does not perform overflow checks, so programs must avoid calculations whose results fall outside the range of int. Third, the use of the terminal’s alternate screen buffer means that the output cannot be meaningfully redirected to a file or piped to another program. This limitation stems from the fact that the format string includes escape sequences that manipulate the screen buffer, and the author explicitly warns that a terminal emulator must support synchronized output for the visual effect to appear correctly. Fourth, input handling is currently limited to raw mode and non‑blocking reads, and the language does not yet provide a direct way to print characters received from the user without storing them in memory first. The termios interface that powers this feature is documented at https://man7.org/linux/man-pages/man3/termios.3.html. These constraints together make printfasm unsuitable for production workloads but well suited for experimental or pedagogical contexts.
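The overflow constraint can be illustrated with the kind of check that printfasm leaves out. The sketch below is ordinary hand-written C rather than anything the assembler emits, and checked_add is a hypothetical helper: it tests whether a sum would leave the range of int before performing the addition, so the undefined behavior is never triggered.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Adds a and b if the result fits in an int. Returns true and writes
   the sum to *out on success; returns false and leaves *out untouched
   if the addition would overflow. The comparisons are arranged so that
   they themselves can never overflow. */
bool checked_add(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

A printfasm program has no such guard available, which is why the burden of staying within range falls on whoever writes the source.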

From an educational standpoint, printfasm offers a concrete illustration of how a single library routine can embody a full computational model. The project’s README encourages newcomers to experiment with mandelbrot.pfs, a program that computes and prints the Mandelbrot set using only printf. Such examples reveal the expressive power of the %n specifier and the way in which side‑effects can replace explicit control flow. The approach aligns with broader efforts in esoteric programming that seek to minimize language features while preserving Turing completeness. A notable discussion of printf’s Turing‑complete nature appears in a series of notes hosted by Carnegie Mellon University at https://www.cs.cmu.edu/~15130/notes/printf_turing_complete.html, which the author references to justify the claim that the generated code can simulate any computable function given sufficient memory and time.

Security considerations also arise naturally from the use of %n. Format string vulnerabilities have been a persistent issue in C programs, as documented by the Open Web Application Security Project at https://owasp.org/www-community/attacks/Format_string_attack. By repurposing %n for intentional state updates, printfasm demonstrates both the flexibility and the danger of this specifier. The generated code does not perform bounds checks on the memory slots, and any out‑of‑range access leads to undefined behavior, which could manifest as crashes or, in less controlled environments, as exploitable memory corruption. The author acknowledges that the assembler does not currently validate I/O errors, leaving room for future hardening measures.
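For contrast with printfasm's deliberate use of %n, the conventional defense against format string attacks is to keep untrusted text out of the format argument altogether. The following is a minimal sketch of that rule; render_untrusted is a hypothetical name, and snprintf is used instead of printf only so that the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>

/* Copies `untrusted` into buf as data. Because the format string is the
   fixed literal "%s", any %n or %x inside `untrusted` is reproduced
   verbatim and never interpreted. The unsafe alternative would be
   snprintf(buf, size, untrusted), which lets the input act as a format.
   Returns the length the full text would need, as snprintf does. */
int render_untrusted(char *buf, size_t size, const char *untrusted)
{
    assert(buf != NULL && untrusted != NULL);
    return snprintf(buf, size, "%s", untrusted);
}
```

Passing the input "%n%x" through this function produces those same four characters in the buffer, with no memory write and no stack disclosure.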

Critics may argue that the project’s novelty is superficial. printfasm is not a true assembler in the traditional sense because its output lacks explicit instruction encoding, registers, or immediate addressing modes. Instead, the generated programs are macro‑expanded C fragments that rely on the host compiler and libc. The performance penalty of invoking printf in a tight loop makes the approach impractical for any real‑world application, and the requirement for a terminal that supports the alternate screen buffer limits portability. Moreover, the reliance on undefined behavior for signed overflow and on the precise implementation of %n in a particular libc introduces fragility that many developers would find unacceptable.

Another line of critique focuses on the conceptual classification of the tool. Some observers view printfasm as a domain‑specific language embedded within C rather than a genuine assembler. The language’s declarative syntax and the fact that memory stores are deferred until after the printf call blur the line between compile‑time and run‑time semantics. The presence of inref statements, which read from slots that have been updated earlier in the same iteration, further complicates the mental model of a simple loop. These features suggest that the project is more about exploring the expressive limits of printf than about providing a faithful representation of assembly language.

Looking ahead, the author has outlined several avenues for expansion. Adding support for printing user input directly would close the gap between reading and writing, enabling interactive programs such as Tetris. Incorporating floating‑point literals would broaden the range of numeric operations, though the author cautions that this would likely exceed the 1000‑line source limit. Improving error handling, especially for I/O failures and for malformed format strings, would increase robustness without sacrificing the minimalist spirit. The inclusion of a Vim plugin, located at https://git.sr.ht/~sebsite/printfasm/blob/main/vim/printfasm.vim, already provides syntax highlighting and basic editing assistance, indicating that the community may be willing to adopt the tool for exploratory programming.

The philosophical significance of printfasm lies in its reminder that computation need not be confined to explicit instructions. By leveraging a side‑effect of a well‑known library function, the project shows that any Turing‑complete substrate can be repurposed to host a language. This perspective invites reflection on the role of side‑effects in defining control flow, on the minimal resources required to express arbitrary computation, and on the creative possibilities that arise when constraints are deliberately imposed. The assembler’s compact source code, its reliance on a single C construct, and its ability to generate visual output in the terminal together form a compelling case study for anyone interested in the intersection of language design, runtime behavior, and the limits of abstraction.
