BREX: The Domain-Specific Language for Mastering Binary Data Extraction
In the realm of network analysis, cybersecurity, and low-level data processing, the ability to precisely extract and interpret information from raw binary data is a critical skill. Traditional string manipulation and regular expressions often fall short when dealing with the structured, byte-oriented nature of network packets, file formats, and other binary streams. Enter Binary Range Expression (BREX), a sophisticated Domain-Specific Language (DSL) engineered to solve this exact problem.
Developed by Proxylity, BREX provides a declarative and expressive syntax for extracting sub-ranges from binary data based on position, conditions, or complex logic. While its primary design target is network packet analysis and processing for the company's UDP Gateway, its utility extends far beyond, making it a valuable tool for security scanning, file identification, and any task requiring granular binary data manipulation. At its core, BREX allows developers to define exactly what data they need and where to find it, turning opaque binary blobs into structured, accessible information.
The Fundamentals: Addressing and Slicing
The building blocks of BREX are its range expressions, which provide direct access to bytes within a data buffer. The syntax is both intuitive and powerful, allowing for simple offsets and complex slicing operations.
A single bracket pair [] is used to define a range. The simplest form specifies a single byte by its absolute offset from the start of the buffer:
[5] - Returns byte at offset 5
To extract a contiguous sequence of bytes, you can specify a start offset and a length, separated by a comma. This is known as a slice:
[2, 4] - Returns 4 bytes starting at offset 2
[0, 10] - Returns first 10 bytes
BREX also supports a Python-like slicing syntax using a colon, which can be more expressive. The format is [start:end], where end is exclusive.
[2:6] - Returns 4 bytes starting at offset 2 up to but not including byte 6
[0:10] - Returns first 10 bytes
[2:] - Returns all bytes starting with the byte at offset 2
[:] - Returns all bytes
[] - Returns an empty range
This slicing capability forms the bedrock of more complex data extraction, allowing developers to isolate specific headers, payloads, or fields within a larger binary structure.
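For readers coming from a general-purpose language, the addressing forms above map closely onto Python's bytes slicing. The sketch below is an analogy only, not how BREX is implemented. The helper name take is hypothetical, and it assumes the comma form means offset and length, as the examples suggest.

```python
# Illustrative analogy only: Python bytes slicing mirroring the BREX forms above.
buf = bytes(range(16))  # sample buffer: 0x00, 0x01, ..., 0x0f


def take(data: bytes, offset: int, length: int) -> bytes:
    """Hypothetical helper mimicking the comma form [offset, length]."""
    return data[offset:offset + length]


single = buf[5:6]        # like [5]: the byte at offset 5, kept as a 1-byte range
comma = take(buf, 2, 4)  # like [2, 4]: 4 bytes starting at offset 2
colon = buf[2:6]         # like [2:6]: end offset is exclusive
tail = buf[2:]           # like [2:]: everything from offset 2 onward
whole = buf[:]           # like [:]: the entire buffer
```

Here comma and colon select the same four bytes, which is the relationship between [2, 4] and [2:6] in the examples above.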
Dynamic Offsets and Pointer Chaining
Static offsets are useful, but the true power of BREX emerges when you need to calculate offsets dynamically at runtime. This is essential for parsing data structures where the location of a field depends on the value of another field—a common scenario in formats like Type-Length-Value (TLV).
BREX allows you to read a value from the data and use it as an offset or length. This is achieved using a type-prefixed syntax, such as u8[0] to read an unsigned 8-bit integer from offset 0, or u16le[0:] to read a 16-bit unsigned integer in little-endian format starting at offset 0.
[u8[0] + 1] - Read byte at (value of byte 0) + 1
[u16le[0] * 2, 4] - Read 4 bytes at ((16-bit value at byte 0) * 2)
[u8[0] + 1, 10] | [5] - Read 10 bytes from calculated offset, get byte 5
This syntax supports a full range of integer types (u8, i8, u16le, u16be, etc.) and endianness, ensuring accurate interpretation of binary data from any system. Furthermore, BREX supports pointer chaining, where the result of one dereference can be used in another:
[u8[u8[0]]] - Three-level pointer following
[u8[u16le[0]]] - Mixed types in pointer chain
This enables the traversal of complex pointer structures, a common requirement in reverse engineering and malware analysis.
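To make the evaluation order of dynamic offsets concrete, here is a rough Python equivalent of three of the expressions above. It is a sketch under assumptions: the helpers u8 and u16le are hypothetical stand-ins for the BREX type prefixes, and the sample buffer is invented for illustration.

```python
import struct


def u8(data: bytes, offset: int) -> int:
    """Hypothetical stand-in for u8[offset]: one unsigned byte."""
    return data[offset]


def u16le(data: bytes, offset: int) -> int:
    """Hypothetical stand-in for u16le[offset]: unsigned 16-bit, little-endian."""
    return struct.unpack_from("<H", data, offset)[0]


# Invented sample data; byte 0 doubles as an offset field.
buf = bytes([0x02, 0x00, 0x04, 0xAA, 0x07, 0xBB, 0xCC, 0xDD, 0xEE])

# Like [u8[0] + 1]: byte 0 holds 2, so the byte at offset 3 is read.
a = buf[u8(buf, 0) + 1]

# Like [u16le[0] * 2, 4]: bytes 0-1 decode to 2, so 4 bytes are read at offset 4.
start = u16le(buf, 0) * 2
b = buf[start:start + 4]

# Like [u8[u8[0]]]: byte 0 -> 2, byte 2 -> 4, then the byte at offset 4 is read.
c = buf[u8(buf, u8(buf, 0))]
```

Each nested read is resolved innermost first, so a chain is only as safe as the values it follows; a corrupted offset byte would send the outer read somewhere unintended.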
Expression Chaining: Building Data Pipelines
Real-world data extraction is rarely a single-step process. Often, you must first locate a piece of data, then extract a sub-part of it, and finally interpret that sub-part. BREX elegantly handles this through expression chaining, using the pipe operator | to pass the output of one expression as the input to the next.
[10:20] | [5] - Get bytes 10-19, then get byte 5 of that result
[0:100] | [50:60] - Get bytes 0-99, then get bytes 50-59 of that result
Chaining evaluates from left to right, with each operation operating on the result of its predecessor. This creates a powerful, readable data processing pipeline.
[20:30] | [2:8] | [1]
Step-by-step evaluation:
1. [20:30] → produces 10 bytes (original buffer bytes 20-29)
2. | [2:8] → produces 6 bytes (bytes 2-7 of the 10-byte result)
3. | [1] → produces 1 byte (byte 1 of the 6-byte result)
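The same three steps can be traced in Python, where each stage only ever sees the output of the previous one. This is a minimal sketch of the idea; the pipe helper is a hypothetical name and not part of BREX.

```python
from functools import reduce
from typing import Callable

Stage = Callable[[bytes], bytes]


def pipe(data: bytes, *stages: Stage) -> bytes:
    """Hypothetical helper: feed data through each stage, left to right."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


buf = bytes(range(40))  # sample buffer where each byte equals its own offset

result = pipe(
    buf,
    lambda d: d[20:30],  # like [20:30]: original bytes 20-29
    lambda d: d[2:8],    # like [2:8]: bytes 2-7 of that 10-byte result
    lambda d: d[1:2],    # like [1]: byte 1 of that 6-byte result
)
```

Because every byte in the sample buffer equals its own offset, the result shows that the final byte came from offset 23 of the original buffer (20 + 2 + 1).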
This chaining mechanism is particularly potent when dealing with structured data. For example, to find a timestamp field in a packet:
[0:20] | [16:] | u32le[0:] - Get header, skip to timestamp field, read as uint32
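For comparison, the timestamp example can be written out by hand in Python using the standard struct module. The packet contents below are invented for illustration; only the layout (a 20-byte header with a little-endian uint32 at offset 16) comes from the expression above.

```python
import struct

# Invented packet: 16 bytes of padding, a uint32 timestamp, then a payload.
packet = bytes(16) + struct.pack("<I", 1_700_000_000) + b"payload"

header = packet[0:20]                                # like [0:20]: the header
field = header[16:]                                  # like [16:]: skip to the timestamp
timestamp = struct.unpack_from("<I", field, 0)[0]    # like u32le[0:]
```

The hand-written version needs three statements and explicit format strings, which is the bookkeeping the chained expression folds into a single line.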
To simplify common chaining patterns, BREX offers a concise syntax where adjacent brackets imply a pipe operation, making complex expressions more readable:
Verbose: [10:20] | [5]
Concise: [10:20][5]
Verbose: [20:, [0]==31] | [2:]
Concise: [20:, [0]==31][2:]
Mastering Structured Data: TLV and Slice Search
One of BREX's most advanced features is its native support for parsing Type-Length-Value (TLV) records and other structured binary formats. The slice search operation allows you to iterate through a data buffer to find slices that match specific conditions.
The syntax is [start:, condition]. By default, BREX assumes a standard TLV format where the first byte is the type and the second is the length.
[20:, ==31] - Find entire TLV slice where type=31 starting at position 20
[0:, ==0x42] - Find entire TLV slice where type=0x42 from start of buffer
The concise form ==31 is shorthand for [0]==31, assuming the type is at the first byte of the slice. Once a matching TLV is found, you can use chaining to extract specific parts of it:
[20:, [0]==31] | [2:] - Find TLV type 31, return value part (skip type+length)
[20:, [0]==31] | [0] - Find TLV type 31, return just the type byte
[20:, [0]==31] | [1] - Find TLV type 31, return just the length byte
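The search behind these expressions can be pictured as a walk over consecutive records. The Python sketch below is one plausible reading, not BREX's actual implementation: find_tlv is a hypothetical name, and it assumes a one-byte type, a one-byte length, and that the length counts only the value bytes.

```python
from typing import Callable, Optional


def find_tlv(data: bytes, start: int,
             predicate: Callable[[bytes], bool]) -> Optional[bytes]:
    """Hypothetical helper: return the first whole TLV record matching predicate.

    Assumes record = type (1 byte) + length (1 byte) + value (length bytes).
    """
    pos = start
    while pos + 2 <= len(data):
        length = data[pos + 1]
        end = pos + 2 + length
        if end > len(data):
            return None  # truncated record: stop rather than read past the buffer
        record = data[pos:end]
        if predicate(record):
            return record
        pos = end  # jump to the next record
    return None


# Invented sample: three records with types 7, 31 and 9.
buf = bytes([7, 1, 0xFF, 31, 3, 0x0A, 0x0B, 0x0C, 9, 0])

record = find_tlv(buf, 0, lambda r: r[0] == 31)  # like [0:, [0]==31]
value = record[2:] if record else b""            # like ... | [2:]
```

Passing the condition as a predicate over the whole record mirrors how the BREX condition is evaluated relative to each candidate slice rather than to the start of the buffer.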
BREX's slice search is highly flexible, supporting complex logical conditions:
[20:, [0]>10] - Find slice where type > 10
[20:, [0]==31 && [2]<5] - Type=31 AND first value byte < 5
[20:, [0]==31 || [0]==32] - Type 31 OR type 32
[20:, ([0] & 0x80) == 0] - Type field bitmask check (parenthesized, since comparisons bind tighter than bitwise AND)
Recognizing that TLV formats vary, BREX allows you to customize the type and length field positions. You can also specify custom length-parsing logic, such as reading a 32-bit big-endian length from a specific offset:
[20:, [0]==31, u32be[4:]] - Type at [0], u32be length at relative position 4
Finally, for fixed-record formats, BREX can search for records where a specific field matches a condition:
[20:, [0]==31, 8] - 8-byte records, find where first byte = 31
[0:, [2]>100, 12] - 12-byte records, find where 3rd byte > 100
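Fixed-record search is simpler than the TLV case because the stride is constant. A minimal Python sketch of the idea follows; the function name and the 4-byte sample records are invented for illustration.

```python
from typing import Callable, Optional


def find_fixed_record(data: bytes, start: int, size: int,
                      predicate: Callable[[bytes], bool]) -> Optional[bytes]:
    """Hypothetical helper: step through size-byte records, return the first match."""
    for pos in range(start, len(data) - size + 1, size):
        record = data[pos:pos + size]
        if predicate(record):
            return record
    return None  # no complete record matched


# Invented sample: three 4-byte records whose third bytes are 50, 120 and 200.
records = bytes([1, 0, 50, 0, 2, 0, 120, 0, 3, 0, 200, 0])

hit = find_fixed_record(records, 0, 4, lambda r: r[2] > 100)
```

The range bound skips any trailing partial record, so the predicate never sees fewer than size bytes. How BREX itself treats a trailing partial record is not stated in the article.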
Arithmetic, Logic, and Control Flow
To handle truly dynamic data extraction, BREX incorporates a rich set of arithmetic and logical operators. You can perform calculations on extracted values to determine offsets, lengths, or conditions.
[u8[0] + u8[1]] - Add two byte values
u16le[0:] - 100 - Subtract a constant
u8[0] * 4 - Multiply by a constant
u32le[0:] / u16le[4:] - Divide two values
u8[0] % 16 - Modulo operation
All arithmetic operations automatically convert to the largest type involved, which reduces the risk of overflow in intermediate results. For conditions, BREX supports standard comparisons and logical operators:
u8[0] == 42 - Equality
u16le[0:] > 1000 - Greater than
u8[0] != 0 - Inequality
!(u8[0] > 128) - Logical NOT
BREX also includes a ternary conditional operator (? :) for inline logic, allowing for more expressive and compact expressions:
u8[0] == 1 ? [1, 4] : u8[0] == 2 ? [5, 8] : [0, 1] - Nested conditions
A crucial safety feature is the null-coalescing operator (??), which provides a fallback value if an offset is out of bounds, preventing runtime errors:
u8[100] ?? [0, 1] - Return [0,1] if offset 100 is out of bounds
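The fallback pattern can be approximated in Python with an explicit bounds check and a None sentinel. This is a sketch of the concept only; u8_or_none is a hypothetical helper and the buffer is invented.

```python
from typing import Optional


def u8_or_none(data: bytes, offset: int) -> Optional[int]:
    """Hypothetical helper: the byte at offset, or None when out of bounds."""
    if 0 <= offset < len(data):
        return data[offset]
    return None


buf = bytes([10, 20, 30])

value = u8_or_none(buf, 100)  # offset 100 does not exist in a 3-byte buffer
# Like u8[100] ?? [0, 1]: fall back to the first byte of the buffer.
result = value if value is not None else buf[0:1]
```

The check is written as "is not None" rather than with Python's "or" because a legitimately read zero byte is falsy and would otherwise trigger the fallback by mistake.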
For bitwise operations on ranges, BREX provides && (AND), || (OR), and ^^ (XOR), which operate byte-wise between two ranges:
[0:2] && [2:4] - Bitwise AND between two ranges
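A byte-wise AND between two ranges pairs up bytes at the same position and combines each pair. The Python sketch below illustrates that for equal-length ranges; and_ranges is a hypothetical name, and how BREX handles ranges of different lengths is not covered in the article, so that case is left out.

```python
def and_ranges(a: bytes, b: bytes) -> bytes:
    """Hypothetical helper: AND two equal-length ranges byte by byte."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-length ranges")
    return bytes(x & y for x, y in zip(a, b))


buf = bytes([0xF0, 0x0F, 0x3C, 0xFF])  # invented sample data

# Like [0:2] && [2:4]: 0xF0 & 0x3C and 0x0F & 0xFF.
masked = and_ranges(buf[0:2], buf[2:4])
```

A common use for this shape of operation is applying a mask stored in one part of a buffer to a field stored in another.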
Operator Precedence and Best Practices
To ensure predictable behavior, BREX follows a strict order of operations, from highest to lowest precedence: parentheses, array indexing, type casting, arithmetic operators (*, /, %), arithmetic operators (+, -), bitwise shifts, relational comparisons, bitwise AND, bitwise XOR, bitwise OR, logical AND, logical OR, and finally the ternary conditional operator.
u8[0] + u8[1] * 2 is equivalent to u8[0] + (u8[1] * 2)
When crafting complex BREX expressions, clarity and robustness are paramount. The documentation offers several best practices:
- Safe Access: Always use null coalescing or conditional checks to prevent errors from invalid offsets.
u8[1] >= 6 ? u32le[u8[2]:] : 0
u32le[u8[2]:] ?? 0
- Clear Intent: Use explicit syntax when it makes the expression's purpose more obvious, even if a concise alternative exists.
u8[0] == 0x04 ? [1, u8[0]] : [0, 1]
- Break It Down: For highly complex logic, consider breaking the expression into multiple chained steps. This improves readability and makes debugging significantly easier.
// Instead of one massive expression:
[0:, ==0x42][u8[1]:][u8[0]:]
// Break it down:
[0:, ==0x42] // Find the TLV
[u8[1]:] // Use its length to get the value
[u8[0]:] // Use the first byte of the value as an offset
The Broader Impact
The introduction of a language like BREX represents a significant step forward for developers and security professionals who regularly grapple with low-level data. It abstracts away the tedious and error-prone work of manual byte-by-byte parsing, allowing them to focus on the logic of their analysis. Whether it's for dissecting a new network protocol, identifying file signatures within a stream, or building a robust security scanner, BREX provides a versatile and powerful toolset.
As binary data continues to be the lifeblood of modern computing, languages that offer a higher level of abstraction for its manipulation will become increasingly valuable. BREX stands as a prime example of this trend, demonstrating how a well-designed DSL can empower developers to tackle complex challenges with greater efficiency and precision.