#Infrastructure

LOB‑Enabled Regex Functions in SQL Server 2025 CU5 and Azure SQL – What Changed, How It Compares, and Business Impact

Cloud Reporter
6 min read

SQL Server 2025 CU5 and Azure SQL now allow all seven native T‑SQL regex functions to accept varchar(max) and nvarchar(max) inputs up to 2 MB. This removes the historic 8 KB limit, eliminates chunk‑splitting code, and makes pattern‑matching on logs, HTML, JSON and other large blobs a fully in‑engine operation.

What changed

Microsoft shipped native regular‑expression support in SQL Server 2025 and Azure SQL, but the initial GA release only allowed large‑object (LOB) inputs for three scalar functions – REGEXP_LIKE, REGEXP_COUNT and REGEXP_INSTR. The remaining four functions (REGEXP_REPLACE, REGEXP_SUBSTR, REGEXP_MATCHES and REGEXP_SPLIT_TO_TABLE) required a non‑LOB string, which forced developers to truncate data with LEFT(...,8000) or to pre‑process payloads in the application tier.

Cumulative Update 5 (CU5) for SQL Server 2025 closes that gap. All seven regex functions now accept varchar(max) and nvarchar(max) arguments up to 2 MB per call. The change is already live in Azure SQL Database, Azure SQL Managed Instance (Always‑up‑to‑date), and SQL Database in Fabric. The only remaining limitation is an 8 KB ceiling on the pattern itself, which is generous for any maintainable expression.

Key point – The 2 MB limit applies to the input string of a single function call, not to the column definition. Columns can still store up to 2 GB; a regex evaluation simply cannot consume more than 2 MB of that value.


Provider comparison

Feature Azure SQL (PaaS) SQL Server 2025 on‑prem / VM
LOB support for all regex functions ✅ (GA) ✅ (CU5)
Update delivery model Rolling regional rollout, always‑up‑to‑date policy Manual CU download or automatic patching via WSUS/ConfigMgr
Compatibility level requirement 170 for TVFs (REGEXP_MATCHES, REGEXP_SPLIT_TO_TABLE) – can be relaxed with ALLOW_BUILTIN_TVF_IN_ALL_COMPAT_LEVELS preview Same requirement; can be set per database with ALTER DATABASE … SET COMPATIBILITY_LEVEL = 170
Pricing impact No extra compute charge; regex runs on the same DTU/vCore pool as the rest of the query No extra licensing cost; same as existing SQL Server license
Migration considerations Ensure databases are on the latest service tier and have the Always‑up‑to‑date policy enabled. Verify regional rollout status via the Azure portal. Apply CU5 (or later) to every instance that runs regex workloads. Test compatibility level changes in a staging environment before promoting to production.

Why the provider differences matter

  • Azure SQL delivers the feature instantly to any database that opts‑in to the Always‑up‑to‑date policy. Organizations that already use this policy see the capability without any downtime.
  • On‑prem environments must schedule a CU rollout, which may involve a brief restart window. However, the on‑prem model gives tighter control over the exact CU version, useful for regulated industries that need to lock the stack.
  • Both platforms share the same underlying RE2 engine, so query plans, performance characteristics, and result semantics are identical. This makes code portable across cloud and on‑prem.

Business impact

1. Eliminates data‑movement overhead

Before CU5, processing a 1‑MB log file required either:

  • Splitting the blob into 8 KB chunks in T‑SQL (LEFT(...,8000)) and stitching results client‑side, or
  • Pulling the column into an application, running a regex library, and writing the transformed data back.

Both approaches added latency, increased network traffic, and complicated error handling. With LOB‑enabled regex, the entire operation runs inside the engine, leveraging the same parallelism and memory management that SQL Server already provides.

2. Reduces reliance on CLR or external services

Many shops built custom CLR assemblies for pattern matching or called out to Azure Functions for log parsing. Those solutions introduced security surface area, version‑management headaches, and additional cost. Native REGEXP_REPLACE or REGEXP_MATCHES now handle the same scenarios, simplifying the architecture and lowering the total cost of ownership.

3. Enables set‑based analytics on massive text blobs

The TVFs REGEXP_MATCHES and REGEXP_SPLIT_TO_TABLE turn a megabyte‑scale document into a relational result set in a single query. This opens up use cases such as:

  • Security monitoring – extract every IP address, email, or credit‑card‑like token from web‑server logs and feed the result directly into a threat‑intel table.
  • Content compliance – scan large HTML or JSON payloads for prohibited phrases and flag rows with a CHECK constraint that uses REGEXP_LIKE.
  • Data lake ingestion – shred raw event files stored in a varchar(max) column into normalized rows without leaving SQL Server, reducing ETL pipeline complexity.

4. Predictable performance and safety

The RE2 engine guarantees linear‑time execution and prevents catastrophic backtracking (ReDoS). When processing a 2 MB log, the engine’s memory usage grows proportionally to the input size, and the query planner can estimate costs accurately. This predictability is essential for capacity planning in high‑throughput environments.

5. Migration checklist

  1. Verify compatibility level – Run SELECT name, compatibility_level FROM sys.databases WHERE name = DB_NAME();. If the level is below 170, upgrade with ALTER DATABASE <db> SET COMPATIBILITY_LEVEL = 170; (or enable the preview config).
  2. Update to CU5 or later – For on‑prem, download the latest cumulative update from the SQL Server 2025 release page. For Azure, confirm the Always‑up‑to‑date policy is active.
  3. Run a smoke test – Execute a simple SELECT REGEXP_REPLACE(RawPayload, '\d+', 'X') FROM dbo.LogEntries; on a representative LOB row to ensure the engine accepts the input size.
  4. Adjust existing code – Remove any LEFT(...,8000) or CLR calls. Replace them with the appropriate native function.
  5. Performance baseline – Capture SET STATISTICS IO, TIME ON; before and after the change to quantify I/O reduction and latency improvement.

Quick reference table

Function Returns LOB support (CU5) Typical use case
REGEXP_LIKE Boolean predicate Filter rows in WHERE, CHECK, or CASE
REGEXP_COUNT int Count occurrences of a pattern
REGEXP_INSTR int Locate the N‑th match position
REGEXP_REPLACE varchar(max) / nvarchar(max) Redact PII, normalize data
REGEXP_SUBSTR varchar(max) / nvarchar(max) Extract a single token
REGEXP_MATCHES (TVF) Table (match_id, start_position, end_position, match_value, substring_matches) Set‑based extraction of all matches and capture groups
REGEXP_SPLIT_TO_TABLE (TVF) Table (value, ordinal) Break a LOB into rows by delimiter

Closing thoughts

Native regex was already a major productivity boost when it arrived in SQL Server 2025. CU5 removes the last barrier: developers can now run any of the seven functions against real‑world LOB data without hitting the 8 KB ceiling. The result is fewer moving parts, lower latency, and a safer execution model.

If your workloads regularly exceed the 2 MB limit, consider preprocessing at ingestion time—store each logical unit (log line, document section, etc.) as a separate row. This keeps every regex call comfortably under the limit and preserves the set‑based advantage.

For the latest details, see the official documentation:

Take the opportunity to replace ad‑hoc grep scripts with a single, in‑engine query—your analysts and security teams will thank you.

Comments

Loading comments...