#Python

mssql-python 1.5: Apache Arrow, sql_variant, and Native UUIDs

Cloud Reporter
3 min read

Microsoft's official Python driver for SQL Server gains Apache Arrow fetch support, sql_variant type handling, and native UUIDs, delivering significant performance improvements for data workflows.

Microsoft has released mssql-python 1.5.0, the latest version of its official Python driver for SQL Server, Azure SQL Database, and SQL databases in Fabric. This update delivers three major capabilities that significantly enhance data workflows: Apache Arrow fetch support for high-performance data processing, first-class sql_variant type support, and native UUID handling.

Apache Arrow Integration Changes Data Workflows

The most transformative addition in this release is Apache Arrow fetch support. For teams working with pandas, Polars, DuckDB, or any Arrow-native data framework, this changes how data moves from SQL Server to Python applications.

Traditional fetch methods like fetchall() convert every database value through Python objects, creating overhead and memory copies. The new Arrow fetch API returns query results as native Apache Arrow structures using the Arrow C Data Interface for zero-copy handoff directly from the C++ layer to Python.

Three methods provide flexibility for different workflows:

  • cursor.arrow() fetches the entire result set as a PyArrow Table, enabling direct conversion to pandas or Polars with zero-copy where possible
  • cursor.arrow_batch() fetches single RecordBatches of specified size for fine-grained memory control
  • cursor.arrow_reader() returns a streaming RecordBatchReader that integrates directly with frameworks accepting readers

Under the hood, the Arrow integration is built directly into the C++ pybind11 layer. When you call any Arrow fetch method, the driver allocates columnar Arrow buffers based on the result set schema, fetches rows from SQL Server in batches using bound column buffers, converts and packs values directly into Arrow columnar format, and exports the result via the Arrow C Data Interface as PyCapsule objects.

Every SQL Server type maps to the appropriate Arrow type: INT to int32, BIGINT to int64, DECIMAL(p,s) to decimal128(p,s), DATE to date32, TIME to time64[ns], DATETIME2 to timestamp[us], UNIQUEIDENTIFIER to large_string, VARBINARY to large_binary, and so on. LOB columns are handled transparently by falling back to row-by-row GetData fetching while still assembling the result into Arrow format.

This feature was contributed by @ffelixg, representing a substantial contribution spanning the C++ pybind layer, Python cursor API, and comprehensive tests.

sql_variant Type Support

SQL Server's sql_variant type stores values of various data types in a single column, commonly used in metadata tables, configuration stores, and EAV (Entity-Attribute-Value) patterns. Version 1.5 adds full support for reading sql_variant values with automatic type resolution.

The driver reads the inner type tag from the sql_variant wire format and returns the appropriate Python type. All 23+ base types are supported, including int, float, Decimal, bool, str, date, time, datetime, bytes, uuid.UUID, and None.

Native UUID Handling

Previously, UNIQUEIDENTIFIER columns were returned as strings, requiring manual conversion to uuid.UUID. Version 1.5 changes the default: UUID columns now return native uuid.UUID objects. This eliminates the need for manual conversion and enables direct use of UUIDs as input parameters.

For migration compatibility, you can opt out at three levels: module level (affects all connections), connection level (affects all cursors on this connection), or cursor level. When native_uuid=False, UUID columns return strings as before.

Additional Improvements

The release includes several bug fixes: qmark false positive detection in SQL comments and string literals, NULL VARBINARY parameter mapping, bulk copy authentication issues, credential cache management, and datetime.time microseconds handling.

The Row class is now publicly exported from the top-level mssql_python module, making it easy to use in type annotations and isinstance checks.

Getting Started

You can install the updated driver with pip install --upgrade mssql-python. Documentation is available at github.com/microsoft/mssql-python/wiki, with release notes at github.com/microsoft/mssql-python/releases and the roadmap at github.com/microsoft/mssql-python/blob/main/ROADMAP.md.

This release represents a significant step forward for Python data workflows with SQL Server, particularly for teams leveraging modern data frameworks that benefit from Arrow's columnar format and zero-copy data transfer.

Comments

Loading comments...