Building Scalable Geographic Search: LINE's Approach to Location-Based Services

LINE's engineering team shared their battle-tested strategies for handling geographic data at scale, covering everything from K-D trees and geohashing to the trade-offs between different spatial indexing approaches.

LINE's SPOT team recently shared their experiences building location-based services that handle real-world geographic queries at scale. The presentation, delivered at their third developer meetup by Julian Shen, provides a practical look at the challenges of finding nearby points of interest and the caching strategies that make it feasible.

The Core Problem: Geographic Queries Don't Scale Linearly

When users search for "restaurants within 3 kilometers," the naive approach quickly becomes computationally expensive. You can't simply scan through millions of database records and calculate distances to every single point. The math alone—computing haversine distances between coordinate pairs—becomes a bottleneck.

The fundamental issue is that geographic data lives on a sphere (Earth), but most indexing structures assume flat planes. This mismatch creates edge cases and performance problems that compound at scale.

K-D Trees: Divide and Conquer

LINE's team introduced K-D trees as their initial spatial indexing strategy. A K-D tree recursively partitions the map plane by alternating between latitude and longitude at each level of the tree. This creates a binary space partitioning that allows for efficient nearest-neighbor searches.

The key insight is that you don't need to check every point. Once you've found the closest points in one partition, you can prune entire branches of the tree that are guaranteed to be farther away. This reduces search complexity from O(n) to O(log n) in practice.

However, K-D trees have limitations:

They work best on flat planes, not spheres
Performance degrades with uneven data distribution (dense urban vs. sparse rural areas)
Updates and inserts require tree rebalancing

Caching with Geohashing

To avoid recalculating searches for popular locations, LINE implemented a caching layer using geohashing. The concept is straightforward: convert latitude/longitude coordinates into a string representation that can be used as a cache key.

Geohash works by encoding coordinates into base32 strings. Points close together share similar prefixes, making it easy to find cached results for nearby searches. If someone searches for "restaurants near Taipei 101" and you've already cached that query, subsequent users get instant results.

But geohashing introduces its own problems:

Boundary issues: Two points that are physically adjacent might fall on different sides of a geohash cell boundary, resulting in completely different hash strings. This means a cache miss when there should be a hit.

Similar results problem: The zig-zag traversal pattern used in geohash encoding can produce very similar hash strings for points that are actually far apart, leading to cache pollution.

Resolution trade-offs: Higher precision geohashes mean more granular caching but also more cache entries and less reuse.

Alternative Approaches: S2 and H3

LINE's presentation also covered two industry-standard alternatives that address geohash's limitations:

Google's S2 Geometry Library

S2 uses a Hilbert curve to map the sphere's surface to a 1D space-filling curve. The resulting S2 Cell IDs provide:

Better preservation of spatial locality
Hierarchical cell subdivision (parent-child relationships between cells)
Consistent cell sizes across the sphere

The Hilbert curve's property of keeping nearby points close in the 1D ordering makes S2 particularly effective for spatial queries.

Uber's H3 Grid System

Uber open-sourced H3, which divides the world into hexagonal cells. Hexagons offer several advantages:

More uniform distances from cell centers to edges compared to squares
Each cell has exactly 6 neighbors at the same resolution
Better representation of circular search areas (like 3km radius queries)

H3 provides 15 resolution levels, allowing applications to balance precision with performance. At high resolutions, cells are about 1 square meter; at low resolutions, they can cover entire cities.

Trade-offs in Production

Each approach represents different trade-offs:

K-D Trees:

✅ Fast for static datasets
✅ Memory-efficient for moderate scales
❌ Poor update performance
❌ Doesn't handle spherical geometry natively

Geohash:

✅ Simple to implement
✅ Great for caching
✅ Human-readable (somewhat)
❌ Boundary problems
❌ Non-uniform cell shapes

S2:

✅ Excellent spatial locality
✅ Hierarchical structure
✅ Well-tested (Google Maps)
❌ More complex implementation
❌ Less intuitive than geohash

H3:

✅ Uniform cell sizes
✅ Predictable neighbor relationships
✅ Great for circular queries
❌ Larger ecosystem (C/Java/Python bindings)
❌ Overkill for simple use cases

Real-World Application: LINE Hotspot

LINE uses these techniques to power features like their Hotspot service, which recommends local businesses and attractions. The system needs to:

Quickly find points of interest near a user's location
Cache popular searches to reduce database load
Handle millions of concurrent users
Provide consistent results across Taiwan's dense urban and sparse rural areas

The choice of algorithm depends on the specific query pattern. For "find all restaurants within 3km," H3's hexagonal grid might be optimal. For caching user searches, geohash provides a simple key. For static data analysis, K-D trees offer good performance.

Lessons from the Field

The LINE team's experience highlights that geographic data handling isn't a one-size-fits-all problem. The right solution depends on:

Query patterns (radius vs. nearest neighbor vs. bounding box)
Data distribution (uniform vs. clustered)
Update frequency (static vs. real-time)
Scale (thousands vs. millions of points)

Most production systems end up using a hybrid approach: a spatial index for the initial query, followed by precise distance calculations on the candidate set, with caching layers at multiple levels.

The presentation also emphasized that geographic calculations should always account for the Earth's curvature. Simple Euclidean distance works for small areas but breaks down over longer distances. Libraries like S2 and H3 handle this correctly, while custom implementations often introduce subtle bugs.

Looking Forward

As location-based services continue to grow, the need for efficient geographic data processing becomes more critical. The techniques LINE shared—combining spatial indexing, caching strategies, and appropriate data structures—provide a blueprint for building scalable LBS applications.

For developers working on similar problems, the key takeaway is to match the tool to the problem. Don't default to geohash because it's simple, or H3 because it's trendy. Understand your query patterns, data distribution, and performance requirements, then choose accordingly.

LINE's SPOT team continues to refine their approach, balancing the theoretical elegance of spatial algorithms with the practical demands of serving millions of users in production.

Related Resources:

LINE Developer Community: Join the LINE Developer Official Community account (@line_tw_dev) for updates on future meetups and technical presentations.

#geospatial #spatial-indexing #k-d-tree #geohash #S2 #H3