A program‑integrity analyst shares six SQL‑based detection patterns—velocity, impossible travel, amount anomalies, suspicious merchants, off‑hours activity, and reusable window‑function scaffolding—explaining how each works, how to tune them, and what limitations to expect.
What’s claimed
The author argues that most fraud detection on transaction logs can be done with pure SQL, not with machine‑learning black boxes or graph databases. Six reusable query patterns are presented, each targeting a different fraud shape. The claim is that, once these patterns are in place, analysts can iterate on new hypotheses quickly and keep the alert volume manageable.
What’s actually new
The post does not introduce a novel algorithm; it packages a set of well‑known analytical tricks into a pragmatic workflow that can be dropped into any warehouse that supports standard window functions. The contribution is the practical orchestration of these tricks:
- Velocity – counts of transactions per cardholder in sliding time windows (1 min, 5 min, 1 h). Uses `COUNT(*) OVER (PARTITION BY … RANGE BETWEEN …)` and a `QUALIFY` clause for Snowflake/BigQuery, or a CTE for Postgres.
- Impossible travel – compares successive locations with a haversine distance function and flags pairs that would require speeds above 600 mph. The query shows how to compute speed on the fly and adjust the threshold for ground travel.
- Amount anomalies – looks for round‑number amounts that sit just below regulatory or merchant limits (e.g., $99.99, $499.99, $1, $5, $10). The pattern exploits the fact that fraudsters often stay under known caps.
- Suspicious merchants – detects spikes in the number of distinct cards seen at a merchant by comparing the current hour’s count to a 7‑day rolling average (`AVG(...) OVER (PARTITION BY merchant_id ORDER BY hour_bucket ROWS BETWEEN 168 PRECEDING AND 1 PRECEDING)`). A three‑times‑baseline rule is suggested.
- Off‑hours – builds a per‑cardholder “normal hours” profile from at least two transactions per hour over the last 90 days, then flags any transaction outside that window. The query demonstrates how to filter out one‑off outliers.
- Window‑function scaffolding – a single SELECT that materialises useful columns (`time_since_last`, `merchant_change`, `running_24h_total`, `tx_of_day`). Subsequent fraud rules become simple Boolean filters on this materialised view, dramatically shortening the development cycle.
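To make the scaffolding idea concrete, here is a minimal sketch of pattern 6 run against SQLite (used here as a stand-in for the warehouse; the table name `tx`, its columns, and the epoch-second `ts` timestamps are illustrative assumptions, not the author's schema):

```python
import sqlite3

# Hypothetical transaction log: card_id, merchant_id, amount, ts (epoch seconds).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tx (card_id TEXT, merchant_id TEXT, amount REAL, ts INTEGER);
INSERT INTO tx VALUES
  ('c1', 'm1', 20.00,  1000),
  ('c1', 'm2', 35.50,  1600),
  ('c1', 'm2', 99.99, 90000);
""")

# One SELECT materialises the scaffolding columns; later rules are just
# Boolean filters over this result (e.g. WHERE time_since_last < 60).
rows = conn.execute("""
SELECT
  card_id,
  ts,
  ts - LAG(ts) OVER w                        AS time_since_last,
  merchant_id <> LAG(merchant_id) OVER w     AS merchant_change,
  SUM(amount) OVER (
    PARTITION BY card_id ORDER BY ts
    RANGE BETWEEN 86400 PRECEDING AND CURRENT ROW
  )                                          AS running_24h_total,
  ROW_NUMBER() OVER (
    PARTITION BY card_id, ts / 86400 ORDER BY ts
  )                                          AS tx_of_day
FROM tx
WINDOW w AS (PARTITION BY card_id ORDER BY ts)
ORDER BY card_id, ts
""").fetchall()
for r in rows:
    print(r)
```

The `RANGE BETWEEN 86400 PRECEDING` frame gives a true 24-hour rolling sum keyed on the timestamp rather than a fixed row count, which is the distinction that matters for irregularly spaced transactions.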
The article also supplies concrete SQL snippets for Snowflake/BigQuery/Databricks, Postgres, and generic ANSI SQL, plus a short note on implementing a haversine function when the warehouse does not provide one.
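When the warehouse lacks a built-in haversine function, the formula is short enough to implement directly. A sketch in Python (the coordinates and the 600 mph cutoff below are illustrative; any UDF-capable warehouse can host the same arithmetic):

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in statute miles between two lat/lon points."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def implied_speed_mph(dist_miles, seconds):
    """Speed a cardholder would need to cover dist_miles in `seconds`."""
    return dist_miles / (seconds / 3600) if seconds > 0 else float("inf")

# Two transactions one hour apart: New York, then Los Angeles.
d = haversine_miles(40.7128, -74.0060, 34.0522, -118.2437)
flagged = implied_speed_mph(d, 3600) > 600  # the article's jet-speed threshold
print(round(d), flagged)
```

In SQL the same computation becomes a `LAG()` over `(lat, lon, ts)` per cardholder followed by the distance/speed expression in a derived column.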
Limitations and practical concerns
| Pattern | Main limitation | Mitigation |
|---|---|---|
| Velocity | High false‑positive rate for legitimate bulk reloads (e.g., vending‑machine operators). | Maintain a whitelist and tune thresholds per use‑case. |
| Impossible travel | Requires reliable location data; GPS errors or delayed batch loads can produce spurious flags. | Use a conservative speed threshold (e.g., 600 mph) and combine with other signals. |
| Amount anomalies | Works best for card‑testing scenarios; benefits‑type programs may not exhibit round‑number behaviour. | Complement with duplicate‑recipient checks for benefits data. |
| Suspicious merchants | Static thresholds ignore merchant size; a busy retailer can trigger alerts. | Compare against a rolling baseline per merchant, as shown, and adjust the multiplier (e.g., 3×). |
| Off‑hours | Needs sufficient history; new accounts have no baseline. | Fall back to global hour‑of‑day distributions or suspend this rule for accounts younger than a few weeks. |
| Window‑function scaffolding | Window functions over large partitions are expensive. | Filter the date range first, materialise intermediate tables, and consider partitioning by day or month. |
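The rolling-baseline mitigation for the suspicious-merchants pattern can be sketched end to end. Again SQLite stands in for the warehouse, and the `hourly` rollup table (merchant, hour bucket, distinct-card count) is an assumed pre-aggregation, not the author's schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hourly (merchant_id TEXT, hour_bucket INTEGER, distinct_cards INTEGER)"
)
# A quiet merchant: 5 distinct cards/hour for a full week, then a spike to 40.
data = [("m1", h, 5) for h in range(168)] + [("m1", 168, 40)]
conn.executemany("INSERT INTO hourly VALUES (?, ?, ?)", data)

# 168-hour (7-day) trailing average, excluding the current hour, then the
# three-times-baseline rule from the article.
alerts = conn.execute("""
SELECT merchant_id, hour_bucket, distinct_cards, baseline
FROM (
  SELECT merchant_id, hour_bucket, distinct_cards,
         AVG(distinct_cards) OVER (
           PARTITION BY merchant_id ORDER BY hour_bucket
           ROWS BETWEEN 168 PRECEDING AND 1 PRECEDING
         ) AS baseline
  FROM hourly
)
WHERE baseline IS NOT NULL
  AND distinct_cards > 3 * baseline
""").fetchall()
print(alerts)
```

Because the baseline is per merchant, a large retailer with a consistently high card count never trips the rule; only a deviation from that merchant's own history does.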
Additional operational concerns mentioned:
- NULL handling – legacy systems often use sentinel dates; always check the schema before relying on `IS NULL`.
- Human review – no single rule should auto‑block; combine signals into a score and route borderline cases to analysts.
- Privacy – ensure queries comply with PII policies; work on de‑identified or sampled data when possible.
- Cost – large scans can exhaust warehouse credits; keep windows tight and prune early.
Putting it together
The recommended workflow is to run all five detection patterns in parallel, assign each transaction a binary flag per pattern, and compute a simple fraud score (e.g., sum of flags). Transactions hitting three or more flags are highly likely to be fraudulent, while single‑flag cases merit manual review. Once the scaffolding query (pattern 6) is materialised, adding new hypotheses reduces to a few WHERE clauses, cutting the iteration time from weeks to hours.
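The scoring step above is simple enough to sketch directly. Assuming each pattern has already written a binary flag per transaction into a `flags` table (a hypothetical layout, again using SQLite as a stand-in):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flags (
  tx_id INTEGER, velocity INTEGER, travel INTEGER,
  amount INTEGER, merchant INTEGER, off_hours INTEGER
);
INSERT INTO flags VALUES
  (1, 1, 1, 1, 0, 0),  -- three signals fire
  (2, 0, 0, 1, 0, 0),  -- one signal fires
  (3, 0, 0, 0, 0, 0);  -- clean
""")

# Sum the binary flags into a fraud score; >= 3 is treated as likely fraud,
# any single flag is routed to an analyst, per the article's workflow.
scored = conn.execute("""
SELECT tx_id,
       velocity + travel + amount + merchant + off_hours AS fraud_score,
       CASE
         WHEN velocity + travel + amount + merchant + off_hours >= 3
           THEN 'likely_fraud'
         WHEN velocity + travel + amount + merchant + off_hours >= 1
           THEN 'manual_review'
         ELSE 'pass'
       END AS disposition
FROM flags
ORDER BY tx_id
""").fetchall()
print(scored)
```

The thresholds (3 for auto-escalation, 1 for review) are the article's suggested starting point and should be tuned against local false-positive tolerance.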
Take‑away
While the individual queries are straightforward, the real value lies in standardising a reusable SQL toolkit that can be adapted to any transaction‑log schema. By treating window functions as building blocks, analysts can prototype, test, and deploy new fraud rules without involving engineering resources, keeping the detection pipeline both fast and cost‑effective.
The author, Fixel Smith, is a program‑integrity analyst in the public sector. For more examples or custom advice, visit fixelsmith.com.