The Splink Ecosystem: Every App Tied to It

Part nine: the technical map. Having traced who uses Splink and what sits beside it, the series now answers the supply-chain question: every app, library, backend and system genuinely tied to Splink and, fenced off so the map can’t be misread, the rivals that are not.

Core Splink

Splink library (v4) · Fellegi–Sunter engine · EM training · Built-in charts / visualisations

Official MoJ companions

splink_datasets · splink_graph · splink_udfs (DuckDB) · splink_demos · cluster_studio · comparison_viewer · uk_address_matcher · live_splink (in-browser WASM)

Backends it runs on

DuckDB · Apache Spark · AWS Athena · PostgreSQL · SQLite
+ sqlglot · Altair · igraph · Jinja2

Dependents / forks

~110 GitHub 'Used by' (upper bound) · op_splink (Antigranular) · healmatcher · Tuva EMPI · entify · splink_testing

UK government consumers

Data First · Splink Master Record · JustLink · Core Person Record · North Essex PNC→police dashboard · ONS Business Index · ONS 2021 Census → NHS PDS · NHS England linkage (in dev)

Third-party productised (only 2)

Databricks ARC (deprecated) · Antigranular op_splink

Lineage (what it descends from)

fastLink (R) · Fellegi–Sunter (1969) · 'sparklink' (old name)

NOT tied to Splink — rivals used instead, & name collisions

BigMatch (US Census) · G-Link (StatCan) · DALI (AIHW) · ChoiceMaker (CHeReL) · WA DLS3 · FEBRL · Datavant · Match*Pro · dedupe / Zingg · recordlinkage · Quantexa · Palantir Foundry · Senzing / Tamr / Reltio · goldenmatch (benchmarks against) · splink.io payments app · Splunk (name only)

The Splink ecosystem: broad reach, narrow genuine dependency, concentrated in UK government. ~890k downloads/month, but only a handful of official companions, two real third-party wrappers, and ~8 government systems consuming its output. Everything in the red box is a rival used instead — not built on Splink.

↗ Read the full ecosystem map — 75 items, every primary source (GitHub)

Broad reach, narrow dependency

Splink is downloaded roughly 890,000 times a month (~19 million all-time), and GitHub’s “Used by” graph claims ~110 dependent repositories. But that number is an upper bound that flatters the truth: it conflates lockfile pins, passing mentions and competitor benchmarks. The single most-starred “dependent,” goldenmatch, has no Splink dependency at all — it benchmarks against Splink and markets itself as faster. The genuinely load-bearing core is tiny.

• Core — 4: the Splink library, its Fellegi–Sunter engine, the EM trainer, and the built-in charts.

• Official MoJ companions (~10), including splink_graph, splink_datasets, splink_udfs, splink_demos, the cluster_studio and comparison_viewer dashboards, uk_address_matcher, and live_splink, an in-browser WASM build. All built by the same team that made Splink.

• Backends — it runs on DuckDB, Apache Spark, AWS Athena, PostgreSQL and SQLite (plus runtime libraries sqlglot, Altair, igraph).

• Genuine third-party productised wrappers — exactly 2: Databricks ARC (now deprecated) and Antigranular’s differential-privacy op_splink. That is the entire commercial-integration surface.
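The gap between a "Used by" listing and a real dependency can be checked mechanically. The following is a minimal sketch, not the method used for this map: it assumes a repository's dependencies are listed in a pip-style requirements file, and the helper name declares_dependency is hypothetical. It separates a declared requirement from a passing mention or a sibling package such as splink_graph; it does not tell a direct dependency from a transitive lockfile pin.

```python
import re


def declares_dependency(requirements_text: str, package: str = "splink") -> bool:
    """Return True if a pip-style requirements text declares `package`.

    Hypothetical helper for illustration only. Comments, pip option lines
    and packages that merely share a prefix (e.g. splink_graph) do not count.
    """

    def normalise(name: str) -> str:
        # PEP 503 name normalisation: runs of '-', '_' and '.' are
        # equivalent, and names are case-insensitive.
        return re.sub(r"[-_.]+", "-", name).lower()

    target = normalise(package)
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # The project name is the leading run of name characters; it stops
        # at extras ("[...]"), version specifiers or whitespace.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match and normalise(match.group(0)) == target:
            return True
    return False
```

Under these assumptions, a file that only names Splink in a comment, as a benchmarking project might, returns False, while a file with a line beginning `splink` followed by a version specifier returns True.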

The real surface is government, not software

The most consequential thing tied to Splink is not other code; it is the government systems consuming its output. There are at least eight: MoJ’s JustLink, the Splink Master Record, the real-time Core Person Record pilot, the North Essex probation-to-police arrest-detection dashboard, the ADR UK / MoJ Data First research datasets, ONS’s Business Index and its 2021 Census → NHS PDS linkage, and NHS England’s in-development linkage service. The ecosystem’s weight sits there, in the operational use of a statistics tool on named people: exactly the thread this series has pulled from the start.

What is NOT tied to Splink

Because “entity resolution” roundups lump everything together, the disambiguation matters. None of these embeds, wraps or depends on Splink — they are rivals governments and firms use instead: US Census BigMatch, StatCan G-Link, AIHW DALI (which AIHW reportedly considered replacing with Splink — confirming it’s a competitor), ChoiceMaker, FEBRL, Datavant, Match*Pro, dedupe, Zingg, Quantexa (£175m of HMRC work), and Palantir Foundry. And the pure name-collisions: splink.io, a Dublin payments app, and Splunk, the log-analytics product — neither has anything to do with the MoJ tool.

Splink and the Python recordlinkage toolkit both reuse FEBRL’s bundled demo datasets — shared sample data, not a code dependency. That does not make FEBRL “tied to” Splink. The distinction is the whole point of mapping it honestly.

Method: 6 ecosystem segments researched against primary sources (the moj-analytical-services GitHub org, the Splink docs, PyPI, gov.uk), then adversarially verified — which is how the inflated “110 dependents” and the mis-tagged goldenmatch were caught and corrected. 75 items, every one tiered and sourced. Full map, low-confidence flags and sources: GitHub.
