Part nine — the technical map. Having traced who uses Splink and what sits beside it, this part answers the supply-chain question: which apps, libraries, backends and systems are genuinely tied to Splink and, fenced off so the map can’t be misread, which rivals are not.
↗ Read the full ecosystem map — 75 items, every primary source (GitHub)
Broad reach, narrow dependency
Splink is downloaded roughly 890,000 times a month (~19 million all-time), and GitHub’s “Used by” graph claims ~110 dependent repositories. But the dependents figure is an upper bound that flatters the truth: it conflates lockfile pins, passing mentions and competitor benchmarks with real dependencies. The single most-starred “dependent,” goldenmatch, has no Splink dependency at all; it benchmarks against Splink and markets itself as faster. The genuinely load-bearing core is tiny.
• Core — 4: the Splink library, its Fellegi–Sunter engine, the EM trainer, and the built-in charts.
• Official MoJ companions — ~10: splink_graph, splink_datasets, splink_udfs, the cluster-studio and comparison-viewer dashboards, uk_address_matcher, and an in-browser WASM build. All built by the same team that made Splink.
• Backends — 5: DuckDB, Apache Spark, AWS Athena, PostgreSQL and SQLite (plus the runtime libraries sqlglot, Altair and igraph).
• Genuine third-party productised wrappers — exactly 2: Databricks ARC (now deprecated) and Antigranular’s differential-privacy op_splink. That is the entire commercial-integration surface.
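The gap between a repository that declares Splink as a dependency and one that merely pins or mentions it can be made concrete. What follows is a minimal sketch, not the method used to build the map: the tier names, the `classify` function and the sample inputs are all hypothetical, and it assumes requirements.txt-style lines rather than every manifest format.

```python
import re

# Hypothetical tier labels, invented for this sketch.
DECLARED = "declared dependency"
LOCKFILE_PIN = "lockfile pin only"
MENTION = "mention only"
NONE = "no reference"

# A requirements-style line whose package name is exactly "splink",
# optionally followed by extras, a version specifier, a marker or a URL.
# Sibling packages such as "splink_graph" deliberately do not match.
_REQUIREMENT = re.compile(
    r"^\s*splink\s*(\[[^\]]*\])?\s*([<>=!~;@].*)?$", re.IGNORECASE
)
_MENTION = re.compile(r"\bsplink\b", re.IGNORECASE)


def _names_splink(lines):
    """True if any line (ignoring trailing comments) requires splink itself."""
    return any(_REQUIREMENT.match(line.split("#", 1)[0]) for line in lines)


def classify(manifest_lines, lockfile_lines=(), other_text=""):
    """Return the strongest tier of evidence that a repository uses Splink."""
    if _names_splink(manifest_lines):
        return DECLARED
    if _names_splink(lockfile_lines):
        return LOCKFILE_PIN
    if _MENTION.search(other_text):
        return MENTION
    return NONE


if __name__ == "__main__":
    # Made-up repositories, for illustration only.
    samples = {
        "direct-user": (["pandas>=2.0", "splink>=4.0"], [], ""),
        "pinned-only": (["pandas"], ["splink==3.9.14"], ""),
        "benchmarker": (["polars"], [], "Benchmarked against Splink."),
    }
    for name, (manifest, lock, text) in samples.items():
        print(name, "->", classify(manifest, lock, text))
```

Only the first tier would count as a dependency in the strict sense used here; a "Used by" style count that treats all three non-empty tiers alike will overstate the total.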
The real surface is government, not software
The most consequential thing tied to Splink is not other code but the government systems that consume its output. At least eight: MoJ’s JustLink, the Splink Master Record, the real-time Core Person Record pilot, the North Essex probation-to-police arrest-detection dashboard, the ADR UK / MoJ Data First research datasets, ONS’s Business Index and its 2021 Census → NHS PDS linkage, and NHS England’s in-development linkage service. The ecosystem’s weight sits there, in the operational use of a statistics tool on named people: exactly the thread this series has pulled from the start.
What is NOT tied to Splink
Because “entity resolution” roundups lump everything together, the disambiguation matters. None of these embeds, wraps or depends on Splink; they are rivals that governments and firms use instead: US Census BigMatch, StatCan G-Link, AIHW DALI (which AIHW reportedly considered replacing with Splink, confirming it’s a competitor), ChoiceMaker, FEBRL, Datavant, Match*Pro, dedupe, Zingg, Quantexa (£175m of HMRC work), and Palantir Foundry. Then there are the pure name collisions: splink.io, a Dublin payments app, and Splunk, the log-analytics product. Neither has anything to do with the MoJ tool.
Splink and the Python recordlinkage toolkit both reuse FEBRL’s bundled demo datasets — shared sample data, not a code dependency. That does not make FEBRL “tied to” Splink. The distinction is the whole point of mapping it honestly.
Method: six ecosystem segments researched against primary sources (the moj-analytical-services GitHub org, the Splink docs, PyPI, gov.uk), then adversarially verified, which is how the inflated “110 dependents” and the mis-tagged goldenmatch were caught and corrected. 75 items, every one tiered and sourced. Full map, low-confidence flags and sources: GitHub.