Why bank reconciliation becomes an NP-hard search problem
A rigorous model for reconciliation as bipartite matching plus constrained set packing, and the hybrid heuristic that makes it practical.
Bank reconciliation looks like a matching problem until you try to make it correct.
If every bank movement had a perfect ledger counterpart, the solution would be boring: build an index, join on a stable key, and move on. The real system is harder. Payments arrive late, fees are netted out, one transfer settles several invoices, several transfers settle one invoice, descriptions are inconsistent, and external systems disagree about when the money “happened.”
That changes the algorithmic shape of the problem. The hard part is not scoring one pair. The hard part is choosing a consistent set of explanations, where each record can be used once, group sums have to balance, and every accepted match has accounting consequences.
The boundary: easy matching vs hard reconciliation
There are three different problems hiding under the word “match.”
- Exact join
- Reference IDs are trustworthy. Use a symbol table or hash table and the problem is basically lookup.
- One-to-one fuzzy match
- Build a weighted bipartite graph between bank rows and ledger rows, then choose non-overlapping edges.
- Grouped reconciliation
- Generate candidate subsets on both sides, score each explanation, then choose disjoint explanations.
The first two are not the NP-hard core. Exact lookup is an indexing problem. One-to-one fuzzy matching is a graph optimization problem: bank records on one side, ledger records on the other, weighted edges for plausible matches. If the business rule is “each record can match at most one record,” then the right tool is a maximum-weight bipartite matching, assignment, or min-cost-flow formulation. Those are polynomial-time families.
The difficulty arrives when one record can explain many records, or many records can jointly explain one. At that point, candidate generation contains a subset-sum-style question:
Which small subset of ledger entries sums to this bank amount, after fees, tolerances, and date constraints?
Then candidate selection becomes a set-packing-style question:
Which high-scoring explanations can we accept without reusing any bank or ledger row?
That is where the combinatorial explosion lives. A candidate explanation is no longer an edge; it is a hyperedge that may touch several records. Selecting the best disjoint hyperedges is a constrained combinatorial optimization problem. In the general case, it is NP-hard.
| Reconciliation shape | Useful model | Practical status |
|---|---|---|
| Exact reference match | Hash table / symbol table | Fast and deterministic |
| Fuzzy one-to-one | Weighted bipartite matching | Tractable after blocking |
| One-to-many / many-to-one | Bounded subset search | Exponential without caps and pruning |
| Many-to-many grouped selection | Weighted set packing / hypergraph | NP-hard in general; heuristic in production |
What the algorithms book gives us
I used the generated index for Sedgewick and Wayne’s Algorithms, 4th Edition as the local reference point. The book does not hand us “bank reconciliation” as a named algorithm, and it does not cover a full Hungarian or min-cost-flow solver. What it does give is the toolkit for making the production system credible:
- Analysis of Algorithms gives the discipline: identify the problem size, model the cost, measure it, and validate the model against real data.
- Hash tables and symbol tables give the indexing layer: turn candidate discovery from a global scan into keyed lookup.
- Bipartite graphs give the right mental model for the one-to-one candidate graph.
- Connected components and union-find give a way to split a large candidate graph into independent subproblems.
- Priority queues and indexed priority queues give the engine for best-first search, beam search, and “process the most promising unresolved state next.”
- Shortest-path thinking gives the discipline of relaxation: keep the best known state, improve it when a better path appears, and avoid revisiting work that cannot improve the answer.
The result is not one textbook algorithm. It is a hybrid solver built from standard algorithmic parts.
The production strategy
The best practical strategy is staged. Do not throw every row into a generic optimizer. Use accounting constraints to reduce the search space before the expensive work starts.
1. Normalize and index
Normalize amounts, currencies, account IDs, posting dates, value dates, counterparties, reference numbers, and description tokens. Then build several indices, not one:
type BlockKey = {
account: string;
currency: string;
counterpartyKey?: string;
amountBand?: string;
dateWindow: string;
referenceToken?: string;
};
const byReference = new Map<string, LedgerRecord[]>();
const byCounterpartyDate = new Map<string, LedgerRecord[]>();
const byAmountBand = new Map<string, LedgerRecord[]>();
This is the hash-table move: spend memory so candidate lookup is cheap. It is also the first correctness move, because a block key should encode “could possibly match,” not “looks convenient to compare.”
2. Build a sparse candidate graph
For each bank row, query only the relevant blocks and score plausible ledger rows. The score should combine hard gates and soft evidence:
- hard gates: currency, account, legal date window, posted/open status;
- numeric evidence: amount difference, fee model, tolerance, rounding behavior;
- text evidence: reference token, counterparty similarity, invoice IDs;
- operational evidence: source system, settlement batch, reversal markers.
The output is a sparse weighted bipartite graph, not a dense n * m matrix.
Most rows should never be compared.
function candidatesFor(bank: BankRecord): CandidateEdge[] {
const pool = union(
byReference.get(bank.referenceToken),
byCounterpartyDate.get(counterpartyDateKey(bank)),
byAmountBand.get(amountBandKey(bank)),
);
return [...pool]
.filter((ledger) => passesHardGates(bank, ledger))
.map((ledger) => ({ bank, ledger, score: scorePair(bank, ledger) }))
.filter((edge) => edge.score >= MIN_PAIR_SCORE);
}
3. Split the graph into components
Once candidate edges exist, compute connected components. If bank row b1 only
touches ledger rows l1 and l2, and those ledger rows only touch b1, that is
an independent subproblem. A different component can be solved separately.
This matters because worst-case complexity is brutal, but real financial data is not usually one giant connected graph. Good blocking turns a global problem into many small local problems.
| Component size | Solver choice |
|---|---|
| Exact key / amount pair | Accept deterministically |
| Small one-to-one component | Exact weighted matching |
| Small grouped component | Bounded subset search, then exact set selection |
| Medium ambiguous component | Beam search plus local repair |
| Large low-confidence component | Split by stronger keys or defer to human review |
4. Solve the easy part exactly
Take deterministic matches first:
- same external reference;
- same amount after known fee rule;
- same currency and account;
- no competing candidate above the confidence margin.
Then run exact one-to-one matching inside clean components. This is where a Hungarian-style assignment or min-cost-flow solver makes sense. It gives a principled answer for the tractable part of the problem.
The important constraint: do not let one-to-one matching pretend to solve grouped settlement. That is how “impressive” demos become wrong accounting.
5. Generate grouped explanations with bounded search
For unresolved records, generate candidate groups. This is the dangerous part. Without limits, searching all subsets is exactly how a minutes-long job becomes a centuries-long job.
Use bounds:
- maximum group size, usually small and domain-specific;
- maximum date spread;
- amount tolerance based on currency, fee model, and rounding;
- counterparty or reference-token compatibility;
- cap on candidates emitted per anchor record.
Useful generation strategies:
- meet-in-the-middle for small subset-sum windows;
- dynamic programming when amounts can be represented as bounded integer cents and the range is narrow;
- branch-and-bound when partial sums already exceed the possible amount window;
- beam search when metadata score matters as much as amount equality.
type Explanation = {
bankIds: string[];
ledgerIds: string[];
amountResidual: number;
score: number;
reasons: string[];
};
function generateGroupedExplanations(component: Component): Explanation[] {
const queue = new MaxPriorityQueue<SearchState>((s) => s.upperBound);
queue.enqueue(seed(component));
const out: Explanation[] = [];
while (!queue.isEmpty() && out.length < MAX_EXPLANATIONS) {
const state = queue.dequeue();
if (state.upperBound < MIN_GROUP_SCORE) continue;
if (isBalanced(state)) out.push(toExplanation(state));
for (const next of expand(state)) {
if (withinHardBounds(next)) queue.enqueue(next);
}
}
return out;
}
The priority queue is doing real work here. It keeps the most promising partial explanations at the front and lets weak branches die early. That is a heuristic, not a proof of optimality, but it is the right kind of heuristic: constrained, measurable, and explainable.
6. Select disjoint explanations
Now the problem is set packing. Each explanation consumes some bank IDs and some ledger IDs. We want the highest-value subset with no overlaps.
For production, I would use a hybrid:
- Accept only explanations above a strict precision threshold.
- Prefer explanations with a large confidence margin over their nearest rival.
- Use greedy selection for obvious winners.
- Run local repair: swap one accepted explanation for two better non-overlapping explanations, or remove a weak explanation that blocks a strong one.
- Use exact branch-and-bound only for small components.
- Leave ambiguous residue for review.
That last step is not failure. It is the safety valve. In finance, the system should be allowed to say, “I cannot reconcile this automatically without creating risk.”
Why plain greedy is not enough
A naive greedy matcher sorts candidate pairs by score and takes the first one that does not conflict. That works when the score distribution has obvious winners. It fails when the best local pair blocks a better group.
Example:
| Candidate | Uses | Score |
|---|---|---|
| A | b1 + l1 | 0.94 |
| B | b2 + l2 | 0.93 |
| C | b1 + b2 + l1 + l2 | 0.99 |
If the accounting truth is grouped, taking A first can make C impossible. Greedy is acceptable only with margin rules and repair passes. Otherwise it turns high local confidence into global inconsistency.
What “minutes” really means
The performance improvement is not one trick. It is a cascade of reductions:
-
1
Blocking / indexing
-
2
Connected components
-
3
Exact deterministic
-
4
One-to-one matching
-
5
Bounded group search
-
6
Abstention
| Stage | Without the stage | With the stage |
|---|---|---|
| Blocking/indexing | Global pair scan | Sparse candidate retrieval |
| Connected components | One massive optimization problem | Many independent local problems |
| Exact deterministic pass | Solver wastes time on obvious rows | Obvious rows disappear immediately |
| One-to-one matching | Fuzzy pairs mixed with groups | Tractable matching solved cleanly |
| Bounded group search | Exponential subset explosion | Small, scored explanation set |
| Abstention | Wrong forced matches | Human review only where ambiguity remains |
This is also why the output can be trusted. The solver is not pretending every case is equally knowable. It separates:
- accepted matches, with reasons and idempotent write keys;
- candidate explanations, with scores and conflicting alternatives;
- unresolved residue, with enough context for a human to clear it.
The write path matters too
Algorithmic speed is worthless if a retry double-posts a settlement. Every accepted explanation needs an idempotency key built from the records it consumes and the rule version that accepted it.
function idempotencyKey(explanation: Explanation, ruleVersion: string) {
return hashStable({
bankIds: [...explanation.bankIds].sort(),
ledgerIds: [...explanation.ledgerIds].sort(),
ruleVersion,
});
}
That makes reruns safe. It also makes audits possible: given a posted match, we can reconstruct why the system believed it, what records it consumed, and which rule version made the decision.
The strategy I would ship
The practical solver is:
- Normalize records and build multiple hash-backed indices.
- Generate a sparse candidate graph from hard accounting gates and soft similarity scores.
- Split the graph into connected components.
- Accept deterministic exact matches.
- Use exact bipartite matching for clean one-to-one components.
- Use bounded subset search to generate grouped explanations.
- Select non-overlapping explanations with greedy margins, local repair, and exact search on small components.
- Defer ambiguous residue instead of forcing low-confidence matches.
- Apply accepted matches through idempotent writes.
That is how an intractable reconciliation job becomes operationally routine. Not because the hard problem disappeared, but because the system stops asking the hard question everywhere at once.
The model is the leverage: index aggressively, isolate independent components, solve the tractable subproblems exactly, search the NP-hard boundary only inside small bounded windows, and make every uncertain case visible instead of dangerous.