Why bank reconciliation becomes an NP-hard search problem

19 de junio de 2025 · 10 min de lectura · Ingeniería

A rigorous model for reconciliation as bipartite matching plus constrained set packing, and the hybrid heuristic that makes it practical.

Bank reconciliation looks like a matching problem until you try to make it correct.

If every bank movement had a perfect ledger counterpart, the solution would be boring: build an index, join on a stable key, and move on. The real system is harder. Payments arrive late, fees are netted out, one transfer settles several invoices, several transfers settle one invoice, descriptions are inconsistent, and external systems disagree about when the money “happened.”

That changes the algorithmic shape of the problem. The hard part is not scoring one pair. The hard part is choosing a consistent set of explanations, where each record can be used once, group sums have to balance, and every accepted match has accounting consequences.

Core model Reconciliation is best treated as a weighted search over a candidate graph and then, for grouped settlements, as weighted set packing over candidate explanations. The production goal is not a magical global optimum; it is a high-precision solution that solves the obvious cases automatically and leaves the ambiguous residue small enough for review.

The boundary: easy matching vs hard reconciliation

There are three different problems hiding under the word “match.”

Exact join
Reference IDs are trustworthy. Use a symbol table or hash table and the problem is basically lookup.
One-to-one fuzzy match
Build a weighted bipartite graph between bank rows and ledger rows, then choose non-overlapping edges.
Grouped reconciliation
Generate candidate subsets on both sides, score each explanation, then choose disjoint explanations.

The first two are not the NP-hard core. Exact lookup is an indexing problem. One-to-one fuzzy matching is a graph optimization problem: bank records on one side, ledger records on the other, weighted edges for plausible matches. If the business rule is “each record can match at most one record,” then the right tool is a maximum-weight bipartite matching, assignment, or min-cost-flow formulation. Those are polynomial-time families.

Weighted bipartite matching between bank and ledger rows Bank rows as circles on the left, ledger rows as rounded squares on the right; thin edges are candidate pairs, thick cyan edges are the selected one-to-one matching, a dashed edge is rejected, b4 is left unmatched. [ one edge per node ] b1 b2 b3 b4 l1 l2 l3 l4
chosen matching candidate edge rejected/unmatched
HARD GATES currency account legal date window posted/open
SOFT EVIDENCE amount diff fee model tolerance rounding reference tokens counterparty invoice IDs source system settlement batch reversal
Picking the best set of non-overlapping edges, one per row, is a bipartite matching problem. Hard gates prune the graph; soft evidence weights what survives.

The difficulty arrives when one record can explain many records, or many records can jointly explain one. At that point, candidate generation contains a subset-sum-style question:

Which small subset of ledger entries sums to this bank amount, after fees, tolerances, and date constraints?

Then candidate selection becomes a set-packing-style question:

Which high-scoring explanations can we accept without reusing any bank or ledger row?

That is where the combinatorial explosion lives. A candidate explanation is no longer an edge; it is a hyperedge that may touch several records. Selecting the best disjoint hyperedges is a constrained combinatorial optimization problem. In the general case, it is NP-hard.

Reconciliation shapeUseful modelPractical status
Exact reference matchHash table / symbol tableFast and deterministic
Fuzzy one-to-oneWeighted bipartite matchingTractable after blocking
One-to-many / many-to-oneBounded subset searchExponential without caps and pruning
Many-to-many grouped selectionWeighted set packing / hypergraphNP-hard in general; heuristic in production

What the algorithms book gives us

I used the generated index for Sedgewick and Wayne’s Algorithms, 4th Edition as the local reference point. The book does not hand us “bank reconciliation” as a named algorithm, and it does not cover a full Hungarian or min-cost-flow solver. What it does give is the toolkit for making the production system credible:

  • Analysis of Algorithms gives the discipline: identify the problem size, model the cost, measure it, and validate the model against real data.
  • Hash tables and symbol tables give the indexing layer: turn candidate discovery from a global scan into keyed lookup.
  • Bipartite graphs give the right mental model for the one-to-one candidate graph.
  • Connected components and union-find give a way to split a large candidate graph into independent subproblems.
  • Priority queues and indexed priority queues give the engine for best-first search, beam search, and “process the most promising unresolved state next.”
  • Shortest-path thinking gives the discipline of relaxation: keep the best known state, improve it when a better path appears, and avoid revisiting work that cannot improve the answer.

The result is not one textbook algorithm. It is a hybrid solver built from standard algorithmic parts.

The production strategy

The best practical strategy is staged. Do not throw every row into a generic optimizer. Use accounting constraints to reduce the search space before the expensive work starts.

1. Normalize and index

Normalize amounts, currencies, account IDs, posting dates, value dates, counterparties, reference numbers, and description tokens. Then build several indices, not one:

type BlockKey = {
  account: string;
  currency: string;
  counterpartyKey?: string;
  amountBand?: string;
  dateWindow: string;
  referenceToken?: string;
};

const byReference = new Map<string, LedgerRecord[]>();
const byCounterpartyDate = new Map<string, LedgerRecord[]>();
const byAmountBand = new Map<string, LedgerRecord[]>();

This is the hash-table move: spend memory so candidate lookup is cheap. It is also the first correctness move, because a block key should encode “could possibly match,” not “looks convenient to compare.”

2. Build a sparse candidate graph

For each bank row, query only the relevant blocks and score plausible ledger rows. The score should combine hard gates and soft evidence:

  • hard gates: currency, account, legal date window, posted/open status;
  • numeric evidence: amount difference, fee model, tolerance, rounding behavior;
  • text evidence: reference token, counterparty similarity, invoice IDs;
  • operational evidence: source system, settlement batch, reversal markers.

The output is a sparse weighted bipartite graph, not a dense n * m matrix. Most rows should never be compared.

O(n x m): compare all pairs
n*m = 100 comparisons
Sparse: indexed retrieval
~9 candidates
byReference byCounterpartyDate byAmountBand union pool hard gates score >= MIN_PAIR_SCORE
Three blocking indices (byReference, byCounterpartyDate, byAmountBand) replace the full n x m scan with a sparse candidate set. Most rows are never compared.
function candidatesFor(bank: BankRecord): CandidateEdge[] {
  const pool = union(
    byReference.get(bank.referenceToken),
    byCounterpartyDate.get(counterpartyDateKey(bank)),
    byAmountBand.get(amountBandKey(bank)),
  );

  return [...pool]
    .filter((ledger) => passesHardGates(bank, ledger))
    .map((ledger) => ({ bank, ledger, score: scorePair(bank, ledger) }))
    .filter((edge) => edge.score >= MIN_PAIR_SCORE);
}

3. Split the graph into components

Once candidate edges exist, compute connected components. If bank row b1 only touches ledger rows l1 and l2, and those ledger rows only touch b1, that is an independent subproblem. A different component can be solved separately.

This matters because worst-case complexity is brutal, but real financial data is not usually one giant connected graph. Good blocking turns a global problem into many small local problems.

Connected-component decomposition of the candidate graph The candidate graph splits into five independent clusters with no edges between them; each cluster is labelled with its size and the solver it is routed to, from exact pair matching up to defer-for-review. no edges cross components 2 · exactkey / amount 3 · one-to-one(exact match) 4 · small grouped(bounded subset) 6 · medium ambiguous(beam + repair) 8 · large low-confidence(split / defer)
Each connected component is an independent subproblem. A different component can be solved separately, and the right-sized solver is chosen per cluster.
Component sizeSolver choice
Exact key / amount pairAccept deterministically
Small one-to-one componentExact weighted matching
Small grouped componentBounded subset search, then exact set selection
Medium ambiguous componentBeam search plus local repair
Large low-confidence componentSplit by stronger keys or defer to human review

4. Solve the easy part exactly

Take deterministic matches first:

  • same external reference;
  • same amount after known fee rule;
  • same currency and account;
  • no competing candidate above the confidence margin.

Then run exact one-to-one matching inside clean components. This is where a Hungarian-style assignment or min-cost-flow solver makes sense. It gives a principled answer for the tractable part of the problem.

The important constraint: do not let one-to-one matching pretend to solve grouped settlement. That is how “impressive” demos become wrong accounting.

For unresolved records, generate candidate groups. This is the dangerous part. Without limits, searching all subsets is exactly how a minutes-long job becomes a centuries-long job.

Use bounds:

  • maximum group size, usually small and domain-specific;
  • maximum date spread;
  • amount tolerance based on currency, fee model, and rounding;
  • counterparty or reference-token compatibility;
  • cap on candidates emitted per anchor record.

Useful generation strategies:

  • meet-in-the-middle for small subset-sum windows;
  • dynamic programming when amounts can be represented as bounded integer cents and the range is narrow;
  • branch-and-bound when partial sums already exceed the possible amount window;
  • beam search when metadata score matters as much as amount equality.
type Explanation = {
  bankIds: string[];
  ledgerIds: string[];
  amountResidual: number;
  score: number;
  reasons: string[];
};

function generateGroupedExplanations(component: Component): Explanation[] {
  const queue = new MaxPriorityQueue<SearchState>((s) => s.upperBound);
  queue.enqueue(seed(component));

  const out: Explanation[] = [];
  while (!queue.isEmpty() && out.length < MAX_EXPLANATIONS) {
    const state = queue.dequeue();
    if (state.upperBound < MIN_GROUP_SCORE) continue;
    if (isBalanced(state)) out.push(toExplanation(state));
    for (const next of expand(state)) {
      if (withinHardBounds(next)) queue.enqueue(next);
    }
  }
  return out;
}

The priority queue is doing real work here. It keeps the most promising partial explanations at the front and lets weak branches die early. That is a heuristic, not a proof of optimality, but it is the right kind of heuristic: constrained, measurable, and explainable.

6. Select disjoint explanations

Now the problem is set packing. Each explanation consumes some bank IDs and some ledger IDs. We want the highest-value subset with no overlaps.

For production, I would use a hybrid:

  1. Accept only explanations above a strict precision threshold.
  2. Prefer explanations with a large confidence margin over their nearest rival.
  3. Use greedy selection for obvious winners.
  4. Run local repair: swap one accepted explanation for two better non-overlapping explanations, or remove a weak explanation that blocks a strong one.
  5. Use exact branch-and-bound only for small components.
  6. Leave ambiguous residue for review.

That last step is not failure. It is the safety valve. In finance, the system should be allowed to say, “I cannot reconcile this automatically without creating risk.”

Why plain greedy is not enough

A naive greedy matcher sorts candidate pairs by score and takes the first one that does not conflict. That works when the score distribution has obvious winners. It fails when the best local pair blocks a better group.

Example:

CandidateUsesScore
Ab1 + l10.94
Bb2 + l20.93
Cb1 + b2 + l1 + l20.99

If the accounting truth is grouped, taking A first can make C impossible. Greedy is acceptable only with margin rules and repair passes. Otherwise it turns high local confidence into global inconsistency.

The greedy trap: a high local pair score blocks a higher grouped score Candidate A (b1-l1, 0.94) and B (b2-l2, 0.93) are single pairs; Candidate C groups all four rows for 0.99. Taking A first consumes b1 and l1, making the grouped explanation C impossible. greedy locks b1+l1, so the grouped 0.99 can no longer use them C blocked unreachable after A 0.93 B 0.94 A taken b1 b2 l1 l2 fix: margin rules + repair pass
The highest local pair wins greedily, but taking A first can make C impossible, so the matcher needs margin rules and repair passes.

What “minutes” really means

The performance improvement is not one trick. It is a cascade of reductions:

  1. 1
    Blocking / indexing
    global pair scan sparse candidate retrieval
  2. 2
    Connected components
    one massive problem many local problems
  3. 3
    Exact deterministic
    obvious key/amount rows matched + removed immediately
  4. 4
    One-to-one matching
    fuzzy pairs mixed with groups tractable matching, solved cleanly
  5. 5
    Bounded group search
    exponential subset explosion small scored set
  6. 6
    Abstention
    wrong forced matches human review only on ambiguity
Performance is a cascade of reductions: blocking, components, exact, one-to-one, bounded search, then abstention, each stage removes a class of problems before the expensive solver runs.
StageWithout the stageWith the stage
Blocking/indexingGlobal pair scanSparse candidate retrieval
Connected componentsOne massive optimization problemMany independent local problems
Exact deterministic passSolver wastes time on obvious rowsObvious rows disappear immediately
One-to-one matchingFuzzy pairs mixed with groupsTractable matching solved cleanly
Bounded group searchExponential subset explosionSmall, scored explanation set
AbstentionWrong forced matchesHuman review only where ambiguity remains

This is also why the output can be trusted. The solver is not pretending every case is equally knowable. It separates:

  • accepted matches, with reasons and idempotent write keys;
  • candidate explanations, with scores and conflicting alternatives;
  • unresolved residue, with enough context for a human to clear it.

The write path matters too

Algorithmic speed is worthless if a retry double-posts a settlement. Every accepted explanation needs an idempotency key built from the records it consumes and the rule version that accepted it.

function idempotencyKey(explanation: Explanation, ruleVersion: string) {
  return hashStable({
    bankIds: [...explanation.bankIds].sort(),
    ledgerIds: [...explanation.ledgerIds].sort(),
    ruleVersion,
  });
}

That makes reruns safe. It also makes audits possible: given a posted match, we can reconstruct why the system believed it, what records it consumed, and which rule version made the decision.

The strategy I would ship

The practical solver is:

  1. Normalize records and build multiple hash-backed indices.
  2. Generate a sparse candidate graph from hard accounting gates and soft similarity scores.
  3. Split the graph into connected components.
  4. Accept deterministic exact matches.
  5. Use exact bipartite matching for clean one-to-one components.
  6. Use bounded subset search to generate grouped explanations.
  7. Select non-overlapping explanations with greedy margins, local repair, and exact search on small components.
  8. Defer ambiguous residue instead of forcing low-confidence matches.
  9. Apply accepted matches through idempotent writes.

That is how an intractable reconciliation job becomes operationally routine. Not because the hard problem disappeared, but because the system stops asking the hard question everywhere at once.

The model is the leverage: index aggressively, isolate independent components, solve the tractable subproblems exactly, search the NP-hard boundary only inside small bounded windows, and make every uncertain case visible instead of dangerous.