
Follow the Money: Budget Sankey and Spend Transparency from Appropriation to Outcome

RealAI · Feb 21, 2025 · 11 min read
Public Sector · Responsible AI

Every spreadsheet hides a story. A budget line that ships as written. A contract that executes ahead of schedule. A grant that lands in a community account and then vanishes into a network of transfers that no single person can follow. The question that stops every treasury officer is not "how much money do we have?" but "where did it actually go, and who can I explain that to?"

Exhibit 1: Follow the dollar to the outcome. Spend transparency means following a dollar from appropriation to outcome, but the trail goes dark at the hops nobody instruments, and that is where leakage hides. As AI instrumentation coverage rises, the dark hops light up from left to right, the share of spend traceable to the outcome climbs (spend counts as traceable only once the outcome hop is lit), and anomalies surface as flags.

The Audit Loop That Never Closes

Imagine a municipal government appropriates a large neighborhood-rebuild budget. The mayor's office splits it across infrastructure, social services and workforce development. Months later, the financial controller reconciles the year-end books and discovers the actual spend drifted several points off the plan in every category — a drift nobody flagged when it happened. By the time it surfaces in a post-facto audit, the decisions that caused it are weeks or months old. The corrective action happens after the damage is done.

This is the gap that kills public trust. Not the amount (sometimes the drift is legitimate, say because demand for social services swelled that quarter) but the latency. A euro should be traceable from the moment it leaves the treasury account, through every stop it makes, until it lands on a child's desk in a rebuilt school or in the bank account of a retraining program. When you trace it in real time, the drift becomes visible while you can still correct course. When you do not, it becomes the scandal that shows up in the annual report.

The BudgetSankey instrument does this. It ingests the appropriation ledger, the purchase-order system, the payment flows, the transfer logs and the outcome metrics — test scores, employment placement rates, infrastructure completion dates — and renders them into one connected Sankey diagram. Not a visualization you look at once and file away. A live network where every node is a bucket of money and every edge is a transaction. When a budget line underperforms or overspends, the deviation surfaces in the same instant it happens, traceable back to the transaction and forward to the outcome.
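To make "every node is a bucket of money and every edge is a transaction" concrete, here is a minimal Python sketch of how ledger rows could be rolled up into Sankey edges. It assumes the ledgers can be flattened into hypothetical `(source, target, amount)` records; the names `Transfer`, `build_sankey_edges`, `node_balances` and `traced_to_outcome_share` are illustrative and are not BudgetSankey's actual interface.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One ledger movement between two buckets of money (Sankey nodes). Hypothetical schema."""
    source: str
    target: str
    amount: float


def build_sankey_edges(transfers):
    """Sum individual transfers into Sankey edge weights keyed by (source, target)."""
    edges = defaultdict(float)
    for t in transfers:
        edges[(t.source, t.target)] += t.amount
    return dict(edges)


def node_balances(edges):
    """Inflow minus outflow per node; a positive balance is money that stopped there."""
    balance = defaultdict(float)
    for (src, dst), amount in edges.items():
        balance[src] -= amount
        balance[dst] += amount
    return dict(balance)


def traced_to_outcome_share(balances, root, outcome_nodes):
    """Share of what left the root node that has come to rest in an outcome node."""
    appropriated = -balances[root]
    reached = sum(balances.get(n, 0.0) for n in outcome_nodes)
    return reached / appropriated


# Illustrative figures only.
ledger = [
    Transfer("appropriation", "infrastructure", 600.0),
    Transfer("appropriation", "social services", 400.0),
    Transfer("infrastructure", "school rebuild", 550.0),
    Transfer("social services", "retraining program", 300.0),
]
edges = build_sankey_edges(ledger)
balances = node_balances(edges)
share = traced_to_outcome_share(balances, "appropriation", {"school rebuild", "retraining program"})
```

In this toy ledger, 850 of the 1,000 appropriated reaches an outcome node. The remaining 150 sits in intermediate buckets (50 in infrastructure, 100 in social services), which is exactly the residue the diagram is meant to make visible.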

The work is not predictive risk scoring, where a model flags potential waste based on historical patterns. It is anomaly pattern surfacing: observing the actual flows against the expected allocation and flagging deviations that warrant investigation. A duplicate invoice is flagged not because a neural net learned fraud, but because two line items with the same vendor, the same amount and the same invoice date were posted in the same week. An auditor can look at the pair and decide: a legitimate invoice entered twice by accident, or something intentional? And they can decide on the day it happens, not at the year-end review.
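The duplicate-invoice rule above is a plain grouping check, not a learned model. A minimal Python sketch, assuming a hypothetical `LineItem` record with amounts in integer cents and using the ISO calendar week of the posting date as "the same week":

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LineItem:
    item_id: str
    vendor: str
    amount_cents: int  # integer cents, so equal amounts compare exactly
    invoice_date: date
    posted_on: date


def duplicate_invoice_flags(items):
    """Group line items sharing vendor, amount and invoice date that were posted
    in the same ISO week; every group with more than one item is surfaced."""
    groups = defaultdict(list)
    for item in items:
        iso_year, iso_week, _ = item.posted_on.isocalendar()
        key = (item.vendor, item.amount_cents, item.invoice_date, iso_year, iso_week)
        groups[key].append(item.item_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical vendor and amounts.
items = [
    LineItem("A", "Acme Works", 1_250_000, date(2025, 2, 3), date(2025, 2, 4)),
    LineItem("B", "Acme Works", 1_250_000, date(2025, 2, 3), date(2025, 2, 6)),
    LineItem("C", "Acme Works", 1_250_000, date(2025, 1, 6), date(2025, 2, 5)),
]
flags = duplicate_invoice_flags(items)  # A and B share a key; C has a different invoice date
```

The output is a list of candidate pairs for a human to review, not a verdict. Whether "A and B" is a keying error or something worse stays the auditor's call.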

Process flow: BudgetSankey traces appropriation to outcome.

Deviations Without Drama: Flagging What Actually Warrants a Conversation

The treasury receives a payment request for a contractor. The amount is within the contract. The vendor is approved. The approval chain is intact. By every standard process check, it is clean. But it is the tenth payment to that vendor in three weeks, while the contract covers one month of work. That is not a rule violation. It is anomalous. And anomalies are the conversation starters.

The systems that have lasted longest in public finance do not predict fraud. They surface patterns. A vendor who is paid every Tuesday without fail is suddenly paid on a Thursday. Not necessarily wrong, but different. An agency account that normally receives monthly disbursements suddenly receives weekly ones. Not necessarily theft, but unusual, and unusual enough to warrant a conversation with the payments officer.
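Both patterns in the paragraph above (a weekday break and a cadence shift) can be expressed as simple checks over a payee's sorted payment dates. This is a minimal Python sketch; the thresholds `min_history`, `dominance`, `recent` and `ratio` are hypothetical defaults chosen for illustration, not values from any deployment.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import median


def weekday_break(payment_dates, min_history=8, dominance=0.9):
    """True if the latest payment falls off the weekday the payee is almost
    always paid on (paid every Tuesday, then suddenly on a Thursday).
    Assumes payment_dates is sorted ascending."""
    history, latest = payment_dates[:-1], payment_dates[-1]
    if len(history) < min_history:
        return False
    usual_day, count = Counter(d.weekday() for d in history).most_common(1)[0]
    return count / len(history) >= dominance and latest.weekday() != usual_day


def cadence_shift(payment_dates, recent=4, ratio=0.5):
    """True if the median gap between the last `recent` payments is below
    `ratio` times the historical median gap (monthly turning weekly)."""
    gaps = [(b - a).days for a, b in zip(payment_dates, payment_dates[1:])]
    if len(gaps) <= recent:
        return False
    past, latest = gaps[:-recent], gaps[-recent:]
    return median(latest) < ratio * median(past)


tuesdays = [date(2025, 1, 7) + timedelta(weeks=i) for i in range(10)]
vendor_dates = tuesdays + [date(2025, 3, 20)]  # the last payment lands on a Thursday
monthly = [date(2024, m, 1) for m in range(1, 13)] + [date(2025, 1, 1)]
agency_dates = monthly + [date(2025, 1, 8), date(2025, 1, 15), date(2025, 1, 22), date(2025, 1, 29)]
```

Neither function says "wrong"; each says "different", which is the trigger for the conversation with the payments officer.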

The BudgetSankey approach splits the difference. It does not classify a transaction as clean or dirty. It surfaces the context (the deviation from pattern, the concentration of spend, the timing anomaly) and lets a human investigator decide. The result is that when auditors look back at the books, they do not see a black-box risk score. They see a log: the transaction, the flag, the investigator's decision and the resolution. They can read the logic in plain terms: this payment was flagged because it was the third identical invoice from the same vendor in three weeks. The investigator confirmed the contractor was submitting weekly invoices under an approved change order. Approved.
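The log described above needs little more than an append-only record per decision, with the flag reason written in plain language. A minimal Python sketch, assuming a JSON Lines format; the field names and the example values (`PAY-0193`, `j.doe`) are hypothetical:

```python
import io
import json
from dataclasses import asdict, dataclass
from typing import List, TextIO


@dataclass(frozen=True)
class DecisionRecord:
    transaction_id: str
    flag_reason: str    # plain-language reason, not a model score
    investigator: str
    decision: str       # e.g. "approved", "escalated", "rejected"
    resolution: str
    decided_at: str     # ISO 8601 timestamp


def append_decision(log: TextIO, record: DecisionRecord) -> None:
    """Append one record as a JSON line; earlier lines are never rewritten."""
    log.write(json.dumps(asdict(record), sort_keys=True) + "\n")


def read_decisions(log_text: str) -> List[DecisionRecord]:
    """Parse the log back into records, in the order they were written."""
    return [DecisionRecord(**json.loads(line)) for line in log_text.splitlines() if line]


# An in-memory buffer stands in for a log file opened in append mode ("a").
log = io.StringIO()
append_decision(log, DecisionRecord(
    transaction_id="PAY-0193",
    flag_reason="Repeated identical invoice from the same vendor",
    investigator="j.doe",
    decision="approved",
    resolution="Weekly invoicing under an approved change order",
    decided_at="2025-02-21T10:42:00Z",
))
records = read_decisions(log.getvalue())
```

Plain text was chosen on purpose: an auditor can read the file without any tooling, which is the point the paragraph makes about decision logs versus risk scores.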

That chain, from observation to decision to record, is what survives a forensic audit. A risk score does not, because a risk score asks the auditor to trust that the model learned the right patterns. A decision log asks the auditor only to read.

Reading the Allocation and the Actual Spend Against Each Other

Say a school district budgets to reduce class sizes, splitting the appropriation between staffing and facilities expansion. The budget officer allocates it. Classroom hires begin. A new school comes online. By November, the staffing line is tracking a few points below plan and facilities a few points above. Small drift — probably fine. But what if the facilities spend is concentrated in one renovation that has already slipped twice? What if the staffing drift is because the hiring plan pivoted to fill a gap in a school that just lost a principal?

These are the real conversations of public finance. Not whether the drift violates the budget, but whether the allocation still makes sense given what has actually happened. The BudgetSankey instrument does the surface reading. It says: staffing tracking below allocation, facilities above; facilities spend concentrated in one project that is weeks behind schedule. Would you like to reallocate back to staffing before the hiring window closes?
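The "surface reading" in the school-district example combines two observations: drift of each line against its plan to date, and how concentrated that line's spend is in a single project. A minimal Python sketch under those assumptions; the project names and figures are hypothetical, and schedule slippage is not modelled here:

```python
from collections import defaultdict


def drift_report(planned_to_date, spend_by_project, project_line):
    """For each budget line: percent drift of actual spend from planned spend to
    date, and the share of that spend sitting in its single largest project."""
    actual = defaultdict(float)
    per_line = defaultdict(list)
    for project, amount in spend_by_project.items():
        line = project_line[project]
        actual[line] += amount
        per_line[line].append((amount, project))
    report = {}
    for line, planned in planned_to_date.items():
        spent = actual[line]
        top_amount, top_project = max(per_line[line], default=(0.0, None))
        report[line] = {
            "drift_pct": 100.0 * (spent - planned) / planned,
            "top_project": top_project,
            "top_project_share": top_amount / spent if spent else 0.0,
        }
    return report


planned = {"staffing": 400_000.0, "facilities": 200_000.0}
spend = {"fall hires": 380_000.0, "north wing renovation": 170_000.0, "new school fit-out": 40_000.0}
lines = {"fall hires": "staffing", "north wing renovation": "facilities", "new school fit-out": "facilities"}
report = drift_report(planned, spend, lines)
```

Here staffing runs 5% under plan and facilities 5% over, with about 81% of facilities spend in one renovation. That is the kind of pattern that should prompt the reallocation question before the hiring window closes.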

The intelligence is not predictive. It is observational. But it surfaces when you can still do something about it — when the hiring window is still open, when a facilities contract can be renegotiated, when another program can be drawn on if priorities have genuinely shifted.

The same logic applies to grants. A central government distributes workforce-training funds to local authorities. Each authority gets a target and by law must spend most of its allocation or return it. The auditor's job is to verify that the spend happened and went to the intended uses. The BudgetSankey instrument reads the spend in real time, showing not just that money left the account, but which training programs it landed in, how many participants each one served and what employment outcomes those participants achieved. When an authority is running slow on spend, the alert surfaces while there is still time to adjust the training mix, accelerate contracting or rescope the program, not in post-hoc remediation after the deadline has passed.
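A "running slow on spend" alert can be as simple as projecting the run rate to the deadline. A minimal Python sketch, assuming straight-line spending (a simplification, since real grant spend is often lumpy) and a hypothetical required share of 90%; the actual share would come from the governing law:

```python
from datetime import date


def spend_pace_alert(allocation, spent_to_date, start, today, deadline, required_share):
    """Project spend at the deadline from the straight-line run rate so far and
    alert if it falls short of the share the authority is required to spend."""
    elapsed = (today - start).days
    window = (deadline - start).days
    if elapsed <= 0 or window < elapsed:
        raise ValueError("expected start < today <= deadline")
    projected = spent_to_date * window / elapsed
    shortfall = required_share * allocation - projected
    return {"projected": projected, "alert": shortfall > 0, "shortfall": max(shortfall, 0.0)}


status = spend_pace_alert(
    allocation=1_000_000.0,
    spent_to_date=300_000.0,
    start=date(2025, 1, 1),
    today=date(2025, 7, 1),
    deadline=date(2025, 12, 31),
    required_share=0.9,  # hypothetical; the real share is set by the governing law
)
```

At the half-year mark this authority projects roughly 603,000 against a 900,000 requirement, so the alert fires with about six months left to adjust the training mix.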

Appropriation → outcome: full BudgetSankey traceability
Audit-ready: anomaly flags, decision-logged
500K+: learners served on governed data
4–6 weeks: typical Assessment phase

The Copilot for Auditors: From Numbers to Investigation

Every auditor spends weeks walking the books. A transaction sample from each category, verified against supporting documents. An account balance reconciled across three systems. A journal entry investigated for the reason it was reversed. Today this work is manual, slow and happens long after the books close. BudgetSankey compresses it.

The auditor walks in knowing: which line items drifted most from budget, which vendors concentrated the spend, which transfers took the longest route to their final destination, which programs over- or under-performed against outcome targets. Not predictions. Observations. The auditor then picks the anomalies worth investigating — not the ones the model thinks are risky, but the ones that look different and warrant a conversation.
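"Which vendors concentrated the spend" needs a concrete measure. One common choice, used here as an assumption rather than as BudgetSankey's documented method, is the Herfindahl-Hirschman index: the sum of squared spend shares. A minimal Python sketch with hypothetical vendor names:

```python
from collections import defaultdict


def vendor_concentration(payments):
    """Given (vendor, amount) payments for one line item, return the top vendor,
    its share of spend, and the Herfindahl-Hirschman index (sum of squared
    shares: near 0 for widely spread spend, 1.0 when one vendor took everything)."""
    by_vendor = defaultdict(float)
    for vendor, amount in payments:
        by_vendor[vendor] += amount
    total = sum(by_vendor.values())
    if total <= 0:
        raise ValueError("no spend to measure")
    shares = {vendor: amount / total for vendor, amount in by_vendor.items()}
    top = max(shares, key=shares.get)
    hhi = sum(share * share for share in shares.values())
    return top, shares[top], hhi


top, top_share, hhi = vendor_concentration(
    [("Acme Works", 50.0), ("Birch Ltd", 30.0), ("Acme Works", 10.0), ("Cobalt AG", 10.0)]
)
```

Ranking line items by this index gives the auditor a short list of places to look. It is an observation about where the money went, not a judgement that concentration is wrong.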

The copilot aspect is the audit trail. When the auditor asks "why was this spending concentrated in November?", they pull not a model score but a chain: procurement request, approval log, contract execution date, invoice dates, payment dates, outcome evidence. They can read the story in the data itself, not in an explainability layer built on top of a model.
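The chain described above is mostly a join on a shared key followed by a sort. A minimal Python sketch, assuming every record carries a hypothetical `procurement_id` and falls into one of six stages. It also reports stages with no evidence at all, which is where the trail goes dark:

```python
from dataclasses import dataclass
from datetime import date

STAGES = ("request", "approval", "contract", "invoice", "payment", "outcome")
STAGE_ORDER = {stage: i for i, stage in enumerate(STAGES)}


@dataclass(frozen=True)
class Event:
    procurement_id: str
    kind: str   # one of STAGES
    on: date
    ref: str    # pointer to the source record in its own system


def audit_chain(events, procurement_id):
    """Return the events for one procurement in date order (stage order breaks
    same-day ties), plus any stages with no evidence at all."""
    chain = sorted(
        (e for e in events if e.procurement_id == procurement_id),
        key=lambda e: (e.on, STAGE_ORDER.get(e.kind, len(STAGES))),
    )
    present = {e.kind for e in chain}
    missing = [stage for stage in STAGES if stage not in present]
    return chain, missing


# Hypothetical identifiers across several source systems.
events = [
    Event("PR-17", "payment", date(2024, 11, 28), "AP-5521"),
    Event("PR-17", "request", date(2024, 10, 2), "REQ-88"),
    Event("PR-17", "invoice", date(2024, 11, 20), "INV-301"),
    Event("PR-17", "approval", date(2024, 10, 9), "APR-12"),
    Event("PR-17", "contract", date(2024, 10, 30), "CTR-4"),
    Event("PR-99", "request", date(2024, 10, 5), "REQ-90"),
]
chain, missing = audit_chain(events, "PR-17")
```

For `PR-17`, the money can be followed from request to payment, but no outcome evidence exists yet. That gap is exactly what the auditor should ask about.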

In public finance, a euro traced is a euro trusted.

From Pilot to Population Scale: Sustaining Transparency

The hard part of spending transparency at scale is not the infrastructure. It is the discipline. A single department with one financial system and a handful of programs can be read in real time. A government with dozens of agencies, hundreds of programs, data scattered across many systems, legacy formats and new cloud platforms — that becomes a coordination problem.

The deployments that have lasted (government treasuries that sustain spending visibility across populations measured in the hundreds of thousands of beneficiaries) do three things relentlessly:

One: they keep the data in one perimeter. Data residency is not a security afterthought — it is a structural requirement. Every agency's payment system, every program's outcome ledger, lives inside the treasury data platform. When it does not, that system becomes dark. And dark spots are where anomalies hide.

Two: they retrain the anomaly detector against ground truth. When the auditor investigates a flagged transaction and finds it is legitimate, that becomes part of the training data: this pattern means something different than we thought. When a problematic transaction goes unflagged and only shows up in a year-end review, that is also training data: we missed this one. The model learns not from prediction, but from supervised investigation.
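The two feedback cases above map onto false alarms (flagged, proved legitimate) and misses (unflagged, found later). A minimal Python sketch that turns investigation outcomes into counts, precision, recall and fresh labels; the `Outcome` record and the transaction IDs are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    transaction_id: str
    flagged: bool
    problem_confirmed: bool  # from the investigation, or a later year-end review


def feedback_summary(outcomes):
    """Count confirmed flags, false alarms and misses, and derive precision and
    recall; the (transaction_id -> problem_confirmed) map is the new label set."""
    tp = sum(o.flagged and o.problem_confirmed for o in outcomes)
    fp = sum(o.flagged and not o.problem_confirmed for o in outcomes)
    fn = sum(not o.flagged and o.problem_confirmed for o in outcomes)
    labels = {o.transaction_id: o.problem_confirmed for o in outcomes}
    return {
        "confirmed": tp,
        "false_alarms": fp,
        "missed": fn,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "labels": labels,
    }


history = [
    Outcome("T1", True, True),    # flagged, real issue
    Outcome("T2", True, False),   # flagged, proved legitimate
    Outcome("T3", True, False),   # flagged, proved legitimate
    Outcome("T4", False, True),   # missed, surfaced at year-end
    Outcome("T5", False, False),  # correctly left alone
]
summary = feedback_summary(history)
```

The labels come from human investigation, not from the detector's own output. That is what keeps the loop supervised.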

Three: they keep the human investigator in the loop. A transaction that is flagged does not auto-approve, auto-reject or trigger an automatic hold. It triggers a notification. A human reads the flag, the transaction and the context — is there a reason this looks anomalous? Is it a rule break, or just unusual? They decide. That decision gets logged. The log becomes the audit trail.
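The rule in point three is structural: raising a flag notifies someone and changes nothing about the payment, and only a named person can close the flag. A minimal Python sketch of that workflow; `FlagQueue`, the `notify` callback and the example values are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Flag:
    transaction_id: str
    reason: str
    decision: Optional[str] = None
    decided_by: Optional[str] = None


@dataclass
class FlagQueue:
    notify: Callable[[Flag], None]  # e.g. message the payments officer
    open_flags: List[Flag] = field(default_factory=list)
    log: List[dict] = field(default_factory=list)

    def raise_flag(self, transaction_id: str, reason: str) -> Flag:
        """Record and notify only; the payment is not held, approved or rejected here."""
        flag = Flag(transaction_id, reason)
        self.open_flags.append(flag)
        self.notify(flag)
        return flag

    def decide(self, flag: Flag, investigator: str, decision: str) -> None:
        """Only a named human closes a flag; the decision is appended to the log."""
        if not investigator:
            raise ValueError("a flag can only be closed by a named investigator")
        flag.decision, flag.decided_by = decision, investigator
        self.open_flags.remove(flag)
        self.log.append({"transaction_id": flag.transaction_id, "reason": flag.reason,
                         "investigator": investigator, "decision": decision})


sent: List[Flag] = []
queue = FlagQueue(notify=sent.append)
flag = queue.raise_flag("PAY-0193", "Tenth payment in three weeks on a one-month contract")
```

Note that the queue has no code path that touches the payment itself. Keeping auto-holds out of the design, rather than switching them off with a setting, makes the human-in-the-loop guarantee easy to audit.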

The outcome at scale is not zero fraud or zero waste — no system achieves that. The outcome is that when fraud or waste happens, it is visible in days, not months. And when an auditor looks back, they do not see a black box. They see a record of observations and decisions, traceable every step of the way.

Where to start

The Assessment phase starts with the spend ledger you already have: purchase orders, invoices, payments, transfers, and whatever history exists. The early weeks are purely historical. You ingest this history into the BudgetSankey instrument and ask it to find the patterns: which vendors concentrate spend, which transfer patterns repeat, which deviations from budget recurred year after year?

Then you compare: were these deviations flagged at the time? Did an auditor investigate them later? Did they prove legitimate, or did an investigation never happen? You map the missed anomalies — the ones that should have triggered a conversation — and the false alerts — the ones that triggered but were always going to be legitimate. From that map, you set the sensitivity thresholds for real-time detection.
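Turning the historical map of missed anomalies and false alerts into a threshold can be framed as: keep as many real anomalies in view as possible while holding false alerts to an acceptable rate. A minimal Python sketch, assuming each historical transaction has a hypothetical deviation score and a `should_have_flagged` label taken from the map; the article does not specify how BudgetSankey sets its thresholds:

```python
def pick_threshold(scored, min_precision):
    """scored: (score, should_have_flagged) pairs from the historical map.
    Returns (threshold, precision, recall) for the lowest threshold whose
    precision meets min_precision, or None if no threshold does. Recall can
    only fall as the threshold rises, so the lowest passing threshold keeps
    the most real anomalies in view."""
    positives = sum(1 for _, should_flag in scored if should_flag)
    for threshold in sorted({score for score, _ in scored}):
        hits = [should_flag for score, should_flag in scored if score >= threshold]
        tp = sum(hits)
        precision = tp / len(hits)
        if precision >= min_precision:
            recall = tp / positives if positives else 0.0
            return threshold, precision, recall
    return None


history = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.4, False), (0.2, False)]
choice = pick_threshold(history, min_precision=0.75)
```

On this toy history, a threshold of 0.6 catches all three real anomalies at the cost of one false alert in four. The acceptable precision is a policy decision for the treasury, not something the code can choose.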

The middle of the window is integration and deployment. You wire the anomaly detector into the live payment system. You test the flagging cadence. You run it in read-only mode alongside the existing audit workflow — the auditor sees the flags but continues their normal process, and you measure how many of the AI flags the auditor would have caught anyway.
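The read-only comparison above reduces to set arithmetic over transaction IDs flagged in the same period. A minimal Python sketch with hypothetical IDs:

```python
def shadow_overlap(ai_flags, auditor_findings):
    """Compare what the detector flagged in read-only mode with what the
    auditor's normal process found over the same period."""
    ai, auditor = set(ai_flags), set(auditor_findings)
    both = ai & auditor
    return {
        "caught_anyway": len(both),
        "ai_only": sorted(ai - auditor),
        "auditor_only": sorted(auditor - ai),
        "overlap_share": len(both) / len(ai) if ai else 0.0,
    }


result = shadow_overlap(["T1", "T2", "T3", "T4"], ["T2", "T4", "T9"])
```

The `ai_only` items are where the detector adds something new and need a human look. The `auditor_only` items are misses that belong back in the historical map used to set thresholds.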

By the close of the 4–6 week Assessment, you have a baseline: how many flags per month, how many warrant investigation, and how many prove to be genuine issues versus false alarms. You have also documented the full chain: transaction sourcing, anomaly detection logic, investigation and decision logging. That documentation becomes the audit trail the external auditor will read.
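The baseline described above is a monthly tally of flag outcomes. A minimal Python sketch, assuming each flag ends in one of three hypothetical statuses: dismissed without investigation, investigated and found legitimate, or investigated and confirmed as an issue:

```python
from collections import Counter, defaultdict
from datetime import date

STATUSES = ("dismissed", "legitimate", "confirmed_issue")


def monthly_baseline(flags):
    """flags: iterable of (flag_date, status) pairs, status in STATUSES.
    Returns {"YYYY-MM": {"flags", "investigated", "legitimate", "confirmed_issue"}}."""
    months = defaultdict(Counter)
    for flagged_on, status in flags:
        if status not in STATUSES:
            raise ValueError(f"unknown status: {status}")
        months[flagged_on.strftime("%Y-%m")][status] += 1
    baseline = {}
    for month, counts in sorted(months.items()):
        baseline[month] = {
            "flags": sum(counts.values()),
            "investigated": counts["legitimate"] + counts["confirmed_issue"],
            "legitimate": counts["legitimate"],
            "confirmed_issue": counts["confirmed_issue"],
        }
    return baseline


flags = [
    (date(2025, 3, 3), "dismissed"),
    (date(2025, 3, 10), "legitimate"),
    (date(2025, 3, 18), "confirmed_issue"),
    (date(2025, 4, 2), "legitimate"),
]
baseline = monthly_baseline(flags)
```

Rejecting unknown statuses rather than silently counting them keeps the baseline honest. An outcome nobody recorded should show up as an error, not vanish from the totals.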

Then you flip the switch: the flags start landing in real time, and the investigator response gets logged. The treasury goes live with spending transparency.

