Health Disbursement Explorer

Automated PDF extraction and a dashboard for transparent facility-level fund flows. Built for decision-makers who need traceable numbers, defensible methods, and privacy by design. We turn static reports into validated, queryable datasets, then surface them in a governance-ready analytics layer.

Health Data · ETL from PDFs · Governance · Privacy · Dashboards

Outcomes

  • Machine-readable disbursement tables from PDFs with confidence scores and source lineage.
  • Facility-level flows linked to districts, programs, and funding streams.
  • Quality checks, anomaly flags, and reproducible audit logs.
  • Role-aware dashboard: transparency for the public, detail for analysts, guardrails for executives.
  • Policy artifacts: data dictionary, RACI, SOPs, retention and access rules.

[Image: Facility disbursement dashboard and data lineage diagram]
[Image: Automated PDF extraction and validation checks]

Automated Extraction & Validation

We ingest PDFs (scanned and digital-native), normalize tables, and reconcile facility codes. Every extracted value carries provenance: file → page → table → cell.
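
As a rough illustration of the provenance chain described above, one extracted value could be represented as a small record. This is a minimal sketch, not the project's actual schema: the class name `ExtractedCell`, its field names, the 0.90 review threshold, and the file name `q1_report.pdf` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedCell:
    """One extracted value plus where it came from (hypothetical schema)."""
    file: str          # source PDF name
    page: int          # 1-based page number
    table: int         # 1-based table index on that page
    row: int           # 0-based row within the table
    col: int           # 0-based column within the table
    value: str         # raw text as extracted
    confidence: float  # extraction confidence in [0, 1]

    def lineage(self) -> str:
        """Render provenance as file → page → table → cell."""
        return f"{self.file} → p{self.page} → t{self.table} → r{self.row}c{self.col}"

    def needs_review(self, threshold: float = 0.90) -> bool:
        """Flag low-confidence values for a human reviewer (threshold is assumed)."""
        return self.confidence < threshold


cell = ExtractedCell("q1_report.pdf", 12, 2, 5, 3, "1,250,000", 0.84)
print(cell.lineage())       # q1_report.pdf → p12 → t2 → r5c3
print(cell.needs_review())  # True
```

Keeping the record immutable (`frozen=True`) means a value's lineage cannot be altered after extraction; a correction would be stored as a new record.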

  • OCR + table detection with confidence thresholds and human-in-the-loop review queues.
  • Reference matching to official facility registries and program codebooks.
  • Balancing checks across totals, carry-overs, and periodized allocations.
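
The reference-matching and balancing checks listed above can be sketched with the Python standard library alone. The snippet below is an illustrative sketch under stated assumptions: the function names, the registry contents, the 0.8 similarity cutoff, and the 0.5 tolerance are hypothetical, and fuzzy matching with `difflib` stands in for whatever matching method a real deployment would use.

```python
import difflib
from typing import Optional


def normalize_code(raw: str) -> str:
    """Uppercase the code and drop anything that is not a letter or digit."""
    return "".join(ch for ch in raw.upper() if ch.isalnum())


def match_facility(raw: str, registry: dict[str, str], cutoff: float = 0.8) -> Optional[str]:
    """Return the registry code that best matches `raw`, or None if nothing is close."""
    by_norm = {normalize_code(code): code for code in registry}
    key = normalize_code(raw)
    if key in by_norm:  # exact match after normalization
        return by_norm[key]
    close = difflib.get_close_matches(key, list(by_norm), n=1, cutoff=cutoff)
    return by_norm[close[0]] if close else None


def balances(line_items: list[float], reported_total: float, tolerance: float = 0.5) -> bool:
    """Check that line items sum to the reported total within a rounding tolerance."""
    return abs(sum(line_items) - reported_total) <= tolerance


# Hypothetical registry: official code -> facility name
registry = {"HF-00123": "Riverside Clinic", "HF-00456": "Hilltop Hospital"}

print(match_facility("hf 00123", registry))   # HF-00123 (formatting differences only)
print(match_facility("HF-O0123", registry))   # HF-00123 (letter O read in place of zero)
print(match_facility("XX-99999", registry))   # None (would go to a review queue)
print(balances([100.0, 250.5, 49.5], 400.0))  # True
```

Unmatched codes and failed balance checks are natural candidates for the human-in-the-loop review queue rather than silent correction.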

OCR · Table Parsing · Lineage

Analytics & Strategy

We focus on decisions: who is underfunded, where bottlenecks occur, and which policy levers matter. Models are simple enough to explain and strong enough to guide action.

  • Trend and seasonality on disbursements vs. planned budgets.
  • Equity lenses: per-capita and burden-adjusted allocations by region.
  • Bottleneck diagnostics: approval delays, last-mile leakage indicators.
  • Sensitivity analysis for policy scenarios and funding shocks.
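
To make the equity lenses and the funding-shock analysis concrete, here is a minimal sketch. Everything in it is assumed for illustration: the `Region` record, the definition of `burden_index` (1.0 meaning national-average burden), the choice to divide per-capita funding by that index, and the sample figures, which are made up and not drawn from any real dataset.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Region:
    name: str
    disbursed: float     # total disbursed in the period
    population: int
    burden_index: float  # relative disease burden; 1.0 = national average (assumed)


def per_capita(r: Region) -> float:
    """Disbursement per resident."""
    return r.disbursed / r.population


def burden_adjusted(r: Region) -> float:
    """Per-capita disbursement scaled down where the burden is higher."""
    return per_capita(r) / r.burden_index


def rank_most_underfunded(regions: list[Region]) -> list[Region]:
    """Order regions from lowest to highest burden-adjusted funding."""
    return sorted(regions, key=burden_adjusted)


def apply_shock(regions: list[Region], pct: float) -> list[Region]:
    """Return copies with disbursements changed by `pct` (e.g. -0.10 for a 10% cut)."""
    return [replace(r, disbursed=r.disbursed * (1 + pct)) for r in regions]


# Made-up sample figures
regions = [
    Region("North", 1_200_000.0, 400_000, 1.0),
    Region("South", 900_000.0, 450_000, 1.25),
]

for r in rank_most_underfunded(regions):
    print(r.name, round(per_capita(r), 2), round(burden_adjusted(r), 2))

for r in apply_shock(regions, -0.10):
    print(r.name, "after 10% cut:", round(per_capita(r), 2))
```

The arithmetic is deliberately simple, in keeping with the goal of models that are easy to explain; a real analysis would need an agreed, documented definition of burden.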

Equity · Scenario · Bottlenecks

[Image: Equity and scenario analysis visuals]
[Image: Governance policy checklist and approvals]

Privacy, Governance & Compliance

Public health finance needs transparency without compromising privacy. We embed governance from the start and ship the paperwork that auditors actually want.

  • PII minimization, access tiers, and masking for sensitive joins.
  • Versioned, immutable datasets with checksums and change logs.
  • RACI, data-sharing agreements, retention schedules, and DPIA templates.
  • Monitoring: drift, anomalies, and policy violations with alerting.
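
One way to back dataset versions with checksums and a change log is to hash a canonical rendering of the data, so that the same content always yields the same digest. The sketch below assumes a CSV canonical form with fixed column order and sorted rows; the function names, field names, and sample rows are hypothetical.

```python
import csv
import hashlib
import io
from datetime import datetime, timezone


def dataset_checksum(rows: list[dict], fields: list[str]) -> str:
    """SHA-256 over a canonical CSV rendering (fixed column order, sorted rows)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(fields)
    for row in sorted(rows, key=lambda r: [str(r[f]) for f in fields]):
        writer.writerow([row[f] for f in fields])
    return hashlib.sha256(buf.getvalue().encode("utf-8")).hexdigest()


def changelog_entry(version: str, checksum: str, note: str) -> dict:
    """Build one append-only change-log record for a published version."""
    return {
        "version": version,
        "sha256": checksum,
        "note": note,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Made-up sample rows
fields = ["facility", "period", "amount"]
rows = [
    {"facility": "HF-00456", "period": "2023-Q1", "amount": "830000"},
    {"facility": "HF-00123", "period": "2023-Q1", "amount": "1250000"},
]

digest = dataset_checksum(rows, fields)
print(digest == dataset_checksum(list(reversed(rows)), fields))  # True: row order is ignored
print(changelog_entry("v1", digest, "Initial load of 2023-Q1")["version"])  # v1
```

Because rows are sorted before hashing, a reordered export produces the same checksum, while any changed value produces a different one.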

PII · Audit Trail · DPIA · Access Control

Dashboard Features

Explore

Facility Flows

Searchable and filterable

  • Disbursements by facility, program, and period.
  • Roll-ups by county/district and funding source.
  • Inline links back to original PDF pages.

Assure

Quality & Lineage

Trust the numbers

  • Validation badges with pass/fail checks.
  • Confidence scores and anomaly flags.
  • Downloadable audit packs for review.

Decide

Equity & Gaps

Actionable comparisons

  • Per-capita, burden-adjusted funding views.
  • Backlog and delay visualizations.
  • What-if sliders for allocation policies.

Modules

01

Ingestion

PDFs → Tables

  • OCR, table detection, and file cataloging.
  • Facility code reconciliation and dedupe.

02

Validation

Trust & QA

  • Balancing, totals, and period checks.
  • Lineage stamps and review queues.

03

Analytics

Equity & Gaps

  • Trend and bottleneck diagnostics.
  • Scenario planning and sensitivity.

04

Governance

Policy & Audit

  • RACI, DPIA, retention, access tiers.
  • Monitoring and alerting policies.

Case Snapshot

Health ministry PDFs published quarterly were inconsistent and hard to verify. We automated extraction, reconciled facility IDs, and shipped a public dashboard with an analyst view behind SSO.

  • 100% coverage across 3 years of disbursement PDFs, with per-cell provenance.
  • Zero manual copy-paste after onboarding; review time cut by 70%.
  • Policy pack approved: data dictionary, sharing agreements, and retention plan.

[Image: Before/after: PDFs vs. structured dashboard]
[Image: Technical stack diagram]

Toolkit & Deliverables

Works with your stack; we leave you with code, docs, and a sustainable process.

  • Python pipelines (pandas, pdfplumber/Tesseract); transformation and orchestration via dbt/Luigi.
  • Storage: PostgreSQL or BigQuery; exports to CSV/Parquet for sharing.
  • Dash/BI dashboard with role-based access and change logs.
  • Governance bundle: SOPs, RACI, DPIA template, retention schedule.

Python · PostgreSQL · dbt/Luigi · Dash/BI

Who it’s for

Public Health & NGOs

Transparency and accountability

  • Publishable, documented numbers with provenance.
  • Equity and gap analysis for funders and auditors.

Ministries & Agencies

Policy & oversight

  • Access-tiered dashboards and audit packs.
  • Repeatable pipeline for each reporting cycle.

Ready to open the books? Bring a sample PDF set. Leave with a validated dataset, a dashboard, and a governance plan.