Case Study

Health Disbursement Explorer

Automating transparency: Extracting data from unstructured PDFs to track fund flows to thousands of health facilities.

Dashboard visualization of disbursements

The Challenge

A national health ministry published its disbursement records only as scanned PDFs. This "data graveyard" made it impossible for the public or facility managers to verify whether funds had arrived. Transparency was theoretical, not practical.

Our Solution

We built an automated Python pipeline that uses OCR (Optical Character Recognition) to turn these scans into structured, searchable data.

  • Extraction Engine: `tesseract` for OCR on scanned pages and `camelot` for parsing tables from PDFs.
  • Cleaning Pipeline: Deduplication and fuzzy matching to standardize facility names.
  • Public Dashboard: An interactive explorer allowing users to search by county or facility name.
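
The cleaning step can be illustrated with a minimal sketch. It is not the production code: it assumes a small reference list of canonical facility names, uses Python's standard-library `difflib` in place of whatever matching library the pipeline actually uses, and the facility names, the 0.8 cutoff, and the function names are all invented for illustration.

```python
import difflib
import re

# Hypothetical canonical register; real names would come from an official facility list.
CANONICAL_FACILITIES = [
    "Riverside Health Centre",
    "Northgate District Hospital",
    "Hilltop Dispensary",
]


def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", name.lower())
    return " ".join(cleaned.split())


def match_facility(raw_name, canonical=CANONICAL_FACILITIES, cutoff=0.8):
    """Return the closest canonical name, or None if nothing clears the cutoff."""
    lookup = {normalize(c): c for c in canonical}
    hits = difflib.get_close_matches(
        normalize(raw_name), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None


def clean_records(rows):
    """Standardize facility names, then drop duplicate (facility, amount) pairs."""
    seen = set()
    cleaned = []
    for raw_name, amount in rows:
        facility = match_facility(raw_name)
        if facility is None:
            # Assumption: unmatched rows are skipped here; a real pipeline
            # would more likely queue them for manual review.
            continue
        key = (facility, amount)
        if key not in seen:
            seen.add(key)
            cleaned.append(key)
    return cleaned
```

For example, `clean_records([("Hiltop Dispensry", 1000), ("Hilltop Dispensary", 1000)])` maps both OCR variants to the same canonical name and keeps a single row. The cutoff is the main tuning knob: a lower value merges more spelling variants but raises the risk of conflating two distinct facilities.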

Impact

"For the first time, a nurse can check on her phone if the clinic's supplies budget was released."