The Challenge
A national health ministry published disbursement records only as scanned PDFs. This "data graveyard" made it impossible for the public or facility managers to verify whether funds had arrived. Transparency was theoretical, not practical.
Our Solution
We built an automated Python pipeline that uses OCR (Optical Character Recognition) to liberate this data from the scanned PDFs.
- Extraction Engine: `camelot` and `tesseract` to parse tables from PDFs.
- Cleaning Pipeline: Deduplication and fuzzy matching to standardize facility names.
- Public Dashboard: An interactive explorer allowing users to search by county or facility name.
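
To make the cleaning step concrete, here is a minimal sketch of deduplication and fuzzy name matching. It uses Python's standard-library `difflib` as a stand-in and is not necessarily the matcher the pipeline relied on. The function names, record fields, facility names, and the 0.85 similarity threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical reference list of official facility names.
CANONICAL = ["Riverside Health Centre", "Hilltop Dispensary"]


def normalize(name):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in name.lower())
    return " ".join(cleaned.split())


def standardize(name, canonical, threshold=0.85):
    """Return the most similar canonical name, or None if none is close enough."""
    target = normalize(name)
    best, best_score = None, 0.0
    for candidate in canonical:
        score = SequenceMatcher(None, target, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


def clean(records, canonical):
    """Standardize facility names, then drop rows that become exact duplicates."""
    seen, out = set(), []
    for rec in records:
        # Keep the raw name when no confident match exists, so nothing is lost.
        facility = standardize(rec["facility"], canonical) or rec["facility"]
        row = {**rec, "facility": facility}
        key = (row["facility"], row["county"], row["amount"])
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


# Made-up rows imitating OCR noise ("1" read in place of "l", stray punctuation).
records = [
    {"facility": "Riverside Hea1th Centre", "county": "North", "amount": 1200},
    {"facility": "Riverside Health Centre", "county": "North", "amount": 1200},
    {"facility": "HILLTOP DISPENSARY.", "county": "South", "amount": 800},
]
cleaned = clean(records, CANONICAL)
```

In this sketch the first two rows collapse into one once the OCR error is corrected, so `cleaned` holds two rows. Names that fall below the threshold are left unchanged rather than forced onto the nearest match, which keeps uncertain cases visible for manual review.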
Impact
"For the first time, a nurse can check on her phone if the clinic's supplies budget was released."