Contract Conversion & Analytics 

Customer Challenge

To enable data-driven decisions, the Air Force needed a process to convert raw, non-searchable PDF contracts into machine-readable structured and unstructured formats so that actionable data could be extracted, analyzed, and visualized.

Innovative Solution

ILW created a processing pipeline that automates the extraction of semi-structured form and table data embedded within Air Force contracts and parses this data into structured tabular outputs. The extracted contract text, forms, and tables are ingested into a NoSQL database, giving Air Force users an easy search capability. ILW uses text mining tools to search the converted contracts for compliance with various regulations.
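The parse, ingest, and search steps described above can be sketched roughly as follows. This is a minimal illustration, not ILW's actual pipeline: it assumes OCR has already produced plain text, and it uses only the Python standard library in place of the tools listed in the Toolbox. The sample text, the field names, the function names `parse_contract` and `missing_clauses`, and the list of required clauses are all hypothetical.

```python
import json
import re

# Hypothetical OCR output for one contract page (invented for illustration).
SAMPLE_TEXT = """\
CONTRACT NO: FA0000-00-C-0000
CLIN  DESCRIPTION        QTY  UNIT PRICE
0001  Engineering hours  100  85.00
0002  Travel             1    2500.00
Clauses incorporated by reference: 52.204-21, 252.227-7013
"""

# Line-item rows: 4-digit CLIN, free-text description, integer quantity, decimal price.
ROW_RE = re.compile(r"^(\d{4})\s+(.+?)\s+(\d+)\s+(\d+\.\d{2})$")
# Strings shaped like FAR/DFARS clause numbers, e.g. 52.204-21 or 252.227-7013.
CLAUSE_RE = re.compile(r"\b2?52\.\d{3}-\d{1,4}\b")


def parse_contract(text):
    """Turn semi-structured contract text into one JSON-ready document."""
    line_items = []
    for line in text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            clin, description, quantity, unit_price = match.groups()
            line_items.append({
                "clin": clin,
                "description": description,
                "quantity": int(quantity),
                "unit_price": float(unit_price),
            })
    return {
        "text": text,                # unstructured text, kept for full-text search
        "line_items": line_items,    # structured tabular output
        "clauses": sorted(set(CLAUSE_RE.findall(text))),
    }


def missing_clauses(document, required):
    """Return the required clause numbers that the document does not cite."""
    return sorted(set(required) - set(document["clauses"]))


document = parse_contract(SAMPLE_TEXT)
# A document store would receive this serialized form.
serialized = json.dumps(document)
# Hypothetical compliance rule: both clauses must be cited.
gaps = missing_clauses(document, ["52.204-21", "252.204-7012"])
```

Keeping the raw text next to the parsed rows in a single document reflects the mix of structured and unstructured output described above: the rows support tabular analysis, while the text remains available for keyword and clause searches.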

Benefits/Outcomes

  • Insight into an almost untapped source of data
  • Converted 3.7 million Air Force contracts into a machine-readable format
  • Ran 300,000 computational hours of processing
  • Parsed 7 types of PDF forms and tables into a structured format

Business Value

New search capability enables an enterprise-level understanding of contract compliance, contract health, and data rights.

Toolbox

  • Data Science, NLP, ML, Text Mining
  • Optical Character Recognition
  • High Performance Computing
  • Open-source Python solution using DoD-compatible libraries (Pandas, Tabula, Fitz, Scikit-learn, OpenCV)
  • Tesseract and Couchbase
