Text Analytics of PDF Technical Documents (Air Force)

Customer Challenge

The Air Force needed a logistics data crosswalk to bridge known gaps between its maintenance and supply data, gaps that limited accurate demand planning and forecasting.

Innovative Solution

ILW data scientists used natural language processing (NLP) and unsupervised machine learning (ML) techniques to evaluate and select an automated method for tying Work Unit Codes (WUCs) to related National Item Identification Numbers (NIINs). They drew on information extracted from Technical Orders in native PDF format as well as data captured in maintenance and supply data systems.
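As an illustration of how such a crosswalk could work, the sketch below links WUC descriptions to NIIN nomenclature by TF-IDF cosine similarity, one simple unsupervised text-matching approach. It is not presented as the method ILW selected. The identifiers, descriptions, the `link_wucs_to_niins` helper, and the 0.2 threshold are all hypothetical, and the code uses only the Python standard library so that it stays self-contained.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: f * idf[t] for t, f in Counter(toks).items()} for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_wucs_to_niins(wuc_text, niin_text, threshold=0.2):
    """Map each WUC id to NIIN ids whose text is similar, best match first.

    The threshold is an assumed, untuned value for illustration only.
    """
    wuc_ids, niin_ids = list(wuc_text), list(niin_text)
    vectors = tfidf_vectors(
        [wuc_text[w] for w in wuc_ids] + [niin_text[n] for n in niin_ids]
    )
    wuc_vecs, niin_vecs = vectors[: len(wuc_ids)], vectors[len(wuc_ids):]
    links = {}
    for wuc, wuc_vec in zip(wuc_ids, wuc_vecs):
        scored = [(cosine(wuc_vec, nv), n) for n, nv in zip(niin_ids, niin_vecs)]
        links[wuc] = [n for s, n in sorted(scored, reverse=True) if s >= threshold]
    return links


# Hypothetical placeholder data; these are not real WUCs or NIINs.
wucs = {
    "WUC-A": "main landing gear strut assembly",
    "WUC-B": "fuel boost pump",
}
niins = {
    "NIIN-1": "strut, landing gear, main",
    "NIIN-2": "pump, fuel boost",
    "NIIN-3": "hydraulic filter element",
}
print(link_wucs_to_niins(wucs, niins))
```

In practice the text on each side would come from the Technical Orders and the maintenance and supply systems, and the candidate links would need review by someone who knows the weapon system.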

Benefits/Outcomes

Extracted master parts lists (MPLs) for two Air Force weapon system programs
Developed multiple table extraction techniques that read PDF documents and extract tabular information with a high degree of accuracy; the techniques leverage and improve on open-source libraries
Provided enterprise search capability for Air Force technical documents
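The table extraction idea can be sketched as follows: words with page coordinates are grouped into rows by vertical position, then split into cells wherever the horizontal gap is wide. This is a minimal illustration, not the techniques ILW developed. It assumes word boxes shaped like the first five fields PyMuPDF returns from `page.get_text("words")`, that is `(x0, y0, x1, y1, text)`. The helper names, tolerance values, and sample part data are hypothetical.

```python
def words_to_rows(words, y_tolerance=3.0):
    """Group word boxes into rows: words whose top edge (y0) is within
    y_tolerance of the first word in the current row share that row."""
    rows = []
    for word in sorted(words, key=lambda w: (w[1], w[0])):
        if rows and abs(word[1] - rows[-1][0][1]) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Order each row left to right.
    return [sorted(row, key=lambda w: w[0]) for row in rows]


def row_to_cells(row, x_gap=15.0):
    """Merge neighboring words into one cell unless the horizontal gap
    between them exceeds x_gap, which is treated as a column break."""
    cells = []
    current = []
    prev_x1 = None
    for word in row:
        x0, _, x1, _, text = word[:5]
        if prev_x1 is not None and x0 - prev_x1 > x_gap:
            cells.append(" ".join(current))
            current = []
        current.append(text)
        prev_x1 = x1
    cells.append(" ".join(current))
    return cells


def extract_table(words, y_tolerance=3.0, x_gap=15.0):
    """Return the table as a list of rows, each a list of cell strings."""
    return [row_to_cells(r, x_gap) for r in words_to_rows(words, y_tolerance)]


# Hypothetical word boxes (x0, y0, x1, y1, text); the part data is made up.
sample_words = [
    (50, 100.0, 80, 110, "PART"),
    (84, 100.0, 130, 110, "NUMBER"),
    (200, 100.8, 280, 110, "DESCRIPTION"),
    (400, 100.0, 425, 110, "QTY"),
    (50, 120.5, 95, 130, "12345-1"),
    (200, 120.5, 240, 130, "STRUT"),
    (244, 120.5, 300, 130, "ASSEMBLY"),
    (400, 120.5, 406, 130, "2"),
]
for table_row in extract_table(sample_words):
    print(table_row)
```

Fixed tolerances like these tend to break on multi-line cells and ruled tables, which is one reason a production pipeline would combine several techniques, for example line detection with OpenCV or the lattice and stream modes in Tabula.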

Business Value

Improves parts supportability, contract lead times, and integrated repair planning
Enables planning for predictable shifts in demand and condemnations, buying the right quantities of the right parts, and avoiding overbuys of other parts

Toolbox

Open-source Python solution using DoD-compatible libraries: pandas, Tabula, Fitz (PyMuPDF), scikit-learn, and OpenCV
Native PDFs
Text analytics, NLP, and machine learning