Navy Automated Data Cleansing with ML
Poor data quality is hindering the Department of the Navy's (DON) ability to gain valuable and accurate insight from its data. Given the volume of errors, manual correction is ineffective and inefficient.
ILW data scientists implemented Phase I of our Automated Data Cleansing and Analysis Tool (ADCAT), which applies machine learning (ML) and probabilistic graphical modeling (PGM) to automatically cleanse DON data of errors.
For Phase II, ILW is enhancing the algorithms and building a user interface to improve data-healing functionality across multiple commands, domains, and DON operational environments.
- Robust natural language processing (NLP) and supervised ML classifier algorithms achieving 71-87% error correction rates
- PGM Bayesian network in ADCAT that provides end users with the five most probable corrections for a given error. In 95% of cases, the correct value was among those top five
- A human-in-the-loop recommendation workflow that lets analysts review and validate the predicted corrections when needed
- Improved analyst productivity: less time correcting data, increased focus on core mission tasks
- Higher-quality data: higher-confidence, data-informed decisions and cost savings
- Supervised/unsupervised ML with over 16,000 parameter combinations tested
- Probabilistic graphical model (Bayesian)
- Open-source Python solution using DoD-compatible libraries
- Categorical, ordinal, and string data types
- NAVAIR maintenance data (S-3 aircraft)
- NAVSEA labor data
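
The "five most probable corrections" idea described above can be illustrated with a small sketch. This is not ADCAT's implementation, which is not shown here. It assumes a naive Bayes model, which is the simplest form of Bayesian network (every other field depends only on the field being corrected), and it uses only the Python standard library. The field names, record values, and function names (`fit`, `top_k_corrections`) are hypothetical and chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def fit(rows, target):
    """Count how often each target value occurs, alone and with each field value.

    `rows` are records assumed to be clean; `target` is the field to be corrected.
    """
    prior = Counter()             # target value -> count
    cond = defaultdict(Counter)   # (field, value) -> Counter of target values
    domain = defaultdict(set)     # field -> distinct values seen
    for row in rows:
        label = row[target]
        prior[label] += 1
        for field, value in row.items():
            if field == target:
                continue
            cond[(field, value)][label] += 1
            domain[field].add(value)
    return prior, cond, domain


def top_k_corrections(model, evidence, k=5, alpha=1.0):
    """Return up to k (candidate, probability) pairs, most probable first."""
    prior, cond, domain = model
    total = sum(prior.values())
    log_scores = {}
    for label, n_label in prior.items():
        score = math.log(n_label / total)
        for field, value in evidence.items():
            seen = cond.get((field, value), Counter())[label]
            # Laplace smoothing so unseen combinations keep a small probability.
            n_values = len(domain[field] | {value})
            score += math.log((seen + alpha) / (n_label + alpha * n_values))
        log_scores[label] = score
    # Normalize in log space to avoid underflow with many fields.
    peak = max(log_scores.values())
    weights = {c: math.exp(s - peak) for c, s in log_scores.items()}
    norm = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [(c, w / norm) for c, w in ranked[:k]]


# Hypothetical clean records; "code" is the field whose errors we want to correct.
clean_rows = [
    {"system": "hydraulic", "action": "replace", "code": "H01"},
    {"system": "hydraulic", "action": "replace", "code": "H01"},
    {"system": "hydraulic", "action": "inspect", "code": "H02"},
    {"system": "electrical", "action": "replace", "code": "E01"},
    {"system": "electrical", "action": "inspect", "code": "E02"},
]
model = fit(clean_rows, target="code")

# A record whose "code" value failed validation: rank replacements from the other fields.
ranked = top_k_corrections(model, {"system": "hydraulic", "action": "replace"}, k=5)
for candidate, probability in ranked:
    print(f"{candidate}: {probability:.3f}")
```

In a human-in-the-loop setting, a ranked list like this would be shown to an analyst, who accepts one of the candidates or enters a different value. A full Bayesian network would also model dependencies among the non-target fields, which this sketch deliberately ignores.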