Navy Automated Data Cleansing with Machine Learning
Poor data quality is hindering the Department of the Navy’s (DON) ability to gain valuable, accurate insight from its data. Given the volume of errors, manual correction is ineffective and inefficient.
ILW data scientists implemented Phase I of our Automated Data Cleansing and Analysis Tool (ADCAT), which applies machine learning (ML) and probabilistic graphical modeling (PGM) to automatically cleanse DON data of errors.
For Phase II, ILW is enhancing and optimizing the algorithms, adding model quality monitoring, and building a user interface to improve healing functionality across domains, while preparing ADCAT for deployment in the DON environment.
- Robust natural language processing (NLP) and ML classifier models achieve 96% to 99.8% accuracy
- ADCAT’s PGMs provide end users with the five most probable corrections for a given error; the correct value was among those five 98% of the time
- Exposes the black box of ML error correction logic by providing transparent, human-understandable explanations
- Scalable processes and automatic discovery methods enable new error correction models to be built quickly
- Human-in-the-loop solution is available to enable review and validation of the ML-driven error corrections
- Improved analyst productivity: less time correcting data, increased focus on core mission tasks
- Higher-quality data: higher-confidence, data-informed decisions and cost savings
- Supervised/unsupervised ML
- Probabilistic graphical model (Bayesian)
- Open-source Python solution using DoD-compatible libraries
- Categorical, ordinal, and string data types
- NAVAIR maintenance data
- NAVSEA labor data
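
The idea of ranking the five most probable corrections for a suspect value can be illustrated with a minimal sketch. It is not ADCAT's implementation: it assumes a naive Bayes model (one of the simplest Bayesian graphical models) over categorical fields, and the field names, toy records, and function names (`fit_counts`, `top_k_corrections`) are hypothetical.

```python
from collections import Counter, defaultdict

def fit_counts(rows, target, evidence):
    """Count target values and their co-occurrence with each evidence field."""
    prior = Counter()             # target value -> count
    cond = defaultdict(Counter)   # (field, target value) -> Counter of field values
    vocab = defaultdict(set)      # field -> distinct values seen
    for row in rows:
        t = row[target]
        prior[t] += 1
        for field in evidence:
            cond[(field, t)][row[field]] += 1
            vocab[field].add(row[field])
    return prior, cond, vocab

def top_k_corrections(record, prior, cond, vocab, evidence, k=5, alpha=1.0):
    """Rank candidate target values by posterior probability given the other fields."""
    total = sum(prior.values())
    scores = {}
    for t, n_t in prior.items():
        p = n_t / total  # prior P(target = t)
        for field in evidence:
            count = cond[(field, t)][record[field]]
            # Laplace smoothing; the +1 reserves mass for unseen field values
            p *= (count + alpha) / (n_t + alpha * (len(vocab[field]) + 1))
        scores[t] = p
    norm = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(t, p / norm) for t, p in ranked[:k]]

# Hypothetical clean history (illustrative only, not real DON data)
rows = [
    {"component": "pump", "action": "replace", "fault": "leak"},
    {"component": "pump", "action": "replace", "fault": "leak"},
    {"component": "pump", "action": "inspect", "fault": "noise"},
    {"component": "valve", "action": "replace", "fault": "stuck"},
    {"component": "valve", "action": "adjust", "fault": "stuck"},
    {"component": "valve", "action": "replace", "fault": "leak"},
]
evidence = ["component", "action"]
prior, cond, vocab = fit_counts(rows, target="fault", evidence=evidence)

# A record whose "fault" value is suspect (a typo); rank replacements for it
suspect = {"component": "pump", "action": "replace", "fault": "leka"}
candidates = top_k_corrections(suspect, prior, cond, vocab, evidence, k=5)
for value, prob in candidates:
    print(f"{value}: {prob:.3f}")
```

Because each candidate's score is a product of per-field terms, those terms can be shown to a reviewer as a plain explanation of why a correction ranks where it does, which is one way a human-in-the-loop review could be supported.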