Automated Data Cleansing with Machine Learning

Customer Challenge

Poor data quality is hindering the Department of the Navy (DON) from gaining valuable, accurate insight from its data. Given the volume of errors, manual correction is neither effective nor efficient.

Innovative Solution

ILW data scientists implemented Phase I of our Automated Data Cleansing and Analysis Tool (ADCAT), which applies machine learning (ML) and probabilistic graphical modeling (PGM) to automatically cleanse DON data of errors. In Phase II, ILW added algorithm enhancements, optimization, model quality monitoring, and a user interface to improve ADCAT's data-healing functionality across domains, and deployed ADCAT to a DON production environment.

Benefits/Outcomes

  • Robust natural language processing (NLP) and ML classifier models achieve 96–99.8% accuracy
  • ADCAT's PGMs provide end users with the five most probable corrections for a given error; the correct value was among the top five 98% of the time
  • Opens the black box of ML error correction by providing transparent, human-understandable explanations of its logic
  • Scalable processes and automatic discovery methods enable new error correction models to be built quickly
  • A human-in-the-loop workflow lets analysts review and validate the ML-driven error corrections
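The top-five ranking described above can be illustrated with a small sketch. It assumes a naive Bayes network (the simplest Bayesian network structure, with the field being corrected as the sole parent of every other field); ADCAT's actual network structure is not published here, so treat this as an illustration of the idea rather than ADCAT's method. The functions `fit` and `top_k_corrections`, the field names `part`, `system`, and `action`, and the sample records are all hypothetical.

```python
from collections import Counter, defaultdict
import math


def fit(records, target, evidence_fields):
    """Estimate count-based parameters for a naive Bayes network in which the
    target field is the sole parent of every evidence field (an assumed structure)."""
    prior = Counter(r[target] for r in records)
    cond = {f: defaultdict(Counter) for f in evidence_fields}
    vocab = {f: set() for f in evidence_fields}
    for r in records:
        for f in evidence_fields:
            cond[f][r[target]][r[f]] += 1
            vocab[f].add(r[f])
    return prior, cond, vocab


def top_k_corrections(model, record, evidence_fields, k=5, alpha=1.0):
    """Rank candidate values for the target field by posterior probability
    given the record's other fields, returning the k most probable."""
    prior, cond, vocab = model
    total = sum(prior.values())
    log_scores = {}
    for value, count in prior.items():
        score = math.log(count / total)
        for f in evidence_fields:
            seen = cond[f][value]
            # Add-alpha (Laplace) smoothing; the +1 reserves mass for unseen evidence values.
            denom = sum(seen.values()) + alpha * (len(vocab[f]) + 1)
            score += math.log((seen[record[f]] + alpha) / denom)
        log_scores[value] = score
    # Normalize in log space so the probabilities sum to 1 across all candidates.
    m = max(log_scores.values())
    z = sum(math.exp(s - m) for s in log_scores.values())
    ranked = sorted(log_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(value, math.exp(s - m) / z) for value, s in ranked[:k]]


# Hypothetical maintenance-style records; "part" is the field being corrected.
training = [
    {"part": "PUMP-A", "system": "hydraulic", "action": "replace"},
    {"part": "PUMP-A", "system": "hydraulic", "action": "repair"},
    {"part": "VALVE-B", "system": "fuel", "action": "replace"},
    {"part": "SEAL-C", "system": "hydraulic", "action": "replace"},
]
fields = ["system", "action"]
model = fit(training, "part", fields)
suggestions = top_k_corrections(model, {"system": "hydraulic", "action": "replace"}, fields)
for value, prob in suggestions:
    print(f"{value}: {prob:.3f}")
```

With these four records the sketch ranks PUMP-A first, then SEAL-C, then VALVE-B (only three candidates exist, so fewer than five are returned). The naive Bayes assumption treats the evidence fields as independent given the target; a fuller Bayesian network would also model dependencies among those fields.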

Business Value

  • Improved analyst productivity: less time correcting data, increased focus on core mission tasks
  • Higher-quality data: higher-confidence, data-informed decisions and cost savings

Toolbox

  • Supervised/unsupervised ML
  • Probabilistic graphical models (Bayesian Networks)
  • Natural language processing
  • Open-source Python solution using DoD-compatible libraries

Domain Expertise

  • NAVAIR maintenance data
  • NAVSEA labor data
