We recently sat down with Janette Steets, PhD and Principal Data Scientist at Illumination Works, to learn about the exciting predictive analytics work our team is doing at Wright-Patterson AFB.
Tell us about the analytics platform your team is developing for the Air Force.
As a part of the Data Analytics Resource Team (DART) within AFLCMC/LZIA, we’re currently working in the Air Force Research Laboratory’s (AFRL) high-performance computing (HPC) environment to develop a big data analytics platform. DART embraces a strong government civilian-contractor partnership to bring together disparate Air Force logistics data and provide a range of analytic tools users can leverage to make sense of and realize insights from the data. At the heart of the analytics platform are Air Force datasets, which DART’s database administrators and data architects have ingested into a structured database for easy querying and future analytic needs. The end goal is to enable the Air Force to better understand and glean useful information from their data, leading to process improvements and data-informed decision making.
We think in terms of three user types for the platform.
Dashboard Users
First, we have those high-level folks who want pre-canned reports that require minimal effort to realize insights from their data; we call these our dashboard users. The Illumination Works software developers, in collaboration with team members from The Perduco Group and government civilians, are developing custom-coded dashboards based on analyses that data scientists and analysts on the team have performed. We roll these capabilities into a dashboard format that will be useful and insightful for Air Force users requiring this level of interaction.
Exploration Users
The second user type is our exploration users. These are folks who perhaps don’t have deep knowledge of a coding language and need low-code tools that let them create analytics while leveraging the powerful computing resources behind them. One example might be using an application like KNIME, where the user can drag and drop nodes for each step they want to perform, create a workflow that allows them to quickly prep the data, and then couple that with machine learning and other analytic algorithms for data observations. For these users, one of the greatest benefits the platform provides is time savings via the ability to leverage data that has already been prepped. For example, for me to make sense of a raw dataset, I might spend 80% of my time prepping the data, and only 20% of my time actually doing the fun analyses. For users working in this layer, the DART team has done some of the time-consuming prep work for them so they can get right to the analyses.
Data Science Users
The final user type is the data science user. These are folks, like myself and others on the team, who are applying coding languages, such as Python and R, to the data. We’re integrating disparate data sources, performing feature engineering to generate new input features for inclusion in analytic models, and utilizing machine learning techniques to address questions that were posed to us, as well as answering questions that perhaps the Air Force hasn’t thought to ask yet.
What are some examples of problems being solved on the platform?
We’ve leveraged the platform to solve quite a few problems to answer questions of the day. Two examples that come to mind are aircraft maintainer utilization and contracts.
Aircraft Maintainer Utilization
We performed data integration and analysis using the DART Data Science Lab to develop an aircraft maintainer utilization metric and associated visualizations. We started with data from two different sources: aircraft maintenance data and manpower data. Our goal was to understand the average workload for a maintainer, by location and job type, in a given year. We integrated these data sources and developed a manpower utilization metric to help the Air Force understand the maintainer manpower workload and how much maintainers are working, information they didn’t previously have at their disposal.
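As an illustration only, the kind of integration described above can be sketched in a few lines of Python. This is not DART's code, schema, or metric definition: the field names (`labor_hours`, `maintainers`), the sample rows, and the 1,800 available hours per maintainer per year are all assumptions made for the example. The sketch sums logged labor hours and assigned headcount per year, location, and job type, then divides one by the other.

```python
from collections import defaultdict

# Hypothetical assumption: hours one maintainer is available for work per year.
AVAILABLE_HOURS_PER_YEAR = 1800.0


def maintainer_utilization(maintenance_rows, manpower_rows,
                           available_hours=AVAILABLE_HOURS_PER_YEAR):
    """Join labor hours to headcount on (year, location, job_type).

    Returns {key: {"avg_hours": ..., "utilization": ...}} for keys that
    appear in both sources. Field names are illustrative, not a real schema.
    """
    # Sum logged labor hours per (year, location, job_type).
    hours = defaultdict(float)
    for row in maintenance_rows:
        key = (row["year"], row["location"], row["job_type"])
        hours[key] += row["labor_hours"]

    # Sum assigned maintainers for the same key.
    headcount = defaultdict(int)
    for row in manpower_rows:
        key = (row["year"], row["location"], row["job_type"])
        headcount[key] += row["maintainers"]

    result = {}
    for key, total_hours in hours.items():
        people = headcount.get(key, 0)
        if people == 0:
            continue  # no manpower record to join against, so skip the group
        avg_hours = total_hours / people
        result[key] = {
            "avg_hours": avg_hours,
            "utilization": avg_hours / available_hours,
        }
    return result


# Made-up sample rows standing in for the two data sources.
maintenance = [
    {"year": 2019, "location": "Base A", "job_type": "Avionics", "labor_hours": 900.0},
    {"year": 2019, "location": "Base A", "job_type": "Avionics", "labor_hours": 1800.0},
    {"year": 2019, "location": "Base B", "job_type": "Engines", "labor_hours": 500.0},
]
manpower = [
    {"year": 2019, "location": "Base A", "job_type": "Avionics", "maintainers": 2},
]

metrics = maintainer_utilization(maintenance, manpower)
for key, value in sorted(metrics.items()):
    print(key, value)
```

Groups with no matching manpower record are dropped rather than guessed at, which mirrors an inner join; a real analysis would likely want to report those unmatched groups as a data quality check.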
Contracts
Up until now, contracts have been largely untapped data resources housed as PDF images. We’ve been extracting the text from those PDF documents and performing text analytics to gain new understandings for the Air Force. I think this is a hugely impactful area and it’s very exciting to be a part of this in the early stages. In our first pilot effort, we focused on getting the conversion pipeline built and working end to end, and we successfully converted half a million contracts. Today, we’re looking at about four million contracts that are currently running through the conversion process, which has changed over time to accommodate different problems that have arisen.
What are you most excited about on this project?
There are actually a lot of things that excite me about this project. One really exciting area that we’re getting into is next steps on the contract analytics. It’s a lot of data and there are so many opportunities for what we can do next. The next step is to ingest the converted contracts into a document-oriented database and index key elements of those contracts to allow for easier searching. Natural language processing and machine learning techniques are critical for DART’s current and future contract analysis efforts. Supervised machine learning techniques are planned for categorizing Air Force contracts in useful ways, such as categorizing whether a contract provides contractor logistics support (CLS) vs. Air Force logistics support (i.e., organic support). Another example of future efforts with the contracts is to mine them and develop an enterprise database of data rights and assertions, ensuring the Air Force is not paying contractors multiple times for the same data rights.
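To make the supervised categorization idea concrete, here is a minimal sketch of a text classifier that labels contract snippets as CLS or organic support. The interview does not say which algorithm DART plans to use; multinomial Naive Bayes with add-one smoothing is swapped in here purely because it fits in a few lines of standard-library Python. The training snippets and labels are invented for the example and are not real contract text.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training snippets; real labels would come from subject matter experts.
train_texts = [
    "contractor shall provide depot maintenance and supply support",
    "contractor logistics support for engine sustainment",
    "government personnel perform depot maintenance at the air logistics complex",
    "air force organic workforce performs repairs",
]
train_labels = ["CLS", "CLS", "organic", "organic"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
cls_guess = model.predict("the contractor shall provide sustainment support")
organic_guess = model.predict("organic workforce performs repairs at the complex")
print(cls_guess, organic_guess)
```

With only four training snippets this is a toy; in practice a labeled sample of converted contracts, a held-out evaluation set, and an established library would be the more sensible starting point.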
Tell us about the recent DART award and why you feel the team was selected.
The DART team was nominated for the 2019 Air Force Materiel Command Analysis Award, which was a real honor in and of itself, and then we actually won the award. We were evaluated on originality, relevance, impact, and analytic value. They gave out a number of different analysis awards, and our team won the award for Analytic Innovation, a team award. The collaborative DART team was made up of Illumination Works’ contractors and government civilians. Next, we’ll be competing at the Air Force level, so more to come on that front.
I think we were selected for developing the first big data analytics platform leveraging the AFRL HPC environment. On this platform, we’ve successfully integrated structured Air Force data sources that had never been brought together before and coupled that with analytics tools targeting three levels of users, enabling them to take advantage of the massive underlying computing power.
Some of our initial analyses have proved to be very useful to the Air Force, like trending five-digit Work Unit Codes (WUCs) and developing an aircraft maintainer utilization metric. There are also the contract text analyses we performed on those 500,000 contracts, which have been impactful and have really changed how business enterprise solutions work. In addition, big data and data science are huge buzzwords everywhere right now, and there are a lot of initiatives to do similar things across the Air Force, but we actually have an implemented solution that is being used and continuing to expand in functionality.
Before we wrap up, do you have any words of wisdom for anyone wanting to get into the data science and analytics field?
I would say there are many paths that can lead to a successful career in data science. There are many non-traditional paths that are just as successful as, or even more successful than, a traditional degree program in data analytics or data science. So, for folks who are interested in and intrigued by data science, but perhaps are not in a program that’s specifically called that, there are many ways to get there from where you are, and many higher degrees in STEM fields can also prepare you for data science. Honestly, I think I’m an example of that. In the end, you might need to do a little extra training on the side to give yourself the whole skill set needed to be successful in data science, but it’s totally doable and a very satisfying career that is currently in high demand.
Janette Steets, PhD is a Principal Data Scientist at Illumination Works. She has 15+ years of experience in research and data science, including extensive expertise in experimental design and statistical analysis. With over a decade of teaching at the collegiate level, she also brings wide-ranging experience in developing and implementing engaging instructional methodologies.
For more information contact us at firstname.lastname@example.org.