The Challenge

A large US federal government agency needed to streamline the collection, validation, mining, and benchmarking of mission-critical data. Manual data management tasks, processes, and analyses were cumbersome and time-consuming, and they didn’t allow for a deep dive into the data to uncover actionable insights that could help the agency achieve its mission goals.

The Reveal Solution

To reduce data quality issues, improve workflow efficiency, and enable near real-time insights, Reveal automated manual data management workflows, applied its proven approach to ensure data quality, and developed an artificial intelligence (AI)-based expert system for the agency.

The Reveal solution enables the agency to continuously analyze diverse data from numerous sources to:

  • Deduce new knowledge to get ahead of potential issues and uncover new opportunities
  • Deliver near real-time insights to boost the decision-making power of field personnel
  • Gain efficiency from analytic process automation of manual workflows
  • Create baseball card-like summaries field personnel can use for situational awareness
  • Identify and alert on abnormal/outlier behavior or unusual relationships for further action
  • Leverage context-specific recommendations that allow users to create intelligent Q&A sessions
  • Produce recommendations for the analytic support engine so users can take further action as necessary

A Proven Data Quality Approach

Insights are only as good as the data used to mine them. Reveal’s approach has repeatedly been shown to improve data quality by optimizing confidence levels in the data. We validated our client’s data using these emerging, open-source, and best-of-breed technologies:

  • Natural Language Processing (NLP): Reveal Access Controller capabilities ingest and analyze data from various sources (social media, newsfeeds, reports, video, satellite images, audio, etc.)
  • Machine Learning (ML): discovers non-apparent relationships between entities and subjects and defines relationship strength through advanced analytics visualizations
  • Supervised Algorithm Scoring: scores denote the percentage of records where the supervised algorithm is in agreement with the training data – only algorithms that score greater than 95% are used
  • Threshold Fitness Values (FV): a threshold FV (-1.0) is assigned to separate “inlier” records (FV > threshold) from “outlier” records
  • “Inlier” or “Outlier” Labels: labels are fed into the training of supervised algorithms
  • Unsupervised FV: calculated for each row from the row’s distance to the decision surface of each of the four unsupervised learning algorithms, across different FV ranges (-20 to -5 and -5 to +1)
  • Simulation and Experimentation: demonstrates that “outlierness” correlates with more negative FVs, while “inliers” have FVs closer to +1.0. Outliers show strong disagreement between unsupervised and supervised FVs.
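
The labeling and scoring steps above can be illustrated with a minimal Python sketch. It is not Reveal's implementation: the function names, the sample fitness values, and the algorithm names are hypothetical, and only the -1.0 threshold and the 95% agreement cutoff come from the description above.

```python
from typing import Dict, List, Sequence

FV_THRESHOLD = -1.0    # threshold FV quoted in the text
MIN_AGREEMENT = 0.95   # agreement cutoff quoted in the text


def label_records(fitness_values: Sequence[float],
                  threshold: float = FV_THRESHOLD) -> List[str]:
    """Label a record 'inlier' when its FV is above the threshold, else 'outlier'."""
    return ["inlier" if fv > threshold else "outlier" for fv in fitness_values]


def agreement_score(predicted: Sequence[str],
                    training_labels: Sequence[str]) -> float:
    """Return the fraction of records where predictions match the training labels."""
    if not training_labels or len(predicted) != len(training_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == t for p, t in zip(predicted, training_labels))
    return matches / len(training_labels)


def select_algorithms(predictions: Dict[str, Sequence[str]],
                      training_labels: Sequence[str],
                      min_agreement: float = MIN_AGREEMENT) -> List[str]:
    """Keep only algorithms whose agreement score is strictly above the cutoff."""
    return [name for name, preds in predictions.items()
            if agreement_score(preds, training_labels) > min_agreement]


# Hypothetical unsupervised fitness values for five records.
sample_fvs = [0.8, 0.5, -3.2, 0.9, -12.0]
training_labels = label_records(sample_fvs)

# Hypothetical predictions from two supervised algorithms.
predictions = {
    "algorithm_a": list(training_labels),   # agrees on every record
    "algorithm_b": ["inlier"] * 5,          # misses both outliers
}
selected = select_algorithms(predictions, training_labels)
```

In this sketch a record whose FV equals the threshold exactly is treated as an outlier, because the text defines inliers as FV > threshold. That boundary choice is an assumption.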