Data Modernization: Reimagine. Recode. Reveal.

Jan 6

Reveal is leading a modernization movement to help organizations unlock the future of data and analytics, from translating legacy systems into open-source technologies to generating privacy-preserving synthetic data that enables safer innovation and accelerated insights. It’s not just about rewriting code or creating data; it’s about reimagining potential, rebuilding smarter, and revealing what’s possible.

This year, we’re highlighting our modernization work in key areas for federal agencies. Our team is developing and showcasing new modernization methods, especially in synthetic data, open-source code translation, and autocoding of survey data.

Synthetic Data

Synthetic data are simulated data used to test new processes when real or historical information is not available. Reveal generates synthetic data that preserve complex statistical relationships and reflect the key characteristics expected of the actual datasets. Depending on the need, we approximate anything from basic structural similarity to highly complex statistical features. Using machine learning methods, as well as simpler rules-based generation where appropriate, we tailor these data to the cost and utility constraints set by clients. With synthetic data, federal agencies can test new systems without exposing protected data.

Open-Source Translation

Many federal agencies have built data systems and statistical processes that rely on licensed proprietary technologies (e.g., SAS). Open-source solutions provide a flexible, interoperable alternative that supports long-term maintenance and collaboration. Reveal uses large language model (LLM)-based translation pipelines to produce efficient, structured translations from proprietary technologies into open-source equivalents such as Python. This approach scales readily to large codebases and can significantly reduce the cost and time of manual translation while still producing accurate code.
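
One structural step such a pipeline plausibly needs is breaking a large SAS program into pieces small enough to translate one at a time. The sketch below assumes that DATA and PROC steps end at `RUN;` or `QUIT;`, splits on those terminators, and sends each step to a language model. The model call is a hypothetical callable (`llm`) rather than any particular vendor API, and the prompt wording is an assumption. The splitter also ignores comments, quoted strings, and macro definitions, which a real pipeline would have to handle.

```python
import re
from typing import Callable, List

# A SAS DATA or PROC step typically ends at a RUN; or QUIT; statement.
STEP_END = re.compile(r"\b(?:run|quit)\s*;", re.IGNORECASE)

# Hypothetical prompt template; real pipelines would add context and style rules.
PROMPT = (
    "Translate the following SAS step into equivalent Python using pandas. "
    "Return only code.\n\n{step}"
)


def split_sas_steps(source: str) -> List[str]:
    """Split SAS source into chunks that each end at RUN; or QUIT;.

    Simplification: comments, strings, and macros are not parsed.
    """
    steps, start = [], 0
    for match in STEP_END.finditer(source):
        chunk = source[start:match.end()].strip()
        if chunk:
            steps.append(chunk)
        start = match.end()
    tail = source[start:].strip()
    if tail:
        steps.append(tail)  # trailing statements without a step terminator
    return steps


def translate(source: str, llm: Callable[[str], str]) -> List[str]:
    """Send each SAS step to the (hypothetical) model and collect the translations."""
    return [llm(PROMPT.format(step=step)) for step in split_sas_steps(source)]


if __name__ == "__main__":
    sas = """
data work.clean;
  set raw.survey;
  if income < 0 then income = .;
run;

proc means data=work.clean;
  var income;
run;
"""

    def fake_llm(prompt: str) -> str:
        # Stand-in for a real model call.
        return "# translated step\n"

    print(len(translate(sas, fake_llm)))  # prints 2
```

Translating step by step keeps each prompt small and makes it easier to test each translated piece against the original SAS output.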

Autocoding

Reveal develops autocoders that assign write-in survey responses to an organization’s official classification codes, reducing the manual burden on human coders. These models draw on traditional machine learning methods and, where appropriate, LLMs. Each model is tailored to the subject matter of the survey question, which allows it to interpret complex free-form text and accurately match each write-in response to the correct code.
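
The general shape of an autocoder can be sketched with a deliberately simple traditional method: represent each response as word counts, find the most similar labeled example by cosine similarity, and defer low-confidence cases to a human coder. This is a minimal nearest-neighbor sketch, not Reveal's models; the codes (`OCC-EDU` and so on), training examples, and the 0.3 threshold are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


def tokenize(text: str) -> Counter:
    """Lowercase word counts for a short free-text response."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class Autocoder:
    """Assign a write-in response the code of its most similar labeled example.

    Responses whose best similarity is below `threshold` return None so they
    can be routed to a human coder instead of being auto-assigned.
    """

    def __init__(self, training: List[Tuple[str, str]], threshold: float = 0.3):
        self.examples = [(tokenize(text), code) for text, code in training]
        self.threshold = threshold

    def code(self, response: str) -> Optional[str]:
        vec = tokenize(response)
        best_score, best_code = 0.0, None
        for example, code in self.examples:
            score = cosine(vec, example)
            if score > best_score:
                best_score, best_code = score, code
        return best_code if best_score >= self.threshold else None


if __name__ == "__main__":
    # Hypothetical labeled write-ins and codes.
    training = [
        ("high school math teacher", "OCC-EDU"),
        ("elementary school teacher", "OCC-EDU"),
        ("registered nurse at hospital", "OCC-HEALTH"),
        ("truck driver long haul", "OCC-TRANSPORT"),
    ]
    coder = Autocoder(training)
    for write_in in ["teacher at a middle school", "nurse in the hospital ICU", "retired"]:
        print(write_in, "->", coder.code(write_in) or "route to human coder")
```

The deferral threshold is the key design choice: raising it sends more responses to human coders but reduces miscoded cases. Stronger models (trained classifiers or LLMs) would replace the similarity function while keeping the same routing logic.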

Reveal is also engaged in a broad range of other modernization projects, including:

Creating LLM-based search strategies for legislative research and for retrieving data from federal systems.
Developing methods to link businesses with their contact information using probabilistic matching algorithms and weighted ranking methods to build new frames for survey sampling.
Exploring the use of LLMs to translate survey questions and content from English into languages spoken by hard-to-count populations.
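
The post does not say which probabilistic matching algorithm is used for linking businesses to contact information. As an illustration only, here is a minimal sketch of classic Fellegi-Sunter scoring: each field adds log2(m/u) when values agree and log2((1-m)/(1-u)) when they disagree, where m is the probability of agreement among true matches and u among non-matches, and candidates are ranked by total weight. The field names and m/u values are hypothetical; in practice they would be estimated from the data (for example, with the EM algorithm).

```python
import math
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical probabilities per field:
# m = P(field agrees | true match), u = P(field agrees | non-match)
FIELD_PROBS: Dict[str, Tuple[float, float]] = {
    "name": (0.95, 0.01),
    "zip": (0.90, 0.05),
    "phone": (0.85, 0.001),
}


def normalize(value: str) -> str:
    """Lowercase and drop punctuation/spaces so '(202) 555-0100' equals '202-555-0100'."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def match_weight(a: Record, b: Record) -> float:
    """Sum of Fellegi-Sunter log2 agreement/disagreement weights across fields."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        x, y = a.get(field), b.get(field)
        if not x or not y:
            continue  # missing value: no evidence either way
        if normalize(x) == normalize(y):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def rank_candidates(business: Record, candidates: List[Record]) -> List[Tuple[float, Record]]:
    """Return (weight, candidate) pairs sorted from most to least likely match."""
    scored = [(match_weight(business, c), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    business = {"name": "Acme Tools, Inc.", "zip": "20001", "phone": "202-555-0100"}
    candidates = [
        {"id": "B", "name": "Acme Foods", "zip": "20001", "phone": ""},
        {"id": "A", "name": "Acme Tools Inc", "zip": "20001", "phone": "(202) 555-0100"},
        {"id": "C", "name": "Zenith LLC", "zip": "90210", "phone": "310-555-0199"},
    ]
    for weight, cand in rank_candidates(business, candidates):
        print(cand["id"], round(weight, 2))
```

Exact agreement after normalization is the simplest comparison; string-similarity comparators for names and addresses are a common refinement.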

Throughout Reveal’s modernization efforts, we prioritize building more efficient, responsive systems that adapt alongside agencies to the changing federal and technological landscape. We’re finding ways to apply new advances in AI so agencies can innovate without compromising flexibility, cost, or privacy.

Interested in learning more about our modernization work? Email us at office@revealgc.com

Shen Peng

Reveal Global Consulting