Navigating a Data Use Paradox

Shifting From "Can We?" to "Should We?" – A Framework for State Chief Data Officers

	10 Jul 2025

Key Takeaways for State Chief Data Officers

The Central Question: Technical capability alone doesn't justify data use. State CDOs must consistently ask both "Can we?" and "Should we?" when considering data initiatives.

Build on Proven Foundations: The Fair Information Practice Principles (FIPPs) from the 1970s remain essential ethical guideposts, but need modern implementation approaches for today's complex data environment.

Apply the Five Safes Framework: Use this structured approach to evaluate data sharing decisions across five dimensions—People (who gets access), Projects (what purposes justify access), Data (what protection level is needed), Settings (where and how data is accessed), and Outputs (what can be released).

Prepare for AI Governance Now: AI amplifies existing data quality issues and creates new challenges. State governments need flexible frameworks that ensure high-quality data, effective governance principles, and enhanced technical capacity.

Embrace Dynamic Risk Management: Use tiered access approaches and coordination mechanisms that adapt to changing technology while maintaining public trust. High-value, high-risk use cases (like linking data for child welfare) require careful safeguards, not automatic rejection.

Lead Through Collaboration: State CDOs are uniquely positioned to demonstrate responsible data innovation. Share experiences, coordinate approaches across agencies, and maintain focus on public benefit.

The Bottom Line: Governments will continue collecting and using data in the years ahead at an escalating pace. Success depends on doing so in ways that build public trust while delivering real value to citizens.

On my drive back from presenting to Maryland's state Chief Data Officers in early July 2025, I passed a gate with a simple sign: "Push to Open." It was a perfect metaphor for the challenge I had just spent two hours discussing with the state’s key data stewards.

In the data world, we're constantly pushing gates open—accessing new datasets, linking information across agencies, and now applying AI to government services. Emerging technology makes it easier than ever to push these gates open. But as we do, we must be mindful of what lies on the other side: individual privacy, entity confidentiality, public trust, and the approaches to simultaneously achieving transparency and protection.

This tension was a key point during an Executive Education session for Chief Data Officers from Maryland's state departments and agencies. The session was led by Stefaan Verhulst and joined by Maryland CDO and Data Foundation Senior Fellow Natalie Evans Harris, and I presented alongside Lynn Overmann, Executive Director of the Beeck Center. Our discussion centered on a fundamental question that every state CDO must grapple with: "Just because we can push the gate open, should we?"

Fair Information Practice Principles Still Matter

Before diving into modern frameworks, recall that we're not starting from scratch in answering this question. The Fair Information Practice Principles (FIPPs), established in the 1970s, remain a bedrock of responsible data governance and stewardship. The core principles are transparency, individual participation, purpose specification, data minimization, use limitation, data quality and integrity, security, and accountability. They provide an ethical foundation for data activities and have shaped data policy implementation for decades.

However, the FIPPs were designed more than 50 years ago for simpler data environments: before many modern data laws, before popular use of generative AI, and before the vast expansion of computational capabilities. Today's challenges, including data linkage across agencies and with the private sector, responsible use of AI and machine learning applications, and open data mandates, require more nuanced approaches while maintaining these core principles.

Maryland's work in this arena is noteworthy. The state’s open data policy was the foundation for open data directives in the federal government in 2010, which ultimately led to the OPEN Government Data Act.

The Five Safes: A Practical Framework for Data Sharing Decisions

The Five Safes framework, originally developed in the UK for statistical activities, offers state CDOs and data stewards a relevant, structured approach for evaluating data sharing and access decisions. Each "safe" addresses a different dimension of risk to be managed to avoid harms to individuals and businesses.

During our session in Maryland, one attendee highlighted a powerful use case that exemplifies both the high value and high risk of linking and blending data: understanding the full public support ecosystem for a child with divorced parents. In the example, one parent may receive SNAP, SSI, or child welfare supports while the other pays child support as a noncustodial parent. To monitor and improve children's outcomes, caseworkers need to see the complete picture for certain data held by multiple departments, but this requires linking data across multiple agencies and benefit systems. Policymakers and program managers, by contrast, may not require this level of detail to support families effectively.

This example helps illustrate why the Five Safes framework is relevant and useful for data stewards:

Safe People – Who should have access to your data? In our child support example, this might include caseworkers from multiple agencies, court personnel, and benefit administrators. Responsible use involves establishing clear criteria for both internal staff (through role-based access and clearance levels) and external partners (through researcher vetting and contractor requirements). Different people may also have different levels of access, viewing more or less detail based on their needs. The key question: Do you have standardized, transparent criteria for data access approval?

Safe Projects – What purposes justify data access? Our child support scenario represents a clear public interest: improving outcomes for children navigating complex family structures, using identifiable data about a child. This differs significantly from other purposes, and not all research is created equal. Public health research and studies may use health records and statistical samples to generate aggregate knowledge. Separately, cross-agency collaboration for citizen services and customer experience improvements requires different types of review than external partnerships in which private entities have restricted access to support public sector program implementation. State CDOs must develop clear guidelines for evaluating whether proposed data uses align with the public interest.

Safe Data – What level of protection does the data require? The child support example involves highly sensitive personal information: financial status, family relationships, benefit eligibility. This is where technical safeguards become relevant for administrative records combined across sources. A user who only needs a binary indicator of a change may need only certain aspects of the data, not a full case file. Other relevant approaches, especially when looking for group-level insights, include statistical disclosure control, de-identification and anonymization techniques, synthetic data generation, and differential privacy.

Safe Settings – Where and how is data accessed? For sensitive cases like our child support example, the setting for accessing and using data becomes critical. Caseworkers might need real-time access in secure offices or systems, while researchers might work with versions of the data in data enclaves. Physical security matters, but so does digital infrastructure. This includes data enclaves, secure computing environments, API rate limiting, access logging, cybersecurity protocols, and cloud security considerations. The setting must match the sensitivity of the data and the trustworthiness of the users.

Safe Outputs – What can be released from analysis? Even with safe people, projects, data, and settings, the outputs themselves can pose disclosure risks. In our child support scenario, aggregate insights about support patterns might be valuable for policy development, but individual case details must remain protected during use by case managers. Automated disclosure risk assessment, tiered release processes, and output review procedures help ensure that insights can be shared while protecting individual privacy.
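One common output check in statistical disclosure control is small-cell suppression: withholding any published count that falls below a minimum size. The sketch below is a minimal illustration under stated assumptions, not a prescribed method. The threshold of 10, the function name suppress_small_cells, and the sample records are all hypothetical.

```python
from collections import Counter

# Hypothetical threshold: counts below this value are withheld from release.
MIN_CELL_SIZE = 10


def suppress_small_cells(records, group_key, min_cell_size=MIN_CELL_SIZE):
    """Count records per group and withhold counts below the threshold.

    Returns a dict mapping each group to its count, or to None when the
    count is too small to release.
    """
    counts = Counter(record[group_key] for record in records)
    return {
        group: (count if count >= min_cell_size else None)
        for group, count in counts.items()
    }


# Made-up illustrative records, not real data.
cases = (
    [{"county": "A"}] * 42
    + [{"county": "B"}] * 3
    + [{"county": "C"}] * 10
)

released = suppress_small_cells(cases, "county")
print(released)  # {'A': 42, 'B': None, 'C': 10}
```

This covers only primary suppression. If a total across all groups is also published, a withheld cell can sometimes be recovered by subtraction, so a real output review would also need to consider complementary suppression or another protection method.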

The AI Era: New Challenges Require Enhanced Frameworks

The emergence of AI and machine learning at scale has fundamentally changed the data landscape. As the Data Foundation emphasized in a recent report on “Data Policy for the Age of AI,” there's no "one-size-fits-all" approach. Policymakers need flexible frameworks to assess whether existing laws and policies adequately address AI's unique challenges, particularly in the constellation of more than 3,000 privacy laws across the country.

At least three factors make AI particularly challenging for state CDOs:

Data is at the core of AI: Training data quality directly impacts AI outputs that affect people's lives. Poor data governance doesn't just affect statistics; it affects increasingly automated decision-support systems and application reviews. Making metadata and documentation available and accessible is key to supporting high-quality data. The rapid acceleration of AI models means CDOs must move quickly to enable responsible use while also appropriately managing risks of harm.

AI amplifies existing issues: Biases and errors in data can be replicated and magnified at scale through AI systems. While there are tremendous potential benefits for the data ecosystem, problems with data quality and integrity cascade when there are no feedback loops from users to make data fit for purpose. Nationally, we have poor systems in place to help monitor and assess the trustworthiness of data and its fitness for purpose.

Legal gaps exist: Current national frameworks like the Privacy Act, the Foundations for Evidence-Based Policymaking Act, the OPEN Government Data Act, and sectoral data laws were largely not designed for AI's unique challenges, which are emerging in real time. Yet these frameworks may be sustained and built upon as the user base grows and as the value of using data effectively for legal and ethical purposes is better articulated.

To build AI-ready data governance, there are several key components for data stewards to consider. First, stewards can amplify efforts to ensure high-quality data that is fit for purpose. This includes applying appropriate data standards, establishing knowable data revision processes, publishing rich metadata with assessments of data sensitivity, and providing sufficient documentation to give context for AI training and use. Second, stewards can apply effective governance principles alongside enhanced privacy protections for AI training data, especially for open data available from the public sector, as well as transparency in AI decision-making processes. Finally, stewards can build technical capacity: secure infrastructure for AI model training, ethical procurement standards for AI tools, and workforce development for AI literacy across government.

Lessons from National Academies on Blended Data

In 2024, the National Academies' Committee on National Statistics published a policy-risk framework for considering the value and risk of harm when blending data from multiple sources. This policy-risk framework can be applied for a range of purposes, including both statistical uses and administrative actions. The National Academies report concludes that when agencies combine multiple data sources, they create magnified disclosure risks. These "composition effects" mean that multiple data releases can accumulate disclosure risks in ways that weren't anticipated when each dataset was collected or released individually.
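A small worked example can make composition effects concrete. In the sketch below, two aggregate counts each look harmless on their own, but subtracting one from the other reveals a single household's benefit status. This is a simplified differencing illustration, not a method taken from the National Academies report. The records, field names, and the snap_count function are hypothetical.

```python
# Made-up illustrative records, not real data.
households = [
    {"id": 1, "receives_snap": True},
    {"id": 2, "receives_snap": False},
    {"id": 3, "receives_snap": True},
    {"id": 4, "receives_snap": True},
]


def snap_count(records):
    """Aggregate release: number of households receiving SNAP."""
    return sum(1 for r in records if r["receives_snap"])


# Release 1: a count covering every household.
release_all = snap_count(households)

# Release 2: the same count, published separately, for a subgroup that
# happens to exclude exactly one household (id 1).
release_without_one = snap_count([r for r in households if r["id"] != 1])

# Neither release names a household, yet the difference discloses
# whether household 1 receives SNAP.
inferred_status = (release_all - release_without_one) == 1
print(release_all, release_without_one, inferred_status)  # 3 2 True
```

The point of the sketch is that disclosure risk has to be assessed across releases, not only release by release.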

Initially designed for statistical disclosures, the National Academies framework emphasizes three essential elements that state governments can implement. They apply whether the goal is generating aggregate insights or linking individual records:

Dynamic Risk Management: Processes that adapt to changing technology and policy needs, respond to stakeholder concerns, and reflect differing levels of risk and usefulness.

Tiered Access Approaches: Different levels of access for different users and purposes, from public use files with synthetic or heavily aggregated data, to restricted access through research data centers, to secure enclaves with full data access under strict controls.

Coordination Mechanisms: Standard application processes across agencies, common terminology for privacy and risk discussions, and streamlined data-sharing agreements.
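The tiered access idea above can be sketched as a simple lookup: each data product is assigned the minimum tier required to use it, and a request is allowed only when the requester has been approved at that tier or higher. This is a minimal sketch under stated assumptions. The tier names follow the three levels described above, while the product names, the DATA_PRODUCT_TIER mapping, and the can_access function are hypothetical.

```python
from enum import IntEnum


class AccessTier(IntEnum):
    """Access tiers, ordered from least to most sensitive."""
    PUBLIC = 1          # public use files (synthetic or heavily aggregated)
    RESTRICTED = 2      # restricted access, e.g. a research data center
    SECURE_ENCLAVE = 3  # full data access under strict controls


# Hypothetical assignment of data products to the minimum tier required.
DATA_PRODUCT_TIER = {
    "aggregate_tables": AccessTier.PUBLIC,
    "synthetic_file": AccessTier.PUBLIC,
    "deidentified_microdata": AccessTier.RESTRICTED,
    "linked_identifiable_records": AccessTier.SECURE_ENCLAVE,
}


def can_access(approved_tier, data_product):
    """Return True if the approved tier meets the product's required tier."""
    return approved_tier >= DATA_PRODUCT_TIER[data_product]


print(can_access(AccessTier.RESTRICTED, "synthetic_file"))               # True
print(can_access(AccessTier.RESTRICTED, "linked_identifiable_records"))  # False
```

In practice, a tier check like this would be only one input. Approval would also depend on who is asking and for what purpose, as the Five Safes framework describes.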

In the framework, the child support example may be recognized as both high value and high risk, which does not preclude using the data – but does require successful application of approaches to manage risk responsibly to protect vulnerable populations.

The Ethical Imperative: Moving Beyond Technical Capability

Perhaps the most important insight from our session with Maryland's CDOs in July 2025 was that technical capability alone does not justify data use. The question "Can we?" must always be paired with "Should we?"

Answering the "Can we?" and "Should we?" questions together requires considering multiple factors: public benefit, privacy risks, community impact, and whether less invasive alternatives exist. The Fair Information Practice Principles provide an ethical standard. The Five Safes framework provides structure for these decisions, but it does not replace ethical considerations and community engagement. The National Academies policy-risk framework can also help identify appropriate risk management strategies if the answer to both questions is yes.

State CDOs are uniquely positioned to lead in responsible data innovation, at the nexus of the American people and their national government. Importantly, state governments can be more agile while maintaining strong public accountability.

Moving Forward

Addressing the data use paradox facing state governments – and all data stewards – highlights why CDOs will shape public trust in government for years to come. The Maryland CDOs who participated in our session are already demonstrating this leadership. By sharing experiences, coordinating approaches, and maintaining focus on public benefit, state governments can lead the way in responsible data innovation.

The question isn't whether public sector agencies will continue to collect and use data. They certainly will. The question is whether they'll do so in ways that build or maintain public trust while delivering real value to citizens.

NICK HART, Ph.D. is President and CEO of the Data Foundation. This article reflects themes from a presentation delivered on July 8, 2025, to Maryland state Chief Data Officers, co-presented with Lynn Overmann of the Beeck Center and led by Stefaan Verhulst. The session was co-sponsored by The GovLab, Open Data Policy Lab, and The Data Tank, with participation from Maryland CDO and Data Foundation Senior Fellow Natalie Evans Harris.