Data Architecture for Social Sector AI

Series 02 — Data architecture for the social sector

The most important question in any AI proposal is the one that almost never gets asked at the proposal stage: who owns the data and the system, and what happens to both when something changes?

This is not a technical question. It is a power question. Whoever controls the system controls the intervention. Whoever controls the data controls who can be helped, who can be ignored, and who can be harmed. And in the social sector, ambiguity about either is the most reliable predictor of failure that we have seen.

This page lays out the framework we use with Tilted Ground clients to think about data architecture in social sector AI: the four questions to ask before building anything, the four-tier ownership model that those questions produce, and the three principles that should hold regardless of which tier applies.

Why this is harder than it looks

The two naive answers both fail.

The nonprofit ownership default

"The nonprofit owns it because they built it" has a well-documented failure mode. The organisation builds something useful, the funding cycle ends, the engineering team disperses, and the system dies. The community or government partner that was depending on it is left with nothing, and often with broken trust in technology as a whole. The graveyard of social sector tech is full of nonprofit-owned systems with no succession plan.

The government ownership default

"The government owns it because they need to sustain it at scale" has an equally well-documented failure mode. Governments take ownership before they have the capability to maintain the system, vendors disengage once the contract closes, and the system degrades within two to three years. Or worse: governments come to own data in ways that create surveillance risks for the very populations the system was designed to serve. India's documented welfare AI failures (Telangana's Samagra Vedika cancelling 1.86 million food security cards, the POSHAN facial recognition system denying rations to pregnant women) share this pattern. Government ownership without accountability infrastructure is not accountability. It is the appearance of accountability without the substance.

Neither default works universally. What works is designing ownership deliberately, based on the specific context of each system, and having that conversation explicitly before the first commit, not after the first failure.

Four questions to ask before you design

These four questions, applied honestly, will produce the right ownership structure for any AI system in any social sector context.

Question 1

Who needs to sustain this system beyond the initial funding horizon?

Work backwards from the desired end state, not the current convenience. If the system is meant to operate at government scale on government budget, design it for government ownership from day one: open-source where possible, formats compatible with the relevant national stack (Sunbird, ABDM / ABHA, iGOT, ONEST), and a transfer plan documented before the build. If the nonprofit is the ongoing service provider with no intention of handing off, nonprofit ownership is appropriate, but only if the nonprofit has a real technology maintenance capability, which most social sector organisations currently do not. If neither party has the capability to sustain it, the honest answer may be a Digital Public Good approach.

Question 2

Who has legitimate authority over the data subjects?

Data subjects (citizens giving feedback, teachers being coached, children being assessed, frontline workers submitting observations) have rights over their data regardless of who owns the system. The practical question is who has the existing accountability relationship with those subjects that makes them the appropriate steward.

In most contexts, the answer is layered: the government has constitutional accountability to citizens; the nonprofit has a programmatic relationship and often an explicit consent framework; the AI company has terms of service. Map these layers and assign stewardship accordingly. Name the risk that most architects skip: in government-adjacent AI systems, data collected for a beneficial purpose can be repurposed for surveillance, punitive action, or political ends. The architecture needs explicit constraints on what the data can and cannot be used for, who can access it, and under what circumstances.
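One way to make those constraints concrete is a deny-by-default purpose check. The sketch below is illustrative only: the role names, purposes, and the rule that government access is limited to aggregates are assumptions made for the example, not a description of any particular deployment.

```python
from dataclasses import dataclass

# Hypothetical (role, purpose) pairs; a real list would come from the
# consent framework and the data sharing agreement.
ALLOWED_USES = {
    ("nonprofit_programme_team", "service_delivery"),
    ("nonprofit_programme_team", "quality_audit"),
    ("government_department", "aggregate_reporting"),
}


@dataclass(frozen=True)
class AccessRequest:
    role: str
    purpose: str
    individual_level: bool  # True if the request is for person-level records


def is_permitted(request: AccessRequest) -> bool:
    """Deny by default: only explicitly listed (role, purpose) pairs pass."""
    if (request.role, request.purpose) not in ALLOWED_USES:
        return False
    # Assumption for this sketch: government only ever sees aggregates.
    if request.role == "government_department" and request.individual_level:
        return False
    return True
```

The design choice worth noting is the default: a purpose that nobody wrote down, such as enforcement or profiling, is refused rather than silently allowed.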

Question 3

Who has the capability to be accountable when the system fails?

AI systems fail. They produce incorrect outputs. They perform differently across demographic groups. They miss genuine signals and flag false positives. When that happens in a high-stakes context (a child incorrectly flagged for developmental delay, a citizen complaint incorrectly routed, a teacher incorrectly scored, a beneficiary incorrectly denied a ration), someone needs the operational capacity to investigate, correct, and make the affected person whole.

Ownership should track to that capacity. If the government owns the system but lacks the technical capacity to audit its outputs or investigate complaints, government ownership creates the appearance of accountability without the substance. That is worse than honest nonprofit ownership with a clear escalation pathway.
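As a rough illustration of what "capacity to audit its outputs" can mean in practice, the sketch below computes a false positive rate per demographic group from reviewed cases. The input shape and the group labels are assumptions for the example; a real audit would need verified ground truth and far more care about sample sizes.

```python
from collections import defaultdict


def false_positive_rate_by_group(cases):
    """Return {group: share of true negatives that the system flagged}.

    `cases` is an iterable of (group, flagged, actually_positive) tuples,
    where `actually_positive` comes from a human review of the case.
    """
    negatives = defaultdict(int)
    flagged_negatives = defaultdict(int)
    for group, flagged, actually_positive in cases:
        if not actually_positive:
            negatives[group] += 1
            if flagged:
                flagged_negatives[group] += 1
    # Groups with no reviewed negatives are left out rather than guessed.
    return {group: flagged_negatives[group] / n for group, n in negatives.items()}
```

A large gap between groups in this one number is the kind of signal an owner needs to be able to notice, investigate, and correct.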

Question 4

What happens to this system if the relationship between the nonprofit and the government breaks down?

Nonprofit-government relationships are not permanent. Governments change, secretaries rotate, political priorities shift. A system that depends on a specific government champion to function, with that champion's access baked into its architecture, is fragile. Design for the relationship breaking down, not for it continuing indefinitely. Data portability is not a technical nicety; it is a resilience mechanism.
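A minimal sketch of what portability can look like at the code level, assuming records are simple dictionaries: export everything to openly documented formats (JSON and CSV here) so that neither party needs the other's software to read the data. The function name and record shape are hypothetical.

```python
import csv
import io
import json


def export_portable(records: list[dict]) -> dict[str, str]:
    """Serialise records to JSON and CSV strings using only open formats."""
    # Use the union of all keys so that sparse records still export cleanly.
    fields = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, restval="")
    writer.writeheader()
    writer.writerows(records)
    return {
        "json": json.dumps(records, ensure_ascii=False, sort_keys=True),
        "csv": buffer.getvalue(),
    }
```

An export routine like this is only useful if it is exercised regularly; an untested exit path tends to fail exactly when the relationship does.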

The four-tier ownership model

These questions, applied across a portfolio, produce a tiered model rather than a single answer.

Tier 1: Nonprofit stewardship with government access

When it applies

The nonprofit is the ongoing service provider, has the technical capability to maintain the system, and the government is a beneficiary of insights rather than an operator of the system.

Data arrangement

The nonprofit owns and stewards the data. Government receives aggregated, anonymised insights through a formal data sharing agreement with explicit constraints on use. Individual-level data stays with the nonprofit.
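The arrangement above can be sketched as a small aggregation step that runs before anything leaves the nonprofit. The threshold of 10 and the field names are assumptions for illustration, and suppressing small groups is only one safeguard; on its own it does not guarantee anonymity.

```python
from collections import Counter

MIN_CELL_SIZE = 10  # hypothetical threshold; set by the data sharing agreement


def aggregate_for_government(records, group_key, min_cell_size=MIN_CELL_SIZE):
    """Return counts per group, dropping groups too small to share safely."""
    counts = Counter(record[group_key] for record in records)
    return {group: n for group, n in counts.items() if n >= min_cell_size}
```

For example, twelve records from one district and three from another would yield a count only for the first; the second is withheld because three people are too easy to re-identify.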

Examples in the field. Tarjimly's translation services for refugees hold beneficiary data within the nonprofit's own infrastructure while sharing system-level outcomes with funders and partners. Bayes Impact's CaseAI works the same way: caseworker queries and outputs sit with Bayes Impact, while aggregated impact data flows to funders. Qure.ai's TB screening pilots use Piramal Swasthya as the nonprofit field operator, with aggregated state-level diagnostic outcomes flowing to the National Health Mission while individual patient images remain inside the operator's infrastructure.

Tier 2: Joint stewardship with a defined transfer plan

When it applies

The system is designed to eventually operate inside government infrastructure, but the nonprofit needs to run it during the pilot and scaling phase while government capability develops.

Data arrangement

Data is held jointly under a formal MoU that specifies the transfer timeline, the conditions that must be met before transfer (technical capacity, security certification, user training), and what happens to data if the transfer does not happen.
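The MoU terms described above can be mirrored in a small, checkable structure. This is a sketch under assumptions: the condition names follow the three listed in the text, while the status labels and the idea of re-evaluating on a given date are invented for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TransferPlan:
    target_date: date
    conditions: dict[str, bool]        # condition name -> currently met?
    fallback_if_not_transferred: str   # what happens to the data otherwise


def transfer_status(plan: TransferPlan, today: date) -> tuple[str, list[str]]:
    """Return a status label and the list of conditions still unmet."""
    unmet = sorted(name for name, met in plan.conditions.items() if not met)
    if not unmet:
        return "ready_to_transfer", []
    if today > plan.target_date:
        # Deadline passed with gaps: the documented fallback needs review.
        return "review_fallback", unmet
    return "in_progress", unmet
```

Writing the fallback down as a required field is the point: a plan cannot be created without saying what happens to the data if the transfer never occurs.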

Examples in the field. Wadhwani AI's Kisan-eMitra is the canonical Indian example: the Ministry of Agriculture as the data owner, Wadhwani AI as the operating partner, NIC as the hosting layer, scope expanding by Ministry decision rather than by re-tendering. The Sarvam AI + EkStep "Listen at Scale" deployments with the National Health Authority and the Department of Empowerment of Persons with Disabilities run on the same pattern: structured data writes back into government registries (the ONEST registry for disability profiles) so the durable record sits inside the public system from the start.

Tier 3: Digital Public Good architecture

When it applies

The system has the potential to be replicated across multiple states or organisations, the use case is not proprietary, and sustainability is better served by community maintenance than by any single owner.

Data arrangement

System code is open-source under an appropriate licence. Data generated by each deployment stays with that deployment's operator; there is no central data pool. Governance is community-based through an open-source maintainer model, often with formal DPG registration through the Digital Public Goods Alliance or India's own DPG frameworks.

Examples in the field. The Sunbird stack underneath DIKSHA (built by EkStep Foundation, maintained by a multi-organisation community, deployed by national and state education departments) is the reference example. ABHA under the Ayushman Bharat Digital Mission, the ONEST registry, and BHASHINI follow the same architecture. DPG-compatibility is no longer a technical preference; it is a scale precondition. An intervention designed to be Sunbird-compatible from day one inherits national reach. An intervention designed in isolation has to build it.

Tier 4: Government ownership from day one

When it applies

The system is being built explicitly for government operation, the government has or will develop the technical capability to maintain it, and the nonprofit's role is implementation support rather than ongoing service delivery.

Data arrangement

Government owns the data from the start, under existing government data governance frameworks. The nonprofit has access only for the purposes of the current engagement, with that access formally ending at the engagement's conclusion.

A note of caution. This tier is the highest-stakes one because government ownership without accountability infrastructure is exactly the failure mode documented in cases like Telangana's Samagra Vedika and the POSHAN facial recognition rollout. When Tier 4 is the right choice (typically for civil-servant-facing systems such as Karmayogi Bharat's competency platforms, or for sovereign-data-only deployments with no individual-level beneficiary risk), the funder's role is to ensure that government accountability infrastructure exists and is funded before the data transfer happens. Otherwise the right tier is Tier 2 with a transfer plan, not Tier 4 by default.
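To show how the four questions feed the four tiers, here is a deliberately simplified decision sketch. The field names, the order of the checks, and the stricter-than-stated rule that Tier 4 requires capability and funded accountability infrastructure today are all assumptions made for illustration; real decisions involve judgement that a function like this cannot capture.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    NONPROFIT_STEWARDSHIP = 1
    JOINT_WITH_TRANSFER_PLAN = 2
    DIGITAL_PUBLIC_GOOD = 3
    GOVERNMENT_FROM_DAY_ONE = 4


@dataclass
class Context:
    long_term_operator: str                 # "nonprofit" or "government"
    nonprofit_can_maintain: bool
    government_can_maintain_now: bool
    government_accountability_funded: bool
    replicable_and_non_proprietary: bool


def suggest_tier(ctx: Context) -> Tier:
    """Suggest a starting tier; a prompt for discussion, not a verdict."""
    if ctx.replicable_and_non_proprietary:
        return Tier.DIGITAL_PUBLIC_GOOD
    if ctx.long_term_operator == "government":
        if ctx.government_can_maintain_now and ctx.government_accountability_funded:
            return Tier.GOVERNMENT_FROM_DAY_ONE
        # Otherwise default to Tier 2 with a transfer plan, not Tier 4.
        return Tier.JOINT_WITH_TRANSFER_PLAN
    if ctx.nonprofit_can_maintain:
        return Tier.NONPROFIT_STEWARDSHIP
    # Nobody can sustain it alone: consider community maintenance.
    return Tier.DIGITAL_PUBLIC_GOOD
```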

Three cross-cutting principles

Regardless of which tier applies, three principles should govern every AI system in social sector deployment.

Data minimisation by design

Collect only the data the system actually needs to function. Every additional data point is an additional liability: legal, political, and ethical. This is especially important in government-adjacent contexts where data collected for one purpose can be repurposed for another. Build the minimum architecture that makes the system work, not the maximum that makes it theoretically more powerful.
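In code, minimisation often comes down to an allowlist applied at the point of intake. The field names below are hypothetical; the pattern is that anything not explicitly required is dropped before it is ever stored.

```python
# Hypothetical allowlist for a frontline worker's field observation.
REQUIRED_FIELDS = ("worker_id", "visit_date", "observation")


def minimise(submission: dict) -> dict:
    """Keep only allowlisted fields; reject submissions missing any of them."""
    missing = [field for field in REQUIRED_FIELDS if field not in submission]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return {field: submission[field] for field in REQUIRED_FIELDS}
```

Adding a field then becomes a deliberate change to the allowlist, which is a natural place to ask what the field is for and who could misuse it.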

Portability and exit rights for data subjects

Every person whose data sits in the system (frontline workers, parents, citizens, students) should have the right to see their data, correct inaccuracies, and request deletion. This is not just an ethical principle; it is what builds the trust that makes adoption possible. In practice, it means building data subject interfaces into the system architecture from the start, not retrofitting them later.
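A minimal sketch of such an interface, assuming an in-memory store keyed by a subject identifier. The class and method names are invented for the example, and a production system would also need authentication, audit logging, and deletion from backups.

```python
import copy


class SubjectDataStore:
    """Toy store exposing the three data subject rights: see, correct, delete."""

    def __init__(self) -> None:
        self._records: dict[str, dict] = {}

    def save(self, subject_id: str, record: dict) -> None:
        self._records[subject_id] = copy.deepcopy(record)

    def view(self, subject_id: str) -> dict:
        # Return a copy so callers cannot change stored data by accident.
        return copy.deepcopy(self._records.get(subject_id, {}))

    def correct(self, subject_id: str, field: str, value) -> None:
        if subject_id not in self._records:
            raise KeyError(subject_id)
        self._records[subject_id][field] = value

    def delete(self, subject_id: str) -> bool:
        # True if something was deleted, False if nothing was held.
        return self._records.pop(subject_id, None) is not None
```

Designing these three operations in from the first version is far cheaper than discovering later that the data model scatters one person's data across tables with no common key.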

Ownership tracks to accountability

Whoever owns the system should be the same party who is accountable when it fails. If those two things are misaligned (if the government owns the system but the nonprofit is expected to fix it when something goes wrong, or if a commercial vendor owns the model and refuses transparency to anyone), you have built a governance failure into the architecture. Map ownership and accountability together, document them in the MoU, and review them annually.
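The two misalignments named above can be written as a check that runs against whatever the MoU records. The structure and field names are assumptions for illustration, not a standard schema.

```python
from dataclasses import dataclass


@dataclass
class GovernanceMap:
    system_owner: str
    accountable_when_it_fails: str
    model_owner: str
    model_owner_grants_audit_access: bool


def governance_gaps(g: GovernanceMap) -> list[str]:
    """List ownership and accountability mismatches; empty means aligned."""
    gaps = []
    if g.system_owner != g.accountable_when_it_fails:
        gaps.append("system owner is not the party accountable for failures")
    if g.model_owner != g.system_owner and not g.model_owner_grants_audit_access:
        gaps.append("model owner does not grant audit access to the system owner")
    return gaps
```

Re-running a check like this at each annual review makes drift visible, for instance when a vendor contract changes the terms of model access.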

What this implies for funders

Ask for the data architecture in the proposal, not in the year-two report. Make the four questions above part of the standard due diligence template. Treat vague answers about ownership as a red flag rather than a detail to be resolved later.

Fund the accountability infrastructure, not just the technology. Tier 1 and Tier 2 deployments require a redress mechanism, an audit capability, and a data governance committee. These cost money. They do not show up in pilot photos. They are the difference between a system that survives and one that does not.

Fund DPG architecture as infrastructure. The Rockefeller Foundation and the Mastercard Center for Inclusive Growth's data.org work, Schmidt Sciences' AI2050 programme, and Google.org's grants underneath Sunbird and BHASHINI are examples of funders treating the rails, not just the trains, as the unit of investment. The Bill & Melinda Gates Foundation's sustained support for NIKSHAY (India's national TB patient management platform) is the same pattern in health.

Fund civil society accountability work alongside the technology. Amnesty International's algorithmic accountability work on Samagra Vedika, the Pulitzer Center-backed Decode investigation of POSHAN FRS, and the Wire / Decode coverage of Delhi Police's facial recognition system are the most useful reading we know of for funders deciding what not to fund. These investigations exist because someone funded them. Continuing to fund them is part of building a healthy AI deployment ecosystem, not a separate concern.

Shruti Keerti is the founder of Tilted Ground. Before starting the studio, she spent a decade building consumer products at PayPal, eBay, and Google, then spent several years inside social sector organisations grappling with the same infrastructure questions this series is about. She started Tilted Ground because the gap between what AI can technically do and what field practitioners can actually use kept showing up in every context she worked in.

Part of the series "Data architecture for the social sector." Next: From monolith to modular: why resilient social sector systems are built to be decoupled.

Continue the series

From monolith to modular: why resilient social sector systems are built to be decoupled

Content as data: the architecture decision that makes localization possible

Building multilingual products: localization pipeline design for the Global South