If you want to know which villages in India have an electric connection, where should you look? Which official dataset would give you reliable, current information? How can daily operational data from a thermal power plant help reduce air pollution while simultaneously boosting efficiency at cement plants?
India is sitting on a goldmine of public data. With the growing adoption of artificial intelligence (AI) and digital tools, it is time to unlock its true potential.
Digital India and the data exhaust
The Digital India initiative and digitisation of public services have dramatically increased the volume of data generated by government departments, regulators, statutory bodies, and public institutions.
Many of these datasets, covering education, health, environment, infrastructure, taxation, and more, are already collected at a granular level, and some are updated frequently.
Meanwhile, companies and individuals are generating large volumes of publicly accessible digital information. Together, this ecosystem of “alternative data” is rapidly expanding.
For over a decade now, investors and businesses have used such data to gain information advantages — famously, by analysing satellite images of retail parking lots to forecast store revenues. Over time, alternative data sources have broadened to include transaction records, regulatory filings, and scraping of open databases.
India now hosts thousands of such datasets in the public domain.
AI enters the arena
The emergence of AI has added a powerful new player to the world of data analysis. What was once the domain of seasoned analysts is now being democratised by AI models that can process vast volumes of data, identify patterns, and generate actionable insights.
Traditionally, young analysts starting to track companies in a sector would invest countless hours in understanding the industry's structure and dynamics, figuring out what data was needed to track the entities and where to find it, and then keeping these datasets meticulously up to date.
It could be years before they had a good grasp of the detailed value chain (suppliers, customers) and the other relevant stakeholders (government, industry peers, etc.).
AI tools can drastically accelerate the learning curve for junior analysts tracking sectors and companies, enabling them to extract deeper insights with greater efficiency.
AI’s promise, however, depends fundamentally on the availability and quality of data. Like human intelligence, AI follows the rule of “garbage in, garbage out”. Poor-quality, incomplete, or ambiguous data results in misleading outputs or, worse, hallucinations. Ensuring clean, structured data pipelines and deep, secure data lakes is therefore crucial.
The role of public policy
While private firms will compete to develop proprietary AI models, public policy has an indispensable role to play in improving access to clean and credible datasets.
Specifically, data funded by public resources, whether through taxpayer money or government-administered systems, should be made openly available, subject to appropriate safeguards for privacy and security.
Consider the benefits of publishing granular, high-frequency public datasets.
Data from the Unified District Information System for Education (UDISE) can indicate whether schools in remote villages report having electricity connections, offering an independent check on rural electrification claims.
Similarly, daily data from thermal power plants can help estimate fly-ash generation, which cement companies, major users of fly ash, can use to plan procurement more efficiently.
These are just two examples. The economic value hidden in publicly held datasets is immense, and it is often unlocked in unexpected ways.
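To make the two examples concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the field names (`village`, `has_electricity`) are not the actual UDISE schema, the school records and daily coal figures are made up, and the ash-content and fly-ash-share parameters are placeholder values that would need to be replaced with plant-specific numbers.

```python
from collections import defaultdict

# All field names and figures below are hypothetical placeholders,
# not the actual UDISE schema or real plant data.
SCHOOLS = [
    {"village": "Village A", "school_id": "S1", "has_electricity": True},
    {"village": "Village A", "school_id": "S2", "has_electricity": False},
    {"village": "Village B", "school_id": "S3", "has_electricity": False},
    {"village": "Village C", "school_id": "S4", "has_electricity": True},
]
DAILY_COAL_BURNED_TONNES = [10_000, 12_000, 9_000]


def villages_without_school_electricity(records):
    """Return villages where no school reports an electricity connection.

    These are candidates for a closer look when cross-checking
    electrification claims, not proof that a village lacks power.
    """
    any_electrified = defaultdict(bool)
    for row in records:
        village = row["village"]
        any_electrified[village] = any_electrified[village] or row["has_electricity"]
    return sorted(v for v, has_power in any_electrified.items() if not has_power)


def estimate_fly_ash_tonnes(coal_burned_tonnes, ash_content=0.35, fly_ash_share=0.80):
    """Estimate fly ash as coal burned x ash content x share of ash that is fly ash.

    Both fractions are assumed defaults for illustration only.
    """
    for name, value in (("ash_content", ash_content), ("fly_ash_share", fly_ash_share)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return coal_burned_tonnes * ash_content * fly_ash_share


if __name__ == "__main__":
    print("Villages to verify:", villages_without_school_electricity(SCHOOLS))
    daily_ash = [estimate_fly_ash_tonnes(t) for t in DAILY_COAL_BURNED_TONNES]
    print("Estimated daily fly ash (tonnes):", [round(x) for x in daily_ash])
```

The logic is deliberately simple. The point is that once granular data is published in a machine-readable form, checks like these take a few lines rather than weeks of manual collection.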
Time for a data unlock
India already releases consolidated reports, such as annual tax collections or monthly inflation indices.
Richer insights can emerge when more granular data is made available more frequently.
As AI models evolve, they need wide, varied, and verifiable inputs to train on. With access to robust public datasets, researchers, entrepreneurs, and investors can build tools that deliver sharper insights, enhance governance, and even predict macroeconomic inflection points.
We have seen this before. In an earlier article in this newspaper (India’s fiscal contract: more incomes in the tax net, April 19, 2024), we noted that publicly available tax data revealed that individual taxpayers’ contribution to the tax-to-GDP ratio rose steadily even though average tax rates remained flat, while corporate tax contributions stagnated despite lower effective rates.
Imagine what AI could surface if given access to more granular, real-time data across sectors.
A simple but powerful policy reform
The ask is straightforward: make publicly funded datasets public, in machine-readable formats, and updated frequently enough to be useful, while ensuring that privacy and security are not compromised.
Doing so will not just improve transparency; it will also catalyse innovation, enhance productivity, and give India a powerful competitive advantage in the global AI economy.
This article first appeared as an op-ed in the Financial Express.
Cover photo credit: The Financial Express
