A central government research consortium tasked with sequencing coronavirus variants nationwide has consistently delayed data release for scientific scrutiny and provided patchy data to its own scientists, disappointing them and undermining its own objectives.
The Indian National SARS-CoV-2 Genome Sequencing Consortium (Insacog) has over the past 15 months sequenced and analysed over 204,000 genomes. But only around 135,000 of those sequences are in any public database for independent scientific scrutiny, several Insacog scientists said.
The consortium is a network of 28 research labs across the country. It is intended to provide public health authorities with insights into locally circulating coronavirus variants and alert them about trends, if any, linking the variants to infection patterns or disease outcomes.
But several scientists have told The Telegraph that the consortium’s hub-and-spoke structure appeared to prioritise the central government’s control of access to the sequence data generated by the labs rather than making the data available for rapid scientific analysis.
“It is unfortunate. The sequence data is largely a one-way traffic — we know what sequences we have submitted from our lab, but we can’t see the sequences that other labs have submitted,” a senior scientist from an Insacog lab said.
Under the data submission protocols, Insacog labs need to upload their sequence data to a site named the Indian Health Information Portal, established by the National Centre for Disease Control (NCDC), New Delhi, an institution under the Union health ministry.
“The current protocols blind us to any real-time sequence data from across the country — only the NCDC has access to that nationwide data,” said another Insacog scientist. “And the information available to scientists is patchy and delayed.”
Two senior government officials familiar with the Insacog protocols said the hub-and-spoke structure represented an effort to maintain the quality of the sequence data and to prevent it from being misinterpreted or used irresponsibly in ways that could cause panic.
One of the two officials, asked by this newspaper why the lab scientists did not have unhindered access to the sequences, said: “Why should they? Sequence data needs to be viewed in the context of epidemiological data. We make the sequences available after analysis.”
A senior Insacog scientist conceded that sequence data was liable to be misinterpreted and needed verification because the task of assigning a lineage or a sub-lineage “depends on software and some amount of interpretation that could be tricky”.
But several scientists said they found it disconcerting that the Insacog data hub, the NCDC, should presume that it alone had the knowledge not to misinterpret the data.
“I think what the protocols reflect is an attempt to control scientists’ access to data generated by fellow scientists,” a scientist said.
Rajesh Gokhale, secretary in the Union science ministry’s department of biotechnology (DBT), which is funding and coordinating the Insacog programme, said it would be incorrect to say that scientists lacked access to the coronavirus sequencing data.
The DBT, he said, has initiated an effort to store all the Insacog sequences on a separate database, maintained at the Indian Biological Data Centre, which would serve as a repository for all academic researchers.
“Scientists need to seek access to the data through some documentation certifying that they would use the data only for research,” Gokhale said.
A senior scientist in an Insacog lab said the faster the data accessibility issue was resolved, the better.
“Data sharing can help with early detection of changes, if any, in the coronavirus,” another Insacog scientist said.
“We shouldn’t forget how lack of data sharing contributed to the delay in recognising the public health significance of the delta variant last year.”
The delta variant drove India’s brutal second Covid-19 wave last year. A microbiologist at a medical college in Pune had identified delta as a highly transmissible variant by mid-February. But slow follow-ups and a lack of data sharing meant the health authorities could attribute the wave to delta only in April.