“I feel my job as a statistician has been fascinating because I often feel data whisper stories in my ears and I just translate them into models,” said professor Bhramar Mukherjee, basking in the fading warmth of the winter sun at St. Xavier’s College.
“I want to see more integration of statistics into our day-to-day vocabulary, statistical literacy and curiosity permeating the way people think about the world and natural phenomena. However, that can only happen when artists, humanists, statisticians and philosophers all come to a room and discuss big ideas and better ways to communicate them,” said the 51-year-old Indian-American biostatistician and data scientist, who is currently serving as the inaugural senior associate dean of public health data science and data equity at Yale School of Public Health.
Mukherjee was recently in Calcutta to address a young audience at the South Asia International Conference on Data Management, Analytics & Innovation (ICDMAI) at St. Xavier’s College, Kolkata, and to deliver the P.K. Sen Memorial Lecture at the University of Calcutta.
A leading figure in biostatistics, Mukherjee counts among her research interests “critical data studies, analysis of electronic health records, studies of gene-environment interactions, shrinkage estimation, data integration and assessment of multiple environment pollutants”. She and her team drew international attention for tracking the trajectory of SARS-CoV-2 in India.
When it comes to laurels, there are plenty to mention: she is a fellow of the American Statistical Association and the American Association for the Advancement of Science, and was elected to the National Academy of Medicine in 2022. She is an overseas fellow at Churchill College, University of Cambridge, and holds a senior honorary visiting fellow appointment in the university’s biostatistics unit. Before Yale, she spent many years at the University of Michigan School of Public Health, where she was the John D. Kalbfleisch Distinguished University Professor of Biostatistics, the Siobán D. Harlow Collegiate Professor of Public Health, and the first woman chair of the department of biostatistics.
‘I will fight for data equity’
Mukherjee knows the importance of distinguishing achievements from accomplishments. Accolades have come in abundance, but she has never forgotten the first interview she gave to Anandamela at the age of 16, as a Class X student at Dum Dum Christ Church Girls’ High School. Nothing could deter her from fulfilling her dream of making an impact through her work and breaking glass ceilings as a woman in science.
“I try to stay deeply connected to Bengali literature, art and theatre, and obviously my upbringing is a part of that (her father, Ashok Mukhopadhyay of Theatre Workshop, is a celebrated thespian on the Calcutta stage). Almost every evening I attend a concert, a literary meet or a play. This is very much who I am. A friend of mine from the US asked me what Bengalis would talk about if (Satyajit) Ray or (Rabindranath) Tagore had not been born. I said that I didn’t want to think about that counterfactual world. They have completely defined my intellectual psyche. These two iconic geniuses have influenced almost every person in Bengal, directly or indirectly, and you will see that Bengalis have a very distinct philosophy of life, and you cannot decouple that philosophy of life from the philosophy of science that we pursue.
“I do not think my statistical science would be the same if I had not read Tagore, watched Ray’s films, known the Gananatya Andolon and seen what my father aimed to achieve through the theatre movement. For me, data equity is an outcome and a process that ensures everyone, regardless of their personal beliefs, attributes or circumstances, benefits equally from data science innovations,” said Mukherjee, who was an academic high-flier at Presidency College and the Indian Statistical Institute.
A regular visitor to Calcutta and Santiniketan, where her parents divide their time, Mukherjee stressed the importance of communication in science.
“In my family, there is an expectation that you should be able to communicate your scholarship in a language that everybody around the dining table can understand. Everyone has a chance to opine on your ideas. At family gatherings, we often hold debates, on arts versus science, for example. Everyone in the household, including the children, shares their thoughts. I grew up with the sense that everybody has a chance to speak and to argue respectfully about radically different ideas.”
For her, three Cs are essential for a modern statistician: computation, collaboration and communication.
“If you have a strong liberal arts foundation, if you have a knowledge of the world and human existence, it helps you with collaboration. You naturally connect with other scientists, whether a social scientist, a medical doctor, an astronomer, a geologist or an environmental health scientist, because you all think about improving the human condition.”
‘Build a massive biobank in India’
Mukherjee and her team played an active role in modelling the trajectory of SARS-CoV-2 in India during the Covid-19 pandemic. Now, with the entire world thinking about policies around artificial intelligence and data science, she has clear ideas about the leadership role India can play.
“I have been thinking a lot about data systems in India. Without good data, there is no good data science and no good science. Indian data scientists often work on publicly available data from other countries and focus on building elegant models and learning algorithms. However, every university, every IIT, every ISI should also push for good data, not just good models. Having diverse, fair, nationally representative data is essential for AI to make a positive social impact for everyone. For example, imagine a nationally representative Indian biobank of two million people, like the UK Biobank, with health data, genetic data and socio-demographic data on a secure server… released to the world in a privacy-preserving way, so that scholars from around the world could log in to the secure environment, analyse the data, write papers and make scientific advances using data collected on the beautifully diverse population of this country. How game-changing and transformative would that be? Nearly one-fifth of the world lives in India, and we need more data to understand this key population. Sharing data is the future of scientific research; the UK Biobank is a testament to that.”
She also believes that data deserves more respect. “There must be a grassroots movement around respecting data, both in terms of its quality and its quantity,” said Mukherjee. “Democratisation of data is not just the responsibility of a government; it is also the responsibility of citizens. We saw during Covid that the best datasets from India were actually crowd-sourced efforts, built from the contributions of the large number of talented citizens and professionals we have in India. Why don’t we continue this journey with citizen data science and put together data systems that the world could be proud of?”
During the Covid-19 pandemic, some countries produced seminal observations that helped the world, for example, detecting the emergence of new variants or showing that vaccine effectiveness was waning. Countries such as the UK, Israel and Denmark had excellent real-time, population-wide data accessible to scientists.
“These countries could do this because they have integrated data ecosystems with national coverage. In India, there were efforts to build outstanding data systems such as CoWIN or the Covid testing database, but it was nearly impossible to link the two because of a lack of interoperability and cross-talk. Creating encrypted identifiers for data linkage is essential to building a holistic, multi-modal dataset. I am really looking forward to the rollout of the ABDM (Ayushman Bharat Digital Mission, which aims to create an integrated digital health infrastructure). Ultimately, you have to empower the public with trustworthy data and the knowledge derived from it. As a data activist, I want to see data become a mass movement and a public priority, not something viewed as just the responsibility of the government.”
Completing the arc
What about her plans for the next decade? “We are building something special at the Yale School of Public Health, with our trailblazing dean Megan Ranney at the helm. For the first time in my career, I have a female boss.”
After a point, one’s success can and should be measured only through the success of others. “When I was an early-career researcher rising through the academic ranks, I would go to conferences in India where there was not a single woman on the speaker list. India has traditionally been extremely strong in statistics; think of the phenomenal discoveries led by P.C. Mahalanobis, C.R. Rao, R.R. Bahadur, S.N. Roy, J.K. Ghosh and many others… statisticians of Indian origin have made seminal contributions to world statistics over the years. There is a paucity of female statisticians on that distinguished list.
“One of the reasons I did the Covid work and engaged with the media is that I saw physicists, economists, medical doctors discussing models on national television. This is our lane; we need to be present and vocal as statisticians,” said Mukherjee.
She also “realised that only a handful of women were given a platform to talk about pure science, decoupled from politics or social issues… hardcore math, hardcore biology, hardcore immunology”. “I wanted to be present with my science,” she said.
“My biggest rewards from that period were the messages I received from other young women, for example, a high-school girl in Jharkhand who watched my interview and wrote to me: ‘What is Biostatistics?’ I have a steadfast commitment to growing the careers of other junior researchers, particularly women and other minority groups in science. As the first woman chair of the 75-year-old biostatistics department at the University of Michigan, I recruited many junior women, and they are now winning big prizes. I get immense satisfaction from watching their growth and success, as well as the success of my 25 doctoral and post-doctoral fellows. I am so proud of the 357 undergraduate summer programme students we trained over the last 10 years and the ripple effects of that programme. That probably completes the arc of that Anandamela interview from 35 years ago: making a difference in other people’s lives in a field that has traditionally not been welcoming to all.”
ICDMAI 2025 at St. Xavier’s College, Kolkata, brought together academicians, researchers and industry leaders to discuss advancements in AI, data science and machine learning. Giving shape to the conference were professor Kanad Basu, who holds the title of Professor of the Practice at USC Marshall School of Business in the US, and Saptarshi Goswami, a computer scientist and educator from Calcutta. The college offers Bengal’s first postgraduate course in data science, made possible under the guidance of Father Dominic Savio, principal of St. Xavier’s College, Kolkata.