The world of proteins remains remarkably mysterious. It turns out that a vast number of them have been hiding in plain sight. In a study published last month, scientists revealed 4,208 previously unknown proteins that are made by viruses such as influenza and HIV. Researchers elsewhere have been uncovering thousands of other new proteins in bacteria, plants, animals and even humans.
“If we ever want to understand fully how our biology works, we have to have a complete accounting of all the parts,” said Thomas Martínez, a biochemist at the University of California, Irvine, US.
In the early days, finding a new protein was largely a matter of luck. In 1840, for example, Friedrich Ludwig Hünefeld, a German chemist, became curious about earthworm blood. He collected blood from a worm and put it on a glass slide. When he looked through a microscope, Hünefeld noticed platelike crystals: he had discovered haemoglobin.
A century later, scientists accelerated the search for proteins by working out how our bodies make them. Each protein is encoded by a gene in our DNA. To make a protein, our cells make a copy of this gene in the form of a molecule called messenger RNA, or mRNA. Then a cellular factory called a ribosome grabs the messenger RNA and uses it to assemble the protein from building blocks.
The search accelerated again when scientists began sequencing entire genomes in the 1990s. Researchers could scan a genome for protein-coding genes, even if they had never seen the protein before. Scanning the human genome led to the discovery of about 20,000 protein-coding genes.
But scientists later discovered that they were actually missing a lot of proteins by searching this way.
Once more, the discovery came by accident. Researchers at the University of California, San Francisco, US, wanted to monitor the proteins that cells made. They figured out how to fish ribosomes from cells and inspect the messenger RNA that was attached to them.
The method, called ribosome profiling, delivered a surprise. On closer inspection, many of the messenger RNA molecules did not correspond to any known gene. Previously unknown genes were making previously unknown proteins.
In the years that followed, scientists learned how genome scanning had led them to miss so many proteins. For one thing, they thought they could recognise protein-coding genes by a distinctive sequence of DNA that told a cell to start copying a gene. It turns out that a lot of genes don’t share that start sequence.
Scientists also assumed that most proteins were big, made of hundreds or even thousands of building blocks known as amino acids. The thinking was that proteins needed to be big in order to carry out complex chemistry. But in fact, many of the new proteins turning up were shorter than 100 amino acids. Some of these microproteins contain just a couple dozen amino acids.
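To see how those two assumptions could hide proteins, here is a minimal sketch of a genome scan in Python. It is a toy illustration, not the software any of these researchers used. It assumes the "start sequence" is the common start codon ATG, that a gene ends at one of the three standard stop codons, and that anything shorter than 100 amino acids is discarded. The function name `find_orfs`, the toy `genome` and the choice of CTG as an alternative start signal are all hypothetical, chosen only for illustration.

```python
# The three standard stop codons, written as DNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, start_codons=("ATG",), min_aa=100):
    """Scan the forward strand in all three reading frames.

    Returns a list of (start, end, n_amino_acids) tuples for every stretch
    that begins with an accepted start codon, ends at a stop codon, and
    encodes at least `min_aa` amino acids (the stop codon is not counted).
    """
    dna = dna.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon in start_codons:
                start = i  # open a candidate gene
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if n_aa >= min_aa:
                    hits.append((start, i + 3, n_aa))
                start = None  # close the candidate and keep scanning
    return hits


# A made-up "genome" holding three made-up genes back to back.
micro = "ATG" + "GCT" * 24 + "TAA"      # 25 amino acids, normal start
large = "ATG" + "GCT" * 149 + "TGA"     # 150 amino acids, normal start
unusual = "CTG" + "GCT" * 30 + "TAG"    # 31 amino acids, unusual start
genome = micro + large + unusual

classic = find_orfs(genome)                      # ATG only, 100 or longer
relaxed_size = find_orfs(genome, min_aa=10)      # ATG only, small allowed
relaxed_both = find_orfs(genome, start_codons=("ATG", "CTG"), min_aa=10)

print(len(classic), len(relaxed_size), len(relaxed_both))
```

Under the classic settings, the scan reports only the large gene. Lowering the size cutoff reveals the microprotein, and accepting a second start signal reveals the third gene as well. Real genomes are far messier, which is why methods that look directly at what ribosomes are doing proved so useful.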
Other scientists have been uncovering a similar abundance of microproteins in other species. “All these studies in all these organisms have discovered a new universe of proteins that previous methods failed to detect,” said Shira Weingarten-Gabbay, a systems biologist at Harvard Medical School in the US.
As a graduate student, Weingarten-Gabbay became interested in looking for hidden proteins in viruses. But it’s a challenge: scientists must infect human cells with viruses, then wait for the cells’ ribosomes to start grabbing viral messenger RNA and making proteins.
Unfortunately, scientists don’t know how to grow a lot of human viruses quickly in the lab. And even when scientists can coax them to grow, the experiments still take a long time to carry out because of the safeguards required to make sure nobody gets sick. When the Covid-19 pandemic started in 2020, Weingarten-Gabbay and her colleagues carried out a ribosome profiling study on the new coronavirus. It took four months.
“The truth is that for the great majority of the viruses, we don’t have information on these hidden microproteins,” Weingarten-Gabbay said.