A Covid-19 blog post for the non-expert
The story that Covid-19 is a genetically engineered virus from a Wuhan laboratory has gone viral across the internet (dated pun intended). President Trump implied it, and the Daily Express ran a 10th March headline: “Coronavirus may have been genetically engineered for the efficient spreading in the human population, a bombshell new study has claimed.” The article was withdrawn soon after.
The title of this post is deliberately provocative, and it may attract conspiracy theorists, so I’ll say up front: there is no evidence whatsoever that SARS-CoV-2, the virus that causes Covid-19, was genetically engineered in China or anywhere else. Instead, everything points towards completely natural origins, which Charles Darwin would recognise (and probably say, “I told you so”). If you’re not a conspiracy theorist and you want to know why I’m confident this is the case, read on.
Everyone has heard of Covid-19 (caused by SARS-CoV-2), Middle East Respiratory Syndrome (caused by MERS-CoV) and Severe Acute Respiratory Syndrome (caused by SARS-CoV). These are well known because the infections spread across multiple countries, but there are four other human coronaviruses, HKU1, NL63, OC43 and 229E, which cause milder illness and are therefore less well known. In all seven cases, the coronavirus jumped from an animal species into humans. How do we know this? It goes back to the genetic code and Darwinian evolution.
The molecular structure of DNA comprises bases called purines and pyrimidines pointing inwards from a double-helical backbone of phosphate and deoxyribose sugar. There are two purines, called adenine and guanine, and two pyrimidines, called thymine and cytosine. The DNA molecule is constructed somewhat like a child’s magnetic building kit, where the north and south poles of magnets stick together to make a cube, a pyramid or some similar shape. The child soon discovers that although north and south poles stick together, north-north or south-south poles repel. They can’t build their pyramid from magnets joined by like poles, no matter how hard they try. Like the north and south poles of magnets, the bases of DNA can only join pyrimidine to purine, never pyrimidine to pyrimidine or purine to purine. In fact, they are more specific than that, because adenine (A) always bonds to thymine (T) and guanine (G) always bonds to cytosine (C). The sequence of A-T and G-C pairs in DNA carries genetic information written in the genetic code, and the complete sequence is known as the genome. It encodes all the processes of life that make everything from viruses to you and me. (There are some great animations on DNA and its function here).
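For readers who like to tinker, the pairing rules above can be written as a tiny Python sketch. The function names are my own, chosen for illustration rather than taken from any library. It simply looks up each base’s partner, and it ignores the detail that the real partner strand runs in the opposite direction.

```python
# Pairing rules described above: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the partner base for each base in a DNA strand.

    Note: the real partner strand runs antiparallel, so it would be read
    in reverse; this sketch keeps the same left-to-right order for clarity.
    """
    return "".join(PAIRS[base] for base in strand.upper())


def is_valid_pair(a: str, b: str) -> bool:
    """True only for A-T or G-C pairs; purine-purine or pyrimidine-pyrimidine fails."""
    return PAIRS.get(a.upper()) == b.upper()


if __name__ == "__main__":
    print(complement("GGTACC"))      # CCATGG
    print(is_valid_pair("A", "T"))   # True
    print(is_valid_pair("A", "G"))   # False (two purines)
```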
How does a sequence of A-T and G-C pairs give rise to the complexity of all life? The DNA helix unwinds and acts as a template for making ribonucleic acid (RNA). RNA is like DNA except that it has ribose instead of deoxyribose, and instead of thymine it has another base called uracil (U). Putting aside the complexities of the three types of RNA (messenger RNA or mRNA, transfer RNA or tRNA, and ribosomal RNA or rRNA), the RNA is translated into a series of amino acids. Each amino acid is coded by a sequence of three bases: GGT (GGU in RNA), for example, codes for the amino acid glycine. By joining together long strings of amino acids, we make proteins, and proteins control all the biochemistry of life. To quote from a previous blog post:
“Proteins are hugely complex molecules made from strings of up to 20 distinct types of amino acids. Some proteins contain hundreds or even thousands of individual amino acids; the muscular protein, titin has 30,000 of them. The long strings of amino acids fold like tangled pieces of string but unlike string, the tangles are very precise. Proteins have molecular grooves and pockets where specific biochemical reactions take place. The grooves and pockets are analogous to spanners and wrenches in a biochemical tool kit, each fitting a particular sized nut or bolt in building the machinery of life.”
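The DNA to RNA to protein steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the input is the coding-strand DNA sequence, and the codon table is deliberately partial (only 14 of the 64 codons, so any other codon raises a KeyError). The function names are hypothetical.

```python
# Deliberately partial codon table (RNA alphabet), for illustration only.
CODON_TABLE = {
    "AUG": "Met",                                            # methionine, the usual "start" codon
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",  # glycine
    "GCU": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",  # alanine
    "UUU": "Phe", "UUC": "Phe",                              # phenylalanine
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def transcribe(dna: str) -> str:
    """Turn a coding-strand DNA sequence into mRNA: thymine (T) becomes uracil (U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Read the mRNA three bases at a time until a stop codon (KeyError if a codon is missing)."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


if __name__ == "__main__":
    rna = transcribe("ATGGGTGCTTTTTAA")
    print(rna)             # AUGGGUGCUUUUUAA
    print(translate(rna))  # ['Met', 'Gly', 'Ala', 'Phe']
```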
Coronaviruses take a shortcut that leaves out DNA and goes directly to RNA, but it’s still the sequence of bases that codes for the virus’s existence. The virus takes over the genetic machinery of mammalian cells, mostly human lung cells in the case of SARS-CoV-2, to make viral proteins, which assemble into lots of new viruses. The human cells are destroyed in the process, giving us the symptoms of Covid-19. If you wanted to engineer a coronavirus pandemic, you would have to start with its RNA, and it’s that same genetic sequence that tells us SARS-CoV-2 was made by Mother Nature herself.
Coronaviruses jump from one species to another, which makes them so-called zoonotic viruses, but this isn’t unusual. In fact, perhaps three quarters of newly emerging human infectious diseases are zoonotic, or at least transmitted through a vector such as mosquitoes. SARS-CoV-2 most probably started out in horseshoe bats, based on its RNA sequence being 96% identical to that of RaTG13, a coronavirus found in horseshoe bats. A related RNA sequence in SARS-CoV-2 also implicates the scaly anteater, or pangolin, as a possible intermediary species before the virus infected humans. The origins of SARS-CoV-2 are written right there in its genetic code, but we have to be careful, because there’s much we don’t know about the viruses carried by the thousands of mammalian species across the world. We do know that the vast majority of species hops result in a cul-de-sac for the virus, and it’s only extremely rarely that a virus jumps into humans and is then able to transmit from human to human. It seems, however, that the mutation of a RaTG13-like virus into SARS-CoV-2 is one of those rare cases.
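To make the 96% figure concrete, here is a toy Python sketch of percent identity. It assumes the two sequences are already aligned base for base; real genome comparisons use alignment software that also handles insertions and deletions, which this ignores. The sequences below are made up. For a genome of roughly 30,000 bases, a 4% difference works out to around 1,200 mismatched positions.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two pre-aligned, equal-length sequences agree."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    toy_bat_virus = "ACGUA" * 5                             # 25 made-up bases
    toy_human_virus = "ACGUA" * 2 + "ACCUA" + "ACGUA" * 2   # one base changed (G -> C)
    identity = percent_identity(toy_bat_virus, toy_human_virus)
    print(f"{identity:.0f}% identical, {100 - identity:.0f}% different")  # 96% identical, 4% different
```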
But what about the 4% difference in RNA between RaTG13 and SARS-CoV-2: where did that come from? The genome of RaTG13 changed for the same reason the genome changes in any organism. Mutations arise as random errors when the genetic material is copied. Some mutations are fatal and the organism dies, and some give it a small advantage so that it perpetuates through the generations. Once a chance mutation in RaTG13 RNA gave it the advantage of human-to-human transmission, it thrived, giving us Covid-19. This random mutation, followed by natural selection, is what defines Darwinian evolution. It’s happening all the time; it’s just that we don’t notice it until there’s a pandemic.
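A rough Python sketch of copying errors follows. The per-base error rate is a made-up, illustrative number rather than a measured coronavirus mutation rate, and the sketch models only single-base substitutions, with no insertions, deletions or selection. At the assumed rate, a 30,000-base genome picks up about three changes per copy on average.

```python
import random

BASES = "ACGU"


def copy_with_errors(genome: str, error_rate: float, rng: random.Random) -> str:
    """Copy an RNA genome base by base; each base is miscopied with probability error_rate."""
    copied = []
    for base in genome:
        if rng.random() < error_rate:
            # Point mutation: swap in one of the three other bases at random.
            copied.append(rng.choice([b for b in BASES if b != base]))
        else:
            copied.append(base)
    return "".join(copied)


def count_differences(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(42)   # fixed seed so the run is repeatable
    genome = "ACGU" * 7500    # a toy 30,000-base genome
    ERROR_RATE = 1e-4         # illustrative assumption, not a measured rate
    for n in range(1, 4):
        child = copy_with_errors(genome, ERROR_RATE, rng)
        print(f"copy {n}: {count_differences(genome, child)} changed bases")
```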
Not all parts of a genome are equally susceptible to mutation, and one notable region of the SARS-CoV-2 RNA is particularly prone to variation. This is the region that codes for the so-called spike protein, which recognises a receptor on the mammalian cell, enabling the virus to enter that cell and infect it. It’s this variable region of RNA in coronaviruses generally that gives them their nasty species-hopping talent. From analysis of the RNA sequence, all seven coronavirus outbreaks likely originated in other mammalian species through variations in the RNA coding for spike proteins. By looking at the RNA sequence in this variable region, we know that both MERS-CoV and SARS-CoV, like SARS-CoV-2, most likely originated in bats, while the milder HKU1 coronavirus originated in mice and OC43 likely came from cattle. Scientists have identified the precise differences between RaTG13 and SARS-CoV-2 RNA, and the corresponding amino acid differences in the spike protein, that enable SARS-CoV-2 to latch onto a human cell-surface protein called ACE2, a relative of ACE, leading to infection (I blogged about ACE previously). From the rate of coronavirus mutation, we can estimate that this change probably occurred sometime in the last 40 to 70 years. There’s nothing unusual in the way RaTG13 mutated into SARS-CoV-2; indeed, it’s exactly what you would expect, given the right opportunity, as humankind expands into previously uninhabited ecosystems.
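The molecular-clock idea behind such estimates can be sketched in Python, assuming a simple strict clock: after two lineages split, each accumulates changes independently at the same rate, so the distance between them grows twice as fast as either lineage changes. The rate below is an illustrative assumption, not a measured value, and published estimates rely on far more careful methods (for example, allowing for recombination and for sites that mutated more than once). The answer here is only as good as its made-up inputs and shouldn’t be read as confirming the published estimate.

```python
def divergence_time_years(distance_per_site: float, rate_per_site_per_year: float) -> float:
    """Strict molecular clock estimate of the time since two lineages split.

    Both lineages change independently after the split, so the distance
    between them is d = 2 * rate * t, which rearranges to t = d / (2 * rate).
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return distance_per_site / (2.0 * rate_per_site_per_year)


if __name__ == "__main__":
    d = 0.04      # the ~4% raw genome difference quoted above
    rate = 4e-4   # illustrative assumption: substitutions per site per year
    print(round(divergence_time_years(d, rate)))  # 50
```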
Worryingly, the mutation that gave us SARS-CoV-2 did not optimise the spike protein’s infectivity: Covid-19 is actually a milder disease than nature might have given us, and there is room for it to get worse. Even more worryingly, a variant was recently identified in which another change in the variable spike-protein region appears to have increased SARS-CoV-2’s infectivity. The trouble is, a single error in copying the virus’s roughly 30,000 bases of RNA can replace one amino acid in a protein with another, thereby changing the protein’s function. Where GGT codes for glycine, for example, changing just one base to give GCT codes instead for the amino acid alanine.
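The glycine-to-alanine example can be extended with a small Python sketch that tries every single-base change to GGT and reports whether the amino acid changes. The codon table holds only the ten codons this example needs (GGT plus its nine single-base neighbours), and the function names are hypothetical. Of the nine possible changes, the three at the third position still code for glycine (a so-called synonymous change), while the other six swap in a different amino acid.

```python
# Only the codons needed for this example (DNA alphabet, standard genetic code).
CODONS = {
    "GGT": "Gly", "GGA": "Gly", "GGC": "Gly", "GGG": "Gly",
    "AGT": "Ser", "CGT": "Arg", "TGT": "Cys",
    "GAT": "Asp", "GCT": "Ala", "GTT": "Val",
}


def point_mutation(codon: str, position: int, new_base: str) -> str:
    """Return the codon with one base (0, 1 or 2) replaced."""
    return codon[:position] + new_base + codon[position + 1:]


def classify(original: str, mutant: str) -> str:
    """Say whether a codon change keeps the same amino acid or swaps it."""
    before, after = CODONS[original], CODONS[mutant]
    return "synonymous" if before == after else f"missense: {before} -> {after}"


if __name__ == "__main__":
    for position in range(3):
        for base in "ACGT":
            if base != "GGT"[position]:
                mutant = point_mutation("GGT", position, base)
                print(mutant, classify("GGT", mutant))
```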
The problem with all conspiracy theories, whether about genetically engineered SARS-CoV-2 or fake moon landings, is that they use a simple lie to hide a complex truth. As soon as you get below the surface, conspiracy theories lack detail and rely instead on the idea of vast networks of people, all somehow keeping the dastardly secret: all the world’s virologists and molecular geneticists, for example, conspiring to keep genetically engineered SARS-CoV-2 from the unknowing public. And the fact that I’ve just explained the natural process by which SARS-CoV-2 arose just makes me part of the conspiracy. I’d like to explain more, but I’m scheduled to attend an Illuminati meeting in Atlantis, so I’ll see you next time.