The Philippine Society for Cell Biology, Inc. (PSCB) and the Philippine Genome Center (PGC) hosted a webinar titled Community-acquired COVID-19 infections in the Philippines: Knowledge gleaned from SARS-CoV-2 Whole Genome Sequencing last June 6 via Zoom. In the online event, PGC Executive Director Dr. Cynthia Saloma tackled the collaborative work of local scientists using different Genomics tools to analyze the COVID-19 outbreak in the country.
One of the major pillars of Genomics is sequencing, that is, determining the nucleotide base arrangement of all or part of an entity’s genetic material, be it deoxyribonucleic acid (DNA) or ribonucleic acid (RNA). Mapping the genome can then branch out into several endeavors, including the construction of phylogenetic trees, which are branched diagrams that show how genetically similar, and therefore how closely related, different species are.
In turn, this can be useful in Epidemiology, or the study of the incidence and spread of diseases. Applied to the current COVID-19 outbreak, examining the sequences obtained from different cases can enable tracing of the virus’ transmission route and source of infection.
Such was the premise for the research work pursued by Saloma and an extensive team comprising various experts and students from PGC, the National Institute of Molecular Biology and Biotechnology (NIMBB), the University of the Philippines system, and the Philippine General Hospital.
As framed by Dr. Jose Enrico Lazaro from the NIMBB and PSCB in his opening spiel, “We are making headway in understanding the virus and its pathology…[studying] the origin and relatedness of viral strains in a given place. Question here is, is [the spread of] COVID-19 in the Philippines largely community-based, or [is it] transmitted in some other way?”
Sequencing patient samples
Discussing the phases involved in their research, Saloma noted that a total of 500 samples were collected from partner hospitals in late March and subjected to reverse transcription polymerase chain reaction (RT-PCR) testing. Those deemed positive for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19, were processed through capillary sequencing, whereas the negatives were analyzed using next-generation sequencing. Identifying the normal sequences present in negative cases is important in distinguishing the genetic material belonging to viral strains.
The downstream workflow for positive samples included the “removal of low quality [nucleotide] bases and filtering out of non-target sequences, like human genes,” according to Saloma. This was meant to isolate the target sequences: in this case, the contigs or segments of the virus’ genetic material. Finally came the scaffolding and alignment step, in which the sample segments were matched with known sequences of the SARS-CoV-2 genome as the reference material, and compared with other isolates for phylogenetic analysis.
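The two ends of this workflow can be illustrated with a minimal Python sketch. It is not the team's actual pipeline: the function names, the quality cutoff of 20, and the toy sequences are all assumptions made for illustration, and real pipelines rely on dedicated trimming and alignment tools. The first function trims low-quality bases from the end of a read, and the second lists single-base differences between a sample and a reference that have already been aligned to the same length.

```python
def trim_low_quality(read, quals, min_q=20):
    """Drop bases from the 3' end until one meets the quality cutoff.

    `quals` holds one Phred-style score per base; `min_q` is an assumed cutoff.
    """
    if len(read) != len(quals):
        raise ValueError("read and quality list must have the same length")
    end = len(read)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return read[:end]


def call_substitutions(reference, sample):
    """List single-base differences between two pre-aligned sequences.

    Positions are 1-based. Gaps ('-') and ambiguous bases ('N') are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for pos, (ref, alt) in enumerate(zip(reference, sample), start=1):
        if ref in "N-" or alt in "N-":
            continue
        if ref != alt:
            variants.append(f"{ref}{pos}{alt}")
    return variants


# Toy data, not taken from the study.
trimmed = trim_low_quality("ACGTAC", [30, 30, 30, 30, 10, 5])
differences = call_substitutions("ACGTAC", "ACTTNC")
```

In this toy case the last two bases of the read fall below the cutoff and are removed, and the sample differs from the reference at one unambiguous position.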
For demonstration purposes, Saloma presented only a subset of six positive cases. As it turned out, these sequences shared five variants, which “represent [an] account of the unique mutational events [that occurred] across the phylogeny.”
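Finding the variants that several samples have in common amounts to a set intersection. The sketch below assumes each sample's variants have already been called and written as simple labels; the sample names and variant labels are hypothetical and are not the five variants reported in the webinar.

```python
def shared_variants(variants_by_sample):
    """Return the variants that appear in every sample."""
    sets = [set(v) for v in variants_by_sample.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical variant calls for three samples.
calls = {
    "sample_1": ["A1G", "C5T", "G9A"],
    "sample_2": ["C5T", "G9A"],
    "sample_3": ["G9A", "C5T", "T2C"],
}
common = shared_variants(calls)
```

Only the labels present in all three lists survive the intersection; variants seen in one or two samples are dropped.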
Genes encode sequences of amino acids, which are then linked together to form protein molecules. Among the shared variants, four were missense mutations, each translating into a different amino acid in the proteins produced, while the other was a silent mutation resulting in an unchanged amino acid sequence. While a difference of one amino acid may not seem like a big deal, these substitutions can actually have significant consequences for the structure and function of the final protein.
Structural modeling, which uses software to predict the features of a protein from its constituent amino acid sequence, supported such possibilities, Saloma relayed. For example, one of the missense mutations causes a change from leucine to phenylalanine in the NSP6 protein, which could reduce the flexibility of its helical structure and decrease its overall stability.
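The distinction between missense and silent mutations can be made concrete with a short Python sketch that translates a codon before and after a change using the standard genetic code. The codons below are illustrative only and are not the specific variants found in the study; sequences are written with T rather than U, as is common for deposited sequence data.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_mutation(ref_codon, alt_codon):
    """Classify a codon change as silent, missense, or nonsense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"      # same amino acid, protein sequence unchanged
    if alt_aa == "*":
        return "nonsense"    # codon became a stop signal
    return "missense"        # a different amino acid is incorporated


# Illustrative codons: TTG and CTG both encode leucine (L); TTT encodes phenylalanine (F).
leu_to_phe = classify_mutation("TTG", "TTT")
leu_to_leu = classify_mutation("TTG", "CTG")
```

Here a single-base change in the third position swaps leucine for phenylalanine, while a change in the first position leaves leucine in place, because several codons map to the same amino acid.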
Finding the source
The isolates also offer insight into tracking disease spread on a global scale. Constructed based on deposited sequences in the GISAID database, the “global subsampling tree” depicted how the local samples clustered with other countries’ submitted sequences, with the transmission path inferred to be from China to India to the Philippines.
From this, Saloma stated that the conjecture many would likely arrive at is that “some samples from India may be ancestral to the Philippine samples.” However, she conceded that such an explanation seemed rather improbable and illogical.
There is a caveat, though, with how the data was processed.
“The time of transmission [used in the subsampling] is based on the reported sample collection date, but the virus may have already been circulating in the community days or even weeks prior,” Saloma pointed out, cautioning that the transmission analysis assumes that the disease spreads along a linear course, when this may not necessarily be the case. Further, only a limited number of samples can be analyzed at a given time, suggesting that “the choice of samples to include in the subset will have [an] impact on the interpretation of results,” she added.
When the analysis was narrowed down to an Asia-only subsampling tree, the hypothesized infection route gained Japan between China and India. This insertion made quite the difference, as Saloma laid out the groundwork for the viral infection narrative to unfold.
In addition to the subsampling tree, the team performed phylodynamic analysis, a method for plotting the evolutionary and epidemiological behavior of infectious agents (how they mutate and spread), still using the genetic sequences from the GISAID database.
Clusters, described as sets of closely related or highly similar sequences, revealed that the Philippine samples grouped primarily with India and Japan’s deposits, lending support to the aforementioned transmission route. “These older samples [from Japan and India] can be used to trace the origins of the Philippine isolates we have,” Saloma explained.
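The idea of grouping highly similar sequences can be sketched in Python by counting base differences between aligned sequences and linking any pair that differs by at most a chosen number of bases. This is a simplified stand-in, not the phylodynamic method used by the team, which models evolution over time; the threshold, sample labels, and sequences below are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Count positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def cluster(seqs, max_diff=1):
    """Group sequences linked by chains of pairs differing by <= max_diff bases."""
    parent = {name: name for name in seqs}

    def find(name):
        # Follow parent links to the group's representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if hamming(seqs[a], seqs[b]) <= max_diff:
            parent[find(a)] = find(b)   # merge the two groups

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=sorted)


# Hypothetical toy sequences, far shorter than a real viral genome.
samples = {
    "PH-1": "ACGTACGT",
    "PH-2": "ACGTACGA",
    "JP-1": "ACGTACGT",
    "CN-1": "TTGTACCC",
}
clusters = cluster(samples)
```

With these toy inputs, the two PH sequences and the JP sequence fall into one group because they differ by at most one base, while the CN sequence, four bases away from the others, stands alone.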
Here, the dots could be connected into a cohesive story: the Japanese samples were collected from the Diamond Princess cruise ship that was docked in Yokohama in mid-February. If one were to recall the news headlines at that time, the ship had become a disastrous nexus for the COVID-19 outbreak, with hundreds infected and forced to quarantine on board. Aboard the ship were many Filipino and Indian crew members, who were eventually repatriated by the end of February.
Although they underwent a 14-day quarantine, the first local positive cases dated March 11 to 12 had come from the seafarers, Saloma recounted. Given that the crew lived in close quarters and handled the bedding of possibly infected passengers without proper protective attire, she added, it is easy to see how the virus would have circulated among them.
The available information thus pointed to the transmission having come “probably not from India, but most likely from the Diamond Princess cruise ship,” Saloma surmised.
Tedious and collaborative in nature, the team’s work remains an ongoing endeavor, but these preliminary results have demonstrated the relevant use of Genomics tools in better understanding how viral outbreaks unfold. It is also a cautionary tale about data interpretation; looking at the data from different scales and incorporating different approaches may be necessary to paint a more comprehensive and contextualized picture. Moving forward, these methods hold the potential to draw from a deluge of data and help map out clusters of related infected cases.