Time-Series Trend of Pandemic SARS-CoV-2 Variants Visualized Using Batch-Learning Self-Organizing Map for Oligonucleotide Compositions

(A) NUMBER OF SEQUENCES WITH LESS THAN 10% UNKNOWN NUCLEOTIDES
CLADE\CONTINENT	ASIA	EUROPE	NORTH AMERICA	OCEANIA	AFRICA	SOUTH AMERICA	UNKNOWN	TOTAL
S	794	1,860	3,449	664	110	74	0	6,951
L	823	3,196	600	65	4	11	0	4,699
V	247	4,687	402	253	13	23	0	5,625
G	979	20,928	6,568	1,106	1,141	461	0	31,183
GH	2,058	10,325	23,916	964	232	176	0	37,671
GR	2,657	42,888	5,251	11,135	1,632	1,129	0	64,692
GV	3	12,229	3	14	0	0	0	12,249
O	2,220	1,127	553	531	60	25	0	4,516
Non-human host	35	247	19	0	1	4	13	319
#Total	9,816	97,487	40,761	14,732	3,193	1,903	13	167,905
(B) NUMBER OF SEQUENCES WITH LESS THAN 1% UNKNOWN NUCLEOTIDES
CLADE\CONTINENT	ASIA	EUROPE	NORTH AMERICA	OCEANIA	AFRICA	SOUTH AMERICA	UNKNOWN	TOTAL
S	731	1,047	3,056	466	71	58	0	5,429
L	760	1,964	549	49	2	10	0	3,334
V	228	3,036	366	207	10	17	0	3,864
G	877	15,200	5,071	858	634	300	0	22,940
GH	1,923	8,365	19,014	717	191	150	0	30,360
GR	2,425	32,518	4,549	9,166	1,180	871	0	50,709
GV	3	10,712	3	11	0	0	0	10,729
O	1,824	522	349	415	30	9	0	3,149
Non-human host	30	176	19	0	1	0	13	239
#Total	8,801	73,540	32,976	11,889	2,119	1,415	13	130,753

BLSOM for pentanucleotide usage. **(A)** Pentanucleotide composition and **(B)** their odds ratio for sequences with less than 10% unknown nucleotides. **(C)** Pentanucleotide composition and **(D)** their odds ratio for sequences with less than 1% unknown nucleotides. Lattice points that include sequences from more than one clade are indicated in black, those that contain no genomic sequences are indicated by blank, and those containing sequences from a single clade are indicated in color as follows: S (), L (), V (), G (), GH (), GR (), GV (), O (), non-human host (). **(E)** Distribution of sequences by continent on the BLSOM with the pentanucleotide odds ratio. Lattice points that include sequences from more than one continent are indicated in black, those that contain no genomic sequences are indicated by blank, and those containing sequences from a single continent are indicated in color as follows: Asia (), Europe (), North America (), Oceania (), Africa (), South America ().

3D display of viral classification by clade and continent. The Z-axis corresponds to the number of sequences attributed to each lattice point. Results for all continents are shown in the ALL panel for each clade. In clades G, GH, GR and GV, lattice points where less than 5 sequences exist are not shown. The vertical bars for individual continents are distinguished by the following colors: Asia (), Europe (), North America (), Oceania (). Different subclusters are given suffix numbers.

3D display of temporospatial changes. The Z-axis corresponds to the number of sequences attributed to each lattice point. Results for all collection months are shown in the ALL panel for each continent. The vertical bars for individual clades are distinguished by the following colors: S (), L (), V (), G (), GH (), GR (), GV ().

Analysis of 100% stack bar graph for time-series transition in each continent for each subcluster in clades S **(A)**, L **(B)**, V **(C)**, G **(D)**, GH **(E)**, and GR **(F)**. The colors of each subcluster are indicated at the bottom of each figure. The results for months with more than 100 sequences are shown as thick horizontal bars. The number of sequences used in this analysis is given in Supplementary Table S1.

References

Authors

Metrics

Articles in this issue

DOI: https://doi.org/10.5334/dsj-2021-029 | Journal eISSN: 1683-1470

Journal RSS Feed

Language: English

Page range: 29 - 29

Submitted on: Mar 23, 2021

Accepted on: Sep 10, 2021

Published on: Sep 21, 2021

Published by: Ubiquity Press

In partnership with: Paradigm Publishing Services

Publication frequency: 1 issue per year

Keywords:

COVID-19,

SARS-CoV-2,

Oligonucleotide composition,

Batch-Learning Self-Organizing Map (BLSOM),

Unsupervised explainable machine learning,

Time-series trend

© 2021 Takashi Abe, Ryuki Furukawa, Yuki Iwasaki, Toshimichi Ikemura, published by Ubiquity Press
This work is licensed under the Creative Commons Attribution 4.0 License.

Volume 20 (2021): Issue 1

Time-Series Trend of Pandemic SARS-CoV-2 Variants Visualized Using Batch-Learning Self-Organizing Map for Oligonucleotide Compositions

Figures & Tables

Table 1

Figure 1

Figure 2

Figure 3

Figure 4

Paradigm

My account