Morphological characterization of the plant material
The morphological characteristics of the accession (ICROPS 1374) align with the descriptions provided by
Don (1827).
Allium cepa var.
aggregatum (ICROPS 1374; Fig 1) exhibits green foliage with a shaft that is 29.67±1.34 mm long and 3.52±0.23 mm in diameter. The bulb is broadly elliptic and aggregated, with thin, deep purplish-red skin and white flesh. The bulbs also have protective bulb-coat leaves that are purplish, brown, or white (
Geerinck, 1993). Each aggregate is composed of 5 to 8 bulbs; this “dividing” habit constantly appears in breeding lines of the common large-bulbed forms of
A. cepa (
Jones and Mann, 1963).
Complete genome assembly and annotation of Allium cepa var. aggregatum
The assembled complete cp genome of
Allium cepa var.
aggregatum (GenBank no. OP756522) had a typical quadripartite circular structure and a sequence length of 153,587 bp, which is longer than the chloroplast genomes of its most closely related
Allium species,
Allium cepa (153,529 bp; KM088013.1),
Allium cepa strain CMS-T (153,568 bp; KM088015.1) and
Allium cepa genotype normal (N) (153,538 bp; KF728080.1) available in NCBI. It includes a pair of inverted repeats (IRa and IRb) of 26,468 bp each that separate the rest of the genome into two single-copy regions: the LSC region (17,931 bp in length) and the SSC region (17,031 bp in length) (Fig 2). The overall GC content of the cp genome is 36.8%, with a base composition of 31.3% A, 31.9% T, 18.1% G and 18.7% C. A total of 126 genes were annotated in the cp genome of
A. cepa var.
aggregatum, comprising 38 tRNA genes, 8 rRNA genes and 80 protein-coding genes. Based on the functions of these genes, there are 43 photosynthesis-related genes, 29 self-replication genes, 6 other genes with different functions (
matK, cemA, accD, clpP, ccsA and
infA) and 1 conserved open reading frame of unknown function (Table 1). This is the first reported complete chloroplast genome of
A. cepa var.
aggregatum.
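The summary figures reported above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not the pipeline used in this study: the dictionary and function names are illustrative assumptions, and the numeric values are copied directly from the text.

```python
# Illustrative sanity checks on the reported plastome summary statistics.
# All names here are hypothetical; the values are taken from the text above.

# Reported base composition (percent of the plastome).
base_pct = {"A": 31.3, "T": 31.9, "G": 18.1, "C": 18.7}

# Reported gene counts by type.
gene_counts = {"tRNA": 38, "rRNA": 8, "protein_coding": 80}


def gc_from_composition(pct):
    """GC content (%) as the sum of the G and C percentages."""
    return round(pct["G"] + pct["C"], 1)


def gc_from_sequence(seq):
    """GC content (%) computed directly from a nucleotide string."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return round(100.0 * gc / len(seq), 1)


def total_genes(counts):
    """Total number of annotated genes across all gene types."""
    return sum(counts.values())


if __name__ == "__main__":
    print("GC content (%):", gc_from_composition(base_pct))
    print("Base percentages sum to:", round(sum(base_pct.values()), 1))
    print("Total annotated genes:", total_genes(gene_counts))
```

With an assembled sequence in hand, `gc_from_sequence` could be applied to the full plastome string to obtain the GC content directly rather than from the rounded percentages.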
Phylogenetic analysis
Phylogenetic analysis showed that
A. cepa var.
aggregatum clustered with other
Allium cepa accessions to form a distinct clade. However, despite their close relationship,
A. cepa var.
aggregatum was markedly separated from the six other
A. cepa accessions with strong bootstrap support (Fig 3). The observed topology within the
A. cepa clade may reflect infraspecific differences in this taxon, which has 19 recorded botanical varieties that are presently considered heterotypic synonyms (
POWO, 2022). This indicates that plastome sequences differ among
A. cepa individuals, unlike other taxa with conserved plastome sequences across accessions (
Waters and Schaal, 1991;
Agrawal et al., 2014; Liu et al., 2021). Moreover, we observed that
A. cepa accessions were closely related to
A. schoenoprasum. A similar relationship, within and between species, was resolved by
Jimenez et al. (2020), who included plastomes of fertile and sterile strains of
A. cepa in their phylogenetic analysis of
Allium species in the Amaryllidaceae family.
Plastome comparison
Inverted repeat (IR) lengths of
Allium cp genomes ranged from 12,186 bp to 26,468 bp with
A. cepa genotype male sterile (S) and
A. cepa genotype normal (N) having the shortest and longest IR regions, respectively. Differences were observed across all junction sites among the seven
A. cepa plastomes (Fig 4).
rps19 and
rpl22 were located at the LSC/IRb boundary in six
A. cepa accessions, at varying distances from the junction. Only the
rpl22 of
A. cepa var.
aggregatum was placed 30 bp away from the LSC/IRb boundary. For SSC/IRa boundaries, the
ycf1 gene, located in the SSC region, extended into the IRa region by varying lengths among the six
A. cepa accessions, whereas this gene was absent in
A. cepa var.
aggregatum. The
ndhF gene, located primarily in the SSC region, extended 1 bp into the IRb region at the IRb/SSC boundary in five of the seven
A. cepa genotypes. Moreover, the placement of the
psbA gene relative to the IRa/LSC junction also differed. The size variations observed among the seven plastomes sequenced from different
A. cepa accessions may be attributed to the expansion and contraction of the IR and SSC junction sites
(Amiryousefi et al., 2018). The genomes also differed in the number of mRNA and rRNA genes (Table 2). These variations, together with the differences at the junction sites, provide additional resolution of the differences observed among the
A. cepa plastomes. This further supports the distinctness of the
A.
cepa var.
aggregatum plastome sequenced here. These chloroplast genome variations could further facilitate the development of DNA barcodes needed for germplasm identification at the variety level.
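The junction comparisons described in this section reduce to asking, for each gene, whether it lies to one side of a boundary (and how far away) or spans it (and by how many bases). The sketch below illustrates that bookkeeping under stated assumptions: the `Gene` class, the `junction_relation` function, and all coordinates are hypothetical and are not taken from the annotated plastomes or from the tool used to produce Fig 4.

```python
# A minimal sketch of classifying a gene relative to an IR/single-copy junction.
# Coordinates are 1-based and inclusive; `junction` is the position of the last
# base on the left-hand side of the boundary. All values are made up.
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int  # first base of the gene (1-based, inclusive)
    end: int    # last base of the gene (1-based, inclusive)


def junction_relation(gene, junction):
    """Return (label, bp) describing the gene relative to the junction.

    - ("left", gap):   gene ends at or before the junction; gap in bp.
    - ("right", gap):  gene starts after the junction; gap in bp.
    - ("spanning", n): gene crosses the junction; n is the number of bases
                       on the shorter side, i.e. the extension across it.
    """
    if gene.end <= junction:
        return ("left", junction - gene.end)
    if gene.start > junction:
        return ("right", gene.start - junction - 1)
    bases_left = junction - gene.start + 1
    bases_right = gene.end - junction
    return ("spanning", min(bases_left, bases_right))


if __name__ == "__main__":
    # Hypothetical example: a gene ending 30 bp before a boundary at 1000.
    print(junction_relation(Gene("rpl22", 900, 970), 1000))
    # Hypothetical example: a gene reaching 1 bp across a boundary at 2000.
    print(junction_relation(Gene("ndhF", 2000, 4200), 2000))
```

Applying such a function to each boundary gene in every accession would yield a small table of offsets that can be compared across plastomes.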