SSR analysis of the Cortaderia selloana transcriptome
High-throughput sequencing generated 333,731 non-redundant unigenes of average length 463.04 bp and total length 154,530,802 bp (Table 1). Screening these unigenes identified 20,684 SSR loci, representing an SSR frequency of 6.20%. In total, 18,242 unigene sequences contained SSR loci (SSR incidence 5.47%). Additionally, we detected 1197 unigene sequences harboring compound SSR loci (0.36% of total unigenes) and 987 sequences containing ≥ 2 SSR loci (0.30% of total unigenes).
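The percentages above follow directly from the reported counts. The following is a minimal sketch of that arithmetic, assuming that SSR frequency means loci per unigene and SSR incidence means SSR-containing unigenes per unigene; the function and constant names are illustrative and do not come from the original analysis.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Counts as reported in the text (Table 1).
TOTAL_UNIGENES = 333_731
SSR_LOCI = 20_684
UNIGENES_WITH_SSR = 18_242
UNIGENES_WITH_COMPOUND_SSR = 1_197

# Assumed definitions: frequency = loci / unigenes,
# incidence = SSR-containing unigenes / unigenes.
ssr_frequency = percent(SSR_LOCI, TOTAL_UNIGENES)
ssr_incidence = percent(UNIGENES_WITH_SSR, TOTAL_UNIGENES)
compound_share = percent(UNIGENES_WITH_COMPOUND_SSR, TOTAL_UNIGENES)

print(ssr_frequency, ssr_incidence, compound_share)
```

Under these definitions the frequency can exceed the incidence, because a single unigene may carry more than one SSR locus.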
Types of SSR repeat motifs
SSR repeat motifs ranged from mono- to hexanucleotides, with considerable variation in abundance. Mononucleotide (7781; 37.62% of all SSRs) and trinucleotide repeats (8232; 39.80%) predominated, followed by dinucleotide repeats (4064; 19.65%). Tetra- to hexanucleotide repeats each accounted for < 2.00%, while compound SSRs constituted 1197 motifs (5.79%). Pentanucleotide repeats had the longest average distribution distance (1136.26 kb) and the lowest frequency (0.04%). Conversely, trinucleotide repeats had the shortest average distance (18.77 kb) and the highest frequency (2.47%) (Table 2).
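The average distribution distances reported here are consistent with dividing the total unigene length by the number of loci in each class. The sketch below assumes that definition; the pentanucleotide count of 136 is not stated directly in the text and is inferred here from the locus counts and percentages reported for Table 3. All names are illustrative.

```python
# Total unigene length reported in the text (Table 1).
TOTAL_LENGTH_BP = 154_530_802


def mean_distance_kb(total_length_bp: int, n_loci: int) -> float:
    """Average distance between loci in kb: total sequence length / locus count."""
    return round(total_length_bp / n_loci / 1000, 2)


# 8232 and 20,684 are reported counts; 136 is an inferred count (assumption).
tri_distance = mean_distance_kb(TOTAL_LENGTH_BP, 8_232)
penta_distance = mean_distance_kb(TOTAL_LENGTH_BP, 136)
overall_distance = mean_distance_kb(TOTAL_LENGTH_BP, 20_684)

print(tri_distance, penta_distance, overall_distance)
```

Because the distance is inversely proportional to the locus count, the rarest class necessarily has the longest average distance.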
Base composition and proportion of SSR repeat motifs
A total of 20,684 SSRs comprising 171 repeat motifs were detected in the C. selloana transcriptome, with an overall frequency of occurrence of 6.20% (Table 3). The number of motif types increased gradually across the six repeat classes (mono-, di-, tri-, tetra-, penta-, and hexanucleotide), with 2, 4, 10, 31, 41, and 83 types, respectively (frequencies ranging 0.01%-2.13%). However, the number of SSR loci in each class generally trended downward.
In terms of base composition, mononucleotide repeats had the fewest types (2), with A/T the dominant type (7107 loci, 91.34% of this motif category). Dinucleotide repeats included 4 types, mainly AG/CT, which formed 2462 loci (60.58% of dinucleotide motifs); CG/CG was the least frequent, with only 196 loci (4.82%). Trinucleotide repeats comprised 10 types, with CCG/CGG (2488 loci, 30.22%) the most abundant, followed by AGG/CCT (1443 loci, 17.53%) and AGC/CTG (1082 loci, 13.14%); the least frequent was ACT/AGT (87 loci, 1.06%). Tetranucleotide repeats included 31 types, with AAAG/CTTT (42 loci, 12.80%) the dominant type, followed by AAAC/GTTT (35 loci, 10.67%). The remaining types each accounted for < 10%, the least frequent being ACGT/ACGT and ACCC/GGGT (1 locus each, 0.31% each). Pentanucleotide repeats had 41 types, with AGAGG/CCTCT (27 loci, 19.85%) and AAAAG/CTTTT (25 loci, 18.38%) the most numerous. Hexanucleotide repeats had the highest number of types (83), with AAAAAG/CTTTTT (14 loci, 9.79%) the most abundant (Table 3).
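Motif types written as pairs such as AG/CT or CCG/CGG group a motif with its cyclic rotations and its reverse complement, since these all describe the same repeat read in a different phase or on the other strand. The sketch below shows one way such grouping can be done; it assumes this pairing convention and is not taken from the software used in the study. All function names are illustrative.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif: str) -> str:
    """Return the reverse complement of a DNA motif."""
    return motif.translate(COMPLEMENT)[::-1]


def rotations(motif: str) -> list:
    """All cyclic rotations of a motif (the same repeat in a different phase)."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit (e.g. not 'AGAG')."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n))


def motif_class(motif: str) -> str:
    """Label a motif by its smallest rotation and the smallest rotation of its reverse complement."""
    forward = min(rotations(motif))
    reverse = min(rotations(reverse_complement(motif)))
    first, second = sorted((forward, reverse))
    return f"{first}/{second}"


def count_classes(k: int) -> int:
    """Number of distinct motif classes that exist for motifs of length k."""
    motifs = ("".join(p) for p in product("ACGT", repeat=k))
    return len({motif_class(m) for m in motifs if is_primitive(m)})


print(motif_class("GA"), motif_class("GGC"), count_classes(3))
```

With this grouping, `count_classes` gives 2, 4, and 10 for motif lengths 1 to 3, so the mono-, di-, and trinucleotide type counts reported above cover every possible class, whereas the 31 tetranucleotide types observed are two short of the 33 that exist.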
Repeat count distribution of SSR motifs
An inverse relationship existed between repeat count and motif abundance across all six nucleotide repeat types (Table 4). Key distribution patterns emerged. Mononucleotides were primarily distributed at 6-15 repeats (92.94% of mono-SSRs). Dinucleotides were dominated by 6-10 repeats (81.23% of di-SSRs). Trinucleotides were concentrated at 1-10 repeats (98.99% of tri-SSRs). Collectively, these dominant repeat ranges accounted for 90.32% of all SSR loci. Tetra-, penta-, and hexanucleotides exhibited preferential low-repeat distributions (≤ 6 repeats), with numbers of their main repeat motifs being 324, 109, and 110, respectively (1.57%, 0.53%, and 0.53% of all SSRs, respectively).
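Repeat counts such as those summarized above come from scanning each unigene for tandem copies of a short motif. The following is a minimal sketch of such a scan for perfect SSRs using a regular expression. It is not a reimplementation of the screening tool used in the study, and the minimum repeat thresholds in `MIN_REPEATS` are hypothetical values chosen for illustration.

```python
import re

# Hypothetical minimum repeat counts per motif length (an assumption for
# illustration; the thresholds used in the original screening are not given here).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repetition of a shorter unit (e.g. not 'AA' or 'AGAG')."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n))


def find_ssrs(sequence: str, min_repeats: dict = MIN_REPEATS) -> list:
    """Return (motif, repeat_count, start, length_bp) for each perfect SSR found."""
    sequence = sequence.upper()
    hits = []
    for k, minimum in sorted(min_repeats.items()):
        # A k-base motif followed by at least (minimum - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, minimum - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # e.g. a poly-A run matched as 'AA' repeats; reported under k = 1
            length = match.end() - match.start()
            hits.append((motif, length // k, match.start(), length))
    return sorted(hits, key=lambda hit: hit[2])


example = "TT" + "AG" * 7 + "TTA" + "CCG" * 5 + "AT"
print(find_ssrs(example))
```

For the example sequence this reports an AG repeat of 7 units (14 bp) starting at position 2 and a CCG repeat of 5 units (15 bp) starting at position 19. Compound and imperfect SSRs are outside the scope of this sketch.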
Evaluation of SSR availability
After filtering out fragments of length < 12 bp, the lengths of SSRs with different motifs in C. selloana were analyzed (Table 5). The length of dinucleotide SSRs was mainly concentrated at 12-20 bp (66.78% of all dinucleotide SSRs), as was the length of trinucleotide SSRs (75.66% of all trinucleotide SSRs). For tetra-, penta-, and hexanucleotide repeats, SSR lengths were mainly 1-20, 21-40, and 21-40 bp, respectively (56.40%, 86.76%, and 76.92% of their respective total numbers). Nucleotide lengths of compound SSRs were mainly concentrated at 20-40, 41-60, and 61-80 bp (25.65%, 21.55%, and 17.88% of all compound SSRs, respectively).
SSR lengths were mainly concentrated in the range of 12-40 bp, with 11,769 sequences (56.90% of all 20,684 SSRs). Among these, sequences of 12-20 bp were the most numerous (9127, 44.13%), followed by those of 21-40 bp (2217, 10.72%). A further 1112 sequences were > 40 bp (5.38% of all SSRs). By individual length, the 15-bp trinucleotide repeat was the most numerous (4640 in total, 22.43%), followed by the 18-bp and 12-bp classes (7.68% and 6.04% of all SSRs, respectively). There were 3329 SSRs of length > 20 bp (16.10% of all SSRs). We speculate that these longer sequences may have greater polymorphism potential.
With the rapid advancement of high-throughput sequencing, reference genomes for numerous plant species have been assembled. However, non-model ornamental grasses, particularly those with complex chromosome ploidy and limited molecular biology research, still lack comprehensive genomic resources. Current transcriptome sequencing of C. selloana relies on references from related species within the family, resulting in significantly lower data yields and fewer annotated unigenes compared with more commonly studied plants.
We obtained 333,731 C. selloana unigenes through transcriptome sequencing. From these, 20,684 SSR loci were identified. With a frequency of occurrence of 5.47%, a frequency of appearance of 6.20%, and an average distribution distance of 7.47 kb, the SSR loci appear to be sparsely distributed in the transcriptome, possibly related to the genomic characteristics of gramineous plants.
Li et al. (2023) used MISA software to search for SSR loci in 130,393 unigene sequences of the Narenga porphyrocoma transcriptome; from 14,233 unigenes, 16,372 SSR loci were obtained, with a frequency of appearance of 10.92%. Mao et al. (2022) searched for SSR loci in the third-generation full-length transcriptome of Psammochloa villosa and found 93,563 SSR repeat sequences distributed in 56,824 unigenes (appearance frequency 50.83%). In the full-length transcriptome of Littledalea racemosa, with 30,624 unigene sequences, 14,089 SSR repeat sequences were found using MISA software (frequency of appearance 46.01%) (Fu et al., 2025). Yin et al. (2024) searched for SSR loci in 214,676 unigene sequences of the Hordeum brevisubulatum transcriptome and detected 24,877 SSRs in 21,618 unigene sequences (frequency of appearance 11.59%).
Wang et al. (2022) performed transcriptome sequencing on an Agropyron mongolicum hybrid and obtained 5620 SSR loci from 110,115 unigenes (frequency of appearance 5.10%). Compared with these gramineous plants, the SSR frequency of appearance in C. selloana is higher than for A. mongolicum (5.10%), but lower than for P. villosa, L. racemosa, N. porphyrocoma, and H. brevisubulatum. The frequency of occurrence of SSR loci in C. selloana thus differs from that in transcriptomes of other gramineous plants. Inherent differences in gene structure among species may explain these differences, although factors such as the size of the analysis database, SSR retrieval tools, and parameter settings of retrieval conditions cannot be excluded (Fu et al., 2021).
Regarding repeat types, C. selloana SSRs were dominated by mononucleotide (37.62%) and trinucleotide repeats (39.80%), with dinucleotides secondary (19.65%). Tetra- to hexanucleotide repeats collectively comprised < 3% of all SSRs. This distribution aligns with patterns in other Poaceae species, namely L. racemosa (mono, 36.88%; tri, 39.34%; di, 21.13%) and H. brevisubulatum (mono, 50.90%; tri, 28.78%; di, 17.48%) (Yin et al., 2021), confirming the prevalence of low-order repeats in grasses. Notably, C. selloana exhibited a considerably higher proportion of trinucleotide repeats (39.80%) than H. brevisubulatum (28.78%), suggesting enhanced variation potential in coding regions and greater utility for polymorphic marker development.
In terms of motif composition, the dominant motif types in C. selloana are consistent with those of Gramineae. Among the 171 repeat motif types detected in the C. selloana transcriptome, the mononucleotide A/T type accounts for the highest proportion (91.34%), the dinucleotides are dominated by AG/CT (60.58%), and the trinucleotide CCG/CGG is also a high-frequency type (30.22%). This base preference is related to the codon usage preference of Gramineae plants and the high mutation rate in AT-enriched regions of the transcriptome. The finding that AG/CT and CCG/CGG are the dominant dinucleotide and trinucleotide motifs, respectively, in C. selloana is consistent with results for other Gramineae plants such as teosinte (Li et al., 2023) and Coix lacryma-jobi (Ouyang et al., 2021). This observation also supports research reporting the dominant repeat motifs of dinucleotides and trinucleotides in SSRs of most monocotyledonous plants to be AG/CT and CCG/CGG, respectively (Kantety et al., 2002). However, dominant motifs in plants such as Diospyros rhombifolia (Wang et al., 2022) and Brassica campestris chinensis var. purpuraria differ (Xi et al., 2022), possibly because of species specificity or search conditions. Repeat motif types vary among species, as do the dominant motifs within each repeat class, which is conducive to the development of SSR molecular markers. The CCG/CGG motif also has the highest proportion in L. racemosa (31.32%) and N. porphyrocoma (20.26%), indicating its evolutionary conservation in monocotyledons.
High polymorphism, the core value of SSR markers, depends on repeat unit counts and the abundance of ≥ 20 bp fragments. Generally, SSRs ≥ 20 bp exhibit high polymorphism, and those of 12-20 bp show moderate polymorphism. In C. selloana, mononucleotide repeat counts predominantly ranged 6-15 (92.94% of mono-SSRs), and SSR lengths were concentrated at 12-40 bp (56.90% of all SSRs). Notably, 3329 SSR loci (16.10%) exceeded 20 bp. These longer repeats have enhanced polymorphism potential because of extended motif repetitions, making them promising targets for developing highly informative molecular markers.
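The length-based rule of thumb above can be written as a small classifier for prioritizing candidate loci. This is a minimal sketch: the source gives overlapping ranges (≥ 20 bp and 12-20 bp), so treating exactly 20 bp as "high" is an assumption, as is the "low" label for anything shorter than 12 bp. The function names are illustrative.

```python
def ssr_length_bp(motif_length: int, repeat_count: int) -> int:
    """Length of a perfect SSR: motif length times number of repeats."""
    return motif_length * repeat_count


def polymorphism_potential(length_bp: int) -> str:
    """Classify an SSR by length (boundary handling at 20 bp is an assumption)."""
    if length_bp >= 20:
        return "high"
    if length_bp >= 12:
        return "moderate"
    return "low"  # assumed label; loci under 12 bp were filtered out in the length analysis


# A trinucleotide repeated 5 times is 15 bp; a dinucleotide repeated 11 times is 22 bp.
print(polymorphism_potential(ssr_length_bp(3, 5)))
print(polymorphism_potential(ssr_length_bp(2, 11)))
```

Such a classification only ranks candidates; actual polymorphism would still need to be confirmed by primer screening.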
Study limitations must be acknowledged. First, the SSR frequency in C. selloana (6.20%) is substantially lower than that of P. villosa (50.83%) and L. racemosa (46.01%), possibly because of sequencing depth or unigene assembly completeness. Second, tetra- to hexanucleotide repeats constituted < 3% of all SSRs, constraining development of complex polymorphic markers. Future efforts could integrate genomic sequencing data to mine genomic SSRs for enhanced marker density, and validate high-polymorphism loci through primer screening. These advances will support cultivar identification, genetic linkage map construction, and stress-resistance gene localization in C. selloana.