2.6 Preprocessing, alignments, and analysis of novel genes and transcripts
NanoFilt (version: 2.8.0) (De Coster & Rademakers, 2023) was used to filter the raw fastq data, retaining reads with a quality score > 7 and a length greater than 50 bp as valid data for subsequent analysis. Summary statistics were computed using SeqKit (version: 0.12.0) (Shen et al., 2016b). Alignment results were then analyzed and quantified using samtools (version: 1.11; parameters: flagstat) (Li et al., 2009). Flair (version: 1.5.0; parameters: -t 20) (Tang et al., 2020) was employed to obtain consensus sequences from the alignment results, which were then realigned to the reference genome. Gffcompare (version: 0.12.1; parameters: -R -C -K -M) (Pertea & Pertea, 2020) was used to compare the resulting transcripts with the known transcripts of the genome and to identify novel transcripts and novel genes (FA download link: ftp://ftp.ensemblgenomes.org/pub/plants/release-52/fasta/oryza_sativa/dna/; GTF download link: ftp://ftp.ensemblgenomes.org/pub/plants/release-52/gff3/oryza_sativa/).
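To make the read-filtering criteria concrete, the following minimal Python sketch applies the same two thresholds (quality score > 7, length > 50 bp) to FASTQ records. It is an illustration only, not NanoFilt's implementation: the function names are hypothetical, Phred+33 encoding is assumed, and the read-level quality is assumed to be the mean error probability converted back to a Phred score.

```python
import math
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (read name, sequence, quality string)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from 4-line FASTQ text (blank lines are ignored)."""
    rows = [ln.strip() for ln in lines if ln.strip()]
    for i in range(0, len(rows) - 3, 4):
        header, seq, _plus, qual = rows[i:i + 4]
        yield header[1:], seq, qual  # drop the leading '@'


def mean_quality(qual: str, offset: int = 33) -> float:
    """Read-level quality: average the per-base error probabilities,
    then convert back to Phred (assumption: Phred+33 encoding)."""
    if not qual:
        return 0.0
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def passes_filter(seq: str, qual: str,
                  min_q: float = 7.0, min_len: int = 50) -> bool:
    """Thresholds taken from the text: quality > 7 and length > 50 bp."""
    return len(seq) > min_len and mean_quality(qual) > min_q


def filter_reads(lines: Iterable[str]) -> Iterator[Record]:
    """Keep only the records that satisfy both thresholds."""
    for name, seq, qual in read_fastq(lines):
        if passes_filter(seq, qual):
            yield name, seq, qual


if __name__ == "__main__":
    # Three synthetic reads: good, low quality, too short.
    demo = "\n".join([
        "@good", "A" * 60, "+", "I" * 60,   # Phred 40, 60 bp
        "@lowq", "C" * 60, "+", "#" * 60,   # Phred 2, 60 bp
        "@short", "G" * 40, "+", "I" * 40,  # Phred 40, 40 bp
    ])
    for name, seq, _ in filter_reads(demo.splitlines()):
        print(name, len(seq))
```

Averaging error probabilities rather than raw Phred scores means a few very poor bases lower the read-level quality more than a simple arithmetic mean of the scores would.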