Results

AAV production

AAV were produced by co-infecting Sf9 insect cells with rBAC-GFP and rBAC-REP/CAP (MOI of 0.05 for each virus) in a 2 L stirred tank bioreactor (Figure 1a ). Cell growth kinetics after infection (Figure 1b ) followed a typical profile for a low MOI infection, indicated by one cell population doubling until growth arrest at 24 hpi comparable to previous studies (Correia et al., 2022; Pais et al., 2019; Pais et al., 2020 and Virgolini et al, 2022). The production yielded 2.5 × 1010 VG.mL-1 intracellular genome-containing AAV particles at 72 hpi (Figure 1c ). As one bioreactor run generated enough cells for single-cell RNA-seq analysis and the cell and AAV profiles were consistent with those of previous runs from our group, no further replicates were conducted.

scRNA-seq data processing and quality control

Samples for scRNA-seq were acquired at 0 hpi (prior to infection), 10 hpi and 24 hpi (Figure 1b ). Following Illumina sequencing, an average of 309 ± 21 million reads were acquired for each of the three samples (Table S1 ). The BD Rhapsody WTA analysis pipeline from Seven Bridges Genomics was utilized to process sequencing data, perform genome mapping, and generate RMCE-adjusted molecule UMI count matrices. An average of 5,309 ± 234 barcodes per sample passed the initial quality control step (e.g., read quality filtering, RSEC adjustment, cell label filtering), yielding an average of 21,647 ± 1,397 UMIs and resulting in the detection of 3,405 ± 299 genes per cell (Table S1 ). Retained barcodes were imported into Seurat and further assessed using established quality parameters, such as the proportion of mitochondrial UMIs (Figure 1d ) and the detected UMIs per cell (Figure S1a ). For quality control, only the proportion of UMIs originating from the mitochondrial genes were considered and barcodes where this proportion was below 5% were retained, resulting in a total of 5,288, 4,750 and 5,159 cells for 0, 10 and 24 hpi, respectively, to be considered for further analysis (Figure 1e ). While the total number of genes identified per cell showed a similar median value along infection, a bimodal distribution can be observed at 24 hpi, with 21% of cells having less than 2,000 identified genes, because of a high percentage of baculovirus mRNA molecules in cells at this timepoint (Figure 1f overall results of baculovirus UMIs per cell inFigure S1b ).

Sf9 cell heterogeneity

To assess variations in the transcriptome of Sf9 associated with infection, each timepoint was combined into a single Seurat object for analysis. Merged samples were log-normalized and scaled, regressing out variations caused by different total UMIs per cell. For principal component analysis, 2,000 variable features were considered, and 20 principal components were used to construct a UMAP and perform clustering analysis. A total of 11 clusters were identified, of which 5 were present prior infection (Figure 2a ), indicating an inherent degree of heterogeneity in this Sf9 cell line. One cluster (Cluster 7) was clearly separated at 0 hpi. Using the FindMarkers function with the MAST differential expression test revealed that cells in Cluster 7 had significant overexpression of stress response genes (e.g., heat shock protein 68 and protein lethal(2)essential for life ) and ribosomal RNA, suggesting increased cell stress and/or early signs of cell death (Table S2 ).
As cell cycle has been shown to contribute to heterogeneity in scRNA-seq datasets (Tirosh et al., 2016; Tzani et al., 2021), classification of cells to cell cycle phases was attempted using the Cell Cycle Scoring method in Seurat. For this 25 G2/M and 30 S phase genes (Table S3 ) were used to calculate a score representing the probability that a certain cell is in a particular phase of cell cycle. At early stages of infection (0 and 10 hpi), the predicted classification of cells seems to corroborate observed data, as can be observed by comparing the distribution of cells associated to each cell cycle phase (Figure S2a-b ) with the relative expression of 3 selected cell cycle-related genes (Figure S2c ). At later stages of infection (24 hpi) this is no longer valid, with the increase in the amount of baculovirus transcripts in infected cells hindering effective association of cells to a correct cell cycle phase, as demonstrated by the overall low expression of the G1 cell cycle-related gene (Figure S2c ) but still association with G1 cell cycle phase (Figure S2b ). As a result, cell cycle regression was not considered for further analysis.
Following visualization of the 3 cell populations (Figure 2a ), a clear difference in the position of each population can be observed suggesting an increased impact on the cell transcriptome as infection progresses. This impact can be associated with enhanced expression of baculovirus genes, with the furthest clusters (Clusters 5 and 8) showing an average of 92 ± 3% of baculovirus mRNA per cell (Figure 2b ) and the lowest number of detected genes, averaging 783 ± 331, while total UMIs per cell stayed similar to other clusters (Figure S3 ). To support this assumption, an established early gene (proliferating cell nuclear antigen ) and a late baculovirus gene (viral cathepsin ) were selected and their expression mapped to the identified clusters (Figure 2c ). While theproliferating cell nuclear antigen (early gene) showed elevated expression in clusters (0, 4, and 6) associated to early infected cells (i.e., 10-24 hpi), the viral cathepsin (late gene) only shows high expression in clusters (5, 8, and 10) associated with more advanced stages of infection (i.e., 24 hpi).

Expression of AAV transgenes

Following the evaluation of cell heterogeneity, cells were categorized throughout infection according to the expression of each target transgene (rep , cap , and gfp - Figure 3a ). Baculovirus and transgenes were not detected in cells prior to infection (0 hpi); this number increased to 36.2 % at 10 hpi, and reached close to 100 % at 24 hpi (all cells were infected) (Figure 3b ).
At 24 hpi, 29.4 % of cells expressed transgenes from both rBACs (i.e., expressed gfp and rep /cap ) and are therefore capable of producing packaged AAV. Moreover, 59.0 % of cells expressedgfp only, 3.3 % of cells expressed rep /cap , and 8.3% of cells had still no AAV transgene expressed at this timepoint (Figures 3a-b ). In addition, there is a poor correlation between expression of gfp and rep (Figure 3c ); for rep and cap, their expression is similar, althoughrep seems to be expressed at higher levels than cap(Figure 3d ).

Identification of cell marker genes associated to infection

Gene expression changes associated to baculovirus infection were identified using only the 10 hpi sample. This timepoint was deemed preferential over the 24 hpi one for two reasons: (1) it contains infected, non-infected and presumably cells in intermediate states; (2) the number of UMIs acquired from the host cell transcriptome at 24 hpi timepoint was extremely low. Given the high number of non-infected cells (63.8%) at 10 hpi, merging with the 0 hpi datasets was deemed unnecessary.
The 2,000 most variable genes were retained for PCA, with the first 20 principal components to establish UMAPs. Overall, 9 clusters were identified, separated into two major cluster groups (Figure 4a ). Clusters 3, 4, 6 ,8 were identified as clusters of infected cells, with 4 and 8 being most advanced in the infection process according to the percentage of baculovirus transcripts, and clusters 0, 1, 2 as clusters of non-infected cells (Figure 4b ). Cluster 7 diverged from the remaining clusters of non-infected cells, showing fewer total UMIs and genes per cell compared to others (Figure S4a-b ) and thus was not included in the differential expression analysis. To identify marker genes of infected cells, the FindMarkers function was used with the MAST test. Overall, 291 differentially expressed genes between infected and non-infected cells were identified, of which 134 were upregulated and 157 downregulated in infected cells clusters (Table S4 ). Genes significantly upregulated derived mostly from the baculovirus, with only 9 identified as host cell genes; these 9 genes were associated to stress response (e.g., heat shock protein 68 , peroxiredoxin-6 ) and microtubule mobility (dynein regulatory complex protein-8 ) (Figure 4c ). Genes downregulated in infected cells were associated to metabolic processes (e.g., mitochondrial glutamate dehydrogenase and, L-lactate dehydrogenase and aminomethyltransferase ), ion transport (e.g.,innexin-3 ) as well as apoptosis inhibitors (baculoviral IAP repeat-containing protein 5 ) (Figure 4c ).

Assessing significantly enriched gene ontology terms along infection

To further assess transcriptional changes along infection, non-differential expression analysis was conducted. Monocle3 was used to perform trajectory analysis and thus evaluate which genes vary along pseudotime in cells at 10 hpi. The 10 hpi sample was again deemed preferential due to the before mentioned reasons. A trajectory graph was drawn, from which it becomes evident that clusters associated to infection were furthest along pseudotime (Figure 5a ). Monocle3 identified 5,704 genes varying along pseudotime (q -value < 0.01), of which 2616 were deemed significant (Table S5 ). Gene ontology analysis of respective genes revealed biological processes such as stress response to infection (e.g., cellular response to virus), cell growth (e.g., cell division, DNA replication and transcription), cell cycle and protein folding significantly vary along infection (Figure 5b ).