2.2. Data compilation, extraction and gap-filling
From the final selected publications, we compiled all data and associated metadata, including: 1) nitrate and/or DOC concentrations, 2) the climate classification of each burned watershed, 3) the location of the wildfire, 4) the year of the wildfire, and 5) the time elapsed between the fire and sampling (Cavaiani et al., 2024). For studies where nitrate or DOC concentrations were presented only within a figure, we used WebPlotDigitizer 4.7 (Rohatgi, 2023) to extract discrete values: we manually set the range of the x- and y-axes and then clicked on each point, which yielded approximate values for each observation. The same operator performed this step for all studies to reduce between-operator variation in data extraction. Data compilation was then completed in R v. 4.2.3 (R Core Team, 2023).
To group watersheds by climate, we extracted coordinates directly from the publications when reported, or located coordinates via Google Maps corresponding to the reported site descriptions. These coordinates were run through an interactive climate analysis web platform (Zepner et al., 2021) to classify sites according to the Köppen-Geiger climate classification system (Fig. S2). Collectively, the sites in this study represented eight unique climate guilds: Mediterranean, subarctic, warm-humid, warm-Mediterranean, hot-Mediterranean, subtropical highland, humid subtropical, and cold semi-arid (Dsb, Dfc, Dfb, Csb, Csa, Cfb, Cfa, BSk). Finally, geospatial variables for each site were extracted primarily from the Environmental Protection Agency StreamCat dataset (Hill et al., 2016) and NHDPlusV2.1 using a custom R script (Willi & Ross, 2023). Land use/land cover data for all sites were extracted from StreamCat, while catchment area and slope were extracted from NHDPlusV2.1 with the R package 'nhdplusTools' (Blodgett & Johnson, 2022).
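As a minimal sketch of this catchment attribute extraction (the full workflow used the custom script of Willi & Ross, 2023), the example below retrieves the NHDPlusV2 flowline nearest a site and its slope and drainage-area attributes. The coordinates and object names are hypothetical, and the functions assumed are from the 'sf' and 'nhdplusTools' packages.

\begin{verbatim}
library(sf)
library(nhdplusTools)

# Hypothetical site coordinates (longitude, latitude; WGS84)
site_point <- st_sfc(st_point(c(-114.05, 46.87)), crs = 4326)

# Identify the COMID of the NHDPlusV2 flowline nearest the site
comid <- discover_nhdplus_id(site_point)

# Download value-added attributes and keep slope and total drainage area (km^2)
vaa <- get_vaa(atts = c("comid", "slope", "totdasqkm"))
site_attrs <- vaa[vaa$comid == comid, ]
\end{verbatim}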
Data reporting standards across studies were highly variable. Some studies reported data at weekly or monthly intervals, while others sampled seasonally or annually; only rarely (1 of 18 studies) were samples taken daily. To construct a time series of daily stream concentrations, we used the 'zoo' R package (Zeileis et al., 2023) to interpolate stream concentrations between discrete sampling dates within each publication. We set a gap-fill limit of 50 days to limit the uncertainty introduced by interpolation; a consequence of this limit is that much of the winter season was excluded.
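This gap-filling step can be illustrated with a short sketch, assuming a data frame conc_df with one row per calendar day and NA on days without a sample (object and column names are hypothetical):

\begin{verbatim}
library(zoo)

# Hypothetical daily series: one row per calendar day, NA where no sample exists
conc_z <- zoo(conc_df$nitrate_mgL, order.by = conc_df$date)

# Linearly interpolate between discrete samples, leaving any gap longer than
# 50 days as NA so long unsampled periods (e.g., winter) are not filled
conc_filled <- na.approx(conc_z, maxgap = 50, na.rm = FALSE)
\end{verbatim}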
In most cases, the discharge information needed to calculate flux or yield estimates was not available, either because discharge was not reported in the publication or because site locations were not within reasonable proximity to a known gauging station. Given the limited discharge data available across studies, we therefore generated a pseudo yield metric to facilitate intercomparisons across watersheds of different sizes by normalizing nitrate and DOC concentrations to watershed area and aggregating these by year:
\begin{equation} \frac{\text{mg NO}_{3}^{-}\text{-N}}{\text{L} \cdot \text{km}^{2} \cdot \text{yr}} \quad \text{or} \quad \frac{\text{mg DOC-C}}{\text{L} \cdot \text{km}^{2} \cdot \text{yr}}\nonumber \end{equation}
where mg/L is milligrams of analyte per liter, expressed as mass of nitrogen (nitrate) or carbon (DOC), km$^{2}$ is watershed area, and yr is year.
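A pseudo yield of this form could be computed from the interpolated daily concentrations along the following lines; the data frame, column names, and the use of an annual sum (rather than another aggregation) are illustrative assumptions rather than the exact implementation:

\begin{verbatim}
library(dplyr)

# daily_df (hypothetical): columns site, date, nitrate_mgL, ws_area_km2
pseudo_yield <- daily_df %>%
  mutate(year = format(date, "%Y")) %>%
  group_by(site, year) %>%
  summarise(
    # aggregate daily concentrations by year, then normalize to watershed area
    nitrate_pseudo_yield = sum(nitrate_mgL, na.rm = TRUE) / first(ws_area_km2),
    .groups = "drop"
  )
\end{verbatim}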