2.2. Data compilation, extraction and gap-filling
From the final selected publications, we compiled all data and
associated metadata, including: 1) nitrate and/or DOC concentrations,
2) the climate classification of each burned watershed, 3) the location
of the wildfire, 4) the year of the wildfire, and 5) the time since fire
at which sampling occurred
(Cavaiani et al.,
2024). For studies in which nitrate or DOC concentrations were
presented only in figures, we extracted discrete values using
WebPlotDigitizer 4.7 (Rohatgi, 2023): after manually setting the ranges
of the x- and y-axes, we clicked each point to obtain an approximate
value. All digitization was performed by the same operator to reduce
between-operator variation in data extraction. Data compilation was then
completed in R v.4.2.3 (R Core Team, 2023).
To group watersheds by climate, we extracted coordinates directly from
the publications when reported or, otherwise, obtained coordinates
corresponding to the reported location via Google Maps. These
coordinates were entered into an interactive climate analysis web
platform (Zepner et al., 2021) to classify sites according to the
Köppen-Geiger climate classification system (Fig. S2). Collectively,
the sites in this study represented
eight unique climate guilds: Mediterranean, subarctic, warm-humid,
warm-mediterranean, hot-mediterranean, subtropical highland, humid
subtropical, and cold semi-arid (Dsb, Dfc, Dfb, Csb, Csa, Cfb, Cfa,
BSk). Finally, geospatial variables for each site were extracted
primarily from the Environmental Protection Agency StreamCat Dataset
(Hill et al., 2016)
and NHDPlusV2.1 using a custom R script
(Willi & Ross,
2023). Land use/land cover data for all the sites were extracted from
StreamCat, while catchment area and slope were extracted via NHDPlusV2.1
with the R package ‘nhdplusTools’
(Blodgett & Johnson, 2022).
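The climate groupings described above can be expressed as a simple lookup table. The sketch below is illustrative only (in Python, not the R workflow used in the study), and the code-to-name pairing assumes the two lists in the text correspond positionally; the `classify` helper is hypothetical.

```python
# Hypothetical lookup pairing the eight Köppen-Geiger codes reported in
# the study with the climate-guild names used in the text, assuming the
# two lists are given in matching order.
KOPPEN_GUILDS = {
    "Dsb": "Mediterranean",
    "Dfc": "subarctic",
    "Dfb": "warm-humid",
    "Csb": "warm-mediterranean",
    "Csa": "hot-mediterranean",
    "Cfb": "subtropical highland",
    "Cfa": "humid subtropical",
    "BSk": "cold semi-arid",
}

def classify(code: str) -> str:
    """Return the guild name for a Köppen-Geiger code, or 'unclassified'."""
    return KOPPEN_GUILDS.get(code, "unclassified")
```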
Data reporting standards across studies were highly variable. Some
studies reported data at weekly or monthly intervals, while others
reported seasonal or annual values; samples were taken daily in only
one of the 18 studies. To construct a time series of daily stream
concentrations, we used the 'zoo' R package (Zeileis et al., 2023) to
interpolate stream concentrations between discrete sampling dates
within each publication. We set a gap-fill limit of 50 days to limit
interpolation uncertainty; this limit excluded much of the winter
season from the resulting series.
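The gap-limited interpolation (equivalent in spirit to `zoo::na.approx()` with a `maxgap` argument) can be sketched in Python with pandas; this is an illustrative re-implementation, not the study's actual R code, and the `gap_fill` function name is hypothetical.

```python
import numpy as np
import pandas as pd

def gap_fill(series: pd.Series, max_gap: int = 50) -> pd.Series:
    """Linearly interpolate a daily concentration series, leaving any
    run of missing values longer than max_gap days unfilled (mirroring
    zoo::na.approx's maxgap behavior)."""
    # Interpolate only between observed values, never beyond the ends.
    filled = series.interpolate(method="linear", limit_area="inside")

    # Find the length of each consecutive run of missing values.
    is_na = series.isna()
    run_id = (~is_na).cumsum()                     # constant within each NaN run
    run_len = is_na.groupby(run_id).transform("sum")

    # Re-blank any run that exceeds the gap-fill limit.
    too_long = is_na & (run_len > max_gap)
    filled.loc[too_long] = np.nan
    return filled
```

With `max_gap=2`, a one-day gap is filled while a three-day gap is left missing, so long unsampled stretches (e.g., winter) remain absent rather than being bridged by a long linear segment.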
In most cases, the discharge data needed to calculate flux or yield
estimates were not available: discharge was rarely reported in the
publications, and site locations were often not within reasonable
proximity to known gauging stations. Given the limited discharge data
across studies, we instead generated a pseudo-yield metric to
facilitate comparisons across watersheds of different sizes by
normalizing nitrate and DOC concentrations to watershed area and
aggregating these values by year:
\begin{equation}
\frac{\text{mg}\ \text{NO}_{3}^{-}\text{-N}}{\text{L} \cdot \text{km}^{2} \cdot \text{yr}}\nonumber
\end{equation}
or
\begin{equation}
\frac{\text{mg}\ \text{DOC-C}}{\text{L} \cdot \text{km}^{2} \cdot \text{yr}}\nonumber
\end{equation}
where mg/L is milligrams of analyte per liter, normalized to either mass
of nitrogen (nitrate) or carbon (DOC), km$^{2}$ is watershed area, and
yr is year.
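The pseudo-yield calculation can be sketched as below. This is an illustrative Python sketch (the study's analysis was in R), the `pseudo_yield` function name is hypothetical, and the use of an annual sum as the aggregation statistic is an assumption, since the text does not specify it.

```python
import pandas as pd

def pseudo_yield(df: pd.DataFrame, conc_col: str, area_km2: float) -> pd.Series:
    """Aggregate daily concentrations (mg/L) by calendar year and
    normalize by watershed area (km^2), yielding mg / (L * km^2 * yr).
    Summing within each year is an assumption about the aggregation."""
    annual = df.groupby(df["date"].dt.year)[conc_col].sum()
    return annual / area_km2
```

Because the metric divides by watershed area rather than multiplying concentration by discharge, it is a relative index for cross-watershed comparison, not a true mass flux.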