Statistical analyses
Our main goal was to assess whether the inclusion of sequences from genetic diversity hotspots (in our case the Iberian and Italian Peninsulas) increases intra-specific genetic divergence. We did not consider the Balkan Peninsula, as barely any sequences were available in BOLD and we had not sampled in that geographic region.
To test whether southern Europe was underrepresented in the Barcode of Life Data System (BOLD) relative to its species richness, we checked the geographical distribution of all the study species on the GBIF website (GBIF.org 2017). The study species are common, so their distribution ranges could be assessed reliably from GBIF records. In parallel, we consulted a second database based on records and bibliographical data (Lepidoptera Mundi, lepidoptera.eu) to confirm each species' geographical distribution. We took the southernmost and northernmost European records of each species in the study group as the limits of its geographical distribution in Europe and assumed that the species was present at all latitudes in between. We then quantified the extent to which the number of species recorded decreased with increasing latitude, starting from the southern Iberian Peninsula. At the same time, taking only the DNA barcodes available in BOLD (not including the individuals sequenced in this project), we assessed the relationship between latitude and the number of barcodes. Regression fitting was done using STATISTICA (StatSoft Inc 2005).
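The range-filling assumption described above can be illustrated with a minimal sketch, which is not the procedure actually run for this study. It assumes each species is summarised by its southernmost and northernmost European record and counts how many species ranges cover each latitude. The function name, the species labels and the latitude values are hypothetical and chosen for illustration only.

```python
def richness_by_latitude(ranges, latitudes):
    """Count the species whose assumed range covers each latitude.

    ranges: dict mapping species name -> (southernmost_lat, northernmost_lat)
    latitudes: iterable of latitudes (degrees north) at which to count species
    """
    return {
        lat: sum(1 for south, north in ranges.values() if south <= lat <= north)
        for lat in latitudes
    }


# Hypothetical ranges in degrees north; these are NOT data from the study.
example_ranges = {
    "species_a": (36.0, 70.0),
    "species_b": (37.5, 55.0),
    "species_c": (43.0, 60.0),
}

# Evaluate richness every 5 degrees, starting at the southern limit.
richness = richness_by_latitude(example_ranges, range(36, 72, 5))
for lat, n_species in richness.items():
    print(lat, n_species)
```

The resulting latitude and species-count pairs are the kind of input that could then be passed to a regression of species number on latitude.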
To assess whether genetic divergence was higher in the pairwise inter-population comparisons when at least one of the populations was Iberian or Italian, we fitted linear mixed models (LMMs) using the ‘nlme’ package (Pinheiro, Bates, DebRoy, Sarkar & R Core Team 2017) of R (R Core Team 2016). We used LMMs because our regression models included both fixed and random effects. Four types of pairwise contrasts between populations were defined: i) between two European populations excluding Iberian and Italian ones (contrasts abbreviated henceforth as EUEU), ii) between one European population (not Italian) and one Iberian (EUIB), iii) between one European population (not Iberian) and one Italian (EUIT) and iv) between two Iberian populations (IBIB). Pairwise comparisons between two Italian populations were not conducted because of the low sample size.
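The four contrast types can be expressed as a simple lookup on the regions of the two populations being compared. The sketch below is illustrative only: the region codes "EU", "IB" and "IT" and the function name are assumptions introduced here, and pairs that fall outside the four defined contrasts (for example two Italian populations) are returned as None.

```python
# Region codes are hypothetical labels: "EU" = European (neither Iberian
# nor Italian), "IB" = Iberian, "IT" = Italian.
CONTRASTS = {
    ("EU", "EU"): "EUEU",
    ("EU", "IB"): "EUIB",
    ("EU", "IT"): "EUIT",
    ("IB", "IB"): "IBIB",
}


def contrast_type(region_a, region_b):
    """Return the contrast label for a pair of populations.

    The pair is sorted first so that the order of the two populations
    does not matter. Pairs that are not among the four defined
    contrasts return None.
    """
    key = tuple(sorted((region_a, region_b)))
    return CONTRASTS.get(key)


print(contrast_type("IB", "EU"))
print(contrast_type("IT", "IT"))
```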
We performed three LMM tests: the first to assess whether genetic divergence differed between EUEU and EUIB pairwise population contrasts, the second to assess the same between EUEU and EUIT contrasts, and the third to assess it within the same geographical area (EUEU vs IBIB contrasts). In all the analyses, genetic divergence (measured as K2P% distance) was the dependent variable and the type of population contrast was the independent factor; the pairwise spatial distance between populations was included as a covariate. Additionally, the largest number of sequences in each pairwise comparison between populations was included as a covariate to control for the potential effect of sample size on genetic divergence. In the EUEU vs IBIB analysis, the spatial range was restricted to 1000 km, as the maximum distance between any pair of Iberian populations was lower than that. In all three tests, the species of Lepidoptera was included as a random factor.
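For orientation, the structure shared by the three tests can be written as a single model equation. This is a sketch under stated assumptions: the symbols are introduced here for illustration, and a random intercept per species with normally distributed random effects and residuals is assumed, since the exact random-effects structure is not specified above.

```latex
\begin{align*}
y_{ijk} &= \beta_0 + \beta_1 C_{ijk} + \beta_2 D_{ijk} + \beta_3 N_{ijk} + u_k + \varepsilon_{ijk},\\
u_k &\sim \mathcal{N}(0, \sigma_u^2), \qquad \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2).
\end{align*}
```

Here $y_{ijk}$ is the K2P% distance between populations $i$ and $j$ of species $k$; $C_{ijk}$ is an indicator of the contrast type (0 for EUEU, 1 for the contrast it is compared with in a given test); $D_{ijk}$ is the pairwise spatial distance; $N_{ijk}$ is the largest number of sequences in the comparison; and $u_k$ is the assumed random intercept for species $k$. Under this formulation, $\beta_1$ is the coefficient that addresses whether divergence differs between the two contrast types.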