Molecular single-gene species delimitation
The sequences of each species alignment were collapsed into unique
haplotypes using the online .fasta sequence toolbox FaBox (Villesen
2007). We discarded four species either because no sequences were
available from any of the southern European peninsulas (Orthosia
cerasi ) or elsewhere on the continent (Eupithecia massiliata ,Phycita torrenti and Dryobota labecula ). We removed them
because those missing data precluded any pairwise comparison between
populations from at least one southern Peninsula and the rest of the
continent. Our final database included a total of 18 taxa, 17 species
plus the subspecies Colotois pennaria ssp. carbonii , which
was treated as distinct species in the analyses on intra-specific
genetic divergence (Table 1). We did so to be conservative and avoid
inflating intra-specific genetic divergence in Colotois pennaria .
The ssp. carbonii is morphologically distinct (Sihvonen & Skou
2015), and so the intra-specific divergence would be expected to be
higher than in those species in which no subspecies have been described
so far. The alignments of the unique haplotypes of each of the 18
species were visually inspected one by one in MEGA 7 (Kumar 2016); we
corroborated that there had been no errors and all sequences differed in
at least one base. Then, the haplotypes of all species were pooled
together and subsequently aligned again; this was the basic and final
alignment used for species delimitation with the Generalized Mixed Yule
Coalescent (GMYC) method.
The GMYC single-locus method identifies the number of independent
operational taxonomic units (OTUs) or putative species present in a
sample of sequences under the maximum likelihood solution (Pons et
al . 2006; Fujisawa & Barraclough 2013). Some OTUs can correspond to
cryptic species or isolated genetic lineages, thus, this sort of
analyses may well illustrate the patterns of geographical genetic
variability of these insects in Europe. GMYC requires an ultrametric
gene tree as input, which was built using the software Bayesian
Evolutionary Analysis Sampling Trees (BEAST 1.7.5; Drummond, Suchard,
Xie & Rambaut 2012). To select appropriate partitioning scheme and
substitution model for the three-codon positions of COI, we used
PartitionFinder version 1.1.1 (Lanfear, Calcorr, Ho & Guindon. 2012).
We tested between the available models in BEAST using the Bayesian
Information Criteron (BIC) and between all possible partitioning
schemes. PartitionFinder supported three separate partitions for the
codon positions, and three different models were selected according to
the BIC scores (equal-frequency Tamura-Nei model with Gamma correction
(G), Hasegawa Kishino Yano model with invariable sites (I) and
equal-frequency Tamura-Nei model with Gamma correction (G) for 1st, 2nd
and 3rd codon positions respectively). We used a strict clock model with
rate fixed to 1 and a constant size coalescent tree prior as this could
be considered conservative towards the null model when testing against
the GMYC model in a likelihood ratio test (Monaghan 2009). The effects
of tree reconstruction method and model for the GMYC results have been
investigated and, in general, a Bayesian estimation under a coalescent
tree prior has performed well in comparisons (Monaghan 2009; Talavera,
Dincă & Vila 2013; Tang, Obertegger, Fontaneto & Barraclough 2014).
Talavera et al . (2013) found no difference in GMYC results
between using a strict or relaxed clock model for inferring the input
ultrametric tree. We ran two Markov Chain Monte Carlo (MCMC) runs each
with 10 million generations, sampled every 1000 generations, which were
merged using LogCombiner. Convergence and Effective Sample Size (ESS)
values for sampled model parameters were monitored in Tracer version 1.6
(Rambaut, Suchard, Xie & Drummond 2014). TreeAnnotator was used to
select the maximum clade credibility tree (MCC tree) from the sampled
trees (burnin=0.25) with posterior median values used for node heights.
The maximum clade credibility (MCC) tree with branch lengths was
imported in R version 3.3.1 (R Core Team 2016) and the GMYC analysis was
conducted using the splits package version 1.0 (Ezard, Fujisawa &
Barraclough 2009), assisted by the ‘APE’ package (Paradis, Claude &
Strimmer 2004). The ‘splits’ package calculates the likelihood of the
tree under a single coalescent (null model) and under a GMYC model,
which may follow the single threshold or the multiple threshold
assumptions. In the first, a single threshold is placed at every node in
the tree, and the threshold at the maximum likelihood solution delimits
the number of evolutionary units. This method has shown to display close
correlation with the number of species in the tree (Fujisawa &
Barraclough 2013). The multiple threshold GMYC relaxes the assumption
that speciation events must be older than all coalescent events in the
gene tree. The method iteratively splits and fuses existing species
clusters starting from the single threshold solution, until no further
improvement in the maximum likelihood occurs (Fujisawa & Barraclough
2013). We performed both and compared the results in terms of the number
of OTUs delimited. Besides these tree-based methods, we used two
distance-based approaches for delimiting the number of OTUs, namely ABGD
(Automatic Barcode Gap Discovery) (Puillandre, Lambert, Brouillet &
Achaz 2012), and jMOTU (Jones, Ghoorah & Blaxter 2011). ABGD uses
genetic divergences between sequences and a coalescent model to identify
a barcode gap between intra- and interspecific distances, and defines
OTUs accordingly (Puillandre et al. 2012). jMOTU takes one, or a
range of, user-supplied distance cut-off values and clusters sequences
using single-linkage clustering so that no member of one OTU is closer
than the cut-off to any member of any other cluster (Jones et al.2011). A detailed explanation of the basis and principles of each method
are provided in the supplementary information (Table S3 and Fig. S1).