Accurate runs of homozygosity estimation from low-coverage genome
sequences in non-model species
Abstract
Runs of homozygosity (ROH) are increasingly being analyzed using whole-genome
sequences in non-model species to measure inbreeding and to
assess demographic history, thus providing useful information for
conservation. However, most studies have used PLINK for ROH inference,
which has been shown to perform poorly when sequencing depth is below
10X, often underestimating the true proportion of the genome in ROH;
this could lead to erroneous conservation status assessments and management
decisions. We use whole-genome sequences from caribou, a non-model
species at risk, subsampled to sequencing depths ranging from 1X to 15X,
to assess the performance of ROHan, a program developed to enable ROH
estimation from lower-coverage sequences but so far optimized only for
human data. We use 22 individuals with varying extents of inbreeding to
assess the effects of sequencing depth, input parameters, and
demographic history on the inference of ROH. We found that both the
percentage of the genome in ROH and the lengths of ROH can be estimated
accurately at depths as low as 3-5X. However, input parameters and
the demographic history of the individual can have a dramatic effect on
results. Using our optimized settings, we then re-analyze low-coverage
sequences from a small, isolated caribou population and demonstrate
high levels of inbreeding that had previously been missed. We provide
recommendations for thorough parameter optimization, including the
need for multiple runs, and for careful interpretation of outputs to
enable robust ROH inference from low-coverage whole-genome sequences in
wildlife species.