The linkage map used in the current study has already been published and was created using R scripts developed using previously described methods (Bourke et al. 2016 (
link), 2017 (
link); Preedy and Hackett 2016 (
link)). Full details of map construction are described in Bourke et al. (2017 (
link)). The final integrated linkage maps had 25,695 SNP markers (not all unique positions) and covered 55 of the 56 expected parental homologues (the base chromosome number in
Rosa is 7; each tetraploid parent is expected therefore to have 28 “homologue” maps, resulting in 56 maps across both parents).
A subset of these markers was chosen for the estimation of inheritance probabilities in the population using the TetraOrigin software (Zheng et al. 2016 (
link)). In an outcrossing autotetraploid, there are nine distinct marker segregation types, namely 1 × 0, 0 × 1, 2 × 0, 0 × 2, 1 × 1, 1 × 3, 1 × 2, 2 × 1 and 2 × 2, where the numbers represent the dosage of the marker in the mother and father, respectively. All other marker types (e.g. 4 × 1) can be converted to one of these 9 types without loss or distortion of their linkage information (Bourke et al. 2016 (
link)). For each 1-centiMorgan (cM) interval, a single marker from each marker segregation type was selected (if possible) which had the lowest number of missing observations across the population. TetraOrigin (Zheng et al. 2016 (
link)) was run on
Mathematica version 10 (Wolfram Research Inc. 2014 ), allowing both bivalent_decoding options (False/True) in the ancestral inference stage. This generated identity-by-descent (IBD) probabilities for the population under a model that allowed for both bivalents and multivalents in the parental meiosis (i.e. bivalent_decoding = False), as well as a purely bivalent model for which double reduction is ignored (i.e. bivalent_decoding = True). In the latter case, unexpected scores are treated as genotyping errors by the software. The following settings were used: parental dosage error probability (epsF) = 0; offspring dosage error probability (eps) = 0.001; parental bivalentPhasing = True (which assumes that bivalent pairing predominates across the population in the determination of parental marker phase). The IBD probabilities at the marker positions were interpolated at 1-cM intervals using the default settings of
smooth.spline in R (R Core Team 2016 ) and saved for subsequent QTL analysis.
We compared both a one-stage and two-stage analysis to detect QTL. For the one-stage analysis, we converted the 36-genotype probabilities into 8 haplotype probabilities (these are also provided in the TetraOrigin output), where each haplotype probability
Hi is the inheritance probability of haplotype
i at a particular locus per individual. We first fit the “full” model using the
lmer function from the lme4 package (Bates et al. 2015 (
link)) with option REML = FALSE (i.e. using maximum likelihood) as follows:
where the fixed-effect terms
H1 and
H5 were dropped to satisfy the boundary conditions
and
(and where
μ is the intercept term). The random part of the model containing environmental (
E) and block (
B) effects was also separately fit using the
lmer function as follows:
We performed a model comparison using the
anova function in R, which performs a likelihood ratio test and returns a
p value from a comparison of the test statistic to a χ
2 distribution. The − log
10(
p) values therefore give an approximation to the more usual LOD scores from similar genetic studies. However, we also wanted to determine empirical significance thresholds, for which we ran permutation tests (Churchill and Doerge 1994 (
link)) with 1000 permutations and
α = 0.05, permuting the order of the haplotype probabilities and saving the maximum − log
10(
p) value from each genome-wide scan to generate approximate 95% genome-wide significance thresholds.
The alternative approach (“two-stage analysis”) we tested was a weighted linear regression on the single-environment BLUEs, with the 36-genotype IBD probabilities as weights. We first fit environmental effects (
E) in the following simple linear model:
This allowed us to impute any missing observations using the fitted model (but only in cases of 1 missing value—we did not impute if 2 or more of the 4 observations were missing). The residuals
ɛjk from this were carried forward as the
Y vector to subsequently detect QTL effects. The form of the model used has been described in detail elsewhere (Hackett et al. 2013 (
link), 2014 (
link); Kempthorne 1957 ), namely:
where each
Xi is an indicator variable for one of the eight parental homologues, having taken the inheritance constraints
and
into account, and weighting by the IBD probabilities as calculated by TetraOrigin (and
μ is the intercept). Note that because of the balanced design, this is equivalent to including environment as a fixed term in the QTL model (although our approach also allowed the inclusion of incomplete data within an ordinary linear model context).
For the traits bending time, height and vigour that were measured in the Wageningen summer season alone (WAG_S), a single-environment analysis was performed, using the phenotype values rather than residuals as the dependent variable. Genome-wide significance thresholds per trait were determined using permutation tests by recording the maximum LOD score from each of 1000 genome-wide QTL scans using permuted genotypes, with the 95th percentile of the sorted LOD scores taken as the threshold. Single-environment analyses were also performed to assess the stability of QTL across environments, with significance thresholds per environment and per trait determined using permutation tests as described. To facilitate visualisation, we plotted LOD profiles from the single- and multi-environment analyses together, with the single-environment profiles adjusted so that significance thresholds were equal (i.e. with multiple y-axes on a single plot). Regions for which the LOD profile of the multi-environment QTL analysis exceeded the significance threshold were re-mapped by saturating the LOD-2 intervals of the QTL peaks with extra markers before re-running TetraOrigin to generate more precise IBD probabilities in the vicinity of QTL. These extra markers were selected as previously described but with a binning window of 0.1 cM in the QTL interval, added to the already selected marker set from the initial QTL scan. We used the two-stage approach for QTL detection (weighted regression on the single-environment BLUEs, with the IBD probabilities as weights), focusing on the linkage groups where QTL were originally detected.
QTL peaks from the (marker-saturated) multi-environment analysis were also explored to determine the most likely QTL segregation type and mode of action using the Bayesian information criterion (BIC) (Schwarz 1978 (
link)) as described by Hackett et al. (2013 (
link), 2014 (
link)). We tested all bi-allelic additive, simplex-dominant and duplex-dominant configurations [where simplex- and duplex-dominant are defined by the number of alleles required to give full expression of the QTL (either 1 or 2 copies, respectively) (Rosyara et al. 2016 (
link))]. We also tested for multi-allelic QTL by considering all possible configurations of up to five different alleles, with unconstrained allele effects [so that for example three alleles
Q1,
Q4 and
Q8 offspring QTL classes—
Q1,
Q4,
Q8,
Q1Q4,
Q1Q8,
Q4Q8 and
Q1Q4Q8 were assumed to have different means. This is equivalent to the “codominant factor” designation in TetraploidSNPMap (Hackett et al. 2017 (
link))]. We estimated the average contribution of each homologue as
where
using IBD probabilities
πi, multi-environment BLUEs
yi and overall population mean
. These were visualised to help clarify the allelic effects around QTL peaks. The average QTL allele effect was estimated using a weighted regression of the QTL genotype counts on the 36 genotype means, using the cumulative probability in each of the 36 genotype classes as weights (i.e. weighted by the approximate number of individuals in each class, which sum to
N). The slope of the regression and standard error of the estimate were recorded. In the case of multi-allelic QTL, the effect of each allele was estimated separately. For QTL predicted to exhibit dominance, QTL genotype counts were coded as 0 or 1 for both simplex and duplex dominance models (with a 1 being assigned to QTL genotype classes carrying at least 1 (respectively 2) copies of the predicted QTL alleles for simplex (respectively duplex) dominant QTL).
For comparison purposes, we also conducted a single-marker analysis of variance (ANOVA) for each trait on the marker dosage classes for all mapped markers (using the multi-environment BLUEs as phenotypes), with the − log
10(
p value) of the model fit used as a proxy for the LOD score. Significance thresholds were determined using permutation tests with
N = 1000 and
α = 0.05 as described above. ANOVA and IBD-based results were visualised together to enable a comparison of the two approaches, adjusted so that significance thresholds overlapped.
A genetic co-factor analysis was performed using the detected QTL peak positions as co-factors in a two-stage analysis as previously described. In cases where multiple QTL were found, all QTL were simultaneously used as co-factors within a single model. Each QTL was supplied as a set of six fixed terms (
H2,
H3,
H4,
H6,
H7 and
H8 corresponding to six of the eight parental haplotype probabilities at the QTL peak for the population, accounting for the inheritance dependence between them). Significance thresholds were re-calculated using permutation tests.
The genotypic information coefficient (GIC) per homologue, a measure of how much information we have to estimate QTL effects across the mapped genome, was determined using the following formula:
where
π is the inheritance probability of homologue
j in individual
n at a particular locus, estimated using TetraOrigin (Bourke et al. 2018 ).
Bourke P.M., Gitonga V.W., Voorrips R.E., Visser R.G., Krens F.A, & Maliepaard C. (2018). Multi-environment QTL analysis of plant and flower morphological traits in tetraploid rose. TAG. Theoretical and Applied Genetics. Theoretische Und Angewandte Genetik, 131(10), 2055-2069.