8.4 Runs of Homozygosity
Runs of Homozygosity Overview
Case/control genotype studies frequently test individual SNPs for association between genotype and phenotype. A SNP may show no association with case status even when adjacent SNPs do, which makes it difficult to identify entire groups of genes associated with a disease or other condition. Large regions of homozygous SNPs have been found to be shared among people without direct common lineage, and these regions can be of functional significance [Lencz 2007]. A new approach, termed Runs of Homozygosity (ROH), has been developed to identify susceptibility loci across the genome where there has been significant evolutionary pressure.
Using the Runs of Homozygosity Window
The Runs of Homozygosity (ROH) analysis feature can be found in the spreadsheet menu Analysis > Runs of Homozygosity and will only be available if the spreadsheet of genotypic data is marker mapped.
ROH Options
When the Runs of Homozygosity dialog opens (see Figure 46), several analysis options are available.
Minimum Run Length
The minimum run length is the length a sequence of homozygous SNPs must reach to be considered a run of homozygosity. It can be specified in two ways.
- Distance: Specify the minimum run length in kb (kilobase pairs), based on the genomic positions in the marker map. This option also lets you specify the minimum number of SNPs a run of that length must contain.
- SNPs: Specify the minimum run length solely as the number of SNPs in a run, regardless of how far apart the markers are.
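The two ways of specifying a minimum run length can be sketched as a single check on a candidate run. This is a minimal illustration, not the SVS implementation: the function name `meets_minimum_length` and its parameters are hypothetical, and it assumes a run's length is measured from the position of its first marker to the position of its last marker.

```python
def meets_minimum_length(positions_bp, start, end, mode, min_kb=0.0, min_snps=1):
    """Return True if the candidate run spanning marker indices start..end
    (inclusive) satisfies the minimum run length.

    Hypothetical sketch:
    mode="distance": the run must span at least min_kb kilobases AND contain
                     at least min_snps markers.
    mode="snps":     only the marker count is checked; spacing is ignored.
    """
    n_snps = end - start + 1
    if mode == "snps":
        return n_snps >= min_snps
    if mode == "distance":
        # Assumed length: first-to-last marker position, converted to kb.
        length_kb = (positions_bp[end] - positions_bp[start]) / 1000.0
        return length_kb >= min_kb and n_snps >= min_snps
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, four markers at 1,000, 21,000, 41,000 and 61,000 bp span 60 kb, so they pass a 50 kb / 3 SNP distance threshold but fail a 5 SNP threshold in SNP mode.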
Missing Genotypes
One of the following options must be selected:
- Allow runs to contain up to ... missing genotype(s): Specify the allowable number of missing genotypes per run. The value must be an integer greater than or equal to 0.
- Allow any number of missing genotypes: With this option, the first and last genotypes of a run must still be homozygous.
Note:
- To not allow any missing genotypes in a run, select Allow runs to contain up to ... missing genotype(s) and set the value to 0.
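The missing-genotype options can be illustrated with a simple left-to-right scan over one sample's calls on one chromosome. This is a simplified sketch under stated assumptions, not the SVS algorithm: calls are encoded as "hom", "het", or None (missing), the function name `find_homozygous_runs` is hypothetical, a heterozygous call always ends a run, and the candidates it returns would still need to pass the minimum run length (and optional density) checks.

```python
def find_homozygous_runs(calls, max_missing=0):
    """Return (start, end) index pairs of candidate runs of homozygosity.

    calls: per-marker calls for one sample on one chromosome, each "hom",
           "het", or None (missing).
    max_missing: maximum missing calls allowed inside a run; None allows any
                 number (the "Allow any number of missing genotypes" option).
    A heterozygous call ends the current run, and every run starts and ends
    on a homozygous call.
    """
    runs = []
    start = None     # index of the first homozygous call of the open run
    last_hom = None  # index of the most recent homozygous call
    missing = 0      # missing calls seen since the open run started
    for i, call in enumerate(calls):
        if call == "hom":
            if start is None:
                start, missing = i, 0
            last_hom = i
        elif call is None:
            if start is not None:
                missing += 1
                if max_missing is not None and missing > max_missing:
                    # Close the run at the last homozygous call before the gap.
                    runs.append((start, last_hom))
                    start = None
        else:
            # Heterozygous call: close any open run.
            if start is not None:
                runs.append((start, last_hom))
                start = None
    if start is not None:
        runs.append((start, last_hom))
    return runs
```

With calls `["hom", "hom", None, "hom", "het", "hom", "hom"]`, setting `max_missing=0` splits the first stretch at the missing call, giving `[(0, 1), (3, 3), (5, 6)]`, while `max_missing=1` (or `None`) keeps it together as `[(0, 3), (5, 6)]`. Because runs are closed at `last_hom`, leading and trailing missing calls never become part of a run, matching the requirement that runs begin and end on homozygous genotypes.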
Minimum Density of a Run
The optional parameter Minimum density of a run: 1 SNP per ... kb restricts how runs are formed. If this option is selected and a candidate run does not meet the density criterion, no run is formed.
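One way to read the "1 SNP per ... kb" criterion is as an average density over the whole run. The sketch below assumes that reading (a per-gap check, where no two consecutive markers may be further apart than the limit, is another plausible interpretation); the function name `meets_minimum_density` is hypothetical.

```python
def meets_minimum_density(positions_bp, start, end, kb_per_snp):
    """Hypothetical check: True if the run spanning marker indices start..end
    (inclusive) averages at least 1 SNP per kb_per_snp kilobases."""
    n_snps = end - start + 1
    length_kb = (positions_bp[end] - positions_bp[start]) / 1000.0
    # Average kilobases per SNP must not exceed the allowed spacing.
    return length_kb / n_snps <= kb_per_snp
```

Three markers spanning 100 kb average about 33 kb per SNP, so they satisfy a limit of 1 SNP per 50 kb but not 1 SNP per 20 kb.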
Output Spreadsheets
At least one of the following output spreadsheets must be selected.
Homozygous Runs Spreadsheet
To get this output spreadsheet, check the Create a spreadsheet of runs per sample option.
The optional Homozygous Runs of Length >= … spreadsheet details each run of homozygosity found. It lists the chromosome, the start and end genomic positions in base pairs, the length of the run/segment, the start and end column indices, the number of SNPs in the run/segment, the number of missing values, and the number of heterozygotes.
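The fields of this spreadsheet can be pictured as one record per run. The sketch below is illustrative only: the `RunRow` dataclass and `summarize_run` helper are hypothetical, calls are encoded as "hom", "het", or None (missing), the column indices are marker indices within the input lists rather than SVS spreadsheet columns, and the length is assumed to be the distance in base pairs between the first and last markers.

```python
from dataclasses import dataclass


@dataclass
class RunRow:
    """One row of a hypothetical runs-per-sample table."""
    chromosome: str
    start_bp: int
    end_bp: int
    length_bp: int
    start_col: int
    end_col: int
    n_snps: int
    n_missing: int
    n_het: int


def summarize_run(chromosome, positions_bp, calls, start, end):
    """Build a RunRow for the run spanning marker indices start..end."""
    segment = calls[start:end + 1]
    return RunRow(
        chromosome=chromosome,
        start_bp=positions_bp[start],
        end_bp=positions_bp[end],
        length_bp=positions_bp[end] - positions_bp[start],  # assumed definition
        start_col=start,
        end_col=end,
        n_snps=len(segment),
        n_missing=sum(1 for call in segment if call is None),
        n_het=sum(1 for call in segment if call == "het"),
    )
```

For instance, a run over calls `["hom", None, "hom", "hom"]` at positions 100 to 400 bp yields a record with 4 SNPs, 1 missing value, 0 heterozygotes, and a length of 300 bp.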
Binary ROH Run Status
To get this output spreadsheet, check the Create a spreadsheet with binary ROH run status option.
The optional Binary ROH Status for Runs of Length >= … spreadsheet indicates, for each sample, whether each marker is part of a run of homozygosity. For instance, if a sample's genotype at a marker is homozygous and there are enough consecutive homozygous genotypes around it to constitute a run, a 1 is placed in the cell for that marker and sample. If, on the other hand, a sample has a homozygous genotype at a marker that is flanked by two heterozygous genotypes, a 0 is placed in the cell for that marker and sample. All heterozygous genotypes are replaced with 0s unless a run is allowed to contain one or more heterozygotes.
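Given the runs already found for one sample, the binary status row is simply a 0/1 mask over its markers. This minimal sketch assumes runs are (start, end) marker index pairs that have already passed all run criteria; the function name `binary_roh_status` is hypothetical.

```python
def binary_roh_status(n_markers, runs):
    """Return a 0/1 list: 1 where the marker lies inside one of the sample's
    runs (start..end inclusive), 0 everywhere else."""
    status = [0] * n_markers
    for start, end in runs:
        for i in range(start, end + 1):
            status[i] = 1
    return status
```

A sample with a single run over markers 1 to 3 out of six markers gets the row `[0, 1, 1, 1, 0, 0]`. Any homozygous marker outside a run, such as one flanked by two heterozygotes, stays 0.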
Incidence of Common Runs per SNP
To get this output spreadsheet, check the Create a spreadsheet with the incidence of common runs per SNP option.
The optional Incidence of Common Runs per SNP spreadsheet displays columns for the original SNP column number, the number of runs associated with each SNP (i.e. the number of samples that have a run overlapping the SNP), and the chromosome number for the SNP. Row labels are the SNP names.
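Because the incidence for a SNP is the number of samples with a run overlapping it, it equals the column sum of the binary ROH status matrix. The sketch below assumes that matrix is given as a list of per-sample 0/1 rows with markers as columns; the function name `incidence_per_snp` is hypothetical.

```python
def incidence_per_snp(status_matrix):
    """status_matrix: one 0/1 list per sample (rows), markers as columns.
    Returns, for each marker, the number of samples whose run covers it."""
    return [sum(column) for column in zip(*status_matrix)]
```

For three samples with rows `[0, 1, 1, 0]`, `[1, 1, 0, 0]` and `[0, 1, 1, 1]`, the incidence per SNP is `[1, 3, 2, 1]`.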
Cluster of Runs Spreadsheet
To get this output spreadsheet, check the Output Cluster of Runs for Association Studies option, and specify the minimum number of samples for a cluster in the Minimum # samples that must contain a run: option. The minimum number of samples is the number of samples that must share a run of homozygosity for a run to be considered “common”. For more details about these parameters, see Runs Of Homozygosity (ROH) Algorithm.
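As a rough illustration of how a minimum sample count turns per-SNP incidence into clusters, the sketch below treats a cluster as a maximal stretch of consecutive markers on one chromosome where at least `min_samples` samples have a run. This is a simplifying assumption, not the rule documented in the ROH Algorithm section, and the function name `find_clusters` is hypothetical.

```python
def find_clusters(incidence, chromosomes, min_samples):
    """Return (start, end) marker index ranges where consecutive markers on
    the same chromosome each have incidence >= min_samples.

    incidence:   per-marker count of samples with an overlapping run.
    chromosomes: per-marker chromosome labels, in marker order.
    """
    clusters = []
    start = None  # first marker of the open cluster, if any
    for j, count in enumerate(incidence):
        same_chrom = start is not None and chromosomes[j] == chromosomes[j - 1]
        if count >= min_samples:
            if start is None:
                start = j
            elif not same_chrom:
                # A chromosome boundary always closes the open cluster.
                clusters.append((start, j - 1))
                start = j
        elif start is not None:
            clusters.append((start, j - 1))
            start = None
    if start is not None:
        clusters.append((start, len(incidence) - 1))
    return clusters
```

With incidence `[0, 2, 3, 2, 1, 2, 2]`, markers 0 to 2 on chromosome 1 and 3 to 6 on chromosome 2, and a minimum of 2 samples, this yields `[(1, 2), (3, 3), (5, 6)]`: the stretch at markers 1 to 3 is split at the chromosome boundary.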
The Cluster of Runs with >= … spreadsheet displays information about the clusters of SNPs where common runs of homozygosity were found, including the genomic range of each cluster and the number of SNPs in the region. Additional spreadsheets are also created that contain, for each cluster, the proportion of SNPs that are members of common runs of homozygosity. This information can be output in two ways, as described below.
- First column of each cluster: The proportion of SNPs for each cluster is output only for the first column (or marker) of each cluster. This format is best for association tests because it reduces the total number of tests and therefore the multiple testing correction.
- One column per marker: The proportion of SNPs for each cluster is output for all columns (markers) for each cluster. This format is best for visualization of the data in a heat map.
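The two output layouts differ only in where each cluster's value is written. The sketch below assumes the value for a sample is the fraction of the cluster's markers covered by that sample's runs (read from its binary ROH status row); that definition and the helper names `cluster_proportions`, `first_column_layout` and `one_column_per_marker_layout` are hypothetical, and the layouts are returned as dictionaries keyed by marker index.

```python
def cluster_proportions(status_row, clusters):
    """For one sample, the fraction of each cluster's markers covered by a run.
    status_row: the sample's 0/1 ROH status per marker.
    clusters:   (start, end) marker index ranges, inclusive."""
    return [sum(status_row[s:e + 1]) / (e - s + 1) for s, e in clusters]


def first_column_layout(status_row, clusters):
    """One value per cluster, keyed by the cluster's first marker."""
    props = cluster_proportions(status_row, clusters)
    return {s: p for (s, _e), p in zip(clusters, props)}


def one_column_per_marker_layout(status_row, clusters):
    """The cluster's value repeated for every marker in the cluster."""
    out = {}
    for (s, e), p in zip(clusters, cluster_proportions(status_row, clusters)):
        for j in range(s, e + 1):
            out[j] = p
    return out
```

For a sample with status `[0, 1, 1, 0, 1, 1, 1]` and clusters `(1, 3)` and `(5, 6)`, the first layout holds two values (2/3 at marker 1 and 1.0 at marker 5), while the second repeats them across all five clustered markers. The first layout yields one test per cluster; the second keeps the marker-level shape that a heat map needs.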
Association Analysis using ROH Covariates
After the ROH analysis is complete, association studies can be performed by joining either the “First column of each cluster” or “One column per marker” spreadsheet with phenotypic data. Set the phenotypic variable in the joined spreadsheet as a dependent variable and perform numeric association tests. See Numeric Association Tests for more information.
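Outside SVS, the same join-then-test idea can be sketched in a few lines. The example below assumes an inner join on sample ID and uses a Pearson correlation between one ROH covariate and a numeric phenotype as a stand-in for a numeric association test; it is not the set of tests SVS performs, and the function names `pearson_r` and `join_and_correlate` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def join_and_correlate(covariates, phenotypes):
    """covariates, phenotypes: dicts mapping sample ID -> numeric value.
    Keeps only samples present in both (inner join) and returns
    (number of joined samples, Pearson r)."""
    shared = sorted(covariates.keys() & phenotypes.keys())
    x = [covariates[s] for s in shared]
    y = [phenotypes[s] for s in shared]
    return len(shared), pearson_r(x, y)
```

Samples missing from either table drop out of the join, so only samples with both a cluster value and a phenotype contribute to the test.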
For more information on the ROH algorithm, please see Runs Of Homozygosity (ROH) Algorithm.