Genotype Statistics by Marker
7.3 Genotype Statistics by Marker
Several types of overall marker statistics and genetic measures are available as output (see Figure 36).
NOTE:
- These statistics are only available for bi-allelic markers.
- If there is a case/control dependent variable, or a dependent variable treated as case/control, several of these statistics can be calculated for all samples, for cases only, or for controls only.
- If there is a quantitative dependent variable, the Genotype Counts option will also return an average of the dependent variable for each genotype, including the missing-genotype category if any genotypes are missing.
- These statistics can be calculated simultaneously with running a genotypic association test. To do this, see section Genotype Association Tests.
Data Requirements
General marker statistics require a dataset containing genotypic data. Optionally, case/control data can be used to subdivide
most of these statistics by “case” and “control” status, and a quantitative dependent variable will produce an average of the
dependent variable for each genotype category. First, import your data into an SVS project (see Importing Your Data Into A
Project). Once you have the spreadsheet for this data, select the column representing the dependent variable (see
Column States) if you wish to subdivide your statistics by “case” and “control” or to obtain average values. If no
dependent variable is selected, only overall statistics are returned. To open the marker statistics dialog,
select Quality Assurance > Genotype Statistics By Marker from the spreadsheet menu.
Processing
Select your marker statistics options and click Run to process them. The options are described below.
One spreadsheet of results is created as a child of the current spreadsheet's node in the navigator window. The number of
markers analyzed and the number of markers dropped because they have more than two alleles are recorded in the Node
Change Log for the Marker Statistics spreadsheet.
NOTE:
- Markers that were skipped because they are not bi-allelic are not included in the results spreadsheet.
Call Rate
This option displays the fraction of genotypes that are present (not missing) for the given marker.
With data from certain providers, you can also set a confidence threshold on import to determine which genotypes are
called.
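As a rough illustration of the call rate definition, the sketch below computes it for a single marker. The data layout is an assumption made for this example only: each genotype is a two-letter string such as "AG", and a missing genotype is None. The function name call_rate is hypothetical and is not part of SVS.

```python
def call_rate(genotypes):
    """Fraction of samples whose genotype is present for one marker.

    Assumed encoding: two-letter strings for called genotypes,
    None for a missing genotype.
    """
    if not genotypes:
        raise ValueError("no samples supplied for this marker")
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


# Five samples, one of them missing: 4 of 5 genotypes are called.
marker = ["AA", "AG", None, "GG", "AG"]
print(call_rate(marker))  # 0.8
```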
Allele Frequencies
This option displays four columns: Minor Allele, Minor Allele Freq., Major Allele, and Major Allele Freq. For each
marker, the minor allele is identified in the Minor Allele column, and the Minor Allele Freq. is the fraction of the
given marker’s total alleles that are minor alleles. Similarly, the major allele is identified in the Major Allele
column, and the Major Allele Freq. is the fraction of the given marker’s total alleles that are major alleles.
If the marker is monomorphic, the lone allele is reported as the major allele with a major allele frequency of 1,
and the minor allele is reported as missing with a minor allele frequency of 0.
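The four columns described above can be sketched for a single marker as follows. This is a minimal illustration, not the SVS implementation: the genotype encoding (two-letter strings, None for missing), the function name allele_frequencies, and the dictionary keys are all assumptions. Missing genotypes are excluded from the allele total, and a tie between the two alleles is broken arbitrarily here.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return minor/major allele and frequencies for one bi-allelic marker."""
    # Count every allele in every called genotype (assumed: None = missing).
    counts = Counter(a for g in genotypes if g is not None for a in g)
    if not counts:
        return None  # every genotype is missing
    if len(counts) > 2:
        raise ValueError("marker is not bi-allelic")

    total = sum(counts.values())
    ranked = counts.most_common()  # most frequent allele first
    major, major_n = ranked[0]
    if len(ranked) == 1:
        # Monomorphic marker: lone allele is the major allele.
        return {"minor_allele": None, "minor_freq": 0.0,
                "major_allele": major, "major_freq": 1.0}
    minor, minor_n = ranked[1]
    return {"minor_allele": minor, "minor_freq": minor_n / total,
            "major_allele": major, "major_freq": major_n / total}


# A appears 6 times and G 4 times among 10 called alleles.
print(allele_frequencies(["AA", "AG", "GG", "AG", None, "AA"]))
```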
Hardy-Weinberg Equilibrium P-Value
This option displays the Hardy-Weinberg Equilibrium (HWE) Correlation P-Values for each marker.
This statistic will also be output separately for cases and for controls, if applicable.
For details of how this statistic is computed, see General Marker Statistics in the Formulas and Theories chapter.
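The exact formula SVS uses is given in the Formulas and Theories chapter and is not reproduced here. For orientation only, the sketch below shows the textbook chi-square goodness-of-fit test for HWE on a bi-allelic marker with one degree of freedom; it is an assumed stand-in, not necessarily the computation SVS performs. The function name and the argument order (first homozygote, heterozygote, second homozygote) are hypothetical.

```python
import math


def hwe_chi_square_p(n_hom1, n_het, n_hom2):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_hom1 + n_het + n_hom2
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_hom1 + n_het) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return math.nan  # monomorphic: the test is undefined in this sketch

    # Genotype counts expected under HWE: n*p^2, 2*n*p*q, n*q^2.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom1, n_het, n_hom2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


print(hwe_chi_square_p(25, 50, 25))  # counts match HWE exactly: 1.0
print(hwe_chi_square_p(50, 0, 50))   # no heterozygotes: very small p-value
```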
Fisher’s Exact Test for HWE P-Value
This option displays Fisher’s Exact Test HWE P-Values for each marker.
This statistic will also be output separately for cases and for controls, if applicable.
For details of how this statistic is computed, see General Marker Statistics in the Formulas and Theories chapter.
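As an illustration of what an exact HWE test does, the sketch below uses a common formulation: conditional on the observed allele counts, it computes the probability of every possible heterozygote count and sums the probabilities of all outcomes that are no more likely than the observed one. Whether SVS uses exactly this formulation is an assumption; the Formulas and Theories chapter is the authority. The function names are hypothetical.

```python
import math


def _log_prob(n_het, n_a, n):
    """Log probability of n_het heterozygotes among n samples under HWE,
    given n_a copies of one allele (and 2n - n_a of the other)."""
    n_b = 2 * n - n_a
    n_aa = (n_a - n_het) // 2
    n_bb = (n_b - n_het) // 2
    lg = math.lgamma  # lgamma(k + 1) == log(k!)
    return (n_het * math.log(2.0)
            + lg(n + 1) - lg(n_aa + 1) - lg(n_het + 1) - lg(n_bb + 1)
            + lg(n_a + 1) + lg(n_b + 1) - lg(2 * n + 1))


def hwe_exact_p(n_hom1, n_het, n_hom2):
    """Exact-test p-value for HWE on one bi-allelic marker."""
    n = n_hom1 + n_het + n_hom2
    if n == 0:
        raise ValueError("no called genotypes")
    n_a = 2 * n_hom1 + n_het
    n_minor = min(n_a, 2 * n - n_a)

    # Possible heterozygote counts share the parity of the allele count.
    probs = {h: math.exp(_log_prob(h, n_minor, n))
             for h in range(n_minor % 2, n_minor + 1, 2)}
    p_obs = probs[n_het]
    total = sum(probs.values())  # approximately 1; renormalize for safety

    # Sum outcomes at most as likely as the observed one (small tolerance).
    tail = sum(p for p in probs.values() if p <= p_obs * (1.0 + 1e-9))
    return min(1.0, tail / total)


print(hwe_exact_p(25, 50, 25))  # observed count is the most likely outcome
print(hwe_exact_p(50, 0, 50))   # no heterozygotes: very small p-value
```

Working in log space with math.lgamma keeps the factorials from overflowing for large sample sizes.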
Signed HWE R
This option displays the Signed HWE Correlation R for each marker. This measure is designed to show whether the
data for this marker tend toward homozygosity (positively signed R) or toward heterozygosity
(negatively signed R).
This statistic will also be output separately for cases and for controls, if applicable.
For details of how this statistic is computed, see General Marker Statistics in the Formulas and Theories chapter.
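To make the sign convention concrete, the sketch below uses one common formulation for a bi-allelic marker: the correlation between the two alleles within a genotype, which reduces to one minus the ratio of observed to expected heterozygosity. That SVS computes R this way is an assumption; see the Formulas and Theories chapter for the actual definition. The function name is hypothetical.

```python
import math


def signed_hwe_r(n_hom1, n_het, n_hom2):
    """Signed HWE correlation for one bi-allelic marker (assumed formula).

    Positive: more homozygotes than HWE predicts.
    Negative: more heterozygotes than HWE predicts.
    """
    n = n_hom1 + n_het + n_hom2
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_hom1 + n_het) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return math.nan  # monomorphic: undefined in this sketch
    observed_het = n_het / n
    expected_het = 2.0 * p * q
    return 1.0 - observed_het / expected_het


print(signed_hwe_r(25, 50, 25))  # 0.0: counts match HWE
print(signed_hwe_r(50, 0, 50))   # 1.0: all homozygous
print(signed_hwe_r(0, 100, 0))   # -1.0: all heterozygous
```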
Genotype Count Table(s)
The number of samples with each genotype is output. These counts are also output separately for cases and for
controls, if applicable. If a quantitative dependent variable is selected, an average of the dependent variable for each
genotype category (DD, Dd, dd, Missing) is calculated for each marker.
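The sketch below illustrates both behaviors for one marker: genotype counts overall and per case/control group, and the average of a quantitative dependent variable per genotype category, including Missing. The data layout (two-letter genotype strings, None for missing, parallel lists for status and phenotype) and the function names are assumptions made for this example.

```python
from collections import Counter, defaultdict


def _label(genotype):
    """Genotype category; allele order is ignored, None maps to 'Missing'."""
    return "Missing" if genotype is None else "".join(sorted(genotype))


def genotype_counts(genotypes, groups=None):
    """Count genotypes for all samples and, optionally, per group."""
    tables = {"all": Counter()}
    for i, g in enumerate(genotypes):
        tables["all"][_label(g)] += 1
        if groups is not None:
            tables.setdefault(groups[i], Counter())[_label(g)] += 1
    return tables


def mean_by_genotype(genotypes, values):
    """Average of a quantitative variable within each genotype category."""
    sums, counts = defaultdict(float), Counter()
    for g, v in zip(genotypes, values):
        if v is None:
            continue  # skip samples with no dependent value
        sums[_label(g)] += v
        counts[_label(g)] += 1
    return {k: sums[k] / counts[k] for k in counts}


genotypes = ["AA", "AG", None, "GG", "GA", "AA"]
status = ["case", "case", "control", "control", "control", "case"]
phenotype = [1.0, 2.0, 3.0, 4.0, 4.0, 2.0]

print(genotype_counts(genotypes, status))
print(mean_by_genotype(genotypes, phenotype))
```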
Allele Count Table(s)
The counts for each allele are output. These will also be output separately for cases and for controls, if applicable.