
7.3 Genotype Statistics by Marker

Several types of overall marker statistics and genetic measures are available as output (see Figure 36).



Figure 36: Genotype Statistics By Marker Window

NOTE:

  1. These statistics are only available for bi-allelic markers.
  2. If there is a case/control dependent variable, or a dependent variable being treated as case/control, several of these statistics can be calculated for all samples, for cases only, or for controls only.
  3. If there is a quantitative dependent variable, the Genotype Counts option will also return an average of the dependent variable for each genotype category, including the Missing category if any genotypes are missing.
  4. These statistics can be calculated at the same time as a genotypic association test. To do this, see section Genotype Association Tests.

Data Requirements

General marker statistics require a dataset containing genotypic data. Optionally, case/control data can be used to subdivide most of these statistics by “case” and “control” status. A quantitative dependent variable will produce an average of the dependent variable for each genotype category.

First, import your data into an SVS project (see Importing Your Data Into A Project). Once you have the spreadsheet for this data, select the column representing the dependent variable (see Column States) if you wish to subdivide your statistics by “case” and “control” or get average values. If no dependent variable is selected, only overall statistics will be returned. To open the marker statistics dialog, select Quality Assurance > Genotype Statistics By Marker from the spreadsheet menu.

Processing

Select your marker statistics options and select the Run button to process. Descriptions of the marker statistics options are detailed below.

One spreadsheet of results will be created as a child of the current spreadsheet navigator window node. The number of markers analyzed and the number of markers dropped because they have more than two alleles are recorded in the Node Change Log for the Marker Statistics spreadsheet.

NOTE:

  • Markers that were skipped because they are not bi-allelic will not be included in the results spreadsheet.

Call Rate

This option displays the fraction of genotypes that are not missing for the given marker.

With data from certain providers you can also set a confidence threshold on import to indicate which genotypes are to be called or not.
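As an illustration of the call rate definition above, here is a minimal Python sketch. It is not SVS code: the function name `call_rate` and the encoding of genotypes as two-character strings with `None` for a missing call are assumptions made for this example only.

```python
def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype at one marker.

    Assumed encoding (hypothetical, not the SVS format): each genotype is a
    two-character string such as "AG", and None marks a missing call.
    """
    if not genotypes:
        return float("nan")  # no samples, so the rate is undefined
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


marker = ["AG", "AA", None, "GG", "AG"]
print(call_rate(marker))  # 4 of 5 genotypes are called: 0.8
```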

Allele Frequencies

This option displays four columns: Minor Allele, Minor Allele Freq., Major Allele, and Major Allele Freq. For each marker, the minor allele is identified in the Minor Allele column. The Minor Allele Freq. is the fraction of the given marker’s total alleles that are minor alleles. Similarly, the major allele is identified for each marker in the Major Allele column. The Major Allele Freq. is the fraction of the given marker’s total alleles that are major alleles. If the marker is monomorphic, the lone allele is reported as the major allele with a major allele frequency of 1, and the minor allele is reported as missing with a minor allele frequency of 0.
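The four columns described above can be sketched in Python as follows. This is an illustration under assumptions, not the SVS implementation: the function name, the dictionary keys, and the genotype encoding (two-character strings, `None` for missing) are hypothetical, and ties between equally frequent alleles are broken arbitrarily here.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return minor/major allele and frequencies for one bi-allelic marker.

    Assumed encoding (hypothetical): "AG"-style strings, None for missing.
    Returns None when nothing is called or more than two alleles are seen.
    """
    counts = Counter(a for g in genotypes if g is not None for a in g)
    total = sum(counts.values())
    if total == 0 or len(counts) > 2:
        return None
    ranked = counts.most_common()  # most frequent allele first
    major, major_n = ranked[0]
    if len(ranked) == 1:
        # Monomorphic marker: lone allele is major, minor allele is missing.
        return {"minor": None, "minor_freq": 0.0,
                "major": major, "major_freq": 1.0}
    minor, minor_n = ranked[1]
    return {"minor": minor, "minor_freq": minor_n / total,
            "major": major, "major_freq": major_n / total}


# A appears 5 times and G 3 times among the 8 called alleles.
print(allele_frequencies(["AG", "AA", None, "GG", "AA"]))
```

The intermediate `counts` value also corresponds to a per-marker allele count table, so the same tally can serve both outputs.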

Hardy-Weinberg Equilibrium P-Value

This option displays the Hardy-Weinberg Equilibrium (HWE) Correlation P-Values for each marker.

This statistic will also be output separately for cases and for controls, if applicable.

Please see the section in the Formulas and Theories chapter for how this statistic is computed (General Marker Statistics).
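For orientation only, the sketch below computes a textbook chi-square goodness-of-fit P-value for HWE from the three genotype counts of a bi-allelic marker, with one degree of freedom. Whether SVS computes its HWE P-value in exactly this way is specified in the Formulas and Theories chapter, not here; the function name and argument names are hypothetical.

```python
import math


def hwe_chi2_p_value(n_DD, n_Dd, n_dd):
    """Textbook chi-square HWE test (1 df) from genotype counts.

    An assumption-laden sketch, not necessarily the formula SVS uses.
    """
    n = n_DD + n_Dd + n_dd
    if n == 0:
        return float("nan")
    p = (2 * n_DD + n_Dd) / (2 * n)  # frequency of allele D
    q = 1.0 - p                      # frequency of allele d
    if p == 0.0 or q == 0.0:
        return float("nan")          # monomorphic: test is undefined
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_DD, n_Dd, n_dd)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


print(hwe_chi2_p_value(25, 50, 25))  # counts match HWE exactly: 1.0
print(hwe_chi2_p_value(30, 40, 30))  # chi-square of 4, P-value near 0.0455
```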

Fisher’s Exact Test for HWE P-Value

This option displays Fisher’s Exact Test HWE P-Values for each marker.

This statistic will also be output separately for cases and for controls, if applicable.

Please see the section in the Formulas and Theories chapter for how this statistic is computed (General Marker Statistics).
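As a rough illustration, the following sketch implements a standard exact test of HWE for a bi-allelic marker: conditional on the observed allele counts, it sums the probabilities of every possible heterozygote count that is no more likely than the one observed. This is a generic formulation and an assumption on our part; the exact procedure SVS uses is described in the Formulas and Theories chapter. All names are hypothetical.

```python
import math


def _log_prob_het(n_het, n, n_a):
    """Log-probability of n_het heterozygotes given n samples, n_a copies of allele a."""
    n_b = 2 * n - n_a
    n_aa = (n_a - n_het) // 2
    n_bb = (n_b - n_het) // 2
    lg = math.lgamma
    return (n_het * math.log(2.0)
            + lg(n + 1) - lg(n_aa + 1) - lg(n_het + 1) - lg(n_bb + 1)
            + lg(n_a + 1) + lg(n_b + 1) - lg(2 * n + 1))


def hwe_exact_p_value(n_DD, n_Dd, n_dd):
    """Exact HWE P-value: total probability of outcomes no likelier than observed."""
    n = n_DD + n_Dd + n_dd
    if n == 0:
        return float("nan")
    n_a = 2 * n_DD + n_Dd
    n_b = 2 * n_dd + n_Dd
    observed = math.exp(_log_prob_het(n_Dd, n, n_a))
    p_value = 0.0
    # Heterozygote counts share the parity of n_a and cannot exceed min(n_a, n_b).
    for het in range(n_a % 2, min(n_a, n_b) + 1, 2):
        prob = math.exp(_log_prob_het(het, n, n_a))
        if prob <= observed * (1.0 + 1e-9):  # tolerance for rounding error
            p_value += prob
    return min(1.0, p_value)


print(hwe_exact_p_value(30, 40, 30))
```

Working in log space with `math.lgamma` avoids overflow from large factorials, which matters once sample sizes reach the thousands.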

Signed HWE R

This option displays the Signed HWE Correlation R for each marker. This is a measure designed to show specifically if the data for this marker shows a tendency towards being homozygous (positively signed R) or towards being heterozygous (negatively signed R).

This statistic will also be output separately for cases and for controls, if applicable.

Please see the section in the Formulas and Theories chapter for how this statistic is computed (General Marker Statistics).
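To make the sign convention concrete, the sketch below computes the correlation between the two alleles within a genotype for a bi-allelic marker, which is positive when homozygotes are in excess and negative when heterozygotes are in excess. Treating this quantity as the signed R is an assumption made for illustration; the definition SVS uses is given in the Formulas and Theories chapter. The function and argument names are hypothetical.

```python
def signed_hwe_r(n_DD, n_Dd, n_dd):
    """Within-genotype allele correlation from genotype counts.

    Illustrative assumption, not necessarily the SVS formula.
    Equivalent to 1 - (observed heterozygosity / expected heterozygosity).
    """
    count_D = 2 * n_DD + n_Dd  # copies of allele D
    count_d = 2 * n_dd + n_Dd  # copies of allele d
    if count_D == 0 or count_d == 0:
        return float("nan")    # monomorphic or empty: undefined
    return (4 * n_DD * n_dd - n_Dd ** 2) / (count_D * count_d)


print(signed_hwe_r(30, 40, 30))  # excess homozygotes: 0.2
print(signed_hwe_r(20, 60, 20))  # excess heterozygotes: -0.2
print(signed_hwe_r(25, 50, 25))  # exact HWE proportions: 0.0
```

For a bi-allelic marker, the number of samples times the square of this value equals the Pearson chi-square statistic for HWE (here, 100 × 0.2² = 4).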

Genotype Count Table(s)

The number of samples with each genotype is output. These counts will also be output separately for cases and for controls, if applicable. If a quantitative dependent variable was selected, an average of the dependent variable for each genotype category (DD, Dd, dd, Missing) will be calculated for each marker.
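The per-genotype counts and averages can be pictured with the small Python sketch below. It is a hypothetical illustration: the function name and genotype encoding are assumptions, every sample is assumed to have a dependent value, and categories are keyed by the observed allele pair rather than by the DD, Dd, and dd labels, since the sketch does not assume which allele is D.

```python
from collections import defaultdict


def genotype_counts_and_means(genotypes, values):
    """Count samples per genotype and average a quantitative variable.

    Assumed encoding (hypothetical): "AG"-style strings, None for missing.
    "AG" and "GA" are treated as the same genotype.
    """
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, values):
        key = "Missing" if genotype is None else "".join(sorted(genotype))
        groups[key].append(value)
    return {key: {"count": len(vals), "mean": sum(vals) / len(vals)}
            for key, vals in groups.items()}


genotypes = ["AG", "GA", "AA", None, "GG"]
phenotype = [1.0, 3.0, 5.0, 2.0, 4.0]
for key, row in sorted(genotype_counts_and_means(genotypes, phenotype).items()):
    print(key, row["count"], row["mean"])
```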

Allele Count Table(s)

The counts for each allele are output. These will also be output separately for cases and for controls, if applicable.