7.5 Genotype Statistics by Sample

Two types of overall per-sample statistics are available as output: the call rate and the Hardy-Weinberg Thw p-value (see Figure 38).

Figure 38: Genotype Statistics By Sample

Data Requirements

General sample statistics require a dataset containing genotypic data. First, import your data into an SVS project (see Importing Your Data Into A Project). Then open the sample statistics dialog by selecting Quality Assurance > Genotype Statistics By Sample from the spreadsheet menu.

Processing

Select the desired sample statistics options and click the Run button to start processing. Each option is described below.

A single results spreadsheet is created as a child of the current spreadsheet's node in the navigator window. The number of markers analyzed and the number of markers dropped because they have more than two alleles are recorded in the Node Change Log of the Statistics by Sample spreadsheet.
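The allele-count filter behind the Node Change Log entry can be illustrated with a small Python sketch. It assumes a hypothetical input format (each marker maps to one genotype string per sample, such as "A/G", with None for a missing call) and a hypothetical function name; it only mirrors the rule stated above, not SVS's internal code.

```python
from typing import Optional


def split_by_allele_count(
    markers: dict[str, list[Optional[str]]]
) -> tuple[list[str], int]:
    """Return (names of markers with at most two alleles, number of markers dropped).

    Hypothetical input format: marker name -> one genotype string per sample,
    written as "A/G", or None when the call is missing.
    """
    kept: list[str] = []
    dropped = 0
    for name, calls in markers.items():
        alleles: set[str] = set()
        for call in calls:
            if call is not None:
                alleles.update(call.split("/"))
        if len(alleles) > 2:
            dropped += 1  # this count is what the change log reports as dropped
        else:
            kept.append(name)
    return kept, dropped


markers = {
    "rs1": ["A/A", "A/G", None, "G/G"],
    "rs2": ["A/C", "C/T", "A/A", "T/T"],  # three alleles: A, C and T
}
kept, dropped = split_by_allele_count(markers)
print(kept, dropped)  # ['rs1'] 1
```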

Call Rate (fraction not missing)

This option outputs, for each sample, the fraction of its genotypes that are present (not missing).
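As a minimal Python sketch of this definition, assuming a hypothetical `call_rate` function that receives one sample's calls over the markers being analyzed, with None marking a missing genotype:

```python
import math
from typing import Optional, Sequence


def call_rate(sample_calls: Sequence[Optional[str]]) -> float:
    """Fraction of one sample's genotype calls that are present (not None)."""
    if not sample_calls:
        return math.nan  # no markers, so the rate is undefined
    present = sum(call is not None for call in sample_calls)
    return present / len(sample_calls)


print(call_rate(["A/A", None, "A/G", "G/G"]))  # 0.75
```

Returning NaN for an empty input is a choice made for this sketch only.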

Hardy-Weinberg Thw P-Value

This option outputs, for each sample, the p-value of a genome-wide test of whether the sample's minor allele counts depart from their expected values, namely two times the minor allele frequency of each corresponding marker. The test is computed over all active genotypic markers for the sample. It does not require the data to be free of linkage disequilibrium, and it can detect even small deviations from Hardy-Weinberg equilibrium, whether these arise from violations of the conditions for Hardy-Weinberg equilibrium or from genotyping error.

Output -log10 p-values

This option applies only to the Hardy-Weinberg Thw p-value and outputs -log10(Hardy-Weinberg Thw p-value), so that smaller p-values appear as larger values.
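The following Python sketch converts a Thw statistic into its p-value and -log10 p-value. It relies only on the approximate χ² distribution with one degree of freedom stated for Thw in the Output list; the function names are hypothetical, and SVS may evaluate the tail probability differently.

```python
import math


def thw_p_value(thw: float) -> float:
    """Upper-tail p-value of a statistic with a chi-square(1) null distribution.

    A chi-square(1) variable is Z**2 for a standard normal Z, so
    P(chi2 > t) = P(|Z| > sqrt(t)) = erfc(sqrt(t / 2)).
    """
    return math.erfc(math.sqrt(thw / 2.0))


def neg_log10(p: float) -> float:
    """-log10(p); returns infinity if p underflowed to 0.0."""
    return math.inf if p == 0.0 else -math.log10(p)


p = thw_p_value(3.841458820694124)  # 95th percentile of chi-square(1)
print(round(p, 4), round(neg_log10(p), 3))  # 0.05 1.301
```

For very large statistics the p-value underflows to 0.0, which is why the sketch returns infinity rather than raising an error.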

Output

Each row of the output spreadsheet corresponds to a sample. Depending on the options selected, the spreadsheet contains some or all of the following columns:

  • # Markers: The number of markers used for the calculation.
  • Call Rate: The fraction of genotypes that are present and not missing for the given sample.
  • Thw p-value: P-value of the Hardy-Weinberg Thw statistic.
  • -log10 Thw p-value: The negative base-10 logarithm of the p-value of the Hardy-Weinberg Thw statistic.
  • Thw: The Hardy-Weinberg Thw statistic. Under the null hypothesis of no departure from Hardy-Weinberg equilibrium, this statistic approximately follows a χ² distribution with one degree of freedom.
  • E(delta X): Expected residual marker score. The residual marker score at a given marker and sample is $\Delta X = X_i - E(X_i) = X_i - 2p_i$, where the marker score $X_i$ is the number of minor alleles and $E(X_i) = 2p_i$ is the expected marker score based on the minor allele frequency $p_i$ of the marker (computed over all samples).
  • var(delta X): Variance of the residual marker score.
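The residual definitions above can be sketched in Python as follows. Everything here is an illustrative assumption: the genotype layout (copies of one arbitrarily chosen allele per sample, None for missing), the function names, breaking a 0.5 frequency tie in favor of the counted allele, and summarizing E(delta X) and var(delta X) as the mean and the sample variance (n - 1 denominator) of the residuals over the markers used. This sketch does not compute Thw itself, whose exact formula is not given here.

```python
import statistics
from typing import Optional

# Hypothetical layout: genotypes[m][s] is the number of copies (0, 1 or 2) of one
# arbitrarily chosen allele of marker m carried by sample s; None = missing call.
Genotypes = list[list[Optional[int]]]


def minor_allele_scores(counts: list[Optional[int]]) -> tuple[list[Optional[int]], float]:
    """Recode one marker as minor allele counts X_i and return (X, p_i).

    p_i is the minor allele frequency over all samples with a non-missing call.
    """
    called = [c for c in counts if c is not None]
    freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    if freq <= 0.5:
        return counts, freq
    # The other allele is the minor one, so count its copies instead.
    return [None if c is None else 2 - c for c in counts], 1.0 - freq


def residual_summary(genotypes: Genotypes, sample: int) -> tuple[int, float, float]:
    """Return (# markers, E(delta X), var(delta X)) for one sample."""
    residuals: list[float] = []
    for counts in genotypes:
        if counts[sample] is None:
            continue  # markers missing for this sample are not used
        x, p = minor_allele_scores(counts)
        residuals.append(x[sample] - 2 * p)  # delta X = X_i - 2 p_i
    # Assumption: mean and sample variance (n - 1 denominator) of the residuals.
    return len(residuals), statistics.mean(residuals), statistics.variance(residuals)


genotypes = [
    [0, 1, 2, 1],     # counted allele frequency 0.5: treated as the minor allele
    [2, 2, 1, None],  # counted allele frequency 5/6: the other allele is minor
    [0, 0, 1, 0],     # counted allele frequency 1/8
]
n, mean, var = residual_summary(genotypes, sample=0)
print(n, round(mean, 4), round(var, 4))  # 3 -0.5278 0.169
```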

Subdivision of Output by Cases vs. Controls

If a binary column is selected as the dependent variable, the outputs described above are repeated for cases only and for controls only. Controls receive missing values in the case-only columns, and cases receive missing values in the control-only columns. In addition, the overall analysis is performed only on samples whose value in the binary column is not missing.

Note that in the case-only analysis, the minor allele and its frequency are determined from cases alone; likewise, in the control-only analysis, they are determined from controls alone. The minor allele of a marker may therefore differ between the two analyses.
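A short Python sketch shows how determining the minor allele within each group can change which allele is minor. The genotype layout (copies of one arbitrarily chosen allele per sample, None for missing), the case/control encoding (True, False, or None for a missing phenotype), the tie rule at frequency 0.5, and the function names are all hypothetical.

```python
from typing import Optional


def minor_allele(counts: list[Optional[int]]) -> tuple[str, float]:
    """Return (which allele is minor, its frequency) within the given samples.

    counts[s] is the number of copies (0-2) of one arbitrarily chosen allele,
    or None for a missing call (hypothetical layout). "counted" means that
    allele is minor; "other" means the second allele is.
    """
    called = [c for c in counts if c is not None]
    freq = sum(called) / (2 * len(called))
    return ("counted", freq) if freq <= 0.5 else ("other", 1.0 - freq)


def minor_allele_by_group(
    counts: list[Optional[int]], is_case: list[Optional[bool]]
) -> dict[str, tuple[str, float]]:
    """Determine the minor allele separately for the overall, case-only and control-only analyses.

    Samples whose case/control status is None are left out of every group.
    """
    groups = {
        "overall": [c for c, y in zip(counts, is_case) if y is not None],
        "cases": [c for c, y in zip(counts, is_case) if y is True],
        "controls": [c for c, y in zip(counts, is_case) if y is False],
    }
    return {name: minor_allele(group) for name, group in groups.items()}


counts = [2, 2, 1, 0, 1, 1]
is_case = [True, True, True, False, False, None]
print(minor_allele_by_group(counts, is_case))
```

In this example the counted allele is major among cases (frequency 5/6) but minor among controls (frequency 1/4), so the two subgroup analyses score different alleles for the same marker.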