7.8 Genotypic Principal Component Analysis
Data Requirements
Genotypic principal component analysis requires a dataset containing genotypic data. First, import the genotypic data into
an SVS project (see Importing Your Data Into A Project). Once the project contains a spreadsheet with genotypic
data, open the Genotypic Principal Component Analysis options dialog by selecting Quality Assurance > Genotypic
Principal Component Analysis from the spreadsheet menu.
Before principal components can be computed for genotypic data, each genotype must be given a numeric equivalent.
The value assigned depends on whether the additive, dominant, or recessive genetic model is selected. (The PCA technique is
not available for the basic allelic test or the genotypic test, because a numeric equivalent is harder to establish for
those models, or cannot be established at all.)
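As an illustration of the numeric conversion, the following Python sketch encodes genotypes under each of the three models. It assumes the common convention in which the additive model counts copies of the minor allele, the dominant model flags one or more copies, and the recessive model flags two copies. The function name `encode_genotype`, the `A_B` genotype format, and the `?_?` missing-value code are assumptions made for this example, not a description of how SVS implements the step.

```python
# Hypothetical helper for illustration only; SVS performs this conversion internally.
def encode_genotype(genotype, minor_allele, model="additive"):
    """Map a genotype such as "A_B" to a number under the chosen genetic model.

    Returns None for a missing genotype ("?_?" is an assumed missing-value code).
    """
    alleles = genotype.split("_")
    if "?" in alleles:
        return None
    minor_count = alleles.count(minor_allele)  # 0, 1, or 2 copies of the minor allele
    if model == "additive":
        return minor_count
    if model == "dominant":
        return 1 if minor_count >= 1 else 0
    if model == "recessive":
        return 1 if minor_count == 2 else 0
    raise ValueError(f"unknown model: {model}")


genotypes = ["A_A", "A_B", "B_B", "?_?"]
for model in ("additive", "dominant", "recessive"):
    # "B" is taken to be the minor allele in this example
    print(model, [encode_genotype(g, "B", model) for g in genotypes])
```

The allelic and genotypic tests have no comparable single-number encoding in this scheme, which is consistent with PCA not being offered for them.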
NOTE:
- It is common practice to inactivate markers that are known to have data quality issues before using principal components analysis.
Using the Genotypic Principal Components Analysis Window
NOTE:
- This window (see Figure 40) provides essentially the same functions as the PCA Parameters tab of the Genotype Association Tests dialog, which is reached through the Analysis menu when Correct for Stratification with PCA is selected on the Association Test Parameters tab. The only difference is that the separate Genotypic Principal Component Analysis window does not require an association test to be run at the same time.
Processing
The principal components can be computed as part of this analysis. Alternatively, if they have already been computed for
the dataset, select the “Use precomputed principal components” option and then select the spreadsheet of principal
components. See Applying PCA to a Superset of Markers and Applying PCA to a Subset of Samples for specific limitations
of this feature.
Select the PCA parameters: the genotypic model, the maximum number of components to find and correct for, the
normalization method, the spreadsheets to output, and whether to eliminate component outlier subjects. When PCA outlier
removal is performed by recomputing components, there are also selections for the number of times to recompute the
components, the criteria for determining an outlier, and the number of components from which to remove outliers.
See Correction of Input Data by Principal Component Analysis for more information on the options on this
dialog.
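To make the outlier-removal parameters concrete, here is a minimal sketch of an iterative scheme in Python with NumPy. It is not the SVS implementation. It assumes that an outlier is a sample whose score lies more than a chosen number of standard deviations from the mean on any of the first few components, and that components are recomputed after each round of removals. The function names, the default threshold, and the synthetic data are all hypothetical.

```python
import numpy as np


def top_components(X, n_components):
    """Return (eigenvalues, scores) for the leading components of a sample-by-marker matrix."""
    Xc = X - X.mean(axis=0)  # center each marker column
    # Singular values are returned largest first
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return s[:n_components] ** 2, U[:, :n_components]


def remove_outliers(X, n_components=2, n_iterations=5,
                    sd_threshold=6.0, n_outlier_components=2):
    """Iteratively drop outlier samples, recomputing components each round.

    Assumed criterion: |score - mean| > sd_threshold * SD on any of the
    first n_outlier_components components.
    """
    keep = np.arange(X.shape[0])  # indices of samples still retained
    removed = []                  # (sample index, iteration, component)
    for iteration in range(1, n_iterations + 1):
        _, scores = top_components(X[keep], n_components)
        sub = scores[:, :n_outlier_components]
        z = np.abs(sub - sub.mean(axis=0)) / sub.std(axis=0)
        is_outlier = (z > sd_threshold).any(axis=1)
        if not is_outlier.any():
            break  # nothing left to remove
        for row in np.flatnonzero(is_outlier):
            component = int(np.argmax(z[row] > sd_threshold)) + 1  # first offending component
            removed.append((int(keep[row]), iteration, component))
        keep = keep[~is_outlier]
    return keep, removed


# Synthetic additive-coded data: 100 samples, 500 markers, one atypical sample
rng = np.random.default_rng(0)
X = rng.binomial(2, 0.1, size=(100, 500)).astype(float)
X[7] = 2.0  # sample 7 carries two minor alleles at every marker
keep, removed = remove_outliers(X)
print(len(keep), removed)
```

The `removed` list mirrors the kind of record described for the outlier spreadsheet: which sample was dropped, in which iteration, and on which component.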
Select the Run button to perform the analysis.
When the analysis is complete, a message indicating the number of markers analyzed and the number of markers that were skipped will be appended to the Node Change Log for each output spreadsheet, and all spreadsheets selected for output will be opened.
Spreadsheet Outputs
The possible output spreadsheets are as follows:
- The corrected input data. (Recall that genotypic data is first converted to numeric data by the selected genetic model.)
- The principal components spreadsheet, with one row per sample and one column per component. The components are sorted by eigenvalue, from largest to smallest. Only the number of components requested is shown.
- The eigenvalue spreadsheet, which lists the eigenvalues of the requested components from largest to smallest.
- If re-computation of components after removal of outliers was selected, and outliers were found, then a spreadsheet will be created to list these outliers and the iteration and component in which they were found.
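The relationship among the first three outputs can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the SVS algorithm: the data is only mean-centered (no particular normalization method is applied), components are taken as eigenvectors of the sample-by-sample matrix, and "corrected" data is obtained by subtracting the part of each marker column explained by the requested components. The function name `pca_outputs` and the random input are hypothetical.

```python
import numpy as np


def pca_outputs(X, n_components):
    """Return (eigenvalues, components, corrected) for a sample-by-marker numeric matrix."""
    Xc = X - X.mean(axis=0)  # center each marker column
    gram = Xc @ Xc.T         # sample-by-sample matrix
    eigenvalues, eigenvectors = np.linalg.eigh(gram)  # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1][:n_components]  # largest eigenvalues first
    eigenvalues = eigenvalues[order]
    components = eigenvectors[:, order]  # rows: samples, columns: components
    # Subtract the projection of every marker column onto the components
    corrected = Xc - components @ (components.T @ Xc)
    return eigenvalues, components, corrected


# Synthetic additive-coded genotypes: 8 samples, 30 markers
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(8, 30)).astype(float)
eigenvalues, components, corrected = pca_outputs(X, n_components=3)
print(eigenvalues.shape, components.shape, corrected.shape)
```

After this correction, every marker column is orthogonal to each retained component, which is the sense in which the stratification captured by those components has been removed.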
NOTE:
- To plot any of the outputs (such as the single column in the eigenvalue spreadsheet), select Plot Numeric Values from the Plot menu. The column of data will be plotted against the row labels or eigenvalue number.
- To plot one principal component eigenvector against another, select XY Scatter Plots from the Plot menu. See Multi-Color Scatter Plots for PCA or Gender Analysis for more information.