SVS 7 makes it easy to combine multiple datasets into a single, all-encompassing spreadsheet for analysis. Instantly join genotypes with phenotypes and pedigree information, genotypes with log ratios, data from different arrays (Affymetrix 500K with 6.0), and data from different vendors (Affymetrix with Illumina). Convenient append operations make it easy to merge multiple sample sets, such as your cases with publicly available controls. Spreadsheets can be transposed efficiently as well, which is useful for plotting, data export, and other data manipulation tasks.
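To make the join, append, and transpose operations concrete, here is a minimal Python sketch that is not SVS 7 code. It assumes a toy spreadsheet represented as a list of column headers plus a dict mapping each sample ID to its row of values, genotypes written as allele pairs such as A_B, and hypothetical helper names (join_on_samples, append_samples, transpose). All sample IDs and values are made up.

```python
# Hypothetical sketch, not SVS 7 code. A toy "spreadsheet" is a list of
# column headers plus a dict mapping each sample ID (row label) to its values.


def join_on_samples(left_cols, left_rows, right_cols, right_rows):
    """Column-wise inner join: keep only samples present in both tables."""
    cols = left_cols + right_cols
    rows = {
        sample: left_rows[sample] + right_rows[sample]
        for sample in left_rows
        if sample in right_rows
    }
    return cols, rows


def append_samples(cols_a, rows_a, cols_b, rows_b):
    """Row-wise append: stack two sample sets that share the same columns."""
    if cols_a != cols_b:
        raise ValueError("column headers must match to append")
    overlap = rows_a.keys() & rows_b.keys()
    if overlap:
        raise ValueError(f"duplicate sample IDs: {sorted(overlap)}")
    return cols_a, {**rows_a, **rows_b}


def transpose(cols, rows):
    """Swap rows and columns: headers become row labels, samples become columns."""
    samples = list(rows)
    new_rows = {col: [rows[s][i] for s in samples] for i, col in enumerate(cols)}
    return samples, new_rows


genotype_cols = ["rs123", "rs456"]
genotypes = {"S1": ["A_A", "A_B"], "S2": ["B_B", "A_A"]}
phenotype_cols = ["Status", "Age"]
phenotypes = {"S1": ["Case", 54], "S2": ["Case", 61], "S9": ["Case", 47]}

# Join genotypes with phenotypes, then append a (made-up) control sample.
cases_cols, cases = join_on_samples(genotype_cols, genotypes, phenotype_cols, phenotypes)
controls = {"C1": ["A_B", "A_B", "Control", 58]}
all_cols, all_rows = append_samples(cases_cols, cases, cases_cols, controls)

t_cols, t_rows = transpose(all_cols, all_rows)
print(t_cols)            # ['S1', 'S2', 'C1']
print(t_rows["rs123"])   # ['A_A', 'B_B', 'A_B']
```

The join keeps only samples present in both tables (S9 has phenotypes but no genotypes, so it is dropped); a full tool may also offer outer joins that keep such samples with missing values.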
The converse of joining data is creating subsets. SVS 7 provides a fast and flexible interface for selecting specific genomic regions or samples. Activate samples or markers by numeric thresholds or by categories. Subset out chromosomes, genes, or other regions for more targeted analyses and imputation, or investigate markers in one dataset based on results from another.
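The sketch below illustrates two ways of subsetting: by genomic region, and by a threshold applied to results from a separate analysis. It is a hypothetical Python example, not the SVS 7 interface; the marker map layout (marker ID to chromosome and position), the p-values, and the function names are all assumptions made for illustration.

```python
# Hypothetical sketch, not SVS 7's API. A marker map is assumed to be a dict
# of marker ID -> (chromosome, position); association results are assumed to
# be a dict of marker ID -> p-value from an analysis of another dataset.


def markers_in_region(marker_map, chrom, start, end):
    """Return marker IDs on `chrom` with start <= position <= end."""
    return [
        marker
        for marker, (c, pos) in marker_map.items()
        if c == chrom and start <= pos <= end
    ]


def markers_below_threshold(pvalues, alpha):
    """Return marker IDs whose p-value is below the threshold alpha."""
    return [marker for marker, p in pvalues.items() if p < alpha]


def subset_columns(cols, rows, keep):
    """Keep only the columns named in `keep`, preserving the original order."""
    keep = set(keep)
    idx = [i for i, c in enumerate(cols) if c in keep]
    return [cols[i] for i in idx], {s: [v[i] for i in idx] for s, v in rows.items()}


marker_map = {"rs1": ("2", 1500), "rs2": ("2", 98000), "rs3": ("7", 1200)}
# Made-up p-values from a discovery analysis of a *different* dataset.
discovery_pvalues = {"rs1": 3e-6, "rs2": 0.41, "rs3": 2e-5}

cols = ["rs1", "rs2", "rs3"]
rows = {"S1": ["A_A", "A_B", "B_B"], "S2": ["A_B", "A_A", "A_B"]}

region = markers_in_region(marker_map, "2", 1000, 50000)
hits = markers_below_threshold(discovery_pvalues, 1e-4)
sub_cols, sub_rows = subset_columns(cols, rows, hits)
print(region)    # ['rs1']
print(sub_cols)  # ['rs1', 'rs3']
```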
Calculating correct statistics requires data that is in the proper format and of the correct type. Data comes in all shapes and sizes, and although SVS 7 auto-detects the format of each variable on import, the assigned type may not be what the researcher intended (e.g., categorical data coded as numbers will be interpreted as integers). A new, fully integrated spreadsheet editor lets you edit and enrich your data at scale. Find and replace within a column or across the entire spreadsheet; create or delete columns; edit row labels, column headers, or individual values; convert columns from one type to another; and more. One powerful option automatically expands a categorical variable into a set of binary variables, one for each category.
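As a rough illustration of the type conversion and categorical expansion described above, the following Python sketch re-types an integer-coded column as categories and then expands it into one 0/1 indicator column per category. The column name, the Site=value naming scheme, and the choice to keep missing values missing are assumptions, not documented SVS 7 behavior.

```python
# Hypothetical sketch, not SVS 7 code. Shows (1) re-typing a numerically coded
# categorical column as text categories and (2) expanding a categorical column
# into one binary (0/1) indicator column per category.


def to_categorical(values):
    """Convert integer codes such as 1, 2, 3 into category labels '1', '2', '3'."""
    return [None if v is None else str(v) for v in values]


def expand_categorical(name, values):
    """Return {new_column_name: [0/1, ...]} with one column per category.

    Missing values (None) stay missing in every indicator column.
    """
    categories = sorted({v for v in values if v is not None})
    return {
        f"{name}={cat}": [None if v is None else int(v == cat) for v in values]
        for cat in categories
    }


site_codes = [1, 3, 2, 1, None]        # recruitment site stored as integers
site = to_categorical(site_codes)      # ['1', '3', '2', '1', None]
indicators = expand_categorical("Site", site)
for col, vals in indicators.items():
    print(col, vals)
# Site=1 [1, 0, 0, 1, None]
# Site=2 [0, 0, 1, 0, None]
# Site=3 [0, 1, 0, 0, None]
```

When indicator columns like these are used as regression covariates, one category is usually left out as the reference level, since the full set of indicators is perfectly collinear with the intercept.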
Genotype data typically comes in a generic AB format. SVS 7 can take generic AB data and recode it in a number of ways. Encode genotypes as DD, Dd, and dd based on the major and minor alleles calculated from your own samples, or encode them numerically according to an additive (0, 1, 2), dominant, or recessive model. Numeric encoding is what allows genotypes to be included as covariates in advanced regression analysis. You can also flip DNA strands so that different datasets merge correctly and are ready for imputation, or transcode generic AB alleles into their corresponding nucleotide (A, C, G, T) alleles.
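The numeric genetic models and strand flipping can be sketched in a few lines of Python. This is a hypothetical illustration rather than SVS 7's implementation: it assumes genotypes are strings such as A_B, computes the minor allele from the samples themselves (breaking ties alphabetically, an arbitrary choice), and counts minor-allele copies to produce additive (0, 1, 2), dominant (0/1), or recessive (0/1) codes. The function flip_strand shows strand flipping as simple base complementation of nucleotide alleles.

```python
from collections import Counter

# Hypothetical sketch, not SVS 7 code. Genotypes are assumed to be strings
# such as "A_B" (allele, underscore, allele); None marks a missing call.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
MODELS = ("additive", "dominant", "recessive")


def minor_allele(genotypes):
    """Return the less frequent allele among non-missing calls.

    Ties are broken alphabetically, which is an arbitrary choice.
    """
    counts = Counter(
        allele for g in genotypes if g is not None for allele in g.split("_")
    )
    if len(counts) != 2:
        raise ValueError("expected exactly two observed alleles")
    return min(counts, key=lambda a: (counts[a], a))


def recode(genotypes, model="additive"):
    """Recode genotypes numerically from the count of minor-allele copies.

    additive:  0, 1 or 2 copies of the minor allele
    dominant:  1 if at least one copy, else 0
    recessive: 1 if two copies, else 0
    """
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    minor = minor_allele(genotypes)
    coded = []
    for g in genotypes:
        if g is None:
            coded.append(None)  # keep missing calls missing
            continue
        copies = g.split("_").count(minor)
        if model == "additive":
            coded.append(copies)
        elif model == "dominant":
            coded.append(int(copies >= 1))
        else:  # recessive
            coded.append(int(copies == 2))
    return coded


def flip_strand(genotype):
    """Complement each nucleotide allele, e.g. "A_G" -> "T_C"."""
    return "_".join(COMPLEMENT[a] for a in genotype.split("_"))


calls = ["A_A", "A_B", "B_B", "A_A", None, "A_B"]  # B is the minor allele here
print(recode(calls, "additive"))   # [0, 1, 2, 0, None, 1]
print(recode(calls, "dominant"))   # [0, 1, 1, 0, None, 1]
print(recode(calls, "recessive"))  # [0, 0, 1, 0, None, 0]
print(flip_strand("A_G"))          # T_C
```

Complementation alone cannot settle the strand of A/T and C/G SNPs, because those allele pairs look the same on both strands; such markers usually need extra information, such as allele frequencies, to align.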
Most vendors use their own identifiers for genetic markers, and these are typically mapped to a more general standard such as dbSNP rsIDs. To merge and analyze datasets from different sources correctly, both must use a consistent identifier. In SVS 7, you can automatically rename your genetic marker column headers from one identifier to another, provided the new identifier is included in your genetic marker map.
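A minimal Python sketch of renaming marker column headers through a marker map follows. It is not SVS 7 code: the mapping from vendor probe IDs to rsIDs is assumed to be a plain dict, the probe IDs and rsIDs are invented placeholders, and headers without a mapping are left unchanged. Once both datasets use the same identifier, their shared markers can be found by comparing headers directly.

```python
# Hypothetical sketch, not SVS 7 code. The marker map is assumed to be a dict
# from a vendor's probe ID to an rsID; every ID below is an invented placeholder.


def rename_headers(cols, id_map):
    """Map each column header to its new identifier.

    Headers missing from id_map are kept unchanged. Raises ValueError if two
    columns would end up with the same new name.
    """
    renamed = [id_map.get(c, c) for c in cols]
    dupes = sorted({c for c in renamed if renamed.count(c) > 1})
    if dupes:
        raise ValueError(f"renaming would create duplicate headers: {dupes}")
    return renamed


illumina_cols = ["rs100", "rs200", "rs300"]
affy_cols = ["probe_1", "probe_2", "probe_9"]
probe_to_rsid = {"probe_1": "rs100", "probe_2": "rs200"}  # made-up mapping

affy_renamed = rename_headers(affy_cols, probe_to_rsid)
affy_set = set(affy_renamed)
shared = [c for c in illumina_cols if c in affy_set]
print(affy_renamed)  # ['rs100', 'rs200', 'probe_9']
print(shared)        # ['rs100', 'rs200']
```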