Data Editing, Manipulation and Enrichment

The sheer size and complexity of whole-genome data make it extremely difficult to work with. SVS 7 eliminates these hassles with real-time spreadsheet manipulation, data editing, and data enrichment on a grand scale.

Check out our Add-On Script Repository for freely available scripts that will make it even easier to edit, manipulate, and enrich your data.

Visit Script Repository »

Joining, Appending, and Transposing

SVS 7 makes it easy to combine multiple datasets into a single all-encompassing spreadsheet for analysis. Instantaneously join genotypes with phenotypes and pedigree information, genotypes with log ratios, data from different arrays (Affymetrix 500K with 6.0), and data from different vendors (Affymetrix with Illumina). Further, convenient appends facilitate the merging of multiple sample sets, such as your cases with publicly available controls. Transposing spreadsheets is also efficient for plotting, data export, and other data manipulation processes.
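As a rough illustration of what a join and an append do to the data, here is a minimal plain-Python sketch. It is not SVS code: the sample IDs, the `A_B` genotype notation, the column names, and the helper functions `join_on_sample` and `append_rows` are all hypothetical, chosen only to show the idea.

```python
# Illustrative only: plain-Python join and append, not the SVS interface.
# Tables are modeled as {sample_id: {column_name: value}} dictionaries.

def join_on_sample(left, right):
    """Inner-join two tables on sample ID, keeping samples found in both."""
    joined = {}
    for sample_id, left_row in left.items():
        if sample_id in right:
            # Combine the columns of both tables for the shared sample.
            joined[sample_id] = {**left_row, **right[sample_id]}
    return joined


def append_rows(first, second):
    """Stack two sample sets; refuse to silently overwrite duplicate IDs."""
    overlap = first.keys() & second.keys()
    if overlap:
        raise ValueError(f"duplicate sample IDs: {sorted(overlap)}")
    return {**first, **second}


# Hypothetical data: two genotyped cases, phenotypes, and one public control.
genotypes = {"S1": {"rs123": "A_B"}, "S2": {"rs123": "B_B"}}
phenotypes = {"S1": {"case": 1}, "S3": {"case": 0}}
controls = {"C1": {"rs123": "A_A"}}

joined = join_on_sample(genotypes, phenotypes)   # genotypes + phenotypes
combined = append_rows(genotypes, controls)      # cases + controls
```

Only sample `S1` appears in both the genotype and phenotype tables, so it is the only row in `joined`; an outer join would instead keep unmatched samples and leave their missing columns empty.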

Creating Subset Spreadsheets

Subsetting Data

The converse of joining data is creating subsets. SVS 7 provides a fast and flexible interface for selecting specific genomic regions or samples. Activate data by thresholds or categories. Subset out chromosomes, genes, or other regions for more targeted analyses and imputation. Investigate markers in one dataset based on results from another.
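The two kinds of selection described above, by genomic region and by a threshold on earlier results, can be sketched in a few lines of plain Python. This is a conceptual example rather than SVS functionality: the marker names, positions, p-values, and the helpers `markers_in_region` and `markers_below` are hypothetical.

```python
# Illustrative only: selecting markers by region and by a results threshold.

# Hypothetical marker map: marker name -> chromosome and base-pair position.
marker_map = {
    "rs1": {"chrom": "1", "pos": 1500},
    "rs2": {"chrom": "1", "pos": 98000},
    "rs3": {"chrom": "2", "pos": 4200},
}

# Hypothetical p-values from an earlier analysis of a different dataset.
prior_p_values = {"rs1": 0.20, "rs2": 0.0004, "rs3": 0.003}


def markers_in_region(marker_map, chrom, start, end):
    """Return markers on `chrom` whose position lies in [start, end]."""
    return [
        name
        for name, info in marker_map.items()
        if info["chrom"] == chrom and start <= info["pos"] <= end
    ]


def markers_below(p_values, threshold):
    """Return markers whose p-value is strictly below `threshold`."""
    return [name for name, p in p_values.items() if p < threshold]


region = markers_in_region(marker_map, "1", 1000, 100000)
significant = markers_below(prior_p_values, 0.01)
```

The resulting lists of marker names could then be used to keep only the matching columns of a genotype table.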

Editing Spreadsheets

Genotype and phenotype spreadsheet editor

In order to calculate the correct statistics, data must be in the proper format and of the correct type. Data comes in all shapes and sizes, and although SVS 7 auto-detects the format of each variable upon import, the assigned type may not be what the researcher intended (e.g., categorical data represented as numbers will be interpreted as integers). A new, fully integrated spreadsheet editor lets you edit and enrich your data on a grand scale. Find and replace within a column or across the entire spreadsheet. Create or delete columns. Edit row labels, column headers, or individual values. Convert columns from one type to another, and more. One powerful option automatically expands a categorical variable into a set of binary variables, one for each category.
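To make the last option concrete, the sketch below expands a categorical column into one binary indicator column per category. It is a plain-Python illustration of the general technique, not the SVS editor itself; the column name `site`, its numeric codes, and the helper `expand_categorical` are hypothetical.

```python
# Illustrative only: expand a categorical variable into binary indicators.

def expand_categorical(name, values):
    """Return {"<name>_<category>": [0/1, ...]} with one column per category."""
    categories = sorted(set(values))
    return {
        f"{name}_{category}": [1 if value == category else 0 for value in values]
        for category in categories
    }


# Hypothetical collection-site codes. They look like integers, but they are
# labels, so arithmetic on them (for example, averaging) would be meaningless.
site = [1, 3, 1, 2]

indicators = expand_categorical("site", site)
```

Each sample has a 1 in exactly one of the new columns, which lets the category enter a regression as a set of indicator covariates instead of a misleading numeric value.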

Spreadsheet with phenotype and genotype data.

Recoding Genotypes and Strand Flipping

Genotype data typically comes in the generic AB format. SVS 7 has an option that can take generic AB data and recode it in a number of ways. Encode genotypes as DD, Dd, dd based on in-sample calculations of major and minor alleles or encode them numerically (0, 1, 2) based on the additive, dominant, or recessive model. The latter is important for including genotypes as covariates for advanced regression analysis. You can also flip DNA strands for proper merging of different datasets and imputation procedures, or transcode generic AB alleles to their respective AGCT format.
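The numeric recoding and strand flip described above can be illustrated with a short plain-Python sketch. It is written under stated assumptions and is not the SVS implementation: genotypes are assumed to be written as `A_B`, the minor allele is taken to be the less frequent allele in the sample, the dominant and recessive models are assumed to collapse to 0/1 indicators, and missing calls are not handled. The helper names are hypothetical.

```python
# Illustrative only: numeric genotype recoding and strand flipping.
from collections import Counter


def minor_allele(genotypes):
    """Return the less frequent allele in the sample (ties: alphabetical)."""
    counts = Counter(allele for g in genotypes for allele in g.split("_"))
    return min(sorted(counts), key=lambda allele: counts[allele])


def recode(genotypes, model="additive"):
    """Encode each genotype by its number of minor alleles under `model`."""
    minor = minor_allele(genotypes)
    encoded = []
    for genotype in genotypes:
        n_minor = genotype.split("_").count(minor)
        if model == "additive":      # 0, 1, or 2 copies of the minor allele
            encoded.append(n_minor)
        elif model == "dominant":    # 1 if at least one copy (assumed coding)
            encoded.append(1 if n_minor >= 1 else 0)
        elif model == "recessive":   # 1 only if two copies (assumed coding)
            encoded.append(1 if n_minor == 2 else 0)
        else:
            raise ValueError(f"unknown model: {model}")
    return encoded


# Complement each nucleotide to report the call on the opposite strand.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def flip_strand(genotype):
    """Flip a nucleotide genotype such as 'A_G' to the other strand."""
    return genotype.translate(_COMPLEMENT)


calls = ["A_A", "A_B", "B_B", "A_A", "A_B"]   # hypothetical generic AB calls
additive = recode(calls, "additive")
dominant = recode(calls, "dominant")
recessive = recode(calls, "recessive")
flipped = flip_strand("A_G")
```

In this sample allele `B` occurs four times and `A` six times, so `B` is treated as the minor allele and each encoding counts copies of `B`.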

Renaming Genetic Column Headers

Most vendors have their own identifiers for genetic markers. Typically these are mapped to a more generalized standard, such as rsID#. In order to properly merge and analyze datasets in different formats, it is essential to use a consistent identifier in both. In SVS 7, you can automatically rename your genetic column headers from one identifier to another if the new identifier is included in your genetic marker map.
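The renaming step amounts to a lookup in the marker map, as the plain-Python sketch below shows. It is a conceptual example, not SVS code: the vendor-style identifiers, the `rsid` field name, and the helper `rename_headers` are hypothetical. Headers with no entry for the requested identifier are left unchanged.

```python
# Illustrative only: rename marker column headers using a marker map.

def rename_headers(headers, marker_map, field):
    """Replace each header with marker_map[header][field] when available."""
    renamed = []
    for header in headers:
        new_name = marker_map.get(header, {}).get(field)
        # Keep the original header if the map has no usable replacement.
        renamed.append(new_name if new_name else header)
    return renamed


# Hypothetical marker map keyed by vendor identifier.
marker_map = {
    "VENDOR-0001": {"rsid": "rs111"},
    "VENDOR-0002": {"rsid": "rs222"},
    "VENDOR-0003": {"rsid": None},   # no rsID recorded for this marker
}

headers = ["VENDOR-0001", "VENDOR-0002", "VENDOR-0003"]
renamed = rename_headers(headers, marker_map, "rsid")
```

Once two datasets share the same identifier scheme, their columns can be matched by name when they are joined or appended.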

© 2012 Golden Helix, Inc