
What is Python?

Python is a clear and powerful object-oriented programming language, comparable to Perl, Ruby, Scheme, or Java. Integrating Python into SVS 7 provides full programmatic access to many of the software's features, so you can augment existing tools, create entirely new ones, automate workflows, integrate with other programs, and more.

Python Learning Resources

» SVS 7 Scripting Reference
» Python.org
» Beginners Guide to Python

SVS Add-On Scripts Repository

Here you will find a collection of Python scripts submitted by Golden Helix developers and our customers. All scripts are provided at no additional cost, so feel free to download, use, and even enhance them!

Share your scripts with the Golden Helix Community

If you have written any scripts and would like to share them with other SVS 7 users, we encourage you to email a *.txt or *.py file to community@goldenhelix.com with any accompanying documentation or special instructions. Once we test your script and check its validity, we'll post it on this page for others to download.


Date Modified Category Script Author

8/13/2010

Import/Export

BEAGLE/BEAGLECALL Scripts Package
These scripts are for importing and exporting files from the BEAGLE and BEAGLECALL Genetic Analysis Software Packages.

Various GHI Staff
Golden Helix

7/21/2010

SNP Analysis

SNP Density
This script reports various SNP density statistics across all markers in a marker mapped spreadsheet.

Gabe Rudy
Golden Helix

6/11/2010

SNP Analysis

Rare Variant SNP
This script creates a binary spreadsheet to indicate if there exists a rare variant in the window centered about the current marker.

Christophe Lambert
Golden Helix

6/11/2010

Edit

Recode AGCT Alleles to AB
This script recodes AGCT alleles to the standard genotype format of A_A, A_B, and B_B. Prompts for a marker map strand field to use for recoding. The strand field must have allele information in the form [C/T].

Greta Peterson & Christophe Lambert
Golden Helix
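
The recoding described for "Recode AGCT Alleles to AB" can be illustrated with a minimal sketch in plain Python. It does not use the SVS scripting API and is not the repository script. The function name `recode_to_ab`, the choice to treat the first allele in the strand field as A, and the "?" missing code are assumptions made for illustration.

```python
def recode_to_ab(genotype, strand_field, missing="?"):
    """Recode a genotype such as 'C_T' to A/B form using a strand field like '[C/T]'.

    Assumption: the first allele listed in the strand field maps to A,
    the second to B. Anything that cannot be mapped becomes `missing`.
    """
    a_allele, b_allele = strand_field.strip("[]").split("/")
    mapping = {a_allele: "A", b_allele: "B"}
    alleles = genotype.split("_")
    if len(alleles) != 2 or any(allele not in mapping for allele in alleles):
        return missing
    # Sort so that heterozygotes are always written A_B, never B_A.
    return "_".join(sorted(mapping[allele] for allele in alleles))


print(recode_to_ab("T_C", "[C/T]"))
```

Sorting the recoded alleles keeps heterozygous calls in a single canonical form regardless of the order in which the original alleles were written.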

6/8/2010

Import

Import VCF File
This script imports the 1000 Genomes .vcf file data and creates a genotype spreadsheet and an optional read depth spreadsheet.

Jesse Dupre
Golden Helix

6/2/2010

CNV Analysis

Affymetrix B Allele Frequency Calculation
Using Affymetrix CEL files as its source, this script combines quantile normalized SNP A and B probe intensities for each marker into a theta value, then calculates B-Allele Frequencies for each marker.

Greta Peterson
Golden Helix

5/26/2010

Quality Assurance

Quartile Summary Statistics
This script calculates and/or reports the following approximate values for each real- or integer-valued active column: Minimum, Q1 (first quartile), Median, Q3 (third quartile), Maximum, Q1 – x*IQR, and Q3 + x*IQR, where x is a user-defined multiplier that sets outlier thresholds based on the IQR (interquartile range).

Bryce Christensen
Golden Helix
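
As a rough illustration of the "Quartile Summary Statistics" entry, the sketch below computes the five-number summary and IQR-based outlier fences for one column using only the Python standard library. It is not the repository script: the function name `quartile_summary`, the use of `None` for missing values, and the "inclusive" quantile method are assumptions. The upper fence follows the conventional definition Q3 + x*IQR.

```python
import statistics


def quartile_summary(values, x=1.5):
    """Return min, quartiles, max and IQR-based outlier fences for one column.

    Assumption: missing values are represented as None and are skipped.
    """
    data = sorted(v for v in values if v is not None)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "lower_fence": q1 - x * iqr,  # values below this are flagged as outliers
        "upper_fence": q3 + x * iqr,  # values above this are flagged as outliers
    }


print(quartile_summary([9, 1, 8, 2, 7, 3, 6, 4, 5, None]))
```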

5/17/2010

Import

Import Illumina Text File
This script imports multiple fields of data from Illumina Final Report text files. It can import just the genotype data, or multiple real-valued columns such as B-Allele Frequency or Log R Ratio.

Jesse Dupre
Golden Helix

4/29/2010

CNV Analysis

MIP CN Transformation
This script creates 5 transposed spreadsheets, one for each column imported from the MIP Array copy number text file: Copy A, Copy B, CopyNumber, AlleleRatio, and AllelicDifference.

Greta Peterson
Golden Helix

4/29/2010

Edit

Convert Real Columns to Integer by Rounding
This script converts all real columns (‘R’) to integers (‘I’) by rounding.

Greta Peterson
Golden Helix

4/24/2010

CNV Analysis

Log Ratio Tails
This script calculates percentile values for the upper and lower tails of log ratios using two user-specified thresholds. Missing values are skipped. A log ratio call rate is returned with the results.  This script may also be used to identify percentiles for real-value data other than log ratios.

Christophe Lambert,
Bryce Christensen

Golden Helix

4/24/2010

CNV Analysis

Derivative Log Ratio Spread
Calculates derivative log ratio spread for copy number log ratio data, both per chromosome and overall. This is the standard deviation of the differences between adjacent points divided by the square root of 2. Missing values are skipped.

Christophe Lambert
Golden Helix
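
The definition given for "Derivative Log Ratio Spread" translates fairly directly into code. The following is a minimal sketch in plain Python, not the repository script. It assumes missing values are represented as `None`, that skipping a missing value means differencing the remaining neighbours, and that the sample standard deviation is intended.

```python
import math
import statistics


def dlrs(log_ratios):
    """Derivative log ratio spread: stdev of adjacent differences / sqrt(2).

    Assumption: missing values are None and are dropped before differencing.
    """
    values = [v for v in log_ratios if v is not None]
    diffs = [b - a for a, b in zip(values, values[1:])]
    if len(diffs) < 2:
        return None  # not enough points to estimate a standard deviation
    return statistics.stdev(diffs) / math.sqrt(2)


print(dlrs([0.0, 1.0, None, 0.0, 1.0, 0.0]))
```

For a per-chromosome figure, the same function could be applied to each chromosome's log ratios separately.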

4/13/2010

CNV Analysis

Percentile Based Winsorizing
This script calculates thresholds for the top and bottom percentiles of log ratio data, as specified by the user, for the purpose of winsorizing – replacing extreme log ratio values with the calculated thresholds. Winsorizing data prevents segmentation algorithms from being driven by outlier values and results in a more accurate determination of regions of copy number variation.

Greta Peterson
Golden Helix
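
The winsorizing step described for "Percentile Based Winsorizing" can be sketched as follows in plain Python. This is an illustration rather than the repository script: the function name `winsorize`, the nearest-rank percentile definition, and the use of `None` for missing values are all assumptions.

```python
import math


def winsorize(values, lower_pct=1.0, upper_pct=99.0):
    """Clamp values to the thresholds at the given lower and upper percentiles.

    Assumptions: nearest-rank percentiles; None marks a missing value and
    is passed through unchanged.
    """
    present = sorted(v for v in values if v is not None)

    def percentile(p):
        # Nearest-rank definition: smallest value with at least p% of data at or below it.
        rank = max(1, math.ceil(p * len(present) / 100))
        return present[rank - 1]

    low, high = percentile(lower_pct), percentile(upper_pct)
    return [v if v is None else min(max(v, low), high) for v in values]


clipped = winsorize(list(range(1, 101)), lower_pct=5, upper_pct=95)
print(min(clipped), max(clipped))
```

Because extreme values are replaced rather than removed, the number of data points per sample is unchanged, which keeps marker positions aligned for later segmentation.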

3/25/2010

Marker Map

Import Affymetrix DMET Report
This script imports Affymetrix DMET data. Hemizygous markers are converted to homozygous markers, and tri-allelic markers are converted into two columns, each containing the major allele and one of the minor alleles.

Jesse Dupre,
Greta Peterson

Golden Helix

3/15/2010

Marker Map

Add Gene Names to Marker Map
This script uses the gene name field(s) in a table downloaded from the UCSC Genome Browser to add gene names to an existing SVS marker map saved in the DSM file format.

Greta Peterson
Golden Helix

2/4/2010

Edit

Consensus and Combination of Two Arrays
This script takes genotypes from two different arrays and creates a consensus spreadsheet plus two optional spreadsheets.

Greta Peterson,
Christophe Lambert

Golden Helix

1/20/2010

Edit

Rename Alleles
This script scans for allele names (e.g. 1,2,3,4) and prompts for new names (e.g. A,C,G,T). Missing alleles remain missing.

Greta Peterson
Golden Helix

1/14/2010

Import

Import MACH Output
This script will import output files from the MACH imputation package.
http://www.sph.umich.edu/csg/abecasis/mach/

Jesse Dupre
Golden Helix

1/5/2010

Edit

Add Allele Delimiter
This script converts categorical genotypes such as AA, AG without a delimiter to the standard genotype format accepted by Golden Helix SVS (A_A, A_G, etc). The script prompts for the missing value indicator.

Possible uses include converting genotype columns after importing data that were identified as categorical due to a missing genotype delimiter.

Greta Peterson
Golden Helix

1/5/2010

Select

Activate Columns by Column Labels
Run from the current spreadsheet, this script activates only those columns whose labels appear in a second spreadsheet specified by the user. Inactive columns in the second spreadsheet are ignored. Non-unique column labels in the current spreadsheet are supported.

Gabe Rudy,
Greta Peterson

Golden Helix

1/5/2010

Select

Inactivate Columns by Column Labels
Run from the current spreadsheet, this script inactivates only those columns whose labels appear in a second spreadsheet specified by the user. Inactive columns in the second spreadsheet are ignored. Non-unique column labels in the current spreadsheet are supported.

Gabe Rudy,
Greta Peterson

Golden Helix

1/5/2010

Select

Inactivate Rows by Row Labels
Run from the current spreadsheet, this script inactivates only those rows whose labels appear in a second spreadsheet specified by the user. Inactive rows in the second spreadsheet are ignored. Non-unique row labels in the current spreadsheet are supported.

Gabe Rudy
Golden Helix

1/5/2010

Analysis

Row Statistics
For a spreadsheet with real valued data, this script calculates the mean, variance and standard deviation of each row, creating a new spreadsheet with the respective row statistics.

Christophe Lambert,
Greta Peterson

Golden Helix

1/5/2010

Quality Assurance

Sample Pair Mismatch
This script compares genotype calls from NSP and STY files and calculates the correlation between the nearest markers in the two sets. If the correlation is high, the NSP and STY markers correspond to the same person; otherwise, there is a mismatch.

Christophe Lambert,
Greta Peterson

Golden Helix

1/5/2010

Analysis

Select Subset of Data by XY Coordinates
This script takes an upper and lower bound for two numeric columns and creates a subset spreadsheet for the two columns.

Greta Peterson
Golden Helix

1/5/2010

Import

Import Numeric Matrix File from BeadStudio
This script imports Illumina log ratio files exported in the matrix format. The file can be exported in either tab-separated or comma-delimited format. If B allele intensity information is included in the text file, it is imported into a spreadsheet separate from the log ratio data.

Greta Peterson,
Jesse Dupre

Golden Helix

12/9/2009

CNV Analysis

Discretize CN Segment Covariates with Counts
This script creates integer/binary copy number calls for each value in a Segmentation Covariates spreadsheet based on a two- or three-state copy number model. A second spreadsheet will also be created reporting the number of copy number loss, neutral and gain values for each marker in the segmentation covariates spreadsheet. Also included in the output are the mean value for the marker and the absolute difference between the threshold values and the mean marker value.

Greta Peterson
Golden Helix

11/17/2009

CNV Analysis

CNV PCA Search
From the current spreadsheet, this script prompts for a principal components spreadsheet, lower and upper bounds on the number of components, and a step size. It runs association tests at each component setting, fits a linear regression to the least significant 90% of the data, and reports the slope of the line and a goodness-of-fit statistic. This script can be used in conjunction with the CNV PCA Search Tutorial.

Christophe Lambert
Golden Helix

11/5/2009

Edit

Split Column on Specified Delimiter
Splits the specified column using the specified delimiter.

Greta Peterson
Golden Helix

1/5/2010

Plotting

SNP Cluster Plots
This script creates scatter plots based on A and B allele intensities that can be split on SNP genotypes to create tri-colored cluster plots. The script will work for up to 100 SNPs at a time.

Greta Peterson
Golden Helix

11/5/2009

LD Analysis

LD Pairwise Analysis
This script outputs results from LD analysis, both the EM and CHM methods and both R2 and D’ values.

Greta Peterson
Golden Helix

11/5/2009

Import

Import Illumina iControlDB Data
This script will import Illumina data exported from the iControl database.

Jesse Dupre
Golden Helix

11/5/2009

Analysis

Compute Odds Ratio CI
This script takes a logistic regression results spreadsheet and calculates 90, 95 or 99% confidence intervals for the Odds Ratio.

Greta Peterson
Golden Helix
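
For "Compute Odds Ratio CI", one common approach is a Wald interval on the log-odds scale. The sketch below shows that approach in plain Python. It is an assumption that the script works this way, and the inputs `beta` (regression coefficient) and `std_err` (its standard error) are hypothetical names for values taken from a logistic regression results spreadsheet.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(beta, std_err, level=0.95):
    """Return (odds ratio, lower bound, upper bound) for a Wald interval.

    Assumption: beta is the logistic regression coefficient (log odds ratio)
    and std_err is its standard error.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645, 1.960, 2.576 for 90/95/99%
    return (
        math.exp(beta),
        math.exp(beta - z * std_err),
        math.exp(beta + z * std_err),
    )


for level in (0.90, 0.95, 0.99):
    print(level, odds_ratio_ci(0.4, 0.15, level))
```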

11/5/2009

Analysis

Calculate Expected P-value
This script takes a spreadsheet that contains a p-value column and calculates expected p-values for the specified column. It can optionally export expected –log10 p-values as well.

Greta Peterson
Golden Helix
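
Expected p-values of the kind described in "Calculate Expected P-value" are typically used for Q-Q plots. A minimal sketch in plain Python follows. The rank-based formula rank / (n + 1) is one common convention and is an assumption here, as is the function name `expected_p_values`; the repository script may use a different formula.

```python
import math


def expected_p_values(p_values):
    """Return the expected p-value for each observed p-value, in input order.

    Assumption: under the null, the p-value with rank r out of n
    (1 = smallest) has expected value r / (n + 1).
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    expected = [0.0] * n
    for rank, index in enumerate(order, start=1):
        expected[index] = rank / (n + 1)
    return expected


observed = [0.5, 0.01, 0.2]
expected = expected_p_values(observed)
print(expected)
print([-math.log10(p) for p in expected])
```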

11/5/2009

Quality Assurance

Autosome Heterozygosity
From a marker mapped spreadsheet with genotype data this script will calculate the heterozygosity rate for all autosomes as well as for each individual chromosome.

Greta Peterson
Golden Helix
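
The calculation behind "Autosome Heterozygosity" can be sketched in plain Python as below. This is not the repository script. It assumes genotypes in the A_B style used elsewhere on this page, "?" as the missing allele code, human autosomes labeled "1" through "22", and a hypothetical input of (chromosome, genotype) pairs.

```python
from collections import defaultdict

HUMAN_AUTOSOMES = frozenset(str(c) for c in range(1, 23))  # assumption: human data


def heterozygosity_rates(calls, autosomes=HUMAN_AUTOSOMES):
    """Return (overall rate, {chromosome: rate}) over non-missing autosomal calls.

    calls: iterable of (chromosome, genotype) pairs, genotype like 'A_B'.
    """
    het = defaultdict(int)
    total = defaultdict(int)
    for chrom, genotype in calls:
        alleles = genotype.split("_")
        if chrom not in autosomes or len(alleles) != 2 or "?" in alleles:
            continue  # skip sex chromosomes, malformed and missing calls
        total[chrom] += 1
        het[chrom] += alleles[0] != alleles[1]
    per_chrom = {chrom: het[chrom] / total[chrom] for chrom in total}
    overall = sum(het.values()) / sum(total.values()) if total else None
    return overall, per_chrom


calls = [("1", "A_B"), ("1", "A_A"), ("2", "B_B"), ("X", "A_B"), ("2", "?_?")]
print(heterozygosity_rates(calls))
```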

11/5/2009

SNP Analysis

Run Multiple Association Tests
This script runs genotypic association tests on multiple phenotype columns. Note: this script does NOT run multivariate association tests.

Christophe Lambert
Golden Helix

11/5/2009

Integration

Run MACH
From a spreadsheet containing genotype data, this script uses a reference spreadsheet to create a pedigree file, runs MACH on pooled data to impute missing genotypes and re-imports the imputation results back into SVS. http://www.sph.umich.edu/csg/abecasis/mach

Jesse Dupre
Golden Helix

11/5/2009

Import

Import CHIAMO
This script will import the CHIAMO format from the Wellcome Trust.

Christophe Lambert
Golden Helix

11/5/2009

Analysis

Create Table for Significant Regions
This script creates a spreadsheet with significant regions from a spreadsheet of p-values (in the first column). It also extracts p-values more extreme than a certain significance value (cutoff) and combines the remaining markers into segments. If two markers are on different chromosomes or more than a certain distance apart (split), a new region is created.

Ingo Helbig
UK-SH Kiel
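
The region-building logic described for "Create Table for Significant Regions" can be sketched in plain Python. This is an illustration under stated assumptions, not the contributed script: the input is a hypothetical list of (chromosome, position, p-value) tuples already sorted by chromosome and position, and the default `cutoff` and `split` values are placeholders.

```python
def significant_regions(markers, cutoff=1e-5, split=100_000):
    """Group markers with p < cutoff into regions.

    A new region starts when the chromosome changes or when the gap to the
    previous significant marker exceeds `split` base pairs.
    Assumption: markers are sorted by chromosome, then position.
    """
    regions = []
    for chrom, pos, p in markers:
        if p >= cutoff:
            continue  # keep only p-values more extreme than the cutoff
        last = regions[-1] if regions else None
        if last and last["chrom"] == chrom and pos - last["end"] <= split:
            last["end"] = pos
            last["n_markers"] += 1
            last["min_p"] = min(last["min_p"], p)
        else:
            regions.append(
                {"chrom": chrom, "start": pos, "end": pos, "n_markers": 1, "min_p": p}
            )
    return regions


markers = [
    ("1", 100, 1e-8),
    ("1", 200, 0.5),
    ("1", 50_000, 1e-6),
    ("1", 500_000, 1e-7),
    ("2", 500_100, 1e-9),
]
for region in significant_regions(markers):
    print(region)
```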

11/5/2009

CNV Analysis

Counts per Gene
This script scans a spreadsheet that has case status joined to segment covariates and counts the number of losses and gains per gene as listed in the knownGene.txt file based on user specified thresholds.

Jesse Dupre
Golden Helix

11/5/2009

CNV Analysis

Count Number of Segments Per Sample
This script takes a segment list spreadsheet and counts how many segments each sample has.

Christophe Lambert
Golden Helix

11/5/2009

Edit

Convert Integers to Genotypes
This script recodes integer genotypes to the standard genotype format of A_A, A_B, and B_B. Prompts for value of A_A, A_B, and B_B. All other numbers are encoded as missing. Thus if there is multi-allelic data in the spreadsheet, all numbers other than those specified will be encoded as "?".

Greta Peterson
Golden Helix
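
The mapping performed by "Convert Integers to Genotypes" amounts to a lookup with a missing-value default. A minimal sketch in plain Python, with the hypothetical function name `integers_to_genotypes` and the three codes passed as arguments in place of the prompts:

```python
def integers_to_genotypes(values, aa, ab, bb, missing="?"):
    """Map integer codes to A_A, A_B and B_B; every other value becomes missing."""
    mapping = {aa: "A_A", ab: "A_B", bb: "B_B"}
    return [mapping.get(value, missing) for value in values]


# A code of 3 is not one of the specified values, so it is encoded as missing.
print(integers_to_genotypes([0, 1, 2, 3], aa=0, ab=1, bb=2))
```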

11/5/2009

Edit

Concatenate Columns
This script is run from the Spreadsheet Editor. The user calls the script from the left column to be concatenated. Then the script prompts for the column to concatenate (right column) as well as the delimiter to use.

Greta Peterson
Golden Helix

11/5/2009

Regression

Extract Info from Regression Stats Viewer
This script scans the Regression Statistics Viewer output and prints out the p-value after correcting for any covariates. This script is not meant to be used with moving windows.

Greta Peterson
Golden Helix

4/14/2009

Import

Import dbGap Matrix File
This script imports MATRIX genotype files obtained from dbGaP. Files for multiple chromosomes can be selected at once for import. The script assumes that each file has the same sample IDs and that missing values are encoded as “NA”.

Jesse Dupre,
Greta Peterson

Golden Helix

4/14/2009

Import

Import Matrix Text File from BeadStudio
This script imports genotype files exported from Illumina's BeadStudio in the matrix format. See the documentation for how to export the file correctly.

Jesse Dupre,
Greta Peterson

Golden Helix

3/27/2009

ROH Analysis

Shorten ROH Cluster Output Col Names
This script shortens the column names in the Cluster of Runs spreadsheet from the ROH output. The old format ‘MarkerStart->MarkerEnd (num markers) colStart-colEnd’ is converted into the format ‘MarkerStart:MarkerEnd’.

Greta Peterson
Golden Helix
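
The renaming in "Shorten ROH Cluster Output Col Names" can be approximated with a regular expression. The sketch below is plain Python and assumes that marker names contain no whitespace and that the column name begins with 'MarkerStart->MarkerEnd' followed by a space. The sample column name is made up for illustration.

```python
import re

# Capture the two marker names on either side of '->' at the start of the name.
_ROH_NAME = re.compile(r"^(\S+)->(\S+)\s")


def shorten_roh_column_name(name):
    """Convert 'Start->End (n markers) colStart-colEnd' to 'Start:End'.

    Names that do not match the assumed pattern are returned unchanged.
    """
    match = _ROH_NAME.match(name)
    return f"{match.group(1)}:{match.group(2)}" if match else name


print(shorten_roh_column_name("rs1->rs9 (25 markers) 10-34"))
```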

3/27/2009

CNV Analysis

Save as CNT Files
From a spreadsheet containing numeric data, this script saves the spreadsheet in the CNT file format. All non-numeric data columns should be inactivated before running the script.

Christophe Lambert,
Greta Peterson

Golden Helix

3/27/2009

CNV Analysis

Discretize CN Segment List
This script creates integer copy number calls for each value in the Segment Mean column of the Segment List spreadsheet based on a two- or three-state copy number model.

Greta Peterson
Golden Helix

2/23/2009

Import

Import CRLMM-Calls
This script imports the integer 1, 2, 3 crlmm-calls.txt file that is the result of genotype calls from CRLMM.  It prompts for crlmm-confs.txt and a threshold for missing values and then generates genotype and confidences spreadsheets.

For more information about CRLMM, see:
http://www.bioconductor.org/workshops/2007/BioC2007/talks/BioC2007_Irizarry.pdf

Christophe Lambert
Golden Helix

3/12/2009

Edit

Convert to Binary by Threshold
This script automatically converts all real and/or integer value columns to binary 0 and 1 values. Possible uses include converting haplotype frequencies to binary values, dichotomizing clusters of runs of homozygosity, and converting 1s and 2s (e.g. 1=control, 2=case) to 0s and 1s.

Jesse Dupre
Golden Helix

Greta Peterson
Golden Helix

2/23/2009

Quality Assurance

Genotype Gender Check
From a marker mapped spreadsheet with only X-chromosome data active, this script predicts the gender of each sample.

Bo Peng
MDACC

Greta Peterson
Golden Helix

2/23/2009

Select

Activate Rows by Row Labels
Run from the current spreadsheet, this script activates only those rows whose labels appear in a second spreadsheet specified by the user. Inactive rows in the second spreadsheet are ignored. Non-unique row labels in the current spreadsheet are supported.

Gabe Rudy
Golden Helix

2/20/2009

CNV Analysis

Create Spreadsheet for Segmentation
Based on a column from a spreadsheet, this script creates a new spreadsheet with a pseudo marker map and generic column headers making it suitable for running CNAM optimal segmenting.

Greta Peterson
Golden Helix

2/20/2009

CNV Analysis

Discretize CN Segment Covariates
This script creates integer copy number calls for each value in a Segmentation Covariates spreadsheet based on a two- or three-state copy number model. 

Greta Peterson
Golden Helix

12/9/2009

Edit

Create Marker Map from Spreadsheet
Creates a new marker map in your MarkerMaps folder based on a spreadsheet with Chromosome and Position as the first two columns.

Greta Peterson
Golden Helix

2/20/2009

CNV Analysis

Row Average By Chromosome
For a spreadsheet with real valued data, this script calculates the mean of each row, creating a new spreadsheet with the respective row means. If the data is marker mapped, it calculates the means by chromosome.

Christophe Lambert
Golden Helix

2/20/2009

CNV Analysis

Convert CNAM Covariates
Converts a segment covariate spreadsheet created in CNAM v6.4 to the new format supported in SVS 7.

Greta Peterson
Golden Helix

2/20/2009

Edit

Create Pseudo Marker Mapped Spreadsheet
From a non-marker mapped spreadsheet this script creates a new marker mapped spreadsheet with a pseudo marker map containing chromosome 1, positions 1 - #Rows.

Greta Peterson
Golden Helix

© 2010 Golden Helix, Inc. All Rights Reserved
