14.1 Exporting Spreadsheet Data
Data from spreadsheets can be exported to many formats, including text or third party file formats; PED/MAP, TPED/TFAM, or BED/BIM/FAM files; Golden Helix, Inc. file formats (DSF and Legacy GHD); annotation tracks; and CNT files. A marker map can also be created from a spreadsheet. Each of these options is described below.
Saving as a Text or Third Party File
To save a spreadsheet as a text or third party file, select File > Save As... > Text or Third Party Format.
To select the file format and the location to save the spreadsheet, click the Browse button. This opens a Save As dialog window with the default file format of .csv (comma-delimited text file). To save the spreadsheet as a .txt file, add the .txt extension after the file name. The file format type can be selected from the drop-down menu or by specifying the extension, such as “test.xls”.
There are several Export Options available. You can export either only the active data (this includes columns set as a dependent variable) or all data. You can also indicate whether to include column headers and row labels in the saved spreadsheet. If these boxes are not checked, the first row and first column of the exported file will contain data values. Because of column-oriented optimization within SVS, it may be advantageous to transpose the data on export, especially when saving to a format with column or row size limitations (such as .xls). If the transpose option is selected, the row label header may also be specified; the default header is “Columns.” For text files (.csv or .txt), you can choose the field delimiter from Comma, Tab, or Space, or specify a different delimiter by choosing Other ->.
You can also indicate how missing values are to be encoded by editing the “Encode missing values as:” text box.
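To make the effect of these options concrete, here is a minimal sketch using Python's standard `csv` module. It is not SVS's exporter: the in-memory table layout (row labels, column headers, rows with `None` for missing values), the function name `export_text`, and the blank top-left cell in untransposed output are all assumptions made for illustration. The sketch only shows how the header, row label, transpose, delimiter, and missing-value settings combine in the written text.

```python
import csv
import io


def export_text(row_labels, col_headers, rows, delimiter=",",
                include_headers=True, include_row_labels=True,
                transpose=False, row_label_header="Columns", missing="?"):
    """Serialize a small table the way the text export options describe.

    rows is a list of row lists; None marks a missing value. This layout and
    the blank corner cell for untransposed output are assumptions, not SVS's
    own data model or output.
    """
    corner = ""
    if transpose:
        # Swap rows and columns: the old column headers become row labels.
        rows = [list(column) for column in zip(*rows)]
        row_labels, col_headers = col_headers, row_labels
        corner = row_label_header
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    if include_headers:
        prefix = [corner] if include_row_labels else []
        writer.writerow(prefix + list(col_headers))
    for label, row in zip(row_labels, rows):
        # Replace missing values with the chosen encoding.
        cells = [missing if value is None else value for value in row]
        prefix = [label] if include_row_labels else []
        writer.writerow(prefix + cells)
    return out.getvalue()
```

For example, `export_text(["s1", "s2"], ["age", "bmi"], [[34, None], [51, 22.5]], delimiter="\t", transpose=True)` writes one line per original column, headed by the default row label header “Columns”.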
If there are genotypic columns in the spreadsheet, export options for genotypes will be available. These options include the allele delimiter and the missing allele encoding. A list menu offers typical allele delimiters, with ‘_’ (underscore) as the default. A custom delimiter character can be specified by selecting Other -> and entering it in the text box. The encoding for missing alleles can also be specified in the appropriate text box.
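The sketch below shows how an allele delimiter and a missing allele encoding combine when a genotype is written as text. The pair-of-strings genotype representation, the function name `format_genotype`, and the “?” placeholder are illustrative assumptions; only the ‘_’ default delimiter comes from the dialog described above.

```python
def format_genotype(genotype, delimiter="_", missing_allele="?"):
    """Join a two-allele genotype for text export.

    genotype is a pair such as ("A", "G"). None (the whole call is missing)
    or a None allele is written with missing_allele. The tuple representation
    and the "?" placeholder are assumptions for illustration.
    """
    if genotype is None:
        genotype = (None, None)
    return delimiter.join(missing_allele if a is None else a for a in genotype)
```

With the defaults, `("A", "G")` is written as `A_G`; choosing a different delimiter through Other -> would correspond to passing, say, `delimiter="/"`.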
Saving as PED/MAP, TPED/TFAM, or BED/BIM/FAM Files
SVS can export to plain text PED/MAP files, plain text TPED/TFAM files, and optimized binary PED (BED) files, which are accompanied by corresponding BIM and FAM files. Because these file formats expect full marker map information for each SNP and pedigree information for each sample, SVS can artificially generate marker map and/or pedigree information if the spreadsheet is not marker mapped or is not a pedigree spreadsheet.
To save a spreadsheet as PED/MAP, TPED/TFAM, or BED/BIM/FAM files, select File > Save As... > PED/TPED/BED.
To select the file format and the location to save the spreadsheet, click the Browse button. This opens a Save As dialog window with the PED/MAP file format type selected as default.
To select the file format type, use the “Save as type” drop-down menu. The file format types appear in this menu as follows:
- PED/MAP - 2 files (*.ped *.*)
- TPED/TFAM - 2 files (*.tped *.*)
- BED/FAM/BIM SNP-Major - 3 files (*.bed *.*)
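The exact bytes SVS writes are not described here. As an illustration of what “SNP-Major” means, the sketch below follows the published PLINK 1 .bed layout: a three-byte header (0x6C, 0x1B, then 0x01 for SNP-major mode), followed by one block per SNP in which each sample takes two bits and four samples are packed per byte, first sample in the lowest bits. Representing each call as the number of copies of the second .bim allele (or `None` when missing) is an assumption of this sketch, as is the name `encode_bed`.

```python
BED_HEADER = bytes([0x6C, 0x1B, 0x01])  # two magic bytes, then 0x01 = SNP-major

# 2-bit codes keyed by the number of copies of the second .bim allele;
# None is a missing call.
BED_CODES = {0: 0b00, None: 0b01, 1: 0b10, 2: 0b11}


def encode_bed(genotypes_by_snp):
    """Pack SNP-major genotype rows (one list of per-sample calls per SNP)."""
    out = bytearray(BED_HEADER)
    for snp in genotypes_by_snp:
        # Each SNP starts on a fresh byte; unused bits in its last byte stay 0.
        for start in range(0, len(snp), 4):
            byte = 0
            for offset, call in enumerate(snp[start:start + 4]):
                # The first sample of each group occupies the lowest two bits.
                byte |= BED_CODES[call] << (2 * offset)
            out.append(byte)
    return bytes(out)
```

Storing all samples for one SNP contiguously is what makes the file “SNP-major”: reading a single marker touches one small, fixed-size block.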
Three option boxes are available:
- A general Export Option allows you to either export only the active data (this includes columns set as a dependent variable) or export all data.
- Family-index column assignments for the first six PED, TFAM, or FAM columns:
- Affection Status
- Patient ID
- Sex
- Family ID
- Mother ID
- Father ID
If your spreadsheet is a pedigree spreadsheet, these column assignments default to their pedigree-spreadsheet counterparts, while other columns are selectable. In addition, for every column except Affection Status, artificial values may be generated.
If your spreadsheet is not a pedigree spreadsheet, this export option is still available. In that case, SVS will try to find reasonable defaults, but it may be desirable or even required to generate artificial values for some or all of the columns other than Affection Status.
- An option to specify whether you are working with human or non-human genome data. If you are exporting non-human genome data, you must specify the number of autosomal chromosomes in the data.
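The autosome count matters because PED/MAP-style files usually store chromosomes as numeric codes. The sketch below uses the convention PLINK documents for non-human species (its `--chr-set` option): autosomes keep their numbers, and X, Y, XY (pseudo-autosomal) and MT follow the last autosome. That SVS numbers chromosomes the same way is an assumption, and `chromosome_code` is a hypothetical helper.

```python
def chromosome_code(label, n_autosomes=22):
    """Map a chromosome label to a PLINK-style numeric code.

    Autosomes keep their numbers; X, Y, XY (pseudo-autosomal) and MT are
    numbered n_autosomes + 1 through n_autosomes + 4.
    """
    text = str(label).strip().upper()
    if text.startswith("CHR"):
        text = text[3:]
    offsets = {"X": 1, "Y": 2, "XY": 3, "MT": 4, "M": 4}
    if text in offsets:
        return n_autosomes + offsets[text]
    number = int(text)
    # Numeric labels past the autosomes are accepted as already-coded X/Y/XY/MT.
    if not 1 <= number <= n_autosomes + 4:
        raise ValueError(f"chromosome {label!r} is out of range")
    return number
```

With the human default of 22 autosomes, X becomes 23 and MT becomes 26; for a species with 38 autosomes, X would become 39.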
NOTE:
- If your spreadsheet does not have a marker map applied, artificial values will be generated for the marker mapping.
- In all cases, missing values in the Sex or Affection Status columns will be encoded as “-9”, and missing genotypes will be encoded as “0”.
- The first four columns of the PED, TFAM, and FAM file formats are identifiers for which missing values will be encoded as “0”.
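The following sketch writes one PED line and one MAP line using the missing-value rules from the notes above. The six leading fields are placed in the standard PED order (Family ID, Patient ID, Father ID, Mother ID, Sex, Affection Status), which differs from the order in which the dialog lists them. The dictionary of assignments, the tuple genotypes, and the helper names `ped_line` and `map_line` are assumptions for illustration. The MAP genetic-distance column is written as 0 (unknown).

```python
# Standard order of the six leading PED/TFAM/FAM columns, named with the
# family-index assignments from the SVS export dialog.
PED_ORDER = ["Family ID", "Patient ID", "Father ID", "Mother ID",
             "Sex", "Affection Status"]


def ped_line(assigned, genotypes):
    """Build one PED line from column assignments and (allele, allele) calls."""
    fields = []
    for index, name in enumerate(PED_ORDER):
        value = assigned.get(name)
        if value is None:
            # Identifiers (first four columns) -> "0"; Sex, Affection -> "-9".
            value = "0" if index < 4 else "-9"
        fields.append(str(value))
    for call in genotypes:
        if call is None:
            call = (None, None)
        # Missing alleles are written as "0".
        fields.extend("0" if allele is None else str(allele) for allele in call)
    return " ".join(fields)


def map_line(chromosome, marker, position):
    """Chromosome, marker name, genetic distance (0 = unknown), bp position."""
    return f"{chromosome}\t{marker}\t0\t{position}"
```

For instance, a sample with only Family ID, Patient ID, and Sex assigned gets “0” for both parents and “-9” for Affection Status.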
Saving as Either a DSF or GHD File
DSF File
In order to save a spreadsheet in the Golden Helix, Inc. DSF format, select File > Save As... > DSF File. The DSF format is SVS’s proprietary sparse data storage format, and it is the only format that preserves all marker map information on export. To select the location and name of the output DSF file, click Browse. You can change the dataset name from the name given when the original dataset was imported; this name will be used when the file is imported into another SVS project. You can save either only the active data or all of the data in the current spreadsheet. To save only the active data, make sure that Active Data is selected; to save all data, click the All Data label or its radio button.
Legacy GHD File
In order to save a spreadsheet in the Golden Helix, Inc. Legacy GHD format, select File > Save As... > Legacy GHD File. This is the sparse storage format that is backward compatible with previous versions of SVS. To select the location and name of the output GHD file, click Browse. You can save either only the active data or all of the data in the current spreadsheet. To save only the active data, make sure that Active Data is selected; to save all data, click the All Data label or its radio button.
Saving as an Annotation Track
In order to create an annotation track from the current spreadsheet, select File > Save As... > Annotation Track. The Creation Preferences allow you to specify which rows are used (Active or All), the Input Coordinate System (Half-open or Indexed), the type of track (e.g., Interval), the expected and additional fields (which depend on the track type selected), the track name and build, and finally the name of the output file, in .idf format. The track is saved as a local track and can be viewed in the genome browser.
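Choosing the wrong Input Coordinate System shifts every interval by one base. The sketch below converts intervals to 0-based, half-open coordinates. It assumes that “Indexed” means 1-based coordinates with both ends inclusive; this manual does not define the term, so treat that mapping, and the helper name `to_half_open`, as assumptions.

```python
def to_half_open(start, end, system):
    """Return 0-based, half-open [start, end) coordinates for an interval.

    "half-open" intervals are returned unchanged. "indexed" is assumed here
    to mean 1-based coordinates with both ends inclusive.
    """
    if system == "half-open":
        return start, end
    if system == "indexed":
        # A 1-based inclusive interval [s, e] covers the same bases as [s - 1, e).
        return start - 1, end
    raise ValueError(f"unknown coordinate system: {system!r}")
```

Under this assumption, the first 100 bases of a chromosome are written as 1 to 100 in indexed form and as 0 to 100 in half-open form. In both cases the interval spans 100 bases.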
Saving as CNT Files
If a marker map is applied to the columns of the current spreadsheet, separate CNT files can be saved for each sample. CNT files have the following format:
Insert screenshot here.
Create Marker Map from Spreadsheet
In order to create a genome browser track from the current spreadsheet, select File > Create Marker Map from Spreadsheet. The columns containing the Marker Name, Chromosome, and Position information must be specified. The columns containing Marker Name and Chromosome information must be categorical, while the column containing Position information must be integer-valued. You can specify an informative marker map name; the default name is “New Marker Map.”
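The column-type requirements above can be expressed as a simple validation pass. In this sketch, the spreadsheet is modeled as a list of dictionaries keyed by column name, and categorical values are modeled as strings. Both choices, and the helper name `build_marker_map`, are assumptions for illustration rather than SVS internals.

```python
def build_marker_map(rows, name_col, chrom_col, pos_col,
                     map_name="New Marker Map"):
    """Validate column types and collect (marker, chromosome, position) entries.

    rows is a list of dicts keyed by column name; categorical values are
    modeled as strings. Both choices are assumptions for illustration.
    """
    markers = []
    for row in rows:
        name, chrom, pos = row[name_col], row[chrom_col], row[pos_col]
        if not isinstance(name, str) or not isinstance(chrom, str):
            raise TypeError("Marker Name and Chromosome columns must be categorical")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(pos, bool) or not isinstance(pos, int):
            raise TypeError("Position column must be integer-valued")
        markers.append((name, chrom, pos))
    return {"name": map_name, "markers": markers}
```

A Position column that holds real values such as 1.5 is rejected, matching the requirement that positions be integer-valued.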