Exporting Spreadsheet Data

14.1 Exporting Spreadsheet Data

Spreadsheet data can be exported to many formats: a text or third-party file; PED/MAP, TPED/TFAM, or BED/BIM/FAM files; a Golden Helix, Inc. file format (DSF or Legacy GHD); an annotation track; CNT files; and a marker map. Exporting or saving in each of these formats is described below.

Saving as a Text or Third Party File

[Picture]

Figure 105: Saving as a Text or Third Party Window

To save a spreadsheet as a text or third party file, select File > Save As... > Text or Third Party Format.

To select the file format and the location to save the spreadsheet, click the Browse button. This opens a Save As dialog window, with the default file format of .csv (comma-delimited text file). To save the spreadsheet as a .txt file, add the .txt extension after the file name. The file format type can be selected from the drop-down menu or specified through the file extension, such as “test.xls”.

Several Export Options are available. You can export only the active data (this includes columns set as a dependent variable) or all data. You can also indicate whether to include column headers and row labels in the saved spreadsheet. If these boxes are not checked, the first row and first column exported will contain data. Because of column-oriented optimization within SVS, it may be advantageous to transpose the data on export, especially when saving to a data format with column or row size limitations (like .xls). If the transpose option is selected, the row label header may also be specified; the default header is “Columns.” For text files (.csv or .txt), you can specify the field delimiter by choosing Comma, Tab, or Space, or by choosing Other -> and entering a different delimiter. You can also indicate how missing values are encoded by editing the “Encode missing values as:” text box.
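
The effect of these options on the saved text can be illustrated with a minimal Python sketch. This is not SVS code or an SVS API: the function `export_rows`, its parameter names, and the “?” missing-value encoding are hypothetical and exist only to show how the delimiter, header/label, transpose, and missing-value choices shape the output.

```python
import csv
import io

def export_rows(header, row_labels, rows, delimiter=",", missing="?",
                include_headers=True, include_labels=True,
                transpose=False, label_header="Columns"):
    """Render a small table as delimited text (illustrative only, not SVS code)."""
    # Replace missing cells (None) with the chosen missing-value encoding.
    body = [[missing if v is None else v for v in row] for row in rows]
    if transpose:
        # Columns become rows, so the old column headers become row labels.
        body = [list(col) for col in zip(*body)]
        header, row_labels = row_labels, header
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    if include_headers:
        # label_header fills the corner cell above the row labels.
        corner = [label_header] if include_labels else []
        writer.writerow(corner + list(header))
    for label, row in zip(row_labels, body):
        writer.writerow(([label] if include_labels else []) + row)
    return out.getvalue()

# Two samples, two columns, one missing value; tab-delimited and transposed.
text = export_rows(["Age", "Weight"], ["S1", "S2"],
                   [[34, 70.5], [None, 82.0]],
                   delimiter="\t", transpose=True)
print(text)
```

With both check boxes cleared (`include_headers=False`, `include_labels=False`), the first row and first column of the output contain data, as described above.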

If the spreadsheet contains genotypic columns, export options for genotypes are also available: the allele delimiter and the missing allele encoding. A list menu offers typical allele delimiters, with ‘_’ (underscore) as the default. To use a different character, select Other -> and enter the custom delimiter in the text box. The encoding for missing alleles can also be specified in the appropriate text box.
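
As a rough illustration of how the allele delimiter and missing allele encoding combine, consider the following Python sketch. The function `format_genotype` is hypothetical and not part of SVS; the underscore default comes from the text above, while the “?” missing-allele symbol is an assumption made only for this example.

```python
def format_genotype(allele1, allele2, delimiter="_", missing_allele="?"):
    """Join two alleles into one exported genotype string (illustrative only)."""
    # A missing allele (None) is replaced by the chosen missing-allele encoding.
    a = missing_allele if allele1 is None else allele1
    b = missing_allele if allele2 is None else allele2
    return f"{a}{delimiter}{b}"

print(format_genotype("A", "G"))                                      # default underscore delimiter
print(format_genotype(None, None, delimiter="/", missing_allele="0"))  # custom delimiter and encoding
```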

Saving as PED/MAP, TPED/TFAM, or BED/BIM/FAM Files

[Picture]

Figure 106: Saving as a Family Indexed Genetic Format Window

SVS can export to plain text PED/MAP files, plain text TPED/TFAM files, and optimized binary PED or BED files (which should have corresponding BIM/FAM files). To accommodate the fact that these files are expected to have full marker map and pedigree information for each SNP, SVS can artificially generate marker map and/or pedigree information if the spreadsheet is not marker mapped or is not a pedigree spreadsheet.

To save a spreadsheet as PED/MAP, TPED/TFAM, or BED/BIM/FAM files, select File > Save As... > PED/TPED/BED.

To select the file format and the location to save the spreadsheet, click the Browse button. This opens a Save As dialog window with the PED/MAP file format type selected as default.

To select the file format type, use the “Save as type” drop-down menu. The file format types appear in this menu as follows:

  • PED/MAP - 2 files (*.ped *.*)
  • TPED/TFAM - 2 files (*.tped *.*)
  • BED/FAM/BIM SNP-Major - 3 files (*.bed *.*)

Three option boxes are available:

  • A general Export Option allows you to either export only the active data (this includes columns set as a dependent variable) or export all data.
  • Family-index column assignments for the first six PED, TFAM, or FAM columns:
    • Affection Status
    • Patient ID
    • Sex
    • Family ID
    • Mother ID
    • Father ID

    If your spreadsheet is a pedigree spreadsheet, these column assignments default to their pedigree-spreadsheet counterparts, while other columns are selectable. In addition, for every column except Affection Status, artificial values may be generated.

    If your spreadsheet is not a pedigree spreadsheet, this export option is still available. In that case, SVS will try to find reasonable defaults, but it may be desirable or even required to generate artificial values for some or all of the columns other than Affection Status.

  • An option to specify whether you are working with human genome data or non-human genome data. If you are exporting non-human genome data, the number of autosomal chromosomes in the data must be specified.

NOTE:

  1. If your spreadsheet does not have a marker map applied, artificial values will be generated for the marker mapping.
  2. In all cases, missing phenotypes in the Sex or Affection Status will be encoded as “-9”, and missing genotypes will be encoded as “0”.
  3. The first four columns of the PED, TFAM, and FAM file formats are identifiers for which missing values will be encoded as “0”.
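
The missing-value rules in these notes can be sketched for a single PED row as follows. This is an illustration, not SVS's implementation: the helper `ped_line` is hypothetical, and it assumes the conventional PED column order (Family ID, Individual ID, Father ID, Mother ID, Sex, Affection Status) followed by two alleles per marker.

```python
def ped_line(family_id, individual_id, father_id, mother_id,
             sex, affection, genotypes):
    """Build one space-delimited PED row (illustrative only, not SVS code)."""
    def ident(value):
        # Missing identifiers in the first four columns are written as "0".
        return "0" if value is None else str(value)

    def pheno(value):
        # Missing Sex or Affection Status is written as "-9", per the notes above.
        return "-9" if value is None else str(value)

    fields = [ident(family_id), ident(individual_id),
              ident(father_id), ident(mother_id),
              pheno(sex), pheno(affection)]
    for genotype in genotypes:
        if genotype is None:
            # A missing genotype is written as "0" for each allele.
            fields += ["0", "0"]
        else:
            fields += [genotype[0], genotype[1]]
    return " ".join(fields)

# A founder (no parents recorded) with unknown affection status and one
# missing genotype out of two markers.
print(ped_line("FAM1", "S1", None, None, 1, None, [("A", "G"), None]))
```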

Saving as Either a DSF or GHD File

[Picture]

Figure 107: Saving as DSF Window

DSF File

To save a spreadsheet in the Golden Helix, Inc. DSF format, select File > Save As... > DSF File. The DSF format is SVS’s proprietary sparse data storage format, and is the only format that preserves all marker map information on export. To select the location and name of the output DSF file, click Browse. You can change the dataset name from the name given when the original dataset was imported. This name is used when the file is imported into another SVS project. You can save either only the active data or all of the data in the current spreadsheet. To save only the active data, make sure that Active Data is selected; to select All Data, click on the label or the radio button.

Legacy GHD File

In order to save a spreadsheet in the Golden Helix, Inc. Legacy GHD format, select File > Save As... > Legacy GHD File. This file format is the one sparse storage format that is backwards compatible with previous versions of SVS. To select the location and name of the output GHD file, click Browse. There is the option to either save only the active data or all of the data in the current spreadsheet. To save only the active data, make sure that Active Data is selected; to select All Data, click on the label or the radio button.

Saving as an Annotation Track

To create an annotation track from the current spreadsheet, select File > Save As... > Annotation Track. The Creation Preferences allow the user to specify which rows are to be used (Active or All), the Input Coordinate System (Half-open or Indexed), the type of track (e.g. Interval), the expected and additional fields (which depend on the track type selected), the track name and build, and finally the name of the output file, in .idf format. The track is saved as a local track and can be viewed in the genome browser.
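
The difference between the two coordinate systems can be illustrated with a short Python sketch. It rests on an assumption not stated in this manual: that “Half-open” means zero-based intervals that exclude the stop position, and that “Indexed” means one-based intervals that include it. Both function names are hypothetical.

```python
def indexed_to_half_open(start, stop):
    """Convert an assumed 1-based inclusive interval to 0-based half-open."""
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    # Only the start shifts; the inclusive stop equals the exclusive stop.
    return start - 1, stop

def half_open_to_indexed(start, stop):
    """Convert a 0-based half-open interval to assumed 1-based inclusive."""
    if start < 0 or stop <= start:
        raise ValueError("expected 0 <= start < stop")
    return start + 1, stop

# The same 101-base interval expressed in both systems.
print(indexed_to_half_open(100, 200))
print(half_open_to_indexed(99, 200))
```

Under this assumption, the interval length is `stop - start` in half-open coordinates and `stop - start + 1` in indexed coordinates, so choosing the wrong input system shifts every feature by one base.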

Save as CNT Files

If a marker map is applied to the columns of the current spreadsheet, separate CNT files can be saved for each sample. CNT files have the following format:

Insert screenshot here.

Create Marker Map from Spreadsheet

To create a marker map from the current spreadsheet, select File > Create Marker Map from Spreadsheet. Columns containing the Marker Name, Chromosome, and Position information must be specified. The columns containing Marker Name and Chromosome information must be categorical, while the column containing Position information must be integer-valued. An informative marker map name can be specified; the default name is “New Marker Map.”
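
The column requirements can be expressed as a small validation sketch in Python. This is not how SVS builds marker maps: `build_marker_map` and the dictionary layout are hypothetical, with categorical values modeled as strings and positions as integers.

```python
def build_marker_map(rows, name="New Marker Map"):
    """Check (marker name, chromosome, position) rows and collect them.

    Illustrative only: categorical columns are modeled as str, and the
    position column as int.
    """
    markers = []
    for marker_name, chromosome, position in rows:
        if not isinstance(marker_name, str) or not isinstance(chromosome, str):
            raise TypeError("Marker Name and Chromosome must be categorical (str)")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(position, bool) or not isinstance(position, int):
            raise TypeError("Position must be integer-valued")
        markers.append({"name": marker_name,
                        "chromosome": chromosome,
                        "position": position})
    return {"name": name, "markers": markers}

# Marker names and positions below are made up for the example.
marker_map = build_marker_map([("rs0001", "1", 12345), ("rs0002", "X", 67890)])
print(marker_map["name"], len(marker_map["markers"]))
```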