13.6 Annotation Track Manager
The Annotation Track Manager window is accessible from the project navigator menu item Tools > Manage Annotation Tracks, or in the plot viewer through either the File > Manage Annotation Tracks... menu item or the Manage button in the Add Local Track tab. This window provides a filterable list of all local annotation tracks available within SVS. From this window, tracks can be imported, exported, created, deleted, and downloaded from the Golden Helix, Inc. network annotation service: http://data.goldenhelix.com/das.
The buttons at the top of the Annotation Track Manager window are used to initiate export (see Export To File), import (see Import From File), create (see Create From Spreadsheet) and download (see Download From Network) operations.
Controls on the Filter line allow the list of tracks to be filtered by several criteria. The text box is used to display only tracks with names that match a particular pattern. The wild cards ’*’ (Any number of characters) and ’?’ (Any one character) can be used to specify more complicated patterns. To narrow the list down to tracks labeled as gene tracks, type ’gene’ as the filter text value. The type filter is used to limit the displayed track list to a particular type, such as ’Probe’. The species filter is used to limit the displayed track list to those associated with a particular species, such as ’Canis familiaris’. The build filter is used to limit the displayed track list to a particular build, such as ’NCBI_36’. If multiple filters are set, tracks must match all of the filters to appear in the list.
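The filter rules above can be sketched in Python, the language of the SVS scripting environment. This is an illustration only, not SVS code. The `TrackInfo` record and the `filter_tracks` function are hypothetical. The sketch also assumes that the name filter is case-insensitive and may match anywhere in a track name, so ’gene’ behaves like ’*gene*’. Python's `fnmatch` additionally treats `[...]` as a character class, which the SVS filter may not do.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


@dataclass
class TrackInfo:
    # Hypothetical record holding the columns shown in the track list.
    name: str
    type: str
    species: str
    build: str


def name_matches(name: str, pattern: str) -> bool:
    """Match a track name against a pattern using '*' and '?' wild cards.

    Assumption: matching is case-insensitive and may occur anywhere in the
    name, so the pattern is wrapped in '*' on both sides.
    """
    if not pattern:
        return True
    return fnmatchcase(name.lower(), f"*{pattern.lower()}*")


def filter_tracks(
    tracks: Iterable[TrackInfo],
    text: str = "",
    track_type: Optional[str] = None,
    species: Optional[str] = None,
    build: Optional[str] = None,
) -> List[TrackInfo]:
    """Keep only the tracks that satisfy every filter that has been set."""
    result = []
    for track in tracks:
        if not name_matches(track.name, text):
            continue
        if track_type is not None and track.type != track_type:
            continue
        if species is not None and track.species != species:
            continue
        if build is not None and track.build != build:
            continue
        result.append(track)
    return result
```

Unset filters are skipped. The result is therefore the set of tracks that pass every filter that has a value, which matches the "must match all of the filters" rule.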
A variety of information about each track is displayed in the list including its name, type, associated species and build, its location within SVS, the names of its data fields, and its source on disk. If the location is ’Sys’ the annotation track is in the system annotation track folder and was likely bundled with SVS. If the location is ’Proj’ the annotation track is in the open project and is only accessible from within the current project. If the location is ’Usr’ the annotation track is in the user annotation track folder and is accessible from anywhere in SVS.
Right-clicking on an item in the annotation track list provides the option to delete it. This option is disabled for ’Sys’ tracks. If the delete option is selected, confirmation will be requested. The delete operation cannot be undone. Large tracks which are not isolated in their own file may take a very long time to delete.
Viewing the User Annotation Track Folder
User-created and downloaded annotation tracks are stored in the user annotation tracks folder in the application directory. This directory can be viewed by clicking the View Annotation Track Folder button in the Manage Annotation Tracks window.
Export To File
Select a track to enable the Export To File button. The Export To File dialog provides a short summary of the track that will be exported and a format selection. The Export Preferences allow selection of output formats relevant to the type of track selected. Any track can be exported to the Interval Data Format (IDF), which is the native format for genomic interval data within the application.
Many annotation track types can be exported to Delimited Text format. When Delimited Text format is selected, the Export Preferences expand to provide delimited-text-specific controls. When the header option is checked, the first line of the exported file will be the provided Prefix string followed by the list of exported field names. The data delimiter and sub delimiter may be specified. The data delimiter is the string inserted between the field values on each line written to the output file. For maximum utility, it should not occur in any of the data values being exported. The sub delimiter is the string inserted between multiple items within the same field. Similarly, it should not occur within any of the sub items in the data values being exported. The Exported Fields list enables individual selection of the fields to be exported and may also be reordered using drag-and-drop.
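A minimal sketch of the delimited text export described above follows. It assumes that each record is a Python dict of field values and that a list-valued field holds the multiple items to be joined with the sub delimiter. The names `format_value` and `export_delimited` and their defaults are hypothetical. Raising an error when a value contains a delimiter is a design choice of this sketch.

```python
from typing import Dict, List, Sequence, Union

Scalar = Union[str, int, float]
FieldValue = Union[Scalar, List[Scalar]]


def format_value(value: FieldValue, delimiter: str, sub_delimiter: str) -> str:
    """Render one field; list values are joined with the sub delimiter."""
    items = value if isinstance(value, list) else [value]
    texts = [str(item) for item in items]
    for text in texts:
        # A delimiter inside a value would shift columns when the file is read back.
        if delimiter in text:
            raise ValueError(f"value {text!r} contains the data delimiter")
        # A sub delimiter inside a list item would split that item on re-import.
        if len(texts) > 1 and sub_delimiter in text:
            raise ValueError(f"item {text!r} contains the sub delimiter")
    return sub_delimiter.join(texts)


def export_delimited(
    records: Sequence[Dict[str, FieldValue]],
    fields: Sequence[str],
    delimiter: str = "\t",
    sub_delimiter: str = ",",
    header: bool = True,
    prefix: str = "#",
) -> str:
    """Write the selected fields, in the given order, as delimited text."""
    lines = []
    if header:
        # Header line: the Prefix string followed by the exported field names.
        lines.append(prefix + delimiter.join(fields))
    for record in records:
        lines.append(
            delimiter.join(format_value(record[f], delimiter, sub_delimiter) for f in fields)
        )
    return "\n".join(lines) + "\n"
```

The order of `fields` plays the role of the Exported Fields list. Reordering that list reorders the output columns.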
Some annotation tracks contain numeric data fields and may be exported to Wiggle format. When Wiggle format is selected, the Export Preferences expand to provide Wiggle-specific controls. A single numeric field may be selected for export. If the field list is empty, the selected annotation track contains no numerically typed fields and cannot be exported directly to Wiggle format.
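The following sketch pictures the Wiggle output. It writes one numeric field in the UCSC variableStep form, which uses one-based positions and a `span` giving the interval width. The record layout `(chrom, start, stop, fields)` in zero-based, half-open coordinates and the function name `export_wiggle` are assumptions. SVS may use a different Wiggle layout, such as fixedStep, for its own output.

```python
from numbers import Real
from typing import Dict, Iterable, Tuple

Record = Tuple[str, int, int, Dict[str, object]]


def export_wiggle(records: Iterable[Record], field: str, track_name: str) -> str:
    """Write one numeric field as variableStep Wiggle text.

    Records are (chrom, start, stop, fields) in zero-based, half-open
    coordinates and are assumed to be sorted by chromosome and position.
    """
    lines = [f'track type=wiggle_0 name="{track_name}"']
    section = None  # (chrom, span) of the current variableStep block
    for chrom, start, stop, fields in records:
        value = fields[field]
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"field {field!r} is not numeric: {value!r}")
        span = stop - start
        if (chrom, span) != section:
            # Start a new block whenever the chromosome or interval width changes.
            lines.append(f"variableStep chrom={chrom} span={span}")
            section = (chrom, span)
        # Wiggle positions are one-based, so shift the zero-based start by one.
        lines.append(f"{start + 1} {value}")
    return "\n".join(lines) + "\n"
```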
An annotation track of type ’Allele Sequence’ can be exported to Fasta format. When Fasta format is selected, the Export Preferences expand to provide Fasta-specific controls. The chromosome header template may be specified. It is written as the header line for each exported chromosome, with the chromosome name substituted wherever the string literal ’%chr%’ occurs in the template. The number of characters written to each data line can also be specified. If the “Export chromosomes to separate files” option is checked, the string literal ’%chr%’ should be placed in the output file name, and it will be replaced with the chromosome name for each file written.
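The header template, line width, and per-chromosome file name substitution can be sketched as follows. The function names and the default width of 60 are hypothetical. The sketch assumes the template is written verbatim with each ’%chr%’ replaced, so the template should include the leading ’>’ that starts every Fasta header line.

```python
from typing import Dict, Iterable


def export_fasta(
    sequences: Dict[str, str], header_template: str = ">%chr%", line_width: int = 60
) -> str:
    """Render each chromosome as a header line plus fixed-width sequence lines."""
    if line_width <= 0:
        raise ValueError("line_width must be positive")
    lines = []
    for chrom, seq in sequences.items():
        lines.append(header_template.replace("%chr%", chrom))
        # Split the sequence into lines of at most line_width characters.
        lines.extend(seq[i : i + line_width] for i in range(0, len(seq), line_width))
    return "\n".join(lines) + "\n"


def per_chromosome_paths(name_template: str, chroms: Iterable[str]) -> Dict[str, str]:
    """Map each chromosome to its own output file name."""
    if "%chr%" not in name_template:
        # Without the placeholder every chromosome would overwrite the same file.
        raise ValueError("output file name must contain %chr%")
    return {chrom: name_template.replace("%chr%", chrom) for chrom in chroms}
```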
An annotation track of type ’Allele Sequence’ can be exported to 2Bit format. When 2Bit format is selected, the Export Preferences expand to provide 2Bit-specific controls. When the “Preserve masking data” option is checked, lower-case alleles will be encoded as masked regions in the 2Bit output file.
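The UCSC 2bit format stores masking as parallel lists of zero-based block starts and block sizes. The sketch below shows how runs of lower-case bases could be turned into those blocks. `mask_blocks` is a hypothetical helper and is not part of SVS.

```python
from itertools import groupby
from typing import List, Tuple


def mask_blocks(seq: str) -> Tuple[List[int], List[int]]:
    """Return (starts, sizes) of the lower-case (masked) runs in seq."""
    starts: List[int] = []
    sizes: List[int] = []
    position = 0
    # groupby yields consecutive runs that are all lower-case or all not.
    for is_masked, run in groupby(seq, key=str.islower):
        length = sum(1 for _ in run)
        if is_masked:
            starts.append(position)
            sizes.append(length)
        position += length
    return starts, sizes
```

For example, in "ACgtaCGTnnA" the masked runs "gta" and "nn" start at offsets 2 and 8.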
Import From File
The Import From File dialog allows data to be imported into the native Interval Data Format (IDF) as an annotation track from a variety of input formats. Supported formats include 2Bit, Fasta (FA), Wiggle (WIG), and a variety of delimited text formats, such as Comma Separated Values (CSV), Tab Separated Values (TSV, TAB), Browser Extensible Data (BED), etc. Once the data has been successfully imported as an annotation track, it can be visualized in the genome browser in alignment with user data.
Import a 2Bit File
A 2Bit file is a packed binary format described at http://genome.ucsc.edu/FAQ/FAQformat.html#format7, and is a very efficient method of packing ACGT sequence data. The 2Bit import requires exactly one input file in 2Bit format. The only available output type is ’Allele Sequence’. The resulting track name may be specified, along with an associated species and build. Although any text will be accepted by the species and build controls, selecting one of the options provided in each list is recommended. The list of available builds will change when the species control is modified. Care should be taken to ensure that the species and build information associated with all user imported annotation tracks precisely matches the species and build data in an appropriate genome map. A name may be specified for the output file. This file will be placed in the user annotation track folder. If desired, an existing annotation track file may be selected, in which case the new annotation track will be added to that file.
Import a Fasta File
A Fasta file is a text file in which each sequence segment begins with a one-line header (starting with ’>’) that names the segment. The header is followed by lines of characters, each character giving the base at the next offset in that segment. Because sequence data provided in Fasta format is often split into multiple files, one for each chromosome, multiple input files may be specified. At least one input file in Fasta format is required. The only available output type is ’Allele Sequence’. The resulting track name may be specified, along with an associated species and build. Although any text will be accepted by the species and build controls, selecting one of the options provided in each list is recommended. The list of available builds will change when the species control is modified. Care should be taken to ensure that the species and build information associated with all user imported annotation tracks precisely matches the species and build data in an appropriate genome map. A name may be specified for the output file. This file will be placed in the user annotation track folder. If desired, an existing annotation track file may be selected, in which case the new annotation track will be added to that file.
Import a Wiggle File
A Wiggle file is a text file which assigns a floating point value to each position or interval of interest in genomic space. The Wiggle import requires exactly one input file in Wiggle format. Because the Wiggle format allows several tracks to be defined in one file, a single input file often contains multiple tracks. These will all be imported into the same annotation track file. The only available output type is ’Intensity’. The resulting track name may be specified, along with an associated species and build. In the case of Wiggle import, the track name serves as a prefix to the name provided for each track in the input file. The prefix and each track name are joined with a dash to form each annotation track’s name. Although any text will be accepted by the species and build controls, selecting one of the options provided in each list is recommended. The list of available builds will change when the species control is modified. Care should be taken to ensure that the species and build information associated with all user imported annotation tracks precisely matches the species and build data in an appropriate genome map. A name may be specified for the output file. This file will be placed in the user annotation track folder. If desired, an existing annotation track file may be selected, in which case the new annotation track(s) will be added to that file.
When a resulting annotation track is displayed in the genome browser, the track is expected to contain non-overlapping data. If intervals within a single track do overlap, the visualization displays the mean of their intensity values over the overlapping region.
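The mean-of-overlaps rule can be illustrated with a small sketch. It splits one chromosome's intervals at every start and stop boundary and averages the values that cover each piece. The function name and the zero-based, half-open `(start, stop, value)` input are assumptions. The quadratic scan is chosen for clarity rather than speed. The sketch shows the rule, not how SVS implements it.

```python
from typing import List, Tuple

Interval = Tuple[int, int, float]


def resolve_overlaps(intervals: List[Interval]) -> List[Interval]:
    """Return non-overlapping (start, stop, mean) pieces for one chromosome."""
    # Every start and stop becomes a boundary, so each elementary piece
    # [left, right) lies either fully inside or fully outside each interval.
    bounds = sorted({p for start, stop, _ in intervals for p in (start, stop)})
    pieces = []
    for left, right in zip(bounds, bounds[1:]):
        covering = [v for start, stop, v in intervals if start <= left and right <= stop]
        if covering:  # gaps between intervals produce no output
            pieces.append((left, right, sum(covering) / len(covering)))
    return pieces
```

Two intervals [0, 10) with value 1.0 and [5, 15) with value 3.0 give three pieces. The overlapping piece [5, 10) is shown at the mean value 2.0.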
Import a Delimited Text File (including CSV, TSV, TAB, and BED formats)
A delimited text file is a text file consisting of rows and columns of data delimited by special characters or strings which are not allowed in the data values themselves. The Delimited Text import requires exactly one input file in a Delimited Text format. Because the input format is customizable, several controls are provided before the input file selection. These controls allow the user to indicate how the input file should be parsed.
The Input Coordinate System box is used to specify whether the intervals defined in the input data are half-open or indexed. Half-open coordinates are zero-based, and the difference between the stop and start positions defines the width of an interval. An interval covering the first three positions of a chromosome in a half-open system would be specified as [0,3), where the stop position itself is excluded. Indexed coordinates are one-based, and the width of an interval is one plus the difference between the stop and start positions. An interval covering the first three positions of a chromosome in an indexed system would be specified as [1,3].
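The two coordinate systems differ only by an offset on the start position. The sketch below shows this for the example interval, where [1, 3] indexed corresponds to [0, 3) half-open. The function names are illustrative only.

```python
from typing import Tuple


def indexed_to_half_open(start: int, stop: int) -> Tuple[int, int]:
    """One-based, inclusive [start, stop] -> zero-based, half-open [start, stop)."""
    return start - 1, stop


def half_open_to_indexed(start: int, stop: int) -> Tuple[int, int]:
    """Zero-based, half-open [start, stop) -> one-based, inclusive [start, stop]."""
    return start + 1, stop


def width_half_open(start: int, stop: int) -> int:
    return stop - start


def width_indexed(start: int, stop: int) -> int:
    return stop - start + 1
```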
The Delimiter specifies the string that separates data values (or columns) on each line of the input file. For correct alignment of the imported data, the delimiter should not occur within any of the data values. The Sub Delimiter specifies the string that separates items within a single data value. This is for reading input which may encode a list of values as a single value. For correct interpretation of such lists, the sub delimiter should not occur within any of the list items.
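To illustrate how the Delimiter and Sub Delimiter divide a line, the sketch below splits on the delimiter first. It then splits any value containing the sub delimiter into a list. `parse_line` is hypothetical. Treating every value that contains the sub delimiter as a list is a simplifying assumption; an importer would more likely decide this per field.

```python
from typing import List, Union

Parsed = Union[str, List[str]]


def parse_line(line: str, delimiter: str = "\t", sub_delimiter: str = ",") -> List[Parsed]:
    """Split one input line into values, expanding sub-delimited lists."""
    values = line.rstrip("\r\n").split(delimiter)
    return [
        value.split(sub_delimiter) if sub_delimiter and sub_delimiter in value else value
        for value in values
    ]
```

If a value itself contained the delimiter, `split` would produce an extra column and every later value on the line would shift by one. This is why the delimiter should not appear in the data.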
The input file may contain a line which defines the names of the data fields (or columns) in the file. The header line may be detected by line number, prefix, or by a more complicated regular expression match. The header line must be near the top of the file and any text above the header line will be ignored during import. To indicate that the input file does not have a header line, choose ’None’ from the Header type selection. The names of the imported fields will default to ’Column 1’, ’Column 2’, etc. To indicate that the field names are defined on the first line of the file with no prefix, choose ’Line #’ from the Header type selection and accept the default value of 0. To indicate that the header line begins with the string literal ’@’, select ’Starts with’ from the Header type options and replace the text in the Header value field with ’@’. When a regular expression is used to detect the header, the entirety of the matched text is removed before extracting the field names from the header line.
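The Header type options can be sketched as one function that scans the first lines of the file. The mode names and the function itself are hypothetical. The `max_scan` limit is an assumed stand-in for "near the top of the file". For the regular expression case, the sketch removes the matched text before splitting, as described above.

```python
import re
from typing import List, Optional, Sequence, Tuple


def find_header(
    lines: Sequence[str],
    mode: str,
    value: str = "",
    delimiter: str = "\t",
    max_scan: int = 50,
) -> Tuple[Optional[int], List[str]]:
    """Return (header line index, field names) for the chosen Header type.

    mode is one of 'none', 'line', 'starts_with', or 'regex'.
    """
    if mode == "none":
        # No header: name the columns after the first line's column count.
        count = len(lines[0].rstrip("\r\n").split(delimiter)) if lines else 0
        return None, [f"Column {i}" for i in range(1, count + 1)]
    for index, line in enumerate(lines[:max_scan]):
        if mode == "line" and index == int(value):
            text = line
        elif mode == "starts_with" and line.startswith(value):
            text = line[len(value):]  # drop the prefix itself
        elif mode == "regex" and (match := re.search(value, line)):
            text = line[: match.start()] + line[match.end():]  # drop all matched text
        else:
            continue
        return index, text.rstrip("\r\n").split(delimiter)
    raise ValueError("header line not found near the top of the file")
```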
The input file may also contain any number of lines which are ignored during import. Such lines are often referred to as comments because they are intended to be read by humans and include additional notes which are not machine readable. The controls used to indicate which lines should be treated as comments are similar to the header line controls. Comments may be detected by prefix, or by a more complicated regular expression match. To indicate that the input file does not contain any comment lines, choose ’None’ from the Comments type selection. To indicate that comment lines begin with ’!’, select ’Starts with’ from the Comments type selection and replace the text in the Comments value field with ’!’. To indicate that comment lines begin with either ’@’ or ’!’, choose ’Regular expression’ from the Comments type options and replace the text in the Comments value field with ’^[@!]’. Any line which starts with either character will be completely ignored during import.
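The comment settings amount to a per-line predicate, as in the hypothetical sketch below. The sketch uses `re.search`, so the ’^’ anchor is what restricts the match to the start of a line. Whether SVS searches or anchors its own match is an assumption here.

```python
import re
from typing import Callable, Iterable, List


def comment_test(mode: str, value: str = "") -> Callable[[str], bool]:
    """Build a predicate for the chosen Comments type ('none', 'starts_with', 'regex')."""
    if mode == "starts_with":
        return lambda line: line.startswith(value)
    if mode == "regex":
        pattern = re.compile(value)
        return lambda line: pattern.search(line) is not None
    return lambda line: False  # 'none': no line is a comment


def data_lines(lines: Iterable[str], mode: str, value: str = "") -> List[str]:
    """Drop every line that the Comments settings classify as a comment."""
    is_comment = comment_test(mode, value)
    return [line for line in lines if not is_comment(line)]
```

With `^[@!]`, a line such as "chr2\t5!" is kept, because the ’!’ does not appear at the start of the line.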
Once the input file format customizations have been set, select the input file by clicking Browse. This starts a sparse scan of the input file, which gathers data to populate the field lists in the Import Preferences. A type for the output annotation track should then be chosen to further refine the Expected Fields list. The default type ’Interval’ is the least restrictive, requiring only chromosome and start values for each line (or row) in the file.
It is assumed that the input file is arranged such that each record or line of data in the file specifies an interval and its properties. The input file cannot be imported as an annotation track without each interval including a chromosome name and either a position value or a start and a stop value. When only a position value is provided, all intervals are taken to be one base pair in width. Any other data fields are optional, but may be required to meet more specific requirements if another annotation track type is chosen. The Additional Data Fields list will contain entries for all the fields detected in the input file. The Field column will display the detected field name and the Type column will display the importer’s best guess for the type of data the field contains. If the contents of the Additional Data Fields list are not as expected, check the Delimiter, Sub Delimiter, Header and Comments control values. The Expected Fields list contains entries for all the fields that should exist in the input file for successful import as the selected annotation track type. The Field column will display the field name and the Type column will display the expected data type for each field. The File Column column will be empty unless the importer detected a likely match for the expected field in the list of Additional Data Fields. If a match was detected, the field name (as given in the header line of the input file) will be listed under the File Column heading. The corresponding field in the Additional Data Fields list will be unchecked and disabled. If a match is required but has not been provided, the background of the File Column cell will be colored yellow.
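The importer's "best guess" at a field type can be approximated by trying progressively looser parsers on the sampled values. This is a hypothetical sketch. The type names "Integer", "Real", and "String" and the integer-then-real-then-string order are assumptions, not the rules SVS uses.

```python
from typing import Callable, Dict, Sequence


def _all_parse(values: Sequence[str], cast: Callable[[str], object]) -> bool:
    """True if every value can be converted with cast."""
    try:
        for value in values:
            cast(value)
    except ValueError:
        return False
    return True


def guess_type(samples: Sequence[str]) -> str:
    """Guess a column type from sampled text values, strictest type first."""
    values = [s.strip() for s in samples if s.strip()]  # ignore blank cells
    if not values:
        return "String"
    if _all_parse(values, int):
        return "Integer"
    if _all_parse(values, float):
        return "Real"
    return "String"


def guess_column_types(rows: Sequence[Sequence[str]], names: Sequence[str]) -> Dict[str, str]:
    """Guess one type per named column from a sample of parsed rows."""
    return {
        name: guess_type([row[i] for row in rows if i < len(row)])
        for i, name in enumerate(names)
    }
```

A guess built from a sparse sample can be wrong. For example, a column whose sampled rows happen to hold only whole numbers may really contain decimals further down. This is one reason the Type column can be edited.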
To change the File Column assignment for an Expected Field, click on its current value or the empty cell and choose a new one from those available in the File Column list. To remove an assignment, select the empty value at the top of the File Column list. The type of the field detected in the file need not match the Expected Field type. However, if it does not, the imported data are unlikely to be correct. Some of the Expected Field names can be modified. To modify an Expected Field’s name, click on it and edit the value. The value provided will be embedded in the output annotation track.
Disabled fields in the Additional Data Fields list are already set to be included in the output annotation track and cannot be included again. To include an Additional Data Field in the output annotation track, check the box next to its name. To exclude it, uncheck the box. All possible fields may be included by pressing the All button, and all Additional Data Fields may be excluded by pressing the Clear button. Additional Data Field names can be modified. To modify an Additional Data Field’s name, click on it and edit the value. The value provided will be embedded in the output annotation track. Additional Data Field types can also be modified. To modify an Additional Data Field’s type, click on it and choose an available type from the type list. In general, field types should not need to be changed. Changing a type is mainly useful when the importer guesses a field’s type incorrectly. For best results, all Expected Fields should be assigned to an appropriate File Column.
The output track name may be specified, along with an associated species and build. Although any text will be accepted by the species and build controls, selecting one of the options provided in each list is recommended. The list of available builds will change when the species control is modified. Care should be taken to ensure that the species and build information associated with all user imported annotation tracks precisely matches the species and build data in an appropriate genome map. A name may be specified for the output file. This file will be placed in the user annotation track folder. If desired, an existing annotation track file may be selected, in which case the new annotation track will be added to that file.
Create From Spreadsheet
The Create From Spreadsheet dialog is accessible after first selecting a spreadsheet from which to create an annotation track. A project must be open. This dialog allows spreadsheet rows to be converted into intervals in an annotation track.
The Row Selection box is used to specify whether all rows will be converted into intervals, or if only active rows will be converted. The Input Coordinate System box is used to specify whether the intervals defined in the input data are half-open or indexed. Half-open coordinates are zero-based, and the difference between the stop and start positions defines the width of an interval. An interval covering the first three positions of a chromosome in a half-open system would be specified as [0,3), where the stop position itself is excluded. Indexed coordinates are one-based, and the width of an interval is one plus the difference between the stop and start positions. An interval covering the first three positions of a chromosome in an indexed system would be specified as [1,3]. Indexed coordinates are the default for Create From Spreadsheet because marker maps generally provide a single position rather than a start and a stop value, and that position is usually one-based.
A type for the output annotation track should be chosen to refine the Expected Fields list. The default type ’Interval’ is the least restrictive, requiring only chromosome and start values for each row of the spreadsheet. A vertical or row marker map applied to the input spreadsheet can be helpful for meeting input requirements. Such a marker map is treated as an extension of each row’s data for the purposes of creating an annotation track. The spreadsheet cannot be converted to an annotation track without each row including a chromosome name and either a position value or a start and a stop value. When only a position value is provided, all intervals are taken to be one base pair in width. Any other data fields are optional, but may be required to meet more specific requirements if another annotation track type is chosen. The Additional Data Fields list will contain entries for all the columns in the spreadsheet. Inactive columns will be unchecked by default. The Field column will display the associated spreadsheet column name and the Type column will display the corresponding type of the spreadsheet column. The Expected Fields list contains entries for all the fields that should exist as columns in the input spreadsheet for successful conversion to the selected annotation track type. The Field column will display the field name and the Type column will display the expected data type for each field. The Spreadsheet Column column will be empty unless the importer detected a likely match for the expected field in the list of Additional Data Fields. If a match was detected, the field name (the spreadsheet column name) will be listed under the Spreadsheet Column heading. The corresponding field in the Additional Data Fields list will be unchecked and disabled. If a match is required but has not been provided, the background of the Spreadsheet Column cell will be colored yellow.
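The sketch below shows one way a spreadsheet row and its marker map entry might be combined into an interval, following the rules above. A chromosome is required, start and stop are used when both are present, and a lone position gives a one base pair interval. Several details are assumptions made for illustration: the column names Chromosome, Position, Start, and Stop; the rule that row values win when a name appears in both sources; and the zero-based, half-open return value.

```python
from typing import Any, Mapping, Optional, Tuple


def row_to_interval(
    row: Mapping[str, Any],
    marker_map: Optional[Mapping[str, Any]] = None,
    coordinates: str = "indexed",
) -> Tuple[str, int, int]:
    """Build a zero-based, half-open (chrom, start, stop) interval from one row."""
    # The marker map extends the row; on a name clash the row value wins (assumption).
    data = {**(marker_map or {}), **row}
    if data.get("Chromosome") is None:
        raise ValueError("each row needs a chromosome")
    chrom = str(data["Chromosome"])
    if "Start" in data and "Stop" in data:
        start, stop = int(data["Start"]), int(data["Stop"])
        if coordinates == "indexed":
            start -= 1  # one-based [start, stop] -> zero-based [start - 1, stop)
    elif "Position" in data:
        position = int(data["Position"])
        start = position - 1 if coordinates == "indexed" else position
        stop = start + 1  # a lone position is one base pair wide
    else:
        raise ValueError("each row needs a position or a start and a stop")
    return chrom, start, stop
```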
To change the Spreadsheet Column assignment for an Expected Field, click on its current value or the empty cell and choose a new one from those available in the Spreadsheet Column list. To remove an assignment, select the empty value at the top of the Spreadsheet Column list. The type of the spreadsheet column need not match the Expected Field type. However, if it does not, the imported data are unlikely to be correct. Some of the Expected Field names can be modified. To modify an Expected Field’s name, click on it and edit the value. The value provided will be embedded in the output annotation track.
Disabled fields in the Additional Data Fields list are already set to be included in the output annotation track and cannot be included again. To include an Additional Data Field in the output annotation track, check the box next to its name. To exclude it, uncheck the box. All possible fields may be included by pressing the All button, and all Additional Data Fields may be excluded by pressing the Clear button. Additional Data Field names can be modified. To modify an Additional Data Field’s name, click on it and edit the value. The value provided will be embedded in the output annotation track. Additional Data Field types can also be modified. To modify an Additional Data Field’s type, click on it and choose an available type from the type list. In general, field types should not need to be changed. Changing a type is mainly useful when the spreadsheet column’s data type is not the desired type for the data in the output annotation track. For best results, all Expected Fields should be assigned to an appropriate Spreadsheet Column.
The output track name may be specified, along with an associated species and build. Although any text will be accepted by the species and build controls, selecting one of the options provided in each list is recommended. The list of available builds will change when the species control is modified. Care should be taken to ensure that the species and build information associated with all user imported annotation tracks precisely matches the species and build data in an appropriate genome map. A name may be specified for the output file. This file will be placed in the user annotation track folder. If desired, an existing annotation track file may be selected, in which case the new annotation track will be added to that file.
Download From Network
There are two methods of saving data from tracks at http://data.goldenhelix.com/das to disk. The first is to let the network data cache store downloaded data as it is requested, either by the Python scripting environment or by a visualization window. This method should be sufficient in many cases and allows for automatic updating of track contents. The downloaded sections of the track(s) will remain efficiently accessible from the local disk cache anywhere in the project, even if a connection to the internet is temporarily unavailable. The second method is to download the track directly from the server to a static local file. This method is the most efficient if data from the entire track is required. The Download From Network dialog enables the direct download method for acquiring local copies of network annotation tracks.
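The on-demand caching method can be pictured with the sketch below. It fetches fixed-size blocks of a track only when a region inside them is first requested. Those blocks are kept on local disk, so later reads need no network connection. The `RegionCache` class, its block size, and its file naming are hypothetical. The `fetch` callback stands in for a request to the network annotation service. This is not the SVS cache or its API.

```python
import pickle
from pathlib import Path
from typing import Callable, List, Tuple, Union

Record = Tuple[int, int, float]
Fetch = Callable[[str, str, int, int], List[Record]]


class RegionCache:
    """Download track data in fixed-size blocks and keep each block on disk."""

    def __init__(self, cache_dir: Union[str, Path], fetch: Fetch, block_size: int = 100_000):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch
        self.block_size = block_size

    def _path(self, track: str, chrom: str, block: int) -> Path:
        # Assumes track and chromosome names are safe to use in file names.
        return self.cache_dir / f"{track}_{chrom}_{block}.pkl"

    def get_block(self, track: str, chrom: str, block: int) -> List[Record]:
        path = self._path(track, chrom, block)
        if path.exists():
            # Cache hit: the block is read from local disk, no network access.
            return pickle.loads(path.read_bytes())
        start = block * self.block_size
        data = self.fetch(track, chrom, start, start + self.block_size)
        path.write_bytes(pickle.dumps(data))
        return data

    def query(self, track: str, chrom: str, start: int, stop: int) -> List[Record]:
        """Return the records of every block overlapping [start, stop)."""
        if stop <= start:
            raise ValueError("stop must be greater than start")
        first, last = start // self.block_size, (stop - 1) // self.block_size
        records: List[Record] = []
        for block in range(first, last + 1):
            records.extend(self.get_block(track, chrom, block))
        return records
```

Only the blocks that were actually requested are ever downloaded. This is why the first method suits browsing and scripting, while a direct download is more efficient when the whole track is needed.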
Controls on the Filter line work as they do in the Annotation Track Manager window. The text box displays only tracks with names that match a particular pattern, in which the wild cards ’*’ (any number of characters) and ’?’ (any one character) may be used. The type, species, and build filters limit the displayed list to a particular track type (such as ’Probe’), species (such as ’Canis familiaris’), or build (such as ’NCBI_36’). If multiple filters are set, tracks must match all of the filters to appear in the list.
The default download location is set to the user annotation track folder. Tracks downloaded to this location will be accessible from SVS. A different location may be specified.
To download one or more tracks, check their entries in the list and press the Download button.