TDF Format
The ".tdf" file format is an indexed, compressed binary file format that supports the display of numeric data. It was developed simultaneously with, and is similar in purpose to, the UCSC "bigWig" format. For most use cases we now recommend the "bigWig" format, as it is widely used across many tools, while "tdf" is limited in use to IGV.
TDF files are created with igvtools. Input formats include wig and bedgraph as well as the IGV specific ".igv" and ".cn" formats. TDF files representing alignment coverage can also be created directly from .bam files using igvtools.
Header
Tiles
Datasets
Groups
Master Index
Header
Field | Description | Type | Value |
---|---|---|---|
magic | TDF magic number | int | TDF\0 |
version | format version number | int | |
indexPosition | file position of index section | int | |
indexSize | size in bytes of index section | int | |
headerSize | size in bytes of the remainder of the header section | int | |
nWindowFunctions | number of window functions | int | |
List of window functions | |||
windowFunction | window function name | string | mean median min max percentile2 percentile10 percentile90 percentile98 stddev count density |
End list of window functions | |||
trackType | | string | |
trackLine | UCSC style track line | string | |
nTracks | number of tracks | int | |
List of track names (n = nTracks) | |||
trackName | name of track | string | |
End list | |||
genomeId | genome identifier (e.g. hg19) | string | |
flags | Flags | int |
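As a minimal sketch of how the first two header fields might be read, the Python below checks the magic number and returns the version. It assumes little-endian byte order and 4-byte ints, neither of which is stated in the table, and `read_magic_and_version` is a hypothetical helper name. It compares only the first three bytes of the magic, on the assumption that the fourth byte may differ between writers.

```python
import struct


def read_magic_and_version(data: bytes):
    """Return (magic, version) from the first 8 bytes of a TDF header.

    Assumes little-endian, 4-byte fields (not stated in the format table).
    """
    if len(data) < 8:
        raise ValueError("need at least 8 bytes to read magic and version")
    # "4s" reads the 4 magic bytes verbatim, "i" reads a signed 32-bit int.
    magic, version = struct.unpack_from("<4si", data, 0)
    if not magic.startswith(b"TDF"):
        raise ValueError(f"not a TDF file: magic={magic!r}")
    return magic, version


# Synthetic header prefix for illustration only (not taken from a real file).
sample = b"TDF\x00" + struct.pack("<i", 4)
magic, version = read_magic_and_version(sample)
```

The remaining header fields follow in table order and could be read the same way once the string encoding is known.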
This section contains tiles of data. A tile represents a region of the genome at a specific zoom (resolution) level. Each tile is referenced by a tile index entry of a dataset.
Field | Description | Type | Value |
---|---|---|---|
type | tile format | string | fixedStep variableStep bed bedWithName |
Remainder according to type |
fixedStep
Field | Description | Type | Value |
---|---|---|---|
nPositions | Number of genomic positions | int | |
start | genomic start position (zero based) | int | |
span | genomic span for each data point | float | |
List of data points. Track order first. (n= nTracks X nPositions) | |||
datum | data value for track and position | float | |
End list |
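The table above, read as the fixedStep layout (a single start and span), can be decoded roughly as follows. This is a sketch under stated assumptions: little-endian byte order, 4-byte ints and floats, "track order first" meaning all values for track 0 followed by all values for track 1, and data point i starting at `start + i * span`. `parse_fixed_step` is a hypothetical name, and `body` is taken to be the tile bytes after the `type` string.

```python
import struct


def parse_fixed_step(body: bytes, n_tracks: int):
    """Decode a fixedStep tile body into (starts, span, data).

    data[t][p] is the value for track t at position index p.
    Byte order and field widths are assumptions, not part of the table.
    """
    n_positions, start, span = struct.unpack_from("<iif", body, 0)
    offset = struct.calcsize("<iif")  # 12 bytes: int, int, float

    # nTracks x nPositions floats, stored track by track.
    flat = struct.unpack_from(f"<{n_tracks * n_positions}f", body, offset)
    data = [
        list(flat[t * n_positions:(t + 1) * n_positions])
        for t in range(n_tracks)
    ]

    # Assumed: each data point begins span bases after the previous one.
    starts = [start + int(i * span) for i in range(n_positions)]
    return starts, span, data


# Synthetic tile: 3 positions, start 100, span 25, 2 tracks.
example = struct.pack("<iif", 3, 100, 25.0) + struct.pack(
    "<6f", 1.0, 2.0, 3.0, 10.0, 20.0, 30.0
)
starts, span, data = parse_fixed_step(example, n_tracks=2)
```

The value for track `t` at position index `p` sits at flat index `t * nPositions + p` under this reading.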
variableStep
Field | Description | Type | Value |
---|---|---|---|
tileStart | genomic position for start of tile | int | |
span | genomic span for each data point | float | |
nPositions | Number of genomic positions | int | |
List of data start positions | |||
start | genomic start position (zero based) | int | |
End list | |||
List of data points. Track order first. (n= nTracks X nPositions) | |||
datum | data value for track and position | float | |
End list |
bed and bedWithName
Field | Description | Type | Value |
---|---|---|---|
nPositions | Number of genomic positions | int | |
List of data start positions. (n= nPositions) | |||
start | genomic start position (zero based) | int | |
End list | |||
List of data end positions. (n= nPositions) | |||
end | genomic end position | int | |
End list | |||
nSamples | Number of samples. Ignored | | |
List of data points. Track order first. (n= nTracks X nPositions) | |||
datum | data value for track and position | float | |
End list | |||
Optional feature names (type = bedWithName) | |||
List of feature names (n=nPositions) | |||
name | feature name | string | |
End list |
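The bed layout above, with its optional name list, might be decoded along these lines. The sketch assumes little-endian byte order, 4-byte ints and floats, a 4-byte int for `nSamples` (the table gives no type), and null-terminated UTF-8 strings for feature names. `parse_bed_tile` is a hypothetical name, and `body` is taken to be the tile bytes after the `type` string.

```python
import struct


def parse_bed_tile(body: bytes, n_tracks: int, with_names: bool = False):
    """Decode a bed / bedWithName tile body.

    Returns (starts, ends, data, names); names is None for plain bed.
    Field widths and string encoding are assumptions.
    """
    off = 0
    (n,) = struct.unpack_from("<i", body, off)
    off += 4
    starts = list(struct.unpack_from(f"<{n}i", body, off))
    off += 4 * n
    ends = list(struct.unpack_from(f"<{n}i", body, off))
    off += 4 * n
    off += 4  # nSamples: ignored; assumed to occupy a 4-byte int

    flat = struct.unpack_from(f"<{n_tracks * n}f", body, off)
    off += 4 * n_tracks * n
    # Track order first: all positions for track 0, then track 1, ...
    data = [list(flat[t * n:(t + 1) * n]) for t in range(n_tracks)]

    names = None
    if with_names:
        names = []
        for _ in range(n):
            end = body.index(b"\x00", off)  # assumed null terminator
            names.append(body[off:end].decode("utf-8"))
            off = end + 1
    return starts, ends, data, names


# Synthetic bedWithName tile: two features, one track.
example = (
    struct.pack("<i", 2)
    + struct.pack("<2i", 10, 50)
    + struct.pack("<2i", 20, 60)
    + struct.pack("<i", 1)
    + struct.pack("<2f", 1.5, 2.5)
    + b"geneA\x00geneB\x00"
)
starts, ends, data, names = parse_bed_tile(example, n_tracks=1, with_names=True)
```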
A dataset is a container for tiles of data at a given zoom level. Tiles are referenced by file position.
Field | Description | Type | Value |
---|---|---|---|
nAttributes | Number of attributes | int | |
List of attributes | |||
key | Attribute key | string | |
value | Attribute value | string | |
End list | |||
dataType | ignored | string | |
tileWidth | Width of each tile in base pairs | float | |
nTiles | Number of tiles | int | |
List of tile entries | |||
position | File position for start of tile | long | |
size | Size of tile in bytes | int | |
End list |
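Because each dataset stores a fixed `tileWidth` and an ordered list of tile entries, the tile covering a genomic position can be located by integer division. The sketch below assumes tile i covers the half-open range from `i * tileWidth` to `(i + 1) * tileWidth`, which the table implies but does not state. `tile_for_position` is a hypothetical helper, and the tile entries are represented as `(position, size)` tuples in table order.

```python
def tile_for_position(position: int, tile_width: float, tile_entries):
    """Return (tile_number, (file_position, size)) for a genomic position.

    Returns None when the position falls outside the tiles in the dataset.
    Assumes tiles are contiguous and start at genomic position 0.
    """
    tile_number = int(position // tile_width)
    if tile_number < 0 or tile_number >= len(tile_entries):
        return None
    return tile_number, tile_entries[tile_number]


# Synthetic tile index: three tiles of 1000 bp each.
entries = [(100, 40), (140, 38), (178, 52)]
hit = tile_for_position(2500, 1000.0, entries)
```

With the file position and size in hand, a reader would seek to that offset and read exactly `size` bytes to obtain the tile.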
A group is a container of key-value pairs, essentially a dictionary. In theory a TDF file can have an arbitrary number of groups referenced from the group index. In practice only a single group, the "root group" with name "/", has been used. The root group contains metadata and statistics for the file as a whole. See below for common attributes.
Field | Description | Type | Value |
---|---|---|---|
nAttributes | Number of attributes | int | |
List of attributes | |||
key | Attribute key | string | |
value | Attribute value | string | |
End list |
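Since a group is essentially a dictionary, decoding one amounts to reading a count followed by key and value strings. The sketch below assumes a little-endian 4-byte count and null-terminated UTF-8 strings; the table does not specify the string encoding, so treat both as assumptions. `read_string` and `parse_group` are hypothetical helper names.

```python
import struct


def read_string(buf: bytes, off: int):
    """Read one string starting at off; return (text, next_offset).

    Assumes strings are null-terminated UTF-8.
    """
    end = buf.index(b"\x00", off)
    return buf[off:end].decode("utf-8"), end + 1


def parse_group(buf: bytes, off: int = 0):
    """Decode a group record into a dict of attribute key -> value."""
    (n_attributes,) = struct.unpack_from("<i", buf, off)
    off += 4
    attrs = {}
    for _ in range(n_attributes):
        key, off = read_string(buf, off)
        value, off = read_string(buf, off)
        attrs[key] = value
    return attrs


# Synthetic group with two attributes.
example = struct.pack("<i", 2) + b"Mean\x000.5\x00maxZoom\x007\x00"
attrs = parse_group(example)
```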
Master Index
Field | Description | Type | Value |
---|---|---|---|
nDatasets | Number of datasets | int | |
List of datasets | |||
name | dataset name | string | |
position | dataset file position | long | |
nBytes | size of dataset in bytes | int | |
End list | |||
nGroups | Number of groups | int | |
List of groups | |||
name | name of group | string | |
position | file position of group | int | |
nBytes | size of group in bytes | int | |
End list |
TDF files created with igvtools typically include the following attributes in the root group (group name = "/"). The data type for all attributes is "string".
Name | Description |
---|---|
2nd Percentile | 2nd percentile value of all data in this file |
10th Percentile | |
90th Percentile | |
98th Percentile | |
Maximum | |
Mean | |
Median | |
Minimum | |
chromosomes | Comma-delimited list of all chromosome/contig/sequence names in this file |
maxZoom | The maximum pre-computed zoom level |
totalCount | For alignment coverage files only - total number of alignments. |
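Because every root-group attribute is stored as a string, a reader has to convert values before using them. The sketch below shows one way to do that for the attributes listed above. `interpret_root_attributes` is a hypothetical helper, and the numeric string formats it accepts (anything Python's `float` can parse) are an assumption rather than something the format guarantees.

```python
STATISTIC_KEYS = (
    "2nd Percentile", "10th Percentile", "90th Percentile", "98th Percentile",
    "Maximum", "Mean", "Median", "Minimum",
)


def interpret_root_attributes(attrs: dict):
    """Convert string-valued root group attributes to typed values.

    Missing attributes are skipped rather than treated as errors.
    """
    out = {}
    for key in STATISTIC_KEYS:
        if key in attrs:
            out[key] = float(attrs[key])
    if "chromosomes" in attrs:
        # Comma-delimited list; drop empty entries from trailing commas.
        out["chromosomes"] = [c for c in attrs["chromosomes"].split(",") if c]
    if "maxZoom" in attrs:
        out["maxZoom"] = int(attrs["maxZoom"])
    if "totalCount" in attrs:
        # Accept either "1234" or "1234.0"; the exact form is an assumption.
        out["totalCount"] = int(float(attrs["totalCount"]))
    return out


# Synthetic attribute dictionary for illustration.
typed = interpret_root_attributes(
    {"Mean": "0.5", "chromosomes": "chr1,chr2,", "maxZoom": "7", "totalCount": "1234.0"}
)
```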