ft extract

Extract Fiber-seq data into plain text files

Inputs and options

See the help message (ft extract --help) for details.

Output description

All outputs of ft extract can (and should) be compressed by simply adding the .gz extension to the output file name. For example, ft extract input.bam --m6a m6a.bed.gz will write a compressed bed12 file. Use - to write to stdout, e.g. ft extract input.bam --m6a -.

Shared output columns

| Column | Description |
|---|---|
| ct | Chromosome or contig |
| st | Start position of the read on the chromosome |
| en | End position of the read on the chromosome |
| fiber | The fiber/read name |
| score | The number of ccs passes for the read (rounded) |
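Because every output can be plain, gzip-compressed, or streamed to stdout, a downstream script has to handle all three. The sketch below is a minimal, hypothetical reader (open_ft_output, shared_columns and FiberRecord are illustrative names, not part of fibertools). It assumes the five shared columns are the first five tab-separated fields of each record and that any header line starts with "#".

```python
import gzip
import sys
from typing import Iterable, Iterator, NamedTuple, TextIO


class FiberRecord(NamedTuple):
    ct: str     # chromosome or contig
    st: int     # start of the read on the chromosome
    en: int     # end of the read on the chromosome
    fiber: str  # fiber/read name
    score: int  # number of ccs passes (rounded)


def open_ft_output(path: str) -> TextIO:
    """Open an ft extract output as text: '-' is stdin, a .gz suffix means gzip."""
    if path == "-":
        return sys.stdin
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def shared_columns(lines: Iterable[str]) -> Iterator[FiberRecord]:
    """Yield the five shared leading columns of each record.

    Assumption: blank lines and lines starting with '#' are headers/comments.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        ct, st, en, fiber, score = line.rstrip("\n").split("\t")[:5]
        yield FiberRecord(ct, int(st), int(en), fiber, int(score))
```

Typical use would be `with open_ft_output("m6a.bed.gz") as fh: for rec in shared_columns(fh): ...`; the same loop then works whether the file was written compressed, uncompressed, or piped in via -.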

Columns specific to the --m6a, --cpg, --nuc, and --msp outputs

All of these files are written in standard bed12 format. The first and last blocks in each bed12 record do not reflect real data; they exist only to mark the start and end positions of the read. If you convert these BED files to bigBed (for example with bedToBigBed), be sure to include -allow1bpOverlap in your command.

| Column | Description |
|---|---|
| thick start | Same as the start (st) |
| thick end | Same as the end (en) |
| itemRgb | Color specific to the data type, e.g. m6a marks get a purple RGB |
| blockCount | The number of blocks in the bed12 record |
| blockSizes | A comma-separated list of the lengths of each feature in the bed12 record |
| blockStarts | A comma-separated list of the start positions of each block, relative to the record start |
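To turn a bed12 record back into feature coordinates, each relative blockStart is shifted by the record start, and the two marker blocks are discarded. The sketch below is a hypothetical helper (bed12_features is an illustrative name) that relies only on the standard 12-column BED layout, 0-based half-open coordinates, and the first/last-block convention described above.

```python
from typing import List, Tuple


def bed12_features(line: str) -> List[Tuple[int, int]]:
    """Return absolute (start, end) intervals for the real features in one
    bed12 line written by ft extract (--m6a, --cpg, --nuc or --msp)."""
    fields = line.rstrip("\n").split("\t")
    chrom_start = int(fields[1])  # column 2: st
    block_count = int(fields[9])  # column 10: blockCount
    # blockSizes / blockStarts may end with a trailing comma, so skip empty items.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if not (len(sizes) == len(starts) == block_count):
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    # blockStarts are relative to the record start, so shift them by st.
    blocks = [(chrom_start + s, chrom_start + s + z) for s, z in zip(starts, sizes)]
    # The first and last blocks only mark the read's start and end; drop them.
    return blocks[1:-1]
```

A record that contains only the two marker blocks (a read with no features of that type) yields an empty list.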

Columns specific to the --all output

| Column | Description |
|---|---|
| sam_flag | The SAM flag of the read alignment |
| HP | The haplotype tag for the read |
| RG | The read group tag for the read |
| fiber_length | The length of the read in bp |
| fiber_sequence | The sequence of the read |
| ec | The number of ccs passes for the read (no rounding) |
| rq | The estimated accuracy of the read |
| total_AT_bp | The total number of AT bp in the read |
| total_m6a_bp | The total number of m6a bp in the read |
| total_nuc_bp | The total number of nucleosome bp in the read |
| total_msp_bp | The total number of MSP bp in the read |
| total_5mC_bp | The total number of 5mC bp in the read |
| nuc_starts | The start positions of the nucleosomes in molecular coordinates |
| nuc_lengths | The lengths of the nucleosomes in molecular coordinates |
| ref_nuc_starts | The start positions of the nucleosomes in reference coordinates |
| ref_nuc_lengths | The lengths of the nucleosomes in reference coordinates |
| msp_starts | The start positions of the MSPs in molecular coordinates |
| msp_lengths | The lengths of the MSPs in molecular coordinates |
| ref_msp_starts | The start positions of the MSPs in reference coordinates |
| ref_msp_lengths | The lengths of the MSPs in reference coordinates |
| m6a | The start positions of the m6a in molecular coordinates |
| ref_m6a | The start positions of the m6a in reference coordinates |
| m6a_qual | The quality of the m6a positions (ML value) |
| 5mC | The start positions of the 5mC in molecular coordinates |
| ref_5mC | The start positions of the 5mC in reference coordinates |
| 5mC_qual | The quality of the 5mC positions (ML value) |
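Many --all columns hold lists of positions or lengths in a single field. The sketch below shows one way to parse a row by column name and pair start/length lists into intervals. It is a hypothetical helper, not fibertools code, and it assumes: a tab-separated header line (optionally prefixed with "#"), comma-separated list fields that may carry a trailing comma, and "" or "." for an empty list.

```python
from typing import Dict, List, Tuple

# Column names from the --all table that hold comma-separated lists.
LIST_COLUMNS = {
    "nuc_starts", "nuc_lengths", "ref_nuc_starts", "ref_nuc_lengths",
    "msp_starts", "msp_lengths", "ref_msp_starts", "ref_msp_lengths",
    "m6a", "ref_m6a", "m6a_qual", "5mC", "ref_5mC", "5mC_qual",
}


def parse_list(value: str) -> List[int]:
    """Split a comma-separated list into ints (assumes '' or '.' means empty)."""
    if value in ("", "."):
        return []
    return [int(x) for x in value.split(",") if x]


def parse_header(line: str) -> List[str]:
    """Turn the header line into column names, dropping an assumed leading '#'."""
    return line.rstrip("\n").lstrip("#").split("\t")


def parse_all_row(header: List[str], line: str) -> Dict[str, object]:
    """Map one --all record onto the header, expanding list columns to int lists."""
    fields = dict(zip(header, line.rstrip("\n").split("\t")))
    return {k: parse_list(v) if k in LIST_COLUMNS else v for k, v in fields.items()}


def to_intervals(starts: List[int], lengths: List[int]) -> List[Tuple[int, int]]:
    """Pair a start list with its length list (e.g. nuc_starts, nuc_lengths)."""
    return [(s, s + n) for s, n in zip(starts, lengths)]
```

Looking columns up by name rather than by position keeps the parser working even if the column order differs from the table. Reference-coordinate lists can contain -1, so filter those before building reference intervals.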

Note: positions in columns starting with ref_ may contain -1 (NA) values when the reference sequence has an insertion or deletion relative to the read sequence at that position.
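When mapping features between molecular and reference coordinates, the -1 entries need to be dropped rather than treated as real positions. The sketch below assumes that each ref_ list is parallel to its molecular counterpart (same length and order, e.g. m6a with ref_m6a); paired_positions is a hypothetical helper name.

```python
from typing import List, Tuple

NA = -1  # value used in ref_ columns when there is no matching reference position


def paired_positions(mol: List[int], ref: List[int]) -> List[Tuple[int, int]]:
    """Pair molecular positions with reference positions, skipping NA (-1) entries.

    Assumes the two lists are parallel (same length and order).
    """
    if len(mol) != len(ref):
        raise ValueError("molecular and reference lists differ in length")
    return [(m, r) for m, r in zip(mol, ref) if r != NA]
```

Keeping the molecular position alongside the reference position makes it easy to report which read-level calls were lost to indels.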