Long-Read Isoform Identification with FLAIR
Introduction
Isoform Identification in Apostichopus japonicus.
Dataset Description
This dataset contains one file of long reads sequenced with PacBio Sequel technology and two BAM files of aligned short reads. The BAM files were generated with STAR from two pairs of paired-end FASTQ files sequenced with Illumina HiSeq 2500 technology.
Organism: Apostichopus japonicus.
Instrument: PacBio Sequel and Illumina HiSeq 2500.
Publication
Original Data
PacBio NCBI Project: PRJNA785124.
Illumina NCBI Project: PRJNA687597.
NCBI Genome and Annotation: Apostichopus japonicus.
Bioinformatic Analysis
Application
Long-Read Isoform Identification (FLAIR).
Input
PacBio Long-Reads dataset in FASTQ format.
Aligned Short Reads in BAM format (generated with STAR).
NCBI genome in FASTA format.
Annotation File in GTF format.
Reads Manifest in TSV format to quantify final isoforms.
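The reads manifest can be assembled by hand, but a short script helps avoid formatting slips. The sketch below is a minimal example, assuming the four-column, header-less layout commonly used for FLAIR manifests (sample id, condition, batch, path to the reads file) and assuming that underscores should be avoided in the first three fields. The sample name, condition, batch and file path are hypothetical placeholders, not values from this dataset.

```python
def build_reads_manifest(rows):
    """Return manifest text: one tab-separated line per sample, no header.

    Assumed column order: sample id, condition, batch, reads file path.
    """
    lines = []
    for sample, condition, batch, reads_path in rows:
        for field in (sample, condition, batch):
            # Assumption: underscores (and tabs) in the first three fields
            # can confuse downstream column naming, so reject them early.
            if "_" in field or "\t" in field:
                raise ValueError(f"invalid character in manifest field: {field!r}")
        lines.append("\t".join([sample, condition, batch, reads_path]))
    return "\n".join(lines) + "\n"


# Hypothetical single-sample example; replace with the real names and path.
manifest_text = build_reads_manifest([
    ("sampleA", "conditionA", "batch1", "reads/sampleA.fastq"),
])
```

The resulting text can then be written to a file such as `reads_manifest.tsv` (a hypothetical name) and supplied as the manifest input.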
Parameters
Use Own Alignment Files: false
Use Short Reads: true
Quantify Reads: false
Native RNA: false
Minimum Mapping Quality: 1
Retain Secondary Alignments: 0
Window Size: 15
Minimum Supporting Reads: 3
Window Size for TSS and TTS: 100
Ends Determined at Isoform Level: false
Use Supporting Reads for TSS/TTS: false
How to Treat Redundant Isoforms: No redundancy control
How to Filter Isoforms: Filter based on support
Minimum Mapping Quality: 1
Stringent Mode: false
Check Splice Sites: false
Trust Ends: false
Execution Time
3 hours 23 minutes.
Output
flair.transcriptome.gtf: Transcriptome Annotation in GTF format. It can be the input to SQANTI3.
flair.transcriptome.fa: Transcriptome Sequences in FASTA format.
flair.map.txt: Isoform-Read relationships.
flair.counts.tsv: Counts File of each discovered isoform.
flair_report.box: Summary Report.
isoforms_length.box: Isoform Length Distribution.
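As a quick sanity check on the transcriptome annotation, the number of isoforms per gene can be counted directly from the GTF. The sketch below is a minimal example that relies only on the standard GTF layout (nine tab-separated columns, with `gene_id` and `transcript_id` in the attribute column). The function name and the in-memory example records are hypothetical; in practice the lines would come from opening `flair.transcriptome.gtf`.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value"
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def isoforms_per_gene(gtf_lines):
    """Count distinct transcript_id values for each gene_id in GTF lines."""
    transcripts_by_gene = defaultdict(set)
    for line in gtf_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed GTF record
        attributes = dict(ATTRIBUTE_PATTERN.findall(fields[8]))
        gene_id = attributes.get("gene_id")
        transcript_id = attributes.get("transcript_id")
        if gene_id and transcript_id:
            transcripts_by_gene[gene_id].add(transcript_id)
    return {gene: len(ids) for gene, ids in transcripts_by_gene.items()}


# Hypothetical records for illustration only (not taken from the dataset).
example_lines = [
    'chr1\tFLAIR\ttranscript\t1\t900\t.\t+\t.\tgene_id "geneA"; transcript_id "tx1";',
    'chr1\tFLAIR\texon\t1\t400\t.\t+\t.\tgene_id "geneA"; transcript_id "tx1";',
    'chr1\tFLAIR\ttranscript\t1\t700\t.\t+\t.\tgene_id "geneA"; transcript_id "tx2";',
    'chr2\tFLAIR\ttranscript\t50\t500\t.\t-\t.\tgene_id "geneB"; transcript_id "tx3";',
]
counts = isoforms_per_gene(example_lines)
```

To run it on the real output, pass an open file handle instead of the example list, for instance `isoforms_per_gene(open("flair.transcriptome.gtf"))`.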