Long Reads Transcriptome Analysis with SQANTI3
Introduction
SQANTI3 is a bioinformatics tool for the quality control and filtering of full-length transcripts sequenced with PacBio’s long-read technology. It is designed as the next step after the IsoSeq pipeline. Interest in this tool stems from the usefulness of long-read transcriptome sequencing for describing eukaryotic transcriptomes, where it can replace second-generation sequencing. Illumina short reads are too short to span a whole transcript, so they cannot characterize eukaryotic transcriptomes well on their own.
Dataset Description
The dataset consists of consensus transcripts in FASTA format, obtained by running IsoSeq3 in OmicsBox. This FASTA file contains the full-length transcriptome of the COLO829T melanoma cell line, obtained with long-read sequencing.
Organism: Homo sapiens
Instrument: PacBio
Layout: PacBio Single Molecule, Real-Time (SMRT) Sequencing
Publication
Tseng, E., Galvin, B., Hon, T., Kloosterman, W. P., & Ashby, M. (2019). Full length transcriptome sequencing of melanoma cell line complements long read sequencing assessment of genomic rearrangements.
Original Data
Unprocessed long-read data can be obtained from:
https://downloads-ap.pacbcloud.com/public/dataset/Melanoma2019_IsoSeq/subreads/COLO829T/
However, SQANTI3 takes IsoSeq output as its input, which can be downloaded from this link.
Bioinformatic Analysis
1- Analysis Step
Application
Input
File with PacBio HQ Long Reads:
Reference Genome:
Genome Annotation:
Short-read files:
Transcription Start Site Annotation File:
File with PolyA Motifs:
Parameters
Quality Control Parameters
Ignore Transcript ID Nomenclature: False
Min. Length of Reference Transcript: 200
Skip ORF Prediction: False
Set of Splice Sites: ATAC,GCAG,GTAG
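As an illustration of how the Set of Splice Sites value can be read, the sketch below assumes each entry is a donor dinucleotide followed by an acceptor dinucleotide (so GTAG stands for GT-AG) and labels a junction as canonical when its pair appears in the set. The function names are hypothetical and this is not SQANTI3's own code.

```python
# Hypothetical helpers for illustration; not part of SQANTI3 or OmicsBox.

def parse_splice_sites(setting: str) -> set[str]:
    """Turn a setting such as 'ATAC,GCAG,GTAG' into a set of 4-base strings."""
    return {site.strip().upper() for site in setting.split(",") if site.strip()}


def is_canonical(donor: str, acceptor: str, canonical: set[str]) -> bool:
    """Assumes donor/acceptor are the first and last two bases of the intron."""
    return (donor + acceptor).upper() in canonical


canonical_sites = parse_splice_sites("ATAC,GCAG,GTAG")
print(is_canonical("GT", "AG", canonical_sites))  # GT-AG junction
print(is_canonical("CT", "AC", canonical_sites))  # pair not in the set
```

Any junction whose pair is not in the configured set would be treated as non-canonical under this reading.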
Filtering Parameters
Filtering: True
Adenine Percentage: 0.6
Adenines in a Row: 6
Distance to Annotated TTS: 50
Minimum Short-Read Coverage: 3
Filter Mono Exonic Transcripts: False
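To make the filtering parameters more concrete, here is a minimal sketch of how such thresholds could be combined into a keep/discard decision. The record fields, the function names and the exact order of the rules are assumptions made for illustration. The real filter also considers information not modelled here (for example polyA motifs and structural category), so this should not be read as SQANTI3's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Isoform:
    # Hypothetical record; field names are illustrative only.
    downstream_seq: str                   # genomic bases just downstream of the 3' end
    dist_to_annotated_tts: Optional[int]  # None when no annotated TTS is known
    is_canonical: bool                    # all junctions use canonical splice sites
    min_junction_cov: Optional[int]       # None when no short-read coverage exists
    n_exons: int


def leading_a_run(seq: str) -> int:
    """Count consecutive A's at the start of the downstream sequence."""
    run = 0
    for base in seq.upper():
        if base != "A":
            break
        run += 1
    return run


def filter_reason(iso: Isoform, a_fraction: float = 0.6, a_run: int = 6,
                  max_tts_dist: int = 50, min_cov: int = 3,
                  filter_mono_exonic: bool = False) -> Optional[str]:
    """Return why an isoform would be discarded, or None to keep it."""
    seq = iso.downstream_seq.upper()
    frac_a = seq.count("A") / len(seq) if seq else 0.0
    a_rich = frac_a >= a_fraction or leading_a_run(seq) >= a_run
    far_from_tts = (iso.dist_to_annotated_tts is None
                    or abs(iso.dist_to_annotated_tts) > max_tts_dist)
    if a_rich and far_from_tts:
        return "intra-priming"
    if filter_mono_exonic and iso.n_exons == 1:
        return "mono-exonic"
    low_cov = iso.min_junction_cov is None or iso.min_junction_cov < min_cov
    if not iso.is_canonical and low_cov:
        return "low coverage / non-canonical"
    return None


suspect = Isoform("AAAAAAAAGCTTAAAAAGAA", 300, True, 10, 4)
print(filter_reason(suspect))
```

In this sketch an A-rich 3' end is only penalized when the transcript end is farther than Distance to Annotated TTS from a known end, and non-canonical junctions are tolerated when short-read coverage reaches Minimum Short-Read Coverage.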
Execution Time
Approx. 90 minutes.
Output
example_dataset.box: classification table with a sidebar to make a summary report and different charts.
example.dataset_classification.txt: file with all the information that SQANTI3 can return for each isoform.
example.dataset_junctions.txt: file with information at splice-junction level.
example.dataset_isoforms.fasta: FASTA file with the curated transcriptome.
example.dataset_transcriptome.gtf: annotation file of the curated transcriptome.
example.dataset_isoforms_aminoacids.faa: FASTA file with the translated and curated transcriptome.
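The classification file lends itself to quick summaries outside the viewer. The sketch below assumes the file is tab-delimited with a header row and a `structural_category` column. The helper name and the three demo rows are made up for illustration, and the layout should be checked against the actual export before relying on it.

```python
import csv
import io
from collections import Counter


def count_categories(handle) -> Counter:
    """Tally isoforms per structural category from a tab-delimited table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["structural_category"] for row in reader)


# Tiny in-memory stand-in for the classification file (invented rows).
demo = io.StringIO(
    "isoform\tstructural_category\n"
    "PB.1.1\tfull-splice_match\n"
    "PB.1.2\tnovel_in_catalog\n"
    "PB.2.1\tfull-splice_match\n"
)

counts = count_categories(demo)
for category, n in counts.most_common():
    print(category, n)
```

For a real run, the in-memory stand-in would be replaced by a file handle opened on example.dataset_classification.txt.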
Workflow
The long-read transcriptomics submodule allows the user to take subreads or CCS BAM files from PacBio sequencing as input, transform them into consensus transcripts, and then analyze and quality-control the resulting transcriptome.