De Novo Transcriptome Characterization
Introduction
Transcriptome Analysis of Monilinia laxa.
Dataset Description
The dataset contains the transcriptomes of Monilinia fructicola, Monilinia laxa, and Monilinia fructigena, the causal agents of brown rot of stone and pome fruits. For this tutorial, only the Monilinia laxa data are used. The paired-end reads correspond to three conditions, with 2 replicates each: mycelium grown in the dark for 4 days, mycelium grown in the dark for 2 days and then exposed to light for 2 days, and germinating conidia.
Organism: Monilinia laxa
Instrument: Illumina HiScanSQ
Layout: Paired-end
Publication
Original Data
NCBI BioProject: PRJNA419302
SRA Experiments: SRR6312174, SRR6312175, SRR6312181, SRR6312182, SRR6312187, and SRR6312190.
NCBI Nucleotide: TSA: Monilinia laxa, transcriptome shotgun assembly.
Bioinformatic Analysis
1- RNA-Seq de novo Assembly
Application
RNA-Seq de novo Assembly (Transcriptomics).
Input
Parameters
Sequencing Data: Paired-End Reads
Sequencing Format: FASTQ
Input Reads: SRR6312174, SRR6312175, SRR6312181, SRR6312182, SRR6312187, and SRR6312190 FASTQ files
Upstream Files Pattern: _1
Downstream Files Pattern: _2
Strand Specificity: Non-Strand Specific
Minimum Contig Length: 200
Assess the Read Content: false
Construct Super Transcripts: false
Do Not Normalize Reads: false
Normalization Max. Read Coverage: 200
Minimizing Falsely Fused Transcripts: true
Pairs Distance: 500
Min. Kmer Coverage: 1
Max. Reads Per Graph: 200000
Min. Glue: 2
Max. Cluster Size: 25
Assembly Algorithm: Original
Path Reinforcement Distance: 25
No Path Merging: false
Min. Percent Identity: 98
Max. Allowed Differences: 2
Max. Internal Gap: 10
Transcript to Gene Mapping File: transcript_to_gene_map.txt
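The Upstream and Downstream Files Patterns tell the application which file holds each mate of a read pair. As a rough illustration of that matching, the Python sketch below groups FASTQ file names by the `_1` and `_2` suffixes. The function `pair_reads` and the file names (`SRR6312174_1.fastq`, and so on) are assumptions made for this example; the actual names depend on how the reads were downloaded, and the application may match files differently.

```python
from pathlib import Path


def pair_reads(file_names, upstream="_1", downstream="_2"):
    """Group FASTQ file names into (upstream, downstream) pairs per sample."""
    pairs = {}
    for name in file_names:
        stem = Path(name).name.split(".")[0]  # e.g. "SRR6312174_1"
        if stem.endswith(upstream):
            sample, slot = stem[: -len(upstream)], 0
        elif stem.endswith(downstream):
            sample, slot = stem[: -len(downstream)], 1
        else:
            continue  # the file matches neither pattern
        pairs.setdefault(sample, [None, None])[slot] = name
    # Keep only the samples for which both mates were found.
    return {s: tuple(p) for s, p in sorted(pairs.items()) if None not in p}


accessions = ["SRR6312174", "SRR6312175", "SRR6312181",
              "SRR6312182", "SRR6312187", "SRR6312190"]
# Hypothetical file names built from the SRA accessions listed above.
files = [f"{acc}{tag}.fastq" for acc in accessions for tag in ("_1", "_2")]

paired = pair_reads(files)
for sample, (r1, r2) in paired.items():
    print(sample, r1, r2)
```

A sample with only one of its two files is dropped from the result, which makes a missing mate easy to spot before launching the assembly.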
Execution Time
~ 2 hours.
Output
transcripts.box: Sequence project containing assembled transcripts.
rna_seq_de_novo_assembly_report.box: Report about the RNA-Seq de novo assembly results.
2- Completeness Assessment
Application
Completeness Assessment (Transcriptomics).
Input
Assembled transcripts (from the 1- RNA-Seq de novo Assembly step).
Parameters
Lineage: Helotiales (order).
Mode: Transcriptome.
Blast e-value: 1.0E-3.
Execution Time
10-15 minutes.
Output
3- Clustering
Application
Clustering (Transcriptomics).
Input
Assembled transcripts (from the 1- RNA-Seq de novo Assembly step).
Parameters
Sequence Identity Type: Global
Sequence Identity Threshold: 0.95
Band Width: 20
Word Length: 10
Length Cutoff: 10
Length Difference Cutoff: 0.0
Accurate Mode: false
Comparing Both Strands: true
Adjust Longer Sequence Coverage: false
Adjust Shorter Sequence Coverage: false
Longer Sequence Unmatched %: 1.0
Shorter Sequence Unmatched %: 1.0
Alignment Position Constraints: false
Save Cluster File: true
Output Cluster File: clusters.txt
Execution Time
10-15 minutes.
Output
clustering_transcripts.box: Sequence project containing the representative sequence of each cluster.
clustering_results.box: Report about the clustering results.
cluster_distribution_transcripts.box: Bar plot showing the number of clusters of each cluster size.
clusters.txt: Text file generated by CD-HIT containing information about each cluster.
ca_results_clustering_transcripts.box: BUSCO assessment for the results of the clustering step.
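The cluster size distribution shown in the bar plot can also be recomputed from clusters.txt. The Python sketch below assumes the file follows the usual CD-HIT cluster layout, in which each cluster starts with a `>Cluster N` line followed by one line per member sequence. The function name `cluster_sizes` and the example records are made up for illustration and are not taken from the tutorial data.

```python
from collections import Counter


def cluster_sizes(lines):
    """Return a list with the number of member sequences of each cluster."""
    sizes = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            sizes.append(0)   # a new cluster begins
        elif sizes:
            sizes[-1] += 1    # any other line is a member of the current cluster
    return sizes


# Made-up records in the CD-HIT cluster layout, used only for illustration.
example = """\
>Cluster 0
0\t1520nt, >transcript_a... *
1\t1498nt, >transcript_b... at +/98.20%
>Cluster 1
0\t640nt, >transcript_c... *
>Cluster 2
0\t910nt, >transcript_d... *
""".splitlines()

sizes = cluster_sizes(example)
distribution = Counter(sizes)  # cluster size -> number of clusters
for size in sorted(distribution):
    print(f"clusters with {size} sequence(s): {distribution[size]}")
```

To run it on the real output, pass an open file handle instead of the example lines, for instance `cluster_sizes(open("clusters.txt"))`.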
4- Predict Coding Regions
Application
Predict Coding Regions (Transcriptomics).
Input
Clustered transcripts (from the 3- Clustering step).
Parameters
Genetic Code: Universal
Minimum Protein Length: 100
Strand Specific: false
Provide Gene-Transcript Relationships: false
Pfam Search: true (recommended, but time-consuming)
Retain Long ORFs Mode: Dynamic
Single Best Only: true
No Refine Starts: false
Top Longest ORFs for Training: 500
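To make the Genetic Code, Minimum Protein Length, and Strand Specific parameters more concrete, the Python sketch below scans a transcript for open reading frames. It is a deliberately simplified illustration, not the algorithm used by the application: it keeps only complete ORFs (from the first methionine to a stop codon), does no scoring, training, or Pfam search, and assumes the minimum length is measured in amino acids. All function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard (universal) genetic code: codon -> amino acid, "*" marks a stop.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate a nucleotide string; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def candidate_orfs(seq, min_protein_length=100, strand_specific=False):
    """Return peptides that start with M, end at a stop codon, and are long enough."""
    seq = seq.upper()
    strands = [seq] if strand_specific else [seq, reverse_complement(seq)]
    peptides = []
    for strand in strands:
        for frame in range(3):
            segments = translate(strand[frame:]).split("*")
            for segment in segments[:-1]:  # the last segment has no stop codon
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_protein_length:
                    peptides.append(segment[start:])
    return peptides


# Toy transcript (not real data); a short threshold is used so it yields an ORF.
example = "ATG" + "GCT" * 5 + "TAA"
print(candidate_orfs(example, min_protein_length=5))
```

With `strand_specific=False`, as in this tutorial, both the transcript and its reverse complement are scanned in all three frames.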
Execution Time
45 minutes (10-15 minutes without Pfam Search).
Output
protein.box: Sequence project containing peptide sequences for the final candidate ORFs.
predict_coding_regions_results.box: Report about the ‘predict coding regions’ results.
5- Functional Annotation
Application
Functional Annotation Pipeline (Functional Analysis).
Input
Predicted proteins (from the 4- Predict Coding Regions step).
Parameters
CloudBlast
Blast Program: blastp-fast
Blast DB: Non-redundant protein sequences (nr v5)
Taxonomy Filter: 5178 Helotiales
Filter Option: Blast against a subset of taxonomies
Blast Expectation Value (e-Value): 1.0E-3
Number of Blast Hits: 20
Blast Description Annotator: True
Word Size: 6
Low Complexity Filter: True
HSP Length Cutoff: 33
HSP-Hit Coverage: 0
Filter By Description: No filter
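As a rough guide to how three of the CloudBlast settings interact, the Python sketch below applies an e-value threshold, a minimum HSP length, and a maximum number of hits to a list of hits. The `Hit` class, the `filter_hits` function, and the example values are hypothetical. The sketch reflects one plausible reading of the parameters, not the service's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str      # identifier of the database sequence
    evalue: float     # BLAST expectation value of the best HSP
    hsp_length: int   # length of the best HSP


def filter_hits(hits, max_evalue=1.0e-3, min_hsp_length=33, max_hits=20):
    """Keep hits that pass both cutoffs, best e-value first, up to max_hits."""
    kept = [h for h in hits
            if h.evalue <= max_evalue and h.hsp_length >= min_hsp_length]
    kept.sort(key=lambda h: h.evalue)
    return kept[:max_hits]


# Made-up hits, used only to show the effect of each cutoff.
hits = [
    Hit("subject_a", 1e-50, 200),
    Hit("subject_b", 0.5, 200),    # fails the e-value threshold
    Hit("subject_c", 1e-10, 20),   # fails the HSP length cutoff
    Hit("subject_d", 1e-5, 80),
]
for hit in filter_hits(hits):
    print(hit.subject, hit.evalue, hit.hsp_length)
```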
CloudIPS
CDD: True
HAMAP: True
HMMPanther: True
HMMPfam: True
HMMPIR: True
FPrintScan: True
ProfileScan: True
HMMTigr: True
PatternScan: False
Gene3D: True
SFLD: True
SuperFamily: True
Coils: False
MobiDBLite: True
GO Mapping
Use latest database version: True
GO Annotation
Annotation CutOff: 55
GO Weight: 5
Filter GO by Taxonomy: No Filter
E-Value-Hit-Filter: 1.0E-6
HSP-Hit Coverage CutOff: 0
Hit Filter: 500
Only hits with GOs: False
Evidence Code Weights: Default Values
Merge InterProScan GOs to Annotation
Execution Time
About 3 hours with InterProScan (CloudIPS); less than 1 hour without it.