De Novo Transcriptome Characterization

Introduction

Transcriptome Analysis of Monilinia laxa.

Dataset Description

The original study sequenced the transcriptomes of Monilinia fructicola, Monilinia laxa, and Monilinia fructigena, the causal agents of brown rot of stone and pome fruits. This tutorial uses only the Monilinia laxa data: paired-end reads from mycelium grown in the dark for 4 days, mycelium grown in the dark for 2 days and then exposed to light for 2 days, and germinating conidia, with 2 replicates per condition.

  • Organism: Monilinia laxa

  • Instrument: Illumina HiScanSQ

  • Layout: Paired-end
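
Because the layout is paired-end, each run yields two FASTQ files that must stay together. The assembly step later in this tutorial tells the two mates apart with the _1 and _2 file patterns. The Python sketch below groups files that way. It assumes files named like SRR6312174_1.fastq (the naming is an assumption; the tutorial does not specify it), and `pair_fastq_files` is a hypothetical helper, not part of the application.

```python
import re
from collections import defaultdict

def pair_fastq_files(filenames, upstream="_1", downstream="_2"):
    """Group FASTQ files into (upstream, downstream) pairs keyed by run ID.

    Hypothetical helper; assumes names like <run><pattern>.fastq[.gz] or .fq[.gz].
    """
    mates = defaultdict(dict)  # run ID -> {"up": file, "down": file}
    for name in filenames:
        for mate, pattern in (("up", upstream), ("down", downstream)):
            match = re.match(rf"^(.+){re.escape(pattern)}\.f(?:ast)?q(?:\.gz)?$", name)
            if match:
                mates[match.group(1)][mate] = name
    pairs = {}
    for run, files in sorted(mates.items()):
        # A run with only one mate cannot be used as paired-end input.
        if "up" not in files or "down" not in files:
            raise ValueError(f"run {run} is missing a mate file: {files}")
        pairs[run] = (files["up"], files["down"])
    return pairs

# The six runs listed as assembly input; the file names are assumed.
runs = ["SRR6312174", "SRR6312175", "SRR6312181",
        "SRR6312182", "SRR6312187", "SRR6312190"]
files = [f"{run}_{mate}.fastq" for run in runs for mate in (1, 2)]
pairs = pair_fastq_files(files)
print(len(pairs))            # 6
print(pairs["SRR6312174"])   # ('SRR6312174_1.fastq', 'SRR6312174_2.fastq')
```

A run with a missing mate raises an error instead of being silently assembled as single-end data.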

Publication

De Miccolis Angelini RM, Abate D, Rotolo C, Gerin D, Pollastro S, Faretra F. De novo assembly and comparative transcriptome analysis of Monilinia fructicola, Monilinia laxa and Monilinia fructigena, the causal agents of brown rot on stone fruits. BMC Genomics. 2018 Jun 5;19(1):436. doi: 10.1186/s12864-018-4817-4. PMID: 29866047; PMCID: PMC5987419.

 Abstract

Brown rots are important fungal diseases of stone and pome fruits. They are caused by several Monilinia species but M. fructicola, M. laxa, and M. fructigena are the most common all over the world. Although they have been intensively studied, the availability of genomic and transcriptomic data in public databases is still scant. We sequenced, assembled, and annotated the transcriptomes of the three pathogens using mRNA from germinating conidia and actively growing mycelia of two isolates of opposite mating types per species for comparative transcriptome analyses.

Original Data

Bioinformatic Analysis

1- RNA-Seq de novo Assembly

Application

RNA-Seq de novo Assembly (Transcriptomics).

Input

Parameters

  • Sequencing Data: Paired-End Reads

  • Sequencing Format: FASTQ

  • Input Reads: SRR6312174, SRR6312175, SRR6312181, SRR6312182, SRR6312187, and SRR6312190 FASTQ files

  • Upstream Files Pattern: _1

  • Downstream Files Pattern: _2

  • Strand Specificity: Non-Strand Specific

  • Minimum Contig Length: 200

  • Assess the Read Content: false

  • Construct Super Transcripts: false

  • Do Not Normalize Reads: false

  • Normalization Max. Read Coverage: 200

  • Minimizing Falsely Fused Transcripts: true

  • Pairs Distance: 500

  • Min. Kmer Coverage: 1

  • Max. Reads Per Graph: 200000

  • Min. Glue: 2

  • Max. Cluster Size: 25

  • Assembly Algorithm: Original

  • Path Reinforcement Distance: 25

  • No Path Merging: false

  • Min. Percent Identity: 98

  • Max. Allowed Differences: 2

  • Max. Internal Gap: 10

  • Transcript to Gene Mapping File: transcript_to_gene_map.txt

  • Sequence Identity Type: Global

  • Sequence Identity Threshold: 0.95

  • Band Width: 20

  • Word Length: 10

  • Length Cutoff: 10

  • Length Difference Cutoff: 0.0

  • Accurate Mode: false

  • Comparing Both Strands: true

  • Adjust Longer Sequence Coverage: false

  • Adjust Shorter Sequence Coverage: false

  • Longer Sequence Unmatched %: 1.0

  • Shorter Sequence Unmatched %: 1.0

  • Alignment Position Constraints: false

  • Save Cluster File: true

  • Output Cluster File: clusters.txt
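
Several of the parameters above (Sequence Identity Type, Sequence Identity Threshold, Word Length, Length Difference Cutoff) look like those of a CD-HIT-EST-style redundancy reduction applied to the assembled transcripts; the tutorial does not name the tool, so treat that as an assumption. The Python sketch below illustrates the greedy incremental idea behind such clustering: transcripts are visited from longest to shortest, and each one joins the first representative it matches at or above the identity threshold, otherwise it founds a new cluster. Global identity is taken as identical bases divided by the length of the shorter sequence. Python's difflib stands in for a real pairwise alignment, the word filter and banded alignment a production tool would use are omitted, and the function names are hypothetical.

```python
from difflib import SequenceMatcher

def global_identity(a: str, b: str) -> float:
    """Identical matched bases divided by the length of the shorter sequence.

    difflib's matching blocks are a stand-in for a true pairwise alignment.
    autojunk=False is required: with DNA every base is "popular", so the
    default heuristic would discard all of them on sequences >= 200 nt.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matches = sum(block.size for block in matcher.get_matching_blocks())
    return matches / min(len(a), len(b))

def greedy_cluster(seqs, identity_threshold=0.95, length_diff_cutoff=0.0):
    """Greedy incremental clustering; returns {representative: [members]}."""
    clusters = {}
    # Longest first, so every representative is at least as long as its members.
    for name, seq in sorted(seqs.items(), key=lambda item: len(item[1]), reverse=True):
        for rep in clusters:
            rep_seq = seqs[rep]
            # Length Difference Cutoff: skip representatives that are too long
            # relative to this sequence (0.0 disables the check).
            if len(seq) / len(rep_seq) < length_diff_cutoff:
                continue
            if global_identity(rep_seq, seq) >= identity_threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]  # no match: this sequence founds a cluster
    return clusters

base = "ACGT" * 25                       # toy 100-nt "transcript"
variant = base[:50] + "A" + base[51:]    # one substitution
transcripts = {"rep": base, "var": variant, "poly_t": "T" * 80}
print(round(global_identity(base, variant), 2))  # 0.99
print(greedy_cluster(transcripts))  # {'rep': ['rep', 'var'], 'poly_t': ['poly_t']}
```

Raising the threshold to 1.0 would keep the one-mismatch variant as its own cluster, which is why a value slightly below 1 is typically chosen to collapse near-identical isoforms and assembly artefacts.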

Execution Time

10-15 minutes.

Output

4- Predict Coding Regions

Application

Predict Coding Regions (Transcriptomics).

Input

Parameters

  • Genetic Code: Universal

  • Minimum Protein Length: 100

  • Strand Specific: false

  • Provide Gene-Transcript Relationships: false

  • Pfam Search: true (recommended, but time-consuming)

  • Retain Long ORFs Mode: Dynamic

  • Single Best Only: true

  • No Refine Starts: false

  • Top Longest ORFs for Training: 500
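
The Minimum Protein Length, Strand Specific, and Genetic Code settings can be illustrated with a naive six-frame ORF scan in Python under the standard (universal) genetic code. This is only a sketch: it reports complete ATG-to-stop ORFs of at least 100 residues on both strands (the equivalent of Strand Specific: false), whereas the application also trains on the top longest ORFs, can use Pfam hits, and keeps a single best ORF per transcript, none of which is modelled here. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq: str, min_protein_len: int = 100, both_strands: bool = True):
    """Return (strand, frame, protein) for complete ATG..stop ORFs of at least
    min_protein_len residues (stop codon not counted)."""
    seq = seq.upper()
    strands = [("+", seq)]
    if both_strands:
        strands.append(("-", reverse_complement(seq)))
    orfs = []
    for strand, s in strands:
        for frame in range(3):
            protein, in_orf = [], False
            for i in range(frame, len(s) - 2, 3):
                aa = CODON_TABLE.get(s[i:i + 3], "X")  # 'X' for ambiguous codons
                if not in_orf:
                    if aa == "M":  # first ATG after a stop opens an ORF
                        in_orf, protein = True, ["M"]
                elif aa == "*":    # stop codon closes the ORF
                    if len(protein) >= min_protein_len:
                        orfs.append((strand, frame, "".join(protein)))
                    in_orf, protein = False, []
                else:
                    protein.append(aa)
    return orfs

orf_seq = "ATG" + "GCT" * 120 + "TAA"   # toy transcript encoding a 121-aa protein
hits = find_orfs(orf_seq)
print(len(hits), hits[0][:2], len(hits[0][2]))  # 1 ('+', 0) 121
```

Scanning both strands matters here because the libraries are not strand-specific, so a coding region may be assembled in either orientation.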

Execution Time

45 minutes (10-15 minutes without Pfam Search).

Output

5- Functional Annotation

Application

Functional Annotation Pipeline (Functional Analysis).

Input

Parameters

CloudBlast

  • Blast Program: blastp-fast

  • Blast DB: Non-redundant protein sequences (nr v5)

  • Taxonomy Filter: 5178 Helotiales

  • Filter Option: Blast against a subset of taxonomies

  • Blast Expectation Value (e-Value): 1.0E-3

  • Number of Blast Hits: 20

  • Blast Description Annotator: True

  • Word Size: 6

  • Low Complexity Filter: True

  • HSP Length Cutoff: 33

  • HSP-Hit Coverage: 0

  • Filter By Description: No filter
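
To make the CloudBlast thresholds concrete, the Python sketch below applies the same cut-offs (e-value 1.0E-3, 20 hits, HSP length 33) to BLAST+ tabular output (-outfmt 6 with its default 12 columns). It assumes a local tabular result rather than CloudBlast's own output, keeps the best-scoring HSP per subject, and illustrates what the parameters mean rather than how CloudBlast filters internally. `filter_hits` and the sequence IDs are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST+ -outfmt 6.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def filter_hits(tabular_text, max_evalue=1e-3, max_hits=20, min_hsp_length=33):
    """Per query, keep the best max_hits subjects whose HSPs pass both filters."""
    per_query = defaultdict(dict)  # query -> subject -> best HSP row
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue
        rec = dict(zip(FIELDS, row))
        # Drop HSPs above the e-value threshold or shorter than the length cutoff.
        if float(rec["evalue"]) > max_evalue or int(rec["length"]) < min_hsp_length:
            continue
        best = per_query[rec["qseqid"]].get(rec["sseqid"])
        if best is None or float(rec["bitscore"]) > float(best["bitscore"]):
            per_query[rec["qseqid"]][rec["sseqid"]] = rec
    # Rank subjects by bit score and truncate to the requested number of hits.
    return {
        query: sorted(subjects.values(), key=lambda r: float(r["bitscore"]), reverse=True)[:max_hits]
        for query, subjects in per_query.items()
    }

sample = "\n".join([
    "contig1.p1\tXP_001\t85.0\t250\t37\t0\t1\t250\t1\t250\t1e-120\t420",
    "contig1.p1\tXP_002\t40.0\t30\t18\t0\t5\t34\t10\t39\t1e-5\t45",   # HSP too short
    "contig1.p1\tXP_003\t30.0\t120\t84\t0\t1\t120\t1\t120\t0.5\t30",  # e-value too high
])
kept = filter_hits(sample)
print([r["sseqid"] for r in kept["contig1.p1"]])  # ['XP_001']
```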

CloudIPS

  • CDD: True

  • HAMAP: True

  • HMMPanther: True

  • HMMPfam: True

  • HMMPIR: True

  • FPrintScan: True

  • ProfileScan: True

  • HMMTigr: True

  • PatternScan: False

  • Gene3D: True

  • SFLD: True

  • SuperFamily: True

  • Coils: False

  • MobiDBLite: True

GO Mapping

  • Use latest database version: True

GO Annotation

  • Annotation CutOff: 55

  • GO Weight: 5

  • Filter GO by Taxonomy: No Filter

  • E-Value-Hit-Filter: 1.0E-6

  • HSP-Hit Coverage CutOff: 0

  • Hit Filter: 500

  • Only hits with GOs: False

  • Evidence Code Weights: Default Values
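
The Annotation CutOff and GO Weight parameters enter a Blast2GO-style annotation rule. As we understand it, each candidate GO term receives a score made of a direct term, the highest hit similarity multiplied by the weight of that term's evidence code, plus an abstraction term, (number of GO terms unified at the node - 1) × GO weight. The term is assigned when the score reaches the cut-off. The Python sketch below computes that score for a single candidate term. The evidence-code weights are hypothetical placeholders (the tutorial keeps the application defaults), the GO graph is not modelled (the caller supplies the unified-term count), only hits passing the E-Value-Hit-Filter of 1.0E-6 contribute, and all names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical evidence-code weights for illustration; NOT the application's defaults.
EC_WEIGHTS = {"IDA": 1.0, "ISS": 0.8, "IEA": 0.7}

@dataclass
class GoEvidence:
    similarity: float   # similarity of the BLAST hit carrying the GO term (0-100)
    evalue: float       # e-value of that hit
    evidence_code: str  # evidence code of the GO term in the source annotation

def annotation_score(evidence, n_unified_gos=1, go_weight=5.0, max_evalue=1e-6):
    """Score = max(similarity * EC weight) + (n_unified_gos - 1) * go_weight."""
    usable = [e for e in evidence if e.evalue <= max_evalue]  # E-Value-Hit-Filter
    if not usable:
        return 0.0
    direct_term = max(e.similarity * EC_WEIGHTS.get(e.evidence_code, 0.0) for e in usable)
    abstraction_term = (n_unified_gos - 1) * go_weight
    return direct_term + abstraction_term

def is_annotated(evidence, cutoff=55.0, **kwargs):
    """Assign the GO term when its score reaches the Annotation CutOff."""
    return annotation_score(evidence, **kwargs) >= cutoff

hits = [GoEvidence(70.0, 1e-30, "ISS"),
        GoEvidence(60.0, 1e-10, "IEA"),
        GoEvidence(95.0, 1e-3, "IDA")]   # fails the 1e-6 e-value filter
print(round(annotation_score(hits), 2))                   # 56.0
print(round(annotation_score(hits, n_unified_gos=3), 2))  # 66.0
print(is_annotated(hits))                                 # True
```

With a cut-off of 55, a single IEA-coded hit at 60% similarity would score 42 and not be annotated on its own, while the abstraction term lets a parent term that unifies several child terms clear the threshold.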

Merge InterProScan GOs to Annotation

Execution Time

3 hours with InterProScan (CloudIPS); less than 1 hour without it.

Output

Workflow