Short Read Eukaryotic Analysis
Introduction
Genome analysis of Botryotinia fuckeliana (anamorph Botrytis cinerea) strain BcDW1.
Dataset Description
Botrytis cinerea (teleomorph Botryotinia fuckeliana) is an opportunistic pathogen with a broad host range and is particularly aggressive on fleshy fruits such as tomatoes, strawberries, and grape berries. DNA was extracted from an axenic BcDW1 culture using a modified cetyltrimethylammonium bromide (CTAB) method, and 6.9 Gb of Illumina sequence reads were generated, giving >100× coverage of the genome.
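The coverage figure is the number of sequenced bases divided by the genome length. The sketch below shows that arithmetic. The 6.9 Gb and 100× values come from the description above; the genome size of 42 Mb is an assumption used only for illustration and is not stated in this document.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Average depth of coverage: sequenced bases divided by genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def max_genome_size(total_bases: float, min_coverage: float) -> float:
    """Largest genome length for which total_bases still gives min_coverage."""
    if min_coverage <= 0:
        raise ValueError("min_coverage must be positive")
    return total_bases / min_coverage


TOTAL_BASES = 6.9e9          # 6.9 Gb of reads (from the dataset description)
ASSUMED_GENOME_SIZE = 42e6   # ASSUMPTION: illustrative genome size, not from this document

print(f"Depth at the assumed genome size: {coverage(TOTAL_BASES, ASSUMED_GENOME_SIZE):.0f}x")
print(f"Genome size limit for 100x: {max_genome_size(TOTAL_BASES, 100) / 1e6:.0f} Mb")
```

Any genome shorter than the second value is consistent with the stated >100× coverage.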
Instrument: Illumina HiSeq 2000.
Layout: Paired-end.
Publication
Original Data
NCBI BioProject: PRJNA188482.
SRA Run: SRR680162.
NCBI Assembly: GCA_000349525.1.
Bioinformatic Analysis
1- DNA-Seq de novo Assembly
Application
DNA-Seq de novo Assembly (ABySS).
Input
Illumina sequencing data in FASTQ format (SRR680162_1.fastq.gz and SRR680162_2.fastq.gz).
Parameters
Input Reads: [Paired-End] SRR680162_1.fastq.gz and SRR680162_2.fastq.gz
Upstream Files Pattern: _1
Downstream Files Pattern: _2
Use Additional Data: false
K-mer Size: 64
Use Paired de Bruijn graph: false
Minimum Alignment Length: 40
Hash Functions: 1
K-mer Count Threshold: 2
Unitigs Fasta: unitigs.fasta
Contigs Fasta: contigs.fasta
Scaffolds Fasta: scaffolds.fasta
Save Graph Files: true
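The Upstream and Downstream Files Pattern parameters tell the application which files form a read pair. A minimal sketch of one way such pairing could work is given below. It assumes the pattern is matched as a suffix of the file name before the first extension; this matching rule and the function name pair_reads are hypothetical and are not taken from the application.

```python
from pathlib import Path


def pair_reads(filenames, upstream="_1", downstream="_2"):
    """Pair files whose names differ only by the upstream/downstream pattern.

    ASSUMPTION: the pattern is a suffix of the name before the first '.'.
    Returns {sample_name: (upstream_file, downstream_file)}.
    """
    if not upstream or not downstream:
        raise ValueError("patterns must be non-empty")
    ups, downs = {}, {}
    for name in filenames:
        # Split "SRR680162_1.fastq.gz" into "SRR680162_1" and "fastq.gz".
        stem, _, ext = Path(name).name.partition(".")
        if stem.endswith(upstream):
            ups[(stem[: -len(upstream)], ext)] = name
        elif stem.endswith(downstream):
            downs[(stem[: -len(downstream)], ext)] = name
    # Keep only samples for which both mates were found.
    return {key[0]: (ups[key], downs[key]) for key in sorted(ups) if key in downs}


pairs = pair_reads(["SRR680162_1.fastq.gz", "SRR680162_2.fastq.gz"])
print(pairs)
```

Files without a mate are dropped silently in this sketch; a real pipeline would probably report them.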
Execution Time
70-90 minutes.
Output
unitigs.fasta: FASTA file containing assembled unitig sequences.
contigs.fasta: FASTA file containing assembled contig sequences.
scaffolds.fasta: FASTA file containing assembled scaffold sequences.
report_abyss.box: Report about DNA-Seq de novo Assembly results.
nx_plot.box: Line chart about DNA-Seq de novo Assembly results.
assembly-contigs.dot: Text file containing the contig overlap graphs in the GraphViz DOT syntax.
assembly-scaffolds.dot: Text file containing the scaffold overlap graphs in the GraphViz DOT syntax.
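The Nx plot summarises assembly contiguity. As a sketch of the underlying statistic, the code below reads sequence lengths from FASTA lines and computes Nx (N50 when x = 50) as the length of the shortest sequence in the smallest set of longest sequences that together cover at least x% of the assembly. The function names are illustrative, and the application's report may apply extra rules (for example a minimum length cutoff) that are not reproduced here.

```python
def read_fasta_lengths(lines):
    """Return the length of each record from an iterable of FASTA lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def nx(lengths, x=50):
    """Length L such that sequences of length >= L hold at least x% of all bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length


example = [">s1", "ACGTAC", ">s2", "ACGTA", ">s3", "ACGT", ">s4", "ACG", ">s5", "AC"]
lengths = read_fasta_lengths(example)
print("N50:", nx(lengths, 50), "N90:", nx(lengths, 90))
```

To apply it to a real result, pass an open file handle for scaffolds.fasta to read_fasta_lengths.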
2- Repeat Masking
Application
Repeat Masking (RepeatMasker).
Input
Assembled genomic scaffolds in FASTA format (from the 1- DNA-Seq de novo Assembly step).
Parameters
Input Sequences: scaffolds.fasta
Search Engine: RMBlast
Repeat Database: Dfam Consensus
Species: 40559 Botrytis cinerea
Speed/Sensitivity: Default
Masking Options: Soft masking (lowercase)
Apply Divergence Cutoff: false
Only Alu elements: false
Type of repeat: interspersed,simple_low
Not mask RNA genes: false
Output FASTA: masked_sequences.fasta
Execution Time
10-15 minutes.
Output
masked_sequences.fasta: FASTA file containing repeat masked sequences.
scaffolds_gff.box: GFF project containing repeat coordinates.
report_repeat_masking.box: Report about Repeat Masking results.
repeat_type_distribution.box: Pie chart showing the abundance of each type of detected repeats.
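With soft masking, repeats stay in the sequence but are written in lowercase. The sketch below shows how the masked fraction and the masked intervals could be recovered from such a sequence. The function names are illustrative, and any lowercase character (including a lowercase n) is counted as masked.

```python
def masked_fraction(sequence: str) -> float:
    """Fraction of positions written in lowercase (soft-masked)."""
    if not sequence:
        return 0.0
    masked = sum(1 for base in sequence if base.islower())
    return masked / len(sequence)


def masked_intervals(sequence: str):
    """Return 0-based, half-open (start, end) intervals of lowercase runs."""
    intervals = []
    start = None
    for i, base in enumerate(sequence):
        if base.islower():
            if start is None:
                start = i          # a masked run begins here
        elif start is not None:
            intervals.append((start, i))
            start = None
    if start is not None:          # run extends to the end of the sequence
        intervals.append((start, len(sequence)))
    return intervals


seq = "ACGTacgtACgg"
print(masked_fraction(seq), masked_intervals(seq))
```

The intervals use 0-based half-open coordinates, whereas GFF coordinates are 1-based and inclusive, so an interval (s, e) corresponds to GFF positions s + 1 to e.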
3- Gene Finding
Application
Eukaryotic Gene Finding by AUGUSTUS.
Input
Repeat masked sequences in FASTA format (from the 2- Repeat Masking step).
Parameters
Input Sequences: masked_sequences.fasta
Closest Species: Botrytis cinerea [Fungi - Ascomycota - Leotiomycetes]
Strand: Both Strands
Allowed Gene Structure: Partial
Output Genomic Features: introns,start,stop
Ignore Strand Conflicts: false
UTR Prediction: false
No In-frame Stop Codons: false
Stop Codons Excluded From CDS: false
Softmasked Sequences: true
Sample: 100
Alternatives From Sampling: false
Gene Finding Mode: Ab Initio Prediction
Execution Time
5-10 minutes.
Output
cds.box: Sequence project containing predicted gene sequences.
protein.box: Sequence project containing predicted protein sequences.
coordinates.box: GFF project containing predicted gene coordinates.
report_eukaryotic_gene_finding.box: Report about Eukaryotic Gene Finding results.
cds_length_distribution.box: Bar chart showing the length distribution of the predicted genes.
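The CDS length distribution can be derived from the predicted coordinates. The sketch below sums CDS segment lengths per transcript and bins them. It assumes tab-separated GFF3-style lines with a Parent= attribute; if the exported coordinates use a different attribute style, the attribute parsing would need to change. Function names and the bin size are illustrative.

```python
from collections import Counter, defaultdict


def cds_lengths(gff_lines):
    """Sum CDS segment lengths per parent transcript.

    ASSUMPTION: nine tab-separated columns and a GFF3-style 'Parent=' attribute.
    """
    totals = defaultdict(int)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue
        start, end = int(fields[3]), int(fields[4])
        attrs = dict(
            item.strip().split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        parent = attrs.get("Parent")
        if parent is None:
            continue
        # GFF coordinates are 1-based and inclusive.
        totals[parent] += end - start + 1
    return dict(totals)


def length_histogram(lengths, bin_size=500):
    """Count lengths per bin; keys are the lower bound of each bin."""
    counts = Counter((length // bin_size) * bin_size for length in lengths)
    return dict(sorted(counts.items()))


example = [
    "##gff-version 3",
    "scaffold1\tAUGUSTUS\tCDS\t101\t200\t.\t+\t0\tID=cds1;Parent=g1.t1",
    "scaffold1\tAUGUSTUS\tCDS\t301\t350\t.\t+\t2\tID=cds2;Parent=g1.t1",
    "scaffold1\tAUGUSTUS\tintron\t201\t300\t.\t+\t.\tParent=g1.t1",
    "scaffold2\tAUGUSTUS\tCDS\t10\t99\t.\t-\t0\tID=cds3;Parent=g2.t1",
]
per_transcript = cds_lengths(example)
print(per_transcript, length_histogram(per_transcript.values(), bin_size=100))
```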
4- BLAST & InterProScan
Application
CloudBLAST & InterProScan Annotation.
Input
Predicted gene sequences in an OmicsBox project (from the 3- Gene Finding step).
Parameters
CloudBlast
Blast Program: blastp-fast
Blast DB: Non-redundant protein sequences (nr v5)
Taxonomy Filter: 5178 Helotiales
Filter option: Blast against a subset of taxonomies
Blast Expectation Value (e-Value): 1.0E-3
Number of Blast Hits: 20
Blast Description Annotator: true
Word Size: 6
Low Complexity Filter: true
HSP Length Cutoff: 33
HSP-Hit Coverage: 0
Filter by Description: No filter
Save results as XML2 files: false
Blast Program: blastx-fast
Blast DB: Non-redundant protein sequences (nr v5)
Taxonomy Filter: 2 Bacteria <bacteria>
Filter option: Blast against a subset of taxonomies
Blast Expectation Value (e-Value): 1.0E-3
Number of Blast Hits: 20
Blast Description Annotator: true
Word Size: 6
Low Complexity Filter: true
HSP Length Cutoff: 33
HSP-Hit Coverage: 0
Filter by Description: No filter
Save results as XML2 files: false
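The e-value, hit number, and HSP length settings act as thresholds on the reported hits. The sketch below applies the three values listed above to an in-memory list of hits. It assumes that the e-value is an upper bound, that the HSP length cutoff is a lower bound, and that the best hits by e-value are kept; the Hit class and its field names are hypothetical and do not reflect the CloudBlast result format.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical minimal hit record used only for this illustration."""
    subject: str
    evalue: float
    hsp_length: int


def filter_hits(hits, max_evalue=1.0e-3, min_hsp_length=33, max_hits=20):
    """Keep hits passing both thresholds, best e-value first, at most max_hits."""
    kept = [
        hit for hit in hits
        if hit.evalue <= max_evalue and hit.hsp_length >= min_hsp_length
    ]
    kept.sort(key=lambda hit: hit.evalue)
    return kept[:max_hits]


hits = [
    Hit("a", 1e-50, 120),
    Hit("b", 0.5, 200),    # fails the e-value threshold
    Hit("c", 1e-10, 20),   # fails the HSP length cutoff
    Hit("d", 1e-5, 40),
]
print([hit.subject for hit in filter_hits(hits)])
```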
InterProScan
FPrintScan: true
HMMPIR: true
HMMPfam: true
HMMTigr: true
ProfileScan: true
HAMAP: true
PatternScan: false
SuperFamily: true
HMMPanther: true
Gene3D: true
Coils: false
CDD: true
SFLD: true
MobiDBLite: true
Execution Time
~1 hour.