Functional Characterization
Introduction
The Microbiome Analysis feature in OmicsBox makes it possible to describe and compare the bacterial diversity of two soda lakes (Salina Preta and Salina Verde) with the Taxonomic Classification Workflow, and to identify the functional genetic potential of these microbial communities.
Before the analysis described here, the raw reads went through two earlier steps in OmicsBox: FastQC and Preprocessing.
Dataset description
The metagenomic analysis was done on a dataset of 12 single-end metagenomic samples collected in two soda lakes, Salina Verde and Salina Preta. Three replicates were taken from each lake at each of two sampling times, morning and afternoon. The metagenomic libraries were prepared with the Nextera XT DNA Sample Preparation kit and sequenced on the Illumina MiSeq platform with the MiSeq Reagent Kit V3.
Lake | Time of sampling | Replicates | Sample names |
---|---|---|---|
Preta | Morning (10 AM) | 3 | PMB1, PMB2, PMB3 |
Preta | Afternoon (3 PM) | 3 | PAB1, PAB2, PAB3 |
Verde | Morning (10 AM) | 3 | VMB1, VMB2, VMB3 |
Verde | Afternoon (3 PM) | 3 | VAB1, VAB2, VAB3 |
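The sample names in the table follow a regular pattern, which can be decoded programmatically. The following is a minimal sketch, assuming the first letter encodes the lake (P or V), the second the sampling time (M or A), and the trailing digit the replicate. The meaning of the letter "B" is not stated here, and `parse_sample` and `group_samples` are hypothetical helpers, not part of OmicsBox.

```python
import re
from collections import defaultdict

# Assumed meaning of the letters, inferred from the table above.
LAKES = {"P": "Preta", "V": "Verde"}
TIMES = {"M": "Morning", "A": "Afternoon"}
SAMPLE_RE = re.compile(r"^([PV])([MA])B([1-3])$")


def parse_sample(name):
    """Split a sample name such as 'PMB1' into lake, time and replicate."""
    match = SAMPLE_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected sample name: {name!r}")
    lake, time, replicate = match.groups()
    return {
        "lake": LAKES[lake],
        "time": TIMES[time],
        "replicate": int(replicate),
        "group": name[:3],  # e.g. 'PMB', the unit used for assembly
    }


def group_samples(names):
    """Collect sample names by their three-letter sample group."""
    groups = defaultdict(list)
    for name in names:
        groups[parse_sample(name)["group"]].append(name)
    return dict(groups)


# All 12 sample names listed in the table.
SAMPLES = [
    f"{lake}{time}B{rep}"
    for lake in "PV"
    for time in "MA"
    for rep in (1, 2, 3)
]
```

Calling `group_samples(SAMPLES)` yields the four sample groups (PMB, PAB, VMB, VAB) that the later steps operate on.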
Publication
Original Data
The data were downloaded from the MG-RAST server: mgp10309
Bioinformatic analysis
1- Assembly
Application
Metagenomic Assembly (MEGAHIT)
Input
Processed Illumina sequencing data in FASTQ format:
f.PAB1.fastq.gz
f.PAB2.fastq.gz
f.PAB3.fastq.gz
f.PMB1.fastq.gz
f.PMB2.fastq.gz
f.PMB3.fastq.gz
f.VAB1.fastq.gz
f.VAB2.fastq.gz
f.VAB3.fastq.gz
f.VMB1.fastq.gz
f.VMB2.fastq.gz
f.VMB3.fastq.gz
Parameters
Sequencing Data: single
Minimum Multiplicity: 2
K-mer Sizes: 29,39,59,79,99,119,141
No Mercy K-mers: false
Bubble Level: high
Bubble Merge Level L: 20
Bubble Merge Level S: 0.95
Prune Level: high
Prune Depth: 2
Low Local Ratio: 0.2
Max Tip Length: 2
Disable Local Assembly: false
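For readers who want to reproduce a comparable assembly outside OmicsBox, the parameters above map roughly onto the MEGAHIT command line. The sketch below builds one command per sample group, assuming the three replicates of a group are assembled together. OmicsBox may invoke MEGAHIT differently; `megahit_command` and the output directory name are hypothetical. The "high" settings for Bubble Level and Prune Level are left out because their numeric equivalents are not given here, and the two options set to false (No Mercy K-mers, Disable Local Assembly) simply omit the corresponding switches.

```python
import shlex


def megahit_command(group, replicates=(1, 2, 3), out_dir=None):
    """Build a MEGAHIT argument list for one sample group (e.g. 'PAB')."""
    # Single-end reads are passed as a comma-separated list to -r.
    reads = ",".join(f"f.{group}{i}.fastq.gz" for i in replicates)
    out_dir = out_dir or f"megahit_{group}"  # hypothetical directory name
    return [
        "megahit",
        "-r", reads,                              # Sequencing Data: single
        "--min-count", "2",                       # Minimum Multiplicity
        "--k-list", "29,39,59,79,99,119,141",     # K-mer Sizes
        "--merge-level", "20,0.95",               # Bubble Merge Level L,S
        "--prune-depth", "2",                     # Prune Depth
        "--low-local-ratio", "0.2",               # Low Local Ratio
        "--max-tip-len", "2",                     # Max Tip Length
        "-o", out_dir,
    ]


if __name__ == "__main__":
    for group in ("PAB", "PMB", "VAB", "VMB"):
        print(shlex.join(megahit_command(group)))
```

MEGAHIT writes its contigs to `final.contigs.fa` inside the output directory; the `contigs.<group>.fasta` names used in the next step are the names given to those files in this use case.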
Execution Time
Execution time depends on the number of reads per sample group and ranges from approximately 3 to 6 minutes.
Sample group | Execution time |
---|---|
PAB | 2min 43s |
PMB | 3min 48s |
VAB | 4min 47s |
VMB | 5min 50s |
Output
Multifasta files with the assembled contigs for each sample group.
Statistical reports for each assembly.
2- Gene Finding
Application
Gene Finding (FragGeneScan)
Input
Assemblies made in the previous step:
contigs.PAB.fasta
contigs.PMB.fasta
contigs.VAB.fasta
contigs.VMB.fasta
Parameters
Type of Data: Complete Genomic Sequences.
Model for Input Data: Complete genomic sequences or short sequence reads without sequencing error.
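Outside OmicsBox, these two settings correspond approximately to the `-complete` and `-train` options of the FragGeneScan wrapper script. The following is a hedged sketch of how such a call could be assembled; `fraggenescan_command` and the output prefix are hypothetical, and OmicsBox may run the tool differently.

```python
def fraggenescan_command(group, threads=1):
    """Build a FragGeneScan argument list for one assembly (e.g. 'PAB')."""
    return [
        "run_FragGeneScan.pl",
        f"-genome=contigs.{group}.fasta",  # assembly from the previous step
        f"-out={group}",                   # hypothetical output prefix
        "-complete=1",                     # Type of Data: complete genomic sequences
        "-train=complete",                 # model without sequencing error
        f"-thread={threads}",
    ]


if __name__ == "__main__":
    for group in ("PAB", "PMB", "VAB", "VMB"):
        print(" ".join(fraggenescan_command(group)))
```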
Execution Time
About one minute per FASTA file.
Output
Multifasta files with the nucleotide sequence for each predicted gene.
Multifasta files with the amino acid sequence for each predicted gene.
Reports with the number and length of the predicted genes.
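The number and length of the predicted genes can also be recomputed directly from the multifasta output. The sketch below is a small, dependency-free illustration and not the code OmicsBox uses to build its reports; `read_fasta` and `gene_stats` are hypothetical helper names.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a multifasta file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gene_stats(handle):
    """Return the gene count and basic length statistics."""
    lengths = [len(seq) for _, seq in read_fasta(handle)]
    if not lengths:
        return {"genes": 0, "min": 0, "max": 0, "mean": 0.0}
    return {
        "genes": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
    }


# Tiny in-memory example with two made-up sequences.
example = io.StringIO(">gene_1\nATGAAA\nTAA\n>gene_2\nATGTAA\n")
print(gene_stats(example))  # {'genes': 2, 'min': 6, 'max': 9, 'mean': 7.5}
```

For a real file, pass an open handle instead of the in-memory example, for instance `with open(path) as handle: gene_stats(handle)`.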
3- Functional Annotation
Application
Functional Annotation (PfamScan and EggNOG Mapper)
Functional annotation can also be performed with the Blast2GO methodology, which is described elsewhere in this manual.
Input
Multifasta files with amino acid sequences predicted in the previous step:
PAB.proteins.fasta
PMB.proteins.fasta
VAB.proteins.fasta
VMB.proteins.fasta
Parameters
PfamScan | EggNOG Mapper |
---|---|
No parameters | Target Orthologs: All; GO Evidence: Non-Electronic |
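As a rough command-line equivalent, the two EggNOG Mapper settings correspond to the `--target_orthologs` and `--go_evidence` options of `emapper.py`, while PfamScan only needs the input file and a Pfam database directory. The sketch below assumes locally installed tools; the helper names, the `pfam_db` directory and the output file names are hypothetical, and OmicsBox may run these tools in a different way.

```python
def pfamscan_command(group, pfam_dir="pfam_db"):
    """Build a PfamScan argument list for one protein file."""
    return [
        "pfam_scan.pl",
        "-fasta", f"{group}.proteins.fasta",
        "-dir", pfam_dir,                  # hypothetical Pfam database location
        "-outfile", f"{group}.pfam.txt",   # hypothetical output name
    ]


def emapper_command(group):
    """Build an EggNOG Mapper argument list for one protein file."""
    return [
        "emapper.py",
        "-i", f"{group}.proteins.fasta",
        "-o", group,                           # hypothetical output prefix
        "--target_orthologs", "all",           # Target Orthologs: All
        "--go_evidence", "non-electronic",     # GO Evidence: Non-Electronic
    ]


if __name__ == "__main__":
    for group in ("PAB", "PMB", "VAB", "VMB"):
        print(" ".join(pfamscan_command(group)))
        print(" ".join(emapper_command(group)))
```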
Execution Time
Tool | Execution time |
---|---|
PfamScan | 10 minutes per sample group |
EggNOG Mapper | 30-35 minutes per sample group |
Output
PfamScan | EggNOG Mapper |
---|---|
Table that summarizes all PfamScan annotations (Type of motif, HMM information and GO information). | Table that summarizes all annotations that could be transferred with EggNOG Mapper (EggNOG description, GO information and KEGG information). |
Report with general information and distribution of different types of motifs. | Report with general information and distribution of different COG categories and Orthologous groups. |
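The distribution of COG categories shown in the EggNOG Mapper report can be approximated from the annotation table with a simple tally. The following is a minimal sketch, assuming each annotated sequence carries a string of one-letter COG categories (for example "K" or "KL") and that each letter counts once toward its category. The input list is made up for illustration and `cog_distribution` is a hypothetical helper.

```python
from collections import Counter


def cog_distribution(categories):
    """Count one-letter COG categories across all annotated sequences."""
    counts = Counter()
    for value in categories:
        for letter in value.strip():
            # Skip placeholders such as '-' used for missing annotations.
            if letter.isalpha():
                counts[letter.upper()] += 1
    return counts


# Made-up category strings, one per annotated sequence.
example = ["K", "KL", "S", "-", "", "E"]
for category, count in sorted(cog_distribution(example).items()):
    print(category, count)
```

Running the same tally on each sample group makes it easy to compare the functional profiles of the two lakes and sampling times side by side.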
Workflow
More information can be found in this review.