Functional Characterization

Introduction

The Microbiome Analysis feature in OmicsBox makes it possible to compare the bacterial diversity of two soda lakes (Salina Preta and Salina Verde) with the Taxonomic Classification Workflow, and to identify the functional genetic potential of their microbial communities.

To generate the metagenomic dataset, two preliminary steps were first run in OmicsBox: FastQC and Preprocessing.

Dataset description

The metagenomic analysis used a dataset of 12 single-end metagenomic samples collected from two soda lakes, Salina Verde and Salina Preta. Each lake was sampled at two times of day (morning and afternoon), with three replicates per lake and sampling time. The metagenomic libraries were prepared with the Nextera XT DNA Sample Preparation kit and sequenced on the Illumina MiSeq platform with the MiSeq Reagent Kit V3.

Lake  | Time of sampling | Replicates | Sample names
------|------------------|------------|-----------------
Preta | Morning (10 AM)  | 3          | PMB1, PMB2, PMB3
Preta | Afternoon (3 PM) | 3          | PAB1, PAB2, PAB3
Verde | Morning (10 AM)  | 3          | VMB1, VMB2, VMB3
Verde | Afternoon (3 PM) | 3          | VAB1, VAB2, VAB3
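The sample names follow a regular pattern (lake letter, time letter, a fixed "B", replicate number), so the design can be rebuilt programmatically. The following is a minimal sketch, assuming that pattern holds for all 12 samples; the function name `build_design` and the record layout are illustrative, and the meaning of the "B" is not stated in this manual.

```python
from itertools import product

# Letter codes inferred from the sample names in the dataset description.
LAKES = {"P": "Preta", "V": "Verde"}
TIMES = {"M": "Morning (10 AM)", "A": "Afternoon (3 PM)"}
REPLICATES = (1, 2, 3)


def build_design():
    """Return one record per sample: name, lake, sampling time, replicate."""
    design = []
    for (lake_code, lake), (time_code, time), rep in product(
        LAKES.items(), TIMES.items(), REPLICATES
    ):
        # The fixed "B" is copied from the sample names; its meaning is unknown here.
        name = f"{lake_code}{time_code}B{rep}"
        design.append(
            {"sample": name, "lake": lake, "time": time, "replicate": rep}
        )
    return design


design = build_design()
for row in design:
    print(row["sample"], row["lake"], row["time"], row["replicate"], sep="\t")
```

A table like this is convenient as a metadata sheet when comparing lakes or sampling times in later steps.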

Publication

Andreote, A. P., Dini-Andreote, F., Rigonato, J., Machineski, G. S., Souza, B. C., Barbiero, L., ... & Fiore, M. F. (2018). Contrasting the genetic patterns of microbial communities in soda lakes with and without cyanobacterial bloom. Frontiers in Microbiology, 9, 244.

Abstract

Soda lakes have high levels of sodium carbonates and are characterized by salinity and elevated pH. These ecosystems are found across Africa, Europe, Asia, Australia, North, Central, and South America. Particularly in Brazil, the Pantanal region has a series of hundreds of shallow soda lakes (ca. 600) potentially colonized by a diverse haloalkaliphilic microbial community. Biological information of these systems is still elusive, in particular data on the description of the main taxa involved in the biogeochemical cycling of life-important elements. Here, we used metagenomic sequencing to contrast the composition and functional patterns of the microbial communities of two distinct soda lakes from the sub-region Nhecolândia, state of Mato Grosso do Sul, Brazil. These two lakes differ by permanent cyanobacterial blooms (Salina Verde, green-water lake) and by no record of cyanobacterial blooms (Salina Preta, black-water lake). The dominant bacterial species in the Salina Verde bloom was Anabaenopsis elenkinii. This cyanobacterium altered local abiotic parameters such as pH, turbidity, and dissolved oxygen and consequently the overall structure of the microbial community. In Salina Preta, the microbial community had a more structured taxonomic profile. Therefore, the distribution of metabolic functions in Salina Preta community encompassed a large number of taxa, whereas, in Salina Verde, the functional potential was restrained across a specific set of taxa. Distinct signatures in the abundance of genes associated with the cycling of carbon, nitrogen, and sulfur were found. Interestingly, genes linked to arsenic resistance metabolism were present at higher abundance in Salina Verde and they were associated with the cyanobacterial bloom. Collectively, this study advances fundamental knowledge on the composition and genetic potential of microbial communities inhabiting tropical soda lakes.

Original Data

The data were downloaded from the MG-RAST server (project mgp10309).

Bioinformatic analysis

1- Assembly

Application

Metagenomic Assembly (MEGAHIT)

Input

Processed Illumina sequencing data in FASTQ format:

  • f.PAB1.fastq.gz

  • f.PAB2.fastq.gz

  • f.PAB3.fastq.gz

  • f.PMB1.fastq.gz

  • f.PMB2.fastq.gz

  • f.PMB3.fastq.gz

  • f.VAB1.fastq.gz

  • f.VAB2.fastq.gz

  • f.VAB3.fastq.gz

  • f.VMB1.fastq.gz

  • f.VMB2.fastq.gz

  • f.VMB3.fastq.gz
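The Gene Finding step lists one contigs file per sample group (for example contigs.PAB.fasta), which suggests that the three replicates of each group are assembled together. Under that assumption, the input files can be grouped by their name prefix as sketched below; the helper name `group_by_sample_group` is illustrative and not part of OmicsBox.

```python
import re
from collections import defaultdict

# The 12 preprocessed input files listed above.
FILES = [
    f"f.{group}{rep}.fastq.gz"
    for group in ("PAB", "PMB", "VAB", "VMB")
    for rep in (1, 2, 3)
]

# Group = lake letter + time letter + "B"; the trailing digits are the replicate.
PATTERN = re.compile(r"^f\.([PV][AM]B)(\d+)\.fastq\.gz$")


def group_by_sample_group(filenames):
    """Map each sample group (e.g. 'PAB') to its sorted replicate files."""
    groups = defaultdict(list)
    for name in filenames:
        match = PATTERN.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        groups[match.group(1)].append(name)
    return {group: sorted(files) for group, files in sorted(groups.items())}


grouped = group_by_sample_group(FILES)
for group, files in grouped.items():
    print(group, ", ".join(files))
```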

Parameters

  • Sequencing Data: single

  • Minimum Multiplicity: 2

  • K-mer Sizes: 29,39,59,79,99,119,141

  • No Mercy K-mers: false

  • Bubble Level: high

  • Bubble Merge Level L: 20

  • Bubble Merge Level S: 0.95

  • Prune Level: high

  • Prune Depth: 2

  • Low Local Ratio: 0.2

  • Max Tip Length: 2

  • Disable Local Assembly: false
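OmicsBox runs MEGAHIT through its interface, so no command line appears in this manual. For readers who want to see how the parameters above could map onto the standalone `megahit` program, here is a hedged sketch that only builds the argument list. The mapping of "Bubble Level: high" and "Prune Level: high" to integers is not stated in the source, so both are left as required arguments; the output directory name is hypothetical.

```python
import shlex


def megahit_command(reads, out_dir, bubble_level, prune_level,
                    no_mercy=False, no_local=False):
    """Build a MEGAHIT argument list for single-end reads.

    bubble_level and prune_level are integers in standalone MEGAHIT; which
    integers correspond to "high" in OmicsBox is an assumption left to the caller.
    """
    cmd = [
        "megahit",
        "-r", ",".join(reads),                    # Sequencing Data: single
        "--min-count", "2",                       # Minimum Multiplicity
        "--k-list", "29,39,59,79,99,119,141",     # K-mer Sizes
        "--bubble-level", str(bubble_level),      # Bubble Level
        "--merge-level", "20,0.95",               # Bubble Merge Level L and S
        "--prune-level", str(prune_level),        # Prune Level
        "--prune-depth", "2",                     # Prune Depth
        "--low-local-ratio", "0.2",               # Low Local Ratio
        "--max-tip-len", "2",                     # Max Tip Length
        "-o", out_dir,
    ]
    if no_mercy:       # No Mercy K-mers: false in this use case, so not added
        cmd.append("--no-mercy")
    if no_local:       # Disable Local Assembly: false in this use case
        cmd.append("--no-local")
    return cmd


reads = ["f.PAB1.fastq.gz", "f.PAB2.fastq.gz", "f.PAB3.fastq.gz"]
# bubble_level=2 and prune_level=2 are placeholder values, not confirmed settings.
command = megahit_command(reads, "assembly_PAB", bubble_level=2, prune_level=2)
print(shlex.join(command))
```

Building the command as a list keeps it safe to pass to `subprocess.run` without shell quoting issues.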

Execution Time

Execution time depends on the number of reads and ranges from approximately 3 to 6 minutes per sample group.

Sample group | Execution time
-------------|---------------
PAB          | 2min 43s
PMB          | 3min 48s
VAB          | 4min 47s
VMB          | 5min 50s
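To estimate how long the whole assembly step takes, the four reported times can be summed. This small sketch parses the values exactly as written in the table; the variable and function names are illustrative.

```python
import re

# Execution times as reported in the table above.
REPORTED = {"PAB": "2min 43s", "PMB": "3min 48s", "VAB": "4min 47s", "VMB": "5min 50s"}


def to_seconds(text):
    """Convert a string such as '2min 43s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)min (\d+)s", text)
    if match is None:
        raise ValueError(f"unexpected time format: {text}")
    return int(match.group(1)) * 60 + int(match.group(2))


seconds = {group: to_seconds(value) for group, value in REPORTED.items()}
total = sum(seconds.values())
mean = total / len(seconds)

print(f"total: {total // 60} min {total % 60} s")
print(f"mean per sample group: {mean:.0f} s")
```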

Output

2- Gene Finding

Application

Gene Finding (FragGeneScan)

Input

Assemblies made in the previous step:

  • contigs.PAB.fasta

  • contigs.PMB.fasta

  • contigs.VAB.fasta

  • contigs.VMB.fasta

Parameters

  • Type of Data: Complete Genomic Sequences

  • Model for Input Data: Complete genomic sequences or short sequence reads without sequencing error
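As with the assembly, OmicsBox runs FragGeneScan internally. The sketch below shows how the two settings above might translate to the standalone `run_FragGeneScan.pl` script, assuming that "Complete Genomic Sequences" corresponds to `-complete=1` and that the error-free model corresponds to `-train=complete`. The output prefixes are hypothetical and do not reproduce the file names OmicsBox generates.

```python
from pathlib import Path

# Assemblies produced in the previous step.
CONTIGS = [
    "contigs.PAB.fasta",
    "contigs.PMB.fasta",
    "contigs.VAB.fasta",
    "contigs.VMB.fasta",
]


def fraggenescan_command(contigs, out_prefix):
    """Build a FragGeneScan argument list for one assembly.

    -complete=1 and -train=complete are an assumed mapping of the
    OmicsBox settings listed above, not a confirmed one.
    """
    return [
        "run_FragGeneScan.pl",
        f"-genome={contigs}",
        f"-out={out_prefix}",
        "-complete=1",
        "-train=complete",
    ]


commands = {}
for contigs in CONTIGS:
    # 'contigs.PAB.fasta' -> 'PAB'; used only to name the hypothetical output prefix.
    group = Path(contigs).name.split(".")[1]
    commands[group] = fraggenescan_command(contigs, f"{group}.genes")

for group, cmd in commands.items():
    print(group, " ".join(cmd))
```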

Execution Time

Approximately one minute per FASTA file.

Output

3- Functional Annotation

Application

Functional Annotation (PfamScan and EggNOG Mapper)

Functional annotation can also be performed with the Blast2GO methodology, which is described elsewhere in this manual.

Input

Multifasta files with amino acid sequences predicted in the previous step:

  • PAB.proteins.fasta

  • PMB.proteins.fasta

  • VAB.proteins.fasta

  • VMB.proteins.fasta

Parameters

PfamScan

  • No parameters

EggNOG Mapper

  • Target Orthologs: All

  • GO Evidence: Non-Electronic
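For orientation, the settings above can be expressed as argument lists for the standalone tools `pfam_scan.pl` and `emapper.py`. This is a sketch under assumptions: OmicsBox may call the tools differently, the Pfam database directory and all output names are hypothetical, and only the two EggNOG Mapper options listed above are set explicitly.

```python
# Protein files predicted in the Gene Finding step.
PROTEINS = [
    "PAB.proteins.fasta",
    "PMB.proteins.fasta",
    "VAB.proteins.fasta",
    "VMB.proteins.fasta",
]


def pfamscan_command(proteins, pfam_dir, outfile):
    """Build a pfam_scan.pl argument list; no tool-specific parameters are set."""
    return ["pfam_scan.pl", "-fasta", proteins, "-dir", pfam_dir, "-outfile", outfile]


def emapper_command(proteins, out_prefix):
    """Build an emapper.py argument list with the two settings listed above."""
    return [
        "emapper.py",
        "-i", proteins,
        "-o", out_prefix,
        "--target_orthologs", "all",          # Target Orthologs: All
        "--go_evidence", "non-electronic",    # GO Evidence: Non-Electronic
    ]


jobs = []
for proteins in PROTEINS:
    group = proteins.split(".")[0]  # 'PAB.proteins.fasta' -> 'PAB'
    # 'pfam_db' and the output names below are placeholders for illustration.
    jobs.append(pfamscan_command(proteins, "pfam_db", f"{group}.pfam.txt"))
    jobs.append(emapper_command(proteins, f"{group}.eggnog"))

for cmd in jobs:
    print(" ".join(cmd))
```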

Execution Time

Tool     | Execution time
---------|-----------------------------------
PfamScan | 10 minutes per sample group
EggNOG   | 30 to 35 minutes per sample group

Output

PfamScan

  • Table that summarizes all PfamScan annotations (type of motif, HMM information and GO information).

  • Report with general information and the distribution of the different types of motifs.

EggNOG Mapper

  • Table that summarizes all annotations that could be transferred with EggNOG Mapper (EggNOG description, GO information and KEGG information).

  • Report with general information and the distribution of the different COG categories and orthologous groups.

Workflow

More information can be found in this review.