Functional Annotation

Introduction

This use case describes the functional annotation of the Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) proteins with OmicsBox.

Dataset description

The protein sequences have been downloaded from NCBI.

Original Data

  • Reference genome: Severe acute respiratory syndrome coronavirus 2 ASM985889v2

  • RefSeq: NC_045512.2

  • BioProject: PRJNA485481

Bioinformatic Analysis

1- CloudBlast

Application

Homology search using BLAST

Input

The SARS-CoV-2 protein Fasta file, which has to be loaded in OmicsBox.
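Before loading the file, it can help to check how many protein records it contains. The following is a minimal sketch in plain Python, assuming a standard Fasta layout. The function name `read_fasta` and the two example records are hypothetical placeholders, not real SARS-CoV-2 proteins.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse Fasta text into {sequence_id: sequence}.

    The sequence id is the first whitespace-delimited token after '>'.
    """
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Fall back to a positional name if the header is empty.
            current_id = header[0] if header else f"seq_{len(records) + 1}"
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped, so append to the current record.
            records[current_id] += line
    return records


# Hypothetical two-record example with placeholder sequences.
example = """>prot_A example protein
MKVLAAGIVG
LLTSAC
>prot_B another example
MTEYKLV
"""
proteins = read_fasta(example.splitlines())
for seq_id, seq in proteins.items():
    print(seq_id, len(seq))
```

For a real file, the same function can be called on an open file handle, for example `read_fasta(open(path))`, where `path` points to the downloaded Fasta file.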

Parameters

  • Blast Program: blastp-fast

  • Blast DB: Non-redundant protein sequences (nr v5)

  • Blast Expectation Value (e-Value): 1.0E-3

  • Number of Blast Hits: 20

  • Blast Description Annotator: True

  • Word Size: 6

  • Low Complexity Filter: True

  • HSP Length Cutoff: 33

  • HSP-Hit Coverage: 0

  • Filter By Description: No filter

  • Save XML results in a folder
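To illustrate how the e-Value, HSP length and hit-number settings interact, here is a small Python sketch. It only mirrors the thresholds listed above. The `Hit` record, the function name and the order in which the filters are applied are assumptions for illustration, not the OmicsBox implementation.

```python
from dataclasses import dataclass
from typing import List

# Thresholds copied from the parameter list above.
E_VALUE_MAX = 1.0e-3   # Blast Expectation Value (e-Value)
HSP_LENGTH_MIN = 33    # HSP Length Cutoff
MAX_HITS = 20          # Number of Blast Hits


@dataclass
class Hit:  # hypothetical record type, not an OmicsBox class
    accession: str
    e_value: float
    hsp_length: int


def filter_hits(hits: List[Hit]) -> List[Hit]:
    """Keep hits that pass both cutoffs, best e-value first, capped at MAX_HITS."""
    passing = [
        h for h in hits
        if h.e_value <= E_VALUE_MAX and h.hsp_length >= HSP_LENGTH_MIN
    ]
    passing.sort(key=lambda h: h.e_value)
    return passing[:MAX_HITS]


# Made-up hits for illustration only.
hits = [
    Hit("hit_1", 1e-50, 120),
    Hit("hit_2", 0.5, 200),    # fails the e-value cutoff
    Hit("hit_3", 1e-20, 20),   # fails the HSP length cutoff
    Hit("hit_4", 1e-80, 300),
]
kept = filter_hits(hits)
print([h.accession for h in kept])
```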

Execution Time

19 min

Consumed Units

Consumed 1174 CloudUnits during this execution.

Output

Blast XML files in zip.
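The zipped XML files can also be inspected outside OmicsBox. The sketch below uses only the Python standard library and assumes the files follow the standard NCBI BLAST XML layout (elements such as `Iteration`, `Hit`, `Hit_def` and `Hsp_evalue`). Whether the exported files match this layout exactly is an assumption, and the function name `iter_blast_hits` and the sample data are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def iter_blast_hits(zip_file) -> Iterator[Tuple[str, str, float]]:
    """Yield (query, hit description, best HSP e-value) for each hit in the archive."""
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue
            root = ET.fromstring(archive.read(name))
            for iteration in root.iter("Iteration"):
                query = iteration.findtext("Iteration_query-def", default="")
                for hit in iteration.iter("Hit"):
                    e_values = [float(e.text) for e in hit.iter("Hsp_evalue")]
                    if e_values:
                        yield query, hit.findtext("Hit_def", default=""), min(e_values)


# Build a tiny in-memory archive with made-up content to demonstrate the parser.
sample_xml = b"""<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query_1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_def>example hit</Hit_def>
          <Hit_hsps>
            <Hsp><Hsp_evalue>1e-50</Hsp_evalue></Hsp>
            <Hsp><Hsp_evalue>1e-10</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("query_1.xml", sample_xml)
buffer.seek(0)

blast_hits = list(iter_blast_hits(buffer))
print(blast_hits)
```

For a real result, a path to the saved zip file can be passed instead of the in-memory buffer.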

2- Cloud InterProScan

Application

Retrieve protein family domains with InterProScan Annotation.

Input

Load the Fasta file or use the saved Fasta project.

Parameters

  • CDD: True

  • HAMAP: True

  • HMMPanther: True

  • HMMPfam: True

  • HMMPIR: True

  • FPrintScan: True

  • ProfileScan: True

  • HMMTigr: True

  • PatternScan: False

  • Gene3D: True

  • SFLD: True

  • SuperFamily: True

  • Coils: False

  • MobiDBLite: True

  • Save InterProScan XML files in a folder

Execution Time

4 min

Consumed Units

Consumed 194 CloudUnits during this execution.

Output

InterProScan XML files

3- Blast2GO Mapping

Application

Retrieve Gene Ontology terms using Gene Ontology Mapping.

Input

Fasta and Blast XML files from step 1- CloudBlast, or the saved project with the Blast and InterProScan (IPS) results.

Parameters

  • Use latest database version: True
    In this example, the latest version was from June 2021.

Execution Time

2 min

Output

The same project with Gene Ontology terms.

4- Blast2GO Annotation

Application

Apply the Gene Ontology Annotation rule to all GO terms.

Input

Use the project output from step 3- Mapping.

Parameters

  • Annotation CutOff: 55

  • GO Weight: 5

  • Filter GO by Taxonomy: No Filter

  • E-Value-Hit-Filter: 1.0E-6

  • HSP-Hit Coverage CutOff: 0

  • Hit Filter: 500

  • Only hits with GOs: False

  • Evidence Code Weights: Default Values
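The role of the Annotation CutOff and the GO Weight can be sketched as follows. This assumes the annotation score has the form described in the Blast2GO publications: a direct term (highest hit similarity multiplied by the evidence code weight) plus an abstraction term ((number of unified GO terms - 1) multiplied by the GO Weight). The function names, the evidence code weights and the similarity values are hypothetical, and the sketch leaves out the GO hierarchy traversal and the hit filters.

```python
from typing import List, Tuple

ANNOTATION_CUTOFF = 55  # Annotation CutOff from the parameter list
GO_WEIGHT = 5           # GO Weight from the parameter list


def annotation_score(
    hit_evidence: List[Tuple[float, float]],
    n_unified_terms: int = 1,
    go_weight: float = GO_WEIGHT,
) -> float:
    """Assumed score: max(similarity * ec_weight) + (n_unified_terms - 1) * go_weight.

    hit_evidence holds (similarity_percent, evidence_code_weight) pairs for
    the hits that support one candidate GO term.
    """
    direct_term = max(sim * weight for sim, weight in hit_evidence)
    abstraction_term = (n_unified_terms - 1) * go_weight
    return direct_term + abstraction_term


def passes_cutoff(score: float, cutoff: float = ANNOTATION_CUTOFF) -> bool:
    # Treating the cutoff as inclusive is an assumption.
    return score >= cutoff


# Made-up evidence: two supporting hits for one candidate term.
score_a = annotation_score([(90.0, 0.5), (60.0, 1.0)])
# Made-up evidence: one weak hit, but three child annotations unified at this term.
score_b = annotation_score([(70.0, 0.5)], n_unified_terms=3)
print(score_a, passes_cutoff(score_a))
print(score_b, passes_cutoff(score_b))
```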

Execution Time

Very fast

Output

The same project with Gene Ontology terms that passed the Annotation rule CutOff.

5- EggNOG Annotation

Application

Retrieve additional Gene Ontology terms from orthologs by running EggNOG.

Input

Load the Fasta file or use the saved Fasta project.

Parameters

  • Target Orthologs: All

  • GO Evidence: Non-Electronic

Execution Time

6 min

Output

A new project with EggNOG annotations.
Report with the ortholog information.

6- Merge EggNOG to Annotation

Application

Merge the Gene Ontology terms retrieved from EggNOG to the existing Annotation.

Input

The EggNOG annotation project from step 5- EggNOG Annotation has to be opened in OmicsBox.

Parameters

  • Sequence Project: The project with Gene Ontology terms that passed the Annotation rule CutOff from step 4- Blast2GO Annotation.

  • Seed Ortholog E-Value Filter: 1E-3

  • Seed Ortholog Bit-Score Filter: 60
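Conceptually, the merge adds EggNOG-derived GO terms only for sequences whose seed ortholog passes both filters. A minimal Python sketch under that assumption is shown here. The data structures, the function name `merge_eggnog` and the GO labels are hypothetical placeholders, and treating both thresholds as inclusive is an assumption.

```python
from typing import Dict, Set

E_VALUE_MAX = 1e-3   # Seed Ortholog E-Value Filter
BIT_SCORE_MIN = 60   # Seed Ortholog Bit-Score Filter


def merge_eggnog(
    annotation: Dict[str, Set[str]],
    eggnog: Dict[str, dict],
) -> Dict[str, Set[str]]:
    """Return a copy of `annotation` with GO terms from passing EggNOG records added."""
    merged = {seq_id: set(gos) for seq_id, gos in annotation.items()}
    for seq_id, record in eggnog.items():
        if record["e_value"] <= E_VALUE_MAX and record["bit_score"] >= BIT_SCORE_MIN:
            merged.setdefault(seq_id, set()).update(record["go_terms"])
    return merged


# Placeholder data for illustration only.
annotation = {"prot_A": {"GO:term_a"}, "prot_B": set()}
eggnog = {
    "prot_A": {"e_value": 1e-30, "bit_score": 150, "go_terms": {"GO:term_b"}},
    "prot_B": {"e_value": 1e-2, "bit_score": 80, "go_terms": {"GO:term_c"}},   # e-value too high
    "prot_C": {"e_value": 1e-10, "bit_score": 40, "go_terms": {"GO:term_d"}},  # bit-score too low
}
merged = merge_eggnog(annotation, eggnog)
print(merged)
```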

Execution Time

Very fast

Output

The annotation project will open with the added Gene Ontology terms.
A chart with information on the number of GOs that have been merged.

7- Merge InterProScan to Annotation

Application

Merge the Gene Ontology terms retrieved from InterProScan to existing Annotation.

Input

Open the project with the Gene Ontology terms merged from EggNOG (step 6). The project also has to contain InterProScan results.
If no InterProScan results are available in the project, it is possible to run InterProScan or load the results from step 2- Cloud InterProScan.

Execution Time

Very fast

Output

The same project with the Gene Ontology terms retrieved from InterProScan.
A chart with information on the number of GOs that have been merged.

8- Functional Enrichment Analysis (Fisher’s Exact Test)

Application

In this case, a comparative analysis between SARS-CoV and SARS-CoV-2 will be performed to see if there is a function that is specific to SARS-CoV-2.

Input

The protein sequences of SARS-CoV have been downloaded from NCBI and analyzed with the above pipeline. Both annotated projects (SARS-CoV and SARS-CoV-2) have to be merged into a single project and a test set has to be generated. The test set for the Enrichment Analysis will be the identifiers from SARS-CoV-2.

Merge projects

It is possible to combine two projects in OmicsBox by adding the results of one project to the other.
This has to be done in the file manager, by selecting both projects, right-clicking on the first project, and selecting Merge. All results have to be added and a new Merged project will open.

Create test set id list

It is possible to create an id list in OmicsBox from an annotated project.
The SARS-CoV-2 project has to be opened in OmicsBox. First, mark all sequences with Ctrl + A (Windows and Linux) / Cmd + A (Mac), then right-click on a sequence name and choose “Create ID List of Column: SeqName”. A new tab will open with the sequence identifiers in a single column. This list has to be saved and used as the test set for the Enrichment Analysis.
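The relation between the merged reference project and the test set can be pictured with a short Python sketch: the reference is the union of both projects, and the test set is the list of SARS-CoV-2 identifiers, one per line. The function name `build_sets` and the identifiers are hypothetical, and the plain-text output only illustrates the idea, since OmicsBox stores the list in its own format.

```python
from typing import Iterable, List, Tuple


def build_sets(
    sars_cov_ids: Iterable[str],
    sars_cov2_ids: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Return (reference_set, test_set) as sorted lists without duplicates."""
    test_set = sorted(set(sars_cov2_ids))
    reference_set = sorted(set(sars_cov_ids) | set(test_set))
    return reference_set, test_set


# Placeholder identifiers for illustration only.
sars_cov_ids = ["cov1_protA", "cov1_protB"]
sars_cov2_ids = ["cov2_protA", "cov2_protB", "cov2_protA"]

reference_set, test_set = build_sets(sars_cov_ids, sars_cov2_ids)
id_list_text = "\n".join(test_set)  # one identifier per line
print(id_list_text)
```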

Parameters

Open the Merged project in OmicsBox; it will be used as the reference set.

  • Test-Set Files: SARS-CoV2_idlist.box

  • Reference-Set Files: false

  • Do Not Filter: false

  • Filter Value: 0.01

  • Filter Mode: P-VALUE

  • Two Tailed: false

  • Remove Double IDs: true

  • Annotations: GO IDs

  • GO Categories: biological_process,molecular_function,cellular_component
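For readers who want to see what the test computes, the sketch below implements a one-tailed Fisher's Exact Test for over-representation (matching Two Tailed: false) and applies the 0.01 p-value filter. It uses only the Python standard library. The counts are made up, the function name is hypothetical, and it assumes that "Remove Double IDs" leaves a reference set that no longer contains the test-set sequences.

```python
from math import comb

FILTER_VALUE = 0.01  # Filter Value, applied to the p-value (Filter Mode: P-VALUE)


def fisher_one_tailed(test_with: int, test_total: int, ref_with: int, ref_total: int) -> float:
    """One-tailed p-value for over-representation of a GO term in the test set.

    test_with / test_total: annotated and total sequences in the test set.
    ref_with / ref_total: annotated and total sequences in the reference set
    (assumed not to overlap with the test set).
    """
    population = test_total + ref_total
    annotated = test_with + ref_with
    upper = min(annotated, test_total)
    denominator = comb(population, test_total)
    # Sum the hypergeometric probabilities of seeing test_with or more annotated sequences.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, test_total - k)
        for k in range(test_with, upper + 1)
    )
    return tail / denominator


# Made-up counts: a term found in 8 of 10 test sequences and 0 of 10 reference sequences.
p_enriched = fisher_one_tailed(8, 10, 0, 10)
# Made-up counts: a term found in 3 of 10 test sequences and 0 of 10 reference sequences.
p_not_enriched = fisher_one_tailed(3, 10, 0, 10)

for label, p in [("8 of 10", p_enriched), ("3 of 10", p_not_enriched)]:
    print(label, p, "enriched" if p < FILTER_VALUE else "not enriched")
```

With very small sequence sets, as is typical for viral proteomes, a term has to be concentrated almost entirely in the test set to pass the filter.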

Execution Time

Very fast

Output

A project containing the results of the functional enrichment analysis. In this case, only one Gene Ontology term is enriched and specific to SARS-CoV-2: “host cell endosome”.

Workflow

Example Workflow