Functional Annotation
Introduction
Functional Annotation of Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) proteins.
Dataset description
The protein sequences have been downloaded from NCBI.
Layout: Fasta
Original Data
Reference genome: Severe acute respiratory syndrome coronavirus 2 ASM985889v2
RefSeq: NC_045512.2
BioProject: PRJNA485481
Bioinformatic Analysis
1- CloudBlast
Application
Input
SARS-CoV-2 protein Fasta file, which has to be loaded in OmicsBox.
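Before loading, it can help to confirm that the Fasta file parses cleanly and that every sequence identifier is unique. The following is a minimal sketch using only the Python standard library; the two records are hypothetical placeholders, not real SARS-CoV-2 entries, and the function name read_fasta is introduced here for illustration only.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {sequence_id: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line without an identifier")
            # The identifier is the first whitespace-delimited token of the header.
            current_id = fields[0]
            if current_id in records:
                raise ValueError(f"duplicate identifier: {current_id}")
            records[current_id] = ""
        elif current_id is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[current_id] += line  # sequences may span several lines
    return records


# Hypothetical two-record input (placeholder identifiers and sequences).
example = """>protein_A hypothetical example
MKTAYIAKQR
QISFVK
>protein_B hypothetical example
MSDNGPQ
""".splitlines()

proteins = read_fasta(example)
for seq_id, seq in proteins.items():
    print(seq_id, len(seq))
```

For a real file, the same function can be called on an open file handle, for example read_fasta(open(path)), where path is the location of the downloaded Fasta file.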
Parameters
Blast Program: blastp-fast
Blast DB: Non-redundant protein sequences (nr v5)
Blast Expectation Value (e-Value): 1.0E-3
Number of Blast Hits: 20
Blast Description Annotator: True
Word Size: 6
Low Complexity Filter: True
HSP Length Cutoff: 33
HSP-Hit Coverage: 0
Filter By Description: No filter
Save XML results in a folder
Execution Time
19 min
Consumed Units
Consumed 1174 CloudUnits during this execution.
Output
Blast XML files in zip.
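The exported results can also be inspected outside OmicsBox. The sketch below assumes the unzipped files follow the classic NCBI BLAST XML layout (Iteration, Hit, Hsp elements) and re-applies three of the parameters listed above (e-Value, HSP Length Cutoff, Number of Blast Hits) to each hit's first HSP. Applying the cutoffs to the first HSP is an assumption about how the tool behaves, and the embedded XML is a hypothetical, heavily reduced example.

```python
import xml.etree.ElementTree as ET

E_VALUE_MAX = 1.0e-3  # "Blast Expectation Value (e-Value)"
HSP_LENGTH_MIN = 33   # "HSP Length Cutoff"
MAX_HITS = 20         # "Number of Blast Hits"


def filter_hits(xml_text):
    """Return {query: [(hit_def, evalue, align_len), ...]} for hits passing the cutoffs."""
    root = ET.fromstring(xml_text)
    kept = {}
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        hits = []
        for hit in iteration.iter("Hit"):
            hsp = hit.find("Hit_hsps/Hsp")  # first HSP of this hit (assumption)
            if hsp is None:
                continue
            evalue = float(hsp.findtext("Hsp_evalue"))
            align_len = int(hsp.findtext("Hsp_align-len"))
            if evalue <= E_VALUE_MAX and align_len >= HSP_LENGTH_MIN:
                hits.append((hit.findtext("Hit_def"), evalue, align_len))
        kept[query] = hits[:MAX_HITS]  # keep at most MAX_HITS hits per query
    return kept


# Hypothetical, reduced BLAST XML with one query and three hits.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query_1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_def>hypothetical hit A</Hit_def>
          <Hit_hsps>
            <Hsp><Hsp_evalue>1e-50</Hsp_evalue><Hsp_align-len>120</Hsp_align-len></Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_def>hypothetical hit B</Hit_def>
          <Hit_hsps>
            <Hsp><Hsp_evalue>0.5</Hsp_evalue><Hsp_align-len>80</Hsp_align-len></Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_def>hypothetical hit C</Hit_def>
          <Hit_hsps>
            <Hsp><Hsp_evalue>1e-10</Hsp_evalue><Hsp_align-len>20</Hsp_align-len></Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

result = filter_hits(SAMPLE_XML)
print(result)
```

In this example only hit A survives: hit B fails the e-Value cutoff and hit C is shorter than the HSP length cutoff.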
2- Cloud InterProScan
Application
Retrieve protein family domains with InterProScan Annotation.
Input
Load the Fasta file or use the saved Fasta project.
Parameters
CDD: True
HAMAP: True
HMMPanther: True
HMMPfam: True
HMMPIR: True
FPrintScan: True
ProfileScan: True
HMMTigr: True
PatternScan: False
Gene3D: True
SFLD: True
SuperFamily: True
Coils: False
MobiDBLite: True
Save InterProScan XML files in a folder
Execution Time
4 min
Consumed Units
Consumed 194 CloudUnits during this execution.
Output
3- Blast2GO Mapping
Application
Retrieve Gene Ontology terms using Gene Ontology Mapping.
Input
Fasta and Blast XML files from step 1- CloudBlast, or the saved project containing the Blast and InterProScan (IPS) results.
Parameters
Use latest database version: True
In this example, the latest version was June 2021.
Execution Time
2 min
Output
The same project with Gene Ontology terms.
4- Blast2GO Annotation
Application
Apply the Gene Ontology Annotation rule to all GO terms.
Input
Use the project output from step 3- Mapping.
Parameters
Annotation CutOff: 55
GO Weight: 5
Filter GO by Taxonomy: No Filter
E-Value-Hit-Filter: 1.0E-6
HSP-Hit Coverage CutOff: 0
Hit Filter: 500
Only hits with GOs: False
Evidence Code Weights: Default Values
Execution Time
Very fast
Output
The same project with Gene Ontology terms that passed the Annotation rule CutOff.
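To make the roles of Annotation CutOff and GO Weight concrete, the sketch below follows the annotation score as it is commonly described for Blast2GO: a direct term (the highest hit similarity multiplied by the evidence code weight) plus an abstraction term (the number of candidate GO terms unified at the node, minus one, multiplied by the GO weight). This is a simplified reading of that description, not the tool's actual implementation. The evidence code weights and similarity values are hypothetical, and the function name annotation_score is introduced here for illustration.

```python
ANNOTATION_CUTOFF = 55  # "Annotation CutOff"
GO_WEIGHT = 5           # "GO Weight"


def annotation_score(hit_evidence, n_unified_terms, go_weight=GO_WEIGHT):
    """Score one candidate GO term.

    hit_evidence: list of (similarity_percent, evidence_code_weight) pairs,
        one per Blast hit that carries this GO term.
    n_unified_terms: number of candidate GO terms unified at this node,
        counting the term itself.
    """
    # Direct term: best similarity after weighting by the evidence code.
    direct_term = max(sim * ecw for sim, ecw in hit_evidence)
    # Abstraction term: rewards nodes that unify several candidate child terms.
    abstraction_term = (n_unified_terms - 1) * go_weight
    return direct_term + abstraction_term


# Hypothetical term supported by two hits, no child terms unified.
score_a = annotation_score([(60, 1.0), (80, 0.7)], n_unified_terms=1)
# Hypothetical term with one weaker hit, evaluated alone and with 3 unified terms.
score_b_alone = annotation_score([(70, 0.7)], n_unified_terms=1)
score_b_unified = annotation_score([(70, 0.7)], n_unified_terms=3)

for name, score in [("a", score_a), ("b alone", score_b_alone), ("b unified", score_b_unified)]:
    print(name, round(score, 2), "passes" if score >= ANNOTATION_CUTOFF else "rejected")
```

Under these assumptions, a weakly supported term can still pass the cutoff at a parent node when several of its child terms are candidates, which is the effect that a higher GO Weight amplifies.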
5- EggNOG Annotation
Application
Retrieve additional Gene Ontology terms from orthologs by running EggNOG.
Input
Load the Fasta file or use the saved Fasta project.
Parameters
Target Orthologs: All
GO Evidence: Non-Electronic
Execution Time
6 min
Output
A new project with EggNOG annotations.
Report with the ortholog information.
6- Merge EggNOG to Annotation
Application
Merge the Gene Ontology terms retrieved from EggNOG to the existing Annotation.
Input
The EggNOG annotation project from step 5- EggNOG Annotation has to be opened in OmicsBox.
Parameters
Sequence Project: The project with Gene Ontology terms that passed the Annotation rule CutOff from step 4- Blast2GO Annotation.
Seed Ortholog E-Value Filter: 1E-3
Seed Ortholog Bit-Score Filter: 60
Execution Time
Very fast
Output
The annotation project will open with the added Gene Ontology terms.
A chart with information on the number of GOs that have been merged.
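Conceptually, the merge adds the EggNOG-derived GO terms of each sequence to its existing annotation, provided the seed ortholog passes the E-Value and Bit-Score filters. The sketch below illustrates that idea with plain Python sets. The data layout, the function name merge_eggnog, and the GO identifiers are hypothetical placeholders, and the sketch does not attempt any GO-hierarchy handling that the tool itself may perform.

```python
SEED_EVALUE_MAX = 1e-3   # "Seed Ortholog E-Value Filter"
SEED_BITSCORE_MIN = 60   # "Seed Ortholog Bit-Score Filter"


def merge_eggnog(annotation, eggnog):
    """Merge EggNOG GO terms into an existing annotation.

    annotation: {seq_id: set of GO IDs}
    eggnog: {seq_id: {"evalue": float, "bitscore": float, "go_terms": set}}
    Returns (merged annotation, number of newly added GO terms).
    """
    merged = {seq: set(terms) for seq, terms in annotation.items()}
    added = 0
    for seq, rec in eggnog.items():
        # Skip seed orthologs that fail either filter.
        if rec["evalue"] > SEED_EVALUE_MAX or rec["bitscore"] < SEED_BITSCORE_MIN:
            continue
        current = merged.setdefault(seq, set())
        before = len(current)
        current |= set(rec["go_terms"])  # set union avoids duplicate terms
        added += len(current) - before
    return merged, added


# Hypothetical inputs with placeholder GO identifiers.
annotation = {"seq1": {"GO:0000001"}, "seq2": set()}
eggnog = {
    "seq1": {"evalue": 1e-20, "bitscore": 150.0, "go_terms": {"GO:0000001", "GO:0000002"}},
    "seq2": {"evalue": 0.5, "bitscore": 30.0, "go_terms": {"GO:0000003"}},
}

merged, added = merge_eggnog(annotation, eggnog)
print(merged, added)
```

Here seq1 gains one new term, while the EggNOG result for seq2 is ignored because its seed ortholog fails both filters. The count of added terms corresponds loosely to what the merge chart reports.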
7- Merge InterProScan to Annotation
Application
Merge the Gene Ontology terms retrieved from InterProScan to the existing Annotation.
Input
Open the project that contains the Gene Ontology terms merged from EggNOG (step 6) together with the InterProScan results.
If no InterProScan results are available in the project, it is possible to run InterProScan or load the results from step 2- Cloud InterProScan.
Execution Time
Very fast
Output
The same project with the Gene Ontology retrieved from InterProScan.
A chart with information on the number of GOs that have been merged.
8- Functional Enrichment Analysis (Fisher’s Exact Test)
Application
In this case, a comparative analysis between SARS-CoV and SARS-CoV-2 is performed to see whether any function is specific to SARS-CoV-2.
Input
The protein sequences of SARS-CoV have been downloaded from NCBI and analyzed with the above pipeline. Both annotated projects (SARS-CoV and SARS-CoV-2) have to be merged into a single project and a test set has to be generated. The test set for the Enrichment Analysis will be the identifiers from SARS-CoV-2.
Merge projects
It is possible to combine two projects in OmicsBox by adding the results of one to the other.
This is done in the file manager by selecting both projects, right-clicking on the first project, and selecting Merge. All results have to be added, and a new Merged project will open.
Create test set id list
It is possible to create an id list in OmicsBox from an annotated project.
The SARS-CoV-2 project has to be opened in OmicsBox. First, mark all sequences with Ctrl + A (Windows and Linux) / Cmd + A (Mac), then right-click on a sequence name and choose “Create ID List of Column: SeqName”. A new tab will open with the sequence identifiers in a single column. This list has to be saved and used as the test set for the Enrichment Analysis.
Parameters
Open the Merged project in OmicsBox; it will be used as the reference set.
Test-Set Files: SARS-CoV2_idlist.box
Reference-Set Files: false
Do Not Filter: false
Filter Value: 0.01
Filter Mode: P-VALUE
Two Tailed: false
Remove Double IDs: true
Annotations: GO IDs
GO Categories: biological_process,molecular_function,cellular_component
Execution Time
Very fast
Output
Project containing the results of the functional enrichment analysis. In this case, only one Gene Ontology term is enriched and specific to SARS-CoV-2: “host cell endosome”.
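For readers who want to see what the one-tailed test computes for a single GO term, the sketch below implements the right-tailed (over-representation) Fisher's exact test with the Python standard library. The counts are hypothetical and are not the values from this analysis. It also assumes that the reference counts cover only sequences outside the test set. The threshold mirrors the Filter Value of 0.01 in P-VALUE mode.

```python
from math import comb

P_VALUE_FILTER = 0.01  # "Filter Value" with "Filter Mode: P-VALUE"


def fisher_one_tailed(test_with, test_without, ref_with, ref_without):
    """Right-tailed Fisher's exact test p-value for a 2x2 table.

    test_with / test_without: test-set sequences with / without the GO term.
    ref_with / ref_without: reference sequences (outside the test set)
        with / without the GO term.
    """
    n_test = test_with + test_without
    n_with = test_with + ref_with
    n_total = n_test + ref_with + ref_without
    denom = comb(n_total, n_test)
    p_value = 0.0
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    for k in range(test_with, min(n_test, n_with) + 1):
        p_value += comb(n_with, k) * comb(n_total - n_with, n_test - k) / denom
    return p_value


# Hypothetical counts: 3 of 4 test sequences and 1 of 4 reference sequences carry the term.
p = fisher_one_tailed(test_with=3, test_without=1, ref_with=1, ref_without=3)
print(round(p, 4), "enriched" if p < P_VALUE_FILTER else "not significant")
```

With so few sequences, as is typical for a viral proteome, a term has to be concentrated almost entirely in the test set to fall below the 0.01 filter.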