InterPro provides functional analysis of proteins by classifying them into families and predicting domains and important sites. To classify proteins in this way, InterPro uses predictive models, known as signatures, provided by several different databases (referred to as member databases) that make up the InterPro consortium. InterPro combines protein signatures from these member databases into a single searchable resource, capitalising on their individual strengths to produce a powerful integrated database and diagnostic tool.
Please Cite InterProScan:
Blum M, Chang H, Chuguransky S, Grego T, Kandasaamy S, Mitchell A, Nuka G, Paysan-Lafosse T, Qureshi M, Raj S, Richardson L, Salazar GA, Williams L, Bork P, Bridge A, Gough J, Haft DH, Letunic I, Marchler-Bauer A, Mi H, Natale DA, Necci M, Orengo CA, Pandurangan AP, Rivoire C, Sigrist CJA, Sillitoe I, Thanki N, Thomas PD, Tosatto SCE, Wu CH, Bateman A and Finn RD. The InterPro protein families and domains database: 20 years on. Nucleic Acids Research, Nov 2020 (doi: 10.1093/nar/gkaa977).
The InterPro annotation functionality in OmicsBox retrieves domain/motif information sequence by sequence. Corresponding GO terms are then transferred to the sequences and merged with already existing GO terms. InterProScan results can be viewed through the Single Sequence Menu (right-click on a sequence) and saved in TXT and XML format (figure 4). When working with nucleotide sequences, OmicsBox translates each one to its longest open reading frame and then sends the translation to InterProScan.
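The translation step for nucleotide input can be pictured with a minimal Python sketch. It is not the OmicsBox implementation: it assumes the standard genetic code, searches all six reading frames, and treats an open reading frame as the longest stretch without a stop codon (no start codon required). The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: OmicsBox
# may use a different table or ORF definition).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf_protein(nucleotides):
    """Return the longest stop-free translation over all six frames."""
    seq = nucleotides.upper().replace("U", "T")
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            # Stop codons are translated to '*', so splitting yields ORFs.
            for segment in translate(strand[frame:]).split("*"):
                if len(segment) > len(best):
                    best = segment
    return best
```

Because no start codon is required in this sketch, the longest frame can lie on the reverse strand even when the forward strand begins with ATG.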
The following options can be found under functional analysis → InterProScan or from the Side Panel when a project has been loaded.
InterProScan. Start sending sequences to the EBI or OmicsBox Cloud.
Remove InterProScan. Delete InterProScan results for the selected sequences.
Merge GOs. Add GO terms obtained through motifs/domains to the current annotations.
There are two options to run InterProScan in OmicsBox, either with CloudIPS or via the public web service at EBI.
CloudIPS is a cloud-based OmicsBox community resource for fast and reliable InterPro analysis for everything from small to big data sets. It allows executing the original InterPro algorithms against up-to-date databases in our dedicated computing cloud. This is a high-performance, secure and cost-optimized solution for your analysis.
The public EMBL-EBI InterPro web service scans your sequences against InterPro's signatures; performance and results depend on the EBI web server.
Figure 1: Choose InterProScan
The first two configuration pages (figures 2 and 3) show the databases that will be used to retrieve the protein families, domains, etc.
The last page lets you save the InterProScan results in different file formats: tab-separated values (TSV), XML (the default output), GFF3, and the input (query) sequence itself (figure 4).
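A saved TSV file can be post-processed outside OmicsBox. The sketch below assumes the file follows the column layout of standalone InterProScan 5 TSV output (sequence ID, MD5, length, analysis, signature accession, signature description, start, stop, score, status, date, InterPro accession, InterPro description, GO terms, pathways); whether the OmicsBox export matches this layout exactly is an assumption, and the function name is hypothetical.

```python
import csv
import io


def parse_interproscan_tsv(tsv_text):
    """Yield one dict per match line of InterProScan-style TSV text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 11:
            continue  # skip blank or truncated lines
        # The last columns are optional; pad so indexing is safe.
        row = row + ["-"] * (15 - len(row))
        go_field = row[13]
        # GO terms are '|'-separated; drop any '(source)' suffix if present.
        go_terms = (
            []
            if go_field in ("-", "")
            else [term.split("(")[0] for term in go_field.split("|")]
        )
        yield {
            "sequence": row[0],
            "database": row[3],
            "signature": row[4],
            "start": int(row[6]),
            "end": int(row[7]),
            "interpro_id": None if row[11] in ("-", "") else row[11],
            "go_terms": go_terms,
        }


# Illustrative, hand-written line (not real output).
sample = (
    "seq1\tmd5\t120\tPfam\tPF00069\tProtein kinase domain\t5\t110\t1.2E-30\tT\t"
    "01-01-2024\tIPR000719\tProtein kinase domain\tGO:0004672|GO:0005524\n"
)
records = list(parse_interproscan_tsv(sample))
```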
Once InterProScan has finished, the results of each sequence can be viewed via the context menu (figure 5). The sequences turn violet if no other analysis has been executed before.
InterProScan can only be run if the sequence table contains the actual sequence information (loaded from a FASTA file). Take care with projects created from a BLAST XML file or from a loaded .annot file, since they may lack the sequence data.
To add sequences to the current OmicsBox project, see the section “Add sequences to existing OmicsBox project”.
Figure 2: Selection of Member Databases
Figure 3: Selection of Member Databases
Figure 4: Save InterProScan Results
Figure 5: InterProScan Results
Select InterProScan statistics to see how many sequences have or still lack IPS results, and how many sequences have GO terms resulting from InterProScan.
InterProScan Results: This chart reflects the effect of adding the GO terms retrieved through the InterProScan results (figure 7). When comparing this chart with the chart in figure 4 “Analysis Progress” the bar “Only with InterProScan” includes the number of sequences “With and Without IPS” in figure 7.
InterProScan Families Distribution: Bar chart representing the number of sequences that belong to a particular IPS family.
InterProScan Domains Distribution: Bar chart showing the number of sequences that belong to a particular IPS domain.
InterProScan Repeats Distribution: Bar chart reflecting the number of sequences that belong to a particular IPS repeat.
InterProScan Sites Distribution: Bar chart representing the number of sequences that belong to a particular IPS site.
InterProScan IDs Distribution: Bar chart showing the number of sequences that have been annotated with each InterProScan ID.
InterProScan IDs by Database: Pie chart showing the number of sequences per InterProScan ID for a particular InterProScan database. In figure 6 the Pfam database is selected.
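The distribution charts above all count sequences per entry. A minimal sketch of that tally, assuming matches are available as (sequence ID, InterPro ID) pairs and that a sequence is counted once per entry even if it has several matches to it (how OmicsBox counts repeated matches is not stated here; the function name is hypothetical):

```python
from collections import Counter


def sequences_per_entry(matches):
    """Count distinct sequences for each InterPro ID.

    matches: iterable of (sequence_id, interpro_id) pairs.
    """
    unique_pairs = set(matches)  # one count per sequence and entry
    return Counter(entry for _, entry in unique_pairs)


# Illustrative input: s1 matches the same entry twice.
example = [
    ("s1", "IPR000719"),
    ("s1", "IPR000719"),
    ("s2", "IPR000719"),
    ("s2", "IPR011009"),
]
distribution = sequences_per_entry(example)
```

Sorting the counter with `distribution.most_common()` gives the order in which a bar chart would typically list the entries.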
Figure 6: InterProScan Statistics Configuration Window
Figure 7: InterProScan Statistics
The GO terms obtained from InterProScan can now be added to the existing annotations based on the BLAST results. This option is available from the InterProScan submenu.
Once the merge has finished, a distribution chart is displayed in the Results menu showing the number of GO terms that have been added to (or confirmed in) the current annotation results.
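The distinction between added and confirmed GO terms can be illustrated with a short sketch. It treats the merge as a plain set union per sequence; OmicsBox may apply additional rules that are not described here, and the function name and data layout are hypothetical.

```python
def merge_go_terms(existing, interpro):
    """Merge InterProScan GO terms into existing annotations.

    existing, interpro: dicts mapping sequence ID to a set of GO IDs.
    Returns (merged, added, confirmed), where 'added' counts terms that
    were new for a sequence and 'confirmed' counts terms already present.
    """
    merged = {}
    added = confirmed = 0
    for seq_id in existing.keys() | interpro.keys():
        current = existing.get(seq_id, set())
        incoming = interpro.get(seq_id, set())
        merged[seq_id] = current | incoming
        added += len(incoming - current)
        confirmed += len(incoming & current)
    return merged, added, confirmed


# Illustrative annotations for three sequences.
existing = {"s1": {"GO:0005524"}, "s2": {"GO:0004672"}}
interpro = {"s1": {"GO:0005524", "GO:0004672"}, "s3": {"GO:0006468"}}
merged, added, confirmed = merge_go_terms(existing, interpro)
```

In this example one term for s1 is confirmed, while one term for s1 and one for s3 are added.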
Figure 8: Statistics after merging InterProScan to GO Annotation