Time Course Expression Analysis

Introduction

This tool performs time-course expression analysis of count data from RNA-seq experiments. It is based on maSigPro, a software package from the Bioconductor project that implements a two-step regression strategy to find genes with significant expression profile differences in time-course RNA-seq experiments. The application detects genomic features (e.g. genes) with significant temporal expression changes and significant differences between experimental groups.

Please cite maSigPro as:
Conesa A, Nueda MJ, Ferrer A, Talón M. maSigPro: a method to identify significantly differential expression profiles in time-course microarray experiments. Bioinformatics. 2006 May 1;22(9):1096-102. doi:10.1093/bioinformatics/btl056

Nueda MJ, Tarazona S, Conesa A. Next maSigPro: updating maSigPro bioconductor package for RNA-seq time series. Bioinformatics. 2014;30(18):2598-2602. doi:10.1093/bioinformatics/btu333

Figure 1: Time Course Expression Interface

Expression Data

The time course expression analysis application expects gene expression levels in the form of a count table. In OmicsBox, count tables can be generated via the Create Count Table application.

Count tables can also be imported from a text file. Go to transcriptomics → Load → Load RNA-Seq Count Table (Figure 2) and select your .txt file containing the count table.

Figure 2: Count Table File

Run Analysis

Go to transcriptomics → Differential Expression Analysis. If no count table project is open, the first wizard page (Figure 3) asks you to upload either a Count Table Project (.box file) or a Count Table File (.txt, .csv, or .tsv file). On the second wizard page, choose the “Time Course Expression Analysis” option.

If a count table is already loaded in OmicsBox (see the section above), it will be used for the analysis. In this case, the analysis can also be started from “Diff. Expression Analysis” in the Side Panel of the count table object, and the first wizard page asks you to select the type of differential expression analysis (Figure 4).

In the next pages, it is possible to specify different analysis parameters, which are divided into three different sections: Preprocessing Data (Figure 5), Experimental Design (Figure 6), and Analysis Options (Figure 7).


Figure 3: Input Wizard Page

Figure 4: Differential Expression Analysis Options Wizard Page

Preprocessing Data Page

  • Filter low count genes:

    • CPM Filter: Establish a filter to exclude genes with low counts across libraries, as those genes may interfere with the subsequent statistical approximations. Filtering is performed on a count-per-million (CPM) basis to account for differences in library size between samples (e.g. a CPM of 1 corresponds to a count of 6 in a sample with 6 million reads).

    • Samples reaching CPM Filter: Set the minimum number of samples in which a gene's CPM must be above the filter level, i.e. in which the gene counts as expressed. For example, if this value is set to 5, the gene must exceed the given CPM in at least 5 samples. The number of samples in the smallest group is usually taken (e.g. in an experiment with two replicates per condition or group, a gene should be expressed in at least two samples). Set the value to 0 if no filter is desired.

  • Normalization procedure:

    • Normalization Method: Normalization is an important step to make the samples comparable and to remove possible biases (such as sequencing depth bias) in count data. You can select the normalization method to be used:

      • TMM: Weighted trimmed mean of M-values. In this method, weights are obtained from the delta method on Binomial Data (this method is recommended).

      • RPKM: Reads Per Kilobase per Million mapped reads. This method corrects for gene length and the number of sequencing reads (gene length is required).

      • Upper-quartile: The 75% quantile of the counts in each library is used to calculate the scale factors for normalization.

      • None: No normalization method is applied.

    • Feature Length File: For RPKM normalization load a tab-delimited file (or ID-Value object) with two columns containing the name and length of each gene or genomic feature.

Figure 5: Preprocessing Data Page
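
The CPM filter and the RPKM formula described above can be illustrated with a minimal Python sketch. This is not the OmicsBox implementation: the function names are hypothetical, the library size is assumed to be the total read count of the sample, and the strict "above" comparison follows the wording of this page rather than any confirmed behavior.

```python
def cpm(count, library_size):
    """Counts per million for one gene in one sample."""
    return count * 1_000_000 / library_size


def passes_cpm_filter(counts, library_sizes, min_cpm=1.0, min_samples=2):
    """True if the gene's CPM is above min_cpm in at least min_samples samples.

    counts and library_sizes are given per sample, in the same order.
    The strict comparison (>) is an assumption based on the wording above.
    """
    expressed = sum(
        1 for c, n in zip(counts, library_sizes) if cpm(c, n) > min_cpm
    )
    return expressed >= min_samples


def rpkm(count, library_size, feature_length_bp):
    """Reads per kilobase of feature length per million mapped reads."""
    return count * 1e9 / (library_size * feature_length_bp)


# A count of 6 in a sample with 6 million reads corresponds to a CPM of 1.
print(cpm(6, 6_000_000))
```

With min_samples set to 0, every gene passes, which matches the "no filter" setting.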

Experimental Design Page

  • Experimental design file: Select your .txt file containing your experiment descriptors associated with each sample in tab-delimited format. As shown below, rows correspond to samples and columns to experimental descriptors. A column must contain the associated time points for each sample, and another column should show the assignment of samples to experimental groups. Make sure that the names in the first column of the experimental design table are exactly the same as the sample names in the count table header. If your experimental design file has fewer samples than the count table, only the samples contained in this file will be analyzed.

Sample	Time	Group
B12_A6_06hpi_1	6	A6
B12_A6_06hpi_2	6	A6
B12_A6_06hpi_3	6	A6
B12_A6_12hpi_1	12	A6
B12_A6_12hpi_2	12	A6
B12_A6_12hpi_3	12	A6
B12_A6_18hpi_1	18	A6
B12_A6_18hpi_2	18	A6
B12_A6_18hpi_3	18	A6
B12_A6_24hpi_1	24	A6
B12_A6_24hpi_2	24	A6
B12_A6_24hpi_3	24	A6
B12_K1_06hpi_1	6	K1
B12_K1_06hpi_2	6	K1
B12_K1_06hpi_3	6	K1
B12_K1_12hpi_1	12	K1
B12_K1_12hpi_2	12	K1
B12_K1_12hpi_3	12	K1
B12_K1_18hpi_1	18	K1
B12_K1_18hpi_2	18	K1
B12_K1_18hpi_3	18	K1
B12_K1_24hpi_1	24	K1
B12_K1_24hpi_2	24	K1
B12_K1_24hpi_3	24	K1
pps_A6_06hpi_1	6	A6
pps_A6_06hpi_2	6	A6
pps_A6_06hpi_3	6	A6
pps_A6_12hpi_1	12	A6
pps_A6_12hpi_2	12	A6
pps_A6_12hpi_3	12	A6
pps_A6_18hpi_1	18	A6
pps_A6_18hpi_2	18	A6
pps_A6_18hpi_3	18	A6
pps_A6_24hpi_1	24	A6
pps_A6_24hpi_2	24	A6
pps_A6_24hpi_3	24	A6
pps_K1_06hpi_1	6	K1
pps_K1_06hpi_2	6	K1
pps_K1_06hpi_3	6	K1
pps_K1_12hpi_1	12	K1
pps_K1_12hpi_2	12	K1
pps_K1_12hpi_3	12	K1
pps_K1_18hpi_1	18	K1
pps_K1_18hpi_2	18	K1
pps_K1_18hpi_3	18	K1
pps_K1_24hpi_1	24	K1
pps_K1_24hpi_2	24	K1
pps_K1_24hpi_3	24	K1

Figure 6: Experimental Design Page
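
Before loading the files, it can help to check that the sample names in the design file match the count table header. The following Python sketch is only an illustration under stated assumptions: the helper names are hypothetical, the design file is assumed to be tab-delimited with the sample names in the first column, and the count table header is passed in as a plain list of sample names.

```python
import csv
import io


def read_design(text):
    """Parse a tab-delimited design table into {sample: {descriptor: value}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    sample_col = reader.fieldnames[0]  # the first column holds the sample names
    return {
        row[sample_col]: {k: v for k, v in row.items() if k != sample_col}
        for row in reader
    }


def match_samples(design, count_header):
    """Split design samples into those found in the count table and those missing."""
    header = set(count_header)
    found = [s for s in design if s in header]
    missing = [s for s in design if s not in header]
    return found, missing


# Small excerpt in the same layout as the example table above.
design_text = (
    "Sample\tTime\tGroup\n"
    "B12_A6_06hpi_1\t6\tA6\n"
    "B12_A6_06hpi_2\t6\tA6\n"
    "B12_K1_06hpi_1\t6\tK1\n"
)
# Hypothetical count table header with one extra sample.
count_header = ["B12_A6_06hpi_1", "B12_A6_06hpi_2", "B12_K1_06hpi_1", "B12_K1_06hpi_2"]

design = read_design(design_text)
found, missing = match_samples(design, count_header)
print(found, missing)
```

Any name reported as missing points to a mismatch between the first column of the design file and the count table header. Count table samples that are absent from the design file, such as the fourth one here, are simply left out of the analysis.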

Analysis Options

  • Design Type: Choose the design type to adjust the analysis.

    • Single Series Time Course: Detects genes that show significant expression changes over time. You only have to select the time factor of your experimental design in “Targets”.

    • Multiple Series Time Course: Finds genes with significant temporal expression changes and significant differences between experimental groups. You have to set the time and experimental factors, and select the control condition of your experimental design in “Targets”.

  • Statistical Settings:

    • Significance Level (Alfa): The level of FDR control used for variable selection in the stepwise regression.

    • R-squared Cutoff: Cutoff value for the R-squared of the regression model.

  • Visualization of Results:

    • Number of Clusters: Establish a number of clusters to group genes by similar expression profiles.

    • Clustering Method: Choose a clustering method for data partitioning.

      • Hierarchical Clustering: Performs a hierarchical cluster analysis using a set of dissimilarities for the features being clustered.

      • K-Means Clustering: Divides the points into K clusters such that the sum of squared distances from each point to its assigned cluster center is minimized.

      • Model-Based Clustering: Fits Gaussian mixture models by EM, initialized by hierarchical clustering, and selects the optimal model according to BIC. This method computes an optimal number of clusters. Keep in mind that this method requires more time.

Figure 7: Analysis Options
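
The Significance Level is described above as a level of FDR control. As a rough illustration of what an FDR adjustment does, the following Python sketch applies the Benjamini-Hochberg procedure to a list of p-values. That this particular procedure is the one applied by the tool is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes, chosen only for illustration.
p_values = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(p_values)
significant = [a <= 0.05 for a in adjusted]
print(adjusted, significant)
```

A gene would then be kept when its adjusted p-value does not exceed the chosen significance level.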

Results

Once the input counts have been processed and analyzed via the “Time Course Expression Analysis” tool, a new tab opens with the statistical results of the stepwise regression (Figure 8):

  • Tags: Indicate the list/s of significant genes in which the feature appears (R-squared ≥ R-squared Cutoff).

    • Red tags: Lists of significant genes for each experimental group (only available in “Multiple Series Time Course”).

    • Blue tags: List of significant genes for each variable of the regression model.

  • Name: Feature name.

  • P-Value: If significant, the gene’s expression changes over time.

  • R-squared: How well the data fit the model obtained for that gene’s expression.

  • P-Value_beta0: If significant, the gene’s expression at time point 0 is different from 0.

  • P-Value_Time: If significant, the gene’s expression follows a linear trend, especially at the beginning. That is, it increases or decreases linearly.

  • P-Value_Time2: If significant, the gene expression profile has a curvature. That is, the expression behavior changes at some point (e.g. the expression first increases and then decreases).

For the “Multiple Series Time Course”, additional p-values are calculated, one for each control vs condition combination. For two conditions named “A” (control) and “B”:

  • P-Value_BvsA: If significant, the gene’s expression at time point 0 in condition B is different from that in A.

  • P-Value_TimexB: If significant, the linear trend of the gene’s expression differs between conditions A and B. That is, the expression increases or decreases more in one condition than in the other.

  • P-Value_Time2xB: If significant, the curvature of the expression profile differs between A and B (e.g. in condition A the expression increases and then decreases, but in condition B it never decreases).
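
The terms behind these p-values can be read as the variables of a quadratic regression model with a dummy variable for condition B. The Python sketch below only evaluates such a model for given coefficients. It does not reproduce the fitting or the stepwise variable selection, and the function names and coefficient values are hypothetical.

```python
def design_row(time, in_group_b):
    """Regression variables for one sample (quadratic model, conditions A and B).

    Keys mirror the p-value column names above; A is the control condition.
    """
    b = 1 if in_group_b else 0  # dummy variable: 0 for control A, 1 for B
    return {
        "beta0": 1,                # intercept: control expression at time 0
        "BvsA": b,                 # shift of B relative to A at time 0
        "Time": time,              # linear trend in the control
        "TimexB": time * b,        # difference in linear trend for B
        "Time2": time ** 2,        # curvature in the control
        "Time2xB": time ** 2 * b,  # difference in curvature for B
    }


def predict(coefficients, time, in_group_b):
    """Sum of coefficient * variable; terms without a coefficient count as 0."""
    row = design_row(time, in_group_b)
    return sum(coefficients.get(name, 0.0) * value for name, value in row.items())


# Hypothetical coefficients for one gene, chosen only for illustration.
coefficients = {"beta0": 2.0, "BvsA": 1.5, "Time": 0.5, "Time2xB": -0.01}

for t in (0, 10):
    print(t, predict(coefficients, t, False), predict(coefficients, t, True))
```

In this example the two conditions differ at time point 0 through the BvsA term, and only condition B bends over time through the Time2xB term. Terms that are absent from the coefficients contribute nothing to the model.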

Only the genes that have passed the established Significance Level are shown in the new tab. For further details please refer to the maSigPro User's Guide.

Some p-values may be missing. A missing p-value means that the corresponding term is not significant for that gene’s expression profile. The term is therefore not used to build the gene’s expression model, and its value is not stored.

Results can be saved as a TC Results object. Note that it is not possible to run the analysis on this object. To do so, you have to open the Count Table object.

Figure 8: Table Viewer

A result page shows a summary of the time-course expression analysis results, including the clusters of features with similar expression profiles (Figure 9). Go to Side Panel → Result Summary to view the result summary and to export it as a PDF.

During the Time Course Expression Analysis, raw counts are transformed according to the normalization method selected in the analysis configuration. Go to Export Normalized Counts (sidebar) to export normalized counts to a tabular text file. 

Figure 9: Summary Report

Charts and Statistics

Several statistical charts can be generated for a global visualization of the results. These charts can be found under the Side Panel of the TimeCourse Results viewer.

MDS Plot

Generates a two-dimensional scatterplot in which the distances represent the typical log2 fold changes between samples. You can select an experimental factor by which you want to color the MDS graphic (Figure 10).

Figure 10: MDS Plot
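
One common way to define a "typical log2 fold change" distance between two samples is the root mean square of the largest log2 fold changes, as used by the MDS plot in edgeR. Whether OmicsBox computes the distance in exactly this way is an assumption. The function name and the example values below are hypothetical.

```python
import math


def leading_logfc_distance(log2_expr_a, log2_expr_b, top=500):
    """Root mean square of the `top` largest log2 fold changes between two samples.

    Both inputs hold log2 expression values for the same genes, in the same order.
    """
    squared = sorted(
        ((a - b) ** 2 for a, b in zip(log2_expr_a, log2_expr_b)), reverse=True
    )
    leading = squared[:top]  # keep only the genes that differ the most
    return math.sqrt(sum(leading) / len(leading))


# Hypothetical log2 expression values for three genes in two samples.
sample_a = [1.0, 2.0, 3.0]
sample_b = [1.0, 4.0, 0.0]
distance = leading_logfc_distance(sample_a, sample_b, top=2)
print(distance)
```

The pairwise distances between all samples would then be passed to a multidimensional scaling step to obtain the two plot coordinates.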

Venn Diagram

Diagram showing all possible logical relations between a finite collection of different feature sets (Figure 11). You can choose between two types of Venn Diagram (“Pairwise” or “Triple”), and select the sets of significant genes to display.

Figure 11: Venn Diagram

Expression Profile by Gene

Graph of the expression profile over time for a particular gene (Figure 12). To display it, right-click the chosen gene and select the “Show Expression Profile” option.

Figure 12: Gene Expression Profile

Experiment-wide Expression Profiles

Plot showing the expression levels across samples for each cluster of genes (Figure 13).

Figure 13: Experiment-wide Expression Profile

Summary Expression Profiles

Plot showing the median expression level of each cluster of genes across time (Figure 14).

Figure 14: Summary Expression Profile
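
A summary profile of this kind can be sketched in Python as the median expression per cluster and time point. How the tool pools replicates and genes is not stated on this page, so the pooling used here (all genes of a cluster and all samples of a time point together) is an assumption, as are the function name and the data layout.

```python
from collections import defaultdict
from statistics import median


def median_profiles(expression, clusters, sample_times):
    """Median expression per (cluster, time point).

    expression:   {gene: [value per sample]}
    clusters:     {gene: cluster id}
    sample_times: [time point of each sample], same order as the values
    """
    pooled = defaultdict(list)
    for gene, values in expression.items():
        for value, time in zip(values, sample_times):
            pooled[(clusters[gene], time)].append(value)
    return {key: median(values) for key, values in pooled.items()}


# Hypothetical normalized values: two replicates at each of two time points.
expression = {
    "g1": [1, 3, 10, 12],
    "g2": [2, 4, 20, 22],
    "g3": [5, 5, 5, 5],
}
clusters = {"g1": 1, "g2": 1, "g3": 2}
sample_times = [6, 6, 12, 12]

profiles = median_profiles(expression, clusters, sample_times)
print(profiles)
```

Plotting the values of each cluster against the time points gives one summary line per cluster.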