RNASeq and differential gene expression analysis

Differential gene expression analysis, commonly abbreviated as DGE (or DG) analysis, is the analysis and interpretation of differences in the abundance of gene transcripts within a transcriptome according to phenotype or experimental conditions. The goal of differential expression testing is to determine which genes are expressed at different levels between conditions. These genes can offer biological insight into the processes affected by the condition(s) of interest.

Because the number of genes that are differentially expressed between samples may be high, a method is needed to interpret so many expression changes. A common approach is to group genes by category and ask which categories are enriched in one sample compared to another; for example, whether a breast cancer sample has more regulated genes annotated to the “cell cycle genes group” than a control sample. Genes can be grouped based on their annotation in a number of sources, one of these being the Gene Ontology (GO) resource.

The Gene Ontology (GO) resource is the world’s largest source of information on the functions of genes. This knowledge is both human-readable and machine-readable, and is a foundation for computational analysis of large-scale molecular biology and genetics experiments in biomedical research. The GO defines concepts/classes used to describe gene function, and relationships between these concepts.
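As a rough illustration of how enrichment of a gene category can be scored, the sketch below applies a one-sided hypergeometric (over-representation) test. This is an assumption made for illustration only: the text above does not state which enrichment statistic the pipeline uses, and the function name and all gene numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_annotated, n_selected, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap).

    n_universe  - all genes tested (hypothetical background set)
    n_annotated - background genes annotated to the GO term
    n_selected  - differentially expressed genes
    n_overlap   - differentially expressed genes annotated to the term
    """
    largest_possible = min(n_annotated, n_selected)
    total_draws = comb(n_universe, n_selected)
    # Sum the probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, largest_possible + 1)
    )
    return tail / total_draws


# Hypothetical numbers: 20,000 genes, 500 annotated to "cell cycle",
# 200 differentially expressed genes, 20 of them annotated to "cell cycle".
p = enrichment_p_value(20_000, 500, 200, 20)
print(f"enrichment p-value: {p:.3g}")
```

Because many GO terms are tested at once, p-values obtained this way would also need a multiple-testing correction before interpretation.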

Our DGE lab & data analysis pipeline therefore includes:

RNA sample processing and sequencing
Raw data quality control, trimming of bases with low quality and clipping of adaptor sequences
Mapping to reference sequence (genome or transcriptome)
Counting reads + Normalization
Gene expression analysis using DESeq2 and edgeR packages
Gene annotation and gene ontology (GO) analysis
Evaluation and summary, data analysis report

Should you have raw data from another provider already and need only help with its analysis, we are at your disposal too.

Note: RNA-seq data analysis tools typically provide lists of genes that differ between two or more sample sets. RNA-Seq lets you look not only at changes in gene expression over time or at differences in gene expression between groups or treatments; you can also use it to investigate alternatively spliced transcripts, post-transcriptional modifications, gene fusions, and mutations/SNPs. If these are your scientific goals and you need sample or data analysis, do not hesitate to contact us.

Required read depth (data amount)

The amount of data needed per sample can be estimated using the concept of sequencing depth. For example, given that the human transcriptome accounts for about 3% of the human genome (3 Gb), 90 Mb of data corresponds to 1× depth and would, on average, cover each nucleotide of interest once. However, some genes are highly expressed while others are rarely expressed, so even 1000× depth would provide only an even chance of sequencing a transcript that is present at 1 in a thousand in a cell.
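The depth arithmetic can be sketched as follows. The genome size and the 3% share are taken from the text; the function names are hypothetical, and the run of 30 million reads of 150 bases is only an illustrative input.

```python
HUMAN_GENOME_BP = 3_000_000_000  # 3 Gb, as stated in the text
TRANSCRIPTOME_PERCENT = 3        # transcriptome share of the genome, from the text


def target_size_bp(genome_bp=HUMAN_GENOME_BP, percent=TRANSCRIPTOME_PERCENT):
    """Size of the sequence of interest, in bases."""
    return genome_bp * percent / 100


def mean_depth(n_reads, read_length, target_bp):
    """Average number of times each target base is covered."""
    return n_reads * read_length / target_bp


target = target_size_bp()
print(f"1x depth needs {target / 1e6:.0f} Mb of data")
# Illustrative run: 30 million reads of 150 bases each.
print(f"mean depth: {mean_depth(30_000_000, 150, target):.0f}x")
```

Under these assumptions the run averages 50× over the transcriptome. This is a mean only: it says nothing about how unevenly reads are distributed between highly and rarely expressed genes.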

The amount of data needed therefore depends on the library preparation strategy (ribodepletion, polyA selection, …), the source organism and the size of its transcriptome / genome and the genes we want to target (whether we assume high or low levels of their transcripts).

At least 30 million reads per sample are generally recommended for DGE of human samples; a pilot study might be needed for a non-model organism.

Laboratory processing

Should you request wetlab processing of your samples, you must provide RNA or sequencing libraries. If you need data analysis only, please scroll down.

For wetlab processing you must perform RNA isolation. Then, in our lab your samples will be processed as follows:

Sample QC
Library preparation - Total RNA or individual populations of RNA, such as mRNA or small RNA, are captured, converted to cDNA and subjected to analysis. rRNA as well as abundant mRNA transcripts such as globin can be targeted for depletion in order to focus sequencing throughput on the RNA of interest. rRNA depletion can be performed for the following organisms:
- Human, mouse, rat rRNA
- Bacterial and yeast rRNA
- Plant rRNA
- Fish rRNA
- Caenorhabditis elegans and Drosophila melanogaster rRNA
Quantification of libraries by qPCR
Sequencing on Illumina NovaSeq 6000, paired-end, 150 bp (unless agreed/requested otherwise)
Data evaluation

Results guarantee

The sequences obtained will be sorted by index combination into files representing individual samples. Sequencing quality indicators such as read count and length, Phred score, %GC, duplication level, etc. will then be analyzed.

As output you will receive data in FASTQ format divided into files according to individual samples.
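To make the quality indicators concrete, here is a minimal sketch that computes read count, mean read length, %GC and mean Phred score from FASTQ text. It assumes four-line FASTQ records and Phred+33 quality encoding; the function name is hypothetical, duplication level is not covered, and this is not the tool used in the service.

```python
import io


def fastq_summary(handle, phred_offset=33):
    """Summarize a FASTQ stream made of 4-line records (assumed Phred+33)."""
    n_reads = total_bases = gc_bases = quality_sum = 0
    while True:
        header = handle.readline()
        if not header.strip():       # end of file or trailing blank line
            break
        seq = handle.readline().strip()
        handle.readline()            # '+' separator line, ignored
        qual = handle.readline().strip()
        n_reads += 1
        total_bases += len(seq)
        gc_bases += sum(base in "GCgc" for base in seq)
        quality_sum += sum(ord(ch) - phred_offset for ch in qual)
    if total_bases == 0:
        raise ValueError("no reads found")
    return {
        "reads": n_reads,
        "mean_length": total_bases / n_reads,
        "percent_gc": 100 * gc_bases / total_bases,
        "mean_phred": quality_sum / total_bases,
    }


# Two made-up reads: 'I' encodes Q40 and '!' encodes Q0 under Phred+33.
example = io.StringIO("@r1\nGATC\n+\nIIII\n@r2\nGGCC\n+\n!!!!\n")
print(fastq_summary(example))
```

For real files the same function could be given an opened file handle instead of the in-memory example.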

Data analysis

Requirements to perform data analysis:

It is important to understand that a gene is declared differentially expressed if an observed difference (change in read counts) between two experimental conditions is statistically significant, that is, if the difference is greater than what would be expected from random variation alone. DGE is therefore a statistical technique and, as such, must meet basic statistical requirements regarding the number of samples/groups to compare. To successfully perform DGE on your data, these are the “musts”:

Your project must be designed to have at least 2 groups of samples to compare with at least 3 biological replicates per sample/group. The smallest data set therefore is 2 groups (samples) with 3 biological replicates each = 6 data sets.
You must specify the control and treated group.
Genome or transcriptome reference sequence (FASTA format) and genome annotation (GFF/GTF format) must be available, preferably from public repositories such as NCBI, ENSEMBL, UCSC, etc. The reference sequence can be a genome sequence from the same organism as the source RNA or a closely related species. Reference sequences can also be closely related transcriptome sequences.
If higher sensitivity and specificity of the experiment are needed, we recommend using RNA spike-in transcripts. Data analysis can also be performed when a reference sequence is missing or when there are fewer than 3 replicates per sample. In all these cases please contact us before ordering.
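The design requirements above can be checked before submitting a project. The sketch below is an illustration only; the function name, sample names and group labels are hypothetical.

```python
from collections import Counter


def check_design(sample_groups, control_group, min_groups=2, min_replicates=3):
    """Return a list of problems; an empty list means the design passes.

    sample_groups - dict mapping sample name -> group label
    control_group - label of the group to use as the baseline
    """
    replicates = Counter(sample_groups.values())
    problems = []
    if len(replicates) < min_groups:
        problems.append(f"only {len(replicates)} group(s), need {min_groups}")
    for group, n in sorted(replicates.items()):
        if n < min_replicates:
            problems.append(
                f"group '{group}' has {n} replicate(s), need {min_replicates}"
            )
    if control_group not in replicates:
        problems.append(f"control group '{control_group}' not found")
    return problems


# Smallest valid design: 2 groups x 3 biological replicates = 6 data sets.
design = {
    "S1": "control", "S2": "control", "S3": "control",
    "S4": "treated", "S5": "treated", "S6": "treated",
}
print(check_design(design, control_group="control"))
```

An empty list means the minimum requirements are met; any message in the list points to a design that should be discussed before ordering.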

Data analysis outputs:

Trimmed data in FASTQ format, MultiQC report
Aligned data in BAM format, Qualimap reports
Matrix table with transcript abundance
Rescaled data according to the TMM normalization factors
Expression values for all transcripts
Volcano and MA plots in PDF format
Count of differentially expressed genes
Correlation heat map of all samples in PDF format
Heat map of differentially expressed genes
GO terms table + graphical output, web link to GO results
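As an illustration of what rescaling by TMM normalization factors means, the sketch below converts raw counts to counts per million using an effective library size (raw library size multiplied by the normalization factor), which is the convention used by edgeR. The factors in the example are made up rather than computed by TMM, and the function name and gene names are hypothetical.

```python
def normalized_cpm(counts, library_sizes, norm_factors):
    """Counts per million, using effective library size = size * factor.

    counts        - dict mapping gene -> list of raw counts, one per sample
    library_sizes - total read count per sample
    norm_factors  - one normalization factor per sample (e.g. from TMM)
    """
    effective = [size * f for size, f in zip(library_sizes, norm_factors)]
    return {
        gene: [count * 1_000_000 / eff for count, eff in zip(row, effective)]
        for gene, row in counts.items()
    }


# Hypothetical two-sample example with made-up normalization factors.
raw = {"geneA": [100, 200], "geneB": [50, 50]}
cpm = normalized_cpm(raw, library_sizes=[1_000_000, 2_000_000],
                     norm_factors=[1.0, 0.5])
print(cpm)
```

A factor below 1 shrinks the effective library size and therefore raises the rescaled values for that sample, as happens for the second sample here.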

Course or workshop

If you are interested in learning how to analyze the data, visit our regularly organized workshop!
Beginners may also consider our 2-day NGS introductory course.

Sample requirements

Follow our Sample submission guidelines. To perform data analysis, at least 6 samples / data sets are required.

Please note that the success of the wetlab analysis depends very much on the integrity of the RNA you provide! Degraded RNA can compromise standard library preparation. If you are not able to isolate RNA with a high RIN (>7), consider ordering 3' mRNA library prep (QuantSeq/UMI) instead, where the quality requirements are not as strict and degraded RNA samples can be analyzed.

How to order the service

The analysis can be ordered and commissioned online, either including wet lab sample processing or as data analysis only. If you intend to order sample processing by Illumina technology, select Illumina - À la carte sequencing.

If you intend to order data analysis only, select Data analysis services.
