Introduction

Course 2.1.2 involves two separate projects performed on a single published data set. First, we perform a Genomics analysis similar to the case study from module 2.1.1. Then, we take the result of the mapping step, process it into read counts and perform a subsequent Transcriptomics analysis.

1.1 2.1.2 Genomics

For this Genomics analysis we are going to look at variants between the chosen data and a reference genome. One important aspect during the analysis is assessing quality, both of the input data set and of the output of intermediate steps. Once a set of variants has been determined, they can be annotated and finally linked to the biology underlying the original experiment, either reproducing the published findings or aiming for novel insights.

1.1.1 Project Deliverables

  • Project Proposal: A short written proposal of the project clearly describing the data set, the research question and the methods + tools to be used.
  • Genomics pipeline: The genomics analysis pipeline is well documented in one or more lab journals and contains the correct tools based on the research question and data set.
  • Presentation: The research results will be presented in an attractive manner and at a level appropriate for the target audience.

1.1.2 Data Processing

As in the 2.1.1 NGS & Mapping course, we will go through the same steps to process the data (i.e. quality control, trimming, mapping, variant calling, etc.). This time, however, we are not using a workflow manager such as Galaxy, but execute all tools on the command line. There are two reasons for this: the number and size of the input files are far larger than in the previously used data sets, and the Galaxy server is not capable of handling that many large files. Also, gaining proper terminal experience is very valuable.

Note on writing the pipeline: all steps need to be documented in RMarkdown documents. Each step must be introduced with a short description of the tool used and its version, followed by the command used to execute the tool. As these are terminal commands, we use code cells with the bash language. All code cells must have the option eval=FALSE set; this prevents the code from being executed when knitting the document.
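
A minimal documented step in the lab journal could look like the fragment below (the tool version and file paths are examples, not prescribed values):

````markdown
### Quality control with FastQC

Raw read quality was assessed using FastQC (v0.12.1).

```{bash fastqc-raw, eval=FALSE}
fastqc --outdir qc_raw/ raw_data/*.fastq.gz
```
````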

Furthermore, it is required to run all commands on the RStudio server provided at https://bioinf.nl/rstudio. This RStudio server is hosted on our assemblix server, which is capable of both accessing the data and running tools on large data sets; the workstations we provide in the classrooms are not. This server also has a number of tools pre-installed (FastQC, Trimmomatic, the BWA and STAR mappers, etc.). Lastly, running the tools from an RMarkdown document on the RStudio server forces you to document the commands, which simply using ssh does not.

Note on storing (intermediate) data: all data relevant to this project must be stored in the /students/2024-2025/Thema05/ directory. In there, create a project directory for your team and create sub-directories as you go. Your home folder has a hard limit of 25GB and is therefore not sufficient for storing these project files. Use commands such as chmod and chown to allow your project partners to access the data too.
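
Setting up a shared team directory could look like the sketch below. The team name is a placeholder, and for illustration the example uses a temporary directory as a stand-in for /students/2024-2025/Thema05/ so it can run anywhere:

```shell
# Stand-in for /students/2024-2025/Thema05 so the example runs anywhere;
# on the server, use the real base directory instead.
BASE="$(mktemp -d)"
PROJECT="$BASE/team_example"   # placeholder team name

# Create the project directory with sub-directories per pipeline step.
mkdir -p "$PROJECT"/{raw_data,trimmed,mapped,variants}

# Give group members read/write access (X: execute only on directories),
# and set the setgid bit so new files inherit the directory's group.
chmod -R g+rwX "$PROJECT"
chmod g+s "$PROJECT"

ls -ld "$PROJECT"
```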

Note on processing large amounts of data: building correct terminal commands to execute for instance a trimming step on a large number of files can be quite challenging. It is therefore required that test-data is used for all steps. Reasons for this are:

  • Spotting errors quickly instead of having to wait several hours to see an error message
  • Inspecting resource usage (CPU, memory, disk space) to prevent overloading the system
    • Some steps can be very memory intensive (over 1 TB on large data sets!) and can crash the system
  • Creates a small output set that can be used as a test set for the next step
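
Creating such a test set from a FASTQ file is straightforward, since every read occupies exactly four lines: taking the first 4,000 lines yields 1,000 reads. File names are placeholders; for a self-contained example, a tiny dummy FASTQ is generated first:

```shell
# Generate a tiny dummy FASTQ (2 reads) so this example is self-contained;
# in practice sample_R1.fastq.gz is a multi-GB input file.
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGG\n+\nIIII\n' | gzip > sample_R1.fastq.gz

# Take the first 1,000 reads (4 lines per read = 4,000 lines) as test data.
gunzip -c sample_R1.fastq.gz | head -n 4000 | gzip > test_R1.fastq.gz

# Sanity check: count the lines in the test set
# (8 for the dummy file, since it only holds 2 reads).
gunzip -c test_R1.fastq.gz | wc -l
```

For paired-end data, remember to subset both the R1 and R2 files so the read pairs stay in sync.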

Note on the use of GATK: The Genome Analysis Toolkit (GATK) is an often used set of tools for variant calling. Many publications cite the use of GATK but unfortunately do not include all needed details to reproduce the results. Due to the complexity of GATK we do not recommend its use, unless the publication is very clear on the exact workflow and settings used. Prepare to spend a lot of time on the official GATK website browsing manuals before you can use it effectively.

1.1.3 Finding a Data Set

Finding a suitable data set can be challenging as there are a few requirements such a data set needs to meet:

  • It needs to have both a DNA- and RNA-Seq data set available.
  • The data set needs to be published (publication should be open access)
  • For the RNA-Seq part, there is a minimum number of samples required:
    • At least two sample groups with a minimum of 3 samples (replicates) per group
    • See appendix chapter A1 for examples of valid experimental setups; that chapter also contains tips on what to do with them (mostly relevant for Transcriptomics).

Data is available from the NCBI Sequence Read Archive (SRA) and the Gene Expression Omnibus (GEO).
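
Once a run accession has been selected, the raw reads can be retrieved from the SRA; one common option is the sra-tools package (whether it is pre-installed on the server is not guaranteed, and the accession below is a placeholder). As with all pipeline steps, these commands belong in an eval=FALSE chunk:

````markdown
```{bash download-reads, eval=FALSE}
prefetch SRR0000000
fasterq-dump SRR0000000 --split-files --outdir raw_data/
gzip raw_data/*.fastq
```
````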

The proper way to search for a data set suitable for both the Genomics and Transcriptomics projects is by creating a query that searches for both: ("expression profiling by high throughput sequencing"[DataSet Type]) AND "genome variation profiling by high throughput sequencing"[DataSet Type]. Whether this selection actually contains anything of interest is up to you; if you cannot find something interesting, use a different search strategy. One option is to search the GEO for RNA-Seq experiments where one of the groups is preferably a mutant.

Note that for the Genomics part, not all available samples have to be processed. Unlike transcriptomics, it is not common to have a case/control experimental setup, but rather a single condition or group with possibly a few replicates. If the data set does include many samples, it is recommended to select a subset of samples to process (do discuss the reasoning behind this selection).

1.2 2.1.2 Transcriptomics

The use of RNA-Sequencing techniques for measuring gene expression is relatively new and largely replaces microarrays, though in some cases microarrays are still used. Gene expression data gives valuable insight into the workings of cells under certain conditions. Especially when comparing, for instance, healthy and diseased samples, it can become clear which genes are causal for or influenced by a specific condition. Finding the genes of interest (genes showing differing expression across conditions, called Differentially Expressed Genes (DEGs)) is the goal of this project.

While there is no gold standard for analyzing RNA-sequencing data sets, as there are many tools (all manufacturers of sequencing equipment also deliver software packages), we will use proven R libraries for processing, visualizing and analyzing publicly available data sets. This manual describes the steps often performed in a transcriptomics experiment. Use it as a very general guideline to process the data accompanying the chosen published research. Note that this manual was originally written for processing count data and therefore does not describe the processing of NGS read files into count data; that is part of the Genomics analysis performed in this course. RNA-Seq counts describe the number of reads mapped to a gene, which corresponds to the relative number of transcripts (mRNA sequences) of that gene present in the cell at the time of sampling.
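
As an illustration, such count data is simply a table of genes by samples, where each cell holds the number of reads mapped to that gene in that sample (the gene names and values below are made up):

```
gene     control_1  control_2  control_3  treated_1  treated_2  treated_3
GENE_A         523        498        531       1204       1187       1256
GENE_B         871        902        866        845        890        859
GENE_C          12          9         15          0          2          1
```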

1.2.1 Project Deliverables

  • Project Proposal: A short written proposal of the project clearly describing the data set, the research question and the methods + libraries to be used.
  • Lab Journal: The lab journal is readable, complete and aimed at the reproducibility of the research. Relevant visualisations are included and statistical methods are performed to support the biological interpretation of the results.

1.3 Lab Journal

As you may know from previous projects, it is essential to keep a proper journal detailing every step you have taken during the analysis. This journal is to be kept in an RMarkdown file showing which steps have been taken in the analysis of the data set. Once the project is completed, this document should be knitted into a single PDF file containing text detailing the steps and any decisions you've made, R code (always visible!) and the resulting output/images.

Notes: As general advice, do not wait with knitting the whole document until the project is done, as knitting is very prone to errors and fixing these in a large document is not easy. Give each code chunk the proper attributes, including at minimum a name; this helps spot errors during knitting, as that process mentions which chunk is being processed. Note that chunk names must be unique. Also try to make proper use of chapters and sections, and include an (optional) table of contents.

1.4 Schedule

The Genomics and Transcriptomics parts both have their own schedule, as listed below.

1.4.1 Genomics

Empty rows mean that the previous topic or task will most likely take more than one session.

Week  Lesson  Subject
1     1       Finding a suitable experiment
      2       Statistics - recap and experimental setup
      3       Writing a Project Proposal
2     1       Presenting Project Proposal
      2       NGS Quality Control
      3
3     1       Read Mapping and QC
      2
      3       Variant Calling
4     1       Annotation
      2
      3
      4
5     1       Visualisation
      2

1.4.2 Transcriptomics

Week  Lesson  Subject
5     1       Writing a Project Proposal
      2
6     1       Presenting Project Proposal
      2       Read Mapping and Quantification
      3
      4       Statistics (distributions, normalization, PCA)
7     1       Exploratory Data Analysis (chapter 3)
      2
      3       Statistics (batch effects, regression, linear models)
      4       Finding Differentially Expressed Genes (chapter 4)
      5       Statistics (Enrichment analysis)
8     1       Data Analysis and Visualization (chapter 5)
      2
      3
      4
      5
      6       Hand in Lab Journal

1.5 Literature

The following (online) documents can be used throughout this course: