Introduction

1.1 Capstone Project - Analysis of Gene Expression

The use of RNA-Sequencing techniques for measuring gene expression is relatively new and replaces microarrays, though in some cases microarrays are still used. Gene expression data gives valuable insights into the workings of cells in certain conditions. Especially when comparing for instance healthy and diseased samples it can become clear which genes are causal or under influence of a specific condition. Finding the genes of interest (genes showing differing expression accross conditions, called the Differentially Expressed Genes (DEGs)) is the goal of this project.

While there is no golden standard for analyzing RNA-sequencing datasets as there are many tools (all manufacturers of sequencing equiptment also deliver software packages) we will use R combined with proven libraries for processing, visualizing and analyzing publically available datasets. While in some cases you are allowed to use the actual raw (read) data that is available, it is highly recommended to use the pre-processed data which often is a table with a count value for each gene. This count is the number of reads that was mapped to that gene which corresponds to the relative number of transcripts (mRNA sequences) of that gene present in the cell at the time of sampling.

1.2 Project Deliverables

The end products of this course consist of two deliverables; a PDF output file from an RMarkdown ‘lab journal’ where you have logged all steps performed to get to the end result and a final report in the form of a short article. This chapter briefly describes the requirements and contents of these products and ends with instructions on how to use an RMarkdown template for writing the article.

1.3 Project Schedule

The aim is to keep to the below schedule during this course. Use the first two weeks to see if you need to focus more on one of the points below (depending on your dataset) and discuss changes to the planning with your teacher.

  • Find a public experiment of interest [week one]
    • Using online resources (sections 2.1, 2.2)
  • Data Gathering and Literature Research [week two]
    • Make a final project choice
    • Retrieve the accompanying publication
    • Starting with Exploratory Data Analysis (chapter EDA)
  • Data Analysis [weeks three, four and five]
    • Finalizing Exploratory Data Analysis
    • Discovering Differentially Expressed Genes (chapter 4)
    • Data Annotation (optional; appendix a2)
    • Techniques used: R with bioconductor, the EdgeR and/ or DESeq2 packages, RMarkdown
  • Result Analysis [week five and six]
    • Analyzing and Visualizing your results (chapter 5)
    • Techniques used: clustering, pathway analysis, gene-enrichment analysis
  • Finalizing analysis and start writing the final report (article) [weeks 7 and 8]

1.4 Grading

The final grade consists of a weighted average of the work done for the lab journal containing all performed steps, their code and outputs (70%) and the article report (30%). A grade >= 5.5 will give a total of 8 EC. The rest of this chapter explains the expected contents for the three graded products (each of these elements is explained in greater detail in other chapters).

1.5 Lab Journal

As you know from previous projects and most likely from working in the laboratory, it is essential to keep a proper lab journal detailing every step you have done during the experiment. Here, the log is to be kept in an R markdown file, showing which steps have been taken in the analysis of the data set. This markdown should be knitted into a single PDF-file once the project is completed thus containing text detailling the steps and any decisions you’ve made, R-code (always visible!) and their resulting output/ images.

Notes: As a general advice; do not wait with knitting this whole document until the project is done as knitting is very prone to errors and trying to fix these in a large document is not easy. Give each code chunk the proper attributes, including a name at the minimum. This helps spot errors during knitting as that process mentiones which chunk has been processed. Note that chunk-names must be unique. And try to make proper use of chapters and sections and include an (optional) table of contents.

1.6 Article

The final report is written in article form which is a bit different from a usual report, mainly in its size. The article-report has a maximum number of pages of 4 including all images and references (no appendices!). Contents for this article should be extracted from the lab journal combined with part introduction and part conclusion/ discussion. The sections below describe a template that is available for writing this report and example report(s) will be made available for inspiration. Refer to this chapter again once you start writing the report.

1.6.1 Installing the Article Template

The templates are available in an R package and contains both RMarkdown and (another layer on top of the actual markup language, yes, another language…) files. RStudio can use templates for a number of documents, including article-templates. These templates can be installed from a package called rticles by running the install.packages function (note: might already be installed):

install.packages("rticles", type = "source")

1.6.2 Using the Template

Now that you have the templates package (rticles) installed you can download and use the template project (ZIP-file) available from here. Download this file to your project folder, extract its contents and open the report-template.Rmd file contained within the folder in RStudio. Verify that everything is setup correctly by hitting the Knit button at the top, this should create a PDF version of the report. Note that - somehow - the resulting PDF file is named RJwrapper.pdf instead of the expected report-template.pdf. If everything checks out OK you can rename the file to your liking and start editing.

This template is based on the R Journal Submission template that you can also find in RStudio in the New file -> R Markdown -> From Template menu. Articles published in the R-journal are based on this template which you can browse for inspiration at the R Journal Website.

The available template shows an example of segments/ chapters and briefly describes what each section could or should contain. If you want to write your report in the Dutch language, you can create a new file from the template and change the segment names and content to Dutch. There will however still be some headings added by the template in English which is fine with me, but if you want to keep everything Dutch you can either edit the RJournal.sty file manually (not recommended) or start a new file by yourself and add some nice headings and page options.

The top part of the template (between the dashes ---) contains some settings that you need to change such as title, authors and abstract. Compare for instance a newly created article from this template with the one offered from the project-repository website.