Introduction

Course 2.1.2 consists of two separate projects carried out on a single published data set. First, we perform a Genomics analysis similar to the case study from module 2.1.1. Then we take the result of the mapping step, process it into read counts and perform a Transcriptomics analysis.

1.1 2.1.2 Genomics

For this Genomics analysis we look for variants between the chosen data set and a reference genome. An important aspect of the analysis is assessing quality: both that of the input data (the ‘raw reads’), before setting up an analysis pipeline in a workflow manager, and that of the output of individual steps. Once a set of variants has been determined, the variants can be annotated, interpreted for their biological effect and finally linked to the biology underlying the original experiment, either reproducing the published findings or aiming for novel ideas.

1.1.1 Project Deliverables

  • Genomics pipeline: The genomics analysis pipeline is well documented and contains the correct tools based on the research question and data set.
  • Presentation: The research results are presented in an attractive manner and at a level appropriate for the target audience.

1.2 2.1.2 Transcriptomics

The use of RNA-Sequencing (RNA-Seq) techniques for measuring gene expression is relatively new and has largely replaced microarrays, though microarrays are still used in some cases. Gene expression data gives valuable insight into the workings of cells under certain conditions. Especially when comparing, for instance, healthy and diseased samples, it can become clear which genes are causal for, or influenced by, a specific condition. Finding the genes of interest, those showing differing expression across conditions and called Differentially Expressed Genes (DEGs), is the goal of this project.

There is no gold standard for analyzing RNA-sequencing data sets, as there are many tools (all manufacturers of sequencing equipment also deliver software packages). We will use proven R libraries for processing, visualizing and analyzing publicly available data sets. This manual describes the steps often performed in a transcriptomics experiment. Use it as a very general guideline for processing the data accompanying the chosen published research. Note that this manual was originally written for processing count data and therefore does not describe the processing of NGS read files up to the count data; that processing is part of the Genomics analysis performed in this course. RNA-Seq counts describe the number of reads mapped to a gene, which corresponds to the relative number of transcripts (mRNA sequences) of that gene present in the cell at the time of sampling.
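To make the notion of counts concrete, the sketch below builds a small, invented count matrix (gene names, sample names and values are placeholders) and scales each sample by its total number of mapped reads. Counts per million (CPM) is used here only as one common way of expressing relative abundance; it is an assumption for illustration and not necessarily the normalization used later in this manual.

```r
# Toy count matrix: rows are genes, columns are samples.
# All names and values are invented for illustration.
counts <- matrix(
  c(10,  20,
    90, 180,
     0,   0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("sample1", "sample2"))
)

# Total number of mapped reads per sample (the library size).
library_sizes <- colSums(counts)

# Counts per million: divide every column by its library size.
cpm <- sweep(counts, 2, library_sizes, "/") * 1e6

print(counts)
print(cpm)
```

In this toy example sample2 has twice as many mapped reads as sample1, so its raw counts are twice as high, yet after scaling geneA has the same relative abundance in both samples. This is why raw counts are usually compared only after accounting for library size.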

1.2.1 Project Deliverables

  • Lab Journal: The lab journal is readable, complete and aimed at the reproducibility of the research. Relevant visualisations are included and statistical methods are applied to support the biological interpretation of the results.

1.3 Lab Journal

As you may know from previous projects, it is essential to keep a proper journal detailing every step you have taken during the analysis. This journal is to be kept in an R Markdown file, showing which steps have been taken in the analysis of the data set. Once the project is completed, the R Markdown file should be knitted into a single PDF file containing text detailing the steps and any decisions you’ve made, the R code (always visible!) and the resulting output and images.

Notes: As general advice, do not wait until the project is done before knitting the whole document: knitting is very prone to errors, and fixing these in a large document is not easy. Give each code chunk the proper attributes, including at least a name. This helps to spot errors during knitting, as that process reports which chunk is being processed. Note that chunk names must be unique. Make proper use of chapters and sections, and include an (optional) table of contents.
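The notes above can be illustrated with a minimal sketch in base R. It builds a hypothetical journal skeleton (the title, file name and chunk labels are placeholders, not prescribed by this course) with a table of contents and named chunks, and then checks that the chunk names are unique. The helper `chunk_labels` is written for this example only and uses a simplified pattern; it is not part of knitr.

```r
# Skeleton of a lab journal, kept as a character vector so it can be
# written to a file. Title and chunk labels are placeholders.
fence <- strrep("`", 3)  # the three backticks that open and close a chunk

journal <- c(
  "---",
  "title: \"Lab Journal\"",
  "output:",
  "  pdf_document:",
  "    toc: true",
  "---",
  "",
  paste0(fence, "{r setup, include=FALSE}"),
  "knitr::opts_chunk$set(echo = TRUE)  # keep R code visible in the PDF",
  fence,
  "",
  "# Exploratory Data Analysis",
  "",
  paste0(fence, "{r load-counts}"),
  "# read the count data here",
  fence
)

# Extract chunk labels from the chunk headers (simplified: assumes the
# label, if present, is the first item after 'r').
chunk_labels <- function(lines) {
  headers <- grep(paste0("^", fence, "\\{r"), lines, value = TRUE)
  labels <- sub(paste0("^", fence, "\\{r[ ,]*([^,} ]*).*$"), "\\1", headers)
  labels[nzchar(labels) & !grepl("=", labels)]
}

labels <- chunk_labels(journal)
stopifnot(anyDuplicated(labels) == 0)  # duplicate labels would break knitting

# Write the skeleton to a temporary file; knit it early and often, e.g.
# with rmarkdown::render(path) if the rmarkdown package and LaTeX are set up.
path <- file.path(tempdir(), "lab_journal.Rmd")
writeLines(journal, path)
```

Running the check before every knit is optional, but it catches a duplicated chunk name without having to wait for the full document to render.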

1.4 Schedule

The Genomics and Transcriptomics parts each have their own schedule, as listed below.

1.4.1 Genomics

Week  Lesson  Subject
1     1       Finding a suitable experiment
1     2       Statistics - recap and experimental setup
1     3       Writing a Project Proposal
2     1       Presenting Project Proposal / NGS Quality Control
2     2
2     3
3     1       Read Mapping and QC
3     2       Variant Calling
3     3
4     1       Annotation
4     2
4     3
4     4
5     1
5     2

1.4.2 Transcriptomics

Week  Lesson  Subject
5     1       Writing a Project Proposal
5     2
6     1       Presenting Project Proposal
6     2       Read Mapping and Quantification
6     3
6     4       Statistics (distributions, normalization, PCA)
7     1       Exploratory Data Analysis (chapter 2)
7     2
7     3       Statistics (batch effects, regression, linear models)
7     4       Finding Differentially Expressed Genes (chapter 3)
7     5       Statistics (Enrichment analysis)
8     1       Data Analysis and Visualization (chapter 3)
8     2
8     3
8     4
8     5
8     6       Hand in Lab Journal

1.5 Literature

The following (online) documents can be used throughout this course: