Publications by Shirin's playgRound

exprAnalysis package

27.09.2016

I created the R package exprAnalysis to streamline my RNA-seq data analysis pipeline. Below you will find the vignette for installing and using the package. The package combines functions from various packages used to analyze and visualize expression data from NGS or expression chips. It supports normalized input, e.g. from Cufflink...

16894 sym R (20934 sym/50 pcs) 14 img 1 tbl

DESeq2 Course Work

28.09.2016

The following workflow has been designed as teaching instructions for an introductory course to RNA-seq data analysis with DESeq2. The course is designed for PhD students and will be given at the University of Münster from 10th to 21st of October 2016. For questions or other comments, please contact me. Go to exprAnalysis or this post for insta...

5077 sym R (11765 sym/56 pcs) 32 img

USA/ Canada Roadtrip 2016

15.10.2016

Mapping GPS data from our USA/ Canada Roadtrip: This September we went on a roadtrip to the US and Canada. Of course, we had our trusty GPS to guide us along the way, and I downloaded its data to play around with. (If you want to see a few photos from the trip and don’t care about the rest, skip to the bottom…) Loading the...

3507 sym R (22655 sym/23 pcs) 82 img

Exploring the human genome (Part 1) – Gene Annotations

22.10.2016

When working with any type of genome data, we often look for annotation information about genes, e.g. what’s the gene’s full name, what’s its abbreviated symbol, what ID it has in other databases, what functions have been described, how many and which transcripts exist, etc. However, when looking for this information we (luckily) find a num...

6488 sym R (12720 sym/16 pcs) 14 img

Exploring the human genome (Part 2) – Transcripts

31.10.2016

How many transcripts and proteins do genes have? In Exploring the human genome (Part 1) – Gene Annotations I examined Ensembl, Entrez and HGNC gene annotations with AnnotationDbi via three R packages: org.Hs.eg.db, EnsDb.Hsapiens.v79 and TxDb.Hsapiens.UCSC.hg38.knownGene. Now, I want to know how many transcripts there are for genes in these dat...

8834 sym R (29982 sym/58 pcs) 18 img

Is ‘Yeah’ Josh and Chuck’s favorite word?

05.11.2016

Text mining and sentiment analysis of a Stuff You Should Know Podcast: Stuff You Should Know (or SYSK) is one of the many great podcasts from How Stuff Works. The two SYSK hosts Josh and Chuck have taught me so many fascinating things over the years, and today I want to use one of their podcasts to learn a little bit about text analysis in R. Ini...

11309 sym R (24652 sym/57 pcs) 18 img

Creating a Gilmore Girls character network with R

12.11.2016

With the impending (and by many – including me – much awaited) Gilmore Girls Revival, I wanted to take a somewhat different look at our beloved characters from Stars Hollow. I had recently read a few cool examples of how to create co-occurrence networks and wanted to combine this with an analysis similar to what David Robinson did for Love Ac...

6970 sym R (18579 sym/21 pcs) 8 img

Analysing the Gilmore Girls’ coffee addiction with R

21.11.2016

Last week’s post showed how to create a Gilmore Girls character network. In this week’s short post, I want to explore the Gilmore Girls’ famous coffee addiction by analysing the same episode transcripts that were also used last week. I am also showcasing how to use the recently updated ggplot2 2.2.0. The transcripts were prepared as describ...

1523 sym R (10551 sym/12 pcs) 6 img

Can we predict flu deaths with Machine Learning and R?

26.11.2016

Among the many R packages, there is the outbreaks package. It contains datasets on epidemics, one of which is from the 2013 outbreak of influenza A H7N9 in China, as analysed by Kucharski et al. (2014): A. Kucharski, H. Mills, A. Pinsent, C. Fraser, M. Van Kerkhove, C. A. Donnelly, and S. Riley. 2014. Distinguishing between reservoir exposure and...

10489 sym R (50934 sym/66 pcs) 20 img

Extreme Gradient Boosting and Preprocessing in Machine Learning – Addendum to predicting flu outcome with R

01.12.2016

In last week’s post I explored whether machine learning models can be applied to predict flu deaths from the 2013 outbreak of influenza A H7N9 in China. There, I compared random forests, elastic-net regularized generalized linear models, k-nearest neighbors, penalized discriminant analysis, stabilized linear discriminant analysis, nearest shrun...

5413 sym R (41365 sym/69 pcs) 6 img