Publications by Shirin's playgRound

Characterizing Twitter followers with tidytext

27.06.2017

Lately, I have been more and more taken with tidy principles of data analysis. They are elegant and make analyses clearer and easier to comprehend. Following the tidyverse and ggraph, I have been quite intrigued by applying tidy principles to text analysis with Julia Silge and David Robinson’s tidytext. In this post, I will explore tidytext wit...

5838 sym R (13832 sym/41 pcs) 26 img

How to do Optical Character Recognition (OCR) of non-English documents in R using Tesseract?

16.07.2017

One of the many great rOpenSci packages implements the open-source engine Tesseract. Optical character recognition (OCR) is used to digitize written or typed documents, i.e. photos or scans of text documents are “translated” into a digital text on your computer. While this might seem like a trivial task at first glance, because it is ...

7400 sym R (7334 sym/17 pcs) 2 img

Social Network Analysis and Topic Modeling of codecentric’s Twitter friends and followers

27.07.2017

I have written the following post about Social Network Analysis and Topic Modeling of codecentric’s Twitter friends and followers for codecentric’s blog: Recently, Matthias Radtke has written a very nice blog post on Topic Modeling of the codecentric Blog Articles, where he is giving a comprehensive introduction to Topic Modeling. In thi...

1905 sym 3 img

Migrating from GitHub to GitLab with RStudio (Tutorial)

03.09.2017

GitHub vs. GitLab: Git is a distributed implementation of version control. Many people have written very eloquently about why it is a good idea to use version control, not only if you collaborate in a team but also if you work on your own; one example is this article from RStudio’s Support pages. In short, its main feature is that version contro...

5159 sym Python (576 sym/6 pcs) 10 img

Data Science for Fraud Detection

05.09.2017

I have written the following post about Data Science for Fraud Detection at my company codecentric’s blog: Fraud can be defined as “the crime of getting money by deceiving people” (Cambridge Dictionary); it is as old as humanity: whenever two parties exchange goods or conduct business there is the potential for one party scamming the other....

1690 sym 2 img

Moving my blog to blogdown

13.09.2017

It’s been a long time coming, but I finally moved my blog from Jekyll/Bootstrap on GitHub Pages to blogdown, Hugo and Netlify! Moreover, I now also have my own domain name www.shirin-glander.de. 🙂 I followed the blogdown ebook to set up my blog. I chose Thibaud Leprêtre’s tranquilpeak theme. It looks much more polished than my old blog. My...

980 sym

Why I use R for Data Science – An Ode to R

18.09.2017

I have written a blog post about why I love R and prefer it to other languages. The post is on my new site, but since it isn’t on R-bloggers yet I am also posting the link here: Working in Data Science, I often feel like I have to justify using R over Python. And while I do use Python for running scripts in production, I am much more comfortab...

1368 sym

From Biology to Industry. A Blogger’s Journey to Data Science.

19.09.2017

Today, I gave a webinar for the Applied Epidemiology Didactic of the University of Wisconsin – Madison titled “From Biology to Industry. A Blogger’s Journey to Data Science.” I talked about how blogging about R and Data Science helped me become a Data Scientist. I also gave a short introduction to Machine Learning, Big Data and Neur...

885 sym 2 img

Blockchain & distributed ML – my report from the data2day conference

27.09.2017

Yesterday and today I attended data2day, a conference about Big Data, Machine Learning and Data Science in Heidelberg, Germany. Talks and workshops covered a range of topics surrounding (big) data analysis and Machine Learning, like Deep Learning, Reinforcement Learning, TensorFlow applications, etc. Distributed systems and scalability were ...

8610 sym 2 img

Explore Predictive Maintenance with flexdashboard

01.11.2017

I have written the following post about Predictive Maintenance and flexdashboard at my company codecentric’s blog: Predictive Maintenance is an increasingly popular strategy associated with Industry 4.0; it uses advanced analytics and machine learning to optimize machine costs and output (see Google Trends plot below). A common use-case for Pr...

1718 sym 2 img