Publications by Shirin's playgRound
Characterizing Twitter followers with tidytext
Lately, I have been more and more taken with tidy principles of data analysis. They are elegant and make analyses clearer and easier to comprehend. Following the tidyverse and ggraph, I have been quite intrigued by applying tidy principles to text analysis with Julia Silge and David Robinson’s tidytext. In this post, I will explore tidytext wit...
5838 sym R (13832 sym/41 pcs) 26 img
How to do Optical Character Recognition (OCR) of non-English documents in R using Tesseract?
One of the many great packages of rOpenSci has implemented the open source engine Tesseract. Optical character recognition (OCR) is used to digitize written or typed documents, i.e. photos or scans of text documents are “translated” into a digital text on your computer. While this might seem like a trivial task at first glance, because it is ...
7400 sym R (7334 sym/17 pcs) 2 img
Social Network Analysis and Topic Modeling of codecentric’s Twitter friends and followers
I have written the following post about Social Network Analysis and Topic Modeling of codecentric’s Twitter friends and followers for codecentric’s blog: Recently, Matthias Radtke has written a very nice blog post on Topic Modeling of the codecentric Blog Articles, in which he gives a comprehensive introduction to Topic Modeling. In thi...
1905 sym 3 img
Migrating from GitHub to GitLab with RStudio (Tutorial)
GitHub vs. GitLab. Git is a distributed implementation of version control. Many people have written very eloquently about why it is a good idea to use version control, not only if you collaborate in a team but also if you work on your own; one example is this article from RStudio’s Support pages. In short, its main feature is that version contro...
5159 sym Python (576 sym/6 pcs) 10 img
Data Science for Fraud Detection
I have written the following post about Data Science for Fraud Detection at my company codecentric’s blog: Fraud can be defined as “the crime of getting money by deceiving people” (Cambridge Dictionary); it is as old as humanity: whenever two parties exchange goods or conduct business there is the potential for one party scamming the other....
1690 sym 2 img
Moving my blog to blogdown
It’s been a long time coming, but I finally moved my blog from Jekyll/Bootstrap on GitHub Pages to blogdown, Hugo and Netlify! I now also have my own domain name www.shirin-glander.de. 🙂 I followed the blogdown ebook to set up my blog. I chose Thibaud Leprêtre’s tranquilpeak theme. It looks much more polished than my old blog. My...
980 sym
Why I use R for Data Science – An Ode to R
I have written a blog post about why I love R and prefer it to other languages. The post is on my new site, but since it isn’t on R-bloggers yet, I am also posting the link here: Working in Data Science, I often feel like I have to justify using R over Python. And while I do use Python for running scripts in production, I am much more comfortab...
1368 sym
From Biology to Industry. A Blogger’s Journey to Data Science.
Today, I have given a webinar for the Applied Epidemiology Didactic of the University of Wisconsin – Madison titled “From Biology to Industry. A Blogger’s Journey to Data Science.” I talked about how blogging about R and Data Science helped me become a Data Scientist. I also gave a short introduction to Machine Learning, Big Data and Neur...
885 sym 2 img
Blockchain & distributed ML – my report from the data2day conference
Yesterday and today, I attended data2day, a conference about Big Data, Machine Learning and Data Science in Heidelberg, Germany. Talks and workshops covered a range of topics surrounding (big) data analysis and Machine Learning, such as Deep Learning, Reinforcement Learning, TensorFlow applications, etc. Distributed systems and scalability were ...
8610 sym 2 img
Explore Predictive Maintenance with flexdashboard
I have written the following post about Predictive Maintenance and flexdashboard at my company codecentric’s blog: Predictive Maintenance is an increasingly popular strategy associated with Industry 4.0; it uses advanced analytics and machine learning to optimize machine costs and output (see Google Trends plot below). A common use-case for Pr...
1718 sym 2 img