Publications by sindri

Using the xlsx package to create an Excel file

17.06.2017

Microsoft Excel is perhaps the most popular data analysis tool out there. While arguably convenient, spreadsheet software is error-prone, and Excel code can be very hard to review and test. After successfully completing this exercise set, you will be able to prepare a basic Excel document using just R (no need to touch Excel yourself), leaving behi...

2391 sym 4 img

Working with the xlsx package Exercises (part 2)

28.06.2017

This exercise set provides further practice in writing Excel documents with the xlsx package, as well as in importing and general data manipulation. Specifically, it includes loops so that you can practice scaling up. A previous exercise set focused on writing a simple sheet with the same package. We will use a subset of commuting data from ...

2777 sym 4 img

Soccer data sparring: Scraping, merging and analyzing exercises

08.08.2017

While spending time understanding and improving specific techniques and strengthening individual muscles is important, occasionally it is necessary to do some rounds of actual sparring to see your flow and spot weaknesses. This exercise set forces you to use all that you have practiced: to scrape links, download data, use regular expressions, merge ...

3258 sym 4 img

Answer probability questions with simulation

20.08.2017

Probability is at the heart of data science. Simulation is also commonly used in algorithms such as the bootstrap. After completing this exercise, you will have a slightly stronger intuition for probability and for writing your own simulation algorithms. Most of the problems in this set have an exact analytical solution, which is not the case fo...

4353 sym 4 img
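As an illustration of the idea in the teaser above, here is a minimal sketch in base R that answers a probability question by simulation and compares the estimate with the exact analytical value. The dice question, the seed, and the number of simulated throws are assumptions chosen for illustration; they are not taken from the exercise set.

```r
# Estimate P(sum of two fair dice == 7) by simulation.
# The question, seed and sample size are illustrative assumptions.
set.seed(1)          # arbitrary seed, only for reproducibility
n_sims <- 100000     # number of simulated throws

# Draw both dice independently, with replacement.
die1 <- sample(1:6, n_sims, replace = TRUE)
die2 <- sample(1:6, n_sims, replace = TRUE)

# The share of throws that sum to 7 estimates the probability.
estimate <- mean(die1 + die2 == 7)

# Exact answer for comparison: 6 of the 36 equally likely outcomes sum to 7.
exact <- 6 / 36

print(c(estimate = estimate, exact = exact))
```

With this many throws the estimate should land close to the exact value, which is a quick way to check a simulation before applying the same pattern to a question that has no closed-form answer.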

Basics of data.table: Smooth data exploration

23.08.2017

The data.table package provides perhaps the fastest way to wrangle data in R. The syntax is concise and is made to resemble SQL. After studying the basics of data.table and finishing this exercise set successfully, you will be able to start easing into using data.table for all your data manipulation needs. We will use data drawn from the 1980 U...

2743 sym 4 img

Beyond the basics of data.table: Smooth data exploration

05.09.2017

This exercise set provides practice using the fast and concise data.table package. If you are new to the syntax, it is recommended that you start by solving the set on the basics of data.table before attempting this one. We will use data on used cars (Toyota Corollas) on sale during 2004 in the Netherlands. There are 1436 observations with informa...

2423 sym 4 img

Answer probability questions with simulation (part-2)

20.09.2017

This is the second exercise set on answering probability questions with simulation. Finishing the first exercise set is not a prerequisite. The difficulty level is about the same, so if you are looking for a challenge, aim to write faster, more elegant algorithms. As always, it pays off to read the instructions carefully and think about w...

3017 sym 4 img

Loops in R – Exercises

30.03.2018

Using loops is generally discouraged in R when it is possible to avoid them using vectorized alternatives. Vectorized solutions are faster to write, read and execute, except that sometimes they aren’t, and the definition of vectorization isn’t always straightforward. In any event, solutions using loops can be: The fastest to prototype Th...

3208 sym R (150 sym/1 pcs) 2 img
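To make the loop-versus-vectorized contrast in the teaser above concrete, here is a minimal sketch in base R that computes a running total both ways. The input vector and variable names are illustrative assumptions, not material from the exercise set.

```r
# Running total computed with an explicit loop and with a vectorized call.
# The input vector is an arbitrary illustration.
x <- c(3, 1, 4, 1, 5, 9, 2, 6)

# Loop version: preallocate the result, then fill it element by element.
running_total_loop <- numeric(length(x))
total <- 0
for (i in seq_along(x)) {
  total <- total + x[i]
  running_total_loop[i] <- total
}

# Vectorized version: base R's cumsum() does the same work in one call.
running_total_vec <- cumsum(x)

# Both approaches should agree.
print(all.equal(running_total_loop, running_total_vec))
```

Preallocating with `numeric(length(x))` rather than growing the vector inside the loop is the usual way to keep a loop-based solution reasonably fast.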

K-Means Clustering in R – Exercises

13.04.2018

K-means is an efficient and perhaps the most popular clustering method. It is a way of finding natural groups in otherwise unlabeled data. You specify the number of clusters you want and the algorithm minimizes the total within-cluster variance. In this exercise, we will play around with the base R inbuilt k-means function on some labele...

3198 sym 4 img
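The description above can be sketched with base R's `kmeans()` function. The choice of the built-in `iris` data, the two petal columns, three clusters, the seed, and `nstart = 20` are all assumptions made for illustration; the teaser does not say which data set or settings the exercises use.

```r
# K-means on two numeric columns of the built-in iris data.
# Data set, columns, number of clusters and seed are illustrative assumptions.
set.seed(42)  # k-means starts from random centers, so fix the seed
features <- iris[, c("Petal.Length", "Petal.Width")]

# Ask for 3 clusters; nstart = 20 tries 20 random starts and keeps the best.
fit <- kmeans(features, centers = 3, nstart = 20)

# Total within-cluster sum of squares, the quantity k-means minimizes.
print(fit$tot.withinss)

# Because iris is labeled, the clusters can be compared with the species.
print(table(cluster = fit$cluster, species = iris$Species))
```

Comparing cluster assignments against known labels is only possible on labeled data, which makes such data convenient for getting a feel for how well the method recovers natural groups.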

Well-Behaved Functions – Exercises

26.04.2018

It is said that, in R, everything that happens is a function call. So, if we want to improve our ability to make things happen the way we want them to, maybe it’s worth getting very comfortable with how functions work in R. In this exercise set, we’ll try to gain better fluency and deepen our understanding of the R logic by (mostly) writing ...

3103 sym 2 img