Publications by sindri
Using the xlsx package to create an Excel file
Microsoft Excel is perhaps the most popular data analysis tool out there. While arguably convenient, spreadsheet software is error-prone, and Excel code can be very hard to review and test. After successfully completing this exercise set, you will be able to prepare a basic Excel document using just R (no need to touch Excel yourself), leaving behi...
2391 sym 4 img
Working with the xlsx package Exercises (part 2)
This exercise set provides (further) practice in writing Excel documents using the xlsx package, as well as in importing and general data manipulation. Specifically, it includes loops so that you can practice scaling. A previous exercise set focused on writing a simple sheet with the same package; see here. We will use a subset of commuting data from ...
2777 sym 4 img
Soccer data sparring: Scraping, merging and analyzing exercises
While understanding and spending time improving specific techniques and strengthening individual muscles is important, occasionally it is necessary to do some rounds of actual sparring to see your flow and spot weaknesses. This exercise set forces you to use all that you have practiced: to scrape links, download data, regular expressions, merge ...
3258 sym 4 img
Answer probability questions with simulation
Probability is at the heart of data science. Simulation is also commonly used in algorithms such as the bootstrap. After completing this exercise, you will have a slightly stronger intuition for probability and for writing your own simulation algorithms. Most of the problems in this set have an exact analytical solution, which is not the case fo...
4353 sym 4 img
Basics of data.table: Smooth data exploration
The data.table package provides perhaps the fastest way to wrangle data in R. The syntax is concise and is made to resemble SQL. After studying the basics of data.table and finishing this exercise set successfully, you will be able to start easing into using data.table for all your data manipulation needs. We will use data drawn from the 1980 U...
2743 sym 4 img
Beyond the basics of data.table: Smooth data exploration
This exercise set provides practice using the fast and concise data.table package. If you are new to the syntax, it is recommended that you start by solving the set on the basics of data.table before attempting this one. We will use data on used cars (Toyota Corollas) on sale during 2004 in the Netherlands. There are 1436 observations with informa...
2423 sym 4 img
Answer probability questions with simulation (part-2)
This is the second exercise set on answering probability questions with simulation. Finishing the first exercise set is not a prerequisite. The difficulty level is about the same, so if you are looking for a challenge, aim to write faster, more elegant algorithms. As always, it pays off to read the instructions carefully and think about w...
3017 sym 4 img
Loops in R – Exercises
Using loops is generally discouraged in R when it is possible to avoid them using vectorized alternatives. Vectorized solutions are usually faster to write, read, and execute, except that sometimes they aren’t, and the definition of vectorization isn’t always straightforward. In any event, solutions using loops can be: The fastest to prototype Th...
3208 sym R (150 sym/1 pcs) 2 img
K-Means Clustering in R – Exercises
K-means is efficient and perhaps the most popular clustering method. It is a way of finding natural groups in otherwise unlabeled data. You specify the number of clusters you want, and the algorithm tries to minimize the total within-cluster variance. In this exercise, we will play around with the base R built-in k-means function on some labele...
3198 sym 4 img
Well-Behaved Functions – Exercises
It is said that, in R, everything that happens is a function call. So, if we want to improve our ability to make things happen the way we want them to, maybe it’s worth getting very comfortable with how functions work in R. In this exercise set, we’ll try to gain better fluency and deepen our understanding of the R logic by (mostly) writing ...
3103 sym 2 img