Publications by r on Tony ElHabr
(Yet Another) Migration to Blogdown Post
As of today, I’ve officially made the jump to using the R package blogdown (which uses the Hugo static-site generator under the hood) for my personal website. Previously, I had been using WordPress for my blogging purposes. In sync with the change in platform, I’m changing the name of this site from “Number Sense” (www.numbersense.org) to...
1682 sym
Visualizing an NBA Team’s Schedule Using R
If you’re not completely new to the data science community (specifically, the #rstats community), then you’ve probably seen a version of the “famous” data science workflow diagram. If one is fairly familiar with a certain topic, then one might not spend much time with the initial “visualize” step of the workflow. Such is the case w...
1869 sym R (1821 sym/1 pcs) 6 img
Personal Coding Conventions
As a person who’s worked with various programming languages over time, I have become interested in the nuances and overlaps among languages. In particular, concepts related to code syntax and organization–everything from technical concepts such as lexical scoping, to more broad concepts such as importing and naming data–really fascinate me....
16012 sym R (1190 sym/8 pcs)
A Tidy Text Analysis of My Google Search History
While brainstorming about cool ways to practice text mining with R, I came up with the idea of exploring my own Google search history. Then, after googling (ironically) whether anyone had done something like this, I stumbled upon Lisa Charlotte’s blog post. Lisa’s post (actually, a series of posts) is from a while back, so her instructions for how...
10681 sym R (15869 sym/18 pcs) 20 img
Dealing with Interval Data and the nycflights13 package using R
In my job, I often work with data sampled at regular intervals. Samples may range from 5-minute intervals to daily intervals, depending on the specific task. While working with this kind of data is straightforward when it’s in a database (and I can use SQL), I have been in a couple of situations where the data is spread across .csv files. In these...
4245 sym R (4000 sym/5 pcs) 4 img
Dealing with Interval Data and the nycflights13 package using R, Part 2
In this post, I’ll continue my discussion of working with regularly sampled interval data using R. (See my previous post for some insight regarding minute data.) The discussion here is focused more so on function design. Daily Data When I’ve worked with daily data, I’ve found that the .csv files tend to be much larger than those for data sa...
11197 sym R (7781 sym/7 pcs) 2 img
A Tidy Text Analysis of R Weekly Posts
I’m always intrigued by data science “meta” analyses of programming and data science. For example, see Matt Dancho’s analysis of renowned data scientist David Robinson. David Robinson himself has done some good ones, such as his blog posts for Stack Overflow highlighting the “incredible” growth of Python, and the “impressive” grow...
7812 sym R (7354 sym/9 pcs) 14 img
NBA Team Twitter Analysis Flexdashboard
I just wrapped up a mini-project that allowed me to do a handful of things I’ve been meaning to do: Try out the {flexdashboard} package, which is supposed to be good for prototyping larger dashboards (perhaps created with {shinydashboard}). Test out my (mostly completed) personal {tetext} package for quick and tidy text analysis. (It implement...
1071 sym
Analyzing Professional Sports Team Colors with R
When working with the ggplot2 package, I often find myself playing around with colors for longer than I probably should. I think that this is because I know that the right color scheme can greatly enhance the information that a plot portrays; and, conversely, choosing an uncomplementary palette can suppress the message of an otherwise good vis...
9623 sym R (5668 sym/13 pcs) 14 img 8 tbl
Analyzing Professional Sports Team Colors with R, Part 2
NOTE: This write-up picks up where the previous one left off. All of the session data is carried over. Color Similarity Now, I’d like to evaluate color similarity more closely. To help verify any quantitative deductions with some intuition, I’ll consider only a single league for this–the NBA, the league that I know the best. Because I’ll ...
7592 sym R (988 sym/3 pcs) 8 img 5 tbl