Publications by r on Tony ElHabr

(Yet Another) Migration to Blogdown Post

24.11.2017

As of today, I’ve officially made the jump to using the R package blogdown (which uses the Hugo static-site generator under the hood) for my personal website. Previously, I had been using WordPress for my blogging purposes. In sync with the change in platform, I’m changing the name of this site from “Number Sense” (www.numbersense.org) to...

1682 sym

Visualizing an NBA Team’s Schedule Using R

25.11.2017

If you’re not completely new to the data science community (specifically, the #rstats community), then you’ve probably seen a version of the “famous” data science workflow diagram. If one is fairly familiar with a certain topic, then one might not spend much time with the initial “visualize” step of the workflow. Such is the case w...

1869 sym R (1821 sym/1 pcs) 6 img

Personal Coding Conventions

09.02.2018

As a person who’s worked with various programming languages over time, I have become interested in the nuances and overlaps among languages. In particular, concepts related to code syntax and organization–everything from technical concepts such as lexical scoping, to more broad concepts such as importing and naming data–really fascinate me....

16012 sym R (1190 sym/8 pcs)

A Tidy Text Analysis of My Google Search History

15.02.2018

While brainstorming about cool ways to practice text mining with R, I came up with the idea of exploring my own Google search history. Then, after googling (ironically) if anyone had done something like this, I stumbled upon Lisa Charlotte’s blog post. Lisa’s post (actually, a series of posts) is from a while back, so her instructions for how...

10681 sym R (15869 sym/18 pcs) 20 img

Dealing with Interval Data and the nycflights13 package using R

16.02.2018

In my job, I often work with data sampled at regular intervals. Samples may range from 5-minute intervals to daily intervals, depending on the specific task. While working with this kind of data is straightforward when it’s in a database (and I can use SQL), I have been in a couple of situations where the data is spread across .csv files. In these...

4245 sym R (4000 sym/5 pcs) 4 img

Dealing with Interval Data and the nycflights13 package using R, Part 2

18.02.2018

In this post, I’ll continue my discussion of working with regularly sampled interval data using R. (See my previous post for some insight regarding minute data.) The discussion here is focused more so on function design. Daily Data When I’ve worked with daily data, I’ve found that the .csv files tend to be much larger than those for data sa...

11197 sym R (7781 sym/7 pcs) 2 img

A Tidy Text Analysis of R Weekly Posts

04.03.2018

I’m always intrigued by data science “meta” analyses or programming/data-science. For example, Matt Dancho’s analysis of renowned data scientist David Robinson. David Robinson himself has done some good ones, such as his blog posts for Stack Overflow highlighting the “incredible” growth of Python, and the “impressive” grow...

7812 sym R (7354 sym/9 pcs) 14 img

NBA Team Twitter Analysis Flexdashboard

10.03.2018

I just wrapped up a mini-project that allowed me to do a handful of things I’ve been meaning to do: Try out the {flexdashboard} package, which is supposed to be good for prototyping larger dashboards (perhaps created with {shinydashboard}). Test out my (mostly completed) personal {tetext} package for quick and tidy text analysis. (It implement...

1071 sym

Analyzing Professional Sports Team Colors with R

30.03.2018

When working with the ggplot2 package, I often find myself playing around with colors for longer than I probably should be. I think that this is because I know that the right color scheme can greatly enhance the information that a plot portrays; and, conversely, choosing an uncomplementary palette can suppress the message of an otherwise good vis...

9623 sym R (5668 sym/13 pcs) 14 img 8 tbl

Analyzing Professional Sports Team Colors with R, Part 2

31.03.2018

NOTE: This write-up picks up where the previous one left off. All of the session data is carried over. Color Similarity Now, I’d like to evaluate color similarity more closely. To help verify any quantitative deductions with some intuition, I’ll consider only a single league for this–the NBA, the league that I know the best. Because I’ll ...

7592 sym R (988 sym/3 pcs) 8 img 5 tbl