Publications by r on Tony ElHabr

Fuzzy Matching with Texas High School Academic Competition Results and SAT/ACT Scores

03.08.2018

Introduction As a follow-up to a previous post about correlations between Texas high school academic UIL competition scores and SAT/ACT scores, I wanted to explore some of the “alternatives” to joining the two data sets, which come from different sources. In that post, I simply perform an inner_join() using the school and city names as keys. ...

2179 sym R (1269 sym/4 pcs) 4 img 5 tbl

The Split-Apply-Combine Technique for Machine Learning with R

04.08.2018

Introduction Much discussion in the R community has revolved around the proper way to implement the “split-apply-combine” technique. In particular, I love the exploration of this topic in this blog post. It seems that the “preferred” approach is dplyr::group_by() + tidyr::nest() for splitting, dplyr::mutate() + purrr::map() for applying, and tidyr:...

10353 sym R (10794 sym/12 pcs) 4 img

Converting nested JSON to a tidy data frame with R

19.10.2018

In this “how-to” post, I want to detail an approach that others may find useful for converting nested (nasty!) JSON to a tidy (nice!) data.frame/tibble that should be much easier to work with. For this demonstration, I’ll start out by scraping National Football League (NFL) 2018 regular season week 1 score data from ESPN, which involve...

6820 sym R (13392 sym/13 pcs)

Re-creating a Voronoi-Style Map with R

21.12.2018

Introduction I’ve written some “tutorial”-like content recently (see here, here, and here), but I’ve been lacking ideas for “original” content since then. With that said, I thought I would try to re-create something with R. (Not too long ago I saw that Andrew Heiss did something akin to this with Charles Minard’s well-known ...

7572 sym 8 img

A Newbie’s Guide to Making A Pull Request (for an R package)

19.01.2019

I had the wonderful opportunity to participate in the {tidyverse} Developer Day the day after rstudio::conf2019 officially wrapped up. One of the objectives of the event was to encourage open-source contributor newbies (like me) to gain some experience, namely through submitting pull requests to address issues with {tidyverse} packages. H...

11294 sym R (339 sym/2 pcs) 4 img

Summarizing rstudio::conf 2019 Summaries with Tidy Text Techniques

26.01.2019

To be honest, I planned on writing a review of this past weekend’s rstudio::conf 2019, but several other people have already done a great job of doing that—just check out Karl Broman’s aggregation of reviews at the bottom of the page here! (More on this in a second.) In short, my thoughts on the whole experience are captured perfectly by Ni...

2793 sym R (10019 sym/1 pcs) 4 img

Text Parsing and Text Analysis of a Periodic Report (with R)

28.06.2019

Some Context Those of you non-academia folk who work in industry (like me) are probably conscious of any/all periodic reports that an independent entity publishes for your company’s industry. For example, in the insurance industry in the United States, the Federal Insurance Office of the U.S. Department of the Treasury publishes several reports...

16597 sym 24 img 2 tbl

Making a Cheat Sheet with Rmarkdown

06.07.2019

Unfortunately, I haven’t had as much time to make blog posts in the past year or so. I started taking classes as part of Georgia Tech’s Online Master of Science in Analytics (OMSA) program last summer (2018) while continuing to work full-time, so extra time to code and write hasn’t been abundant for me. Anyways, I figured I would share one ...

7811 sym R (199 sym/2 pcs) 6 img

Generating a Gallery of Visualizations for a Static Website (using R)

19.07.2019

While I was browsing the website of fellow R blogger Ryo Nakagawara, I was intrigued by his “Visualizations” page. The concept of creating an online “portfolio” is not novel, but I hadn’t thought to make one as a compilation of my own work (from blog posts)… until now. The code that follows shows how I generated the body of my...

2458 sym R (9331 sym/1 pcs)

A Bayesian Approach to Ranking English Premier League Teams (using R)

28.12.2019

As I mentioned back in July, I haven’t had as much time (since summer of 2018) to write due to taking classes in pursuit of a degree from Georgia Tech’s Online Master of Science in Analytics (OMSA) program. On the other hand, the classes have given me some ideas for future content. And, in the case of the Bayesian Statistics class that I took...

10878 sym R (13877 sym/2 pcs) 12 img