Publications by Benjamin Smith
RObservations #32: Creating an Instant Answer Oracle with httr and Shiny
Introduction Knowing how to write API requests and handle their responses is a valuable skill for any developer, data engineer, or data analyst/scientist. In this short blog I share how it's possible to leverage DuckDuckGo’s instant answer API to create an oracle which can answer (some) of your questions using the httr package and Shiny. ...
2092 sym R (3179 sym/4 pcs) 6 img
RObservations #33: Merging Excel Spreadsheets with Base R and openxlsx
Introduction I was recently asked as part of a larger task to combine multiple sheets from an Excel workbook into a single sheet. When approached about the problem I was immediately asked if I was going to use VBA to do it. While I know my way around VBA, since VBA does not have a native way to undo its operations I was uncomfortable with ...
4036 sym R (4066 sym/4 pcs) 12 img
RObservations #34: Using NLP with keras to understand market sentiment with LSTM networks
Introduction Natural Language Processing (NLP) is a powerful tool in the Machine Learning landscape that can (among other things) allow users to classify sentiment and predict text. Many of my recent blogs have been about data manipulation and data engineering, so I decided to change things up and look at showing some applications of machine learni...
3310 sym R (2379 sym/3 pcs) 6 img
RObservations #35: Predicting Rubik’s Cube Rotations with CNNs
Disclaimer: While working on this project on my local machine I noticed that the code was making my computer heat up. To avoid the risk of overheating my computer I opted to use a Kaggle notebook. As a bonus, I got to use some GPU computing which made training this model much faster than it would be on my machine! Feel free to run the code on you...
4149 sym R (3209 sym/5 pcs) 8 img 1 tbl
RvsPython #5.1: Making the Game even with Python’s Best Practices
Well, it turns out that my last blog, which claimed that R was over 220 times faster than Python, got a lot of (constructive) criticism saying that I wasn’t using “best practices” with Python, which was why my Python code was so slow. This is a totally acceptable critique; thus, I’ve decided to write a follow-up and rewrite the code I used making a more ...
3481 sym R (3312 sym/5 pcs) 2 img
RObservations #4 Using Base R to Clean Data
A friend of mine had some data which was mixed with character strings and was interested in looking at the numeric data only. Because the data set was quite large, cleaning it manually wasn’t viable. Besides being too great of a task to do manually – tampering with the raw data can be very dangerous if you don’t have a way to track the...
3867 sym R (935 sym/6 pcs)
RObservations #5.1 arrR! Exploring Data about Pirates with R
Introduction In light of starting my YouTube channel (shameless plug, I know) and working on a series on exploring data with R, I thought I would write some blogs about it and share my experiences. Often when I’m asked what my go-to language is for data science, my response usually sounds quite Pirate-y. With this in mind I looked around to see i...
6885 sym R (3155 sym/6 pcs) 8 img
RObservations #6- #TidyTuesday – Analyzing data on the Australian Bush Fires
Since April 2018 the R4DS community has been putting out unique datasets as part of its “Tidy Tuesday” series, which are open to explore and to hone your skills as a data scientist. I know I’m quite late to the party, but now that I have some time I thought it would be a cool idea to explore one of these datasets myself. I decided to look a...
13264 sym R (15603 sym/24 pcs) 22 img
RObservations #7 – #TidyTuesday – Analysing Coffee Ratings Data
Introduction Around four years ago I was given a copy of Time Magazine’s specialty issue on Coffee together with a French press as a gift. At the time, I was satisfied with a regular instant cup of joe and did not know much about the vastness and culture of the industry. However, it was thanks to these gifts that I was able to learn a lot abou...
7735 sym R (15678 sym/9 pcs) 18 img
YouTube Channel Update: Coffee Ratings Analysis now up!
I decided to do something new on my YouTube channel by putting my latest blog post on analyzing the Coffee Ratings dataset from the Tidy Tuesday project into video form. This video mostly skips over the R code I used and focuses more on the actual analysis I did. I’m still very new to this and would love to hear some feedback on this video!...
765 sym