Publications by Benjamin Smith

RObservations #8- #TidyTuesday- Analyzing the Art Collections Dataset

19.01.2021

I’ve really been enjoying the weekly data sets that the R4DS community puts out every Tuesday as part of the Tidy Tuesday project. In my previous blog post, I had a blast exploring the “Coffee Ratings” data set and putting out a YouTube video to share the insights I found in the data. In this blog post, we’re going to explore a mor...

Benstats Talks #1: A presentation on CNNs

24.02.2021

It’s been a while since I last posted anything on my blog, so I thought I would share an update. Recently, for my midterm examination, I had the opportunity to present a deep learning framework of my choosing. I chose to give a talk on (you guessed it!) convolutional neural networks (CNNs) and their applications in computer vision. For...

RObservations #9: The Hosmer-Lemeshow Test follows a Chi-Square distribution (water is wet)

21.04.2021

The Hosmer-Lemeshow test (HL test) is a goodness-of-fit test for binary classification models that assesses how well the data fit a given model. Specifically, the HL test checks whether the observed event rates match the expected event rates in population subgroups, and it can be used as a supporting diagnostic as to whether to accept or rej...
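
As a rough illustration of the subgroup comparison described above, here is a minimal, dependency-free Python sketch of the usual Hosmer-Lemeshow statistic. It assumes the common recipe: sort observations by predicted probability, split them into g roughly equal-sized groups (g = 10 by default), and compute H = Σ (O_g − E_g)² / (N_g π̄_g (1 − π̄_g)), which is compared against a chi-square distribution with g − 2 degrees of freedom. The function names are hypothetical, and to avoid external libraries the p-value helper only handles an even number of degrees of freedom.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-square(df); valid only for positive even df.

    Uses the closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)**i / i!, where k = df / 2.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only supports positive even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0  # the i = 0 term of the sum
    for i in range(1, df // 2):
        term *= half / i  # term is now half**i / i!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y, p, groups=10):
    """Return (H statistic, p-value) for 0/1 outcomes y and predicted probabilities p."""
    n = len(y)
    if n != len(p) or n < groups:
        raise ValueError("y and p must have equal length, with at least one row per group")
    # Sort observations by predicted probability and cut them into equal-sized groups.
    order = sorted(range(n), key=lambda i: p[i])
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)  # observed events in group g
        expected = sum(p[i] for i in idx)  # expected events in group g
        mean_p = expected / n_g
        variance = n_g * mean_p * (1.0 - mean_p)
        if variance == 0:
            raise ValueError("a group has a mean predicted probability of exactly 0 or 1")
        stat += (observed - expected) ** 2 / variance
    return stat, chi2_sf_even_df(stat, groups - 2)


if __name__ == "__main__":
    # Toy data: every prediction is 0.5 and each group of two holds exactly one event,
    # so observed and expected counts agree perfectly.
    print(hosmer_lemeshow([0, 1] * 10, [0.5] * 20))  # (0.0, 1.0)
```

A well-calibrated model yields a small H and a large p-value, as in the toy call above. In practice, a statistics library's chi-square distribution (for example `scipy.stats.chi2.sf`) would replace the even-df shortcut.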

RObservations #10: An Analysis of the Donner-Reed Party with Logistic Regression

06.05.2021

The story of the Donner-Reed party is one of the more tragic stories of the American pioneers. It is the story of a group of families and individuals who migrated by wagon train from the American Midwest to California. The group initially traveled via the Oregon Trail but then opted to take the Hastings Cutoff, a route which was never ...

RObservations #11: within(), Base R’s mutate() function

22.06.2021

As surprising as it may sound, many new R programmers are unaware of Base R syntax and the powerful functions that come with it, many of which do what functions from packages like dplyr and the rest of the tidyverse do. In this short blog post, I am going to talk about the within() function and its similarity to the dplyr package’s m...

PythonMusings #6: dplyr in Python? First impressions of the siuba (小巴) module

14.09.2021

What’s great about blogging and social media is that you get to learn so much if you use them right. After I shared my last blog post on LinkedIn, Casper Crause told me about the siuba module, created by Michael Chow, which gives Python users an experience similar to using R’s powerful dplyr package (developed by Hadley Wickham) for data wrangling. What impres...

RObservations #12: Making a Candlestick plot with the ggplot2 and tidyquant packages

14.09.2021

Candlestick plots are something you see regularly when dealing with stocks. Whether you are an investor, an analyst or even an outsider, this type of chart is always interesting to look at. In this brief blog, I’m going to share a custom function I made for making candlestick charts using the ggplot2 and tidyquant packages. Sure, tidyquant has...
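
The post’s ggplot2 function is not reproduced here. As a hedged illustration of what a candlestick encodes, this Python sketch turns one open/high/low/close (OHLC) row into the pieces a plotting function would typically draw: a thin wick from the low to the high, a body spanning the open and close, and a color indicating whether the price rose or fell. In ggplot2 such pieces are commonly drawn with geom_segment() for wicks and geom_rect() for bodies; that is an assumption about the general approach, not necessarily how the post’s function works. The `Candle` class and `candle_geometry()` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    """Geometry for one candlestick (hypothetical helper type)."""
    wick_low: float
    wick_high: float
    body_bottom: float
    body_top: float
    color: str


def candle_geometry(open_: float, high: float, low: float, close: float) -> Candle:
    """Map one OHLC row to the wick, body and color of a single candle."""
    if not (low <= min(open_, close) and max(open_, close) <= high):
        raise ValueError("expected low <= open, close <= high")
    rising = close >= open_
    return Candle(
        wick_low=low,                      # wick runs from the low...
        wick_high=high,                    # ...to the high
        body_bottom=min(open_, close),     # body spans open and close
        body_top=max(open_, close),
        color="green" if rising else "red",  # common convention: green up, red down
    )


if __name__ == "__main__":
    print(candle_geometry(open_=100.0, high=105.0, low=98.0, close=103.5))
    # Candle(wick_low=98.0, wick_high=105.0, body_bottom=100.0, body_top=103.5, color='green')
```

Applying this row by row to a price series gives one wick segment and one body rectangle per trading day, which is the data a ggplot2 layer would then draw.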

RObservations #13: Simulating FSAs in lieu of real postal code data.

30.09.2021

(Image source: The Toronto Star.) Often when scraping data, websites will ask a user to enter a postal code to get the locations near it. If you are interested in collecting data on locations in Canada for an entire province or for the entire country from a site, it might be hard to find a list of all postal codes or FSAs in Canada or in a given...
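
FSAs (forward sortation areas) are the first three characters of a Canadian postal code, in a letter-digit-letter pattern such as M5V, and the first letter identifies a province or region (M, for instance, covers Toronto). Assuming the idea is to enumerate plausible FSAs rather than look up real ones, a minimal Python sketch can generate every candidate FSA for a given first letter. It relies on the commonly cited rules that postal codes never use the letters D, F, I, O, Q or U, and that W and Z are not used as the first letter. Many generated codes will not actually be in use, so the output is a list of candidates for a scraper to try, not a validated list. The function name `simulate_fsas()` is hypothetical.

```python
import string

# Letters never used in Canadian postal codes (assumed rule set for this sketch).
UNUSED_LETTERS = set("DFIOQU")
VALID_LETTERS = [c for c in string.ascii_uppercase if c not in UNUSED_LETTERS]
# W and Z are additionally not used as the first character of an FSA.
VALID_FIRST_LETTERS = [c for c in VALID_LETTERS if c not in ("W", "Z")]


def simulate_fsas(first_letter: str) -> list[str]:
    """Return every letter-digit-letter FSA candidate starting with first_letter."""
    first_letter = first_letter.upper()
    if first_letter not in VALID_FIRST_LETTERS:
        raise ValueError(f"{first_letter!r} is not a valid first letter for an FSA")
    return [
        f"{first_letter}{digit}{letter}"
        for digit in string.digits
        for letter in VALID_LETTERS
    ]


if __name__ == "__main__":
    toronto = simulate_fsas("M")
    print(len(toronto), toronto[:3])  # 200 ['M0A', 'M0B', 'M0C']
```

Each candidate could then be submitted to a site’s location search in turn; candidates that return no results are simply skipped.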

RObservations #14: Comparing the Calculated Square Roots of Symmetric Positive Matrices

07.10.2021

“There is no unique definition of a square root of a matrix. While there are a few ways to calculate it in closed form, the results differ”. These notes, which I read in my multivariate statistics class, inspired me to explore the geometric interpretations of the differing results. In this blog, I am going to explore ...
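
To make the non-uniqueness concrete, here is a small, dependency-free Python sketch comparing two standard square roots of a 2×2 symmetric positive definite matrix A. It assumes these two are representative (the post may compare others): the symmetric (principal) root R, computed with the 2×2 closed form R = (A + √det(A) I) / √(tr(A) + 2√det(A)), which follows from the Cayley-Hamilton identity A² = tr(A) A − det(A) I, and the lower-triangular Cholesky factor L. Both satisfy B Bᵀ = A, yet R ≠ L. The helper names are hypothetical, and the closed form for R applies only to 2×2 matrices.

```python
import math


def matmul(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(x):
    """Transpose of a 2x2 matrix."""
    return [[x[j][i] for j in range(2)] for i in range(2)]


def symmetric_sqrt_2x2(a):
    """Principal square root R (symmetric, R @ R == A) of a 2x2 SPD matrix."""
    s = math.sqrt(a[0][0] * a[1][1] - a[0][1] * a[1][0])  # sqrt(det A)
    t = math.sqrt(a[0][0] + a[1][1] + 2.0 * s)              # sqrt(tr A + 2 sqrt(det A))
    return [[(a[i][j] + (s if i == j else 0.0)) / t for j in range(2)] for i in range(2)]


def cholesky_2x2(a):
    """Lower-triangular L with L @ L^T == A for a 2x2 SPD matrix."""
    l11 = math.sqrt(a[0][0])
    l21 = a[1][0] / l11
    l22 = math.sqrt(a[1][1] - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]


if __name__ == "__main__":
    A = [[4.0, 2.0], [2.0, 3.0]]
    R = symmetric_sqrt_2x2(A)
    L = cholesky_2x2(A)
    print("R =", R)
    print("L =", L)
    # Both reproduce A (up to floating-point rounding) even though R and L differ.
    print(matmul(R, transpose(R)))
    print(matmul(L, transpose(L)))
```

Geometrically, R stretches space along the eigenvectors of A without rotating it, while L builds the same A from a triangular factor, which is one way to see why the two "square roots" disagree.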

RvsPython #6: LinkedIn has spoken!

17.10.2021

Over my time on LinkedIn, I have been exposed to people from many different lines of work. I have also managed to carve out a space for myself there, where I post about data science topics and share my blogs along the way. There have always been posts and polls comparing R and Python as well as the subsequ...
