Publications by Tom Buonora
Data607_Week1
New York, New York: A Tale of Two Teams

On July 24th, the talk of the baseball world was how the New York Mets were a big success while the Yankees were closer to last place than first.

Team     Division Race  Record  Playoff Chances
Mets     1st Place      51-43   63%
Yankees  3rd Place      50-44   35%

40 days later, the Mets' odds of a postseason appearance plu...
1178 sym R (2301 sym/5 pcs) 1 img 5 tbl
Data607_Week3
library(stringr) # more consistent than base R
Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset, provide code that identifies the majors that contain either “DATA” or “STATISTICS”.
# note: GitHub wraps data in formatting; to get the raw data link, click the "Raw" button at the top-left of the data
data_file ...
1831 sym R (1789 sym/12 pcs)
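The matching step described in this entry can be sketched as follows. This is a minimal illustration, not the author's code: the `majors` vector is a small hand-typed sample standing in for the 173 names read from the dataset, and base R's `grepl()` is used so the snippet runs without extra packages (`stringr::str_detect(majors, pattern)` would be the stringr equivalent).

```r
# Hypothetical sample of major names; the actual exercise reads all 173
# majors from the fivethirtyeight College Majors CSV.
majors <- c(
  "GENERAL AGRICULTURE",
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "STATISTICS AND DECISION SCIENCE",
  "ACCOUNTING"
)

# A regular expression with alternation matches either keyword.
pattern <- "DATA|STATISTICS"

# grepl() returns a logical vector; use it to subset the majors.
matches <- majors[grepl(pattern, majors)]
print(matches)
```

The alternation `DATA|STATISTICS` is case-sensitive, which is presumably fine when the names are stored in upper case, as they are in this sample.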
Project 1 Elo Ratings
library(stringr) library(knitr) library(kableExtra) Elo scores and Expected Outcomes This project parses a particular chess crosstable and translates the results into 2 metrics meant to measure each player's performance in relation to their expected performance. The expected performance is measured as a function of the Elo scores of each ...
2382 sym R (6139 sym/13 pcs) 1 img 2 tbl
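For readers unfamiliar with Elo expectations, the sketch below uses the standard Elo expected-score formula, E_A = 1 / (1 + 10^((R_B - R_A) / 400)). The excerpt is truncated, so whether the project uses exactly this form is an assumption; the function name `elo_expected` is hypothetical.

```r
# Standard Elo expected score for player A against player B.
# Assumption: the conventional 400-point scale.
elo_expected <- function(rating_a, rating_b) {
  1 / (1 + 10 ^ ((rating_b - rating_a) / 400))
}

# Equal ratings give an expected score of one half.
e_equal <- elo_expected(1500, 1500)

# A 200-point favorite is expected to score roughly 0.76 per game.
e_fav <- elo_expected(1700, 1500)
e_dog <- elo_expected(1500, 1700)

print(c(equal = e_equal, favorite = e_fav, underdog = e_dog))
```

The two players' expected scores always sum to 1, so summing a player's expected scores over all rounds gives a total that can be compared with the points actually earned.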
Data607_Week6_Project2
Basic Data Extraction : NYC Parks Overview This project will explore the NYC park data from the “NYC Open Data” project, and the question we will answer is which neighborhood has the most or best park land. We will look at it by acreage vs. population. Let's begin. Imports. Constants. library(readxl) library(tidyverse) # ggplo...
2297 sym R (6673 sym/18 pcs) 5 img 5 tbl
Data607CreateTidyverseVignette
Examples of tidyverse Tom Buonora and … 2021-10-24 library(tidyverse) # ggplot2, dplyr, tidyr, readr, tibble, stringr and more Tidyverse Vignette readr : read_csv() read_csv is part of readr whereas read.csv is base R. I'm not sure that read_csv is tidier than read.csv CURR_PATH<-str_trim(getwd()) # to do : use the kaggle api # htt...
1029 sym R (2939 sym/7 pcs) 1 img
Data607_Week10
Sentiment analysis library(wordcloud) library(reshape2) library(janeaustenr) library(tidytext) library(lexicon) Analysis of Jane Austen The Jane Austen analysis is reproduced courtesy of O'Reilly : Text Mining With R. Licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States License Text Mining with R : Chapt...
4142 sym R (7519 sym/32 pcs) 5 img 5 tbl
Data607_Week11
Recommender System Amazon Music Unlimited Overview Amazon Music Unlimited is a service that charges a monthly fee of 10 dollars (or 8 for Prime members). Its home page consists of a main banner entitled My Likes and More and, underneath that, 4 ribbons consisting of 7 albums, which the user can scroll left and right to see more. Recommenda...
2930 sym 1 img
Tidyverse Extend
TidyverseCreate Source: Kaggle data - Data Analysis Jobs, Based on NYC Jobs - October 2021 This dataset contains current job postings available on the City of New York’s official jobs site ( http://www.nyc.gov/html/careers/html/search/search.shtml ). Internal postings available to city employees and external postings available to the general pub...
3838 sym R (7192 sym/20 pcs) 2 img
Data607_Project4
Document Classification : Emails Overview The Ham or Spam classification problem is a common and ongoing pursuit in the academic and professional world. There are several approaches that one can explore, and there are many references available to review. For example, Mala Deep wrote an article on Towards Data Science, and he offers the follo...
4907 sym R (5756 sym/19 pcs) 5 tbl
Data605: Week 2
Transposes and LDU Factorization Commutativity of A and T(A) Show that, in general, \[A^T \cdot A \ \neq \ A \cdot A^T\] Each entry of a matrix product combines a row of the first operand with a column of the second. Flipping the operands changes which rows and columns are paired, so the resulting values can change. This function decomposes a...
2863 sym R (3700 sym/15 pcs)
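A small numeric check of the non-commutativity claim in this entry, as a sketch: the 2 x 2 matrix `A` is an arbitrary example chosen here for illustration, not taken from the original publication.

```r
# Arbitrary example matrix (an assumption for illustration).
A <- matrix(c(1, 2,
              3, 4), nrow = 2, byrow = TRUE)

# Transpose times matrix, and matrix times transpose.
AtA <- t(A) %*% A
AAt <- A %*% t(A)

print(AtA)
print(AAt)

# The two products differ for this A.
products_equal <- isTRUE(all.equal(AtA, AAt))
print(products_equal)
```

Both products are symmetric, yet they differ from each other for this `A`; equality holds only for special cases such as symmetric or, more generally, normal matrices.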