Publications by Philip Tanofsky
DATA 607 Final Project Presentation
May 13, 2020 Introduction Goal: Analyze individual player advanced statistics as a predictor of game success. Step 1: Evaluate player contribution impact on team’s single game success Step 2: Identify player advanced statistic to use as predictor of team’s game outcome Step 3: Attempt machine learning model based on player statistics and gam...
DATA 607 Final Project
Introduction Individual player impact on game outcome Hypothesis: Load management (purposely resting a player for an entire game) does impact a team’s win-loss record. The success of a single game is determined by the players that play in the game. (This hypothesis ignores the overall season success of a team in which an organization may ch...
DATA 607 Tidyverse Extension Assignment
Introduction We will see some uses of the dplyr package by loading a data set of contestants on the Bachelorette seasons 11-15. library(dplyr) ## ## Attaching package: 'dplyr' ## The following objects are masked from 'package:stats': ## ## filter, lag ## The following objects are masked from 'package:base': ## ## intersect, setdiff,...
DATA 607 Week 12 Assignment
Spotify’s Discover Weekly: Recommender System Spotify creates an individualized, algorithmically-curated weekly playlist for every user called Discover Weekly with the tagline “your weekly mixtape of fresh music. Enjoy new discoveries and deep cuts chosen just for you. Updated every Monday, so save your favorites!” The weekly playlist is co...
DATA 607 Week 10 Assignment
Introduction This assignment performs a sentiment analysis of the novel “The Variable Man” by science-fiction author Philip K. Dick, primarily using the Syuzhet lexicon. The Afinn lexicon is also used for comparison to the Syuzhet lexicon. The novel is available from Project Gutenberg. Libraries The following libraries are used for the ...
DATA 607 Tidyverse Create Assignment
Introduction Vignette for the popular stringr functions from the Tidyverse packages. The stringr library provides a suite of commonly used string manipulation functions to assist in data cleaning and data preparation tasks. The 8 most popular stringr verbs: detect: Identifies a match to the pattern count: Counts the number of instances of the pa...
DATA 607 Week 9 Assignment
Introduction The New York Times web site provides a rich set of APIs, as described here: https://developer.nytimes.com/apis. Based on the example provided in the NY Times web site for the movie reviews API, I have constructed an interface to read in the JSON data, and transform it into an R dataframe. NOTE: This R markdown file must be “Knit wi...
DATA 607 Project 3 Team MPSV - Very Rough Draft
Introduction Attempt to determine the top skills for a data scientist based on job listings for data scientists. Different text mining techniques are used to identify skills from the description section of the job listings. # Read in CSV file of the 10000 job listings for Data Scientist jobs_df <- read.csv(file = 'data_scientist_united_states_job...
DATA 607 Week 7 Assignment
Introduction This assignment presents the conversion of three common file types of data - XML, JSON, and HTML - into an R dataframe. The results are largely based on Google searches to find code examples and R documentation on the libraries capable of performing the different conversions. Each source file containing bibliographic descriptions of ...
DATA 607 Presentation: Decision Stump
March 11, 2020 Evaluating a Model Important to consider carefully what would be a reasonable baseline against which to compare model performance. Baseline approaches for classification tasks Majority classifier: Naive classifier that always chooses the majority class of the training dataset Decision Stump: Decision tree with only one internal ...