Publications by Susanna Wong

DATA 607 Extra Credit - Global Baseline Estimate

26.04.2023

Assignment Prompt: Using the information you collected on movie ratings, implement a Global Baseline Estimate recommendation system in R. Most recommender systems use personalized algorithms like “content management” and “item-item collaborative filtering.” Sometimes non-personalized recommenders are also useful or necessary. One of the best...
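The Global Baseline Estimate predicts a rating as the overall mean rating, plus the user's bias (the user's average minus the overall mean), plus the movie's bias (the movie's average minus the overall mean). The base-R sketch below applies that formula to a small made-up ratings matrix. The users, movies, ratings, and the helper `global_baseline()` are hypothetical illustrations, not the code or data from the publication.

```r
# Hypothetical ratings matrix: rows are users, columns are movies, NA = not rated
ratings <- matrix(
  c(5,  3,  NA, 4,
    4,  NA, 2,  5,
    NA, 4,  3,  2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ann", "bob", "cat"), c("m1", "m2", "m3", "m4"))
)

# Hypothetical helper: global mean + user bias + movie bias for every cell
global_baseline <- function(ratings) {
  mu <- mean(ratings, na.rm = TRUE)                   # overall mean rating
  user_bias <- rowMeans(ratings, na.rm = TRUE) - mu   # how far each user rates above/below average
  movie_bias <- colMeans(ratings, na.rm = TRUE) - mu  # how far each movie is rated above/below average
  # outer() adds every user bias to every movie bias, keeping user/movie names
  mu + outer(user_bias, movie_bias, "+")
}

est <- global_baseline(ratings)
round(est, 2)
```

The cells that are `NA` in `ratings` are the ones a recommender would actually use, for example to suggest the unrated movie with the highest estimate for each user.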

DATA 607 Recommender System

20.04.2023

Assignment Prompt: Your task is to analyze an existing recommender system that you find interesting. You should: Perform a Scenario Design analysis as described below. Consider whether it makes sense for your selected recommender system to perform scenario design twice, once for the organization (e.g. Amazon.com) and once for the organization’s ...

DATA 607 Tidyverse Create

15.04.2023

Assignment Prompt: In this assignment, you’ll practice collaborating around a code project with GitHub. You could consider our collective work as building out a book of examples on how to use TidyVerse functions. GitHub repository (https://github.com/acatlin/SPRING2023TIDYVERSE), FiveThirtyEight.com datasets, Kaggle datasets. Your task here is to Cr...

DATA 607

02.04.2023

Assignment Prompt: In Text Mining with R, Chapter 2 looks at Sentiment Analysis. In this assignment, you should start by getting the primary example code from Chapter 2 working in an R Markdown document. You should provide a citation to this base code. You’re then asked to extend the code in two ways: Work with a different corpus of your choosing...
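Chapter 2 of Text Mining with R scores sentiment by splitting text into one word per row and joining those words against a sentiment lexicon. As a dependency-free illustration of that join-and-count idea, the sketch below uses base R with a tiny made-up lexicon and sample sentence; both are hypothetical stand-ins, not the book's lexicons (such as Bing) or its tidytext code.

```r
# Hypothetical mini-lexicon standing in for a real one; not the book's data
lexicon <- data.frame(
  word = c("happy", "good", "love", "sad", "bad", "terrible"),
  sentiment = c(rep("positive", 3), rep("negative", 3)),
  stringsAsFactors = FALSE
)

# Hypothetical sample text
text <- "I love this happy ending, but the middle was terrible and sad."

# Tokenize: lowercase, split on runs of non-letters, drop empty strings
words <- strsplit(tolower(text), "[^a-z]+")[[1]]
tokens <- data.frame(word = words[words != ""], stringsAsFactors = FALSE)

# merge() on "word" keeps only tokens found in the lexicon, like an inner join
matched <- merge(tokens, lexicon, by = "word")

n_positive <- sum(matched$sentiment == "positive")
n_negative <- sum(matched$sentiment == "negative")
net_sentiment <- n_positive - n_negative
net_sentiment
```

Words missing from the lexicon simply drop out of the join, which is also why results depend heavily on which lexicon is chosen.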

DATA 607 NYT API Assignment

27.03.2023

Assignment Prompt: The New York Times web site provides a rich set of APIs, as described here: https://developer.nytimes.com/apis You’ll need to start by signing up for an API key. Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R DataFrame. Retrieve Data from A...

DATA 607 Extra Credit API Nobel Prize

22.03.2023

Extra Credit Prompt: Working with the two JSON files available through the API at nobelprize.org, ask and answer 4 interesting questions, e.g. “Which country ‘lost’ the most Nobel laureates (who were born there but received their Nobel Prize as a citizen of a different country)?” Retrieve Data from API. Prize Data: library(jsonlite) library(...
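The example question reduces to a filter and a count: keep laureates whose birth country differs from the country they represented at the award, then tally by birth country. The sketch below shows that logic on a few made-up records; the column names `birth_country` and `award_country` are hypothetical and are not the field names in the nobelprize.org JSON.

```r
# Hypothetical laureate records; column names are illustrative only
laureates <- data.frame(
  name = c("A", "B", "C", "D", "E", "F"),
  birth_country = c("Germany", "Germany", "Poland", "USA", "Poland", NA),
  award_country = c("USA", "Germany", "France", "USA", "USA", "UK"),
  stringsAsFactors = FALSE
)

# A country "loses" a laureate born there who won as a citizen of another country;
# rows with a missing country are excluded rather than counted
lost <- with(laureates,
  !is.na(birth_country) & !is.na(award_country) & birth_country != award_country)

# Tally lost laureates per birth country, largest first
lost_counts <- sort(table(laureates$birth_country[lost]), decreasing = TRUE)
lost_counts
names(lost_counts)[1]  # country that "lost" the most laureates in this toy data
```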

DATA 607 Project 3 Acquiring Dataset

20.03.2023

Introduction: There was a concerning number of duplicates in our initial dataset (‘Raw_Data_Linkedin1’). Throughout the week, I used Octoparse to web scrape job postings in the United States using different keywords: “data”, “data analyst”, “data scientist”, and “data engineering”. Then, I combined all the files together, and rem...
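Combining one scrape per keyword and then dropping repeats can be sketched in base R as below. The two small data frames stand in for the per-keyword files, and the choice of title, company, and location as the duplicate key is an assumption for illustration; the actual columns and rule used in the publication are not shown here.

```r
# In practice each element could come from read.csv() on one scraped file, e.g.
# files <- list.files("scrapes", pattern = "\\.csv$", full.names = TRUE)  # hypothetical folder
# Here: hypothetical stand-ins for two per-keyword scrapes
scrapes <- list(
  data = data.frame(title = c("Data Analyst", "Data Engineer"),
                    company = c("Acme", "Globex"),
                    location = c("New York, NY", "Austin, TX"),
                    stringsAsFactors = FALSE),
  data_analyst = data.frame(title = c("Data Analyst", "BI Analyst"),
                            company = c("Acme", "Initech"),
                            location = c("New York, NY", "Chicago, IL"),
                            stringsAsFactors = FALSE)
)

# Stack all files into one data frame
combined <- do.call(rbind, scrapes)
rownames(combined) <- NULL

# Assumed rule: same title, company, and location means the same posting
key_cols <- c("title", "company", "location")
deduped <- combined[!duplicated(combined[key_cols]), ]
nrow(combined)  # 4
nrow(deduped)   # 3
```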

DATA 607 Project 3 Tidying Part 2

20.03.2023

Introduction: The tidying part of our Project 3 was split into two parts. In Project 3 Tidying Part 1, we did the following: removed leading and trailing white space from multiple columns; split the job location column into two columns, city and state; filled in the missing values in the state column; and removed duplicated job postings. This part was parti...
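The four Part 1 steps can be sketched in base R on a few made-up rows. How the original project filled missing states is not stated here, so the city-to-state lookup built from complete rows is an assumption, as are the column names `title` and `location`.

```r
# Hypothetical raw postings with stray white space and one location lacking a state
jobs <- data.frame(
  title = c(" Data Analyst", "Data Scientist ", "Data Analyst", "Data Engineer"),
  location = c("New York, NY ", "Boston, MA", " New York, NY", "Boston"),
  stringsAsFactors = FALSE
)

# 1. Trim leading/trailing white space in every character column
jobs[] <- lapply(jobs, function(col) if (is.character(col)) trimws(col) else col)

# 2. Split "City, ST" into city and state; locations without a comma get NA for state
parts <- strsplit(jobs$location, ",\\s*")
jobs$city  <- vapply(parts, `[`, character(1), 1)
jobs$state <- vapply(parts, function(p) if (length(p) >= 2) p[2] else NA_character_,
                     character(1))

# 3. Assumed approach: fill missing states from a city -> state lookup of complete rows
known   <- unique(jobs[!is.na(jobs$state), c("city", "state")])
lookup  <- setNames(known$state, known$city)
missing <- is.na(jobs$state)
jobs$state[missing] <- lookup[jobs$city[missing]]

# 4. Remove duplicated postings (same title, city, and state)
jobs <- jobs[!duplicated(jobs[c("title", "city", "state")]), ]
jobs
```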

DATA 607 Project 3 Tidying Part 1

20.03.2023

Introduction: The goal of the project is to use data to answer the question, “Which are the most valued data science skills?” The most valued data science skills are the skills that appear most often in job postings. Using Octoparse and ParseHub, we acquired job postings from LinkedIn (Project 3 Acquiring the data). The tidying part of our proj...
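Since "most valued" is defined here as "appears most often in job postings", the core measurement is a count of postings that mention each skill. The sketch below does this with word-boundary regex matching in base R; the descriptions and the skill list are hypothetical, and matching is case-sensitive so that a single-letter skill like "R" is not confused with the letter r inside other text.

```r
# Hypothetical job descriptions and skill list
descriptions <- c(
  "Experience with SQL and Python required; Tableau a plus.",
  "Strong R and SQL skills. Familiarity with machine learning.",
  "Python, SQL, and communication skills."
)
skills <- c("SQL", "Python", "R", "Tableau", "machine learning", "communication")

# Count postings mentioning each skill; \b anchors avoid partial-word matches
skill_counts <- vapply(
  skills,
  function(s) sum(grepl(paste0("\\b", s, "\\b"), descriptions, perl = TRUE)),
  integer(1)
)

# Most frequently mentioned skills first
sort(skill_counts, decreasing = TRUE)
```

Counting postings (rather than total mentions) keeps one description that repeats "SQL" several times from inflating that skill's rank.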

DATA 607 Project 3 Acquiring Dataset

19.03.2023

Introduction: There was a concerning number of duplicates in our initial dataset (‘Raw_Data_Linkedin1’). Throughout the week, I used Octoparse to web scrape job postings in the United States using different keywords: “data”, “data analyst”, “data scientist”, and “data engineering”. Then, I combined all the files together, and rem...
