Publications by Susanna Wong
DATA 607 Recommender System
Assignment Prompt Your task is to analyze an existing recommender system that you find interesting. You should: Perform a Scenario Design analysis as described below. Consider whether it makes sense for your selected recommender system to perform scenario design twice, once for the organization (e.g. Amazon.com) and once for the organization’s ...
5496 sym
DATA 607 Tidyverse Create
Assignment Prompt In this assignment, you’ll practice collaborating around a code project with GitHub. You could consider our collective work as building out a book of examples on how to use TidyVerse functions. GitHub repository: https://github.com/acatlin/SPRING2023TIDYVERSE FiveThirtyEight.com datasets. Kaggle datasets. Your task here is to Cr...
5827 sym R (4398 sym/15 pcs) 7 img 3 tbl
DATA 607
Assignment Prompt In Text Mining with R, Chapter 2 looks at Sentiment Analysis. In this assignment, you should start by getting the primary example code from chapter 2 working in an R Markdown document. You should provide a citation to this base code. You’re then asked to extend the code in two ways: Work with a different corpus of your choosing...
4105 sym R (10385 sym/48 pcs) 5 img 2 tbl
DATA 607 NYT API Assignment
Assignment Prompt The New York Times web site provides a rich set of APIs, as described here: https://developer.nytimes.com/apis You’ll need to start by signing up for an API key. Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R DataFrame. Retrieve Data from A...
2448 sym R (77912 sym/22 pcs) 1 img
DATA 607 Extra Credit API Nobel Prize
Extra Credit Prompt Working with the two JSON files available through the API at nobelprize.org, ask and answer 4 interesting questions, e.g. “Which country ‘lost’ the most Nobel laureates (who were born there but received their Nobel prize as a citizen of a different country)?” Retrieve Data from API Prize Data library(jsonlite) library(...
3376 sym R (8934 sym/29 pcs) 5 img
DATA 607 Project 3 Acquiring Dataset
Introduction There was a concerning number of duplicates in our initial dataset (‘Raw_Data_Linkedin1’). Throughout the week, I used Octoparse to scrape job postings in the United States, using different keywords: “data”, “data analyst”, “data scientist”, and “data engineering”. Then, I combined all the files together, and rem...
1515 sym R (1731 sym/6 pcs)
DATA 607 Project 3 Tidying Part 2
Introduction The tidying part of our Project 3 was split into two parts. In Project 3 Tidying Part 1, we did the following: removed leading and trailing white space in multiple columns; split the job location column into two columns, city and state; filled in the missing values in the state column; and removed duplicated job postings. This part was parti...
4011 sym R (5215 sym/17 pcs)
DATA 607 Project 3 Tidying Part 1
Introduction The goal of the project is to use data to answer the question, “Which are the most valued data science skills?”. The most valued data science skills are the skills that appear most often on job postings. Using Octoparse and Parsehub, we acquired job postings from LinkedIn. Project 3 Acquiring the data The tidying part of our proj...
2986 sym R (23983 sym/9 pcs)
DATA 607 Project 3 Acquiring Dataset
Introduction There was a concerning number of duplicates in our initial dataset (‘Raw_Data_Linkedin1’). Throughout the week, I used Octoparse to scrape job postings in the United States, using different keywords: “data”, “data analyst”, “data scientist”, and “data engineering”. Then, I combined all the files together, and rem...
2116 sym R (2713 sym/10 pcs)
Data 607 Project 3 Tidying Part 1
Introduction The goal of the project is to use data to answer the question, “Which are the most valued data science skills?”. We scrape a major job search engine (e.g., Indeed, LinkedIn, Glassdoor) for job postings and create a large CSV file that contains the job title, job URL, company name, job salary, and job description. Then, we will c...
3871 sym R (28110 sym/22 pcs)