Publications by Susanna Wong

DATA 607 Project 3

18.03.2023

Introduction The goal of the project is to use data to answer the question, “Which are the most valued data science skills?” We web scrape a major job search engine (e.g., Indeed, LinkedIn, Glassdoor) for job postings and create a large CSV file that contains the job title, job URL, company name, job salary, and job description. Then, we will c...

1542 sym R (13276 sym/15 pcs) 4 img

DATA 607 Project 3 Tidying

15.03.2023

Tidy up LinkedIn Dataset Load the LinkedIn csv file into R. Below are some things we need to tidy: remove the leading and trailing white spaces; split the ‘Job_location’ column into two columns, city and state; extract the salary. Matrix of raw file There are leading and trailing white spaces in several columns. They are indicated by ‘’. Linkedi...

3172 sym R (26527 sym/22 pcs)
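
The three tidying steps named in the entry above (trim white space, split ‘Job_location’ into city and state, extract the salary) could be sketched in base R roughly as follows. This is only an illustration, not the author's code: the sample rows, the `Job_title` and `Job_salary` column names, and the `salary_min` column are assumptions made for the example, and the real file may be laid out differently.

```r
# Hypothetical sample rows; the real LinkedIn file's columns and values may differ.
jobs <- data.frame(
  Job_title    = c("  Data Scientist ", "Data Analyst  "),
  Job_location = c(" New York, NY", "Austin, TX "),
  Job_salary   = c("$120,000.00 - $150,000.00", ""),
  stringsAsFactors = FALSE
)

# 1. Remove leading and trailing white space from every character column.
char_cols <- vapply(jobs, is.character, logical(1))
jobs[char_cols] <- lapply(jobs[char_cols], trimws)

# 2. Split "City, ST" into two columns: city and state.
parts <- strsplit(jobs$Job_location, ",\\s*")
jobs$city  <- vapply(parts, function(p) p[1], character(1))
jobs$state <- vapply(
  parts,
  function(p) if (length(p) >= 2) p[2] else NA_character_,
  character(1)
)

# 3. Extract the first whole-dollar amount as a number (NA when none is listed).
m <- regexpr("[0-9][0-9,]*", jobs$Job_salary)
salary_text <- rep(NA_character_, nrow(jobs))
salary_text[m > 0] <- regmatches(jobs$Job_salary, m)
jobs$salary_min <- as.numeric(gsub(",", "", salary_text))

print(jobs[, c("Job_title", "city", "state", "salary_min")])
```

Postings without a salary keep `NA` in `salary_min` rather than being dropped, so the row count stays the same after tidying.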

Project 3 Tidying Version 1

15.03.2023

Tidy up LinkedIn Dataset Load the LinkedIn csv file into R. Below are some things we need to tidy: remove the leading and trailing white spaces; split the ‘Job_location’ column into two columns, city and state; extract the salary. Matrix of raw file The leading and trailing white spaces Linkedin <- read.csv('https://raw.githubusercontent.com/sus...

2105 sym R (22895 sym/24 pcs)

Project 3 Tidying Version 2

15.03.2023

Tidy up LinkedIn Dataset Load the LinkedIn csv file into R. Below are some things we need to tidy: remove the leading and trailing white spaces; split the ‘Job_location’ column into two columns, city and state; extract the salary. Matrix of raw file The leading and trailing white spaces Linkedin <- read.csv('https://raw.githubusercontent.com/sus...

2109 sym R (67603 sym/28 pcs)

DATA 607 Project 3 Part 1

13.03.2023

Project 3 Part 1 Prompt Create a short document, with the names of group members. You should briefly describe your collaboration tool(s) you’ll use as a group, including for communication, code sharing, and project documentation. You should have identified your data sources, where the data can be found, and how to load it. And you should have cre...

3691 sym 1 img

DATA 607 Assignment 5

13.03.2023

Introduction Assignment Prompt Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting. Take the information that you’ve selected about these three books, and separately cre...

2575 sym R (3115 sym/14 pcs)

DATA 607 Israel Vaccination Extra Credit

08.03.2023

Introduction Load CSV The raw data is stored here. The data is imported into R. library(tidyr) library(dplyr) library(DT) raw_data <- read.csv('https://raw.githubusercontent.com/suswong/DATA-607-Extra-Credit/main/Israeli%20Vaccination%20Data%20Extra%20Credit.csv') datatable(raw_data) Tidy Data The first row contains part of the header. We need t...

4156 sym R (4252 sym/23 pcs)

Data 607 Project 2 Dataset 2

06.03.2023

Introduction For Project 2 Dataset 2, I chose to analyze “MTA Daily Ridership” provided by John. As suggested by John in the Week 5 discussion forum, we should compare MTA daily ridership during COVID-19 with ridership before it. Load CSV file into R library(dplyr) #library(tidyverse) library(tidyr) library(DT) raw_data <- read.csv('https://raw.g...

5357 sym R (6607 sym/32 pcs) 8 img

Data 607 Project 2 Dataset 1

06.03.2023

Introduction For Project 2 Dataset 1, I chose to analyze “Bob Ross Painting” provided by Taha. As suggested by Taha in the Week 5 discussion forum, we should find which color Bob Ross used most often. Load CSV into R Load the csv file into R. library(tidyr) library(dplyr) library(DT) raw_data <- read.csv('https://raw.githubusercontent.com/jw...

1681 sym R (2313 sym/11 pcs) 1 img

Data 607 Project 2 Dataset 3

06.03.2023

Load CSV into R Load the csv file into R. library(tidyr) library(dplyr) library(DT) raw_data <- read.csv('https://raw.githubusercontent.com/suswong/DATA-607-Project-2/main/Gasoline_Retail_Prices_Weekly_Average_by_Region__Beginning_2007.csv') raw_data$Date <- as.Date(raw_data$Date, format = "%m/%d/%Y") datatable(raw_data) Ti...

557 sym R (2534 sym/11 pcs) 1 img
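
The `as.Date(..., format = "%m/%d/%Y")` call in the entry above converts month/day/year strings into Date values. A minimal sketch of what that format string does is shown here; the input strings are made-up examples and are not taken from the gasoline price file.

```r
# Hypothetical input: a month/day/four-digit-year string, as the format expects.
parsed <- as.Date("03/06/2023", format = "%m/%d/%Y")
print(parsed)

# A string in a different layout does not match the format and yields NA,
# which is worth checking for after converting a whole column.
mismatch <- as.Date("2023-03-06", format = "%m/%d/%Y")
print(is.na(mismatch))
```

Counting `NA` values in the converted column (for example with `sum(is.na(...))`) is one way to spot rows whose dates were not in the expected layout.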