Publications by Irene Jacob

Data606_Data Project

09.12.2020

library(DATA606) library(ggplot2) library(openintro) library(tidyverse) library(dplyr) library("plotly") library(fivethirtyeight) Part 1 - Introduction My research question is whether there is a link between the hate crimes reported in the 10 days after the 2016 election and the different areas of data that are available in the d...

2469 sym R (10852 sym/36 pcs) 15 img

Data606_Data Project Presentation

06.12.2020

Introduction My research question is whether there is a link between the hate crimes reported in the 10 days after the 2016 election and the different areas of data (variables) that are available in the dataset. Type of study: This is an observational study. Data Source For the Kaggle data link click here The story that this data is base...

2371 sym R (6882 sym/15 pcs) 14 img

DATA 607_Final Project

03.12.2020

New York City Shooting Rates Purpose I will focus on two points in this project: 1. Identify the location with the highest shooting rates. 2. Analyze the crime rate in each location. Data Source I have two data sources, both from NYCOpenData - https://opendata.cityofnewyork.us/. Historic data from 2006 to 2019. This dataset ...

3326 sym R (10154 sym/40 pcs) 10 img

DATA 607_Discussion_11

05.11.2020

Recommender Systems Goal Your task is to analyze an existing recommender system that you find interesting. You should: Perform a Scenario Design analysis as described below. Consider whether it makes sense for your selected recommender system to perform scenario design twice, once for the organization (e.g. Amazon.com) and once for the organiz...

3443 sym

Data606_Meetup Presentation

01.11.2020

Chapter 8: Introduction to Linear Regression Exercise 8.23: The Coast Starlight, Part II. Exercise 8.11 introduces data on the Coast Starlight Amtrak train that runs from Seattle to Los Angeles. The mean travel time from one stop to the next on the Coast Starlight is 129 minutes, with a standard deviation of 113 minutes. The mean distance traveled...

2264 sym R (341 sym/11 pcs)

DATA 607_Assignment_10

31.10.2020

Goal In Text Mining with R, Chapter 2 looks at Sentiment Analysis. In this assignment, you should start by getting the primary example code from chapter 2 working in an R Markdown document. You should provide a citation to this base code. You’re then asked to extend the code in two ways: Work with a different corpus of your choosing, and Incor...

1542 sym R (35595 sym/64 pcs) 6 img

Data606_Homework 8

31.10.2020

Nutrition at Starbucks, Part I. The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on its calorie conte...

5196 sym R (368 sym/12 pcs) 10 img

DATA 607_Project_4

14.11.2020

Goal It can be useful to be able to classify new “test” documents using already classified “training” documents. A common example is using a corpus of labeled spam and ham (non-spam) e-mails to predict whether or not a new document is spam. For this project, you can start with a spam/ham dataset, then predict the class of new documents (e...

1194 sym R (5193 sym/13 pcs) 2 img

Data606_Homework 9

29.11.2020

Multiple and Logistic Regression Baby weights, Part I. The Child Health and Development Studies investigate a range of topics. One study considered all pregnancies between 1960 and 1967 among women in the Kaiser Foundation Health Plan in the San Francisco East Bay area. Here, we study the relationship between smoking and weight of the baby. The ...

6958 sym R (646 sym/14 pcs) 2 img

Data606_Lab 8

29.11.2020

The data The data we’re working with is in the openintro package and it’s called hfi, short for Human Freedom Index. Exercise 1 What are the dimensions of the dataset? data(hfi) dim(hfi) ## [1] 1458 123 Exercise 2 What type of plot would you use to display the relationship between the personal freedom score, pf_score, and one of the other...

9282 sym R (5415 sym/33 pcs) 9 img