Publications by Shoshana Farber
DATA 624 - Homework 4
Exercise 3.1 The UC Irvine Machine Learning Repository contains a data set related to glass identification. The data consist of 214 glass samples labeled as one of seven class categories. There are nine predictors, including the refractive index and percentages of eight elements: Na, Mg, Al, Si, K, Ca, Ba, and Fe. The data can be accessed via: ...
DATA 624 - Homework 3
Exercise 5.1 Produce forecasts for the following series using whichever of NAIVE(y), SNAIVE(y) or RW(y ~ drift()) is more appropriate in each case: Australian Population (global_economy) For this, let’s predict the Australian population for the next ten years. The population is steadily increasing without seasonal trends, so we can use the d...
DATA 624 - Homework 2
Exercise 3.1 Consider the GDP information in global_economy. Plot the GDP per capita for each country over time. Which country has the highest GDP per capita? How has this changed over time? data("global_economy") global_economy <- global_economy |> mutate(GDP_cap = GDP/Population) global_economy |> autoplot(GDP_cap, show.legend=F) + ...
DATA 624 Homework 1
library(fpp3) Exercise 2.1 Explore the following four time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec. data(aus_production, pelt, gafa_stock, vic_elec) Use ? (or help()) to find out about the data in each series. ?aus_production ?pelt ?gafa_stock ?vic_elec What is the time interval of e...
DATA 621 Homework 3
Homework 3 - Logistic Regression Overview: In this homework assignment, you will explore, analyze and model a data set containing information on crime for various neighborhoods of a major city. Each record has a response variable indicating whether or not the crime rate is above the median crime rate (1) or not (0). Your objective is to build a...
DATA 608 - Story 1
Assignment Details This assignment is based on data on the present allocation of the Infrastructure Investment and Jobs Act (IIJA) funding by State and Territory. The goal of the assignment is to use data visualizations to address the following questions: Is the allocation equitable based on the population of each of the States and Territories...
DATA 605 - Final Project
Problem 1 Generate Distributions Probability Density 1: X~Gamma. Using R, generate a random variable \(X\) that has 10,000 random Gamma pdf values. A Gamma pdf is completely described by \(n\) (a size parameter) and \(\lambda\) (a shape parameter). Choose any \(n\) greater than 3 and an expected value (\(\lambda\)) between 2 and 10. set.seed(...
Final Project - Motor Vehicle Collisions
Abstract This study investigates factors associated with motor vehicle collisions and their relationships with collision severity. The analysis was conducted using car crash data obtained from the New York City Police Department, which contains approximately 1.98 million collision records. The highest number of recorded collisions occurred betw...
Tidyverse EXTEND
Required Libraries library(tidyverse) ## Warning: package 'tidyverse' was built under R version 4.2.2 ## Warning: package 'ggplot2' was built under R version 4.2.2 library(lubridate) library(httr) library(jsonlite) Tidyverse Packages Tidyverse contains many packages that allow users to work with strings, mutate and rearrange datafram...
DATA 607 - EC 5
The goal of this project is to implement a Global Baseline Estimate recommendation system in R based on movie ratings. This is an extension of assignment 2, where movie ratings were collected from different individuals. For this, we will use the ratings collected in assignment 2 by connecting to the SQL database in which they are stored. Conne...