Publications by J. Falck, A. Haque, M. Matanos, E. Rodrigues
Data_606_Lab2
Some define statistics as the field that focuses on turning information into knowledge. The first step in that process is to summarize and describe the raw information – the data. In this lab we explore flights, specifically a random sample of domestic flights that departed from the three major New York City airports in 2013. We will generate s...
12289 sym R (4749 sym/31 pcs) 9 img
Data_606_Lab4
In this lab, you’ll investigate the probability distribution that is most central to statistics: the normal distribution. If you are confident that your data are nearly normal, that opens the door to many powerful statistical methods. Here we’ll use the graphical tools of R to assess the normality of our data and also learn how to generate ra...
12059 sym R (5278 sym/49 pcs) 15 img
Data_606_Lab3
The Hot Hand Basketball players who make several baskets in succession are described as having a hot hand. Fans and players have long believed in the hot hand phenomenon, which runs counter to the assumption that each shot is independent of the next. However, a 1985 paper by Gilovich, Vallone, and Tversky collected evidence that contradicted this belief ...
13087 sym R (3251 sym/28 pcs) 2 img
Data_606_Lab6
Getting Started Load packages In this lab, we will explore and visualize the data using the tidyverse suite of packages, and perform statistical inference using infer. The data can be found in the companion package for OpenIntro resources, openintro. Let’s load the packages. library(tidyverse) library(openintro) library(infer) rm(list = ls(...
12756 sym R (5152 sym/41 pcs) 1 img
Data_607_Assignment_6
Introduction I have placed three files in the three formats (XML, JSON, HTML) in an AWS S3 folder so they can be accessed directly from the internet. Links are below: https://cuny-msds.s3.amazonaws.com/books.xml https://cuny-msds.s3.amazonaws.com/books.json https://cuny-msds.s3.amazonaws.com/books.html We will read the files, parse them and en...
1025 sym R (4708 sym/20 pcs)
Data_606_Lab7
Getting Started Load packages In this lab, we will explore and visualize the data using the tidyverse suite of packages, and perform statistical inference using infer. The data can be found in the companion package for OpenIntro resources, openintro. Let’s load the packages. rm(list=ls()) library(tidyverse) library(openintro) library(infer)...
7149 sym R (8612 sym/60 pcs) 6 img
Data_607_Assignment_Week9
NY Times Web API Introduction The New York Times offers a standardized way for people to search their database of articles under different criteria. This API allows users to embed in their code the calls needed to pull the information they want. Signup Process and Personal API Key To access their data through their API, the NY Times req...
2250 sym R (12672 sym/22 pcs)
Data 606 Proj. Proposal
Global Life Expectancy (WHO) Context The World Health Organization annually collects data for all countries on indicators (metrics) that it believes are important factors for human development. The purpose of this analysis is to determine, through a linear regression model, the best predictors of life expectancy. Process Thi...
6599 sym R (9364 sym/32 pcs) 10 img
Data_607_Final_Project
1 Intro The objective of this project was to use a fairly large dataset (IMDB Movie Reviews), which has 50,000 movie reviews, and identify the best sentiment classifier for those reviews. The goal is not really to get into the reviews themselves, but to test two things: Tidymodels, and running several models and comparing them for accuracy and execution time...
5452 sym R (13295 sym/81 pcs) 6 img 2 tbl
Data_607_Project_4_Classification
library(tidyverse) library(tidymodels) library(tidytext) library(textrecipes) library(vip) # Some initialization #setwd("./Data607_Project4") rm(list = ls()) path_ham <- "./easy_ham" path_spam <- "./spam_2" path_hamhard <- "./hard_ham" myseed <- 8888 1 Introduction to Project 4 It can be useful to be able to classify new “test” do...
6393 sym R (10787 sym/53 pcs) 5 img