Publications by Kenan Sooklall

DATA-607 Project 3

25.03.2021

Read in the job listings:
jdf <- read.csv(paste0(path, 'ds_job_listing_software.csv'))
jdf <- jdf[1:37,]
jdf <- jdf %>% select(c(Keyword, LinkedIn, Indeed, SimplyHired, Monster)) %>% rename('skill'='Keyword')
jdf[,2:5] <- lapply(jdf[,2:5],function(x){as.numeric(gsub(",", "", x))})
jdf <- jdf %>% mutate(total = rowSums(across(where(is.numeric))))
...
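
A minimal base-R sketch of the cleaning step above, assuming the site columns arrive as text with thousands separators (e.g. "1,200"). The two-row table and its counts are hypothetical toy values, not the project's data, and base R stands in for the dplyr pipeline so the snippet runs without extra packages.

```r
# Hypothetical toy table: a skill column plus site columns stored as
# comma-formatted text (NOT the project's data)
jdf <- data.frame(
  skill = c("Python", "R"),
  LinkedIn = c("1,200", "850"),
  Indeed = c("2,050", "1,000"),
  stringsAsFactors = FALSE
)

# Strip thousands separators and convert each site column to numeric
site_cols <- c("LinkedIn", "Indeed")
jdf[site_cols] <- lapply(jdf[site_cols], function(x) as.numeric(gsub(",", "", x)))

# Row total across all numeric columns
# (base-R stand-in for mutate(total = rowSums(across(where(is.numeric)))))
jdf$total <- rowSums(jdf[sapply(jdf, is.numeric)])
print(jdf)
```

Selecting columns with `sapply(jdf, is.numeric)` keeps the total independent of how many site columns the real file has.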

DATA-607 Homework 6

19.03.2021

Assignment – Working with XML and JSON in R. Pick three of your favorite books on one of your favorite subjects. At least one of the books should have more than one author. For each book, include the title, authors, and two or three other attributes that you find interesting. Topic: Computer Science. books = c('Hello World', 'The signa...

DATA-605 Homework 8

16.03.2021

Exercise 11. A company buys 100 lightbulbs, each of which has an exponential lifetime of 1000 hours. What is the expected time for the first of these bulbs to burn out? (See Exercise 10.)
n=100
lifetime=1000
lifetime/n
## [1] 10
Exercise 14. Assume that \(X_1\) and \(X_2\) are independent random variables, each having an exponential density with parameter...
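
The answer of 10 hours follows because the minimum of \(n\) independent exponential lifetimes with mean 1000 is itself exponential with mean \(1000/n\), since \(P(\min > t) = e^{-nt/1000}\). A quick simulation sketch to check this; the seed and the number of trials are arbitrary choices.

```r
set.seed(607)      # arbitrary seed for reproducibility
n <- 100           # number of bulbs
lifetime <- 1000   # mean lifetime in hours
trials <- 10000    # number of simulated batches of bulbs

# For each trial, draw n exponential lifetimes and keep the first failure
first_failure <- replicate(trials, min(rexp(n, rate = 1 / lifetime)))

# The simulated average should be close to lifetime / n
mean(first_failure)
```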

DATA-606 Homework 5

13.03.2021

Heights of adults. (7.7, p. 260) Researchers studying anthropometry collected body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender, for 507 physically active individuals. The histogram below shows the sample distribution of heights in centimeters. What is the point estimate for the average height...

KSooklall_Homework4 DATA-606

05.03.2021

Area under the curve, Part I. (4.1, p. 142) What percent of a standard normal distribution \(N(\mu=0, \sigma=1)\) is found in each region? Be sure to draw a graph.
\(Z < -1.35\)
DATA606::normalPlot(0, 1, c(-10, -1.35))
\(Z > 1.48\)
DATA606::normalPlot(0, 1, c(1.48, 10))
\(-0.4 < Z < 1.5\)
DATA606::normalPlot(0, 1, c(-0.4, 1.5))
\(|Z| > 2\...
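
The percentages for these regions can also be computed directly with base R's `pnorm()`, which returns \(P(Z \le z)\) for the standard normal by default. A sketch for the three regions shown in full above (the truncated last region is left out).

```r
# Left tail: P(Z < -1.35)
p1 <- pnorm(-1.35)

# Right tail: P(Z > 1.48), using the upper tail directly
p2 <- pnorm(1.48, lower.tail = FALSE)

# Middle region: P(-0.4 < Z < 1.5) as a difference of two CDF values
p3 <- pnorm(1.5) - pnorm(-0.4)

round(c(p1, p2, p3), 4)
```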

KSooklall_Homework5 DATA-607

05.03.2021

Read in the data. Read the CSV and transform it to have shape n×4. The dataset can be found here. Columns:
Airline - ALASKA or AM WEST
Status - On time or Delayed
Location - Destination of flight
Count - The number of flights to a certain location
df <- read.csv('https://raw.githubusercontent.com/ksooklall/CUNY-SPS-Masters-DS/main/DATA_607/homework/homework5/fli...
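
A minimal base-R sketch of the wide-to-long step, assuming the raw CSV has one row per airline and status and one column per destination. The destinations and counts below are hypothetical placeholders, not the values in the linked file; `tidyr::pivot_longer()` is the usual tidyverse alternative.

```r
# Hypothetical wide table: one row per airline/status, one column per destination
wide <- data.frame(
  Airline = c("ALASKA", "ALASKA", "AM WEST", "AM WEST"),
  Status = c("On time", "Delayed", "On time", "Delayed"),
  Seattle = c(10, 2, 5, 1),
  Phoenix = c(3, 1, 20, 4),
  stringsAsFactors = FALSE
)

dest_cols <- setdiff(names(wide), c("Airline", "Status"))

# Stack the destination columns: one row per airline/status/destination.
# unlist() concatenates column by column, matching rep(..., each = nrow(wide)).
long <- data.frame(
  Airline = rep(wide$Airline, times = length(dest_cols)),
  Status = rep(wide$Status, times = length(dest_cols)),
  Location = rep(dest_cols, each = nrow(wide)),
  Count = unlist(wide[dest_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

dim(long)  # n x 4
```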

DATA-607 Project 2

13.03.2021

For this project I will be cleaning and visualizing 3 different datasets.
Country prison admissions
Clean the country_prision_admissions.csv dataset. This dataset has 16 columns and 3143 rows.
pdf <- read.csv('/home/kenan/Documents/learning/masters/CUNY-SPS-Masters-DS/DATA_607/projects/project_2/datasets/country_prision_admissions.csv')
glimpse(pdf) #...

DATA-605 Homework 9

26.03.2021

Problem 1. The price of one share of stock in the Pilsdorff Beer Company (see Exercise 8.2.12) is given by \(Y_n\) on the \(n\)th day of the year. Finn observes that the differences \(X_n = Y_{n+1} - Y_n\) appear to be independent random variables with a common distribution having mean \(\mu = 0\) and variance \(\sigma^2 = 1/4\). If \(Y_1 = 100\), e...

DATA-605 Lab 7

01.04.2021

Getting Started
Load packages
In this lab, we will explore and visualize the data using the tidyverse suite of packages, and perform statistical inference using infer. The data can be found in the companion package for OpenIntro resources, openintro. Let’s load the packages.
library(tidyverse)
library(openintro)
library(infer)
The data
Every ...

DATA-606 Protein taste and Covid-19

19.05.2021

Abstract: One sign of Covid-19 is the loss of smell (anosmia) and of taste (ageusia). To test for loss of taste, reviews of protein powder were scraped from Amazon.com and parsed for the proportion of "tasteless" or "no taste" comments before and during 2020. A significance level of 0.01 was chosen, and a sample of 3465 reviews was collected. Thro...
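
A sketch of a pooled two-proportion z-test at \(\alpha = 0.01\), one common way to compare the "no taste" proportion before and during 2020. The split of the 3465 reviews into periods and the "no taste" counts are hypothetical, and the two-sided alternative is an assumption; the abstract does not give them. `prop.test()` without continuity correction serves as a cross-check, since its chi-squared statistic equals \(z^2\) for a 2×2 table.

```r
# Hypothetical counts (NOT the project's data)
x <- c(before = 40, during = 75)      # reviews flagged as "no taste"
n <- c(before = 1700, during = 1765)  # reviews per period; sums to 3465

p_hat <- x / n                         # sample proportion in each period
p_pool <- sum(x) / sum(n)              # pooled proportion under H0: p_before = p_during
se <- sqrt(p_pool * (1 - p_pool) * (1 / n[1] + 1 / n[2]))
z <- unname((p_hat[2] - p_hat[1]) / se)

# Two-sided p-value (assumed alternative: the proportions differ)
p_value <- 2 * pnorm(-abs(z))
alpha <- 0.01
reject <- p_value < alpha
c(z = z, p_value = p_value)
```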
