Publications by Vinayak Kamath

DATA 606 Data Project

02.05.2020

Part 1 - Introduction: The World Happiness Report is an annual publication of the United Nations Sustainable Development Solutions Network. It contains articles and rankings of national happiness based on respondents' ratings of their own lives, which the report also correlates with various life factors. Each year, ~1,000 individuals are sampled fr...

4200 sym R (12969 sym/39 pcs) 19 img 4 tbl

Data607-Tydiverse_create_extend

15.04.2020

The Data: I’ll be using data from the SARS 2003 Outbreak Complete Dataset from Kaggle. The raw GitHub link is here. Loading Packages: library(tidyverse) ## -- Attaching packages -- tidyverse 1.3.0 -- ## v ggplot2 3.2.1 v purrr 0.3.3 ## v ti...

2157 sym R (3502 sym/26 pcs) 3 tbl

Data607-Assignment Discussion - Week12 - Recommender System

15.04.2020

Introduction: Best Buy uses a recommender system to increase revenue and improve the customer experience. Best Buy has been using its recommendation system for e-commerce since 2015. The system works by predicting what a customer is interested in based on their individual browsing and purchase data. Best Buy decided to focus on their onli...

1740 sym 2 img
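
The discussion says the system predicts a customer's interests from browsing and purchase data. As a generic illustration only (the post does not describe Best Buy's actual algorithm), a minimal item co-occurrence recommender ("customers who bought X also bought Y") might look like this in Python; the purchase histories and the helper names `build_cooccurrence` and `recommend` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(histories):
    """Count how often each pair of items appears in the same customer's history."""
    co = {}
    for items in histories:
        for a, b in combinations(sorted(set(items)), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co


def recommend(co, owned, k=3):
    """Score unseen items by how often they co-occur with items the customer already has."""
    scores = Counter()
    for item in owned:
        for other, n in co.get(item, Counter()).items():
            if other not in owned:
                scores[other] += n
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical purchase histories, one list per customer.
histories = [
    ["laptop", "mouse", "laptop_bag"],
    ["laptop", "mouse"],
    ["tv", "soundbar"],
    ["laptop", "usb_hub"],
]
co = build_cooccurrence(histories)
print(recommend(co, {"laptop"}))
```

Production systems typically weight these counts (for example by item popularity) and blend in browsing signals; the raw counts here just show the core idea.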

Data606-HomeWork-Chapter 8 - Introduction to Linear Regression

15.04.2020

Nutrition at Starbucks, Part I. (8.22, p. 326) The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on i...

7402 sym R (1189 sym/14 pcs) 10 img
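
The exercise predicts a menu item's carbohydrates from its calories with a simple linear regression. A minimal sketch of the ordinary least-squares fit in Python follows; the calorie and carb values are made-up placeholders (not the Starbucks data from the textbook), and `fit_line` is a hypothetical helper name.

```python
def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx      # slope: change in grams of carbs per extra calorie
    b0 = my - b1 * mx   # intercept: the fitted line passes through (mean x, mean y)
    return b0, b1


# Placeholder data, NOT the textbook's Starbucks menu items.
calories = [200, 300, 400, 500]
carbs = [25, 35, 45, 55]

b0, b1 = fit_line(calories, carbs)
predicted = b0 + b1 * 350  # predicted carbs for a hypothetical 350-calorie item
```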

Data606 - Lab 9 - Multiple linear regression

15.04.2020

Grading the professor Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related charac...

11602 sym R (8751 sym/40 pcs) 20 img 1 tbl

Data606-HomeWork-Chapter 9 - Multiple and Logistic Regression

15.04.2020

Baby weights, Part I. (9.1, p. 350) The Child Health and Development Studies investigate a range of topics. One study considered all pregnancies between 1960 and 1967 among women in the Kaiser Foundation Health Plan in the San Francisco East Bay area. Here, we study the relationship between smoking and weight of the baby. The variable smoke is c...

7485 sym R (2328 sym/17 pcs) 2 img

Data607-Week10-Sentiment Analysis

05.04.2020

Sentiment Analysis: We will use sentiment analysis to systematically identify, extract, quantify, and study affective states and subjective information in text. We will apply it to a corpus of novels, using the different sentiment lexicons discussed in the sections below. Loading the required libraries: #install.packages("tidy...

4129 sym R (15318 sym/59 pcs) 9 img 2 tbl
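
The post scores novels against sentiment lexicons in R. To show the underlying idea only, here is a minimal Python sketch of lexicon-based scoring: tokenize the text, look up each word in a lexicon, and sum the matched scores. The tiny `LEXICON` is a hypothetical stand-in, not AFINN, Bing, or NRC, and the helper names are illustrative.

```python
import re

# Hypothetical mini-lexicon: word -> sentiment score (real lexicons are far larger).
LEXICON = {"happy": 2, "love": 3, "good": 1, "sad": -2, "terrible": -3}


def tokenize(text):
    """Lower-case the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(text):
    """Return (total score, number of lexicon words matched) for a passage."""
    matched = [LEXICON[w] for w in tokenize(text) if w in LEXICON]
    return sum(matched), len(matched)


passage = "It was a good day, and she was happy, though the news was terrible."
score, hits = sentiment(passage)
```

Scoring per chapter or per fixed-size block of lines, rather than per book, is what lets this kind of analysis trace how sentiment shifts through a novel.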

DATA 606 Data Project Proposal

30.03.2020

Data Preparation: Life Expectancy and Happiness. We will use the following datasets from Kaggle to relate happiness to life expectancy. Life expectancy could be linked to many factors, such as monetary or physical needs, living conditions, or external factors such as politics. Here we aim to correlate life expectancy with happiness...

3780 sym R (8672 sym/33 pcs) 4 img 6 tbl
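
The proposal aims to correlate life expectancy with happiness. A minimal sketch of the Pearson correlation coefficient in Python follows; the paired values are hypothetical placeholders, not the Kaggle data, and with the real datasets the two tables would first need to be joined on country (and year) so that each pair refers to the same place.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (life expectancy, happiness score) pairs for four countries.
life_expectancy = [60.0, 65.0, 70.0, 75.0]
happiness = [4.5, 5.0, 6.0, 6.5]
r = pearson_r(life_expectancy, happiness)
```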

Data607-Week09-Web APIs

28.03.2020

New York Times Web APIs: The New York Times website provides a rich set of APIs, as described here. For this assignment I have chosen the Article Search API; we will construct an interface in R to read in the JSON data and transform it into an R data frame. Article Search API Overview: Use the Article Search API to look ...

894 sym R (1407 sym/6 pcs)
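
The assignment builds an interface that requests JSON from the Article Search API and flattens it into a table. A rough Python sketch of the same two steps follows. It assumes the v2 endpoint `https://api.nytimes.com/svc/search/v2/articlesearch.json` with `q`, `page`, and `api-key` query parameters, and a response shaped like `{"response": {"docs": [...]}}` where each document has `headline.main`, `pub_date`, and `web_url`; verify these against the current NYT developer documentation. `YOUR_KEY` is a placeholder, and the parsing step is demonstrated on a hand-written sample rather than a live response.

```python
import json
from urllib.parse import urlencode

# Assumed v2 endpoint; check the NYT developer docs before relying on it.
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_query_url(query, api_key, page=0):
    """Build a search URL; parameter names are assumptions based on the v2 API."""
    return BASE_URL + "?" + urlencode({"q": query, "page": page, "api-key": api_key})


def docs_to_rows(payload):
    """Flatten the assumed JSON shape into a list of dicts, one row per article."""
    docs = json.loads(payload).get("response", {}).get("docs", [])
    return [
        {
            "headline": d.get("headline", {}).get("main"),
            "pub_date": d.get("pub_date"),
            "web_url": d.get("web_url"),
        }
        for d in docs
    ]


url = build_query_url("election", "YOUR_KEY")  # YOUR_KEY is a placeholder

# Hand-written sample in the assumed response shape (not real API output).
sample = (
    '{"status": "OK", "response": {"docs": [{"headline": {"main": "Example"}, '
    '"pub_date": "2020-03-28T00:00:00+0000", "web_url": "https://www.nytimes.com/example"}]}}'
)
rows = docs_to_rows(sample)
```

With a real key, the payload could be fetched with `urllib.request.urlopen(url).read().decode("utf-8")` and passed to `docs_to_rows`; the resulting list of dicts maps directly onto a data frame.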

Data606-HomeWork-Chapter 7 - Inference for Numerical Data

22.03.2020

Working backwards, Part II. (5.24, p. 203) A 90% confidence interval for a population mean is (65, 77). The population distribution is approximately normal and the population standard deviation is unknown. This confidence interval is based on a simple random sample of 25 observations. Calculate the sample mean, the margin of error, and the sampl...

5988 sym R (3259 sym/17 pcs) 4 img 1 tbl
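
The quantities can be recovered by working backwards from the interval. Because the population standard deviation is unknown and n = 25, the interval uses a t critical value with df = 24; for 90% confidence, t*_24 ≈ 1.711 from a standard t-table.

```latex
\begin{aligned}
\bar{x} &= \frac{65 + 77}{2} = 71, \\
ME &= \frac{77 - 65}{2} = 6, \\
ME &= t^{*}_{24}\,\frac{s}{\sqrt{n}}
  \;\Rightarrow\;
  s = \frac{ME\,\sqrt{n}}{t^{*}_{24}} = \frac{6 \times 5}{1.711} \approx 17.53.
\end{aligned}
```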