Publications by Vinayak Kamath
DATA 606 Data Project
Part 1 - Introduction: The World Happiness Report is an annual publication of the United Nations Sustainable Development Solutions Network. It contains articles and rankings of national happiness based on respondents' ratings of their own lives, which the report also correlates with various life factors. Each year, ~1,000 individuals are sampled fr...
Data607-Tydiverse_create_extend
The Data: I’ll be using data from the SARS 2003 Outbreak Complete Dataset on Kaggle. The raw GitHub link is here. Loading Packages: library(tidyverse) ## -- Attaching packages --- tidyverse 1.3.0 -- ## v ggplot2 3.2.1 v purrr 0.3.3 ## v ti...
Data607-Assignment Discussion - Week12 - Recommender System
4/15/2020. Introduction: Best Buy uses a recommender system to increase revenue and improve the customer experience. Best Buy has been using its recommendation system for e-commerce since 2015. The system works by predicting what a customer is interested in based on their individual browsing and purchase data. Best Buy decided to focus on their onli...
Data606-HomeWork-Chapter 8 - Introduction to Linear Regression
Nutrition at Starbucks, Part I. (8.22, p. 326) The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on i...
Data606 - Lab 9 - Multiple linear regression
Grading the professor Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related charac...
Data606-HomeWork-Chapter 9 - Multiple and Logistic Regression
Baby weights, Part I. (9.1, p. 350) The Child Health and Development Studies investigate a range of topics. One study considered all pregnancies between 1960 and 1967 among women in the Kaiser Foundation Health Plan in the San Francisco East Bay area. Here, we study the relationship between smoking and weight of the baby. The variable smoke is c...
Data607-Week10-Sentiment Analysis
Sentiment Analysis: We will use sentiment analysis to systematically identify, extract, quantify, and study affective states and subjective information in text. We will apply it to a corpus of novels using different sentiment lexicons, as discussed in the sections below. Loading the required libraries: #install.packages("tidy...
DATA 606 Data Project Proposal
Data Preparation: Life Expectancy and Happiness. We will use the following datasets from Kaggle to relate happiness to life expectancy. Life expectancy could be linked to many factors, such as monetary or physical needs, living conditions, or external factors such as politics. Here we aim to correlate life expectancy with happiness...
Data607-Week09-Web APIs
New York Times Web APIs: The New York Times website provides a rich set of APIs, as described here. For this assignment I have chosen the Article Search API; we will construct an interface in R to read in the JSON data and transform it into an R data frame. Article Search API Overview: Use the Article Search API to look ...
Data606-HomeWork-Chapter 7 - Inference for Numerical Data
Working backwards, Part II. (5.24, p. 203) A 90% confidence interval for a population mean is (65, 77). The population distribution is approximately normal and the population standard deviation is unknown. This confidence interval is based on a simple random sample of 25 observations. Calculate the sample mean, the margin of error, and the sampl...
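The arithmetic for the "Working backwards" exercise can be sketched as follows, assuming the standard t-based interval for a mean, x̄ ± t*·s/√n, with df = n − 1 = 24. The critical value t*₂₄ ≈ 1.711 (90% confidence, two-sided) comes from a t table and is not stated in the excerpt; the last line shows the sample standard deviation these numbers imply.

```latex
\begin{aligned}
\bar{x} &= \frac{65 + 77}{2} = 71 \\
ME &= \frac{77 - 65}{2} = 6 \\
SE &= \frac{ME}{t^{*}_{24}} = \frac{6}{1.711} \approx 3.507 \\
s &= SE \cdot \sqrt{n} = 3.507 \times \sqrt{25} \approx 17.53
\end{aligned}
```

The midpoint gives the sample mean and the half-width gives the margin of error; dividing the margin of error by t* recovers the standard error, and multiplying by √n undoes the s/√n scaling.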