Publications by Vinayak Kamath

Data607-Major Assignment-Project3-Most Valued Data Science Skills

22.03.2020

Overview: The objective of this project was to answer the question: “Which are the most valued data science skills?” The dataset used to answer this question was sourced from Kaggle: https://www.kaggle.com/elroyggj/indeed-dataset-data-scientistanalystengineer. It contains information from 5715 data-science-related job postings on the job-li...

2732 sym R (27121 sym/70 pcs) 34 img 3 tbl

Data606 - Lab 6 - Inference for numerical data

22.03.2020

North Carolina births In 2004, the state of North Carolina released a large data set containing information on births recorded in this state. This data set is useful to researchers studying the relation between habits and practices of expectant mothers and the birth of their children. We will work with a random sample of observations from this da...

5878 sym R (6214 sym/36 pcs) 9 img 1 tbl

Data606 - Lab 6 - Inference for categorical data

15.03.2020

In August of 2012, news outlets ranging from the Washington Post to the Huffington Post ran a story about the rise of atheism in America. The source for the story was a poll that asked people, “Irrespective of whether you attend a place of worship or not, would you say you are a religious person, not a religious person or a convinced atheist?”...

12437 sym R (5438 sym/40 pcs) 10 img

Data607-Week07-Working with XML and JSON in R

14.03.2020

Working with XML and JSON in R. Books Selected: I have picked the following three books (at random) from the Barnes & Noble website: Elon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future; by Ashlee Vance. A Brief History of Time: From the Big Bang to Black Holes; by Stephen Hawking. 1066 Turned Upside Down; by Joanna Courtney / Hellen Holli...

1174 sym R (849 sym/3 pcs) 3 img

Data606 - Lab 5b - Foundations for statistical inference - Confidence intervals

08.03.2020

Sampling from Ames, Iowa If you have access to data on an entire population, say the size of every house in Ames, Iowa, it’s straightforward to answer questions like, “How big is the typical house in Ames?” and “How much variation is there in sizes of houses?” If you have access to only a sample of the population, as is often the case...

6803 sym R (1248 sym/23 pcs) 3 img

Data607-MajorAssignment-Project2-Data Transformation

08.03.2020

Data Transformation: Three of the “wide” datasets identified in the Week 6 Discussion items have been used for this exercise. Set 1 - Bank stocks from 2007 (Discussion Thread by Jeff Shamp) Set 2 - UNICEF dataset on Under 5 Mortality (Discussion Thread by Samuel Bellows) Set 3 - Hospital Consumer Assessment of Healthcare Providers and...

2822 sym R (7075 sym/24 pcs) 6 img 7 tbl

Data606-HomeWork-Chapter5-Foundations for Inference

07.03.2020

Heights of adults. (7.7, p. 260) Researchers studying anthropometry collected body girth measurements and skeletal diameter measurements, as well as age, weight, height and gender, for 507 physically active individuals. The histogram below shows the sample distribution of heights in centimeters. What is the point estimate for the average height...

7070 sym R (3816 sym/26 pcs) 6 img

Data606-HomeWork-Chapter4-Distributions of Random Variables

29.02.2020

Area under the curve, Part I. (4.1, p. 142) What percent of a standard normal distribution \(N(\mu=0, \sigma=1)\) is found in each region? Be sure to draw a graph. \(Z < -1.35\) normalPlot(mean = 0, sd = 1, bounds = c( -Inf, -1.35), tails = FALSE) round(pnorm(-1.35, lower.tail=T), 4) ## [1] 0.0885 \(Z > 1.48\) normalPlot(mean = 0, sd = 1, boun...

6256 sym R (3453 sym/48 pcs) 6 img

Data606 - Lab 4 - The normal distribution

28.02.2020

In this lab we’ll investigate the probability distribution that is most central to statistics: the normal distribution. If we are confident that our data are nearly normal, that opens the door to many powerful statistical methods. Here we’ll use the graphical tools of R to assess the normality of our data and also learn how to generate random...

10281 sym R (3290 sym/35 pcs) 12 img

Data607-MajorAssignment-Project1-Chess Tournament

17.02.2020

In this project, you’re given a text file with chess tournament results where the information has some structure. Your job is to create an R Markdown file that generates a .CSV file (that could for example be imported into a SQL database) with the following information for all of the players: Player’s Name, Player’s State, Total Number of P...

1642 sym R (8107 sym/18 pcs)