Publications by Arvind Sharma
CLT
Central Limit Theorem: The Central Limit Theorem (CLT) is one of the most important theorems in statistics and data science. It states that the mean of a sample drawn from a probability distribution is itself a random variable, whose mean equals the population mean and whose standard deviation equals the population standard deviation divided by the square root of ...
4161 sym R (10299 sym/65 pcs) 5 img
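The CLT excerpt above can be checked with a small simulation. This is a minimal sketch, not the publication's own code: it assumes the truncated sentence ends with the sample size n (the standard statement), and the Uniform(0, 1) population, n = 30, and 5000 repetitions are illustrative choices.

```python
import math
import random
import statistics

def sample_means(n, trials, rng):
    """Return `trials` sample means, each computed from n Uniform(0, 1) draws."""
    return [statistics.fmean([rng.random() for _ in range(n)])
            for _ in range(trials)]

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 30                   # illustrative sample size (assumption)
means = sample_means(n, 5000, rng)

pop_mean = 0.5               # mean of Uniform(0, 1)
pop_sd = math.sqrt(1 / 12)   # standard deviation of Uniform(0, 1)
predicted_se = pop_sd / math.sqrt(n)

print("mean of sample means:", statistics.fmean(means), "population mean:", pop_mean)
print("sd of sample means:  ", statistics.stdev(means), "sigma / sqrt(n):", predicted_se)
```

The two printed pairs should be close to each other, and the agreement tightens as the number of repetitions grows.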
Midterm Solution
Question 1: Basic Data Analysis in R (Assignment + Discussion 1). In 1986, the Challenger space shuttle exploded during “throttle up” due to catastrophic failure of the O-rings (seals) around the rocket booster. The (real) data on all space shuttle launches prior to the Challenger disaster are in the file challenger.csv. Load the data into R or Python...
9617 sym R (11462 sym/97 pcs) 7 img
Discussion3
Often, we can model processes using several different probability distributions (skim). For example, we might use the Poisson instead of the binomial (\(n>20\) and \(np<10\), that is, large n and small p), the binomial instead of the geometric (both involve repeated independent Bernoulli trials), or the normal approximation instead of the binomial (if \...
4732 sym 2 img
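The Poisson-for-binomial substitution mentioned in the Discussion3 excerpt can be illustrated numerically. The sketch below is not taken from the publication; n = 100 and p = 0.05 are assumed values chosen only to satisfy the stated rule of thumb (n > 20, np < 10).

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 100, 0.05   # illustrative values (assumption): n > 20 and np = 5 < 10
lam = n * p        # the Poisson rate matches the binomial mean

for k in range(0, 11):
    print(k, round(binom_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))

# Largest pointwise difference between the two probability mass functions
max_gap = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1))
print("largest pointwise gap:", max_gap)
```

Rerunning with a larger p (say 0.4) shows the gap widening, which is why the rule of thumb asks for small p.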
HW3
It would be very helpful if you could plot the distributions before calculating the probabilities. Begin by reading up on the plot() function. These questions will help you build an understanding of the Normal, Binomial, Hypergeometric and Poisson distributions. You will be using the probability density function, cumulative distribution function and quanti...
13813 sym Python (15566 sym/101 pcs) 8 img
HW3_Q2
Q2. A quality control inspector has drawn a sample of 13 light bulbs from a recent production lot. Suppose 20% of the bulbs in the lot are defective. What is the probability that fewer than 6 but more than 3 bulbs from the sample are defective? Round your answer to four decimal places. i. Identify the distribution. This is a binomial distribution ...
1267 sym 1 img
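The HW3_Q2 probability can be computed directly from the binomial probability mass function. This is a minimal sketch rather than the publication's solution; it reads "more than 3 but fewer than 6" as exactly 4 or 5 defective bulbs, with n = 13 and p = 0.2 taken from the question.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 13, 0.2   # sample size and defect rate from the question

# P(3 < X < 6) = P(X = 4) + P(X = 5)
prob = sum(binom_pmf(k, n, p) for k in (4, 5))
print(round(prob, 4))  # prints 0.2226
```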
Discussion1
1 Discussion on Iris Dataset ?read.csv In OpenStats Chapter 1, Exercises, Problem 9, there is a reference to Fisher’s iris data. Discuss the solutions to this problem, and then conduct a descriptive analysis of the data, which are conveniently available in R. To access the data in R, simply type “iris.” Investigate any additional R libraries...
3192 sym R (12659 sym/77 pcs) 7 img
HW1
1 Instructions Go to Kaggle.com (owned by Google). Create a free account. Sign up for the Titanic: Machine Learning from Disaster competition located here: https://www.kaggle.com/c/titanic/data?select=train.csv Download the train.csv data. Open the train.csv file in R. To do so, use something like mydata <- read.csv('D:/train.csv') but re...
6236 sym R (18445 sym/83 pcs) 3 img
R Markdown
1 Official R Markdown Guide The link above is what you should explore to understand R Markdown. You can manually replace 'html_document' with 'pdf_document' in the .Rmd (R Markdown) file above to generate the output in your preferred format. However, I would strongly suggest using HTML format initially as the setup is likely to reduce math sym...
3848 sym 1 img
Week 1: Bivariate Regression
Setting Up Working directory, clearing all data and memory
# Clear the workspace
rm(list = ls()) # Clear environment
gc() # Clear unused memory
##          used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 539941 28.9    1208047 64.6         NA   669282 35.8
## Vcells 990762  7.6    8388608 64.0      32768  1840247 14.1
cat(...
724 sym R (5859 sym/39 pcs) 6 img
OLS: lm vs matrix algebra formula
1 Introduction One of the very first learning algorithms that you’ll encounter when studying data science and machine learning is least squares linear regression. Linear regression is one of the easiest learning algorithms to understand; it’s suitable for a wide array of problems, and is already implemented in many programming languages. Most...
4658 sym R (10085 sym/15 pcs)
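The comparison named in the last title, a fitted regression versus the matrix algebra formula, can be sketched for the one-predictor case. This is an illustration under stated assumptions, not the publication's R code: it solves the normal equations (X'X)b = X'y by hand for X = [1, x] and checks the result against the textbook slope cov(x, y) / var(x), which stands in for R's lm(). The five data points are made up for the example.

```python
def ols_normal_equations(x, y):
    """Solve (X'X) b = X'y for X = [1, x] using the explicit 2x2 inverse."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    # X'X = [[n, sx], [sx, sxx]] and X'y = [sy, sxy]
    det = n * sxx - sx * sx
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope

def ols_textbook(x, y):
    """Slope = cov(x, y) / var(x); intercept = mean(y) - slope * mean(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    slope = cov / var
    return my - slope * mx, slope

# Made-up illustrative data (assumption), roughly y = 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0_matrix, b1_matrix = ols_normal_equations(x, y)
b0_text, b1_text = ols_textbook(x, y)
print("normal equations:", b0_matrix, b1_matrix)
print("textbook formula:", b0_text, b1_text)
```

Both routes should give the same intercept and slope up to floating-point rounding. With more than one predictor, the same idea needs a general linear solver instead of the 2x2 inverse used here.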