Publications by Dimple K. Patel
HarvardX: PH125.1x
18.10 Exercises Since the 1980s, sabermetricians have used a summary statistic different from batting average to evaluate players. They realized walks were important and that doubles, triples, and HRs should be weighted more than singles. As a result, they proposed the following metric: \(\frac{BB}{PA}+\frac{Singles+2Doubles+3Triples+4HR}{AB}\)...
2982 sym R (6068 sym/29 pcs) 6 img
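The metric quoted in the 18.10 excerpt can be written as a small R function. This is only a sketch: the function name `sabermetric_score`, its argument names, and the season totals are hypothetical and do not come from the publication.

```r
# Hypothetical helper: the metric from the excerpt, computed from raw counts.
# bb = walks, pa = plate appearances, ab = at-bats, hr = home runs.
sabermetric_score <- function(bb, pa, singles, doubles, triples, hr, ab) {
  bb / pa + (singles + 2 * doubles + 3 * triples + 4 * hr) / ab
}

# Made-up season totals, for illustration only.
score <- sabermetric_score(bb = 60, pa = 600, singles = 100, doubles = 30,
                           triples = 5, hr = 20, ab = 500)
print(score)
```

Walks are scaled by plate appearances and hits by at-bats, as in the formula, so the two terms have different denominators.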
HarvardX: PH125.1x
18.6 Exercises 1. In a previous section, we computed the correlation between mothers and daughters, mothers and sons, fathers and daughters, and fathers and sons, and noticed that the highest correlation is between fathers and sons and the lowest is between mothers and sons. We can compute these correlations using: Are these differences statis...
1519 sym R (4830 sym/29 pcs) 3 img
HarvardX: PH125.1x
17.5 Exercises 1. Load the GaltonFamilies data from the HistData package. The children in each family are listed by gender and then by height. Create a dataset called galton_heights by picking a male and female at random. library(tidyverse)...
500 sym R (2389 sym/17 pcs) 1 img
HarvardX: PH125.1x
18.4 Exercises We have shown how BB and singles have similar predictive power for scoring runs. Another way to compare the usefulness of these baseball metrics is by assessing how stable they are across the years. Since we have to pick players based on their previous performances, we will prefer metrics that are more stable. In these exercises,...
1396 sym R (1563 sym/18 pcs) 2 img
HarvardX: PH125.1x
16.9 Exercises 1. Create this table. Now for each poll use the CLT to create a 95% confidence interval for the spread reported by each poll. Call the resulting object cis with columns lower and upper for the limits of the confidence intervals. Use the select function to keep the columns state, startdate, enddate, pollster, grade, spread, lower...
2870 sym R (6715 sym/24 pcs) 5 img
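The CLT interval described in the 16.9 excerpt can be sketched in base R. The `polls` data frame below is invented for illustration (it is not the table from the exercise), and the sketch assumes a two-party setting in which the spread is d = 2p - 1, so its standard error is 2 * sqrt(p(1 - p)/N).

```r
# Hypothetical polls: `spread` is the reported difference in proportions,
# `samplesize` is the number of respondents.
polls <- data.frame(
  pollster   = c("A", "B", "C"),
  spread     = c(0.04, -0.01, 0.07),
  samplesize = c(1000, 800, 1500)
)

# Convert the spread d = 2p - 1 back to a proportion p.
p_hat <- (polls$spread + 1) / 2
# The standard error of the spread is twice the standard error of p_hat.
se_spread <- 2 * sqrt(p_hat * (1 - p_hat) / polls$samplesize)

# Normal quantile for a 95% interval.
z <- qnorm(0.975)
cis <- data.frame(
  pollster = polls$pollster,
  spread   = polls$spread,
  lower    = polls$spread - z * se_spread,
  upper    = polls$spread + z * se_spread
)
print(cis)
```

With the real data, the same arithmetic could be done inside `mutate` and followed by `select`, as the exercise suggests.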
HarvardX: PH125.1x
15.11 Exercises 1. A famous athlete has an impressive career, winning 70% of her 500 career matches. However, this athlete gets criticized because in important events, such as the Olympics, she has a losing record of 8 wins and 9 losses. Perform a Chi-square test to determine if this losing record can be simply due to chance as opposed to not ...
1368 sym
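One possible setup for the chi-square test in the 15.11 excerpt is sketched below. The excerpt is truncated, so the choice of comparison (important events versus all other career matches in a two-by-two table) is an assumption rather than the publication's method.

```r
# Career totals from the exercise text: 70% of 500 matches are wins.
total_wins   <- round(0.7 * 500)
total_losses <- 500 - total_wins

# Important events: 8 wins and 9 losses (from the exercise text).
# The "other" column is what remains of the career record (an assumption
# about how to frame the comparison).
two_by_two <- matrix(
  c(8, 9, total_wins - 8, total_losses - 9),
  nrow = 2,
  dimnames = list(result = c("win", "loss"),
                  event  = c("important", "other"))
)

# chisq.test applies a continuity correction to 2x2 tables by default.
test <- chisq.test(two_by_two)
print(two_by_two)
print(test$p.value)
```

A small p-value would suggest the record in important events differs from the rest of the career by more than chance alone would explain.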
HarvardX: PH125.1x
16.3 Exercises We have been using urn models to motivate the use of probability models. Most data science applications are not related to data obtained from urns. More common are data that come from individuals. The reason probability plays a role here is that the data come from a random sample. The random sample is taken from a population ...
6805 sym R (6962 sym/29 pcs) 1 img
HarvardX: PH125.1x
15.7 Exercises For these exercises, we will use actual polls from the 2016 election. You can load the data from the dslabs package. library(dslabs) library(magrittr) data("polls_us_election_2016") Specifically, we will use all the national polls that ended within one week before the election. library(tidyverse)...
2484 sym R (22247 sym/25 pcs) 2 img
HarvardX: PH125.1x
15.5 Exercises 1. Write an urn model function that takes the proportion of Democrats \(p\) and the sample size \(N\) as arguments and returns the sample average if Democrats are 1s and Republicans are 0s. Call the function take_sample. take_sample<-function(p, N) { x <- sample(c(1, 0), size=N, replace=TRUE, prob=c(p, 1-p)) mean(x) } 2. No...
3827 sym 3 img
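The `take_sample` function from the 15.5 excerpt is laid out below on separate lines, followed by a single call. The seed and the values of `p` and `N` are arbitrary choices for illustration, not values taken from the publication.

```r
# The function from the excerpt, one statement per line.
take_sample <- function(p, N) {
  # Draw N voters: 1 = Democrat (probability p), 0 = Republican.
  x <- sample(c(1, 0), size = N, replace = TRUE, prob = c(p, 1 - p))
  # The sample average is the estimated proportion of Democrats.
  mean(x)
}

set.seed(1)   # arbitrary seed, for reproducibility only
p <- 0.45     # hypothetical proportion of Democrats
N <- 100      # hypothetical sample size
estimate <- take_sample(p, N)
print(estimate)
```

Repeated calls give different estimates that scatter around `p`, which is what the urn model is meant to illustrate.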