Publications by Shidong Li
Shidong Li_ANLY505-2020-Late Fall-Assignment2
Chapter 3 - Sampling the Imaginary. This chapter introduced the basic procedures for manipulating posterior distributions. Our fundamental tool is samples of parameter values drawn from the posterior distribution. These samples can be used to produce intervals, point estimates, posterior predictive checks, as well as other kinds of simulations. Po...
3170 sym R (3686 sym/51 pcs) 2 img
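The Chapter 3 summary above describes working with samples drawn from a posterior distribution. A minimal sketch of that workflow in base R follows. The data (6 successes in 9 trials), the flat prior, and all variable names are assumptions chosen for illustration and are not taken from the assignment.

```r
# Grid approximation of a binomial posterior, then sampling from it.
# Assumed data (illustrative only): 6 successes in 9 trials, flat prior.
set.seed(1)
p_grid     <- seq(0, 1, length.out = 1000)
prior      <- rep(1, length(p_grid))
likelihood <- dbinom(6, size = 9, prob = p_grid)
posterior  <- likelihood * prior
posterior  <- posterior / sum(posterior)   # normalize so it sums to 1

# Draw parameter values in proportion to their posterior probability
samples <- sample(p_grid, size = 1e4, replace = TRUE, prob = posterior)

# Point estimate and an 89% percentile interval computed from the samples
post_mean   <- mean(samples)
interval_89 <- quantile(samples, c(0.055, 0.945))

# Posterior predictive simulation: one simulated count per sampled parameter
pred <- rbinom(length(samples), size = 9, prob = samples)
```

Once the samples exist, intervals, point estimates, and predictive checks all reduce to ordinary summaries of a numeric vector.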
Shidong Li_ANLY505-2020-Late Fall-Assignment 3 Geocentric Models
Chapter 4 - Geocentric Models. This chapter introduced the simple linear regression model, a framework for estimating the association between a predictor variable and an outcome variable. The Gaussian distribution comprises the likelihood in such models, because it counts up the relative numbers of ways different combinations of means and standard...
4311 sym R (7780 sym/36 pcs) 7 img
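The Chapter 4 summary above concerns a simple linear regression with a Gaussian likelihood. As a rough sketch, the same model structure can be fitted by ordinary least squares with `lm()`. This is a stand-in technique, not the Bayesian fit the chapter uses, and the simulated data and variable names are hypothetical.

```r
# Simple linear regression on simulated data (all values are hypothetical).
set.seed(2)
n <- 200
x <- rnorm(n, mean = 45, sd = 6)                    # hypothetical predictor
y <- 150 + 0.9 * (x - mean(x)) + rnorm(n, sd = 5)   # hypothetical outcome

# Center the predictor so the intercept is the expected outcome at mean(x)
x_c <- x - mean(x)

# Least-squares fit; assumes Gaussian residuals with constant spread
fit       <- lm(y ~ x_c)
coefs     <- coef(fit)             # intercept and slope estimates
sigma_hat <- summary(fit)$sigma    # residual standard deviation estimate
```

Centering the predictor is a design choice that makes the intercept directly interpretable and tends to make priors easier to state in a Bayesian version of the model.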
Coursera - R (Week 1)
Complete all Exercises, and submit answers to Questions on the Coursera platform. The goal of this lab is to introduce you to R and RStudio, which you’ll be using throughout the course both to learn the statistical concepts discussed in the course and to analyze real data and come to informed conclusions. To straighten out which is which: R is...
17822 sym R (4092 sym/33 pcs) 6 img
699 Week 6 Factor Analysis
Step 1 - Data Description. How many dimensions? Answer: There are 8548 observations and 13 variables. What are the variable types? Answer: 10 of them are numeric, 3 are character (region_name, region_state, region_type). What are the variable names? Answer: See below. Remove all non-numeric variables and create a dataset called data_X. dim(data_compl...
2092 sym R (17720 sym/33 pcs) 8 img
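The data description step above asks for the non-numeric variables to be removed and the result stored as data_X. One way this might be done in base R is sketched below on a small made-up data frame. The frame `toy_regions` and its numeric columns are hypothetical; only the names region_name, region_type, and data_X come from the excerpt.

```r
# Hypothetical stand-in for the source data (not the author's dataset)
toy_regions <- data.frame(
  region_name  = c("A", "B", "C"),
  region_type  = c("metro", "metro", "county"),
  median_price = c(250, 310, 190),   # hypothetical numeric column
  homes_sold   = c(12, 30, 7),       # hypothetical numeric column
  stringsAsFactors = FALSE
)

# Flag numeric columns, then keep only those
is_num <- vapply(toy_regions, is.numeric, logical(1))
data_X <- toy_regions[, is_num, drop = FALSE]

dim(data_X)     # rows and columns remaining
names(data_X)   # names of the retained variables
```

Using `drop = FALSE` keeps the result a data frame even if only one numeric column survives.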
699 Week 4 Missing Data and Outliers Analysis
1. Describe the missing data and provide a summary, similar to the analysis in Chapter 2 (Table 3): count and percent of missing data per variable, type of missing data (NA, null), and total percent of missingness per dataset. The dataset has a lot of variables. In order to clearly demonstrate them, let’s look at the summary of the data first...
1870 sym R (11139 sym/18 pcs) 5 img
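The missing-data task above asks for a count and percent of missing values per variable plus an overall percent. A minimal base R sketch is given below. The data frame `toy_missing` and its columns are made up for illustration, and only NA-type missingness is counted.

```r
# Hypothetical data with some NA values (not the author's dataset)
toy_missing <- data.frame(
  price = c(1, NA, 3, NA),
  state = c("x", "y", NA, "z"),
  units = 1:4,
  stringsAsFactors = FALSE
)

# Count and percent of NA values per variable
n_missing   <- colSums(is.na(toy_missing))
pct_missing <- round(100 * n_missing / nrow(toy_missing), 1)

missing_summary <- data.frame(
  variable    = names(toy_missing),
  n_missing   = n_missing,
  pct_missing = pct_missing,
  row.names   = NULL
)

# Share of all cells in the dataset that are NA
total_pct_missing <- 100 * sum(is.na(toy_missing)) /
  (nrow(toy_missing) * ncol(toy_missing))
```

Empty strings or other sentinel values would need to be converted to NA first if they also represent missing data.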
ANLY505-2020-Late Fall-Assignment 7 - Conditional Manatees
Chapter 8 - Conditional Manatees. This chapter introduced interactions, which allow for the association between a predictor and an outcome to depend upon the value of another predictor. While you can’t see them in a DAG, interactions can be important for making accurate inferences. Interactions can be difficult to interpret, and so the chapter a...
3580 sym R (5552 sym/19 pcs) 2 img
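The Chapter 8 summary above defines an interaction as the slope of one predictor depending on the value of another. The sketch below illustrates that idea with a least-squares fit on simulated data. It is a stand-in for the chapter's Bayesian models, and the variables `x`, `g`, and `y` and their coefficients are hypothetical.

```r
# Simulated data in which the slope of x differs between two groups
set.seed(8)
n <- 400
x <- rnorm(n)                  # hypothetical continuous predictor
g <- rbinom(n, 1, 0.5)         # hypothetical binary moderator
y <- 1 + 0.5 * x - 0.3 * g + 1.0 * x * g + rnorm(n, sd = 0.5)

# y ~ x * g expands to x + g + x:g (main effects plus interaction)
fit <- lm(y ~ x * g)
b   <- coef(fit)

# The association between x and y is conditional on g
slope_g0 <- b[["x"]]                # slope of x when g == 0
slope_g1 <- b[["x"]] + b[["x:g"]]   # slope of x when g == 1
```

Reporting the two conditional slopes, rather than the raw interaction coefficient, is often easier to interpret.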
699 Week 8 Dimension Reduction Analysis
Step 1. Establish the optimal number of components: visualize the scree plot and explain your decision.
datamatrix <- cor(data_X)
corrplot(datamatrix, order="hclust", type='upper', tl.srt = 45)
cov_mat = cov(X_std)
pcaobj <- prcomp(X_std)
print(pcaobj)
## Standard deviations (1, .., p=9):
## [1] 2.0140181 1.6663949 1.0147369 0.8319572 0.45422...
956 sym R (3461 sym/11 pcs) 4 img
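The step above chooses the number of components from a scree plot of a `prcomp()` fit. A short sketch of how that decision might be supported is shown below. It uses the built-in `USArrests` data as a stand-in for the author's `X_std`, and the eigenvalue-greater-than-one cutoff is one common rule of thumb rather than the author's stated criterion.

```r
# Built-in dataset used as a stand-in for the author's standardized matrix
X_std  <- scale(USArrests)
pcaobj <- prcomp(X_std)

# Eigenvalues are the squared standard deviations of the components
eigenvalues   <- pcaobj$sdev^2
var_explained <- eigenvalues / sum(eigenvalues)
cum_var       <- cumsum(var_explained)

# Scree plot: look for the "elbow" where the curve flattens
plot(eigenvalues, type = "b",
     xlab = "Component", ylab = "Eigenvalue", main = "Scree plot")
abline(h = 1, lty = 2)   # rule-of-thumb cutoff for standardized data

# Number of components with eigenvalue above 1
n_keep <- sum(eigenvalues > 1)
```

Comparing `n_keep` with the cumulative variance in `cum_var` gives two complementary views of how many components are worth retaining.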
ANLY525 - Monte Carlo Lab
Simulation in R. In class we have talked about computational methods and introduced the idea of modeling and simulation. We discussed how many levels of uncertainty and complex interactions make systems challenging to predict and understand. One approach to deal with uncertainty, such as in the expected cost example, is to run repeated simulations...
4569 sym R (8027 sym/70 pcs) 8 img
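The lab description above mentions running repeated simulations to handle uncertainty in an expected cost. The excerpt does not show that example, so the sketch below uses made-up inputs: the demand and unit-cost distributions, their parameters, and all variable names are assumptions.

```r
# Monte Carlo estimate of an expected total cost under uncertain inputs
set.seed(525)
n_sims <- 10000

units     <- rpois(n_sims, lambda = 100)          # hypothetical demand
unit_cost <- runif(n_sims, min = 8, max = 12)     # hypothetical cost per unit

# Each simulation draws both inputs and combines them
total_cost <- units * unit_cost

# Summaries across simulations approximate the cost distribution
expected_cost <- mean(total_cost)
cost_range    <- quantile(total_cost, c(0.05, 0.95))
```

The spread in `cost_range` conveys how much the outcome can vary, which a single point estimate would hide.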
ANLY505-2020-Late Fall-Assignment 9 - Markov Chain Monte Carlo
Chapter 9 - Markov Chain Monte Carlo. This chapter has been an informal introduction to Markov chain Monte Carlo (MCMC) estimation. The goal has been to introduce the purpose and approach of MCMC algorithms. The major algorithms introduced were the Metropolis, Gibbs sampling, and Hamiltonian Monte Carlo algorithms. Each has its advantages and disadva...
3432 sym R (12829 sym/46 pcs) 9 img
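Of the algorithms named in the Chapter 9 summary above, Metropolis is the simplest to write out. The sketch below is a random-walk Metropolis sampler in base R. The standard normal target, the proposal width, and the chain length are assumptions for illustration, not settings from the assignment.

```r
# Random-walk Metropolis sampler for a one-dimensional target
set.seed(9)
log_target <- function(theta) dnorm(theta, log = TRUE)  # assumed target density

n_iter   <- 10000
chain    <- numeric(n_iter)
chain[1] <- 0            # arbitrary starting value
accepted <- 0

for (i in 2:n_iter) {
  # Symmetric proposal centered on the current value
  proposal  <- chain[i - 1] + rnorm(1, sd = 1)
  log_ratio <- log_target(proposal) - log_target(chain[i - 1])

  # Accept with probability min(1, target ratio), otherwise stay put
  if (log(runif(1)) < log_ratio) {
    chain[i] <- proposal
    accepted <- accepted + 1
  } else {
    chain[i] <- chain[i - 1]
  }
}

accept_rate <- accepted / (n_iter - 1)
```

Working on the log scale avoids numerical underflow when the target density is very small.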
House data Complete Analysis 0215
1. Exploratory Data Analysis: Missing values, EDA, Visualization
# 1. Describe missing data, provide summary of missing data, similar to the analysis in the Chapter 2 (table 3): Count of missing data/percent per variable, type of missing data (NA, null), total percent of missingness per dataset
summary(data)
## region_state period_begin ...
524 sym R (42197 sym/106 pcs) 45 img