Publications by Shidong Li

699 Week 7 Cluster Analysis

29.12.2020

Step 1. Find the optimal number of clusters (elbow, gap, or silhouette methods). The elbow plot suggests that 6 is the optimal number of clusters, as that is where the curve bends. The plot from the average silhouette method also suggests 6 as the optimal number. The plot from the gap statistic method suggests 1 as the optimal number - ignore. ...

893 sym R (6965 sym/36 pcs) 16 img
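As a rough illustration of the elbow method mentioned in the excerpt above, the following base-R sketch computes the total within-cluster sum of squares for a range of k. The data (`toy`) is synthetic and hypothetical, built from three well-separated blobs; it is not the author's dataset, and the original analysis may have used a different package or plotting helper.

```r
# Hypothetical data: three well-separated 2-D blobs (not the author's dataset).
set.seed(42)
centers <- rbind(c(0, 0), c(10, 10), c(-10, 10))
toy <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(50, centers[i, 1]), rnorm(50, centers[i, 2]))
}))

# Total within-cluster sum of squares for k = 1..8.
ks <- 1:8
wss <- sapply(ks, function(k) {
  kmeans(toy, centers = k, nstart = 20)$tot.withinss
})

# The "elbow" is the k after which adding clusters stops reducing WSS by much.
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

On this toy data the curve should flatten after k = 3, since that is how many blobs were generated; on real data the bend is often less clear, which is why the excerpt cross-checks it against the silhouette method.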

ANLY505-2020-Late Fall-Assignment 6 Ulysses' Compass

14.12.2020

Chapter 7 - Ulysses’ Compass The chapter began with the problem of overfitting, a universal phenomenon by which models with more parameters fit a sample better, even when the additional parameters are meaningless. Two common tools were introduced to address overfitting: regularizing priors and estimates of out-of-sample accuracy (WAIC and PSIS)...

5580 sym R (187 sym/6 pcs)

699 Week 5 Regression

14.12.2020

Step 1. Scale or normalize your data. Make sure to apply imputation if needed. The output shows that all the numerical variables have been standardized to a mean of zero. preproc1 <- preProcess(data_complete[,c(7,8,14)], method=c("center", "scale")) data_complete_scaled <- predict(preproc1, data_complete[,c(7,8,14)]) summary(data_complet...

1495 sym R (15841 sym/20 pcs) 6 img
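The centering and scaling step in the excerpt above can be sketched in base R with `scale()`, which subtracts each column's mean and divides by its standard deviation. The data frame `toy` and its column names are hypothetical stand-ins; `data_complete` itself is not available here, so this is only an analogue of the `preProcess` call, not a reproduction of it.

```r
# Hypothetical stand-in for the numeric columns of data_complete.
toy <- data.frame(
  price = c(100, 150, 200, 400),
  area  = c(50, 60, 80, 120)
)

# scale() centers each column to mean 0 and rescales it to standard deviation 1,
# a base-R analogue of method = c("center", "scale").
toy_scaled <- as.data.frame(scale(toy))

colMeans(toy_scaled)     # approximately 0 for every column
sapply(toy_scaled, sd)   # 1 for every column
```

Checking the column means and standard deviations after scaling is the same sanity check the excerpt performs with `summary()`.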

ANLY505-2020-Late Fall-Assignment 5 The Haunted DAG & The Causal Terror

13.12.2020

Chapter 6 - The Haunted DAG & The Causal Terror Multiple regression is no oracle, but only a golem. It is logical, but the relationships it describes are conditional associations, not causal influences. Therefore additional information, from outside the model, is needed to make sense of it. This chapter presented introductory examples of some com...

2562 sym R (3486 sym/23 pcs) 1 img

Coursera - R (Week 3)

12.12.2020

Complete all Exercises, and submit answers to Questions on the Coursera platform. Hot Hands Basketball players who make several baskets in succession are described as having a hot hand. Fans and players have long believed in the hot hand phenomenon, which refutes the assumption that each shot is independent of the next. However, a 1985 paper by ...

11604 sym R (2803 sym/25 pcs) 4 img

Coursera - R (Week 2)

11.12.2020

Complete all Exercises, and submit answers to Questions on the Coursera platform. Some define statistics as the field that focuses on turning information into knowledge. The first step in that process is to summarize and describe the raw information - the data. In this lab we explore flights, specifically a random sample of domestic flights that...

12760 sym R (923580 sym/47 pcs) 8 img

Shidong Li_ANLY505-2020-Late Fall-Assignment 4 Many Variables and Spurious Waffles

05.12.2020

Chapter 5 - Many Variables and Spurious Waffles This chapter introduced multiple regression, a way of constructing descriptive models for how the mean of a measurement is associated with more than one predictor variable. The defining question of multiple regression is: What is the value of knowing each predictor, once we already know the other pr...

4962 sym R (5691 sym/24 pcs) 5 img

699 Week 3 Data Visualization

24.11.2020

1. Create a univariate analysis for the variable of your interest (your Y variable). Calculate skewness and kurtosis and describe the results. My Y variable is the median sale price. The skewness of the Y variable is 3.233, indicating a substantially positively skewed distribution. skewness(data_complete$median_sale_price) ## [1] 3.23304...

1944 sym R (515 sym/9 pcs) 5 img
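To make the skewness and kurtosis figures in the excerpt above more concrete, here is a minimal base-R sketch using the moment-based definitions. The helper names `skew` and `kurt` are hypothetical, and the excerpt does not say which package its `skewness()` comes from, so its exact formula (and any bias correction) may differ from this one.

```r
# Moment-based skewness: third central moment over variance^(3/2).
skew <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based kurtosis: fourth central moment over variance^2
# (about 3 for a normal distribution; no excess-kurtosis adjustment).
kurt <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

# Hypothetical data: a symmetric sample and a right-skewed sample.
symmetric <- c(1, 2, 3, 4, 5)
set.seed(1)
right_skewed <- rexp(1000)

skew(symmetric)      # zero for a symmetric sample
skew(right_skewed)   # positive: long tail to the right
kurt(symmetric)
```

A positive value, as reported for the median sale price, means the right tail is longer than the left, which is typical for price data.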

Shidong Li_ANLY505-2020-Late Fall

03.11.2020

Chapter 2 - Large Worlds and Small Worlds The objective of this problem set is to work with the conceptual mechanics of Bayesian data analysis. The target of inference in Bayesian inference is a posterior probability distribution. Posterior probabilities state the relative numbers of ways each conjectured cause of the data could have produced th...

6022 sym R (3280 sym/16 pcs) 6 img

Test Document_1031

31.10.2020

R Markdown This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com. When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within t...

591 sym R (268 sym/2 pcs) 1 img