Publications by Daniel
Nested mixed model
Nested mixed model in R reference Load package and data # load package library(nlme) # read in data setwd("C:\\Users\\hed2\\OneDrive - National Institutes of Health\\Mixed model by SAS and R") DF <- read.csv("Oxide.csv") # specify the reference group DF$Source=as.factor(DF$Source) DF <- within(DF, Source<- relevel(Source, ref = 2)) Rand...
940 sym R (5720 sym/17 pcs)
Data Mining
library(ISLR2) View(Hitters) names(Hitters) ## [1] "AtBat" "Hits" "HmRun" "Runs" "RBI" "Walks" ## [7] "Years" "CAtBat" "CHits" "CHmRun" "CRuns" "CRBI" ## [13] "CWalks" "League" "Division" "PutOuts" "Assists" "Errors" ## [19] "Salary" "NewLeague" dim(Hitters) ## [1] 322 20 s...
125 sym R (27547 sym/113 pcs) 9 img
Common issues in Statistics
reference No plotting before analysis A plot is sometimes better than a hypothesis test for checking assumptions. Instead, use a probability plot (also known as a quantile plot or Q-Q plot). It is very hard to tell whether or not a small data set comes from a particular distribution. A histogram's appearance varies with the number of bins. Plot original lowess plo...
12047 sym
Common issues in Statistics
reference No plotting before analysis A plot is sometimes better than a hypothesis test for checking assumptions. Instead, use a probability plot (also known as a quantile plot or Q-Q plot). It is very hard to tell whether or not a small data set comes from a particular distribution. A histogram's appearance varies with the number of bins. Plot original lowess plo...
11958 sym
Central Limit Theorem
Central Limit Theorem The Central Limit Theorem says that for most distributions, linear combinations (e.g., the sum or the mean) of a large enough number of independent random variables are approximately normal. For example, adult human heights (at least if we restrict to one sex) are the sum of many heights: the heights of the ankles, lower ...
787 sym Python (523 sym/5 pcs) 5 img
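The claim in the excerpt above can be checked with a small simulation. This is a minimal sketch, not the code from the publication: the exponential distribution, the sample sizes, and the helper names `sample_means` and `skewness` are all assumptions chosen for illustration. It compares the skewness of single draws with the skewness of means of 50 draws; a value closer to zero indicates a more symmetric, more normal-looking distribution.

```python
import random
import statistics


def sample_means(n_terms, n_means, seed=0):
    """Return n_means averages, each taken over n_terms exponential(1) draws.

    The exponential distribution is an illustrative choice (it is strongly
    skewed); it is not taken from the original publication.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n_terms))
        for _ in range(n_means)
    ]


def skewness(values):
    """Population skewness: the mean of the cubed standardized values."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


single = sample_means(1, 5000)     # raw draws, strongly right-skewed
averaged = sample_means(50, 5000)  # means of 50 draws, much closer to normal

print("skewness of single draws:", round(skewness(single), 2))
print("skewness of means of 50: ", round(skewness(averaged), 2))
print("std of means of 50:      ", round(statistics.pstdev(averaged), 3))
```

For exponential(1) draws the standard deviation of a mean of 50 terms should be near 1/sqrt(50), about 0.14, which gives a second quick check on the simulation.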
Common issues in Statistics
reference plot original lowess plots or other types of plots fitted line with x Interpreting causality/association “The only legitimate way to try to establish a causal connection statistically is through the use of randomized experiments.” “On average, people who take this medication have a decrease in blood pressure”. “The rate o...
11585 sym
Fixed or Random Factors
Fixed or Random Factors E.g., two-way ANOVA Fixed effect factor: Data have been gathered from all the levels of the factor that are of interest. Random effect factor: The factor has many possible levels, interest is in all possible levels, but only a random sample of levels is included in the data. The standard methods for analyzing random effe...
749 sym
Quantile Regression
Quantile Regression Standard regression estimates the mean of the conditional distribution (conditioned on the values of the predictors) of the response variable. Quantile regression is a method for estimating conditional quantiles, including the median....
263 sym
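One way to see how a quantile can be estimated is through the check (pinball) loss that quantile regression is usually described as minimizing. The sketch below is an assumption-laden illustration, not the publication's code: it has no predictors, so it only estimates an unconditional quantile, and the data values and the helper names `pinball_loss` and `best_constant` are hypothetical. It shows that the constant minimizing the loss at tau = 0.5 is the median, which is barely affected by the outlier that pulls the mean upward.

```python
import statistics


def pinball_loss(y, q_hat, tau):
    """Average check loss of predicting the constant q_hat for every value in y.

    Under-predictions are weighted by tau, over-predictions by (1 - tau).
    """
    total = 0.0
    for v in y:
        if v >= q_hat:
            total += tau * (v - q_hat)
        else:
            total += (1.0 - tau) * (q_hat - v)
    return total / len(y)


def best_constant(y, tau):
    """Search the observed values for the one with the smallest check loss."""
    return min(y, key=lambda c: pinball_loss(y, c, tau))


# Hypothetical data with one large outlier.
y = [1.0, 2.0, 3.0, 4.0, 100.0]

print("mean:            ", statistics.fmean(y))
print("tau=0.50 minimizer:", best_constant(y, 0.5))
print("tau=0.25 minimizer:", best_constant(y, 0.25))
```

Replacing the constant with a linear function of the predictors and minimizing the same loss gives the conditional quantiles mentioned above; that step needs an optimizer and is left out of this sketch.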
Issues about Dividing a Continuous Variable into Categories
Issues about Dividing a Continuous Variable into Categories Modern regression models do not require categorization. In general, continuous variables should remain continuous in regression models designed to study the effects of the variable on the outcome of interest. –by O. Naggara When doing hypothesis tests, the loss of information when d...
1444 sym
Logistic distribution and logistic regression
Logistic distribution and logistic regression Generalized linear model, GLM Generalized linear models cover all these situations by allowing for response variables that have arbitrary distributions (rather than simply normal distributions), and for an arbitrary function of the response variable (the link function) to vary linearly with the pred...
2522 sym Python (768 sym/4 pcs) 3 img
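The link-function idea in the excerpt above can be made concrete for logistic regression, where the usual link is the logit. This is a minimal sketch under stated assumptions: the function names `logit`, `inverse_logit`, and `predicted_probability` and the coefficient values are hypothetical and are not taken from the publication. The linear predictor lives on the logit scale, and the inverse link maps it back to a probability between 0 and 1.

```python
import math


def logit(p):
    """Logit link: maps a probability in (0, 1) to the whole real line."""
    return math.log(p / (1.0 - p))


def inverse_logit(eta):
    """Inverse link (logistic function): maps a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probability(x, intercept, slope):
    """Probability implied by a one-predictor logistic model.

    The linear predictor intercept + slope * x is on the logit scale.
    """
    return inverse_logit(intercept + slope * x)


# Hypothetical coefficients, chosen only for illustration.
intercept, slope = -1.0, 0.5
for x in (0.0, 2.0, 4.0):
    p = predicted_probability(x, intercept, slope)
    print(f"x={x}: probability={p:.3f}, logit={logit(p):.3f}")
```

The printed logit values change by the same amount for each equal step in x, while the probabilities do not, which is what it means for the link-transformed mean to vary linearly with the predictor.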