Publications by Joey Campbell

Interpreting Logistic Regression Coefficients

21.11.2024

Different ways to interpret logistic regression results. Interpreting the results of a logistic regression model involves understanding how the predictor variables affect the probability of the outcome event. Here are some common ways to interpret logistic regression results: Coefficients and Odds Ratios. Coefficients: These represent the change...

4254 sym R (6936 sym/15 pcs) 4 img

Multivariable Logistic Regression in R

31.10.2024

This notebook lecture will cover multivariable logistic regression in R, using the Titanic survival dataset as an example. Introduction: Univariable models are insufficient for understanding complex phenomena because they do not account for the interconnectedness of multiple factors. Multivariable logistic regression is a more realistic approac...

20985 sym R (8429 sym/58 pcs) 12 img 1 tbl

Automated EDA in R

19.09.2024

Here we try automating exploratory data analysis with DataExplorer. We start by loading a hospital dataset from Kaggle into RStudio. library(readr) healthcare<-read_csv("C:/Users/email/Downloads/Healthcare_Investments_and_Hospital_Stay (1).csv") Rows: 518 Columns: 6 ── Column specification ──...

3209 sym R (8358 sym/13 pcs) 4 img

EDA in R

12.09.2024

Objectives: At the end of the lecture, you will be able to perform exploratory data analysis (EDA) using graphical methods, perform EDA using descriptive statistics, and use the ggplot2 and gtsummary packages at a basic level. Introduction: Exploratory Data Analysis or EDA is the critical process of performing initial investigations on data to...

8138 sym R (12590 sym/62 pcs) 14 img 6 tbl

ANOVA in R

29.08.2024

ANOVA is a statistical test for estimating how a quantitative dependent variable changes according to the levels of one or more categorical independent variables. ANOVA tests whether there is a difference in means of the groups at each level of the independent variable. The null hypothesis (\(H_0\)) of the ANOVA is no difference in means, and ...

37238 sym R (4408 sym/28 pcs) 6 img

Case Study EPA

23.04.2020

We are going to apply some of the techniques we learned in Exploratory Data Analysis to study air pollution data, specifically particulate matter (we’ll call it pm25 sometimes), collected by the U.S. Environmental Protection Agency. This website https://www.health.ny.gov/environmental/indoors/air/pmq_a.htm from New York State offers some basic inf...

4859 sym R (6291 sym/24 pcs)

Lab 5 Cross-Validation and the Bootstrap

20.04.2020

In this lab, we explore the resampling techniques covered in this chapter. Some of the commands in this lab may take a while to run on your computer. The Validation Set Approach: We explore the use of the validation set approach in order to estimate the test error rates that result from fitting various linear models on the Auto data set. Before w...

32291 sym R (6692 sym/39 pcs)

Rmarkdown Demo

18.04.2020

Problem 10: This question should be answered using the Carseats data set. library(ISLR) ## Warning: package 'ISLR' was built under R version 3.6.3 attach(Carseats) (a) Fit a multiple regression model to predict Sales using Price, Urban, and US. fit<-lm(Sales~Price+Urban+US) summary(fit) ## ## Call: ## lm(formula = Sales ~ Price + Urban + US) #...

2272 sym R (4331 sym/14 pcs) 1 img

R Lab: Logistic Regression, LDA, QDA, and KNN

16.04.2020

The Stock Market Data: We will begin by examining some numerical and graphical summaries of the Smarket data, which is part of the ISLR library. This data set consists of percentage returns for the S&P 500 stock index over 1,250 days, from the beginning of 2001 until the end of 2005. For each date, we have recorded the percentage returns for each...

50687 sym R (8922 sym/115 pcs) 2 img

NCI60 Data Example R Lab

16.04.2020

Unsupervised techniques are often used in the analysis of genomic data. In particular, PCA and hierarchical clustering are popular tools. We illustrate these techniques on the NCI60 cancer cell line microarray data, which consists of 6,830 gene expression measurements on 64 cancer cell lines. library(ISLR) nci.labs=NCI60$labs nci.data=NCI60$dat...

18698 sym R (6401 sym/27 pcs) 6 img