Publications by Joey Campbell

Principal Components Analysis R Lab

16.04.2020

In this lab, we perform PCA on the USArrests data set, which is part of the base R package. The rows of the data set contain the 50 states, in alphabetical order.
states = row.names(USArrests)  # create states vector as names in USArrests data
states  # look at it
[1] "Alabama" "Alaska" "Arizona" "Arkansas" "California" ...

12267 sym R (3219 sym/31 pcs) 4 img
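
The PCA described in the excerpt above can be sketched in base R as follows. This is a minimal sketch, not the lab's own code: the excerpt is truncated after the `states` vector, so the use of `prcomp()` with `scale. = TRUE` and the names `pr_out` and `pve` are assumptions made for illustration.

```r
# PCA on the built-in USArrests data (columns: Murder, Assault, UrbanPop, Rape).
states <- row.names(USArrests)   # state names are stored as row names

# scale. = TRUE standardizes each variable to unit variance before PCA;
# prcomp() centers the variables by default.
pr_out <- prcomp(USArrests, scale. = TRUE)

pr_out$rotation                  # loadings: one column per principal component
head(pr_out$x)                   # scores of the first few states

# Proportion of variance explained (PVE) by each component.
pve <- pr_out$sdev^2 / sum(pr_out$sdev^2)
round(pve, 3)
```

Scaling is a reasonable default here because the four variables are measured in different units, so an unscaled PCA would be dominated by the variable with the largest variance.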

R Lab: Logistic Regression

20.08.2020

The Stock Market Data: We will begin by examining some numerical and graphical summaries of the Smarket data, which is part of the ISLR library. This data set consists of percentage returns for the S&P 500 stock index over 1,250 days, from the beginning of 2001 until the end of 2005. For each date, we have recorded the percentage returns for each...

24447 sym R (5208 sym/45 pcs) 2 img

ISLR Chapter 9 ITSEM HW

30.07.2020

In this assignment, we explore the support vector machine (SVM), an approach for classification that was developed in the computer science community in the 1990s and that has grown in popularity since then. SVMs have been shown to perform well in a variety of settings, and are often considered one of the best “out of the box” classifiers. Pro...

25801 sym R (9358 sym/94 pcs) 21 img

ISLR Chapter 5 ITSEM HW

10.07.2020

Resampling methods involve repeatedly drawing samples from a training set and refitting a model of interest on each sample in order to obtain additional information about the fitted model. They may allow us to obtain information that would not be available from fitting the model only once using the original training sample. Resampling approaches ...

19690 sym R (5417 sym/33 pcs)
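
The idea of repeatedly drawing samples and recomputing a quantity of interest can be illustrated with a small bootstrap in base R. This is a sketch under stated assumptions: the `mtcars$mpg` sample, the number of resamples, and the choice of the mean as the statistic are illustrative and are not taken from the homework.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only

x <- mtcars$mpg                    # stand-in sample (assumption, not from the homework)
n_boot <- 1000                     # number of bootstrap resamples (arbitrary choice)

# Draw n_boot samples of size length(x) with replacement and
# recompute the statistic of interest (here the mean) on each one.
boot_means <- replicate(n_boot, mean(sample(x, size = length(x), replace = TRUE)))

boot_se <- sd(boot_means)              # bootstrap estimate of the standard error
formula_se <- sd(x) / sqrt(length(x))  # textbook formula, for comparison

c(bootstrap = boot_se, formula = formula_se)
```

For the sample mean the two estimates should be close, which makes it a convenient check before applying the same resampling loop to a statistic that has no simple standard error formula.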

ISLR Chapter 8 ITSEM HW

27.07.2020

Tree-based methods for regression and classification involve stratifying or segmenting the predictor space into a number of simple regions. In order to make a prediction for a given observation, we typically use the mean or the mode of the training observations in the region to which it belongs. In this exercise we compare the criterion for split...

17877 sym R (6744 sym/36 pcs) 5 img
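
The prediction rule in the excerpt above (use the mean of the training observations in the region an observation falls into) can be sketched with a single split in base R. This is only an illustration: the `mtcars` data, the residual sum of squares criterion for choosing the split, and the helper names `rss_for_cut` and `predict_stump` are assumptions, and the split criteria the homework compares are cut off in the excerpt.

```r
# Stand-in data (assumption): predict mpg from weight in the built-in mtcars set.
x <- mtcars$wt
y <- mtcars$mpg

# Candidate cut points: midpoints between consecutive distinct x values.
xs <- sort(unique(x))
cuts <- (head(xs, -1) + tail(xs, -1)) / 2

# Residual sum of squares when each region is predicted by its own mean.
rss_for_cut <- function(s) {
  left <- y[x < s]
  right <- y[x >= s]
  sum((left - mean(left))^2) + sum((right - mean(right))^2)
}

rss <- sapply(cuts, rss_for_cut)
best_cut <- cuts[which.min(rss)]

# Prediction for a new observation: the mean response of its region.
predict_stump <- function(x_new) {
  ifelse(x_new < best_cut, mean(y[x < best_cut]), mean(y[x >= best_cut]))
}

best_cut
predict_stump(c(2, 4))
```

A full regression tree repeats this search recursively inside each region; for classification, the region mean is replaced by the most common class.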

ITSEM Session 1 Statistical Learning, Linear Regression, and Classification

10.07.2020

Chapter 2, Problem 2: Explain whether each scenario is a classification or regression problem, and indicate whether we are most interested in inference or prediction. Finally, provide n and p. (a) We collect a set of data on the top 500 firms in the US. For each firm we record profit, number of employees, industry and the CEO salary. We are interes...

31814 sym R (15293 sym/80 pcs) 3 img

ISLR Chapter 6 ITSEM HW

16.07.2020

Before moving to the non-linear world, in Chapter 6 we investigate some ways in which the simple linear model can be improved by replacing plain least squares fitting with some alternative fitting procedures. Question 2. For parts (a) through (c), indicate which of i. through iv. is correct. Justify your answer. (a) The lasso, relative to least...

17419 sym R (6797 sym/45 pcs) 5 img

ISLR Chapter 7 ITSEM HW

16.07.2020

Question 6. In this exercise, you will further analyze the Wage data set considered throughout this chapter. (a) Perform polynomial regression to predict wage using age. Use cross-validation to select the optimal degree d for the polynomial. What degree was chosen, and how does this compare to the results of hypothesis testing using ANOVA? Make a...

10634 sym R (6907 sym/25 pcs) 6 img
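
Choosing a polynomial degree by cross-validation, as part (a) above asks, can be sketched in base R with a manual k-fold loop. This is a sketch under stated assumptions: the Wage data is not part of base R, so the built-in `cars` data (`dist` against `speed`) stands in for it, and the fold count and the range of degrees are arbitrary choices.

```r
set.seed(1)                         # arbitrary seed so the fold assignment is repeatable

dat <- cars                         # stand-in data (assumption): dist vs. speed
k <- 5                              # number of folds (arbitrary choice)
degrees <- 1:4                      # candidate polynomial degrees

# Randomly assign each row to one of k folds of near-equal size.
fold <- sample(rep(1:k, length.out = nrow(dat)))

cv_mse <- sapply(degrees, function(d) {
  fold_mse <- sapply(1:k, function(j) {
    train <- dat[fold != j, ]
    test <- dat[fold == j, ]
    fit <- lm(dist ~ poly(speed, d), data = train)
    mean((test$dist - predict(fit, newdata = test))^2)
  })
  mean(fold_mse)                    # average held-out MSE across folds
})

names(cv_mse) <- degrees
cv_mse
best_degree <- degrees[which.min(cv_mse)]
best_degree
```

For the comparison with hypothesis testing, the nested fits of increasing degree can be passed to `anova()`, which reports an F-test for each additional term.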

R class assignment 1

04.02.2021

Introduction: Fine particulate matter (PM2.5) is an ambient air pollutant for which there is strong evidence that it is harmful to human health. In the United States, the Environmental Protection Agency (EPA) is tasked with setting national ambient air quality standards for fine PM and for tracking the emissions of this pollutant into the atmosphe...

13422 sym R (2132 sym/7 pcs) 6 img

R Lab: Survival Analysis

22.11.2021

Survival Analysis: In this lab, we perform survival analyses on three separate data sets. In Section 11.8.1 we analyze the BrainCancer data that was first described in Section 11.3. In Section 11.8.2, we examine the Publication data from Section 11.5.4. Finally, Section 11.8.3 explores a simulated call center data set. Brain Cancer Data: We begin ...

21732 sym R (9035 sym/56 pcs) 6 img