Publications by Kitada Smalley
DATA252: KNN
Learning Objectives In this lesson students will learn … How to implement the K-nearest neighbors (knn) algorithm Produce stratified training and testing sets The importance of standardizing data How to tune the knn algorithm to pick the best value of the \(k\) hyperparameter BINARY CASE Example 1: Pima Indigenous People Motivation/Background: ...
6008 sym R (10878 sym/83 pcs) 11 img
DATA252: Regression Trees
Learning Objectives In this lesson students will learn how to implement… Regression trees Prune a tree Perform cross-validation to choose tree complexity Citation: Examples for this lesson come from https://bookdown.org/tpinto_home/Beyond-Additivity/ Osteoporosis Facts: Osteoporosis is a bone disease that develops when bone mineral density an...
2881 sym R (1220 sym/9 pcs) 7 img
DATA252: Variable Selection and Regularization
Learning Objectives In this lesson students will learn how to implement… Best Subset Selection Stepwise Selection (Backward and Forward) Ridge Regression LASSO Regression Ex: Health and Biostatistics Use the following data for these examples: library(tidyverse) #install.packages("faraway") library(faraway) set.seed(1212) data("fat") head(fat)...
1534 sym R (16289 sym/50 pcs) 6 img
BONUS: Time Series
Learning Objectives In this lesson students will learn the basics of time series… What is a time series? Why and when should we use time series? How do you decompose a time series? Motivation Last class we saw how polynomial regression was used to model COVID-19 data in 2020; however, this model is inherently flawed because it assumes inde...
4062 sym R (9599 sym/62 pcs) 14 img
DATA252: Beyond Linearity
Learning Objectives In this lesson students will learn how to… Fit non-linear models (polynomial, GAM, and LOESS) Choose the appropriate amount of curvature using p-values Choose hyperparameters via testing and training 0. Import the Data During the COVID-19 pandemic, several researchers published work on estimating/forecasting the number of ca...
4773 sym R (18968 sym/77 pcs) 30 img
DATA252: Multiple Linear Regression
Learning Objectives In this lesson students will learn how to… Fit a multiple linear regression model Create graphics to explore relationships between two variables Engineer features 0. Import the Data These data for 'insurance' charges come from the “US Health Insurance Dataset” on Kaggle. Source: https://www.kaggle.com/datasets/teertha...
6406 sym R (13374 sym/89 pcs) 14 img
DATA252: Model Fitting
Learning Objectives Students will learn how to use R to generate random variables and become acquainted with the trade-offs inherent in model building. Generating Random Variables Normal Distribution ## using functions built into R ## generating random variables ?rnorm() ## parameters # mean mu<-0 # sd sigma<-1 ## arguments # n rnorm(n=50, m...
2486 sym R (5925 sym/26 pcs) 10 img
Linear Model Optimization
Shiny applications not supported in static R Markdown documents ...
67 sym
Processing Data in Tidyverse
Learning Objectives Students will learn how to work with real data to prepare it to perform machine learning functions using the tidyverse. Importing Data ### use raw file from github laptop_price <- read.csv("https://raw.githubusercontent.com/kitadasmalley/DATA252/main/Data/laptop_price.csv") Looking at Data Structure ## LOOK AT THE DATA ### s...
374 sym R (9918 sym/42 pcs) 8 img
DATA252: Laptop SLR
Learning Objectives In this lesson students will review the main concepts of Simple Linear Regression. How to fit a model Graphing a model Checking model fit and diagnostics Model inference Import the Data These laptop price data come from Kaggle. ### use raw file from github laptop_price <- read.csv("https://raw.githubusercontent.com/kitadasma...
3727 sym R (5452 sym/53 pcs) 15 img