Publications by David Blumenstiel
Data 605 Assignment 13
Assignment 13 David Blumenstiel 11/17/2020 1. Use integration by substitution to solve the integral below. \(\int 4e^{-7x}dx\) Substitution here involves swapping out the \(-7x\) term for \(u\), leaving us with \(\int 4e^u\,dx\). If \(u = -7x\), then \(du = -7\,dx\), so \(dx = -du/7\). Writing things in terms of \(u\) and moving the 4 outside to simplify, we have: \(-4...
5904 sym R (79 sym/1 pcs) 1 img
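The excerpt for Assignment 13 is cut off mid-derivation. As a sketch of how the substitution \(u = -7x\) usually plays out (this is the standard result, not necessarily the author's exact presentation):

\[
\int 4e^{-7x}\,dx = 4\int e^{u}\left(-\frac{du}{7}\right) = -\frac{4}{7}\int e^{u}\,du = -\frac{4}{7}e^{u} + C = -\frac{4}{7}e^{-7x} + C
\]

Differentiating \(-\frac{4}{7}e^{-7x}\) gives back \(4e^{-7x}\), which confirms the antiderivative.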
Data 605 Assignment 15
1: Find the equation of the regression line for the given points. Round any final values to the nearest hundredth, if necessary. ( 5.6, 8.8 ), ( 6.3, 12.4 ), ( 7, 14.8 ), ( 7.7, 18.2 ), ( 8.4, 20.8 ) R can do this one for me. X <- c(5.6, 6.3, 7, 7.7, 8.4) y <- c(8.8, 12.4, 14.8, 18.2, 20.8) model <- lm(y~X) plot(X,y) abline(model$coefficient...
5836 sym R (216 sym/3 pcs) 1 img
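The Assignment 15 excerpt fits the line with `lm()`. As a cross-check, the least-squares slope and intercept can also be computed directly from their textbook formulas. The sketch below reuses the excerpt's `X` and `y`; the names `slope` and `intercept` are introduced here for illustration only.

```r
# Points from the assignment
X <- c(5.6, 6.3, 7, 7.7, 8.4)
y <- c(8.8, 12.4, 14.8, 18.2, 20.8)

# Closed-form least-squares estimates
slope <- sum((X - mean(X)) * (y - mean(y))) / sum((X - mean(X))^2)
intercept <- mean(y) - slope * mean(X)

# Same fit via lm(), as in the excerpt
model <- lm(y ~ X)

# Rounded to the nearest hundredth: intercept -14.80, slope 4.26
round(c(intercept = intercept, slope = slope), 2)
```

Both routes should agree, giving a regression line of roughly \(y = -14.8 + 4.26x\).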
Data 608 Assignment 1
Principles of Data Visualization and Introduction to ggplot2 I have provided you with data about the 5,000 fastest growing companies in the US, as compiled by Inc. magazine. Let's read this in: inc <- read.csv("https://raw.githubusercontent.com/charleyferrari/CUNY_DATA_608/master/module1/Data/inc5000_data.csv", header= TRUE) And let's preview this...
3458 sym R (7646 sym/21 pcs) 3 img
DATA 621 Homework 1 In Progress
library(knitr) library(tidyverse) ## -- Attaching packages -- tidyverse 1.3.0 -- ## v ggplot2 3.3.0 v purrr 0.3.3 ## v tibble 2.1.3 v dplyr 0.8.4 ## v tidyr 1.0.2 v stringr 1.4.0 ## v readr 1.3.1 v forcats 0.5.0 ## -- Conflicts -...
1062 sym R (1313748 sym/54 pcs) 26 img
Principal Component Analysis
Principal Component Analysis (PCA) is a method by which we can reduce the number of variables in a data set down to fewer variables which contain most of the information the original variables had; i.e., the principal components. It’s often nice to have a simpler data set, although this may reduce the accuracy of the model. So, it’s basically...
2016 sym
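To make the PCA description concrete, here is a minimal sketch using base R's `prcomp()`. The choice of the built-in `USArrests` data set and the 90% variance cutoff are assumptions made for illustration; they do not come from the original post.

```r
# Built-in numeric data set (an assumption; any numeric data frame would do)
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Share of total variance carried by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Keep the fewest components that reach 90% of the variance
# (the 0.90 threshold is arbitrary)
n_keep <- which(cum_var >= 0.90)[1]
reduced <- pca$x[, seq_len(n_keep), drop = FALSE]

print(round(cum_var, 3))
print(dim(reduced))
```

The `reduced` matrix has one row per observation and fewer columns than the original data, which is the "simpler data set" the excerpt refers to.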
GLM Blog
In contrast to a regular linear model, where the response variable is assumed to be normally distributed, a generalized linear model (GLM) makes no such assumption. This allows GLMs to predict response variables with different distributions, e.g. Poisson, multinomial, etc. GLMs are composed of three different parts: a linear predictor (\(ax_a + bx_b...
1941 sym
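As an illustration of the GLM idea, the sketch below fits a Poisson regression with base R's `glm()`. The data are simulated, and the coefficient values 0.5 and 0.8 are made up for the example; nothing here is taken from the original post.

```r
set.seed(1)

# Simulated count data (hypothetical): log of the mean is linear in x
n <- 200
x <- runif(n, 0, 2)
y <- rpois(n, lambda = exp(0.5 + 0.8 * x))

# family selects the response distribution and the link function
fit <- glm(y ~ x, family = poisson(link = "log"))

# Coefficients are on the scale of the linear predictor (log scale here)
print(coef(fit))

# Predicted mean count at x = 1, back on the response scale
pred <- predict(fit, newdata = data.frame(x = 1), type = "response")
print(pred)
```

Swapping the `family` argument (for example `binomial()` or `Gamma()`) changes the assumed response distribution without changing the rest of the call.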
Data 621 Final Project In Progress 1
Data prep library(caret) ## Loading required package: lattice ## Loading required package: ggplot2 library(tidyr) library(dplyr) ## ## Attaching package: 'dplyr' ## The following objects are masked from 'package:stats': ## ## filter, lag ## The following objects are masked from 'package:base': ## ## intersect, setdiff, setequal, ...
2363 sym R (19547 sym/88 pcs) 51 img
DATA 621 LASSO Blog
Often considered a better alternative to stepwise regression, Least Absolute Shrinkage and Selection Operator regression (LASSO for short) is a regularization technique for regression models that can reduce the number of variables in a model. This works for various types of regression (logistic, linear, count, etc.). LASSO penalizes the model bas...
1986 sym R (3301 sym/5 pcs) 1 img
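To show how LASSO can drop variables, here is a bare-bones coordinate descent sketch for the linear case in base R. It is not the implementation used in the post or in any package; the function names `soft_threshold` and `lasso_cd`, the simulated data, and the `lambda` values are all assumptions for illustration. It minimizes \(\frac{1}{2n}\sum_i (y_i - x_i^\top\beta)^2 + \lambda\sum_j |\beta_j|\) on standardized predictors.

```r
# Soft-thresholding: shrinks z toward zero and sets small values exactly to zero
soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

# Coordinate descent for the LASSO objective (hypothetical helper)
lasso_cd <- function(X, y, lambda, n_iter = 200) {
  X <- scale(X)        # standardize predictors
  y <- y - mean(y)     # centering y removes the need for an intercept
  n <- nrow(X)
  p <- ncol(X)
  beta <- rep(0, p)
  for (it in seq_len(n_iter)) {
    for (j in seq_len(p)) {
      # Partial residual: leave predictor j out of the current fit
      r_j <- y - drop(X[, -j, drop = FALSE] %*% beta[-j])
      z <- sum(X[, j] * r_j) / n
      beta[j] <- soft_threshold(z, lambda) / (sum(X[, j]^2) / n)
    }
  }
  beta
}

# Simulated data: only the first two of five predictors matter
set.seed(123)
n <- 100
X <- matrix(rnorm(n * 5), n, 5)
y <- 3 * X[, 1] - 2 * X[, 2] + rnorm(n)

beta_none <- lasso_cd(X, y, lambda = 0)     # no penalty: ordinary least squares
beta_mid  <- lasso_cd(X, y, lambda = 0.5)   # moderate penalty
beta_huge <- lasso_cd(X, y, lambda = 1e6)   # penalty so large every coefficient is zero

print(round(rbind(beta_none, beta_mid, beta_huge), 3))
```

As `lambda` grows, coefficients shrink and some become exactly zero, which is what lets LASSO act as a variable selection method.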
Ridge Regression Blog
Ridge regression is a particularly effective way of dealing with multicollinearity in multiple regression. Similar to L1 regularization in LASSO regression, Ridge uses an L2 penalty, which penalizes the model based on the square of the magnitude of the coefficients. This incentivizes models with small coefficients, and has the effect of shrin...
1236 sym
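For the linear case, the ridge estimate has a closed form, \(\hat\beta = (X^\top X + \lambda I)^{-1}X^\top y\), which makes the shrinkage easy to see. The sketch below is a minimal base R version under stated assumptions: `ridge_fit` is a hypothetical helper, the nearly collinear data are simulated, and the `lambda` values are arbitrary.

```r
# Closed-form ridge on standardized predictors (hypothetical helper)
ridge_fit <- function(X, y, lambda) {
  X <- scale(X)        # put predictors on a common scale before penalizing
  y <- y - mean(y)     # centering y leaves the intercept unpenalized
  p <- ncol(X)
  drop(solve(crossprod(X) + lambda * diag(p), crossprod(X, y)))
}

# Simulated data with two nearly collinear predictors
set.seed(7)
n <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)   # almost a copy of x1
X <- cbind(x1, x2)
y <- 2 * x1 + rnorm(n)

b_ols   <- ridge_fit(X, y, lambda = 0)    # no penalty: ordinary least squares
b_ridge <- ridge_fit(X, y, lambda = 10)   # L2 penalty shrinks the coefficients

print(round(rbind(b_ols, b_ridge), 3))
```

With collinear predictors the unpenalized coefficients tend to be unstable, while the penalized ones are pulled toward smaller, more similar values.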
Interaction Terms
When performing regression, one might wish to model the effect of an independent variable on the response variable as dependent upon another independent variable. In other words, an interaction term is two or more independent variables multiplied together, sharing a single coefficient. A ‘normal’ regression equation might look like: \(y = a + bx...
1097 sym
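In R, an interaction term can be added to a model formula with `*` (main effects plus interaction) or `:` (interaction only). The sketch below uses simulated data; the variable names and the true coefficients (including the interaction coefficient of 1.5) are made up for illustration.

```r
set.seed(42)

# Simulated data (hypothetical): the effect of x1 on y depends on x2
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 + 0.5 * x2 + 1.5 * x1 * x2 + rnorm(n, sd = 0.5)

# y ~ x1 * x2 expands to x1 + x2 + x1:x2
fit <- lm(y ~ x1 * x2)

# The "x1:x2" coefficient is the single coefficient on the product x1 * x2
print(round(coef(fit), 2))
```

The fitted `x1:x2` coefficient estimates how much the slope on `x1` changes for each one-unit increase in `x2`.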