Publications by Joe Connolly
Tidyverse Extend
library(tidyverse) Introduction The tidyverse contains a collection of data science packages that work together in harmony to accomplish various goals. This vignette will demonstrate several ways to make full use of their combined capability. The Data For this demonstration, we will use a dataset that is included with dplyr itself. It c...
3453 sym R (3178 sym/23 pcs) 2 img
Data 606, lab #8
The Human Freedom Index is a report that attempts to summarize the idea of “freedom” through a range of variables for many countries around the globe. It serves as a rough objective measure for the relationships between the different types of freedom - whether it’s political, religious, economic or personal freedom - and other s...
11283 sym R (20965 sym/29 pcs) 7 img
Data 606 HW #8
Nutrition at Starbucks, Part I. (8.22, p. 326) The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on i...
6456 sym R (541 sym/15 pcs) 10 img
Data 607: Hinge Recommendation Systems
Your task is to analyze an existing recommender system that you find interesting. You should: Perform a Scenario Design analysis as described below. Consider whether it makes sense for your selected recommender system to perform scenario design twice, once for the organization (e.g. Amazon.com) and once for the organization’s customers. Attemp...
3505 sym
Data 607 Project 4
library(RTextTools) ## Loading required package: SparseM ## ## Attaching package: 'SparseM' ## The following object is masked from 'package:base': ## ## backsolve library(tm) ## Loading required package: NLP library(rlist) Creating Corpora hamsrc <- DirSource("C:/Users/jmcon/OneDrive/Desktop/spam_ham/easy_ham") hamcrp <- VCorpus(hamsrc...
1094 sym R (3581 sym/30 pcs)
Predicting Healthcare costs of Southwestern US Residents
Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not? library(tidyverse) ## -- Attaching package...
2016 sym R (4683 sym/24 pcs) 4 img
Data 605: pg. 389, C27
C12: For the matrix, A, the characteristic polynomial (CP) is \((x+2)(x-2)^2(x-4)\). Find the eigenvalues and corresponding eigenspaces of A. library('pracma') A = matrix(c(0,-2,-2,-2,4,6,8,8,-1,-1,-1,-3,1,1,-1,1),nrow = 4, ncol = 4) A ## [,1] [,2] [,3] [,4] ## [1,] 0 4 -1 1 ## [2,] -2 6 -1 1 ## [3,] -2 8 -1 ...
453 sym R (1893 sym/16 pcs)
Exponential Distribution: Continuous Conditional Probability
Page 172, #2 A radioactive particle emits alpha-particles at a rate described by f(t) = 0.1 * exp(-0.1t). Find the probability a particle is emitted within the first 10 seconds given…. No particle is emitted in the first second P(t <= 10 | t > 1) => P(1 <= t <= 10) func_a <- function(t){ 0.1*exp(-0.1*t) } prob_a = integrate(func_a, lowe...
668 sym R (986 sym/16 pcs)
622 HW 2
Task Based on the latest topics presented, bring a dataset of your choice and create a Decision Tree where you can solve a classification or regression problem and predict the outcome of a particular feature or detail of the data used. Switch variables to generate 2 decision trees and compare the results. Create a random forest for regression...
2493 sym R (8820 sym/36 pcs) 4 img
The Curse of Dimensionality
This so-called curse is essentially an exponential relationship between the dimensionality of the feature space and the number of possible configurations: as the number of dimensions increases, the number of configurations grows exponentially, and therefore the number of observations per configuration decreases. This makes it difficult for data scientists to ...
2899 sym