Publications by Joe Connolly

Tidyverse Extend

26.04.2021

library(tidyverse) Introduction The tidyverse contains a collection of data science packages that work together in harmony to accomplish various goals. This vignette will demonstrate several ways to make full use of their combined capability. The Data For this demonstration, we will use a dataset that is included with dplyr itself. It c...

3453 sym R (3178 sym/23 pcs) 2 img

Data 606, lab #8

19.04.2021

The Human Freedom Index is a report that attempts to summarize the idea of “freedom” through a number of different variables for many countries around the globe. It serves as a rough objective measure for the relationships between the different types of freedom - whether it’s political, religious, economic or personal freedom - and other s...

11283 sym R (20965 sym/29 pcs) 7 img

Data 606 HW #8

19.04.2021

Nutrition at Starbucks, Part I. (8.22, p. 326) The scatterplot below shows the relationship between the number of calories and amount of carbohydrates (in grams) Starbucks food menu items contain. Since Starbucks only lists the number of calories on the display items, we are interested in predicting the amount of carbs a menu item has based on i...

6456 sym R (541 sym/15 pcs) 10 img

Data 607: Hinge Recommendation Systems

22.04.2021

Your task is to analyze an existing recommender system that you find interesting. You should: Perform a Scenario Design analysis as described below. Consider whether it makes sense for your selected recommender system to perform scenario design twice, once for the organization (e.g. Amazon.com) and once for the organization’s customers. Attemp...

3505 sym

Data 607 Project 4

03.05.2021

library(RTextTools) ## Loading required package: SparseM ## ## Attaching package: 'SparseM' ## The following object is masked from 'package:base': ## ## backsolve library(tm) ## Loading required package: NLP library(rlist) Creating Corpora hamsrc <- DirSource("C:/Users/jmcon/OneDrive/Desktop/spam_ham/easy_ham") hamcrp <- VCorpus(hamsrc...

1094 sym R (3581 sym/30 pcs)

Predicting Healthcare costs of Southwestern US Residents

15.11.2021

Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not? library(tidyverse) ## -- Attaching package...

2016 sym R (4683 sym/24 pcs) 4 img
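
The three kinds of terms named in the prompt above (quadratic, dichotomous, and a dichotomous-by-quantitative interaction) can be sketched in one `lm()` call. This is only an illustration under stated assumptions: the data are simulated, and the names `sim`, `cost`, `age`, and `smoker` are hypothetical and are not taken from the author's dataset.

```r
# Simulated (hypothetical) data; NOT the healthcare data used in the post.
set.seed(1)
n      <- 200
age    <- runif(n, 18, 64)        # quantitative predictor
smoker <- rbinom(n, 1, 0.3)       # dichotomous predictor coded 0/1
cost   <- 2000 + 50 * age + 3 * age^2 + 8000 * smoker +
  200 * smoker * age + rnorm(n, sd = 2000)
sim <- data.frame(cost, age, smoker)

# I(age^2) is the quadratic term, smoker the dichotomous term,
# and age:smoker the dichotomous-by-quantitative interaction.
fit <- lm(cost ~ age + I(age^2) + smoker + age:smoker, data = sim)
print(coef(fit))

# Residuals for a basic residual analysis (e.g. plot against fitted values).
res <- residuals(fit)
print(summary(res))
```

Plotting `res` against `fitted(fit)` is one common way to judge whether the linear model was appropriate.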

Data 605: pg. 389, C27

10.09.2021

C12: For the matrix, A, the characteristic polynomial (CP) is \((x+2)(x-2)^2(x-4)\). Find the eigenvalues and corresponding eigenspaces of A. library('pracma') A = matrix(c(0,-2,-2,-2,4,6,8,8,-1,-1,-1,-3,1,1,-1,1),nrow = 4, ncol = 4) A ## [,1] [,2] [,3] [,4] ## [1,] 0 4 -1 1 ## [2,] -2 6 -1 1 ## [3,] -2 8 -1 ...

453 sym R (1893 sym/16 pcs)
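
As a rough cross-check of the exercise above, the eigenspaces can be computed numerically as null spaces of A - λI. The sketch below uses base R's `svd()` rather than the `pracma` package loaded in the excerpt, and it assumes the candidate eigenvalues are -2, 2, and 4. The helper `null_basis` is a hypothetical name introduced here for illustration.

```r
# Matrix A exactly as defined in the excerpt (filled column by column).
A <- matrix(c(0, -2, -2, -2, 4, 6, 8, 8, -1, -1, -1, -3, 1, 1, -1, 1),
            nrow = 4, ncol = 4)

# Hypothetical helper: orthonormal basis of the null space of M, taken from
# the right singular vectors whose singular values are numerically zero.
null_basis <- function(M, tol = 1e-8) {
  s <- svd(M)
  s$v[, s$d < tol, drop = FALSE]
}

lambdas <- c(-2, 2, 4)  # assumed candidate eigenvalues
eigenspaces <- lapply(lambdas, function(l) null_basis(A - l * diag(4)))
names(eigenspaces) <- as.character(lambdas)

# Each element holds basis vectors (as columns) for that eigenspace.
print(eigenspaces)
```

The number of columns in each element is the dimension of that eigenspace, which need not equal the multiplicity of the root in the characteristic polynomial.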

Exponential Distribution: Continuous Conditional Probability

01.10.2021

Page 172, #2 A radioactive particle emits alpha-particles at a rate described by f(t) = 0.1 * exp(-0.1t). Find the probability a particle is emitted within the first 10 seconds given... No particle is emitted in the first second P(t <= 10 | t > 1) => P(1 <= t <= 10) func_a <- function(t){ 0.1*exp(-0.1*t) } prob_a = integrate(func_a, lowe...

668 sym R (986 sym/16 pcs)
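
One way to sketch the conditional probability in the exercise above is to divide the integral of the density over [1, 10] by the probability that no emission occurs in the first second. This is a minimal illustration, not the author's full solution; the names `dens`, `p_cond`, and `p_memoryless` are assumptions introduced here.

```r
rate <- 0.1
dens <- function(t) rate * exp(-rate * t)   # density f(t) from the exercise

# P(T <= 10 | T > 1) = P(1 < T <= 10) / P(T > 1)
numerator   <- integrate(dens, lower = 1, upper = 10)$value
denominator <- integrate(dens, lower = 1, upper = Inf)$value
p_cond <- numerator / denominator

# Cross-check via the memoryless property: P(T <= 9) = 1 - exp(-0.1 * 9)
p_memoryless <- pexp(9, rate = rate)

print(c(p_cond = p_cond, p_memoryless = p_memoryless))
```

The two values should agree up to numerical integration error, since the exponential distribution is memoryless.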

622 HW 2

04.04.2022

Task Based on the latest topics presented, bring a dataset of your choice and create a Decision Tree where you can solve a classification or regression problem and predict the outcome of a particular feature or detail of the data used. Switch variables to generate 2 decision trees and compare the results. Create a random forest for regression...

2493 sym R (8820 sym/36 pcs) 4 img

The Curse of Dimensionality

07.03.2022

This so-called curse is essentially an exponential relationship between the feature space and the number of configurations; as the number of dimensions of the feature space increases, the number of configurations grows exponentially, and therefore the number of observations per configuration decreases. This makes it difficult for data scientists to ...

2899 sym
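
The growth described above can be made concrete with a small calculation. The sketch assumes each feature takes 10 distinct values and that the sample size is fixed at 1000; both numbers are hypothetical and chosen only for illustration.

```r
n_obs              <- 1000   # assumed fixed sample size
levels_per_feature <- 10     # assumed distinct values per feature
dims               <- 1:6    # number of features (dimensions)

# Possible configurations grow as levels^dims, so the average number
# of observations available per configuration shrinks just as fast.
configs        <- levels_per_feature^dims
obs_per_config <- n_obs / configs

print(data.frame(dims, configs, obs_per_config))
```

Under these assumptions there is on average one observation per configuration at three dimensions, and fewer than one beyond that.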