Publications by Peter

DATA622-Final

18.12.2023

Overview: For this project I will use the Heart Disease dataset from UCI [https://archive.ics.uci.edu/dataset/45/heart+disease], which contains the Cleveland heart disease data. The dataset has 303 instances and 14 features, and I will use it to predict the presence of heart disease given the patient health inform...

7987 sym R (10099 sym/42 pcs) 6 img 2 tbl

DATA622-Homework3

05.12.2023

Perform an analysis of the dataset(s) used in Homework #2 using the SVM algorithm. Compare the results with the results from previous homework. Read the following articles: https://www.hindawi.com/journals/complexity/2021/5550344/ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8137961/ Search for academic content (at least 3 articles) that compar...

5927 sym R (10688 sym/40 pcs) 1 img 1 tbl

DATA622-Homework2

06.11.2023

Decision Trees Algorithms. Pre-work: Read this blog: https://decizone.com/blog/the-good-the-bad-the-ugly-of-using-decision-trees which shows some of the issues with decision trees. Choose a dataset from a source in Assignment #1, or another dataset of your choice. Assignment work: Based on the latest topics presented, choose a dataset of your choice...

3380 sym R (10518 sym/39 pcs) 5 img

DATA622-Homework1

12.10.2023

Exploratory analysis and essay. Pre-work: Visit the following website and explore the range of sizes of this dataset (from 100 to 5 million records): https://excelbianalytics.com/wp/downloads-18-sample-csv-files-data-sets-for-testing-sales/ or (new) https://www.kaggle.com/datasets Select 2 files to download. Based on your computer’s capabilities (...

4027 sym R (28005 sym/55 pcs) 2 img

608-Homework-01

09.09.2022

Principles of Data Visualization and Introduction to ggplot2. I have provided you with data about the 5,000 fastest-growing companies in the US, as compiled by Inc. magazine. Let's read this in: inc <- read.csv("https://raw.githubusercontent.com/charleyferrari/CUNY_DATA_608/master/module1/Data/inc5000_data.csv", header = TRUE) And let's preview this...

1481 sym R (7085 sym/26 pcs) 4 img

data621-assignment2

07.10.2022

Question 1 Download the classification output data set (attached in Blackboard to the assignment). url <- "https://raw.githubusercontent.com/petferns/DATA621/main/classification-output-data.csv" data <- read.csv(url) head(data) ## pregnant glucose diastolic skinfold insulin bmi pedigree age class ## 1 7 124 70 33 ...

3580 sym 2 img 2 tbl

DATA 621 - Homework 3

07.11.2022

Overview: In this homework assignment, you will explore, analyze and model a data set containing information on crime for various neighborhoods of a major city. Each record has a response variable indicating whether the crime rate is above the median crime rate (1) or not (0). Your objective is to build a binary logistic regression model on...

4940 sym 6 img 3 tbl

DATA621-Homework-04

21.11.2022

Overview: In this homework assignment, you will explore, analyze and model a data set containing approximately 8000 records, each representing a customer at an auto insurance company. Each record has two response variables. The first response variable, TARGET_FLAG, is a 1 or a 0. A “1” means that the person was in a car crash. A zero means that the ...

4367 sym 10 img 2 tbl

DATA621-Final-Project

06.12.2022

Introduction: For this project I will be using the Student Performance dataset from the UCI Machine Learning Repository. The repository has 2 datasets: one for Mathematics (student-mat.csv) and one for Portuguese language (student-por.csv). In this project we will use these datasets and create models to predict the grades in mathematics and por...

6717 sym 8 img 2 tbl

DATA608-Final-Project

13.12.2022

Introduction: For this DATA 608 project I will be using the Kaggle dataset beer_reviews.csv. The dataset has 1.5 million beer reviews, with ratings for appearance, aroma, palate, taste, and overall impression. Objective: What factors make a beer a favorite among beer drinkers? How do features like beer taste, aroma, app...

3142 sym Python (14171 sym/46 pcs) 4 img 1 tbl