Publications by Peter
DATA622-Final
Overview: For this project, I will be using the Heart Disease dataset from UCI [https://archive.ics.uci.edu/dataset/45/heart+disease], which contains the Cleveland heart disease data. The dataset has 303 instances and 14 features, and I will use it in my project to predict the presence of heart disease given the patient health inform...
DATA622-Homework3
Perform an analysis of the dataset(s) used in Homework #2 using the SVM algorithm. Compare the results with the results from the previous homework. Read the following articles: https://www.hindawi.com/journals/complexity/2021/5550344/ and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8137961/. Search for academic content (at least 3 articles) that compar...
DATA622-Homework2
Decision Tree Algorithms. Pre-work: Read this blog: https://decizone.com/blog/the-good-the-bad-the-ugly-of-using-decision-trees, which shows some of the issues with decision trees. Choose a dataset from a source in Assignment #1, or another dataset of your choice. Assignment work: Based on the latest topics presented, choose a dataset of your choice...
DATA622-Homework1
Exploratory analysis and essay. Pre-work: Visit the following website and explore the range of sizes of this dataset (from 100 to 5 million records): https://excelbianalytics.com/wp/downloads-18-sample-csv-files-data-sets-for-testing-sales/ or (new) https://www.kaggle.com/datasets. Select 2 files to download. Based on your computer’s capabilities (...
608-Homework-01
Principles of Data Visualization and Introduction to ggplot2. I have provided you with data about the 5,000 fastest-growing companies in the US, as compiled by Inc. magazine. Let's read this in: inc <- read.csv("https://raw.githubusercontent.com/charleyferrari/CUNY_DATA_608/master/module1/Data/inc5000_data.csv", header= TRUE) And let's preview this...
data621-assignment2
Question 1: Download the classification output data set (attached in Blackboard to the assignment). url <- "https://raw.githubusercontent.com/petferns/DATA621/main/classification-output-data.csv" data <- read.csv(url) head(data) ## pregnant glucose diastolic skinfold insulin bmi pedigree age class ## 1 7 124 70 33 ...
DATA 621 - Homework 3
Overview: In this homework assignment, you will explore, analyze, and model a data set containing information on crime for various neighborhoods of a major city. Each record has a response variable indicating whether the crime rate is above the median crime rate (1) or not (0). Your objective is to build a binary logistic regression model on...
DATA621-Homework-04
Overview: In this homework assignment, you will explore, analyze, and model a data set containing approximately 8000 records, each representing a customer at an auto insurance company. Each record has two response variables. The first response variable, TARGET_FLAG, is a 1 or a 0. A “1” means that the person was in a car crash. A zero means that the ...
DATA621-Final-Project
Introduction: For this project, I will be using the Student Performance dataset from the UCI Machine Learning Repository. The repository has 2 datasets: one for mathematics (student-mat.csv) and the other for Portuguese language (student-por.csv). In this project we will use these datasets and create models to predict the grades in mathematics and por...
DATA608-Final-Project
Introduction: For this DATA 608 project, I will be using the Kaggle dataset beer_reviews.csv. The dataset has 1.5 million beer reviews, with ratings for appearance, aroma, palate, taste, and overall impression. Objective: What factors make a beer a favorite among beer drinkers? How do features like beer taste, aroma, app...