Publications by KoohPi
DATA606 - Breast Cancer Survival Rate Estimate
Data Preparation In this project, I have chosen to work on breast cancer. There are various resources available on this topic, with the Surveillance, Epidemiology, and End Results (SEER) [1] program being the most reliable one. The SEER Program of the National Cancer Institute (NCI) collects and publishes cancer data through a coordinated syst...
19434 sym R (119398 sym/125 pcs) 47 img 25 tbl
DATA606 Final Project
Data Preparation In this project, I have chosen to work on breast cancer. There are various resources available on this topic, with the Surveillance, Epidemiology, and End Results (SEER) [1] program being the most reliable one. The SEER Program of the National Cancer Institute (NCI) collects and publishes cancer data through a coordinated syst...
19428 sym R (118828 sym/120 pcs) 46 img 25 tbl
Breast Cancer Survival Rate With SEER
Data Preparation In this project, I have chosen to work on breast cancer. There are various resources available on this topic, with the Surveillance, Epidemiology, and End Results (SEER) [1] program being the most reliable one. The SEER Program of the National Cancer Institute (NCI) collects and publishes cancer data through a coordinated syst...
21644 sym R (124383 sym/150 pcs) 49 img 24 tbl
DATA 607 - SPAM/HAM email classification
Intro The goal of this project is to work with a database to identify spam emails. Being able to classify new “test” documents using already classified “training” documents is crucial. A common scenario involves using a corpus of labeled spam and ham (non-spam) emails to predict whether a new document is spam or not. For this project, ...
11739 sym Python (25870 sym/66 pcs) 9 img 7 tbl
DATA607 11th Week
Intro This assignment is to find an interesting recommender system and analyze it. I have chosen to work on Goodreads. What is Goodreads (WIKI): Goodreads is the world’s largest site for readers and book recommendations. It was launched in January 2007 and later acquired by Amazon in 2013. The platform is designed to help people find and sha...
10973 sym 6 img
DATA606 Project Intro
Data Preparation In this project, I have chosen to work on breast cancer. Various resources are available on this topic, with SEER being the most reliable one. The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute (NCI) collects and publishes cancer data through a coordinated...
7934 sym 4 img 8 tbl
DATA607_3rd Project_Final
Teamwork By: Team<- c (Anthony C., James N., Koohyar P., Victor T.) Introduction (AC) While the presidential election season is in full swing, we decided to explore polling data sources that exist online. Several individual sources can be found; however, the website RealClear Politics gathers, summarizes, and...
8532 sym Python (5155 sym/18 pcs) 5 img 6 tbl
DATA607_3rd Project
Introduction While the presidential election season is in full swing, we decided to explore polling data sources that exist online. Several individual sources can be found online; however, the website RealClear Politics gathers, summarizes, and presents the results of the various polls in one location. It sho...
5866 sym Python (5574 sym/28 pcs) 5 img 6 tbl
DATA607 9th Week Assignment
Introduction The goal of this week’s assignment is to work with APIs. We will work with the New York Times website's rich set of APIs, as described here: New York Times APIs. I first need to establish a secure way of working by signing up for an API key. My next task is to choose one of the New York Times APIs, construct an interface ...
1885 sym Python (4335 sym/25 pcs) 16 img 2 tbl
DATA607 7th Week Assignment
Introduction The goal of this week’s assignment is to work with HTML, XML, and JSON files. In this process, I have selected three of my favorite books from Amazon and created different files for each. For each book, I have included the title, authors, language, version, publisher, links, and summary. I have created these files myself while le...
3512 sym R (9645 sym/36 pcs) 5 tbl