Publications by Alexander Simon

DATA605 Final Exam

21.12.2024

Setup
I read the synthetic retail data into a dataframe and performed some basic checks on data types, duplicates, and missing data.
url <- 'https://raw.githubusercontent.com/alexandersimon1/Data605/refs/heads/main/synthetic_retail_data.csv'
retail_df <- read_csv(url, show_col_types = FALSE)
glimpse(retail_df)
## Rows: 200
## Columns: 6
## $ Produc...

23047 sym Python (20463 sym/149 pcs) 19 img 1 tbl

DATA604 Assignment 3

23.06.2024

1. Introduction
The overall goal of this assignment was to use Mockaroo, a mock data generator, to create mock (aka fake or synthetic) data that resembles a real dataset.
2. Real data
I selected a dataset of average SAT exam scores from New York City (NYC) schools in 2010 that is publicly available on NYC OpenData, in part because it was less than...

9133 sym 12 img 10 tbl

DATA607 Final Project Presentation

08.05.2024

DATA607 Final Project
Creation and comparison of movie recommender models with the recommenderlab R package
Alexander Simon
2024-05-08
Project goals
- My overall goal ...

13325 sym

DATA607 Final Project

06.05.2024

1. Introduction
1.1. Project aim
My overall goal was to learn more about recommender systems by obtaining, tidying, and exploring a ratings dataset, creating a recommender model, and comparing the performance of different recommender algorithms using tools in the recommenderlab R package.
1.2. Challenges and pivots
1.2.1. Finding data
I had orig...

19658 sym Python (17104 sym/85 pcs) 14 img 6 tbl

DATA607 Project 4

29.04.2024

0. Packages
I used the tm package to create document-term matrices and the caret package to perform supervised machine learning (SML) for text classification. If needed, you can install them using the commands below.
install.packages("caret")
install.packages("tm")
1. Introduction
Document classification is the process of assigning documents to on...

16087 sym Python (41767 sym/92 pcs) 10 img

DATA607 Assignment 11

07.04.2024

Introduction
A recommendation system is a type of artificial intelligence algorithm that uses information about users’ interests and history of consumption (material or digital goods) to suggest new items of interest. This type of system is most useful when the total number of items is too large for users to find by other means, such as browsing ...

7541 sym 1 img

DATA607 Assignment 10

01.04.2024

0. Packages
In addition to the packages used in the textbook portion of this assignment, I used the SentimentAnalysis, RColorBrewer, maps, and plotly packages. If needed, you can install them using the command(s) below.
install.packages("SentimentAnalysis")
install.packages("RColorBrewer")
install.packages("maps")
install.packages("plotly")
1. Int...

16097 sym Python (33412 sym/93 pcs) 11 img

DATA607 Data Science in Context

28.03.2024

Aim
Illustrate the number of human genomes sequenced in major genomics projects since 2003. This plot was part of my Data Science in Context presentation.
Data
Data were obtained from https://www.yourgenome.org/theme/timeline-history-of-genomics/ and allofus.nih.gov/about/program-overview/what-makes-all-us-different. Since there were only a few da...

417 sym 1 img

Assignment 9

23.03.2024

1. Introduction
The New York Times offers APIs to get news articles programmatically. Here, I demonstrate the use of the Top Stories API to retrieve articles that are on the homepage of the newspaper and convert the data to a dataframe.
2. Data
I created an account to get an API key but don’t show it because RPubs is public. The key is included ...

3151 sym

DATA607 TidyVerse CREATE Assignment

18.03.2024

Introduction
Tidyverse is a collection of R packages that provide useful functions for common tasks in data science, including data import, tidying, manipulation, visualization, and programming. Packages for each of these tasks include:
Import: readr for reading files in various formats
Tidy: tidyr for tidying data
Transformation: dplyr and its r...

2774 sym Python (4171 sym/7 pcs) 1 img 4 tbl