Publications by Mohammed Rahman
Predicting the pH of Beverages
Introduction: As a data scientist tasked with developing a predictive model for pH regulation in manufacturing processes, the overarching objective is to leverage data-driven insights to enhance operational efficiency, ensure regulatory compliance, and optimize product quality. By harnessing advanced analytics and machine learning techniques, w...
20129 sym Python (28837 sym/49 pcs) 4 img 3 tbl
Walmart Data Analysis
Introduction: For the final project, I’m using Walmart sales data from Kaggle, scraped from the web, to perform data and statistical analysis. Data Source: Walmart Dataset. The data contains sales of different Walmart stores from 2010-02-05 to 2012-11-01. It has columns with store number, week of sales, sales for the given store, holi...
5357 sym R (5029 sym/18 pcs) 6 img 1 tbl
Project 4 - Document Classification
Overview: It can be useful to be able to classify new “test” documents using already classified “training” documents. A common example is using a corpus of labeled spam and ham (non-spam) e-mails to predict whether or not a new document is spam. For this project, you can start with a spam/ham dataset, ...
1582 sym Python (6352 sym/18 pcs) 3 img
Data-624 Homework 9
Exercise 8.1: Recreate the simulated data from Exercise 7.2: library(mlbench) set.seed(200) simulated <- mlbench.friedman1(200, sd = 1) simulated <- cbind(simulated$x, simulated$y) simulated <-as.data.frame(simulated) colnames(simulated)[ncol(simulated)] <- "y" (a): Fit a random forest model to all of the predictors, then estimate the variab...
6359 sym R (10186 sym/40 pcs) 2 img 2 tbl
Week 11: Recommender Systems
Introduction: Pandora is a leading music and podcast discovery platform, providing a highly-personalized listening experience to approximately 70 million users each month with its proprietary Music Genome Project® and Podcast Genome Project® technology - whether at home or on the go - through its mobile app, the web, and integration with more...
4737 sym
Data-624 Homework 8
Question 7.2: Friedman (1991) introduced several benchmark data sets created by simulation. One of these simulations used the following nonlinear equation to create data: \(y = 10 \sin(\pi x_1 x_2) + 20(x_3 - 0.5)^2 + 10x_4 + 5x_5 + N(0, \sigma^2)\) where the x values are random variables uniformly distributed on [0, 1] (there are also 5 othe...
2477 sym R (23504 sym/47 pcs) 6 img
Data-624 Homework 7
Exercise 6.2: Developing a model to predict permeability (see Sect. 1.4) could save significant resources for a pharmaceutical company, while at the same time more rapidly identifying molecules that have a sufficient permeability to become a drug: Start R and use these commands to load the data: library(AppliedPredictiveModeling) data(permeab...
5095 sym R (10070 sym/29 pcs) 4 img
Data 607 - TidyVerse Create
Description: In this assignment, you’ll practice collaborating around a code project with GitHub. You could consider our collective work as building out a book of examples on how to use TidyVerse functions. GitHub repository: https://github.com/pkowalchuk/SPRING2024TIDYVERSE FiveThirtyEight.com datasets. Kaggle datasets. Your task here is to Cr...
1482 sym Python (7359 sym/13 pcs)
Data-624 Project 1
Part A – ATM Forecast: We are asked to forecast how much cash is taken out of 4 different ATM machines for May 2010. We are given data in a single file with variable cash provided in hundreds of dollars. Explain and demonstrate your process, techniques used and not used, and your actual forecast. # Load the dataset data <- read_excel("ATM624Data...
2095 sym Python (8397 sym/53 pcs) 18 img
Data 607 - Week 9
Introduction: I browsed the NY Times API website and signed up for an account. I found the Books APIs and chose the “Book Sellers History”. I retrieved the history to see the last few best sellers. ex<-GET("https://api.nytimes.com/svc/books/v3/lists/best-sellers/history.json?api-key=mP5gHH5A5oHbVq6PHAd2pAdv0BlS6s12") cat(...
534 sym 1 tbl