Publications by Michael Munguia
DATA 607 - Assignment # 3
Task 1: Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset, provide code that identifies the majors that contain either “DATA” or “STATISTICS”.
majors <- suppressMessages(read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-list.csv"))
(data_stat_majors <- majors %>% ...
2668 sym R (1197 sym/4 pcs)
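The excerpt above cuts off before the filtering step, so what follows is only a minimal sketch of one way to match either keyword, not the author's code. It uses base R's grepl on a small toy data frame; the name majors_toy and the four rows in it are illustrative assumptions standing in for the downloaded CSV.

```r
# Toy stand-in for the Major column of the dataset (illustrative rows only).
majors_toy <- data.frame(
  Major = c(
    "COMPUTER PROGRAMMING AND DATA PROCESSING",
    "STATISTICS AND DECISION SCIENCE",
    "GENERAL AGRICULTURE",
    "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
  ),
  stringsAsFactors = FALSE
)

# A regex alternation matches names containing either keyword.
pattern <- "DATA|STATISTICS"
data_stat_toy <- majors_toy[grepl(pattern, majors_toy$Major), , drop = FALSE]

print(data_stat_toy$Major)
```

With the real file, the same pattern could be applied to whichever column holds the major names.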
DATA 607 - Week 5 Assignment
Goal-Setting: The stated goal in this assignment is to take source data from a CSV file, as seen below, and transform it into a tidy form ready for analysis. In particular, we need to be able to compare delay times for both airlines listed. Loading: Our first step is to load the data, which I do using readr::read_csv, and we immediately get a warni...
5857 sym R (6632 sym/21 pcs) 4 img
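As a rough illustration of the wide-to-tidy step described in this entry, here is a base R sketch. The source CSV is not shown in the excerpt, so the data frame below (airlines A and B, columns city_x and city_y, and every count) is entirely hypothetical, and it assumes the cells hold flight counts.

```r
# Hypothetical wide table: one row per airline/status, one column per city.
wide <- data.frame(
  airline = c("A", "A", "B", "B"),
  status  = c("on time", "delayed", "on time", "delayed"),
  city_x  = c(90, 10, 80, 20),
  city_y  = c(45, 5, 70, 30),
  stringsAsFactors = FALSE
)

# Stack the city columns so each row is one airline/status/city observation.
city_cols <- c("city_x", "city_y")
long <- do.call(rbind, lapply(city_cols, function(col) {
  data.frame(
    airline = wide$airline,
    status  = wide$status,
    city    = col,
    flights = wide[[col]],
    stringsAsFactors = FALSE
  )
}))

# Share of delayed flights per airline, computed from the tidy form.
totals  <- aggregate(flights ~ airline, data = long, FUN = sum)
delayed <- aggregate(flights ~ airline,
                     data = long[long$status == "delayed", ], FUN = sum)
rates <- merge(totals, delayed, by = "airline",
               suffixes = c("_total", "_delayed"))
rates$delay_rate <- rates$flights_delayed / rates$flights_total

print(rates)
```

Once the data is in long form, the per-airline comparison reduces to a grouped sum, which is the main payoff of tidying first.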
DATA 607 - Week 12 Assignment
Introduction I find Spotify’s recommendation system(s) very interesting in that the experience of interacting with them somehow does not feel extremely artificial. For years, my main means of listening to new music online has been a subculture of specialty YouTube videos in which people film vinyl records from different genres, times and places. Be...
3434 sym
DATA 607 - Project 4
Introduction For this project, I used the data set offered at the UCI Machine Learning Repository here: https://archive.ics.uci.edu/ml/datasets/spambase. This is an interesting case because it’s personalized to George Forman’s email from back in 1999. The basic preparation for the data is term frequency, personalized from his own filed work and the ...
2113 sym R (606 sym/11 pcs)
Final Project Presentation
Overview
Basic Concept: Is it possible to link musical structure and sentiment score to construct feel-good playlists?
Data Used: Spotify API; Web-scraped lyrics; Sentiment lexicons
Insights
Challenges
Authorization
get_token <- function(id, secret) { response <- httr::POST("https://accounts.spotify.com/api/token", body =...
378 sym R (718 sym/2 pcs) 1 img
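The get_token excerpt above is truncated, so its actual body is unknown. For readers unfamiliar with the step, here is a separate sketch of how a token request to that endpoint could look under Spotify's client credentials flow using httr. The function name get_token_sketch and everything inside it are assumptions, not the author's implementation.

```r
# Hypothetical helper: requests an app-level access token from Spotify.
# Calling it requires the httr package and valid credentials.
get_token_sketch <- function(id, secret) {
  response <- httr::POST(
    "https://accounts.spotify.com/api/token",
    httr::authenticate(id, secret),                  # HTTP basic auth with client id/secret
    body = list(grant_type = "client_credentials"),
    encode = "form"                                  # send the body as a URL-encoded form
  )
  httr::stop_for_status(response)                    # raise an R error on a non-2xx reply
  httr::content(response)$access_token               # parsed JSON field holding the token
}
```

The returned string would then be sent as a Bearer token in the Authorization header of later API requests.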
Data 605: Week 5 Discussion
Based on exercise 6 for Chapter 1.1: In Las Vegas, a roulette wheel has 38 slots numbered 0, 00, 1, 2, . . . , 36. The 0 and 00 slots are green and half of the remaining 36 slots are red and half are black. A croupier spins the wheel and throws in an ivory ball. If you bet 1 dollar on red, you win 1 dollar if the ball stops in a red slot and oth...
492 sym R (408 sym/3 pcs)
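The roulette setup lends itself to a short simulation. The sketch below assumes that a losing bet costs exactly 1 dollar (the excerpt is cut off before stating the losing payoff) and that 1000 bets are placed; both values, along with the seed, are assumptions chosen for illustration.

```r
# 18 of the 38 slots are red, so P(red) = 18/38.
p_red <- 18 / 38

# Expected winnings per 1-dollar bet, assuming a loss costs 1 dollar.
expected_per_bet <- 1 * p_red + (-1) * (1 - p_red)

# Simulate a run of bets on red (seed and bet count are arbitrary choices).
set.seed(605)
n_bets <- 1000
outcomes <- sample(c(1, -1), size = n_bets, replace = TRUE,
                   prob = c(p_red, 1 - p_red))
total_winnings <- sum(outcomes)

cat("Expected per bet:", expected_per_bet, "\n")
cat("Simulated total over", n_bets, "bets:", total_winnings, "\n")
```

Under these assumptions the expected value per bet works out to 18/38 - 20/38 = -1/19, so a long run of bets should drift negative.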
Data 605 Final
Presentation Here is the video for my accompanying presentation on YouTube. Problem 1 Using R, generate a random variable \(X\) that has 10,000 random uniform numbers from 1 to \(N\), where \(N\) can be any number of your choosing greater than or equal to 6. Then generate a random variable \(Y\) that has 10,000 random normal numbers with a mean ...
7492 sym R (12302 sym/44 pcs) 7 img 1 tbl
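A minimal sketch of the sampling step in Problem 1 might look like this. It assumes N = 6, the smallest allowed value, and reads "uniform numbers from 1 to N" as continuous draws via runif. The parameters of Y are cut off in the excerpt, so mu_assumed and sd_assumed are hypothetical placeholders and not the values the problem specifies.

```r
set.seed(605)   # arbitrary seed for reproducibility
n <- 10000
N <- 6          # any value >= 6 is allowed by the problem statement

# X: 10,000 continuous uniform draws on [1, N].
X <- runif(n, min = 1, max = N)

# Y: 10,000 normal draws. The real mean and sd are truncated in the excerpt;
# these placeholder values are assumptions for illustration only.
mu_assumed <- 0
sd_assumed <- 1
Y <- rnorm(n, mean = mu_assumed, sd = sd_assumed)

summary(X)
summary(Y)
```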
DATA 608: Assignment #1
Principles of Data Visualization and Introduction to ggplot2 I have provided you with data about the 5,000 fastest growing companies in the US, as compiled by Inc. magazine. Let's read this in: inc <- read.csv("https://raw.githubusercontent.com/charleyferrari/CUNY_DATA_608/master/module1/Data/inc5000_data.csv", header = TRUE) And let's preview thi...
2974 sym R (10661 sym/25 pcs) 3 img
DATA 621 Blog Post: Even More World-Building
Introduction After my recent post, I spent some time thinking about one of the main limitations I found with the initial approach: the lack of time or seasonality as a predictor. After all, cyclical seasonal events are one of the main intuitive means by which we predict the weather. For those of us in the Northeastern United States, we...
2905 sym R (1870 sym/5 pcs) 2 img
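One common way to give a regression model a sense of seasonality is to encode the day of the year as a sine/cosine pair. The sketch below shows that idea on synthetic data. It is not necessarily the approach taken in the post, and the simulated temperatures, column names and seed are all assumptions.

```r
set.seed(621)   # arbitrary seed

# Synthetic two-year daily series with a yearly cycle plus noise.
day  <- rep(1:365, times = 2)
temp <- 15 + 10 * sin(2 * pi * (day - 100) / 365) + rnorm(length(day), sd = 2)
weather <- data.frame(day = day, temp = temp)

# Harmonic encoding: day 365 and day 1 end up close together,
# unlike with the raw day number.
weather$sin_doy <- sin(2 * pi * weather$day / 365)
weather$cos_doy <- cos(2 * pi * weather$day / 365)

# Linear model using only the seasonal terms.
fit <- lm(temp ~ sin_doy + cos_doy, data = weather)
print(coef(fit))
print(summary(fit)$r.squared)
```

Using both sine and cosine lets the fitted cycle peak at any time of year, since the phase is absorbed into the two coefficients.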
Data 621 Blog Post: More World-Building
Introduction Based on my last entry, I’ve shopped around for some test weather data. In fact, after some looking I found this data set containing a few years of Austin, TX’s weather. Rather than spend a lot of time going through EDA here, let’s instead agree that I did some and that it was the rationale for the way I read and processed the d...
1902 sym R (1240 sym/2 pcs) 1 img