Publications by Bonnie Cooper
Scraping Reddit Comments
Text mining Reddit & Indeed for the most valued Data Science skills Abdellah Ait, Bonnie Cooper, Gehad Gad & David Moste Introduction In this project we worked as a team to gather text data to address the question, “Which are the most valued data science skills?” Our approach involved scraping data from two very different sources: The job-li...
13538 sym R (69161 sym/38 pcs) 8 img
Melting an R data.frame with dplyr & tidyr methods
Elongating a Dataset using methods from dplyr & tidyr Setting the environment These are the R libraries we will need for this demo: library( magrittr ) library( dplyr ) library( tidyr ) library( ggplot2 ) Creating the data: This demo will use data that summarizes Gender Education Attainment in a table with a double-layered header. This a...
3721 sym R (5893 sym/12 pcs) 4 img
Video Games Sales 2019
Visualizing Video Game Genre Sales as a Function of Region or Year Setting the environment These are the R libraries we will need for this demo: library( magrittr ) library( dplyr ) library( tidyr ) library( ggplot2 ) Accessing the data This demo will utilize the Kaggle Video Game Sales 2019 dataset. The data is openly available for the public to...
2475 sym R (6992 sym/18 pcs) 3 img
Chord Diagram Visualization of UN Migration Data
Visualizing human migration patterns with chord diagrams in R using the circlize package Setting the environment These are the R libraries we will need for this demo: library( magrittr ) library( dplyr ) library( tidyr ) library( DataCombine ) library( circlize ) Accessing the data This demo will utilize the United Nations International Migratio...
2820 sym R (10396 sym/20 pcs) 3 img
Tidying & Transforming Data
Clean Data in, Clear Results out. The quality of results relies on the quality of the source data. Therefore, cleaning data is a necessary step before real data analysis can begin. Hadley Wickham has outlined a standard ‘tidy’ organization for data where: each variable forms a column; each observation forms a row; each type of...
8212 sym R (8566 sym/17 pcs) 7 img 1 tbl
Wrangling Text Data from a Chess Tournament
Data Wrangling It is often necessary to transform raw data to a condensed and more useful format to facilitate downstream analysis. This process is often referred to as ‘Data Wrangling’. In this demo, we will wrangle data from a chess tournament. The raw data is given to us as a .txt file and our goal is to transform the text data into a much m...
5720 sym R (9830 sym/15 pcs) 7 img
DATA607: R Character Manipulation
1. College Majors Dataset Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS” #load the FiveThirtyEight data from 'majors-list.csv' to a data.f...
2499 sym R (12733 sym/31 pcs)
DATA607_Assigment1
Introduction: What do men think it means to be a man? FiveThirtyEight asked > 1600 men whether they felt the #MeToo movement had changed their perception of masculinity. The study was an effort to gain insights into how #MeToo affects how men feel about being men. Important questions about male identity were raised: For example, participants were...
4760 sym R (5761 sym/12 pcs) 1 img
DATA606_Lab1
Getting Started: loading the data & getting a preview with the head() function
    source("more/present.R")
    head( present )
    ##   year    boys   girls
    ## 1 1940 1211684 1148715
    ## 2 1941 1289734 1223693
    ## 3 1942 1444365 1364631
    ## 4 1943 1508959 1427901
    ## 5 1944 1435301 1359499
    ## 6 1945 1404587 1330869
Task 1 What years are included in this data s...
3975 sym R (5277 sym/31 pcs) 4 img
A Prussian Poisson Process
Prussian Cavalry on Parade Image source: picclick Demonstrating a Poisson distribution with a classic data set: the von Bortkiewicz Prussian horse-kicking data The Poisson process is a useful model applied to occurrences where, while individual events occur randomly, their regularity can be described by a consistent rate. In this R not...
15324 sym R (15701 sym/40 pcs) 8 img