Publications by Bonnie Cooper

Scraping Reddit Comments

19.03.2020

Text mining Reddit & Indeed for the most valued Data Science skills
Abdellah Ait, Bonnie Cooper, Gehad Gad & David Moste
Introduction
In this project we worked as a team to gather text data to address the question, “Which are the most valued data science skills?” Our approach involved scraping data from two very different sources: The job-li...
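Once comments have been scraped, one simple way to rank skills is to count how many comments mention each one. The sketch below is an assumption, not the project's pipeline: the comments and the skill dictionary are hypothetical, and it counts comments that mention a skill rather than total mentions.

```r
# Hypothetical comments standing in for scraped Reddit text (not real data)
comments <- c(
  "Learn Python and SQL first",
  "R is great for statistics",
  "SQL, SQL, SQL. Also some machine learning.",
  "Communication skills matter more than Python"
)

# Hypothetical skill dictionary; word boundaries stop "r" matching inside words
skills <- c("python", "sql", "r", "machine learning", "communication")

# Number of comments that mention each skill at least once (case-insensitive)
counts <- sapply(skills, function(skill) {
  pattern <- paste0("\\b", skill, "\\b")
  sum(grepl(pattern, comments, ignore.case = TRUE, perl = TRUE))
})

print(sort(counts, decreasing = TRUE))
```

Skills containing regex metacharacters (for example "c++") would need escaping before being pasted into a pattern.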

Melting an R data.frame with dplyr & tidyr methods

09.03.2020

Elongating a Dataset using methods from dplyr & tidyr
Setting the environment
These are the R libraries we will need for this demo:
library( magrittr )
library( dplyr )
library( tidyr )
library( ggplot2 )
Creating the data
This demo will use data that summarizes educational attainment by gender in a table with a double-layered header. This a...
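A minimal sketch of the elongation step, assuming the two header layers (gender and education level) have already been flattened into column names such as `Male_HighSchool`. The table, its column names and its values are hypothetical; `tidyr::pivot_longer()` is used here, and the older `gather()` does a similar job.

```r
library(tidyr)

# Hypothetical wide table: one row per year, one column per gender/education pair
wide <- data.frame(
  year              = c(2000, 2010),
  Male_HighSchool   = c(30, 28),
  Male_College      = c(25, 29),
  Female_HighSchool = c(31, 27),
  Female_College    = c(24, 32)
)

# Melt to long format: every non-year column becomes rows, and each column
# name is split at "_" into two key columns (gender, education)
long <- pivot_longer(
  wide,
  cols      = -year,
  names_to  = c("gender", "education"),
  names_sep = "_",
  values_to = "percent"
)

print(long)
```

The long table has one row per year, gender and education level, which is the shape ggplot2 expects for mapping gender or education to an aesthetic.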

Video Games Sales 2019

08.03.2020

Visualizing Video Game Genre Sales as a Function of Region or Year
Setting the environment
These are the R libraries we will need for this demo:
library( magrittr )
library( dplyr )
library( tidyr )
library( ggplot2 )
Accessing the data
This demo will utilize the Kaggle Video Game Sales 2019 dataset. The data is openly available for the public to...
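To compare genres across regions, per-region sales columns can be lengthened and then summed by genre. This is a sketch under assumptions: the rows are invented and the column names (`Genre`, `NA_Sales`, `JP_Sales`) are hypothetical stand-ins for whatever the Kaggle file actually uses.

```r
library(dplyr)
library(tidyr)

# Hypothetical rows; one sales column per region (values in millions, invented)
sales <- data.frame(
  Genre    = c("Action", "Action", "Sports", "Shooter", "Sports"),
  NA_Sales = c(1.2, 0.8, 2.0, 1.5, 0.5),
  JP_Sales = c(0.3, 0.1, 0.2, 0.1, 0.4),
  stringsAsFactors = FALSE
)

by_region <- sales %>%
  # one row per game and region
  pivot_longer(ends_with("_Sales"), names_to = "Region", values_to = "Sales") %>%
  # total sales for every genre/region pair
  group_by(Genre, Region) %>%
  summarise(Sales = sum(Sales)) %>%
  ungroup()

print(by_region)
```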

Chord Diagram Visualization of UN Migration Data

08.03.2020

Visualizing human migration patterns with chord diagrams in R using the circlize package
Setting the environment
These are the R libraries we will need for this demo:
library( magrittr )
library( dplyr )
library( tidyr )
library( DataCombine )
library( circlize )
Accessing the data
This demo will utilize the United Nations International Migratio...
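circlize's `chordDiagram()` accepts an adjacency data frame whose first two columns name the origin and destination sectors and whose third column gives the link width. The flows below are hypothetical placeholders, not UN figures; `directional = 1` draws each link as running from the first column's sector to the second's.

```r
library(circlize)

# Hypothetical origin -> destination flows (invented values, not UN data)
flows <- data.frame(
  from  = c("Asia", "Asia", "Europe", "Africa"),
  to    = c("Europe", "Northern America", "Northern America", "Europe"),
  value = c(20, 15, 10, 12),
  stringsAsFactors = FALSE
)

# Link width is proportional to value; links are directed from -> to
chordDiagram(flows, directional = 1)

# Reset circlize's global state before drawing another circular plot
circos.clear()
```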

Tidying & Transforming Data

01.03.2020

Clean Data in, Clear Results out.
The quality of results depends on the quality of the source data, so cleaning data is a necessary step before the actual analysis can begin.
Hadley Wickham has outlined a standard ‘tidy’ organization for data where:
Each variable forms a column
Each observation forms a row
Each type of...
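As a small illustration of the "each variable forms a column" rule, the sketch below uses `table3`, an example dataset that ships with tidyr, in which one `rate` column packs two variables (cases and population) into a single "cases/population" string. This is an illustrative example, not the notebook's own data.

```r
library(tidyr)

# In table3 the rate column stores two variables as one string,
# which breaks the one-variable-per-column rule
print(table3)

# Split rate into its two variables; convert = TRUE turns the pieces into numbers
tidy <- separate(table3, rate, into = c("cases", "population"),
                 sep = "/", convert = TRUE)
print(tidy)
```

After the split, each row is one country-year observation and each column one variable, matching the layout of tidyr's `table1`.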

Wrangling Text Data from a Chess Tournament

24.02.2020

Data Wrangling
It is often necessary to transform raw data into a condensed and more useful format to facilitate downstream analysis. This process is often referred to as ‘Data Wrangling’. In this demo, we will wrangle data from a chess tournament. The raw data is given to us as a .txt file and our goal is to transform the text data into a much m...
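A minimal base-R sketch of parsing one player's record, assuming a pipe-delimited layout with two text lines per player (ID, name, points and round results on the first; state and a "R: <pre-rating>" field on the second). The two lines are invented for illustration and only loosely modeled on such a cross-table; the real file's layout may differ.

```r
# Two hypothetical lines describing one player (not taken from the real file)
raw <- c(
  "    1 | JANE DOE                        |6.0  |W  39|W  21|",
  "   ON | 12345678 / R: 1794   ->1817     |N:2  |W    |B    |"
)

# Split each line at the pipes and trim the padding around every field
fields1 <- trimws(strsplit(raw[1], "|", fixed = TRUE)[[1]])
fields2 <- trimws(strsplit(raw[2], "|", fixed = TRUE)[[1]])

player <- data.frame(
  id         = as.integer(fields1[1]),
  name       = fields1[2],
  points     = as.numeric(fields1[3]),
  state      = fields2[1],
  # capture the digits after "R:" as the pre-tournament rating
  pre_rating = as.integer(sub(".*R:\\s*(\\d+).*", "\\1", fields2[2])),
  stringsAsFactors = FALSE
)

# Opponent IDs: drop the W/L/D result code at the start of each round field
rounds <- fields1[4:length(fields1)]
opponents <- as.integer(sub("^[WLD]\\s+", "", rounds))

print(player)
print(opponents)
```

With every player parsed this way, opponent IDs can be matched back to `pre_rating` to compute an average opponent rating.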

DATA607: R Character Manipulation

17.02.2020

1. College Majors Dataset
Using the 173 majors listed in fivethirtyeight.com’s College Majors dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/], provide code that identifies the majors that contain either “DATA” or “STATISTICS”.
#load the FiveThirtyEight data from 'majors-list.csv' to a data.f...
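One way to answer the question is a single regular expression with alternation passed to `grepl()`. The sketch below runs on a small hand-typed character vector rather than the real `majors-list.csv`; with the actual file, the same call would be applied to the column that holds the major names.

```r
# Small hand-typed sample of major names (not the full 173-row dataset)
majors <- c(
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "STATISTICS AND DECISION SCIENCE",
  "ENGLISH LANGUAGE AND LITERATURE",
  "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
  "NURSING"
)

# "|" in the pattern means DATA or STATISTICS; grepl() returns TRUE/FALSE per major
hits <- majors[grepl("DATA|STATISTICS", majors)]
print(hits)
```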

DATA607_Assigment1

02.02.2020

Introduction
What do men think it means to be a man? FiveThirtyEight asked more than 1,600 men whether they felt the #MeToo movement had changed their perception of masculinity. The study aimed to gain insight into how #MeToo affects how men feel about being men. The survey raised important questions about male identity; for example, participants were...

DATA606_Lab1

02.02.2020

Getting Started
Loading the data & getting a preview with the head() function:
source("more/present.R")
head( present )
##   year    boys   girls
## 1 1940 1211684 1148715
## 2 1941 1289734 1223693
## 3 1942 1444365 1364631
## 4 1943 1508959 1427901
## 5 1944 1435301 1359499
## 6 1945 1404587 1330869
Task 1
What years are included in this data s...
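The six rows printed by `head( present )` are enough to sketch the kind of derived columns the lab tasks build on. This example rebuilds only those six rows by hand (the full `present` data is not reproduced here) and adds total births and the boy-to-girl ratio for each year.

```r
# The six rows shown by head( present ), re-entered by hand
present_head <- data.frame(
  year  = 1940:1945,
  boys  = c(1211684, 1289734, 1444365, 1508959, 1435301, 1404587),
  girls = c(1148715, 1223693, 1364631, 1427901, 1359499, 1330869)
)

# First and last year covered by these rows
print(range(present_head$year))

# Total births per year and the ratio of boys to girls
present_head$total     <- present_head$boys + present_head$girls
present_head$boy_ratio <- present_head$boys / present_head$girls

print(present_head)
```

On the full dataset, `range( present$year )` answers the "which years" question directly.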

A Prussian Poisson Process

20.01.2020

Prussian Cavalry on Parade (image source: picclick)
Demonstrating a Poisson distribution with a classic data set: the von Bortkiewicz Prussian horse-kicking data
The Poisson process is a useful model for occurrences that happen individually at random but whose regularity can be described by a constant average rate. In this R not...
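A Poisson model with rate λ gives P(X = k) = λ^k e^(−λ) / k!. The sketch below fits λ as the sample mean and compares expected counts with the frequently reproduced summary of the horse-kick data (deaths per corps per year over 200 corps-years). It is an illustration of the technique, not necessarily the notebook's own code.

```r
# Deaths per corps-year and how many corps-years showed each count
deaths   <- 0:4
observed <- c(109, 65, 22, 3, 1)

n      <- sum(observed)                    # number of corps-years
lambda <- sum(deaths * observed) / n       # maximum-likelihood rate = sample mean

# Expected number of corps-years with k deaths under Poisson(lambda)
expected <- n * dpois(deaths, lambda)

print(round(data.frame(deaths, observed, expected), 1))
```

The expected counts (roughly 108.7, 66.3, 20.2, 4.1 and 0.6) track the observed ones closely, which is why this data set is a standard demonstration of the Poisson distribution.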
