Publications by Rachel Mariman
Data 101: Homework 7
Question #1 We are interested in doing some analysis of a dataset of babynames, beginning by analyzing the names of children born in 2017. We start by filtering our data to include only names from the year 2017. names17 <- babynames %>% filter(year %in% 2017) %>% pull(name) head(names17) ## [1] "Emma" "Olivia" "Ava" "Isabella" "Sop...
Data 101: Homework 6
Question #1(a) First, load in the county_returns.csv, which contains the 2012 and 2016 presidential election returns by county. It also includes a variable for FIPS codes, which will help us later. Merge the county-level mapping data with this electoral data. Think carefully about how we should merge it! – First, we load in data on the 2012 ...
Data 101: Homework 3
Question #1 We analyzed the effect of carat, cut, and color on a diamond’s price. But what about the fourth C, clarity? How does clarity matter to a diamond’s price? How is clarity related to the other Cs? What would you conclude based on your analysis? If you need more information about clarity, you can consult https://www.americangemsociety.o...
Data 101: Homework 1
Question #1 Investigate the relationship between the number of cylinders (cyl) and highway fuel efficiency. Look at the variables, and decide which type of plot (scatterplot, line plot, boxplot, or bar chart) best summarizes their relationship. Comment on that relationship. HINT: you may need to use the as.factor(cyl) syntax in the graph (as we did...
Data 101: Homework 2
Question #1 In the code above, we frequently used not_cancelled, rather than flights, as our data. How did this simplify our code? Think especially about the functions we used within summarise(). – Using not_cancelled instead of flights lets us skip an extra step inside summarise() when using functions such as mean(). Usually, you...
Data 101: Homework 4
Question #1 Solve problem 13.5.1.5 from the textbook. – Running anti_join on a merge of flights and airports shows us that there are several destination airports in the flights dataset that do not appear in the airports dataset: specifically BQN, SJU, and STT. There are also about 1300 airports in the airports dataset that do not appear in t...
Data 101: Homework 5
Question #1 Read the data into R and turn it into a tidy dataset. – For these questions, we’re going to use data on average tuition cost by state. Let’s first read our data into R. Next, let’s clean this data to make state, year, and tuition three separate columns. tuition <- read_excel("Data/Raw/us_avg_tuition.xlsx") head(tuition) tui...
Data 210: Homework 1
Problem 1: Exploring the gapminder data Question #1 Load in the gapminder data from the dslabs package using the data() function. data("gapminder") Question #2 Use the class function to determine what type of object the data is. – The object is a data frame. class(gapminder) ## [1] "data.frame" Question #3 Check the dimensions of the gapmind...
Data 210: Homework 5
Question #1 If you’ve ever taken a probability class, you may have heard of the “birthday problem”. The problem demonstrates (perhaps counter-intuitively) that in a group of only 23 people the probability of at least two people sharing a birthday exceeds 50%. In this question, we’re going to explore a variation of the birthday problem. Ulti...
Data 210: Homework 6
Data: Don’t forget that all data we provide you for this class can only be used for class purposes. Question #1 In many states, convicted felons are banned from voting while they serve their prison, parole, or probation sentence. In some of those states, they are able to have their voting rights restored. Even after voting rights are restored,...