Publications by Cameron Smith
Data 607 - Final Project
Introduction This project is focused on analyzing the relationship, if any, between quality of packaged ramen and where it is manufactured. I will start by looking at where the best ramen is made based on the average review score for each country, expanding and comparing that analysis from country to regions and assessing via a T-test whether the...
3996 sym R (13313 sym/32 pcs) 6 img
Data 607 - Project 4
Intro. Objective: This assignment is focused on classification, and in particular classifying email messages as either ‘spam’ or ‘ham’ (i.e. not spam). Approach: I approached the assignment via the following process: download and extract the spam/ham data from the internet; import the data into R; create a corpus; clean up the data; create a...
4366 sym R (5033 sym/21 pcs)
Data 607 - Week 11 Homework
Introduction This assignment is focused on recommender systems. Per the assignment’s instructions, our task is to identify a recommender system web site, answer the three scenario design questions for this web site, and attempt to ‘reverse engineer’ the site. The three scenario design questions are: Who are your target users? What are thei...
3652 sym
Data 607 - Homework 10
Description This assignment is focused on sentiment analysis and uses code examples from the following book: Silge, J. and Robinson, D. (2020). Text Mining with R: A Tidy Approach. Retrieved from https://www.tidytextmining.com. Overview of Approach Per the assignment’s instructions I have focused the first part of this assignment on running...
2296 sym R (13015 sym/81 pcs) 9 img
Data 606 - Lab 9
Grading the professor Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related charac...
14652 sym R (10766 sym/41 pcs) 19 img
Data 606 - Chapter 9 Homework
Question 1 Baby weights, Part I. (9.1, p. 350) The Child Health and Development Studies investigate a range of topics. One study considered all pregnancies between 1960 and 1967 among women in the Kaiser Foundation Health Plan in the San Francisco East Bay area. Here, we study the relationship between smoking and weight of the baby. The variable...
7646 sym R (1183 sym/8 pcs) 2 img
Data 606 - Final Project
Introduction This is the final project for Data 606, the objective of which is to conduct a reproducible analysis of my own choosing. I chose to use data focused on subjective happiness, as described in more detail below. Research Question As someone about to enter the world of parenthood I thought it would be interesting to look at the potentia...
7125 sym R (11423 sym/36 pcs) 5 img
Data 605 - Week 8 Discussion - CS
Chapter 7, Exercise 15 Suppose we want to test a coin for fairness. We flip the coin \(n\) times and record the number of times \(X_0\) that the coin turns up tails and the number of times \(X_1 = n - X_0\) that the coin turns up heads. Now we set $$ Z = \sum_{i=0}^{1} \frac{(X_i - n/2)^2}{n/2} $$ Then for a fair coin Z has approximately a chi-squared distribution with 2 − 1 = 1 degree o...
983 sym R (1138 sym/3 pcs) 1 img
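As a rough illustration of the fairness test described in this entry, the sketch below computes the chi-squared statistic for a fair coin, where each of the two outcomes has expected count n/2, together with an approximate p-value for 1 degree of freedom. It is written in Python for illustration and is not the author's R code; the function name `coin_chi_squared` and the example counts are hypothetical.

```python
import math


def coin_chi_squared(tails: int, heads: int) -> tuple[float, float]:
    """Chi-squared fairness statistic for a coin, with an approximate p-value.

    Hypothetical helper for illustration; assumes a fair coin under the
    null hypothesis, so each outcome has expected count n / 2.
    """
    n = tails + heads
    expected = n / 2
    # Sum of (observed - expected)^2 / expected over the two outcomes.
    z = ((tails - expected) ** 2 + (heads - expected) ** 2) / expected
    # A chi-squared variable with 1 degree of freedom is the square of a
    # standard normal, so P(chi2 > z) = erfc(sqrt(z / 2)).
    p_value = math.erfc(math.sqrt(z / 2))
    return z, p_value


# Example with made-up counts: 40 tails and 60 heads in 100 flips.
z, p = coin_chi_squared(40, 60)
print(f"Z = {z:.2f}, approximate p-value = {p:.4f}")
```

A small p-value suggests the observed split would be unusual for a fair coin; the approximation is only reasonable when n is fairly large.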
Data 605 - Week 9 Discussion - CS
Chapter 9.1, Exercise 2 (Page 338) Exercise 2: Let \(S_{200}\) be the number of heads that turn up in 200 tosses of a fair coin. Estimate: \(P(S_{200} = 100)\), \(P(S_{200} = 90)\), \(P(S_{200} = 80)\). Answer: Since each toss has only two possible outcomes (heads and tails), each toss is a Bernoulli trial. Therefore, the following formula applies: \[ \bino...
652 sym R (747 sym/3 pcs) 1 img
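For readers who want to check the three probabilities in this entry, here is a minimal sketch of the exact binomial probability of k heads in n tosses of a fair coin, C(n, k) / 2^n. It is a Python illustration under that assumption, not the author's R code, and the function name `fair_coin_pmf` is hypothetical.

```python
import math


def fair_coin_pmf(n: int, k: int) -> float:
    """Exact probability of exactly k heads in n tosses of a fair coin.

    Hypothetical helper for illustration: C(n, k) * (1/2)^n.
    """
    # math.comb returns an exact integer; dividing two Python ints
    # yields a correctly rounded float even for large values.
    return math.comb(n, k) / 2 ** n


# The three cases named in the exercise.
for k in (100, 90, 80):
    print(f"P(S_200 = {k}) = {fair_coin_pmf(200, k):.5f}")
```

The exercise asks for estimates, so these exact values are best read as a reference point for comparing any approximation.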