Publications by Cameron Smith

Data 607 - Final Project

06.12.2020

Introduction This project is focused on analyzing the relationship, if any, between the quality of packaged ramen and where it is manufactured. I will start by looking at where the best ramen is made based on the average review score for each country, expanding and comparing that analysis from country to regions and assessing via a t-test whether the...

3996 sym R (13313 sym/32 pcs) 6 img

Data 607 - Project 4

15.11.2020

Intro. Objective: This assignment is focused on classification, and in particular classifying email messages as either ‘spam’ or ‘ham’ (i.e. not spam). Approach: I approached the assignment via the following process: (1) Download and extract the spam/ham data from the internet; (2) Import the data into R; (3) Create a corpus; (4) Clean up the data; (5) Create a...

4366 sym R (5033 sym/21 pcs)

Data 607 - Week 11 Homework

04.11.2020

Introduction This assignment is focused on recommender systems. Per the assignment’s instructions, our task is to: (1) Identify a recommender system web site; (2) Answer the three scenario design questions for this web site; (3) Attempt to ‘reverse engineer’ the site. The 3 scenario design questions are: Who are your target users? What are thei...

3652 sym

Data 607 - Homework 10

31.10.2020

Description This assignment is focused on sentiment analysis and uses code examples from the following book: Silge, J. and Robinson, D. (2020). Text Mining with R: A Tidy Approach. Retrieved from https://www.tidytextmining.com. Overview of Approach Per the assignment’s instructions I have focused the first part of this assignment on running...

2296 sym R (13015 sym/81 pcs) 9 img

Data 606 - Lab 9

28.11.2020

Grading the professor Many college courses conclude by giving students the opportunity to evaluate the course and the instructor anonymously. However, the use of these student evaluations as an indicator of course quality and teaching effectiveness is often criticized because these measures may reflect the influence of non-teaching related charac...

14652 sym R (10766 sym/41 pcs) 19 img

Data 606 - Chapter 9 Homework

29.11.2020

Question 1 Baby weights, Part I. (9.1, p. 350) The Child Health and Development Studies investigate a range of topics. One study considered all pregnancies between 1960 and 1967 among women in the Kaiser Foundation Health Plan in the San Francisco East Bay area. Here, we study the relationship between smoking and weight of the baby. The variable...

7646 sym R (1183 sym/8 pcs) 2 img

Data 606 - Final Project

02.12.2020

Introduction This is the final project for Data 606, the objective of which is to conduct a reproducible analysis of my own choosing. I chose to use data focused on subjective happiness, as described in more detail below. Research Question As someone about to enter the world of parenthood I thought it would be interesting to look at the potentia...

7125 sym R (11423 sym/36 pcs) 5 img

Data 605 - Week 8 Discussion - CS

13.10.2021

Chapter 7, Exercise 15 Suppose we want to test a coin for fairness. We flip the coin n times and record the number of times X0 that the coin turns up tails and the number of times X1 = n − X0 that the coin turns up heads. Now we set $$ Z = \sum_{i=0}^{1} \frac{(X_i - n/2)^2}{n/2} $$ Then for a fair coin Z has approximately a chi-squared distribution with 2 − 1 = 1 degree o...

983 sym R (1138 sym/3 pcs) 1 img
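
The statistic described in the Week 8 entry above can be sketched in a few lines. This is a minimal Python illustration, not the author's R code: the function names and the 40/60 flip counts are hypothetical, and the p-value uses the identity that a chi-squared variable with 1 degree of freedom is the square of a standard normal.

```python
import math


def coin_chi_squared(tails: int, heads: int) -> float:
    """Z = sum over both outcomes of (X_i - n/2)^2 / (n/2)."""
    n = tails + heads
    expected = n / 2  # a fair coin expects n/2 of each outcome
    return sum((x - expected) ** 2 / expected for x in (tails, heads))


def p_value_one_df(z: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom.

    If N is standard normal, P(N^2 > z) = P(|N| > sqrt(z)) = erfc(sqrt(z / 2)).
    """
    return math.erfc(math.sqrt(z / 2))


# Hypothetical counts, chosen only for illustration.
z = coin_chi_squared(tails=40, heads=60)
p = p_value_one_df(z)
print(f"Z = {z:.3f}, p-value = {p:.4f}")
```

A small p-value would suggest the observed split is unlikely under a fair coin; the threshold to use is a choice left to the analyst.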

Data 605 - Week 9 Discussion - CS

20.10.2021

Chapter 9.1, Exercise 2 (Page 338) Exercise 2: Let \(S_{200}\) be the number of heads that turn up in 200 tosses of a fair coin. Estimate: \(P(S_{200} =100)\) \(P(S_{200} = 90)\) \(P(S_{200} = 80)\) Answer: Since there are only two possible outcomes (heads and tails) this is a Bernoulli trial. Therefore, the following formula applies: \[ \bino...

652 sym R (747 sym/3 pcs) 1 img
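
The three probabilities asked for in the Week 9 entry above can be computed directly from the binomial formula and compared with a normal-density estimate. This is a minimal Python sketch, not the author's R code; the function names are hypothetical, and the normal estimate assumes mean n/2 and variance n/4 for a fair coin.

```python
import math


def exact_prob(n: int, k: int) -> float:
    """Exact P(S_n = k) for a fair coin: C(n, k) / 2^n."""
    return math.comb(n, k) / 2 ** n


def normal_estimate(n: int, k: int) -> float:
    """Normal-density estimate of P(S_n = k) with mean n/2 and variance n/4."""
    mean = n / 2
    variance = n / 4
    return math.exp(-((k - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)


for k in (100, 90, 80):
    print(f"k = {k}: exact = {exact_prob(200, k):.5f}, normal estimate = {normal_estimate(200, k):.5f}")
```

The two columns should agree closely near the mean and drift apart slightly in the tails, which is the usual behavior of the normal approximation.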