Publications by Shane Hylton
DATA 607 Project 3
Project: What are the most important data science skills? Goals: In this project, we seek to answer this question by collecting the words used across a large number of job descriptions. We will then create word cloud plots to show which words occur most frequently in those descriptions. This should provide us with sufficient in...
2663 sym R (7861 sym/13 pcs) 5 img
DATA 607 Homework Week 9 -- APIs
NYT Bestselling Books API: I chose to work with the New York Times Bestseller List. I loaded the data into R using the API key I requested, then extracted the raw data from the JSON response. From the raw data, I selected the results subsection and created a data frame based on books, where the data is stored. From there, I performed some minor...
832 sym R (2806 sym/2 pcs)
DATA 607 Homework Week 10 -- Sentiment Analysis
Example Code Citations: Example code was downloaded from the following source: Silge, Julia, and David Robinson. “2 Sentiment Analysis with Tidy Data.” Text Mining with R, https://www.tidytextmining.com/sentiment.html. Data Sets: Saif M. Mohammad and Peter Turney. (2013), “Crowdsourcing a Word-Emotion...
4204 sym R (18925 sym/93 pcs) 15 img
DATA 607 Final Project Proposal
Sloan Digital Sky Survey Exploration. Shane Hylton, 11/14/2021. Proposal: I have always found astronomy to be very inspiring. I began my college career as an astronomy major. Over the past few months, I have grown increasingly attracted to the idea of studying astronomy again. After searching for interesting astronomy datasets, I found the Sloan Dig...
1982 sym
Data Science In Context Notes
Data Science In Context: Automated Machine Learning. Shane Hylton, 11/21/2021. What is Automated Machine Learning? Machine Learning: Improving algorithms and outputs through explicit instruction and experience. Supervised Machine Learning: Using labeled data to train the computer to predict l...
3798 sym
Data Science In Context Slides
Automated Machine Learning. Shane Hylton, 11/21/2021. Three Key Types of Machine Learning: Supervised Machine Learning; Unsupervised Machine Learning; Automated Machine Learning. Supervised Machine Learning: User provides labeled data; Computer analyzes the provided data to predict labels; Hands on. Unsupervised Machine Learning: Raw, unlabeled data ...
1895 sym
607 Final Project Presentation
Exploring the Universe. Shane Hylton, 12/8/2021. Motivation and Data Source: Big Data and Astronomy go hand in hand; Sloan Digital Sky Survey; SQL Based Search. Goals: Map the Universe; Visualize the relationship between temperature and magnitude (brightness); Create a custom classification system. Initial Steps: Extensive Tidying and Trimming; Custo...
2105 sym R (1067 sym/6 pcs) 9 img
606 Final Presentation
MLB Batting Analysis for 2021. Shane Hylton, 12/9/2021. Goals: Provide Relevant Summary Statistics; Construct a Regression Model for the relationship between age and batting average; Visualize the differences in batting efficiency for each position; Construct a simulation to show which position is most likely to successfully record a hit; Demonstrate...
2372 sym R (2765 sym/30 pcs) 11 img
DATA 605 Homework 2
Problem Set 1 Question 1: Show that \(A^TA \neq AA^T\) in general. Let A be a square 2x2 matrix. Let \(x_1, x_2, x_3, x_4\) be unique elements of A. Let \(A = \begin{bmatrix} x_1 & x_2 \\ x_3 & x_4 \end{bmatrix}\). Then \(A^T = \begin{bmatrix} x_1 & x_3 \\ x_2 & x_4 \end{bmatrix}\). From \(A\) and \(A^T\), a general example can be found. \(A^TA ...
3597 sym R (3193 sym/10 pcs)
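As a sketch of where the DATA 605 Homework 2 argument leads, the two products can be written out directly from the \(A\) and \(A^T\) defined in the excerpt. The excerpt is truncated, so this is a worked illustration under those definitions and not necessarily the author's own derivation; the numeric matrix at the end is an assumed example.

\[
A^TA = \begin{bmatrix} x_1 & x_3 \\ x_2 & x_4 \end{bmatrix}\begin{bmatrix} x_1 & x_2 \\ x_3 & x_4 \end{bmatrix} = \begin{bmatrix} x_1^2 + x_3^2 & x_1x_2 + x_3x_4 \\ x_1x_2 + x_3x_4 & x_2^2 + x_4^2 \end{bmatrix}
\]

\[
AA^T = \begin{bmatrix} x_1 & x_2 \\ x_3 & x_4 \end{bmatrix}\begin{bmatrix} x_1 & x_3 \\ x_2 & x_4 \end{bmatrix} = \begin{bmatrix} x_1^2 + x_2^2 & x_1x_3 + x_2x_4 \\ x_1x_3 + x_2x_4 & x_3^2 + x_4^2 \end{bmatrix}
\]

The top-left entries already differ whenever \(x_2^2 \neq x_3^2\). For instance, with \(A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\), \(A^TA = \begin{bmatrix} 10 & 14 \\ 14 & 20 \end{bmatrix}\) while \(AA^T = \begin{bmatrix} 5 & 11 \\ 11 & 25 \end{bmatrix}\).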
DATA 605 Homework 1
Initials Creation: Using rep for the x-coordinates and seq for the y-coordinates creates a vertical line segment. Each individual chunk of the x vector has a complementary chunk in the y vector. x <- c(rep(-1.5, 500), seq(-1.5, -0.5, length.out = 500), rep(-0.5, 500), seq(-1.5, -0.5, length.out = 500), seq(-1.5, -0.5, length.out = 500), seq(-1.5, -0.5, length.out = 500)...
514 sym R (1693 sym/8 pcs) 5 img
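The rep/seq pairing described in the DATA 605 Homework 1 excerpt can be sketched in a few lines of R. This is a minimal illustration, not the author's code: the variable names and the y-ranges are assumptions, since the excerpt truncates before the y vector is shown.

```r
# Minimal sketch: building line segments from points with rep() and seq().
# All names and the y-ranges below are hypothetical, chosen for illustration.
n <- 500

# Vertical segment: x is held constant with rep(), y sweeps a range with seq().
x_vert <- rep(-1.5, n)
y_vert <- seq(-1, 1, length.out = n)

# Horizontal segment: the roles swap, so x sweeps and y is held constant.
x_horiz <- seq(-1.5, -0.5, length.out = n)
y_horiz <- rep(0, n)

# Each chunk of x has a complementary chunk of y at the same positions.
x <- c(x_vert, x_horiz)
y <- c(y_vert, y_horiz)

# Plot the points; asp = 1 keeps the segments undistorted.
plot(x, y, pch = 16, cex = 0.3, asp = 1)
```

Concatenating further chunks in the same order in both vectors adds more strokes, which is how a letter shape can be assembled from segments.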