Publications by Dr. Chelsey Hill
Linear Regression and Model Selection
Preliminary: The output that follows was generated using the stats package in base R. In this lesson, we use the training and testing sets of the CarSeats data, which we import as ‘CS_Train’ and ‘CS_Test’: CS_Train <- read.csv("CarSeats_Train.csv"); CS_Test <- read.csv("CarSeats_Test.csv") Data Exploration: We will train our li...
INFO 265: Data Integration (Joins)
Preliminary: We will use the dplyr package, which contains small relational datasets for use in the examples. If you do not already have the dplyr package installed, first install it using the install.packages() function: install.packages("dplyr") Once installed, we load the package for use in the R session using the library() ...
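As a minimal sketch of the kind of join this lesson covers, the example below uses band_members and band_instruments, two small relational datasets that ship with dplyr. The excerpt does not show which datasets the lesson itself uses, so this choice is an assumption made for illustration.

```r
# Assumes dplyr is already installed, e.g. via install.packages("dplyr").
library(dplyr)

# band_members and band_instruments ship with dplyr and share the key column "name".
inner <- inner_join(band_members, band_instruments, by = "name")  # only names present in both tables
left  <- left_join(band_members, band_instruments, by = "name")   # every row of band_members
full  <- full_join(band_members, band_instruments, by = "name")   # every name from either table

print(inner)
print(left)
print(full)
```

Rows without a match on the other side are filled with NA in the left and full joins, which is usually the first thing to check after joining.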
INFO 365: Hierarchical Cluster Analysis
Preliminary: The output that follows was generated using the stats, caret, cluster, and factoextra packages in R. In this lesson, we use a data set containing car seat sales at 81 different stores. The car seat manufacturer would like to group stores for market segmentation and marketing purposes. The variables in the dataset include:...
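The store data from this lesson is not available here, so the following is only a sketch of hierarchical clustering on a stand-in: the USArrests data set built into R. It uses base stats functions alone (dist, hclust, cutree). The Ward linkage and the choice of 4 clusters are assumptions for illustration, not the lesson's settings.

```r
# Stand-in data: USArrests (built into R), NOT the car seat store data from the lesson.
x <- scale(USArrests)                  # standardize so no single variable dominates the distances

d  <- dist(x, method = "euclidean")    # pairwise distances between observations
hc <- hclust(d, method = "ward.D2")    # agglomerative clustering with Ward linkage (assumed choice)

groups <- cutree(hc, k = 4)            # cut the tree into 4 clusters (k = 4 is arbitrary here)
print(table(groups))                   # cluster sizes

plot(hc, cex = 0.6, main = "Dendrogram (USArrests stand-in)")
```

Scaling before computing distances matters whenever the variables are measured in different units, as store-level sales variables typically are.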
INFO 365: Association Analysis
Preliminary: The output that follows was generated using the arules package in R. In this lesson, we use Groceries (a data set from the arules package), which contains 1 month (30 days) of real-world point-of-sale transaction data from a local grocery outlet. The grocery store would like to find actionable, explainable and non-tr...
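A rough sketch of mining association rules from Groceries with the arules package is shown below. The support and confidence thresholds are assumptions picked for illustration. The excerpt does not show the values used in the lesson.

```r
# Assumes arules is already installed, e.g. via install.packages("arules").
library(arules)

data("Groceries")   # transaction data shipped with arules

# Mine rules with the Apriori algorithm; both thresholds are illustrative assumptions.
rules <- apriori(Groceries,
                 parameter = list(support = 0.01, confidence = 0.5),
                 control = list(verbose = FALSE))

# Show the five rules with the highest lift.
inspect(head(sort(rules, by = "lift"), 5))
```

Lowering the support threshold yields more rules, many of them trivial, so sorting by lift is one common way to surface the more interesting ones.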
Sentiment Analysis (Lexicons)
Introduction: Sentiment Analysis or Opinion Mining was introduced in the early 2000s as a method to understand and analyze opinions and feelings (Dave, Lawrence and Pennock 2003; Liu 2012; Nasukawa and Yi 2003). The objective is to explain sentiment based on term polarity (and in some cases intensity). Polarity is sometimes referred to as valen...
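To make the idea of scoring sentiment from term polarity concrete, here is a minimal base R sketch. The lexicon toy_lexicon and the helper score_sentiment are hypothetical names invented for this example. A real analysis would use a published lexicon rather than this six-word list.

```r
# Hypothetical toy lexicon: term -> polarity (+1 positive, -1 negative).
toy_lexicon <- c(good = 1, great = 1, love = 1, bad = -1, poor = -1, hate = -1)

# Score one document by summing the polarity of every token found in the lexicon.
score_sentiment <- function(text, lexicon) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]   # crude tokenization on non-letters
  tokens <- tokens[tokens != ""]
  sum(lexicon[tokens[tokens %in% names(lexicon)]])    # tokens not in the lexicon contribute 0
}

docs <- c("I love this great product",
          "Poor quality, I hate it",
          "It arrived on Tuesday")

scores <- vapply(docs, score_sentiment, numeric(1),
                 lexicon = toy_lexicon, USE.NAMES = FALSE)
print(scores)   # 2 -2 0
```

A positive total suggests positive sentiment, a negative total negative sentiment, and zero means no lexicon terms matched or the polarities cancelled out.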
Topic Models (LDA, CTM, STM)
Introduction to Topic Models: Topic models are unsupervised methods for automatically organizing, understanding, searching and summarizing text documents (Blei 2012). Topic models are built based on term co-occurrence. Topic models are mixed-membership models: every document is a mixture of topics. Document 1 could be 30% Topic A, 20% Topic B, 40% Topi...
Working with Text Data
Some of the packages that we will use in R to work with text data include tm (preprocessing and transformation), textstem (lemmatization), wordcloud (visualization) and lexicon (sentiment lexicons). If you do not already have these packages installed, you will need to install them: install.packages(c("tm", "textstem", "wordcloud", "lexicon")) To ...
Deep Learning for Text Data
Introduction: In this lesson, we will use tensorflow and keras for deep learning using neural networks. TensorFlow is an open-source library for machine learning in Python. Keras is a neural network API written to run on top of TensorFlow. A deep neural network is an artificial neural network (ANN) with multiple hidden layers between the input and...
Pretrained Word Embeddings
Introduction: Pre-trained word embeddings are useful when you do not have a lot of text to use to train your own word embeddings. A popular word embedding model is Global Vectors (GloVe). GloVe uses word co-occurrence information, specifically ratios of co-occurrence probabilities, to create vectors that capture meaning. Similar to LSA, it uses ma...
Creating Reports using Markdown
Introduction: This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. R Markdown: creates reproducible reports; can output as HTML, PDF, PPT, DOC; integrates code and output; output changes when data or code changes. Let’s watch a short video introducing R Markdown. Getting Started wi...