Publications by Dr. Chelsey Hill

Linear Regression and Model Selection

14.11.2023

Preliminary
The output below was generated using the stats package in base R. In this lesson, we use the training and testing sets of the CarSeats data, which we import as ‘CS_Train’ and ‘CS_Test’.
CS_Train <- read.csv("CarSeats_Train.csv")
CS_Test <- read.csv("CarSeats_Test.csv")
Data Exploration
We will train our li...

3447 sym

INFO 265: Data Integration (Joins)

13.09.2023

Preliminary
We will use the dplyr package, which contains small relational datasets for use in the examples. If you do not already have the dplyr package installed, first install it using the install.packages() function.
install.packages("dplyr")
Once installed, we load the package for use in the R session using the library() ...

3927 sym R (1352 sym/14 pcs)
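
The joins lesson above relies on the small relational tables bundled with dplyr. As a minimal sketch, assuming the lesson's examples resemble the band_members and band_instruments tables that ship with dplyr (the excerpt does not name the tables it uses), three common joins on the shared name column might look like this:

```r
library(dplyr)

# band_members and band_instruments ship with dplyr. Using them here is an
# assumption: the excerpt does not say which tables the lesson joins.
inner <- inner_join(band_members, band_instruments, by = "name")  # matches only
left  <- left_join(band_members, band_instruments, by = "name")   # all members
full  <- full_join(band_members, band_instruments, by = "name")   # all rows

print(inner)
print(left)
print(full)
```

The left join keeps every row of band_members and fills the instrument column with NA where no match exists, which is the main behavioral difference from the inner join.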

INFO 365: Hierarchical Cluster Analysis

14.03.2023

Preliminary
The output below was generated using the stats, caret, cluster, and factoextra packages in R. In this lesson, we use a data set containing car seat sales at 81 different stores. The car seat manufacturer would like to group stores for market segmentation and marketing purposes. The variables in the data set include:...

8216 sym 9 img 6 tbl
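
The store data used in the clustering lesson above is not reproduced in this listing, so the following is only a sketch on simulated stand-in data. The column names, the Ward linkage, and the choice of two clusters are all assumptions made for illustration; only base R's stats functions are used.

```r
set.seed(1)

# Hypothetical stand-in for the store data: 20 simulated stores, 2 variables.
stores <- data.frame(
  sales = c(rnorm(10, mean = 5), rnorm(10, mean = 10)),
  price = c(rnorm(10, mean = 100, sd = 5), rnorm(10, mean = 130, sd = 5))
)

# Standardize so that no variable dominates the distance calculation.
scaled <- scale(stores)

# Agglomerative clustering on Euclidean distances (linkage is an assumption).
d  <- dist(scaled, method = "euclidean")
hc <- hclust(d, method = "ward.D2")

# Cut the dendrogram into k = 2 groups (k chosen for illustration only).
groups <- cutree(hc, k = 2)
print(table(groups))
```

Calling plot(hc) draws the dendrogram, which is the usual starting point for choosing where to cut.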

INFO 365: Association Analysis

28.02.2023

Preliminary
The output below was generated using the arules package in R. In this lesson, we use Groceries, a data set from the arules package, which contains 1 month (30 days) of real-world point-of-sale transaction data from a local grocery outlet. The grocery store would like to find actionable, explainable and non-tr...

3695 sym 2 img 2 tbl
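
For the association analysis lesson above, a minimal sketch of mining rules from Groceries with arules could look like the following. The support and confidence thresholds are illustrative assumptions, not values taken from the lesson.

```r
library(arules)

# Groceries is the transaction data set bundled with arules.
data("Groceries")

# Mine rules with the Apriori algorithm. The thresholds below are
# assumptions for illustration; the lesson may use different values.
rules <- apriori(
  Groceries,
  parameter = list(support = 0.01, confidence = 0.3, minlen = 2),
  control = list(verbose = FALSE)
)

# Show up to five rules with the highest lift.
top_rules <- head(sort(rules, by = "lift"), 5)
inspect(top_rules)
```

Sorting by lift surfaces rules whose items co-occur more often than independence would predict, which is one common way to screen out trivial rules.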

Sentiment Analysis (Lexicons)

15.10.2020

Introduction
Sentiment analysis, or opinion mining, was introduced in the early 2000s as a method to understand and analyze opinions and feelings (Dave, Lawrence and Pennock 2003; Liu 2012; Nasukawa and Yi 2003). The objective is to explain sentiment based on term polarity (and in some cases intensity). Polarity is sometimes referred to as valen...

8050 sym R (11215 sym/53 pcs) 7 img
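
The idea of scoring sentiment from term polarity, described in the entry above, can be sketched in base R. The four-word lexicon and the score_sentiment() helper are hypothetical and exist only for this illustration; the lesson itself works with published lexicons.

```r
# Hypothetical toy lexicon: +1 for positive terms, -1 for negative terms.
toy_lexicon <- c(good = 1, great = 1, bad = -1, terrible = -1)

# Hypothetical helper: sum the polarity of every lexicon term in the text.
score_sentiment <- function(text, lexicon) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  tokens <- tokens[tokens != ""]
  hits <- lexicon[tokens]        # NA for tokens that are not in the lexicon
  sum(hits, na.rm = TRUE)
}

review <- "The seats were great, but delivery was terrible and bad."
score <- score_sentiment(review, toy_lexicon)
print(score)
```

A positive total suggests positive sentiment and a negative total suggests negative sentiment. This simple sum ignores negation and intensity.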

Topic Models (LDA, CTM, STM)

08.10.2020

Introduction to Topic Models
Topic models are unsupervised methods for automatically organizing, understanding, searching and summarizing text documents (Blei 2012). Topic models are built based on term co-occurrence.
Topic Models are mixed-membership models
Every document is a mixture of topics. Document 1 could be 30% Topic A, 20% Topic B, 40% Topi...

13207 sym R (5314 sym/32 pcs) 18 img
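
The mixed-membership idea in the topic models entry above can be made concrete with a small document-topic matrix. The proportions below are invented for illustration and are not taken from the lesson; a fitted model (LDA, CTM, or STM) would estimate them from the text.

```r
# Hypothetical document-topic proportions: one row per document.
theta <- matrix(
  c(0.70, 0.20, 0.10,
    0.10, 0.60, 0.30,
    0.25, 0.25, 0.50),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste("Document", 1:3), paste("Topic", c("A", "B", "C")))
)

# Mixed membership: each document's topic shares sum to 1.
stopifnot(all(abs(rowSums(theta) - 1) < 1e-8))

# The dominant topic is the column with the largest share in each row.
dominant <- colnames(theta)[apply(theta, 1, which.max)]
names(dominant) <- rownames(theta)
print(dominant)
```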

Working with Text Data

01.10.2020

Some of the packages that we will use in R to work with text data include tm (preprocessing and transformation), textstem (lemmatization), wordcloud (visualization) and lexicon (sentiment lexicons). If you do not already have these packages installed, you will need to install them.
install.packages(c("tm", "textstem", "wordcloud", "lexicon"))
To ...

12180 sym R (15059 sym/78 pcs) 10 img

Deep Learning for Text Data

21.10.2020

Introduction
In this lesson, we will use TensorFlow and Keras for deep learning using neural networks. TensorFlow is an open-source library for machine learning in Python. Keras is a neural network API written to run on top of TensorFlow. A deep neural network is an artificial neural network (ANN) with multiple hidden layers between the input and...

14441 sym R (13763 sym/46 pcs) 3 img

Pretrained Word Embeddings

12.11.2020

Introduction
Pre-trained word embeddings are useful when you do not have a lot of text to use to train your own word embeddings. A popular word embedding model is Global Vectors (GloVe). GloVe uses word co-occurrence information, specifically ratios of co-occurrence probabilities, to create vectors that capture meaning. Similar to LSA, it uses ma...

2948 sym R (6985 sym/18 pcs)

Creating Reports using Markdown

09.11.2020

Introduction
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. R Markdown: creates reproducible reports; can output as HTML, PDF, PPT, DOC; integrates code and output; output changes when data or code changes. Let’s watch a short video introducing R Markdown.
Getting Started wi...

7782 sym R (490 sym/9 pcs) 4 img