Publications by atmathew

Examining Email Addresses in R

22.08.2015

I don’t normally work with personally identifiable information such as emails. However, the recent data dump from Ashley Madison got me thinking about how I’d examine a data set composed of email addresses. What are the characteristics of an email that I’d look to extract? How would I perform that task in R? Here’s some quick R code to extr...
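As a rough illustration of the kind of extraction the excerpt asks about, the sketch below splits a few addresses into local part, domain, and top-level domain using base R. The sample addresses and column names are hypothetical and are not taken from the original post, whose own code is cut off above.

```r
# Hypothetical sample addresses (not from the original post)
emails <- c("jane.doe@example.com", "bob_smith@mail.example.org", "x99@test.co.uk")

# Local part: everything before the last "@"; domain: everything after it
local_part <- sub("@[^@]*$", "", emails)
domain     <- sub("^.*@", "", emails)

# Top-level domain: the text after the final dot in the domain
tld <- sub("^.*\\.", "", domain)

email_parts <- data.frame(
  email      = emails,
  local_part = local_part,
  domain     = domain,
  tld        = tld,
  has_digit  = grepl("[0-9]", local_part),  # does the local part contain a digit?
  stringsAsFactors = FALSE
)

print(email_parts)
```

From a table like this, counts per domain or per top-level domain follow from `table(email_parts$domain)`.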


Logistic Regression in R – Part One

01.09.2015

Please note that an earlier version of this post had to be retracted because it contained some content which was generated at work. I have since chosen to rewrite the document in a series of posts. Please recognize that this may take some time. Apologies for any inconvenience. Logistic regression is used to analyze the relationship between a di...


Logistic Regression in R – Part Two

02.09.2015

My previous post covered the basics of logistic regression. We must now examine the model to understand how well it fits the data and generalizes to other observations. The evaluation process involves the assessment of three distinct areas – goodness of fit, tests of individual predictors, and validation of predicted values – in order to prod...
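The three areas named in the excerpt can be sketched with base R on a built-in dataset. This is only an illustration under assumptions: the `mtcars` data, the `am ~ wt` model, and the 0.5 cutoff are choices made here, not the original post's, and the predicted values are checked in-sample rather than on held-out data.

```r
# Binary outcome from a built-in dataset: am (0 = automatic, 1 = manual)
fit  <- glm(am ~ wt, data = mtcars, family = binomial)
null <- glm(am ~ 1,  data = mtcars, family = binomial)

# 1. Goodness of fit: likelihood-ratio test against the intercept-only model
lr_test <- anova(null, fit, test = "Chisq")
print(lr_test)

# 2. Individual predictors: Wald z-tests from the coefficient table
coef_table <- coef(summary(fit))
print(coef_table)

# 3. Predicted values: classify at a 0.5 cutoff and tabulate against the truth
pred_class <- as.integer(predict(fit, type = "response") > 0.5)
confusion  <- table(observed = mtcars$am, predicted = pred_class)
print(confusion)
```

For a proper validation step, the same table would be built from predictions on observations that were not used to fit the model.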


Working With SEM Keywords in R

20.09.2015

The following post is taken from two previous posts from an older blog of mine that is no longer available. These are from several years ago, and relate to two critical questions that I encountered. One, how can I automatically generate hundreds of thousands of keywords for a search engine marketing campaign? Two, how can I develop an effective...


A Few Days of Python: Using R in Python

28.09.2015

Using R Functions in Python
import pandas as pd
import pyper as pr
def zone_func(zone_file):
    # IMPORT DATA:
    dat = pd.read_csv(zone_file, parse_dates=["Row Labels"])
    # CREATE A R INSTANCE WITH PYPER:
    r = pr.R()
    # PASS DATA FROM PYTHON TO R:
    r.assign("rsfores", dat["forest"])
    forecast = r(""" librar...


Basic Forecasting

17.10.2015

Forecasting refers to the process of using statistical procedures to predict future values of a time series based on historical trends. For businesses, being able to gauge expected outcomes for a given time period is essential for managing marketing, planning, and finances. For example, an advertising agency may want to use sales forecasts to i...
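A minimal sketch of predicting future values from historical trends, using only base R. Holt-Winters exponential smoothing and the built-in `AirPassengers` series are assumptions chosen for illustration; the excerpt does not say which method or data the original post uses.

```r
# AirPassengers: monthly airline passenger totals, 1949-1960 (ships with R)
fit <- HoltWinters(AirPassengers)

# Forecast the next 12 months from the fitted level, trend, and seasonal terms
fc <- predict(fit, n.ahead = 12)
print(round(fc))
```

Passing `prediction.interval = TRUE` to `predict()` adds upper and lower bounds, which is usually what a planning exercise needs alongside the point forecast.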


Applied Statistical Theory: Belief Networks

21.10.2015

Applied statistical theory is a new series that will cover the basic methodology and framework behind various statistical procedures. As analysts, we need to know enough about what we’re doing to be dangerous and explain approaches to others. It’s not enough to say “I used X because the misclassification rate was low.” At the same time, w...


Applied Statistical Theory: Quantile Regression

13.11.2015

This is part two of the ‘applied statistical theory’ series that will cover the bare essentials of various statistical techniques. As analysts, we need to know enough about what we’re doing to be dangerous and explain approaches to others. It’s not enough to say “I used X because the misclassification rate was low.” Standard linear r...


Automate the Boring Stuff: GGPlot2

26.11.2015

The majority of my interaction with the ggplot2 package involves the interactive execution of code to visualize data within the context of exploratory data analysis. This is often a manual process and quite laborious. I recently sought to improve these tasks by creating a series of user-defined functions that contained my most commonly used ggplo...


Weekly R-Tips: Importing Packages and User Inputs

11.12.2015

Number 1: Importing Multiple Packages Anyone who has used R for some time has written code that required the use of multiple packages. In most cases, this will be done by using the library or require function to bring in the appropriate extensions. library(forecast) library(ggplot2) library(stringr) library(lubridate) library(rockchalk) That...
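One common way to avoid a long run of separate `library()` calls is to loop over a character vector of package names. The sketch below is an assumption about the general idea, since the excerpt is cut off before the post's own solution; it uses packages that ship with R so that it runs on any installation, and the names `pkgs`, `available`, and `missing_pkgs` are illustrative.

```r
# Packages to attach; base-R packages are used here so the sketch runs anywhere.
# Swap in the packages your own script needs.
pkgs <- c("stats", "utils", "tools")

# requireNamespace() checks availability without attaching anything
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Attach only the installed packages; character.only = TRUE lets library()
# accept the package name as a string
invisible(lapply(pkgs[available], library, character.only = TRUE))

# Report anything that could not be found
missing_pkgs <- pkgs[!available]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```

Checking availability first means a missing package produces a single readable message instead of stopping the script at the first failed `library()` call.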
