Publications by atmathew
Examining Email Addresses in R
I don’t normally work with personally identifiable information such as emails. However, the recent data dump from Ashley Madison got me thinking about how I’d examine a data set composed of email addresses. What are the characteristics of an email that I’d look to extract? How would I perform that task in R? Here’s some quick R code to extr...
1159 sym R (1300 sym/1 pcs) 4 img
Logistic Regression in R – Part One
Please note that an earlier version of this post had to be retracted because it contained some content which was generated at work. I have since chosen to rewrite the document in a series of posts. Please recognize that this may take some time. Apologies for any inconvenience. Logistic regression is used to analyze the relationship between a di...
2416 sym R (479 sym/1 pcs) 36 img
Logistic Regression in R – Part Two
My previous post covered the basics of logistic regression. We must now examine the model to understand how well it fits the data and generalizes to other observations. The evaluation process involves the assessment of three distinct areas – goodness of fit, tests of individual predictors, and validation of predicted values – in order to prod...
4703 sym R (1457 sym/7 pcs) 44 img
Working With SEM Keywords in R
The following post is taken from two previous posts from an older blog of mine that is no longer available. These are from several years ago, and relate to two critical questions that I encountered. One, how can I automatically generate hundreds of thousands of keywords for a search engine marketing campaign? Two, how can I develop an effective...
3754 sym R (3455 sym/4 pcs) 4 img
A Few Days of Python: Using R in Python
Using R Functions in Python import pandas as pd import pyper as pr def zone_func(zone_file): # IMPORT DATA: dat = pd.read_csv(zone_file, parse_dates=["Row Labels"]) # CREATE A R INSTANCE WITH PYPER: r = pr.R() # PASS DATA FROM PYTHON TO R: r.assign("rsfores", dat["forest"]) forecast = r(""" librar...
433 sym R (711 sym/1 pcs) 4 img
Basic Forecasting
Forecasting refers to the process of using statistical procedures to predict future values of a time series based on historical trends. For businesses, being able to gauge expected outcomes for a given time period is essential for managing marketing, planning, and finances. For example, an advertising agency may want to utilize sales forecasts to i...
4212 sym R (1293 sym/4 pcs) 4 img
Applied Statistical Theory: Belief Networks
Applied statistical theory is a new series that will cover the basic methodology and framework behind various statistical procedures. As analysts, we need to know enough about what we’re doing to be dangerous and explain approaches to others. It’s not enough to say “I used X because the misclassification rate was low.” At the same time, w...
2471 sym 48 img
Applied Statistical Theory: Quantile Regression
This is part two of the ‘applied statistical theory’ series that will cover the bare essentials of various statistical techniques. As analysts, we need to know enough about what we’re doing to be dangerous and explain approaches to others. It’s not enough to say “I used X because the misclassification rate was low.” Standard linear r...
1933 sym R (244 sym/1 pcs) 20 img
Automate the Boring Stuff: GGPlot2
The majority of my interaction with the ggplot2 package involves the interactive execution of code to visualize data within the context of exploratory data analysis. This is often a manual process and quite laborious. I recently sought to improve these tasks by creating a series of user-defined functions that contained my most commonly used ggplo...
1400 sym R (1198 sym/1 pcs) 4 img
Weekly R-Tips: Importing Packages and User Inputs
Number 1: Importing Multiple Packages Anyone who has used R for some time has written code that required the use of multiple packages. In most cases, this will be done by using the library or require function to bring in the appropriate extensions. library(forecast) library(ggplot2) library(stringr) library(lubridate) library(rockchalk) That...
1255 sym R (535 sym/3 pcs) 4 img