Publications by atmathew

Creating ‘Tags’ For PPC Keywords

07.02.2013

When performing search engine marketing, it is usually beneficial to construct a system for making sense of keywords and their performance. While one could construct Bayesian Belief Networks to model the process of consumers clicking on ads, I have found that using ‘tags’ to categorize keywords is just as useful for conducting post-hoc ana...

1708 sym R (2620 sym/2 pcs) 4 img

Summarizing Data in R

10.04.2013

When working with large amounts of data that is structured in a tabular format, a common operation is to summarize that data in different ways using specific variables. In Microsoft Excel, pivot tables are a nice feature that is used for this purpose. Of course, R also has similar calculations that can be used to summarize large amounts of data. In t...

835 sym R (1251 sym/1 pcs) 4 img

GGPlot2 #1: Employee Job Satisfaction at Top Tech Companies

13.07.2013

This is the first in a series of ongoing posts where I’ll take data on various topics and create simple visualizations of that data using the ggplot2 package in R. While my day job involves analyzing data, I rarely work on projects where I’m expected to produce “publication-worthy” graphics. Therefore, these posts are a way for me to con...

1205 sym R (1946 sym/1 pcs) 4 img

Above Average: Analyzing Self-Rated Qualities in R

16.03.2014

Numerous psychological studies have demonstrated that people often have an inflated perception of their personal qualities. From work performance to driving skills, people report being above average in relation to others when it comes to many arenas. This extends to how people perceive their own physical attractiveness and intelligence levels. T...

2857 sym R (4071 sym/3 pcs) 16 img

R 101: Summarizing Data

25.03.2014

When working with large amounts of data that is structured in a tabular format, a common operation is to summarize that data in different ways using specific variables. In Microsoft Excel, pivot tables are a nice feature that is used for this purpose. While not as “efficient” in relation to Excel pivot tables, R also has similar calculations ...

889 sym R (2062 sym/1 pcs) 4 img
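
As a rough illustration of the pivot-table style summary this entry describes, here is a minimal sketch in Python rather than R, using only the standard library. The `rows` data and the `summarize` helper are hypothetical and are not taken from the post.

```python
from collections import defaultdict

# Hypothetical tabular data, one dict per row (not from the original post).
rows = [
    {"region": "East", "product": "A", "sales": 100},
    {"region": "East", "product": "B", "sales": 50},
    {"region": "West", "product": "A", "sales": 70},
    {"region": "West", "product": "A", "sales": 30},
]


def summarize(rows, group_key, value_key):
    """Return {group: (count, total, mean)} of value_key for each group_key value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {
        group: (len(values), sum(values), sum(values) / len(values))
        for group, values in groups.items()
    }


# The same table summarized by two different grouping variables.
by_region = summarize(rows, "region", "sales")
by_product = summarize(rows, "product", "sales")
```

Changing `group_key` plays the role of dragging a different field into the row area of a pivot table.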

I like you and you like me…but what does it all mean. (Part 1)

19.08.2014

Tinder is a popular matchmaking application that allows users to connect with others with whom they share a physical attraction. New members build their profile by importing their age, gender, geographic information, and photos from their Facebook account. Users are then presented with profiles which meet their search criteria and are able to like o...

2387 sym R (1377 sym/2 pcs) 4 img

Turning Data Into Awesome With sqldf and pandasql

29.04.2015

Both R and Python possess libraries for using SQL statements to interact with data frames. While both languages have native facilities for manipulating data, the sqldf and pandasql libraries provide a simple and elegant interface for conducting tasks using an intuitive framework that’s widely used by analysts. R and sqldf sqldf("SELECT COUNT(*) FROM ...

752 sym Python (1796 sym/2 pcs) 8 img
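
The idea in this entry, running SQL statements against tabular data, can be sketched without either library. The example below is not sqldf or pandasql: it uses Python's standard `sqlite3` module as a stand-in, and the `flowers` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical rows standing in for a data frame (not from the original post).
records = [("setosa", 5.1), ("setosa", 4.9), ("virginica", 6.3)]

# Load the rows into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flowers (species TEXT, sepal_length REAL)")
conn.executemany("INSERT INTO flowers VALUES (?, ?)", records)

# Count all rows, then count rows per species.
total = conn.execute("SELECT COUNT(*) FROM flowers").fetchone()[0]
per_species = conn.execute(
    "SELECT species, COUNT(*) FROM flowers GROUP BY species ORDER BY species"
).fetchall()
conn.close()
```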

Wikipedia and the Fashion Weeks: A Look at Usage Patterns

03.08.2015

Unlike many of the entries on Wikipedia relating to statistics or computer science, fashion-related topics have not been thoroughly documented. For example, the entries on Martin Margiela and Rei Kawakubo pale in comparison to the breadth of content on Thomas Bayes, structural equation modeling, or R. In light of this, I wanted to investigate ...

2705 sym 14 img

Evaluating Logistic Regression Models

17.08.2015

Logistic regression is a technique that is well suited for examining the relationship between a categorical response variable and one or more categorical or continuous predictor variables. The model is generally presented in the following format, where β refers to the parameters and x represents the independent variables. log(odds)=β0+β1∗x1...

8674 sym R (3677 sym/13 pcs) 4 img
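
To make the log-odds form in this entry concrete, here is a minimal sketch in Python rather than R. It assumes the standard linear predictor β0 + β1·x1 + ... and converts it to a probability with the logistic function; the coefficient and predictor values are hypothetical and are not fitted to any data from the post.

```python
import math


def log_odds(beta0, betas, xs):
    """Linear predictor: beta0 plus the sum of beta_i * x_i."""
    return beta0 + sum(b * x for b, x in zip(betas, xs))


def probability(beta0, betas, xs):
    """Inverse logit: map the log-odds to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds(beta0, betas, xs)))


# Hypothetical values: intercept -1.0, one slope of 0.5, predictor x1 = 2.0.
z = log_odds(-1.0, [0.5], [2.0])
p = probability(-1.0, [0.5], [2.0])
```

With these values the log-odds are zero, so the odds are 1 and the predicted probability is one half.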

Homework during the hiring process…no thanks!

17.08.2015

For the past four months, I’ve been on the job market looking for work as an applied statistician or data scientist within the online marketing industry. One thing I’ve come to expect with almost every company is some sort of homework assignment or challenge where a spreadsheet would be presented along with some guidelines on what type of...

4281 sym 4 img