Publications by John Mount

More on Adjusting Saturated Multivariate Linear Models

20.08.2024

Nina has more on Adjusting Saturated Multivariate Linear Models. Think of it as a statistics topic from an engineering and data scientist’s perspective.

552 sym

Solving Recurrence Relations

09.05.2024

Introduction A neat bit of “engineering mathematics” is solving recurrence relations. The solution method falls out of the notation itself, and harkens back to a time when formal sums were often used in place of vector subscript notation. Unfortunately the variety of such solutions is small enough to allow teaching by memorization. In this note...

9680 sym Python (3244 sym/26 pcs)

What Good is Analysis of Variance?

28.02.2024

Introduction I’d like to demonstrate what “analysis of variance” (often abbreviated as “anova” or “aov”) does for you as a data scientist or analyst. After reading this note you should be able to determine how an analysis of variance style calculation can or cannot help with your project. (Orson Welles as Macbeth, a photo that will ...

20839 sym R (6516 sym/28 pcs) 2 img 7 tbl

Schemas for Python Data Frames

12.09.2023

The Pandas data frame is probably the most popular tool used to model tabular data in Python. For in-memory data, Pandas serves a role that might normally fall to a relational database. However, Pandas data frames are typically manipulated through methods rather than with a relational query language. One can even extend Pandas to accept query langua...

6959 sym Python (3839 sym/32 pcs) 5 tbl

Omitted Variable Effects in Logistic Regression

18.08.2023

Introduction I would like to illustrate a way in which omitted variables interfere with logistic regression inference (or coefficient estimation). These effects are different from what is seen in linear regression, and possibly different from some expectations or intuitions. Our Example Data Let’s start with a data example in R. # example variable fr...

7809 sym Python (2749 sym/31 pcs) 2 img 5 tbl

A Time Series Apologia

07.05.2023

I would like to share a new article on some of the methods and pitfalls of time series forecasting: “A Time Series Apologia”. In it I work the seemingly simple problem of forecasting a noisy copy of sin(t). The purpose of the article is to demonstrate using ARIMA methods, and to show that it is okay to also try non-ARIMA methods. We also share ...

874 sym 2 img

Experimenting with Polars for Data in Python

07.12.2022

I’ve just started experimenting with the Polars data frame library in Python. I really like the programmable API it exposes. In fact I am starting an experimental adapter from the data algebra to Polars. When this is complete one can use the data algebra to run the same data transform in Pandas, SQL, or Polars. Here is my first experiment: de-dup...

716 sym

Touching the 3rd Rail of Data Science: “R or Python?”

13.12.2022

I’ve been seeing a lot of hot takes on whether one should do data science in R or in Python. I’ll comment generally on the topic, and then add my own myopic gear-head micro benchmark. I’ll jump in: If learning the language is the big step: then you are a beginner in the data science field. So the right choice is: work with others and use the too...

4152 sym 1 tbl

Eliminating Tail Calls in Python Using Exceptions

23.08.2019

I was working through Kyle Miller’s excellent note: “Tail call recursion in Python”, and decided to experiment with variations of the techniques. The idea is: one may want to eliminate use of the Python language call-stack in the case of “tail calls” (a function call where the result is not used by the calling function, but instead imm...

3928 sym Python (1021 sym/6 pcs)

New Getting Started with vtreat Documentation

02.09.2019

Win Vector LLC’s Dr. Nina Zumel has just released some new vtreat documentation. vtreat is an all-in-one step data preparation system that helps defend your machine learning algorithms from: missing values, large cardinality categorical variables, and novel levels from categorical variables. I hoped she could get the Python vtreat documentation up to par...

1048 sym