Publications by John Mount

The 100 Bushels Puzzle

26.09.2024

Nina Zumel shares the following puzzle from the December 1908 issue of The Strand Magazine: 100 bushels of corn are distributed to 100 people such that every man receives 3 bushels, every woman 2 bushels, and every child 1/2 a bushel. How many men, women, and children are there? Check out some of the background and how to solve it in her post.
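
A quick way to check answers to the puzzle is brute-force enumeration. The sketch below is our own illustration, not the method from Nina's post. It assumes the counts are non-negative whole numbers and, by default, that there is at least one man, one woman, and one child; the function `bushel_solutions` and its `require_each` flag are hypothetical names. Counting in half-bushels keeps all the arithmetic in integers.

```python
def bushel_solutions(people=100, bushels=100, require_each=True):
    """Enumerate (men, women, children) with men + women + children == people
    and 3*men + 2*women + children/2 == bushels.

    Hypothetical helper: counts are assumed to be whole numbers, and
    require_each=True additionally assumes at least one of each group.
    """
    lo = 1 if require_each else 0
    solutions = []
    for men in range(lo, people + 1):
        for women in range(lo, people - men + 1):
            children = people - men - women
            if children < lo:
                continue
            # Work in half-bushels: a man gets 6, a woman 4, a child 1.
            if 6 * men + 4 * women + children == 2 * bushels:
                solutions.append((men, women, children))
    return solutions


print(bushel_solutions())
# [(2, 30, 68), (5, 25, 70), (8, 20, 72), (11, 15, 74), (14, 10, 76), (17, 5, 78)]
```

Doubling the bushel equation and subtracting the head-count equation gives 5·men + 3·women = 100, which is why the number of men steps by 3 and the number of women by 5 across the solutions. Allowing zero women (`require_each=False`) adds one more answer, (20, 0, 80).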

Bay Area useR Meetup is Back!!!!

23.09.2024

BARUG is back!!! Hope to see you there! https://www.meetup.com/r-users/events/303488652/?eventOrigin=group_upcoming_events

Please Version Data

09.09.2024

An important goal of our Win Vector LLC teaching offerings is to instill in engineers some familiarity with, and empathy for, how data is likely to be used for analytics and business. Having such engineers in your organization greatly increases the quality of the data later available to your analysts and data scientists. This in turn e...

More on Adjusting Saturated Multivariate Linear Models

20.08.2024

Nina has more on Adjusting Saturated Multivariate Linear Models. Think of it as a statistics topic seen from an engineer's and data scientist's perspective.

Solving Recurrence Relations

09.05.2024

A neat bit of “engineering mathematics” is solving recurrence relations. The solution method falls out of the notation itself, and harkens back to a time when formal sums were often used in place of vector subscript notation. Unfortunately, the variety of such solutions is small enough to allow teaching by memorization. In this note...
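
The teaser doesn't show which technique the article uses, so the following is only an illustration of one standard approach, the characteristic-root method, applied to a second-order linear recurrence a[n] = p·a[n-1] + q·a[n-2]. The function names `closed_form_order2` and `iterate` are hypothetical, and the sketch assumes the two characteristic roots are distinct (a repeated root needs an extra n·r^n term, which is not handled). It checks the closed form against direct iteration.

```python
import cmath


def closed_form_order2(p, q, a0, a1):
    """Closed form for a[n] = p*a[n-1] + q*a[n-2] via characteristic roots.

    Assumes distinct roots of r**2 - p*r - q = 0; complex roots are allowed.
    """
    disc = cmath.sqrt(p * p + 4 * q)
    r1 = (p + disc) / 2
    r2 = (p - disc) / 2
    if r1 == r2:
        raise ValueError("repeated root: needs the n * r**n form, not handled here")
    # Fit a[n] = A*r1**n + B*r2**n to the initial conditions:
    #   A + B = a0  and  A*r1 + B*r2 = a1
    B = (a1 - a0 * r1) / (r2 - r1)
    A = a0 - B

    def a(n):
        # For real initial values the imaginary parts cancel.
        return (A * r1 ** n + B * r2 ** n).real

    return a


def iterate(p, q, a0, a1, n):
    """Compute a[n] directly by stepping the recurrence n times."""
    a, b = a0, a1
    for _ in range(n):
        a, b = b, p * b + q * a
    return a


fib = closed_form_order2(1, 1, 0, 1)
print([round(fib(n)) for n in range(10)])
```

With p = q = 1, a0 = 0, a1 = 1 the closed form is Binet's formula for the Fibonacci numbers; the `.real` step lets the same code handle oscillating recurrences whose roots are complex.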

What Good is Analysis of Variance?

28.02.2024

I’d like to demonstrate what “analysis of variance” (often abbreviated as “anova” or “aov”) does for you as a data scientist or analyst. After reading this note, you should be able to determine how an analysis of variance style calculation can or cannot help with your project. (Orson Welles as Macbeth, a photo that will ...
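
For readers who want the arithmetic behind the name: a one-way analysis of variance splits the total sum of squared deviations into a between-group part and a within-group part, then compares them with an F statistic. The sketch below is a plain-Python illustration of that textbook calculation, not the article's R code; the function name `one_way_anova` is hypothetical, and it computes only the F statistic, not a p-value.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, f_stat) for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    n = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variation explained by which group an observation belongs to.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation left over inside each group.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    # Compare the two, each scaled by its degrees of freedom.
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return ss_between, ss_within, f_stat


print(one_way_anova([[1, 2, 3], [4, 5, 6]]))  # (13.5, 4.0, 13.5)
```

The two parts always add back up to the total sum of squares around the grand mean (here 13.5 + 4.0 = 17.5), which is the "analysis" of the variance.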

Schemas for Python Data Frames

12.09.2023

The Pandas data frame is probably the most popular tool used to model tabular data in Python. For in-memory data, Pandas serves a role that might normally fall to a relational database. However, Pandas data frames are typically manipulated through methods rather than with a relational query language. One can even extend Pandas to accept query langua...

Omitted Variable Effects in Logistic Regression

18.08.2023

I would like to illustrate one way in which omitted variables interfere with logistic regression inference (or coefficient estimation). These effects are different from what is seen in linear regression, and possibly different from some expectations or intuitions. Let’s start with a data example in R. # example variable fr...
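
The teaser doesn't include the article's data, so here is a separate, exact illustration of one well-known way logistic regression differs from linear regression: omitting a variable that is independent of x still shrinks x's log-odds coefficient (often called non-collapsibility). The sketch assumes a binary x, an omitted binary z ~ Bernoulli(pz) independent of x, and a true model logit P(y=1 | x, z) = b0 + bx·x + bz·z. Because x is binary, a logistic regression of y on x alone would recover this marginal log-odds ratio in large samples. The function names are hypothetical, and no model is fit.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_x_coefficient(b0, bx, bz, pz=0.5):
    """Log-odds ratio for binary x after averaging out an omitted binary z.

    Assumes logit P(y=1 | x, z) = b0 + bx*x + bz*z with z ~ Bernoulli(pz)
    independent of x. This is an exact population calculation.
    """
    def p_y_given_x(x):
        # Average the conditional probabilities over the two values of z.
        return (1 - pz) * sigmoid(b0 + bx * x) + pz * sigmoid(b0 + bx * x + bz)

    return logit(p_y_given_x(1)) - logit(p_y_given_x(0))


print(marginal_x_coefficient(b0=-1.0, bx=1.0, bz=2.0))
```

With b0 = -1, bx = 1, bz = 2 the marginal coefficient comes out at roughly 0.80 rather than 1, even though z is independent of x; in the analogous linear model, dropping an independent z would leave the x coefficient unchanged.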

A Time Series Apologia

07.05.2023

I would like to share a new article on some of the methods and pitfalls of time series forecasting: “A Time Series Apologia”. In it I work through the seemingly simple problem of forecasting a noisy copy of sin(t). The purpose of the article is to demonstrate ARIMA methods, and to show that it is also okay to try non-ARIMA methods. We also share ...

Experimenting with Polars for Data in Python

07.12.2022

I’ve just started experimenting with the Polars data frame library in Python. I really like the programmable API it exposes. In fact, I am starting an experimental adapter from the data algebra to Polars. When this is complete, one can use the data algebra to run the same data transform in Pandas, SQL, or Polars. Here is my first experiment: de-dup...
