Publications by John Mount

Experimenting with Polars for Data in Python

07.12.2022

I’ve just started experimenting with the Polars data frame library in Python. I really like the programmable API it exposes. In fact I am starting an experimental adapter from the data algebra to Polars. When this is complete one can use the data algebra to run the same data transform in Pandas, SQL, or Polars. Here is my first experiment: de-dup...

716 sym

Touching the 3rd Rail of Data Science: “R or Python?”

13.12.2022

I’ve been seeing a lot of hot takes on whether one should do data science in R or in Python. I’ll comment generally on the topic, and then add my own myopic gear-head micro benchmark. I’ll jump in: if learning the language is the big step, then you are a beginner in the data science field. So the right choice is: work with others and use the too...

4152 sym 1 tbl

Eliminating Tail Calls in Python Using Exceptions

23.08.2019

I was working through Kyle Miller’s excellent note, “Tail call recursion in Python”, and decided to experiment with variations of the techniques. The idea is: one may want to eliminate use of the Python language call-stack in the case of “tail calls” (a function call where the result is not used by the calling function, but instead imm...
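To make the idea concrete, here is a minimal sketch of one way exceptions can stand in for tail calls. It is not taken from Kyle Miller’s note or from the post itself: the names `TailCall`, `run_with_tail_calls`, and `sum_to` are hypothetical and chosen only for illustration. Instead of calling itself, the function raises an exception that describes the next call, and a small driver loop catches it and makes that call from the top level, so the call stack does not grow.

```python
class TailCall(Exception):
    """Hypothetical helper: describes the next call to make instead of making it."""

    def __init__(self, fn, *call_args, **call_kwargs):
        super().__init__()
        self.fn = fn
        self.call_args = call_args
        self.call_kwargs = call_kwargs


def run_with_tail_calls(fn, *args, **kwargs):
    """Driver loop: run fn, and keep following raised TailCall requests."""
    while True:
        try:
            # A normal return ends the loop with the final answer.
            return fn(*args, **kwargs)
        except TailCall as tc:
            # The raise unwound the previous frame, so the stack stays shallow.
            fn, args, kwargs = tc.fn, tc.call_args, tc.call_kwargs


def sum_to(n, acc=0):
    """Sum the integers 1..n using an accumulator and a 'tail call'."""
    if n <= 0:
        return acc
    # Where a recursive version would 'return sum_to(n - 1, acc + n)',
    # raise the request for the driver loop to perform instead.
    raise TailCall(sum_to, n - 1, acc + n)


# 100,000 levels would exceed Python's default recursion limit if written
# as ordinary recursion; here each step runs in a fresh, shallow frame.
total = run_with_tail_calls(sum_to, 100_000)
```

The cost of this design is that every step pays for raising and catching an exception, so it trades speed for bounded stack depth.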

3928 sym Python (1021 sym/6 pcs)

New Getting Started with vtreat Documentation

02.09.2019

Win Vector LLC’s Dr. Nina Zumel has just released some new vtreat documentation. vtreat is an all-in-one step data preparation system that helps defend your machine learning algorithms from: missing values; large cardinality categorical variables; novel levels from categorical variables. I hoped she could get the Python vtreat documentation up to par...

1048 sym

AI for Engineers

09.10.2019

For the last year we (Nina Zumel, and myself: John Mount) have had the honor of teaching the AI200 portion of LinkedIn’s AI Academy. (Image caption: John Mount at the LinkedIn campus.) Nina Zumel designed most of the material, and John Mount has been delivering it and bringing her feedback. We’ve just started our 9th cohort. We adjust the course each time. O...

1193 sym 2 img

New Introduction to the data_algebra

31.10.2019

We’ve had really good progress in bringing the Python data_algebra to feature parity with R rquery. In fact we were able to reproduce the New Introduction to rquery article as a “New Introduction to the data_algebra” here. The idea is: you may have good reasons to want to work in R or to want to work in Python. And Win-Vector LLC wants to ...

8988 sym Python (2461 sym/20 pcs) 15 tbl

Slides from the PyData2019 data_algebra lightning talk

04.12.2019

Slides from my PyData2019 data_algebra lightning talk are here.

244 sym

Slides for PyData LA 2019 vtreat Talk

05.12.2019

Slides for PyData LA 2019 vtreat Talk are here!

228 sym

Python changing attribute mystery. Help?

07.12.2019

Python peeps: any idea why this attribute changes value when I re-examine it? I am using PyCharm, but the calculation is weird even in Jupyter. It doesn’t just seem to be the debugger: running it in Jupyter gives the wrong value (just {'x'}, instead of {'x', 'y'}). The type appears to be a dictionary object as an attribute of a class, but the...

580 sym

data_algebra/rquery as a Category Over Table Descriptions

14.12.2019

Introduction: I would like to talk about some of the design principles underlying the data_algebra package (and also in its sibling rquery package). The data_algebra package is a query generator that can act on either Pandas data frames or on SQL tables. This is discussed on the project site and the examples directory. In this note we will set up s...

21463 sym Python (1170 sym/20 pcs) 2 tbl