Publications by John Mount
Experimenting with Polars for Data in Python
I’ve just started experimenting with the Polars data frame library in Python. I really like the programmable API it exposes. In fact, I am starting an experimental adapter from the data algebra to Polars. When this is complete, one can use the data algebra to run the same data transform in Pandas, SQL, or Polars. Here is my first experiment: de-dup...
716 sym
Touching the 3rd Rail of Data Science: “R or Python?”
I’ve been seeing a lot of hot takes on whether one should do data science in R or in Python. I’ll comment generally on the topic, and then add my own myopic gear-head micro benchmark. I’ll jump in: if learning the language is the big step, then you are a beginner in the data science field. So the right choice is: work with others and use the too...
4152 sym 1 tbl
Eliminating Tail Calls in Python Using Exceptions
I was working through Kyle Miller’s excellent note: “Tail call recursion in Python”, and decided to experiment with variations of the techniques. The idea is: one may want to eliminate use of the Python language call-stack in the case of “tail calls” (a function call where the result is not used by the calling function, but instead imm...
3928 sym Python (1021 sym/6 pcs)
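As a rough illustration of the idea in the excerpt above, here is a minimal sketch of replacing a tail call with a raised exception that a driver loop catches and re-dispatches. This is not the code from the article or from Kyle Miller's note; the names `TailCall`, `run`, and `count_down` are hypothetical and chosen only for this example.

```python
class TailCall(Exception):
    """Hypothetical exception: asks the driver loop to call fn(*args, **kwargs) next."""

    def __init__(self, fn, *args, **kwargs):
        super().__init__()
        self.fn = fn
        self.call_args = args
        self.call_kwargs = kwargs


def run(fn, *args, **kwargs):
    """Driver loop: call fn, and re-dispatch for as long as a TailCall is raised."""
    while True:
        try:
            return fn(*args, **kwargs)
        except TailCall as tc:
            # Raising unwound the previous frame, so the stack does not grow.
            fn, args, kwargs = tc.fn, tc.call_args, tc.call_kwargs


def count_down(n, acc=0):
    """Sum 1..n, written so the recursive step is a tail call."""
    if n <= 0:
        return acc
    # Instead of `return count_down(n - 1, acc + n)`, raise the request.
    raise TailCall(count_down, n - 1, acc + n)


# Far deeper than Python's default recursion limit, yet no RecursionError.
total = run(count_down, 100_000)
```

The design trade-off in this sketch is that every tail call pays the cost of raising and catching an exception, in exchange for a call stack that stays at constant depth.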
New Getting Started with vtreat Documentation
Win Vector LLC’s Dr. Nina Zumel has just released some new vtreat documentation. vtreat is an all-in-one step data preparation system that helps defend your machine learning algorithms from: missing values, large cardinality categorical variables, and novel levels from categorical variables. I hoped she could get the Python vtreat documentation up to par...
1048 sym
AI for Engineers
For the last year we (Nina Zumel, and myself: John Mount) have had the honor of teaching the AI200 portion of LinkedIn’s AI Academy. [Image: John Mount at the LinkedIn campus] Nina Zumel designed most of the material, and John Mount has been delivering it and bringing her feedback. We’ve just started our 9th cohort. We adjust the course each time. O...
1193 sym 2 img
New Introduction to the data_algebra
We’ve made really good progress in bringing the Python data_algebra to feature parity with R rquery. In fact, we are able to reproduce the New Introduction to rquery article as a “New Introduction to the data_algebra” here. The idea is: you may have good reasons to want to work in R or to want to work in Python. And Win-Vector LLC wants to ...
8988 sym Python (2461 sym/20 pcs) 15 tbl
Slides from the PyData2019 data_algebra lightning talk
Slides from my PyData2019 data_algebra lightning talk are here.
244 sym
Slides for PyData LA 2019 vtreat Talk
Slides for PyData LA 2019 vtreat Talk are here!
228 sym
Python changing attribute mystery. Help?
Python peeps: any idea why this attribute changes value when I re-examine it? I am using PyCharm, but the calculation is weird even in Jupyter. It doesn’t seem to be just the debugger: running it in Jupyter gives the wrong value (just {'x'}, instead of {'x', 'y'}). The type appears to be a dictionary object as an attribute of a class, but the...
580 sym
data_algebra/rquery as a Category Over Table Descriptions
Introduction: I would like to talk about some of the design principles underlying the data_algebra package (and also in its sibling rquery package). The data_algebra package is a query generator that can act on either Pandas data frames or on SQL tables. This is discussed on the project site and the examples directory. In this note we will set up s...
21463 sym Python (1170 sym/20 pcs) 2 tbl