Publications by John Mount
AI for Engineers
For the last year we (Nina Zumel and I, John Mount) have had the honor of teaching the AI200 portion of LinkedIn’s AI Academy. [Image: John Mount at the LinkedIn campus] Nina Zumel designed most of the material, and John Mount has been delivering it and bringing her feedback. We’ve just started our 9th cohort. We adjust the course each time. O...
1193 sym 2 img
New Introduction to the data_algebra
We’ve made really good progress in bringing the Python data_algebra to feature parity with R rquery. In fact, we were able to reproduce the New Introduction to rquery article as a “New Introduction to the data_algebra” here. The idea is: you may have good reasons to want to work in R or to want to work in Python. And Win-Vector LLC wants to ...
8988 sym Python (2461 sym/20 pcs) 15 tbl
Slides from the PyData2019 data_algebra lightning talk
Slides from my PyData2019 data_algebra lightning talk are here.
244 sym
Slides for PyData LA 2019 vtreat Talk
Slides for PyData LA 2019 vtreat Talk are here!
228 sym
Python changing attribute mystery. Help?
Python peeps: any idea why this attribute changes value when I re-examine it? I am using PyCharm, but the calculation is weird even in Jupyter. It doesn’t seem to be just the debugger: running it in Jupyter also gives the wrong value (just {'x'}, instead of {'x', 'y'}). The type appears to be a dictionary object as an attribute of a class, but the...
580 sym
data_algebra/rquery as a Category Over Table Descriptions
Introduction: I would like to talk about some of the design principles underlying the data_algebra package (and also its sibling rquery package). The data_algebra package is a query generator that can act on either Pandas data frames or on SQL tables. This is discussed on the project site and the examples directory. In this note we will set up s...
21463 sym Python (1170 sym/20 pcs) 2 tbl
Better SQL Generation via the data_algebra
In our recent note What is new for rquery December 2019 we mentioned an ugly processing pipeline that translates into SQL of varying size/quality depending on the query generator we use. In this note we try a near-relative of that query in the data_algebra. dplyr translates the query to SQL as: SELECT 5.0 AS `x`, `sum23` FROM (SELECT `col1`, `col2...
1850 sym
A Richer Category for Data Wrangling
I’ve been writing a lot about category theory interpretations of data-processing pipelines and some of the improvements we feel they are driving in both the data_algebra and in rquery/rqdatatable. I think I’ve found an even better category theory re-formulation of the package, which I will describe here. In the earlier formalism our data transfo...
5800 sym Python (2110 sym/34 pcs) 9 tbl
data_algebra 0.7.0 What is New
I’ve been tinkering a lot recently with the data_algebra, and just released version 0.7.0 to PyPI. In this note I’ll touch on what the data algebra is, what the new features are, and my plans going forward. The data algebra: The data algebra is a modern realization of elements of Codd’s 1969 relational model for data wrangling (see also Co...
5622 sym Python (2219 sym/15 pcs) 4 tbl
Using WITH For Neater SQL
I’d like to work an example of using SQL WITH Common Table Expressions to produce more legible SQL. A major complaint with SQL is that it composes statements by rightward nesting. That is: a sequence of operations A -> B -> C is represented as SELECT C FROM SELECT B FROM SELECT A. However, the SQL 99 standard introduced the WITH statement and...
1912 sym Python (1903 sym/10 pcs) 2 tbl
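The nesting complaint in the excerpt above can be sketched with a small, self-contained example. This is not taken from the article: the table `d`, its column `x`, and the step names `step_a` and `step_b` are hypothetical, and SQLite (through Python's standard `sqlite3` module) is assumed only because it needs no setup. The same three-step pipeline is written once with nested sub-queries and once with WITH.

```python
import sqlite3

# Hypothetical table and column names, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d (x INTEGER)")
conn.executemany("INSERT INTO d (x) VALUES (?)", [(1,), (2,), (3,)])

# Nested form: the first step applied (A) is the innermost sub-query,
# so the text reads in the reverse of the processing order.
nested_sql = """
SELECT SUM(y) AS total FROM (
    SELECT x2 + 1 AS y FROM (
        SELECT x * 2 AS x2 FROM d
    ) AS step_a
) AS step_b
"""

# WITH form: each step is named and listed in the order it is applied.
with_sql = """
WITH
  step_a AS (SELECT x * 2 AS x2 FROM d),
  step_b AS (SELECT x2 + 1 AS y FROM step_a)
SELECT SUM(y) AS total FROM step_b
"""

nested_total = conn.execute(nested_sql).fetchone()[0]
with_total = conn.execute(with_sql).fetchone()[0]
conn.close()

print(nested_total, with_total)
```

Both statements compute the same total; the difference is only in how the steps are laid out for the reader.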