Publications by Paul van der Laken
Tutorial: Demystifying Deep Learning for Data Scientists
In this great tutorial for PyCon 2020, Eric Ma proposes a very simple framework for machine learning, consisting of only three elements: Model, Loss function, Optimizer. (Screenshot from youtube.com/watch?v=gGu3pPC_fBM) By adjusting the three elements in this simple framework, you can build any type of machine learning program. In the tutorial, Eric shows y...
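Eric's own tutorial code is not reproduced here. As a hedged illustration of the framework, the sketch below fits a straight line in plain Python, assuming a one-feature linear model, mean squared error as the loss, and plain gradient descent as the optimizer. The function names (`model`, `mse_loss`, `mse_gradient`, `gradient_descent`) are hypothetical.

```python
# A minimal sketch of the three-element framework: model, loss function, optimizer.
# Assumptions: one-feature linear model, mean squared error, plain gradient descent.


def model(params, x):
    """Model: a straight line y_hat = w * x + b."""
    w, b = params
    return w * x + b


def mse_loss(params, xs, ys):
    """Loss function: mean squared error between predictions and targets."""
    return sum((model(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(params, xs, ys):
    """Analytic gradient of the MSE loss with respect to (w, b)."""
    n = len(xs)
    residuals = [model(params, x) - y for x, y in zip(xs, ys)]
    dw = 2 * sum(r * x for r, x in zip(residuals, xs)) / n
    db = 2 * sum(residuals) / n
    return dw, db


def gradient_descent(params, xs, ys, learning_rate=0.05, steps=2000):
    """Optimizer: repeatedly take a small step against the gradient of the loss."""
    w, b = params
    for _ in range(steps):
        dw, db = mse_gradient((w, b), xs, ys)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b


# Toy data generated from y = 3x + 1, without noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3 * x + 1 for x in xs]

fitted = gradient_descent((0.0, 0.0), xs, ys)
print(fitted, mse_loss(fitted, xs, ys))
```

Swapping any one piece, for example an absolute-error loss (with a matching gradient) or a different optimizer, gives a different learning program while the other two pieces stay the same, which is the point of the framework.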
100 Python pandas tips and tricks
Working with Python’s pandas library often? This resource will be worth its length in gold! Kevin Markham shares his tips and tricks for the most common data handling tasks on Twitter, and he compiled the top 100 in this one amazing overview page. Find the hyperlinks to specific sections below! pandas trick: Want to plot a DataFrame? It's as easy as: d...
Best Tech & Programming Talks Ever
Every now and then, Twitter offers these golden resources. Ashley Willis recently asked people to name the best tech talk they’ve ever seen, and the results are a resource I don’t want to lose. Hundreds of people responded, sharing their contenders for the title. “What’s the best tech talk you’ve ever seen?” - Ashley Willis (McNamara) (@as...
Handling and Converting Data Types in Python Pandas
“Data types are one of those things that you don’t tend to care about until you get an error or some unexpected results. It is also one of the first things you should check once you load new data into pandas for further analysis.” (Chris Moffitt) In this short tutorial, Chris shows how the pandas dtypes map to the NumPy and base Python data type...
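Chris's tutorial is only excerpted here, so what follows is a small, hedged sketch of typical conversions rather than his code. It assumes a hypothetical DataFrame whose numbers, flags and dates were read in as strings (the `object` dtype), and converts them with `pd.to_numeric`, `Series.map`, `pd.to_datetime` and `astype("category")`.

```python
import pandas as pd

# Hypothetical data: numbers, flags and dates arrive as strings, as often happens after reading a CSV.
df = pd.DataFrame({
    "customer": ["A", "B", "C"],
    "sales": ["1,000", "250", "n/a"],
    "active": ["Y", "N", "Y"],
    "joined": ["2020-01-15", "2020-03-02", "2020-07-30"],
})

print(df.dtypes)  # every column starts out as object (strings)

# Strip thousands separators, then coerce anything unparseable ("n/a") to NaN.
df["sales"] = pd.to_numeric(df["sales"].str.replace(",", "", regex=False), errors="coerce")

# Map Y/N flags to real booleans.
df["active"] = df["active"].map({"Y": True, "N": False})

# Parse ISO date strings into a datetime64 column.
df["joined"] = pd.to_datetime(df["joined"])

# A low-cardinality text column can be stored as a categorical.
df["customer"] = df["customer"].astype("category")

print(df.dtypes)
```

Note that `sales` ends up as float64 rather than an integer dtype: NaN is a float in NumPy, so a column with missing values cannot stay int64.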
How a File Format Exposed a Crossword Scandal
Vincent Warmerdam shared this YouTube video, which I thoroughly enjoyed watching. It’s about Saul Pwanson, a software engineer whose hobby project got a little out of hand. In 2016, Saul Pwanson designed a plain-text file format for crossword puzzle data, and then spent a couple of months building a micro-data-pipeline, scraping tens of thousands ...
Become a Data Science Professional
Amit Ness gathered an impressive list of learning resources for becoming a data scientist. It’s great to see that he shares them publicly on his GitHub so that others may follow along. But beware: this learning guideline covers a multi-year process. Amit’s personal motto seems to be “Becoming better at data science every day”. Completing the ...
Using OpenCV to win Mobile games
OpenCV is an open-source library with tools and functionality that support computer vision. It allows your computer to use complex mathematics to detect lines, shapes, colors, text and whatnot. OpenCV was originally developed by Intel in 2000, and some time later someone had the bright idea to build a Python module on top of it. Using a si...
Need to save R’s lm() or glm() models? Trim the fat!
I was training a predictive model at work for use in a Shiny app. However, because the training set was quite large (700k+ observations), the saved model object was also large (500 MB). This slows down your operations significantly! Basically, all you really need are the coefficients (and a link function, in the case of glm()). However, I can imagi...
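The post itself is about R. As a language-neutral illustration (in Python, to match the other examples here), this hedged sketch shows why coefficients plus a link function are enough to score new rows. The coefficient names and values are made up; in R they would come from something like `coef(model)`. The helpers `inverse_logit`, `identity` and `predict` are hypothetical.

```python
import math

# Hypothetical coefficients, as if copied out of a fitted logistic regression (binomial glm).
# Saving only this small dict instead of the full model object keeps the stored artefact tiny.
coefficients = {"(Intercept)": -1.5, "age": 0.03, "tenure": -0.2}


def inverse_logit(eta):
    """Inverse of the logit link: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def identity(eta):
    """Inverse of the identity link, as used by an ordinary lm()."""
    return eta


def predict(coefs, row, link_inverse=inverse_logit):
    """Linear predictor (intercept + sum of coef * value), then the inverse link."""
    eta = coefs["(Intercept)"] + sum(
        coef * row[name] for name, coef in coefs.items() if name != "(Intercept)"
    )
    return link_inverse(eta)


new_row = {"age": 50, "tenure": 2.5}
print(predict(coefficients, new_row))                          # probability scale
print(predict(coefficients, new_row, link_inverse=identity))   # linear-predictor scale
```

For a plain lm() the inverse link is the identity, so passing `link_inverse=identity` returns the linear predictor itself.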
Anomaly Detection Resources
Carnegie Mellon PhD student Yue Zhao maintains this great GitHub repository of anomaly detection resources (https://github.com/yzhao062/anomaly-detection-resources). The repository consists of tools for multiple languages (R, Python, Matlab, Java) and resources in the form of: Books & Academic Papers, Online Courses and Videos, Outlier Datasets, Algorit...
Visualizing Sampling Distributions in ggplot2: Adding area under the curve
Thank you, ggplot2tutor, for solving one of my struggles. Apparently this is all it takes:
    library(ggplot2)
    ggplot(NULL, aes(x = c(-3, 3))) + stat_function(fun = dnorm, geom = "line")
I can’t begin to count how often I have wanted to visualize a (normal) distribution in a plot, for instance to show how my sample differs from expectations, or to highlight the s...
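The post draws the curve with ggplot2's `stat_function(fun = dnorm)`. As a hedged companion sketch in Python (not the post's method), the numbers behind such a plot, the density heights and the shaded area between two cut-offs, can be computed with the standard library's `math.erf`. The helper names `normal_pdf`, `normal_cdf` and `area_between` are hypothetical.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal density curve at x (what dnorm returns in R)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the normal curve to the left of x, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def area_between(lower, upper, mu=0.0, sigma=1.0):
    """Shaded area under the curve between two cut-offs."""
    return normal_cdf(upper, mu, sigma) - normal_cdf(lower, mu, sigma)


# Evaluate the curve on the same x-range as the ggplot example, -3 to 3, in steps of 0.1.
grid = [-3 + i * 0.1 for i in range(61)]
curve = [(x, normal_pdf(x)) for x in grid]

# Area between -1.96 and 1.96, roughly 95% of the total.
print(round(area_between(-1.96, 1.96), 4))
```

`normal_cdf` plays the role of R's `pnorm`, so the shaded area is simply the difference of two CDF values.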