Publications by Gary Hutson
Python Pandas Pro – Session Two – Selection on Data Frames
In the previous post we worked with a custom data set we had created for the purposes of demonstration. This time we are going to work with the gapminder dataset. Viewing the gapminder dataset: To initialise the gapminder dataset in my project I will use the below import statements to prepare the project ready for the demonstration: from gapminder im...
Python Pandas Pro – Session Three – Setting and Operations
Following on from the previous post, in this post we are going to learn about setting values, dealing with missing data and data frame operations. Setting values: The below example shows how to use the iat accessor we learned in the last lesson to set values. Setting values by position: As stated, the implementation below can be used to set values by po...
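As a minimal sketch of what setting values by position can look like, assuming a small made-up data frame (the column names and numbers below are purely illustrative and are not taken from the post):

```python
import pandas as pd

# Hypothetical data frame, invented for illustration only
df = pd.DataFrame(
    {
        "country": ["Kenya", "Peru", "Japan"],
        "pop_millions": [50.0, 32.0, 126.0],
    }
)

# iat sets a single value by integer position: row 0, column 1
df.iat[0, 1] = 53.0

# at is the label-based counterpart: row label 1, column "pop_millions"
df.at[1, "pop_millions"] = 33.0

print(df)
```

Both `iat` and `at` address exactly one cell, which makes them a good fit for targeted corrections rather than bulk updates.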
Creating Virtual Environments for Python Projects in VS Code
Credit: Real Python (https://realpython.com/python-virtual-environments-a-primer/). I had a similar problem recently, and then a request came through from a close friend (Chris Mainey) for the same purpose. I thought, “I’ll write a blog post on this”. So, what are the benefits of creating virtual environments? First of all, what are virtual e...
Feature encoding methods – the Pandas way
This tutorial explores the various ways data can be encoded, using Pandas and NumPy, to prepare it for a machine learning or predictive model pipeline. Encoding methods: There are three main methods explored herein: Label encoding – encoding a value based on where the label order falls – could be good for rank and non-parametric met...
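To make the idea of label encoding concrete, here is a minimal sketch, assuming a made-up ordinal column (the `size` column and its category order are hypothetical and not from the tutorial). It uses an ordered pandas `Categorical`, which is one way to do this and not necessarily the approach the tutorial takes:

```python
import pandas as pd

# Hypothetical ordinal feature, invented for illustration only
df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# Assumed rank order of the labels, lowest first
order = ["small", "medium", "large"]

# Each label is replaced by its position in the stated order
df["size_code"] = pd.Categorical(
    df["size"], categories=order, ordered=True
).codes

print(df)
```

Stating the order explicitly matters: without it, pandas sorts the categories alphabetically, which would rank "large" below "medium".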
Python (PyHacks) tutorials on lists, list comprehensions and Tuples has arrived
I have had many colleagues who work with the R programming language wanting something that is easily accessible for learning Python. I have started to compile some guides on how to work with Python data structures. My aim is to do a complete series of tutorials on how to utilise Python to its greatest potential. I aim to do basic Python and then g...
PyHacks tutorials – the resource for learning Python is growing
I had an aim at the start of this year to make learning Python accessible to everyone, for free. Unfortunately, I had some health problems to contend with alongside work, so I did not have the extracurricular time to tend to my pet project. However, I have been busy over the last few evenings and have managed to build a good repository of material...
Training XGBoost Model and Assessing Feature Importance using Shapley Values in Sci-kit Learn
In this tutorial I will take you through how to: read in data; perform feature engineering, dummy encoding and feature selection; split data; train an XGBoost classifier; pickle your model and data to be consumed in an evaluation script; evaluate your model with confusion matrices and classification reports in scikit-learn; work with the shap p...
Creating and replicating an Anaconda Environment from a YAML file
I had the perfect environment for my setup and I want to potentially create another environment and add some additional packages, or I want to roll this environment out for training purposes. This is where the YAML file comes in and it is sweet. Head to Anaconda Navigator: I load up my Anaconda Navigator and select the environment I want to replica...
Parallelisation of Sci Kit Learning Python Models
I wanted a way to parallelise some of my prebuilt machine learning models effectively. Luckily, the package has this capability built in, and it is easy to make massive performance gains in model runtime. I will show you how in the following article. Loading imports and finding out my CPU cores: The first step is to load in the requir...
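A rough sketch of the idea, assuming the package in question is scikit-learn and using a synthetic data set in place of the article's own data (the model choice and all parameter values here are illustrative assumptions, not the article's code):

```python
import os

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Number of logical CPU cores visible to Python (may be None on some systems)
n_cores = os.cpu_count()
print(f"CPU cores available: {n_cores}")

# Synthetic classification data, standing in for a real data set
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# n_jobs=-1 asks scikit-learn to use all available cores when fitting the trees
model = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=42)
model.fit(X, y)
```

Not every estimator accepts `n_jobs`, so it is worth checking the documentation for the model you are using before relying on it.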
Parallelisation of Model Evaluation and Hyperparameter Tuning in Sci Kit Learn
Hello, it is me again for another post on how to make scikit-learn perform at the top of its game. Amping up the Model Evaluation process: Model evaluation in scikit-learn can be achieved with cross_val_score. Used with repeated stratified K-fold resampling, it assesses the model accuracy across each of those folds. This is done to get a better sens...
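As a hedged sketch of parallel model evaluation, assuming a synthetic data set and a logistic regression model (both are stand-ins chosen for illustration, as are the fold and repeat counts):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic classification data, standing in for a real data set
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# 5 stratified folds repeated twice gives 10 train/test evaluations
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)

model = LogisticRegression(max_iter=1000)

# n_jobs=-1 spreads the fold evaluations across all available cores
scores = cross_val_score(model, X, y, scoring="accuracy", cv=cv, n_jobs=-1)

print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

Each fold is fitted independently, which is why cross-validation parallelises well: the folds do not need to share any state.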