Publications by Gary Hutson

Python Pandas Pro – Session Two – Selection on Data Frames

05.10.2020

In the previous post we worked with a custom data set we had created for the purposes of demonstration. This time we are going to work with the gapminder dataset.
Viewing the gapminder dataset
To initialise the gapminder dataset in my project, I will use the import statements below to prepare the project for the demonstration: from gapminder im...
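The excerpt stops before the selection examples. The sketch below assumes nothing about the post's own code and shows the common pandas selection styles: column, `.loc`, `.iloc` and boolean selection. It uses a small hypothetical frame with gapminder-style column names (`country`, `continent`, `year`, `lifeExp`) so it runs without the gapminder package. The values are made up.

```python
import pandas as pd

# Hypothetical gapminder-style rows; the values are placeholders for
# illustration, not figures from the real gapminder dataset.
df = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "continent": ["Asia", "Asia", "Europe", "Europe"],
    "year": [2002, 2007, 2002, 2007],
    "lifeExp": [60.0, 62.5, 75.0, 76.0],
})

# Column selection: a single label returns a Series, a list returns a DataFrame
life = df["lifeExp"]
subset = df[["country", "year"]]

# Label-based selection with .loc (note: .loc slices include the end label)
first_two = df.loc[0:1, ["country", "lifeExp"]]

# Position-based selection with .iloc (row 0, column 0)
first_cell = df.iloc[0, 0]

# Boolean selection: keep only the rows where year is 2007
recent = df[df["year"] == 2007]

print(recent)
```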

Python Pandas Pro – Session Three – Setting and Operations

05.10.2020

Following on from the previous post, in this post we are going to learn about setting values, dealing with missing data and data frame operations.
Setting values
The example below shows how to use the iat accessor we learned about in the last lesson to set values.
Setting values by position
As stated, the implementation below can be used to set values by po...
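Here is a minimal sketch of the three topics the excerpt lists: setting a value by position with `iat` (and by label with `at`), handling missing data, and a simple frame operation. The small frame is hypothetical. The post's own examples are cut off here, so this shows standard pandas calls rather than the author's exact code.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in column "b"
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, np.nan, 6.0]})

# Set a single value by integer position (row 0, column 0)
df.iat[0, 0] = 10.0

# Set a single value by row label and column label
df.at[2, "b"] = 60.0

# Missing data: count NaNs per column, drop rows with NaNs, or fill them
missing_per_col = df.isna().sum()
dropped = df.dropna()            # drops row 1, which contains NaN
filled = df.fillna(value=0.0)    # replaces NaN with 0.0

# A simple operation: column means (NaN is skipped by default)
col_means = df.mean()

print(filled)
```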

Creating Virtual Environments for Python Projects in VS Code

13.04.2021

Credit: Real Python (https://realpython.com/python-virtual-environments-a-primer/)
I had a similar problem recently, and then a request came through from a close friend (Chris Mainey) for the same purpose. I thought, “I’ll write a blog post on this.” So, what are the benefits of creating virtual environments?
First of all, what are virtual e...
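The post focuses on VS Code, and the steps it takes there are cut off. The underlying environment can also be created from Python's standard-library `venv` module. The sketch below is equivalent in spirit to running `python -m venv .venv`. It writes into a temporary folder and skips pip installation, both choices made only to keep the example fast and self-contained.

```python
import os
import tempfile
import venv

# Throwaway location for the demo; in a real project this would usually be
# a ".venv" folder inside the project directory.
base = tempfile.mkdtemp()
env_dir = os.path.join(base, ".venv")

# with_pip=False keeps the example quick; set it to True to bootstrap pip
venv.create(env_dir, with_pip=False)

# Every virtual environment has a pyvenv.cfg marker file at its root
cfg_path = os.path.join(env_dir, "pyvenv.cfg")
print(os.path.exists(cfg_path))
```

Once a folder like this exists inside the project, VS Code can be pointed at its interpreter through its interpreter selection.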

Feature encoding methods – the Pandas way

21.04.2021

This tutorial explores the various ways data can be encoded, using Pandas and NumPy, to prepare it for a machine learning or predictive model pipeline.
Encoding methods
There are three main methods explored here:
Label encoding – encoding a value based on where the label order falls – could be good for rank and non-parametric met...
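Below is a minimal pandas sketch of label encoding, the method the excerpt names. An explicit category order is used so that the integer codes follow the rank of the labels. The other two methods are cut off in the excerpt. One-hot encoding with `pd.get_dummies` is included only as a common pandas alternative, not as a claim about what the post covers. The `size` column is hypothetical.

```python
import pandas as pd

# Hypothetical ordinal feature
df = pd.DataFrame({"size": ["small", "large", "medium", "small"]})

# Label encoding with an explicit order, so the codes respect the ranking
order = ["small", "medium", "large"]
df["size_label"] = pd.Categorical(df["size"], categories=order, ordered=True).codes

# One-hot (dummy) encoding: one 0/1 column per category, sorted by name
dummies = pd.get_dummies(df["size"], prefix="size", dtype=int)

print(df)
print(dummies)
```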

Python (PyHacks) tutorials on lists, list comprehensions and Tuples has arrived

01.07.2021

I have had many colleagues who work with the R programming language asking for something easily accessible for learning Python. I have started to compile some guides on how to work with Python data structures. My aim is to do a complete series of tutorials on how to use Python to its greatest potential.
I aim to do basic Python and then g...
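For readers coming from R, the sketch below is a compact tour of the three structures the post's title names: lists, list comprehensions and tuples. One difference from R is worth flagging: Python indexes from 0, not 1. The variable names are illustrative only.

```python
# Lists are mutable, ordered sequences; indexing starts at 0 (unlike R)
scores = [3, 8, 5]
scores.append(10)
first = scores[0]

# A list comprehension builds a new list from an iterable, with an optional filter
squares_of_evens = [x ** 2 for x in scores if x % 2 == 0]

# Tuples are immutable; useful for fixed records and unpacking
point = (2, 5)
x, y = point

try:
    point[0] = 99  # tuples do not support item assignment
except TypeError:
    immutable = True

print(scores, squares_of_evens, (x, y))
```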

PyHacks tutorials – the resource for learning Python is growing

16.07.2021

I had an aim at the start of this year to make learning Python accessible to everyone, for free. Unfortunately, I had some health problems to contend with alongside work, so I did not have the extracurricular time to tend to my pet project.
However, I have been busy over the last few evenings and have managed to build a good repository of material...

Training XGBoost Model and Assessing Feature Importance using Shapley Values in Sci-kit Learn

07.09.2021

In this tutorial I will take you through how to:
- Read in data
- Perform feature engineering, dummy encoding and feature selection
- Split data
- Train an XGBoost classifier
- Pickle your model and data to be consumed in an evaluation script
- Evaluate your model with confusion matrices and classification reports in scikit-learn
- Work with the shap p...
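The pickling step in that list passes a trained model and its data from a training script to a separate evaluation script. The sketch below shows that handoff with the standard-library `pickle` module. To keep it dependency-free, the "model" is a placeholder dictionary rather than a fitted XGBoost classifier. The file name `model_artefacts.pkl` is hypothetical.

```python
import os
import pickle
import tempfile

# Placeholder standing in for a trained model plus its held-out data; in the
# post this would be the fitted XGBoost classifier and the test split.
artefacts = {
    "model": {"name": "placeholder-model", "params": {"max_depth": 3}},
    "X_test": [[0.1, 0.2], [0.3, 0.4]],
    "y_test": [0, 1],
}

# Hypothetical file location in a temporary folder
path = os.path.join(tempfile.mkdtemp(), "model_artefacts.pkl")

# Training script side: serialise everything the evaluation step needs
with open(path, "wb") as f:
    pickle.dump(artefacts, f)

# Evaluation script side: load it back (only unpickle files you trust)
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded["y_test"])
```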

Creating and replicating an Anaconda Environment from a YAML file

08.09.2021

I had the perfect environment for my setup, and I wanted to create another environment with some additional packages, or to roll this environment out for training purposes. This is where the YAML file comes in, and it is sweet.
Head to Anaconda Navigator
I load up my Anaconda Navigator and select the environment I want to replica...

Parallelisation of Sci Kit Learning Python Models

14.09.2021

I wanted a way to parallelise some of my prebuilt machine learning models. Luckily, scikit-learn has this capability built in, and it is easy to cut model runtime substantially. I will show you how in the following article.
Loading imports and finding out my CPU cores
The first step is to load in the requir...
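The post's own imports and models are cut off here, so this is a sketch under assumptions. It finds the number of logical cores with `os.cpu_count()`, then fits a scikit-learn estimator with `n_jobs=-1`, which asks scikit-learn to use all available cores. `RandomForestClassifier` and the synthetic `make_classification` data are choices made for illustration and may not match the models in the post.

```python
import os

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Number of logical CPU cores (os.cpu_count() can return None if undetermined)
n_cores = os.cpu_count()
print(f"Logical cores: {n_cores}")

# Synthetic data purely for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# n_jobs=-1: build the trees in parallel across all available cores
model = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
model.fit(X, y)

# Training accuracy, just to confirm the fitted model works
accuracy = model.score(X, y)
print(f"Training accuracy: {accuracy:.3f}")
```

Not every estimator accepts `n_jobs`. It is available on ensembles such as random forests, but not on every model.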

Parallelisation of Model Evaluation and Hyperparameter Tuning in Sci Kit Learn

14.09.2021

Hello, it is me again for another post on how to make scikit-learn perform at the top of its game.
Amping up the model evaluation process
Model evaluation in scikit-learn can be carried out with cross_val_score. When given a repeated stratified K-fold splitter as its cv argument, it performs repeated stratified K-fold resampling and assesses the model accuracy across each of those folds. This is done to get a better sens...
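As a hedged sketch of the ideas in the title, the example below passes a `RepeatedStratifiedKFold` splitter to `cross_val_score` and sets `n_jobs=-1` so the folds are evaluated in parallel. It then applies the same `n_jobs` setting to a small `GridSearchCV` hyperparameter search. The estimator, parameter grid and synthetic data are assumptions chosen for illustration, not the post's own setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    GridSearchCV,
    RepeatedStratifiedKFold,
    cross_val_score,
)

# Synthetic data purely for illustration
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# 5 stratified folds, repeated 3 times with different shuffles = 15 scores
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)

# n_jobs=-1 evaluates the folds in parallel across all available cores
scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, scoring="accuracy", cv=cv, n_jobs=-1
)
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")

# Hyperparameter tuning: candidate/fold fits are also spread across cores
search = GridSearchCV(
    LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}, cv=5, n_jobs=-1
)
search.fit(X, y)
print(search.best_params_)
```

With an integer `cv` and a classifier, `cross_val_score` uses plain (non-repeated) stratified K-fold, which is why the repeated splitter is passed explicitly here.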
