Publications by Pabloc

Introduction to automatic machine learning

02.07.2015

Automatic Machine Learning: Introduction. “I want to develop a model that automatically learns over time”: a really challenging objective. In this post we’ll develop a procedure that loads data, builds a model, makes predictions and, if something changes over time, creates a new model, all with R. Picture credit: S.H Hori...

5257 sym R (2332 sym/11 pcs) 22 img

Package funModeling: data cleaning, importance variable analysis and model performance

08.02.2016

Hi there 🙂 This new package –install.packages("funModeling")– tries to cover common data science tasks with simple concepts. Written like a short tutorial, its focus is on data interpretation and analysis. Below, you’ll find a copy-paste from the package vignette (so you can drink a good coffee while you read it…) Introduction...

7828 sym R (1936 sym/15 pcs) 32 img

Package funModeling: data cleaning, importance variable analysis and model performance

08.02.2016

Hi there 🙂 This new package –install.packages("funModeling")– tries to cover common data science tasks with simple concepts. Written like a short tutorial, its focus is on data interpretation and analysis. Below, you’ll find a copy-paste from the package vignette (so you can drink a good coffee while you read it…) Introduction...

7862 sym R (2213 sym/17 pcs) 30 img

Time Series Analysis Using Max/Min… and some Neuroscience.

06.06.2016

Introduction Time series have maximum and minimum points as general patterns. Sometimes the noise present in them makes it hard to spot the general behavior. In this post, we will smooth time series, reducing noise, to maximize the story that the data has to tell us. Then an easy formula will be applied to find and plot max/min points thus character...

3101 sym R (1188 sym/7 pcs) 30 img

Time Series Analysis Using Max/Min… and some Neuroscience.

06.06.2016

Introduction Time series have maximum and minimum points as general patterns. Sometimes the noise present in them makes it hard to spot the general behavior. In this post, we will smooth time series, reducing noise, to maximize the story that the data has to tell us. Then an easy formula will be applied to find and plot max/min points thus characteri...

3097 sym R (1188 sym/7 pcs) 28 img

Data Science Live Book (open source)

10.08.2016

Hi! Well, finally there is the first release of this project: an open source book which will hopefully contain some useful resources for those who want to learn some data analysis/machine learning. This release covers a little of data preparation, data profiling, selecting best variables (dataviz), assessing model performance, and coming soon a c...

955 sym 12 img

Data Science Live Book – Scoring, Model Performance & profiling – Update!

17.10.2016

This update contains a new chapter –scoring– which is related to model performance and model deployment, used when predicting a binary outcome. Link to the scoring chapter. Important: To use the following updates, please update the funModeling package 🙂 install.packages("funModeling") Also related to predictive modelling for a binary outcome, ther...

1408 sym 18 img

Model Performance in Data Science Live Book

08.12.2016

Hi there! I decided to almost completely rewrite the model validation section since it didn’t reflect real-case scenarios. Hopefully, in the two new chapters you will gain deeper knowledge of the methodological aspects of model validation through classical cross-validation, bootstrapping, and going further into the nature of the error. And also take advanta...

1272 sym 14 img

Playing with dimensions: from Clustering, PCA, t-SNE… to Carl Sagan!

01.03.2017

Playing with dimensions Hi there! This post is an experiment combining the result of t-SNE with two well-known clustering techniques: k-means and hierarchical. This will be the practical section, in R. But this post will also explore the intersection of concepts like dimension reduction, clustering analysis, data preparation, PCA, HDBSCAN...

7733 sym R (2378 sym/4 pcs) 26 img

Data Science Live Book (open source) ~ new big release! 200-pages

29.10.2017

Well, after some time and 300+ commits, this is the biggest release of the Data Science Live Book (open source), coming more than a year after the first publication 🙂 tl;dr: Hi there! I invite you to read the book online and/or download here. Thanks and have a nice day 🙂 !(tl;dr): An overview… It’s a book to learn data science, machine ...

3804 sym 10 img