Publications by T. Moudiki

On model specification, identification, degrees of freedom and regularization

19.03.2020

I had a lot of fun this week revisiting a blog post I wrote back in 2014 (Monte Carlo simulation of a 2-factor interest rates model with ESGtoolkit), which somehow generated a heatwave. This 2020 post is about model specification, identification, degrees of freedom and regularization. The first part is on Monte Carlo simulation ...
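
As a quick illustration of the simulation part, here is a minimal base R sketch of a two-factor Gaussian short-rate model simulated by Monte Carlo with an Euler scheme. The parameter values are arbitrary, and this does not use ESGtoolkit's interface, which the original post relies on.

# Minimal sketch: Monte Carlo simulation of a two-factor Gaussian short-rate model,
# r(t) = x(t) + y(t), each factor following an Ornstein-Uhlenbeck process (Euler scheme).
# Generic base R illustration only; the post itself uses ESGtoolkit.
set.seed(123)
n_sims <- 100; n_steps <- 250; dt <- 1 / n_steps
a <- 0.5; sigma_x <- 0.01   # mean reversion speed and volatility of factor x
b <- 0.1; sigma_y <- 0.005  # mean reversion speed and volatility of factor y
rho <- -0.7                 # correlation between the two Brownian motions

r <- matrix(0, nrow = n_steps + 1, ncol = n_sims)
for (j in seq_len(n_sims)) {
  x <- y <- 0
  for (i in seq_len(n_steps)) {
    z1 <- rnorm(1)
    z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(1)  # correlated Gaussian shocks
    x <- x - a * x * dt + sigma_x * sqrt(dt) * z1
    y <- y - b * y * dt + sigma_y * sqrt(dt) * z2
    r[i + 1, j] <- x + y
  }
}
matplot(r, type = "l", lty = 1, col = "lightgray",
        main = "Simulated short rate paths", ylab = "r(t)")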

Time series cross-validation using crossval

26.03.2020

Time series cross-validation is now available in crossval, using the function crossval::crossval_ts. Main parameters of crossval::crossval_ts include: fixed_window, described below in sections 1 and 2, indicating whether the training set's size is fixed or increasing through cross-validation iterations; and initial_window, the number of points in the ro...
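
To make the role of these parameters concrete, here is a rough base R sketch of rolling-origin cross-validation on a univariate series. The naive mean forecast and the parameter values are placeholders, and this is not crossval_ts's actual interface.

# Rough sketch of rolling-origin cross-validation, illustrating the roles of
# initial_window (size of the first training set), horizon (forecast length) and
# fixed_window (rolling vs. expanding training set). Not crossval_ts's interface.
y <- as.numeric(AirPassengers)
initial_window <- 100; horizon <- 12; fixed_window <- TRUE
errors <- c()

for (start in seq(1, length(y) - initial_window - horizon + 1)) {
  train_start <- if (fixed_window) start else 1
  train_idx <- train_start:(start + initial_window - 1)
  test_idx  <- (start + initial_window):(start + initial_window + horizon - 1)
  # naive illustrative forecast: mean of the training window, repeated over the horizon
  pred <- rep(mean(y[train_idx]), horizon)
  errors <- c(errors, sqrt(mean((y[test_idx] - pred)^2)))  # RMSE of this split
}
mean(errors)  # average out-of-sample RMSE across the splits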

Grid search cross-validation using crossval

09.04.2020

crossval is an R package which contains generic functions for cross-validation. Two weeks ago, I presented an example of time series cross-validation based on crossval. This week’s post is about cross-validation on a grid of hyperparameters. glmnet is used as the statistical learning model for the demo, but it could be any other package of your cho...
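
As an illustration of the idea, here is a hand-rolled grid search over glmnet's alpha and lambda, scored by 5-fold cross-validation on mtcars. The grid and the data are arbitrary, and the explicit looping below is what crossval's generic functions are meant to spare you from writing; this is not crossval's interface.

# Manual grid-search sketch with glmnet (mixing parameter alpha, penalty lambda),
# each grid point evaluated by 5-fold cross-validation. Illustration only.
library(glmnet)
set.seed(123)
X <- as.matrix(mtcars[, -1]); y <- mtcars$mpg
folds <- sample(rep(1:5, length.out = nrow(X)))
grid <- expand.grid(alpha = c(0, 0.5, 1), lambda = c(0.01, 0.1, 1))

cv_rmse <- apply(grid, 1, function(p) {
  errs <- sapply(1:5, function(k) {
    fit <- glmnet(X[folds != k, ], y[folds != k],
                  alpha = p["alpha"], lambda = p["lambda"])
    pred <- predict(fit, newx = X[folds == k, ], s = p["lambda"])
    sqrt(mean((y[folds == k] - pred)^2))  # out-of-fold RMSE
  })
  mean(errs)
})
grid[which.min(cv_rmse), ]  # best hyperparameters found on the grid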

Linear model, xgboost and randomForest cross-validation using crossval::crossval_ml

16.04.2020

As seen last week in a post on grid search cross-validation, crossval contains generic functions for statistical/machine learning cross-validation in R. A 4-fold cross-validation procedure is illustrated below. In this post, I present some examples of using crossval on a linear model, and on the popular xgboost and randomForest models. The error...
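
A minimal by-hand version of such a procedure is sketched below: 4-fold cross-validation of a linear model and of randomForest on mtcars (xgboost would slot in the same way). This illustrates what crossval_ml automates rather than reproducing its interface.

# By-hand 4-fold cross-validation comparing a linear model and randomForest on mtcars.
# Sketch of the procedure only; not crossval_ml's interface.
library(randomForest)
set.seed(123)
df <- mtcars
folds <- sample(rep(1:4, length.out = nrow(df)))

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
res <- t(sapply(1:4, function(k) {
  train <- df[folds != k, ]; test <- df[folds == k, ]
  fit_lm <- lm(mpg ~ ., data = train)
  fit_rf <- randomForest(mpg ~ ., data = train)
  c(lm = rmse(test$mpg, predict(fit_lm, test)),
    rf = rmse(test$mpg, predict(fit_rf, test)))
}))
colMeans(res)  # average out-of-fold RMSE per model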

Encoding your categorical variables based on the response variable and correlations

23.04.2020

Sometimes in Statistical/Machine Learning problems, we encounter categorical explanatory variables with high cardinality. Let’s say for example that we want to determine if a diet is good or bad, based on what a person eats. In trying to answer this question, we’d construct a response variable containing a sequence of characters good or bad, ...
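
For reference, the simplest target-based scheme is plain mean (target) encoding, sketched below on a toy diet example with a made-up 0/1 response. The encoder described in this post is richer, since it also uses correlations, but the goal is the same: numeric codes usable by ML models.

# Plain target (mean) encoding as a simple point of reference: each category is
# replaced by the average response observed for that category. Toy data only.
set.seed(123)
food <- sample(c("apple", "tomato", "banana", "pineapple", "big mac"),
               size = 50, replace = TRUE)
diet_is_good <- as.numeric(food != "big mac")  # made-up 0/1 response

encoding <- tapply(diet_is_good, food, mean)   # one numeric code per category
food_encoded <- encoding[food]                 # high-cardinality factor -> numeric
head(data.frame(food, food_encoded))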

Custom errors for cross-validation using crossval::crossval_ml

07.05.2020

This post is about using custom error measures in crossval, a tool offering generic functions for the cross-validation of Statistical/Machine Learning models. More information about cross-validation of regression models using crossval can be found in this post, or this other one. The default error measure for regression in crossval is Root Mean S...
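
A custom error measure is essentially a function of observed and predicted values. Below are RMSE (the default mentioned above) and two common alternatives written as plain R functions, without reproducing the exact argument through which crossval receives them.

# Custom error measures as plain functions of observed and predicted values.
rmse <- function(y, yhat) sqrt(mean((y - yhat)^2))
mae  <- function(y, yhat) mean(abs(y - yhat))
mape <- function(y, yhat) 100 * mean(abs((y - yhat) / y))

# quick check on a toy regression
fit <- lm(mpg ~ wt + hp, data = mtcars)
yhat <- predict(fit, mtcars)
c(rmse = rmse(mtcars$mpg, yhat),
  mae  = mae(mtcars$mpg, yhat),
  mape = mape(mtcars$mpg, yhat))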

AdaOpt (a probabilistic classifier based on a mix of multivariable optimization and a nearest neighbors) for R

21.05.2020

Last week on this blog, I presented AdaOpt for Python on a handwritten digits classification task. AdaOpt is a novel probabilistic classifier, based on a mix of multivariable optimization and a nearest neighbors algorithm. It’s still very new and only time will allow to fully appreciate all of its features. The tool is fast due to Cython, and t...
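
AdaOpt itself is not sketched here; as a simple baseline on the same iris task, here is a plain k-nearest neighbours classifier with class::knn, which is only loosely related to the nearest-neighbours component that AdaOpt builds on.

# Plain k-nearest neighbours baseline on iris (not AdaOpt).
library(class)
set.seed(123)
idx <- sample(nrow(iris), 100)
train <- iris[idx, ]; test <- iris[-idx, ]

pred <- knn(train = train[, 1:4], test = test[, 1:4],
            cl = train$Species, k = 5)
mean(pred == test$Species)  # out-of-sample accuracy of the baseline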

AdaOpt classification on MNIST handwritten digits (without preprocessing)

28.05.2020

Last week on this blog, I presented AdaOpt for R, applied to iris dataset classification. And the week before that, I introduced AdaOpt for Python. AdaOpt is a novel probabilistic classifier, based on a mix of multivariable optimization and a nearest neighbors algorithm. More details about the algorithm can be found in this (short) paper. This we...

Maximizing your tip as a waiter

04.06.2020

A few weeks ago, I introduced a target-based categorical encoder for Statistical/Machine Learning, based on correlations + Cholesky decomposition. That is, a way to convert explanatory variables such as the x below to numerical variables which can be digested by ML models. # Have: x <- c("apple", "tomato", "banana", "apple", "pineapple", "bic mac...
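
The two-variable form of the Cholesky trick behind such an encoder can be sketched as follows: draw random numeric codes, then blend them with the scaled response so that the result has a chosen correlation rho with it. This shows the general idea only, on toy data; it is not the post's exact scheme.

# General two-variable Cholesky trick: build numeric codes with a chosen
# population correlation rho with the response. Sketch of the idea, toy data only.
set.seed(123)
x <- c("apple", "tomato", "banana", "apple", "pineapple")
y <- c(1, 0, 1, 1, 0)               # toy response ("good" = 1, "bad" = 0)
rho <- 0.5                          # desired correlation with the response

z <- rnorm(length(x))               # independent random codes, one per observation
y_scaled <- scale(as.numeric(y))    # standardized response
x_encoded <- rho * y_scaled + sqrt(1 - rho^2) * z
cor(x_encoded, y)                   # population value is rho; sample value fluctuates for small n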
