Publications by Marc in the box

Evaluating model performance – A practical example of the effects of overfitting and data size on prediction

03.05.2014

Following my last post on decision making trees and machine learning, where I presented some tips gathered from the “Pragmatic Programming Techniques” blog, I have again been impressed by its clear presentation of strategies regarding the evaluation of model performance. I have seen some of these topics presented elsewhere – especially grap...


Automated determination of distribution groupings – A StackOverflow collaboration

18.05.2014

For those of you not familiar with StackOverflow (SO), it’s a coder’s help forum on the StackExchange website. It’s one of the best resources for R-coding tips that I know of, due entirely to the community of users who routinely give expert advice (assuming you show that you have done your homework and provide a clear question and a reprod...


Flood fill a region of an active device in R

23.07.2014

The following is a function to “flood fill” a region on the active plotting device. Once called, the user will be asked to click on the desired target region. The flood fill algorithm then searches neighbors in 4 directions of the target cell (down, left, up, right) and checks for similar colors to the target cell. If neighboring cells are of...
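The 4-direction search described above can be sketched on a plain matrix, without any plotting device. This is only an illustration and not the function from the post: `flood_fill()` and its arguments are hypothetical names, the matrix stands in for the pixel colors, and cells are matched by exact equality rather than by color similarity.

```r
# Minimal flood-fill sketch on a matrix (hypothetical helper, not the post's function).
# Assumption: cells match the target only when their values are exactly equal.
flood_fill <- function(m, row, col, new_value) {
  target <- m[row, col]
  if (target == new_value) return(m)  # nothing to do, and avoids an endless loop

  # Offsets for the four neighbours: down, left, up, right
  steps <- rbind(c(1, 0), c(0, -1), c(-1, 0), c(0, 1))

  queue <- list(c(row, col))
  m[row, col] <- new_value
  while (length(queue) > 0) {
    cell <- queue[[1]]
    queue <- queue[-1]
    for (i in seq_len(nrow(steps))) {
      r <- cell[1] + steps[i, 1]
      k <- cell[2] + steps[i, 2]
      inside <- r >= 1 && r <= nrow(m) && k >= 1 && k <= ncol(m)
      if (inside && m[r, k] == target) {
        m[r, k] <- new_value                      # fill before queueing
        queue[[length(queue) + 1]] <- c(r, k)     # visit its neighbours later
      }
    }
  }
  m
}

img <- matrix(c(0, 0, 1, 0,
                0, 1, 1, 0,
                0, 1, 0, 0), nrow = 3, byrow = TRUE)
filled <- flood_fill(img, row = 1, col = 1, new_value = 2)
```

Here the fill starts in the top-left cell and spreads only through the connected zeros on the left side; the zeros on the right are separated by the block of ones and stay untouched.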


Rotated axis labels in R plots

05.08.2014

It’s somehow amazing to me that slanted or rotated axis labels are not an option within the basic plot() or axis() functions in R. The advantage is mainly in saving plot area space when long labels are needed (rather than as a means of preventing excessive head tilting). The topic is briefly covered in this FAQ, and the solution...
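One common base-graphics approach, not necessarily the one used in the post, is to suppress the default axis labels and draw them with text(), using srt for the angle and xpd = TRUE so that the labels may be drawn in the margin. The data, the 45 degree angle, and the 5% offset below the axis are arbitrary choices for this sketch.

```r
# Rotated x-axis labels with base graphics (illustrative values only).
labs <- c("First long label", "Second long label", "Third long label")
vals <- c(3, 7, 5)
x <- seq_along(labs)

op <- par(mar = c(8, 4, 2, 1))          # extra bottom margin for the slanted labels
plot(x, vals, xaxt = "n", xlab = "", pch = 16)
axis(1, at = x, labels = FALSE)         # tick marks only, no labels

usr <- par("usr")                       # plot region limits: x1, x2, y1, y2
label_y <- usr[3] - 0.05 * (usr[4] - usr[3])  # just below the x axis
text(x = x, y = label_y, labels = labs,
     srt = 45,    # rotation angle in degrees
     adj = 1,     # right-justify so each label ends at its tick
     xpd = TRUE)  # allow drawing outside the plot region
par(op)
```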


“sinkr” – a collection of functions featured on “me nugget”

02.09.2014

The R package sinkr (version 1.0) has now been released: https://github.com/menugget/sinkr

I have finally gotten around to learning how to create an R package and decided to start by bundling functions that I have featured on the blog. Thanks to the RStudio team for making this so easy (in combination with the R packages roxygen2 and devtools)....


PCA / EOF for data with missing values – a comparison of accuracy

15.09.2014

Not all approaches to Principal Component Analysis (PCA), also called Empirical Orthogonal Function (EOF) analysis, are equal when it comes to dealing with a data field that contains missing values (i.e. is “gappy”). The following post compares several methods by assessing how accurately the derived PCs reconstruct the “true” data set, as was...


Maximal Information Coefficient (Part II)

17.09.2014

A while back, I wrote a post simply announcing a recent paper that described a new statistic called the “Maximal Information Coefficient” (MIC), which is able to describe the correlation between paired variables regardless of whether the relationship is linear or nonlinear. This turned out to be quite a popular post, and included a lively discussion as to t...


Additional tips for structuring an individual-based model in R

30.09.2014

A reader recently asked me to help them understand how to modify the code of an individual-based model (IBM) that I posted a while back. It was my first attempt at an IBM in R, and I realized that I have made some significant changes to the way that I code such models nowadays. Most of the changes are structural, but seem to help a lot in clearl...


Data point locator function

05.12.2014

Here’s a little function, ptlocator(), to select data points in an open graphical device. The function scales the x and y axes in order to give them equal weighting and remove the influence of differing units or ranges. The function then calculates the Euclidean distance between the selected locations (using the locator() function) a...
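The scaling and distance steps can be sketched without any interactive clicking. The helper below is hypothetical and is not the ptlocator() code itself: it assumes both axes are rescaled to [0, 1] using the range of the data, and it takes the clicked position as plain arguments (`px`, `py`) where an interactive version would take the result of locator().

```r
# Find the index of the data point nearest to a chosen position
# (hypothetical helper; the scaling by data range is an assumption).
nearest_point <- function(x, y, px, py) {
  rx <- range(x)
  ry <- range(y)

  # Rescale data and chosen position so both axes span [0, 1]
  sx  <- (x  - rx[1]) / diff(rx)
  sy  <- (y  - ry[1]) / diff(ry)
  spx <- (px - rx[1]) / diff(rx)
  spy <- (py - ry[1]) / diff(ry)

  # Euclidean distance in the scaled space
  d <- sqrt((sx - spx)^2 + (sy - spy)^2)
  which.min(d)
}

x <- c(1, 2, 3, 4)
y <- c(100, 400, 200, 300)
idx <- nearest_point(x, y, px = 2.9, py = 390)  # returns 2
```

In an interactive session, the chosen position could instead come from `p <- locator(1)` followed by `nearest_point(x, y, p$x, p$y)`.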


R package “fishdynr”

01.02.2015

The fishdynr package allows for the construction of some basic population dynamics models commonly used in fisheries science. Included are models of a single cohort, cohortSim, and a more complex iterative model that incorporates a stock-recruitment relationship, stockSim. The model functions require a list of parameters as the main argument, whi...
