Publications by beckmw
Stealing from the internet: Part 1
Well, not stealing but rather some handy tools for data mining… About a year ago I came across the package XML as I was struggling to get some data from various web pages. The purpose of this blog is to describe how this package can be used to quickly gather data from the internet. I’ll first describe how the functions are used and then show ...
6936 sym 4 img
Breaking the rules with spatial correlation
Students in any basic statistics class are taught linear regression, which is one of the simplest forms of a statistical model. The basic idea is that a ‘response’ variable can be mathematically related to one or any number of ‘explanatory’ variables through a linear equation and a normally distributed error term. With any statistical too...
14188 sym 10 img
Data fishing: R and XML part 2
I’m constantly amazed at what can be done using free software, such as R, and more importantly, what can be done with data that are available on the internet. In an earlier post, I confessed to my sedentary lifestyle immersed in code, so my opinion regarding the utility of open-source software is perhaps biased. None the less, I’m convinced...
11365 sym 8 img 2 tbl
Collinearity and stepwise VIF selection
Collinearity, or excessive correlation among explanatory variables, can complicate or prevent the identification of an optimal set of explanatory variables for a statistical model. For example, forward or backward selection of variables could produce inconsistent results, variance partitioning analyses may be unable to identify unique sources of...
9206 sym 20 img
Data fishing: R and XML part 3
I’ve recently posted two blogs about gathering data from web pages using functions in R. Both examples showed how we can create our own custom functions to gather data about Minnesota lakes from the Lakefinder website. The first post was an example showing the use of R to create our own custom functions to get specific lake information and th...
6950 sym R (3978 sym/5 pcs) 8 img 3 tbl
Visualizing neural networks from the nnet package
Neural networks have received a lot of attention for their abilities to ‘learn’ relationships among variables. They represent an innovative technique for model fitting that doesn’t rely on conventional assumptions necessary for standard models and they can also quite effectively handle multivariate response data. A neural network model is...
8012 sym R (1947 sym/7 pcs) 10 img 1 tbl
Animating neural networks from the nnet package
My research has allowed me to implement techniques for visualizing multivariate models in R and I wanted to share some additional techniques I’ve developed, in addition to my previous post. For example, I think a primary obstacle towards developing a useful neural network model is an under-appreciation of the effects model parameters have on m...
9886 sym R (5133 sym/5 pcs) 10 img
A nifty line plot to visualize multivariate time series
A few days ago a colleague came to me for advice on the interpretation of some data. The dataset was large and included measurements for twenty-six species at several site-year-plot combinations. A substantial amount of effort had clearly been made to ensure every species at every site over several years was documented. I don’t pretend to hi...
7284 sym R (1847 sym/7 pcs) 16 img 2 tbl
How long is the average dissertation?
The best part about writing a dissertation is finding clever ways to procrastinate. The motivation for this blog comes from one of the more creative ways I’ve found to keep myself from writing. I’ve posted about data mining in the past and this post follows up on those ideas using a topic that is relevant to anyone that has ever considered ge...
7976 sym R (6040 sym/3 pcs) 14 img
Poor man’s integration – a simulated visualization approach
Every once in a while I encounter a problem that requires the use of calculus. This can be quite bothersome since my brain has refused over the years to retain any useful information related to calculus. Most of my formal training in the dark arts was completed in high school and has not been covered extensively in my post-graduate education. ...
8287 sym R (715 sym/6 pcs) 28 img 1 tbl