Publications by Derek Jones
XSLT, yacc and Yorick
X and Y are for XSLT, yacc and Yorick. XSLT is the tree-climbing kangaroo of the programming language world. Eating your own dog food is good practice for implementors, but users should not be forced to endure it. Anyway, people only use XML, rather than JSON, to increase the size of their files so they can claim to be working with big data. ya...
Extracting the original data from a heatmap image
The paper Analysis of the Linux Kernel Evolution Using Code Clone Coverage analysed 136 versions of Linux (from 1.0 to 2.6.18.3) and calculated the amount of source code that was shared, going forward, between each pair of these versions. When I saw the heatmap at the end of the paper (see below) I knew it had to appear in my book. The paper w...
R’s plot function, the 1970’s retro look is not cool any more
Casual users of a system want to learn a few simple rules that enable them to get most things done. Many languages have a design principle of only providing one way of doing things. Members of one language family are known for providing umpteen different ways of doing something and R is no exception. R comes with the plot function as part of th...
Aggregate player preference for the first 20 buildings created in Illyriad
I was at the Microsoft Gaming data hackathon today. Gaming is very big business and companies rarely publish detailed game data. Through contacts, one of the organizers was able to obtain two gaming datasets, both containing just under 300M of compressed data. Illyriad supplied a random snapshot of anonymised data on 50,000 users and Mediato...
R is now important enough to have a paid-for PR make-over
With the creation of the R consortium R has moved up a rung on the ladder of commercial importance. R has captured the early adopters and has picked up a fair few of the early majority (I’m following the technology adoption life-cycle model made popular by the book Crossing the Chasm), i.e., it is starting to become mainstream. Being mainstrea...
R recommended usage for professional developers
R is not one of those languages where there is only one way of doing something; the language is blessed/cursed with lots of ways of doing the same thing. Teaching R to professional developers is easy in the sense that their fluency with other languages will enable them to soak up this small language like a sponge, on the day they learn it. The p...
subset vs array indexing: which will cause the least grief in R?
The comments on my post outlining recommended R usage for professional developers were universally scornful, with my proposal recommending subset receiving the greatest wrath. The main argument against using subset appeared to be that it went against existing practice; one comment linked to Hadley Wickham suggesting it was useful in an interacti...
Workshop on survival and time series analysis in empirical SE
In January the material in my book on Empirical software engineering using R had its first exposure to professional software developers at a one-day workshop (there was a rerun last week; slides here). The sessions were both fully booked, but, as often happens, only half turned up, around 15 at each workshop. A couple of people turned up expecting ...
cpu+FPGA: applications can soon have bespoke instructions
Compiler writers are always frustrated that the cpu they are currently targeting does not contain the one instruction that would enable them to generate really efficient code. If only it were possible to add new instructions to the cpu. Well, it looks like this will soon be possible; Intel have added an on-chip FPGA to their Broadwell processo...
Software engineering data sets
The pretty pictures from my empirical software engineering book are now online, along with the 210 data sets and R code (330M). Plotting the number of data sets in each year shows that empirical software engineering has really taken off in the last 10 years (code+data). Around a dozen or so confidential data sets are not included; I am only writin...