Publications by Derek Jones

Patches for the code of Peter Turchin’s Attrition Warfare Model

17.12.2023

The paper Empirically Testing Predictions of an Attrition Warfare Model for the War in Ukraine, by Peter Turchin, recently showed up during one of my regular searches for software engineering data. A quick scan of the paper showed that it is very empirical, and that the analysis coding was done in R; I could not resist checking out the source ...

5976 sym 8 img

Multi-state survival modeling of a Jira issues snapshot

10.07.2022

Work items in a formal development process progress through a series of stages, e.g., starting at Open, perhaps moving to Withdrawn or Merged with another item, eventually reaching Development, and finishing at Done (with a few being Reopened, i.e., moving back to the start of the process). This process can be modelled as a Markov chain, provided...

4464 sym R (505 sym/3 pcs) 10 img
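
As a rough illustration of the Markov chain idea in the excerpt above, the sketch below estimates first-order transition probabilities by counting state-to-state moves. The function name `transition_probabilities` and the four work item histories are hypothetical and invented for illustration; they are not data from the post, and this simple counting estimate is not the multi-state survival model the post itself fits in R.

```python
from collections import Counter, defaultdict


def transition_probabilities(histories):
    """Estimate P(next state | current state) from observed state sequences."""
    counts = defaultdict(Counter)
    for history in histories:
        # Count each consecutive (current, next) pair in one item's history.
        for src, dst in zip(history, history[1:]):
            counts[src][dst] += 1
    # Normalise each row of counts so the probabilities sum to 1.
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


# Hypothetical histories, using the stage names from the excerpt.
histories = [
    ["Open", "Development", "Done"],
    ["Open", "Withdrawn"],
    ["Open", "Development", "Done", "Reopened", "Open", "Development", "Done"],
    ["Open", "Merged"],
]

probs = transition_probabilities(histories)
for state, row in probs.items():
    print(state, row)
```

States that are never left in the sample (here Withdrawn and Merged) have no outgoing row, which is how absorbing states show up in this kind of count-based estimate.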

Extracting numbers from a stacked density plot

17.07.2022

A month or so ago, I found a graph showing the percentage of PCs having a given range of memory installed, between March 2000 and April 2020, on a TechTalk page of PC Matic; it had the form of a stacked density plot. This kind of installed memory data is rare; how could I get the underlying values (a previous post covers extracting data from a hea...

2830 sym R (1226 sym/3 pcs) 4 img

Transition probabilities when adjacent sequence items must be different

24.09.2012

Generating a random sequence from a fixed set of items is a common requirement, e.g., given the items A, B and C we might generate the sequence BACABCCBABC. Often the randomness is tempered by requirements such as each item appearing a given number of times in a sequence of a given length, e.g., in a random sequence of 100 items A ...

3418 sym R (1461 sym/6 pcs) 6 tbl

Sequence generation with no duplicate pairs

04.10.2012

Given a fixed set of items (say, 6 A, 12 B and 12 C) what algorithm will generate a randomised sequence containing all of these items with any adjacent pairs being different, e.g., no AA, BB or CC in the sequence? The answer would seem to be provided in my last post. However, turning this bit of theory into practice uncovered a few problems. Be...

6566 sym Python (865 sym/1 pcs) 10 img 1 tbl
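
One possible way to approach the question posed in the excerpt above is sketched below; it is not the algorithm from the post, whose details are not shown here. At each step it picks, weighted by remaining count, an item that differs from the previous one and still leaves a completable remainder. The names `can_finish` and `no_adjacent_repeats` are hypothetical, and the result is not claimed to be uniformly distributed over all valid sequences.

```python
import random
from collections import Counter


def can_finish(counts, prev):
    """True if the remaining items can be ordered with no equal neighbours,
    given that the next item must differ from prev."""
    n = sum(counts.values())
    for item, c in counts.items():
        # An item can fill at most every other slot; the item just placed
        # additionally cannot take the first remaining slot.
        limit = n - c if item == prev else n - c + 1
        if c > limit:
            return False
    return True


def no_adjacent_repeats(counts, rng=random):
    """Return a random ordering of the items with no two equal neighbours."""
    remaining = Counter(counts)
    seq = []
    prev = None
    while sum(remaining.values()) > 0:
        candidates = []
        for item in remaining:
            if item == prev or remaining[item] == 0:
                continue
            # Tentatively take the item and test that the rest can be placed.
            remaining[item] -= 1
            if can_finish(remaining, item):
                candidates.append(item)
            remaining[item] += 1
        if not candidates:
            raise ValueError("no sequence without adjacent repeats exists")
        weights = [remaining[item] for item in candidates]
        prev = rng.choices(candidates, weights=weights)[0]
        remaining[prev] -= 1
        seq.append(prev)
    return seq


# The item counts used as the example in the excerpt.
sequence = no_adjacent_repeats({"A": 6, "B": 12, "C": 12}, random.Random(1))
print("".join(sequence))
```

The feasibility test avoids the dead ends that a plain "pick anything different from the last item" loop can run into near the end of the sequence, at the cost of skewing the choice probabilities.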

Agreement between code readability ratings given by students

13.10.2012

I have previously written about how we know nothing about code readability and questioned how the information content of expressions might be calculated. Buse and Weimer ran a very interesting experiment that asked subjects to rate short code snippets for readability (somebody please rerun this experiment using professional software developers)....

12600 sym R (439 sym/3 pcs) 6 img 3 tbl

Break even ratios for development investment decisions

22.10.2012

Developers are constantly being told that it is worth making the effort when writing code to make it maintainable (whatever that might be). Looking at this effort as an investment, what kind of return has to be achieved to make it worthwhile? Short answer: The percentage saving during maintenance has to be twice as great as the percentage investm...

8236 sym R (19 sym/1 pcs) 56 img 2 tbl

Distribution of uptimes for high-performance computing systems

28.11.2012

Computers break down every now and again, and this is a serious problem when an application runs on thousands of individual computers (nodes) plugged together; lots more hardware creates lots more opportunity for a failure that renders any subsequent calculations by working nodes possibly wrong. The solution is checkpointing; saving the sta...

10898 sym 6 img 1 tbl

My R naming nemesis

17.12.2012

When learning a new language I try to make an effort to write it like a native developer. R has one language feature that has been severely testing my desire to write like a native and this afternoon I realized that most of the people reading my code will also experience the same jarring sensation on encountering this construct, so I am not goin...

4056 sym

Does native R usage exist?

22.02.2013

Note to R users: Users of other languages enjoy spending lots of time discussing the minutiae of the language they use, something R users don’t appear to do (perhaps you spend your minutiae time on statistics, which I don’t yet know well enough to spot when it occurs). There follows a minutiae post that may appear to be navel-gazing to you (...

6330 sym R (635 sym/3 pcs) 3 tbl