Publications by Neil Gunther
Playing with Primes in R (Part II)
Popping Part III off the stack, where I ended up unexpectedly discovering that the primes and primlist functions are broken in the schoolmath package on CRAN, let’s see what prime numbers look like when computed correctly in R. To do this, I’ve had to roll my own prime number generating function. Personalizing primes in R: For what...
11592 sym R (2249 sym/11 pcs) 4 img
Linear Modeling in R and the Hubble Bubble
Here is a scatter plot with the coordinate labels deliberately omitted. Figure 1. Do you see any trends? How would you model these data? It just so happens that this scatterplot is arguably the most famous scatterplot in history. One aficionado, writing more than forty years after its publication, commented skeptically [1]: “[The] data...
7402 sym R (1780 sym/1 pcs) 10 img
Prime Parallels for Load Balancing
Having finally popped the stack on computing prime numbers with R in Part II and Part III, we are now in a position to discuss their relevance for computational scalability. My original intent was to show how poor partitioning of a workload can defeat the linear scalability expected when full parallelism is otherwise attainable, i.e....
5709 sym R (1036 sym/4 pcs) 4 img 10 tbl
Go Guerrill… R on Your Data in August
Only one month to go! Register now for the Guerrilla Data Analysis Techniques (GDAT) class to be held during the week of August 9-13, 2010. The focus will be on using R and PDQ-R for computer performance analysis and capacity planning. (Click on the image for details) For those of you coming from international locations, here is a...
838 sym 2 img
Gone Guerrill_ R on Our Data
Here’s a summary of some things we learnt about applying R to computer performance and capacity planning data in the GDAT Class last week.
- Neural nets pkg nnet applied to CPU performance data in the Ripley and Venables book (see Section 8.10).
- How to do stacked plots that Jim calls “spark plots.”
- Jim told us that ggplot has a...
1408 sym
Excel Errors and Other Numerical Nightmares
Although I use Excel all the time, and I strongly encourage my students to use it for performance analysis and CaP, I was forced to include a warranty disclaimer in my GCaP book because I discovered a serious numerical error while writing Appendix B. There, my intention was just to show that Excel gives essentially the same results as...
3649 sym R (166 sym/1 pcs) 4 img
Where to Start with PDQ?
Once you’ve downloaded PDQ with a view to solving your performance-related questions, the next step is getting started using it. Why not have some fun with blocks? Fun-ctional blocks, that is. Since all digital computers and network systems can be considered as a collection of functional blocks and these blocks often contain buffers...
2782 sym R (1186 sym/2 pcs) 2 img
Confidence Bands for Universal Scalability Models
In the recent GDAT class, confidence intervals (CI) for performance data were discussed. Their generalization to confidence bands (CB) for scalability projections using the USL model also came up informally. I showed a prototype plot but it was an ugly hack. Later requests from GDAT attendees to apply CBs to their own data meant I ha...
3916 sym R (2622 sym/3 pcs) 4 img
Reporting Standard Errors for USL Coefficients
In a recent Guerrilla CaP Group discussion, Baron S. wrote:
....
BS> Using gnuplot against the dataset I gave, I get
BS> sigma 0.0207163 +/- 0.001323 (6.385%)
BS> kappa 0.000861226 +/- 5.414e-05 (6.287%)
The gnuplot output includes the errors for each of the universal scalability law (USL) coefficients. A question ab...
2233 sym R (2333 sym/6 pcs)
Applying PDQ in R to Load Testing
PDQ is a library of functions that helps you to express and solve performance questions about computer systems using the abstraction of queues. The queueing paradigm is a natural choice because, whether big (a web site) or small (a laptop), all computer systems can be represented as a network or circuit of buffers and a buffer is a ty...
4151 sym R (5849 sym/8 pcs) 4 img