Publications by John Johnson
Simulating a Weibull conditional on the time to event being greater than a given time
Recently, I had to simulate the time to event of subjects who have been on a study, are still ongoing at the time of a data cut, and so are still at risk of an event (e.g. progressive disease, cardiac event, death). This requires simulating from a conditional Weibull distribution. To do this, I created the following function: # simulate conditional Weibull co...
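The author's function is cut off in this excerpt, so what follows is not that function. It is a minimal sketch of one standard approach, inverse-transform sampling from the conditional survival function, assuming R's shape/scale parameterization of the Weibull (as used by `rweibull`). For t ≥ t0, P(T > t | T > t0) = S(t)/S(t0) = exp(-((t/scale)^shape - (t0/scale)^shape)). Setting this equal to a uniform draw u and solving for t gives the formula in the code. The helper name `rcweibull` is hypothetical.

```r
# Hypothetical helper (not the author's elided function): draw n Weibull
# times conditional on T > t0, using R's shape/scale parameterization,
# i.e. survival S(t) = exp(-(t / scale)^shape).
rcweibull <- function(n, shape, scale, t0) {
  stopifnot(shape > 0, scale > 0, all(t0 >= 0))
  u <- runif(n)
  # Conditional survival: S(t) / S(t0) = exp(-((t/scale)^shape - (t0/scale)^shape)).
  # Setting it equal to u and solving for t gives the inverse transform below.
  scale * ((t0 / scale)^shape - log(u))^(1 / shape)
}

set.seed(1)
# e.g. subjects still event-free at time 12 at the data cut
t_event <- rcweibull(10000, shape = 1.5, scale = 20, t0 = 12)
min(t_event)  # always greater than 12, since -log(u) > 0
```

Because `t0` is recycled elementwise, it can also be a vector holding each ongoing subject's follow-up time at the data cut, giving one draw per subject. With `shape = 1` the formula reduces to `t0 - scale * log(u)`, i.e. t0 plus an exponential draw, which matches the memoryless property.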
Which countries have Regrexit?
This doesn’t have a lot to do with the bio part of biostatistics, but it is an interesting data analysis that I just started. In the wake of the Brexit vote, there is a petition for a redo. The data for the petition is here, in JSON format. Fortunately, in R, working with JSON data is pretty easy. You can easily download the data from the l...
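A minimal sketch of the JSON step using the jsonlite package's `fromJSON()`, which turns an array of JSON objects into a data frame. The nesting assumed here (per-country counts under `data$attributes$signatures_by_country`, each with `name`, `code`, and `signature_count`) is an assumption about the petition file's layout, and the snippet uses a tiny inline string with made-up counts rather than downloading the real data. With the real file you would pass its URL (not shown in the excerpt) to `fromJSON()` instead.

```r
library(jsonlite)

# Tiny inline stand-in for the petition JSON. The nesting and field names
# below are an assumption about the file's layout; the counts are made up.
txt <- '{
  "data": {
    "attributes": {
      "signature_count": 160,
      "signatures_by_country": [
        {"name": "United Kingdom", "code": "GB", "signature_count": 100},
        {"name": "France",         "code": "FR", "signature_count": 40},
        {"name": "Spain",          "code": "ES", "signature_count": 20}
      ]
    }
  }
}'

# fromJSON() simplifies the array of country objects into a data frame
petition <- fromJSON(txt)
by_country <- petition$data$attributes$signatures_by_country

# Rank countries by signature count and add each country's share
by_country <- by_country[order(-by_country$signature_count), ]
by_country$share <- by_country$signature_count / sum(by_country$signature_count)
by_country
```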
Windows 10 anniversary update includes a whole Linux layer – this is good news for data scientists
If you are on Windows 10, no doubt you have heard that Microsoft included the bash shell in its 2016 Windows 10 anniversary update. What you may not know is that this is much, much more than just the bash shell. This is a whole Linux layer that enables you to use Linux tools, and does away with a further layer like Cygwin (which requi...
Plotting GeoJSON data on a map with R
GeoJSON is a standard text-based data format for encoding geographical information, which relies on the JSON (JavaScript Object Notation) standard. There are a number of public datasets for Greenville, SC that use this format, and the R programming language makes working with these data easy. Install the rgeojson library, which is part of the RO...
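The post's own code is not shown in this excerpt, so here is a package-light sketch of the core step: parse a GeoJSON FeatureCollection with jsonlite and pull out point coordinates, remembering that GeoJSON stores each position as [longitude, latitude]. The two points are made up (near Greenville, SC), and base graphics stand in for the map background the post draws on.

```r
library(jsonlite)

# Made-up GeoJSON FeatureCollection with two points near Greenville, SC.
# GeoJSON positions are [longitude, latitude], in that order.
txt <- '{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "Point A"},
     "geometry": {"type": "Point", "coordinates": [-82.40, 34.85]}},
    {"type": "Feature",
     "properties": {"name": "Point B"},
     "geometry": {"type": "Point", "coordinates": [-82.38, 34.84]}}
  ]
}'

# Keep the nested list structure so each feature is handled the same way
gj <- fromJSON(txt, simplifyVector = FALSE)

# One row per feature (all are Points here): name, longitude, latitude
pts <- do.call(rbind, lapply(gj$features, function(f) {
  xy <- unlist(f$geometry$coordinates)
  data.frame(name = f$properties$name, lon = xy[1], lat = xy[2])
}))

# Longitude on the x-axis, latitude on the y-axis
plot(pts$lon, pts$lat, pch = 19, xlab = "Longitude", ylab = "Latitude")
text(pts$lon, pts$lat, labels = pts$name, pos = 3)
```

Swapping the two coordinates is a common mistake, because many mapping functions ask for latitude first while GeoJSON lists longitude first.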
I set up a new data analysis blog
Well, I tried to write a blog post using the RStudio R Markdown system and utterly failed. So I set up a GitHub Pages blog at randomjohn.github.io, where I can write from RStudio and easily publish posts involving data analysis.
Plotting GeoJSON polygons on a map with R
In a previous post we plotted some points, retrieved from a public dataset in GeoJSON format, on top of a Google Map of the area surrounding Greenville, SC. In this post we plot some public data in GeoJSON format as well, but instead of particular points, we plot polygons. Polygons describe an area rather than a single point. As before, to set up...
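As a sketch of what distinguishes a polygon from a point in GeoJSON (not the post's own code, which the excerpt omits): a Polygon's `coordinates` holds a list of linear rings, the first being the outer boundary, and each ring is closed by repeating its first position at the end. The rectangle below is made up, and base graphics stand in for the map background.

```r
library(jsonlite)

# Made-up GeoJSON Polygon near Greenville, SC. "coordinates" is a list of
# linear rings; the first is the outer boundary, and each ring is closed
# (its last position repeats the first).
txt <- '{
  "type": "Feature",
  "properties": {"name": "Example area"},
  "geometry": {
    "type": "Polygon",
    "coordinates": [[[-82.42, 34.84], [-82.38, 34.84], [-82.38, 34.87],
                     [-82.42, 34.87], [-82.42, 34.84]]]
  }
}'

feat <- fromJSON(txt, simplifyVector = FALSE)

# Convert the outer ring into a two-column matrix of (lon, lat) positions
ring <- feat$geometry$coordinates[[1]]
xy <- do.call(rbind, lapply(ring, unlist))
colnames(xy) <- c("lon", "lat")

# Draw an empty frame, then fill the area the polygon describes
plot(xy, type = "n", xlab = "Longitude", ylab = "Latitude")
polygon(xy[, "lon"], xy[, "lat"], col = "lightblue", border = "darkblue")
```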
What do they talk about on Greenville Reddit?
Reddit is a discussion forum website with many discussion rooms (“subreddits”) on different topics. Greenville, SC has its own subreddit. It might be of interest to see what kind of discussions take place. We can do this in a systematic way using the R software through a technique called text mining. We will do a simple text mining exercise h...
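The excerpt does not show which packages the post uses, so this is only a base-R sketch of the simplest text-mining step: tokenize, drop stop words, and count term frequencies. The post titles and the short stop-word list are made up for illustration.

```r
# Made-up post titles standing in for text pulled from the subreddit
posts <- c("Best BBQ in Greenville?",
           "Traffic on I-385 again this morning",
           "Where to watch the game in Greenville downtown",
           "Downtown parking tips")

# Tiny illustrative stop-word list (an assumption; real analyses use a fuller list)
stop_words <- c("the", "in", "on", "to", "this", "where", "again")

# Lower-case, split on anything that is not a letter, and drop empty tokens
words <- unlist(strsplit(tolower(posts), "[^a-z]+"))
words <- words[nzchar(words) & !(words %in% stop_words)]

# Term frequencies, most common first
freq <- sort(table(words), decreasing = TRUE)
head(freq, 5)
```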
Greenville on Twitter
In this blog post, we use R and Twitter data to analyze topics of interest to Greenville, SC. We will describe obtaining, manipulating, and summarizing the data. Twitter is a “microblogging” service where users can, usually publicly, share links, pictures, or short comments (up to 140 characters) on a timeline. The public timeline consist...
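Obtaining tweets requires API credentials, so this sketch covers only the summarizing step on a few made-up tweets: extract hashtags with a regular expression, count them case-insensitively (Twitter treats hashtags that way), and check tweet lengths. It uses base R only and is not the post's own code.

```r
# Made-up tweets standing in for data pulled from the Twitter API
tweets <- c("Great night at #FallsPark #yeahThatGreenville",
            "Traffic on Woodruff Rd again #GVL",
            "Farmers market downtown this morning #gvl #yeahThatGreenville")

# Extract hashtags (# followed by letters, digits, or underscores)
tags <- unlist(regmatches(tweets, gregexpr("#[[:alnum:]_]+", tweets)))

# Normalise case before counting so #GVL and #gvl are the same tag
tag_counts <- sort(table(tolower(tags)), decreasing = TRUE)
tag_counts

# Tweet lengths, e.g. to check against the 140-character limit of the time
nchar(tweets)
```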
Inauguration speeches
Acquiring inauguration speeches: though not especially about Greenville, it might be interesting to quantitatively analyze inauguration speeches. This analysis will be done using two paradigms: the tm package and the tidytext package. We will read the speeches into a form suited to the tidytext package; later on we will use some tools from tha...
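To make the two paradigms concrete, here is a base-R sketch (not the post's code) on two made-up snippets. The tidy layout has one row per (document, word), which is what tidytext's `unnest_tokens()` produces; the same counts reshaped into a document-term matrix give the shape the tm package works with. Base R stands in for both packages here so the example runs without them.

```r
# Made-up snippets standing in for two speeches
speeches <- data.frame(
  president = c("A", "B"),
  text = c("We the people will build a better union.",
           "Our union is strong and the people are free."),
  stringsAsFactors = FALSE
)

# Tidy layout: one row per (document, word), lower-cased, punctuation dropped
tokens <- lapply(seq_len(nrow(speeches)), function(i) {
  w <- unlist(strsplit(tolower(speeches$text[i]), "[^a-z]+"))
  w <- w[nzchar(w)]
  data.frame(president = speeches$president[i], word = w,
             stringsAsFactors = FALSE)
})
tidy <- do.call(rbind, tokens)

# Word counts per speech, still tidy: one row per president/word pair
counts <- aggregate(n ~ president + word, data = transform(tidy, n = 1), FUN = sum)

# The same counts as a document-term matrix (documents x terms)
dtm <- xtabs(n ~ president + word, data = counts)
dtm[, c("people", "union")]
```

The tidy form suits grouping, joining, and plotting with data-frame tools, while the matrix form suits operations defined over whole documents, such as weighting or similarity.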