Publications by John Mount
Controlling Data Layout With cdata
Here is an example how easy it is to use cdata to re-layout your data. Tim Morris recently tweeted the following problem (corrected). Please will you take pity on me #rstats folks? I only want to reshape two variables x & y from wide to long! Starting with: d xa xb ya yb 1 1 3 6 8 2 2 4 7 9 How can I get to: id t x y ...
1971 sym R (1911 sym/10 pcs) 6 img 4 tbl
Practical Data Science with R Book Update (April 2019)
I thought I would give a personal update on our book: Practical Data Science with R 2nd edition; Zumel, Mount; Manning 2019. The second edition should be fully available this fall! Nina and I have finished up through chapter 10 (of 12), and Manning has released previews of up through chapter 7 (with more to come!). At this point the second edit...
3818 sym
Data Layout Exercises
John Mount, Nina Zumel; Win-Vector LLC 2019-04-27 In this note we will use five real life examples to demonstrate data layout transforms using the cdata R package. The examples for this note are all demo-examples from tidyr/demo/, and are mostly based on questions posted to StackOverflow. They represent a good cross-section of data layout problem...
11063 sym R (11263 sym/15 pcs) 10 tbl
Could not Resist
Also, Practical Data Science with R, 2nd Edition; Zumel, Mount; Manning 2019 is now content complete! It is deep into editing and soon into production! Related To leave a comment for the author, please follow the link and comment on their blog: R – Win-Vector Blog. R-bloggers.com offers daily e-mail updates about R news and tutorials about...
555 sym 2 img
What is “Tidy Data”?
I would like to write a bit on the meaning and history of the phrase “tidy data.” Hadley Wickham has been promoting the term “tidy data.” For example in an eponymous paper, he wrote: In tidy data: Each variable forms a column. Each observation forms a row. Each type of observational unit forms a table. Wickham, Hadley “Tidy Data”...
4610 sym R (576 sym/2 pcs) 1 tbl
Timing Working With a Row or a Column from a data.frame
In this note we share a quick study timing how long it takes to perform some simple data manipulation tasks with R data.frames. We are interested in the time needed to select a column, alter a column, or select a row. Knowing what is fast and what is slow is critical in planning code, so here we examine some common simple cases. It is often impr...
6048 sym 8 img
Free Video Lecture: Vectors for Programmers and Data Scientists
We have just released two new free video lectures on vectors from a programmer’s point of view. I am experimenting with what ideas do programmers find interesting about vectors, what concepts do they consider safe starting points, and how to condense and present the material. Please check the lectures out. Vectors for Programmers and Data Sci...
880 sym 2 img
Practical Data Science with R, half off sale!
Our publisher, Manning, is running a Memorial Day sale this weekend (May 24-27, 2019), with a new offer every day. Fri: Half off all eBooks Sat: Half off all MEAPs Sun: Half off all pBooks and liveVideos Mon: Half off everything The discount code is: wm052419au. Many great opportunities to get Practical Data Science with R 2nd Edition at a...
764 sym 2 img
Technical books are amazing opportunities
Nina and I have been sending out drafts of our book Practical Data Science with R 2nd Edition for technical review. A few of the reviews came back from reviewers that described themselves with variations of: Senior Business Analyst for COMPANYNAME. I have been involved in presenting graphs of data for many years. To us this reads as somebody w...
3906 sym 8 img
Estimating Rates using Probability Theory: Chalk Talk
We are sharing a chalk talk rehearsal on applied probability. We use basic notions of probability theory to work through the estimation of sample size needed to reliably estimate event rates. This expands basic calculations, and then moves to the ideas of: Sample size and power for rare events. Please check it out here. Related To leave a com...
725 sym