Publications by John Mount

Controlling Data Layout With cdata

16.04.2019

Here is an example how easy it is to use cdata to re-layout your data. Tim Morris recently tweeted the following problem (corrected). Please will you take pity on me #rstats folks? I only want to reshape two variables x & y from wide to long! Starting with: d xa xb ya yb 1 1 3 6 8 2 2 4 7 9 How can I get to: id t x y ...

1971 sym R (1911 sym/10 pcs) 6 img 4 tbl

Practical Data Science with R Book Update (April 2019)

22.04.2019

I thought I would give a personal update on our book: Practical Data Science with R 2nd edition; Zumel, Mount; Manning 2019. The second edition should be fully available this fall! Nina and I have finished up through chapter 10 (of 12), and Manning has released previews of up through chapter 7 (with more to come!). At this point the second edit...

3818 sym

Data Layout Exercises

27.04.2019

John Mount, Nina Zumel; Win-Vector LLC 2019-04-27 In this note we will use five real life examples to demonstrate data layout transforms using the cdata R package. The examples for this note are all demo-examples from tidyr/demo/, and are mostly based on questions posted to StackOverflow. They represent a good cross-section of data layout problem...

11063 sym R (11263 sym/15 pcs) 10 tbl

Could not Resist

29.04.2019

Also, Practical Data Science with R, 2nd Edition; Zumel, Mount; Manning 2019 is now content complete! It is deep into editing and soon into production! Related To leave a comment for the author, please follow the link and comment on their blog: R – Win-Vector Blog. R-bloggers.com offers daily e-mail updates about R news and tutorials about...

555 sym 2 img

What is “Tidy Data”?

11.05.2019

I would like to write a bit on the meaning and history of the phrase “tidy data.” Hadley Wickham has been promoting the term “tidy data.” For example in an eponymous paper, he wrote: In tidy data: Each variable forms a column. Each observation forms a row. Each type of observational unit forms a table. Wickham, Hadley “Tidy Data”...

4610 sym R (576 sym/2 pcs) 1 tbl

Timing Working With a Row or a Column from a data.frame

15.05.2019

In this note we share a quick study timing how long it takes to perform some simple data manipulation tasks with R data.frames. We are interested in the time needed to select a column, alter a column, or select a row. Knowing what is fast and what is slow is critical in planning code, so here we examine some common simple cases. It is often impr...

6048 sym 8 img

Free Video Lecture: Vectors for Programmers and Data Scientists

20.05.2019

We have just released two new free video lectures on vectors from a programmer’s point of view. I am experimenting with what ideas do programmers find interesting about vectors, what concepts do they consider safe starting points, and how to condense and present the material. Please check the lectures out. Vectors for Programmers and Data Sci...

880 sym 2 img

Practical Data Science with R, half off sale!

24.05.2019

Our publisher, Manning, is running a Memorial Day sale this weekend (May 24-27, 2019), with a new offer every day. Fri: Half off all eBooks Sat: Half off all MEAPs Sun: Half off all pBooks and liveVideos Mon: Half off everything The discount code is: wm052419au. Many great opportunities to get Practical Data Science with R 2nd Edition at a...

764 sym 2 img

Technical books are amazing opportunities

06.06.2019

Nina and I have been sending out drafts of our book Practical Data Science with R 2nd Edition for technical review. A few of the reviews came back from reviewers that described themselves with variations of: Senior Business Analyst for COMPANYNAME. I have been involved in presenting graphs of data for many years. To us this reads as somebody w...

3906 sym 8 img

Estimating Rates using Probability Theory: Chalk Talk

10.06.2019

We are sharing a chalk talk rehearsal on applied probability. We use basic notions of probability theory to work through the estimation of sample size needed to reliably estimate event rates. This expands basic calculations, and then moves to the ideas of: Sample size and power for rare events. Please check it out here. Related To leave a com...

725 sym