Publications by Rstats on Julia Silge
Using tidycensus and leaflet to map Census data
Recently, I have been following the development and release of Kyle Walker’s tidycensus package. I have been filled with amazement, delight, and well, perhaps another feeling… There should be a word for “the regret felt when an R 📦, which would have saved untold hours of your life, is released”… #rstats https://t.co/2THN4MwedO — M...
2172 sym R (2204 sym/2 pcs)
Text Mining of Stack Overflow Questions
Note: Cross-posted with the Stack Overflow blog. This week, my fellow Stack Overflow data scientist David Robinson and I are happy to announce the publication of our book Text Mining with R with O’Reilly. We are so excited to see this project out in the world, and so relieved to finally be finished with it! Text data is being generated all the ...
8450 sym 10 img
Navigating the R Package Universe
Earlier this month, I, along with John Nash, Spencer Graves, and Ludovic Vannoorenberghe, organized a session at useR!2017 focused on discovering, learning about, and evaluating R packages. You can check out the recording of the session. There are more than 11,000 packages on CRAN, and R users must approach this abundance of packages with effect...
3223 sym
Seeking guidance in choosing and evaluating R packages
At useR!2017 in Brussels last month, I contributed to an organized session focused on navigating the 11,000+ packages on CRAN. My collaborators on this session and I recently put together an overall summary of the session and our goals, and now I’d like to talk more about the specific issue of learning about R packages and deciding which ones t...
6844 sym R (684 sym/3 pcs) 2 img 1 tbl
Understanding gender roles in movies with text mining
I have a new visual essay up at The Pudding today, using text mining to explore how women are portrayed in film. The R code behind this analysis is publicly available on GitHub. I was so glad to work with the talented Russell Goldenberg and Amber Thomas on this project, and many thanks to Matt Daniels for inviting me to contribute to The Puddin...
812 sym 6 img
Sentiment analysis using tidy data principles at DataCamp
I’ve been developing a course at DataCamp over the past several months, and I am happy to announce that it is now launched! The course is Sentiment Analysis in R: the Tidy Way and I am excited that it is now available for you to explore and learn from. This course focuses on digging into the emotional and opinion content of text using sentimen...
1973 sym 2 img
tidytext 0.1.4
I am pleased to announce that tidytext 0.1.4 is now on CRAN! This release of our package for text mining using tidy data principles has an excellent collection of delightfulness in it. First off, all the important functions in tidytext now support non-standard evaluation through the tidyeval framework. library(janeaustenr) library(tidytex...
2164 sym R (439 sym/1 pcs) 2 img
Mapping ecosystems of software development
I have a new post on the Stack Overflow blog today about the complex, interrelated ecosystems of software development. On the data team at Stack Overflow, we spend a lot of time and energy thinking about tech ecosystems and how technologies are related to each other. One way to get at this idea of relationships between technologies is tag correla...
1783 sym R (764 sym/1 pcs) 4 img
From Power Calculations to P-Values: A/B Testing at Stack Overflow
Note: cross-posted with the Stack Overflow blog. If you hang out on Meta Stack Overflow, you may have noticed news from time to time about A/B tests of various features here at Stack Overflow. We use A/B testing to compare a new version to a baseline for a design, a machine learning model, or practically any feature of what we do here at Stack Ov...
11938 sym 2 img
Word Vectors with tidy data principles
Last week I saw Chris Moody’s post on the Stitch Fix blog about calculating word vectors from a corpus of text using word counts and matrix factorization, and I was so excited! This blog post illustrates how to implement that approach to find word vector representations in R using tidy data principles and sparse matrices. Word vectors, or word ...
7183 sym R (8648 sym/16 pcs) 2 img