Publications by Rstats on Julia Silge

Measuring Gobbledygook


In learning more about text mining over the past several months, one aspect of text that I’ve been interested in is readability. A text’s readability measures how hard or easy it is for a reader to read and understand what a text is saying; it depends on how sentences are written, what words are chosen, and so forth. I first becam...
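The post's title plays on SMOG, the "Simple Measure of Gobbledygook" readability formula. This sketch assumes that is the measure under discussion. SMOG turns the number of polysyllabic words (three or more syllables) per 30 sentences into a U.S. school grade level. The function below takes precomputed counts as inputs because reliable syllable counting is the hard part, and it is left out here. `smog_grade()` is a hypothetical helper name, not from the post.

```r
# Hypothetical helper: SMOG ("Simple Measure of Gobbledygook") grade level,
# using McLaughlin's published formula. Inputs are counts computed beforehand:
#   n_polysyllables: words with three or more syllables
#   n_sentences:     number of sentences in the sample
smog_grade <- function(n_polysyllables, n_sentences) {
  stopifnot(n_sentences > 0)
  # Scale the polysyllable count to a 30-sentence sample, then apply the formula
  1.0430 * sqrt(n_polysyllables * (30 / n_sentences)) + 3.1291
}

# A passage with 9 polysyllabic words in 30 sentences
smog_grade(9, 30)
```

For that example, sqrt(9) = 3, so the score is 1.0430 × 3 + 3.1291 = 6.2581, or roughly a sixth-grade reading level.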

Reddit Responds to the Election


It’s been about a month since the U.S. presidential election, and Donald Trump’s victory over Hillary Clinton came as a surprise to most. Reddit user Jason Baumgartner collected and published every submission and comment posted to Reddit on and around the day of the U.S. election; let’s explore this data set and ...

Text Mining in R: A Tidy Approach


I spoke on approaching text mining tasks using tidy data principles at rstudio::conf yesterday. I was so happy to have the opportunity to speak, and the conference has been a great experience. If you want to catch up on what has been going on at rstudio::conf, Karl Broman put together a GitHub repo of slides and Sharon Machlis has bee...

Women in the 2016 Stack Overflow Survey


Note: Cross-posted with the Stack Overflow blog. The 2017 Stack Overflow Developer Survey opened last week, and we on the Data Team are looking forward to analyzing the survey results to better understand our developer community. I am particularly interested in women in tech, for probably obvious reasons, and recently I explored last y...

What Programming Languages Are Used Most on Weekends?


Note: Cross-posted with the Stack Overflow blog. Check out the code for this analysis on Kaggle. For me, the weekends are mostly about spending time with my family, reading for leisure, and working on the open-source projects I am involved in. These weekend projects overlap with the work that I do in my day job here at Stack Overflow,...

Scraping CRAN with rvest


I am one of the organizers for a session at useR! 2017 this coming July that will focus on discovering and learning about R packages. How do R users find packages that meet their needs? Can we make this process easier? As somebody who is relatively new to the R world compared to many, this is a topic that resonates with me and I am ha...

How Do You Discover R Packages?


As I mentioned in my last blog post, I am contributing to a session at useR! 2017 this coming July that will focus on discovering and learning about R packages. This is an increasingly important issue for R users as we all decide which of the 10,000+ packages to invest time in understanding and then use in our work. library(dplyr) a...

Gender Roles with Text Mining and N-grams


Today is the one-year anniversary of the janeaustenr package’s appearance on CRAN, its cranniversary, if you will. I think it’s time for more Jane Austen here on my blog. I saw this paper by Matthew Jockers and Gabi Kirilloff a number of months ago and the ideas in it have been knocking around in my head ever since. The authors of...
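Assuming the analysis centers on which words follow the pronouns "he" and "she", the core bigram step can be sketched in base R. The sentence below is made up for illustration and is not from the janeaustenr texts. In a tidytext workflow one would usually reach for `unnest_tokens()` with `token = "ngrams", n = 2` instead. This version avoids extra packages so it runs anywhere.

```r
# Hypothetical example text, not from the post or the janeaustenr data
text <- "She remembered the letter. He walked away, and she smiled. He knew."

# Lowercase, strip punctuation, and split into word tokens
words <- strsplit(gsub("[[:punct:]]", "", tolower(text)), "\\s+")[[1]]

# Build bigrams as (word1, word2) pairs of adjacent tokens
bigrams <- data.frame(
  word1 = head(words, -1),
  word2 = tail(words, -1),
  stringsAsFactors = FALSE
)

# Keep only bigrams whose first word is a gendered pronoun
pronoun_bigrams <- bigrams[bigrams$word1 %in% c("he", "she"), ]

# Cross-tabulate which words follow each pronoun
table(pronoun_bigrams$word1, pronoun_bigrams$word2)
```

Because punctuation is stripped before splitting, some bigrams span sentence boundaries (here "letter he"). That does not affect this filter, which only keeps pairs that start with a pronoun.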

A couple of weeks ago, I saw on Dirk Eddelbuettel’s blog that R 3.4.0 was going to include a function for obtaining information about packages currently on CRAN, including basically everything in DESCRIPTION files. When R 3.4.0 was released, this was one of the things I was most immediately excited about exploring, because although I recently d...
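The function added in R 3.4.0 is `tools::CRAN_package_db()`. It returns a data frame with one row per CRAN package and columns for DESCRIPTION fields such as `Package` and `Maintainer`. The sketch below shows one way to summarize it. `top_maintainers()` is a hypothetical helper. The live call is wrapped in `interactive()` because it needs a network connection to a CRAN mirror.

```r
# Hypothetical helper: count packages per maintainer in a data frame shaped
# like the output of tools::CRAN_package_db() (one row per package).
top_maintainers <- function(db, n = 5) {
  counts <- sort(table(db$Maintainer), decreasing = TRUE)
  head(counts, n)
}

# Live usage: requires R >= 3.4.0 and network access to a CRAN mirror
if (interactive()) {
  db <- tools::CRAN_package_db()
  print(dim(db))
  print(top_maintainers(db))
}
```

Maintainer strings include email addresses, so the same person listed with two different addresses is counted separately.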

tidytext 0.1.3


I am pleased to announce that tidytext 0.1.3 is now on CRAN! In this release, my collaborator David Robinson and I have fixed a handful of bugs, added tidiers for LDA models from the mallet package, and updated functions for changes to quanteda’s API. You can check out the NEWS for more details on changes. One enhancement in this release is the...