Publications by Julia Silge
A Tall Drink of Water
In a previous post, I used water consumption data from Utah’s Open Data Catalog to explore what kind of users consume the most water in my home here in Salt Lake City, what the annual pattern of water use is, and how the drought of the past few years has affected water use. I made a predictive model for the total aggregate water use of the city...
10970 sym R (3890 sym/14 pcs) 16 img
Your Floor Is the Most Dangerous Thing In Your House
I saw this analysis at Flowing Data about the most common consumer products involved in hospital ER visits and was delighted, interested, etc. Nathan’s next related post is, um, also super interesting, if entirely horrifying. Apparently, I am not the only one who thought this data set was compelling, because this week Hadley Wickham took the NE...
8264 sym R (9811 sym/26 pcs) 18 img
My Baby Boomer Name Might Have Been “Debbie”
I have always loved learning and thinking about names, how they are chosen and used, and how people feel about their names and the names around them. We had a traditional baby name book at our house when I was growing up (you know, lists of names with meanings), and I remember poring over it to find unusual or appealing names for my pretend play ...
7367 sym R (3446 sym/7 pcs) 18 img
You Must Allow Me To Tell You How Ardently I Admire and Love Natural Language Processing
It is a truth universally acknowledged that sentiment analysis is super fun, and Pride and Prejudice is probably my very favorite book in all of literature, so let’s do some Jane Austen natural language processing. Project Gutenberg makes e-texts available for many, many books, including Pride and Prejudice which is available here. I am using t...
8707 sym R (10719 sym/27 pcs) 10 img
If I Loved Natural Language Processing Less, I Might Be Able to Talk About It More
In my last post, I did some natural language processing and sentiment analysis for Jane Austen’s most well-known novel, Pride and Prejudice. It was just so much fun that I wanted to extend some of that work and compare across her body of writing. I decided to make an R package for her texts, for easy access for myself and anybody else who would...
8691 sym R (8505 sym/15 pcs) 14 img
Trump Losing and Feeling the Bern in Utah
Well, it’s been an interesting election season so far, right? Everybody holding up OK? Utah held its caucuses this past Tuesday on March 22 and I thought I would do a bit of plotting to show the results. We can get the JSON data from CNN, as pointed out by Bob Rudis in his post here. Utah’s results were not available when he wrote that post b...
5783 sym R (6250 sym/13 pcs) 6 img
I Went to ROpenSci Unconference and All I Got Were These Lousy Hex Stickers
Just kidding; it was amazing. Last week, I traveled to San Francisco to participate in an unconference/hackathon organized and hosted by ROpenSci. This was my first R conference or meeting, and it was such a great experience. I am still feeling a bit at a loss for words about what a tremendous time I had, actually, but I will make an attempt to...
3781 sym
Who Came to Vote in Utah’s Caucuses?
Late last month, I analyzed results from Utah’s Republican and Democratic caucuses to show how the different presidential candidates fared across Utah. That was fun work to do, but I realized there was one more map I wanted to make; I want to compare the Republican and Democratic voter turnout across the counties in Utah. Utah is a politically ...
4764 sym R (2919 sym/6 pcs) 2 img
How I Learned to Stop Worrying and Love R CMD Check
Last week, I officially became the maintainer of a CRAN package! My package for the texts of Jane Austen’s 6 completed, published novels, janeaustenr, was released on CRAN and my Twitter feed was filled with congratulatory Jane Austen GIFs. I think this might be my favorite. .@juliasilge *clears schedule* *opens @rstudio* pic.twitter.com/Hu7V2E...
8423 sym R (261 sym/4 pcs) 2 img
The Life-Changing Magic of Tidying Text
When I went to the rOpenSci unconference about a month ago, I started work with Dave Robinson on a package for text mining using tidy data principles. What is this tidy data you keep hearing so much about? As described by Hadley Wickham, tidy data has a specific structure: each variable is a column; each observation is a row; each type of observat...
11716 sym R (11009 sym/33 pcs) 8 img