Publications by Rolf Fredheim
Experiments in python and d3 from R: GDELT made easy
Originally posted on the blog Quantifying Memory.
400 sym
Fun simulating Wimbledon in R and Python
R and Python have different strengths. There’s little you can do in R that you absolutely can’t do in Python and vice versa, but there’s a lot of stuff that’s really annoying in one and nice and simple in the other. I’m sure simulations can be run in R, but it seems frightfully tricky. Recently I wrote a simple tennis simulator in Python, ...
4532 sym R (2210 sym/5 pcs) 8 img 1 tbl
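To give a flavour of what such a simulation involves, here is a minimal sketch of a single-game tennis simulator in Python. It is not the code from the post: the function names `play_game` and `estimate_hold_rate`, and the assumption that every point is won by the server with one fixed probability, are illustrative choices.

```python
import random


def play_game(p_server, rng):
    """Simulate one game; return True if the server wins it.

    Assumption: each point is independent and the server wins it
    with probability p_server.
    """
    server, returner = 0, 0
    while True:
        if rng.random() < p_server:
            server += 1
        else:
            returner += 1
        # A game needs at least four points and a two-point margin.
        if server >= 4 and server - returner >= 2:
            return True
        if returner >= 4 and returner - server >= 2:
            return False


def estimate_hold_rate(p_server, n_games=10_000, seed=42):
    """Estimate how often the server wins a game, by repeated simulation."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    wins = sum(play_game(p_server, rng) for _ in range(n_games))
    return wins / n_games
```

Calling `estimate_hold_rate(0.6)` shows how a modest edge on each point turns into a much larger edge over a whole game; sets and matches could be layered on top in the same way.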
Scaling up text processing and Shutting up R: Topic modelling and MALLET
In this post I show how a combination of MALLET, Python, and data.table means we can analyse quite big data in R, even though R itself buckles when confronted by textual data. Topic modelling is great fun. Using topic modelling I have been able to separate articles about the ‘Kremlin’ as a) a building, b) an international actor, c) the advers...
6473 sym 4 img
Databases for text analysis: archive and access texts using SQL
This post is a collection of scripts I’ve found useful for integrating a SQL database into more complex applications. SQL allows quickish access to largish repositories of text (I wrote about this at some length here), and is a good starting point for taking textual analysis beyond thousands of texts. I timed Python to be thirteen t...
5824 sym
Visualising Structure in Topic Models
How exactly should we visualise topic models to get an overview of how topics relate to each other? This post is a brief lit review of that debate, though I realise the subject matter is sooo last year. I also present my chosen solution to the dilemma: I use dendrograms to position topics, and add a network visualisation using an arcplot to expose lin...
12823 sym 10 img
Plugging hierarchical data from R into d3
Here I show how to convert tabulated data into a json format that can be used in d3 graphics. The motivation for this was an attempt at getting an overview of topic models (link). Illustrations like the one to the right are very attractive; my motivation to learn how to make them was that the radial layout sometimes saves a lot of space – in my...
7179 sym R (444 sym/1 pcs) 6 img
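The conversion described in that excerpt, from tabulated rows to nested JSON, can be sketched as follows. The post itself works in R; this Python version is only an illustration. The function name `rows_to_tree` is hypothetical, and the key names `name`, `children` and `size` are an assumption based on a convention seen in many d3 hierarchy examples.

```python
import json


def rows_to_tree(rows, root_name="root"):
    """Nest flat rows of (level_1, ..., level_n, value) into a tree of dicts."""
    root = {"name": root_name, "children": []}
    for *path, value in rows:
        node = root
        for label in path:
            children = node.setdefault("children", [])
            # Reuse the child with this label if it exists, else create it.
            match = next((c for c in children if c["name"] == label), None)
            if match is None:
                match = {"name": label}
                children.append(match)
            node = match
        node["size"] = value  # the leaf carries the numeric value
    return root


# Made-up example rows: two grouping columns followed by a value.
rows = [
    ("topics", "politics", 12),
    ("topics", "sport", 7),
    ("other", "weather", 3),
]
tree = rows_to_tree(rows)
tree_json = json.dumps(tree, indent=2)
```

The string in `tree_json` can be written to a file and loaded by a d3 layout that expects nested `children` arrays.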
Web-Scraping: the Basics
Slides from the first session of my course about web scraping through R: Web scraping for the humanities and social sciences. Includes an introduction to the paste function, working with URLs, functions and loops. Putting it all together we fetch data in JSON format about Wikipedia page views from http://stats.grok.se/. Solutions here: Do...
794 sym
Web Scraping part2: Digging deeper
Slides from the second web scraping through R session: Web scraping for the humanities and social sciences. In which we make sure we are comfortable with functions, before looking at XPath queries to download data from newspaper articles. Examples include BBC news and Guardian comments. Download the .Rpres file to use in RStudio here. A r...
804 sym
Web Scraping: Scaling up Digital Data Collection
The latest slides from web scraping through R: Web scraping for the humanities and social sciences. Slides from the first session here. Slides from the second session here. This week we look in greater detail at scaling up digital data-collection: coercing scraper output into dataframes, how to download files (along with a cursory look at t...
991 sym
Web Scraping: working with APIs
APIs present researchers with a diverse set of data sources through a standardised access mechanism: send a pasted-together HTTP request, receive JSON or XML in return. Today we tap into a range of APIs to get comfortable sending queries and processing responses. These are the slides from the final class in Web Scraping through R: Web...
1801 sym
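The request-and-response pattern described in that excerpt (paste together a URL, get JSON back, pick out the fields of interest) can be sketched without touching the network. The course itself uses R; this Python version is only an illustration. The endpoint `https://api.example.org/search`, the helper names, and the shape of `SAMPLE_RESPONSE` are all made up and do not correspond to any real API.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, **params):
    """Paste a base URL and query parameters into one request URL."""
    # Sorting keeps the parameter order stable between runs.
    return f"{base_url}?{urlencode(sorted(params.items()))}"


# Hypothetical JSON body, standing in for what an API might return.
SAMPLE_RESPONSE = (
    '{"results": [{"title": "First", "views": 10},'
    ' {"title": "Second", "views": 25}]}'
)


def total_views(response_text):
    """Parse a JSON response and sum the 'views' field of each result."""
    payload = json.loads(response_text)
    return sum(item["views"] for item in payload["results"])


url = build_query_url("https://api.example.org/search",
                      q="web scraping", format="json")
```

In real use the URL would be passed to an HTTP client and the response body handed to `total_views`; the parsing step stays the same whatever client fetches the data.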