Publications by Steph
Giving back with code
From code in answers on Stack Overflow to R packages or full programs, there’s a lot of code being written and given away. This post examines some of the reasons why the people writing all that code do it, why you should consider giving back with code, and how you can get started. Finally, I cap it all off with perspectives from some of ...
HIBPwned updated on CRAN
Haveibeenpwned.com is a fantastic service that helps people find out if they’ve been involved in a data breach. HIBPwned is an R wrapper for that service. Recently, due to abuse of the system, Troy Hunt had to add a limit of one request per 1.5s. The new version published on CRAN last night adds a delay to each call so that we can cont...
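HIBPwned is an R package, and its code isn't shown here. The following is only a language-neutral sketch, written in Python, of one way to space out calls so that a one-request-per-1.5s limit is respected. `Throttle` and `fetch_all` are hypothetical names. `lookup` stands in for whatever function actually queries the API. None of this is the package's real implementation.

```python
import time


class Throttle:
    """Hypothetical helper: enforce a minimum gap between successive calls."""

    def __init__(self, min_interval=1.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (1.5s limit)
        self.clock = clock                # injectable so it can be tested
        self.sleep = sleep
        self._last = None                 # time of the previous request

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


def fetch_all(accounts, lookup, throttle):
    """Hypothetical: call lookup(account) for each account, throttled."""
    results = {}
    for account in accounts:
        throttle.wait()  # delay before each request, not after
        results[account] = lookup(account)
    return results
```

With a real `lookup`, `fetch_all(["a@example.com", "b@example.com"], lookup, Throttle())` would take roughly 1.5s per account after the first one. The clock and sleep functions are injected so that the timing logic can be checked without actually waiting.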
GirlswithDeepPockets.com
OK, this post is about one of my latest crazy/harebrained/wacky ideas. I’m fed up of having to carry my Galaxy Note 3 in my hand. I can’t stand handbags, and most women’s clothing items either don’t have pockets or have pockets that are too small. Given how easy it is to build a website these days, I thought I’d become a sofa warrior for t...
Slack all the things!
OK, if you haven’t heard of it before, Slack is kinda like IRC, kinda like Dropbox, kinda like a lot of things – it’s a neat place to bring together communications between your team or community, and the integrations allow you to pipe in external feeds like Twitter activity or RSS. It’s a great way of collabora...
CRISP-DM and why you should know about it
The Cross Industry Standard Process for Data Mining (CRISP-DM) is a concept developed 20 years ago. I’ve read about it in various data mining and related books, and it’s come in very handy over the years. In this post, I’ll outline what the model is and why you should know about it, even if it has that terribly out of vogue phras...
Is my time series additive or multiplicative?
Time series data is an important area of analysis, especially if you do a lot of web analytics. To be able to analyse time series effectively, it helps to understand the interaction between general seasonality in activity and the underlying trend. The interactions between trend and seasonality are typically classified as either additive or multip...
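For reference, the two classical decomposition models can be written as below. Here $Y_t$ is the observed value at time $t$, $T_t$ is the trend, $S_t$ is the seasonal component, and $\varepsilon_t$ is the remainder. These are the standard textbook forms, not necessarily the notation the post itself uses.

```latex
\begin{aligned}
\text{Additive:}\quad & Y_t = T_t + S_t + \varepsilon_t \\
\text{Multiplicative:}\quad & Y_t = T_t \times S_t \times \varepsilon_t \\
\text{Log of multiplicative:}\quad & \log Y_t = \log T_t + \log S_t + \log \varepsilon_t
\end{aligned}
```

In the additive form, seasonal swings stay roughly constant in absolute size as the trend rises or falls. In the multiplicative form, they grow and shrink in proportion to the trend. Taking logs, as in the last line, turns a multiplicative series into an additive one.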
Quick tip: knitr Python Windows setup checklist
One of the nifty things about using R is that you can use it for many different purposes and even other languages! If you want to use Python in your knitr docs or the newish RStudio R notebook functionality, you might encounter some fiddliness getting all the moving parts running on Windows. This is a quick knitr Python Windows setup checklist to...
Announcing community R workshops
A big part of why I’ve launched Locke Data is so that I can give back more to my communities. I want to give more time and more support to others. One of the first steps is doing some activities that give financial support to community groups without damaging my startup cashflow! Community R workshops that fund local user groups is the first ac...
R Quick tip: Microsoft Cognitive Services’ Text Analytics API
Today in class, I taught some fundamentals of API consumption in R. As it was aligned to some Microsoft content, we first used HaveIBeenPwned.com’s API and then played with Microsoft Cognitive Services’ Text Analytics API. This brief post overviews what you need to get started, and how you can chain consecutive calls to these APIs in order to...
Community workshops
Following on from when we announced the availability of our community workshops, we’ve got three in the next three months that folks can attend.
May 19th – Data science project in a day
We’ll be in Kiev, Ukraine, doing a whole data science project in a day. This is intended to give people a little bit of code, process, and critical thinking...