Publications by Martin Chan
Harnessing Azure OpenAI and R for Web Content Summarisation: A Practical Guide with rvest and tidyverse
Introduction In last week’s article, we covered how you can interact with a local language model from R, using LM Studio and the Phi-3 model. Although local language models have the advantage of not incurring any costs aside from your electricity bills, they are likely to be less powerful than the more complex large language models (LLMs) that a...
10795 sym R (5605 sym/15 pcs) 22 img 1 tbl
Common Statistical Tests in R – Part I
Introduction This post will focus on common statistical tests in R to understand and validate the relationship between two variables. There must be tons of similar tutorials around, you may be thinking. So why? The primary (and selfish) goal of the post is to create a guide that is practical enough for myself to refer to from time to time. This p...
23285 sym R (6828 sym/18 pcs) 14 img
First Post
Why start an R blog? Ever since I discovered r-bloggers and subsequently learnt immensely from the articles contributed by R users all around the world, I’ve wanted to start an R blog myself. Part of the motivation is to give back to the open source community. Since I myself had benefitted so much from R vignettes, blogs, and Stack Overflow dis...
2675 sym R (91 sym/1 pcs) 4 img
A Short R Package Review: RQDA
A favourite R package? Whenever I’m asked the question of what my favourite R package is, I often go through this reasoning: tidyverse packages, such as dplyr and tidyr, are what I’d call “essentials” i.e. packages that I would always load for almost every piece of analysis in R. I love tidyverse, but when we are talking about a favourit...
7457 sym R (406 sym/1 pcs) 4 img
Using data.table with magrittr pipes: best of both worlds
Should we use magrittr pipes with data.table? Why ask the question? If you are fairly new to R, you might find it puzzling / intriguing that R questions on Stack Overflow tend to attract a range of solutions which all have different syntax “styles”, but almost all seem to be valid answers to some extent (as indicated by the number of upvotes...
6855 sym R (1305 sym/2 pcs) 6 img
My favourite alternative to Excel dashboards
Excel dashboards are great… what? For all the complaints that people have about Excel, it still has many clear, indisputable advantages. For one, it is extremely accessible – almost everyone has Excel installed on their computer. It’s familiar to most people, and practically anyone who can use a computer will know how to perform basic op...
8576 sym 4 img
Two Styles of Learning R
What’s the best way to learn R? Motivations behind the debate Some argue that R fundamentally has a steep learning curve, and that there are no real shortcuts for learning R. I don’t completely agree with that: I think that there are easier ways to learn R nowadays, specifically with the availability and expansion of the tidyverse collection...
7965 sym 4 img
Vignette: Scraping Amazon Reviews in R
Background One of the pet projects that I had been working on earlier in the year was to figure out an efficient way to gain an insight into what is going on in a consumer market, e.g.: What do people look for when they’re buying a product? What are the typical pain points / causes of frustration in the purchase process or in a product itself?...
5978 sym R (2670 sym/5 pcs) 6 img
Vignette: a ‘Copy & Paste’ R workflow for word clouds
Background Anyone who has created wordclouds for a presentation before will know that it is an iterative process. Not only do you have to remove “useless” stop words (e.g. the, at, am), you may also need to process word stemming so that words with the same stem do not appear more than once (e.g. “analysis”, “analyse”, “analyze”)...
4681 sym R (1092 sym/4 pcs) 4 img
Working with SPSS labels in R
TL;DR This post provides an overview of R functions for dealing with survey data labels, particularly ones that I wish I’d known when I first started out analysing survey data in R (primarily stored in SPSS data files). Some of these functions come from surveytoolbox, a package I’m developing (GitHub only) which contains a collection of ...
12071 sym R (7119 sym/16 pcs) 6 img