Publications by Brian Lee Yung Rowe
Matrix factorizations and social network graph analysis
This is a lecture post for my students in the CUNY MS Data Analytics program. In this series of lectures I discuss mathematical concepts from different perspectives. The goal is to ask questions and challenge standard ways of thinking about what are generally considered basic concepts. Consequently these lectures will not always be as rigorous as...
8769 sym R (894 sym/4 pcs) 32 img
Probability and Monte Carlo methods
This is a lecture post for my students in the CUNY MS Data Analytics program. In this series of lectures I discuss mathematical concepts from different perspectives. The goal is to ask questions and challenge standard ways of thinking about what are generally considered basic concepts. I also emphasize using programming to help gain insight into ...
6760 sym R (1174 sym/9 pcs) 40 img
How to use vectorization to streamline simulations
While grading some homework it became apparent that many of the idioms of R are not widely known and aren’t particularly intuitive to newcomers. Two key features of R (and why I like the language so much) are vectorization and higher-order functions. These features overlap with functional programming and form a powerful toolkit to implement mod...
6700 sym R (1218 sym/5 pcs) 6 img
Me Meme: An adaptive news digest based on your interests
About a month ago, I quietly released a daily news digest service, called Me Meme. This is an email companion to the still-private website that allows you to analyze and run social media models at the click of a button. I previously wrote about some of my models, and how they can infer someone’s interests simply by analyzing who the...
3687 sym 10 img
Automated parsing of Ebola situation reports
I’m pleased to announce public availability of ebola.sitrep, a package I wrote to download and parse Ebola situation reports. The package currently knows how to process all PDF situation reports from the Ministries of Health for Liberia and Sierra Leone. These situation reports are the most immediate source of data as they are typically updated...
5841 sym R (2888 sym/9 pcs) 8 img
Type constraints and NAs in lambda.r
Someone asked recently how lambda.r deals with NAs in type constraints. Type constraints are optional decorations on a function that enforce the type of each function argument. The short answer is that since NAs are typed, they work just like other values. Consider the toy function
    f(x) %::% numeric : numeric
    f(x) %as% x^2
Calling f with a ve...
1443 sym R (265 sym/6 pcs) 4 img
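The claim that NAs are typed can be checked in base R alone. The following is a minimal sketch that does not use lambda.r; the variable names are illustrative only and are not taken from the original post.

```r
# A bare NA is logical, but R also provides typed NA constants.
na_default_class <- class(NA)        # logical
na_real_class    <- class(NA_real_)  # numeric

# Inside a numeric vector, NA is coerced to the numeric NA,
# so the vector as a whole still counts as numeric.
x <- c(1, NA, 3)
x_is_numeric <- is.numeric(x)

# Arithmetic propagates the NA instead of failing.
squared <- x^2
```

Because the vector remains numeric, it is plausible that a constraint requiring a numeric argument would accept it, which is consistent with the excerpt's short answer.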
How to reliably access network resources in R
It’s frustrating when an application unexpectedly dies due to a network timeout or unavailability of a network resource. Veterans of distributed systems know not to rely on network-based resources, such as web services or databases, since they can be unpredictable. So what is a data scientist supposed to do when they must use these resources in ...
2590 sym R (635 sym/6 pcs) 4 img
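The excerpt is truncated before it describes the author's approach, so the following is only one common pattern, not necessarily the one used in the post: wrap the call in tryCatch and retry a bounded number of times. The names with_retry and flaky are hypothetical, and the failure is simulated rather than coming from a real network call.

```r
# Hypothetical helper: call f() up to max_attempts times, pausing
# `wait` seconds between attempts, and re-raise the last error if
# every attempt fails.
with_retry <- function(f, max_attempts = 3, wait = 0) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt < max_attempts) Sys.sleep(wait)
  }
  stop(result)
}

# Simulated unreliable resource: fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated timeout")
  "ok"
}

value <- with_retry(flaky)
```

In real use, f would wrap the actual web service or database call, and a non-zero wait gives the remote resource time to recover.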
Chapter 3 of Modeling data with functional programming in R is out
Chapter 3 of my book “Modeling data with functional programming in R” is available for download. This chapter describes map-vectorization and how it’s used in R. I make a distinction between different types of vectorization since f(x) = x^2 + 2*y - 5 is vectorized differently from sum(x). I call the first form map-vectorization, after the h...
1416 sym 4 img
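The distinction drawn in the excerpt can be seen with two short vectors. This is a minimal sketch with made-up inputs x and y: the arithmetic expression returns one value per element, whereas sum collapses the vector to a single value.

```r
x <- c(1, 2, 3)
y <- c(10, 20, 30)

# Element-wise: one output per input position.
mapped <- x^2 + 2*y - 5

# The same computation written as an explicit map over positions.
mapped_explicit <- sapply(seq_along(x), function(i) x[i]^2 + 2*y[i] - 5)

# sum() consumes the whole vector and returns a single value.
collapsed <- sum(x)
```

Here mapped has the same length as x, while collapsed has length one.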
Chapter 4 of Modeling data with functional programming in R is out
This chapter is on what I call fold-vectorization. In some languages, it’s called reduce, though the concept is the same. Fold implements binary iterated function application, where elements of a sequence are passed along with an accumulator to a function. This process repeats such that each successive element is paired with the previous result...
1723 sym 4 img
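The accumulator pattern described above can be illustrated with base R's Reduce. This is a minimal sketch using a running sum as the binary function; the input vector and the variable names are illustrative only.

```r
x <- c(1, 2, 3, 4)

# Each element is combined with the accumulator, and the result
# becomes the accumulator for the next element. init is the
# starting value of the accumulator.
add <- function(acc, el) acc + el
total <- Reduce(add, x, 0)

# accumulate = TRUE keeps every intermediate accumulator value,
# which makes the pairing of element and previous result visible.
steps <- Reduce(add, x, 0, accumulate = TRUE)
```

With accumulate = TRUE the result has one more entry than x, because the initial accumulator value is included.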
Intraday time series analysis of the #rstats hashtag on Twitter
This post is a lecture for IS624 Predictive Analytics, which is part of the CUNY Master’s program in Data Analytics. Twitter is renowned for spawning vibrant communities and discussion of current events. Many services exist to track hashtags for popularity, but less is known about the statistical characteristics of the timelines associated wit...
6003 sym R (1927 sym/7 pcs) 8 img