Publications by Brian Lee Yung Rowe

Matrix factorizations and social network graph analysis

09.12.2013

This is a lecture post for my students in the CUNY MS Data Analytics program. In this series of lectures I discuss mathematical concepts from different perspectives. The goal is to ask questions and challenge standard ways of thinking about what are generally considered basic concepts. Consequently, these lectures will not always be as rigorous as...

8769 sym R (894 sym/4 pcs) 32 img

Probability and Monte Carlo methods

15.12.2013

This is a lecture post for my students in the CUNY MS Data Analytics program. In this series of lectures I discuss mathematical concepts from different perspectives. The goal is to ask questions and challenge standard ways of thinking about what are generally considered basic concepts. I also emphasize using programming to help gain insight into ...

6760 sym R (1174 sym/9 pcs) 40 img

How to use vectorization to streamline simulations

06.01.2014

While grading some homework, it became apparent that many of the idioms of R are not widely known and aren’t particularly intuitive to newcomers. Two key features of R (and why I like the language so much) are vectorization and higher-order functions. These features overlap with functional programming and form a powerful toolkit to implement mod...

6700 sym R (1218 sym/5 pcs) 6 img

Me Meme: An adaptive news digest based on your interests

07.01.2014

About a month ago, I quietly released a daily news digest service, called Me Meme. This is an email companion to the still-private website that allows you to analyze and run social media models at the click of a button. I previously wrote about some of my models, and how they can infer someone’s interests simply by analyzing who the...

3687 sym 10 img

Automated parsing of ebola situation reports

05.01.2015

I’m pleased to announce the public availability of ebola.sitrep, a package I wrote to download and parse Ebola situation reports. The package currently knows how to process all PDF situation reports from the Ministries of Health for Liberia and Sierra Leone. These situation reports are the most immediate source of data as they are typically updated...

5841 sym R (2888 sym/9 pcs) 8 img

Type constraints and NAs in lambda.r

15.01.2015

Someone asked recently how lambda.r deals with NAs in type constraints. Type constraints are optional decorations on a function that enforce the type of each function argument. The short answer is that since NAs are typed, they work just like other values. Consider the toy function

f(x) %::% numeric : numeric
f(x) %as% x^2

Calling f with a ve...

1443 sym R (265 sym/6 pcs) 4 img

How to reliably access network resources in R

21.01.2015

It’s frustrating when an application unexpectedly dies due to a network timeout or unavailability of a network resource. Veterans of distributed systems know not to rely on network-based resources, such as web services or databases, since they can be unpredictable. So what is a data scientist supposed to do when they must use these resources in ...

2590 sym R (635 sym/6 pcs) 4 img

Chapter 3 of Modeling data with functional programming in R is out

08.02.2015

Chapter 3 of my book “Modeling data with functional programming in R” is available for download. This chapter describes map-vectorization and how it’s used in R. I make a distinction between different types of vectorization since f(x) = x^2 + 2*y - 5 is vectorized differently from sum(x). I call the first form map-vectorization, after the h...

1416 sym 4 img

Chapter 4 of Modeling data with functional programming in R is out

30.05.2015

This chapter is on what I call fold-vectorization. In some languages, it’s called reduce, though the concept is the same. Fold implements binary iterated function application, where elements of a sequence are passed along with an accumulator to a function. This process repeats such that each successive element is paired with the previous result...

1723 sym 4 img

Intraday time series analysis of the #rstats hashtag on Twitter

11.06.2015

This post is a lecture for IS624 Predictive Analytics, which is part of the CUNY Master’s program in Data Analytics. Twitter is renowned for spawning vibrant communities and discussion of current events. Many services exist to track hashtags for popularity, but less is known about the statistical characteristics of the timelines associated wit...

6003 sym R (1927 sym/7 pcs) 8 img