Publications by Sharp Sight Labs
Stop trying to jump to the sexy stuff first
A few weeks ago, an acquaintance told me that he was interested in getting started with machine learning. He’s a web developer who primarily works in Ruby and Python, but also has a small amount of experience with R. Day-to-day, his work is run-of-the-mill web development, and he’s confessed to me that he’s a bit bored and looking for some...
4852 sym
How to make a simple heatmap in ggplot2
In the world of data visualization, the heatmap is underrated and underutilized. It has limitations, but overall, it’s an excellent tool in your data science and data visualization toolkit. After you’ve mastered the foundational visualization techniques (you can write the code for the basic plots in your sleep, right?), you should learn the h...
7293 sym R (641 sym/1 pcs) 4 img
A simple histogram (and why you need to practice it)
In data science, before doing almost anything else, you need to know your data. This is why visualization is one of the pillars of data science: visualization allows you to see your data and “know” it in a way that your mind is wired for. (And it’s why I emphasize mastering data visualization before almost anything else.) In practice, “k...
6610 sym R (1250 sym/2 pcs) 2 img
How much data science do you actually remember?
How many data science books have you read? 5? 10? A few dozen? How many free online courses have you taken? A few? How many blog posts have you read? (I’d be willing to bet: you’ve read dozens.) If you’re like most budding data scientists, you’ve probably consumed a lot of material. You probably even learned some of it. The problem t...
9466 sym R (228 sym/1 pcs) 2 img
Why you should master R (even if it might eventually become obsolete)
In last week’s blog post I asked How much data science do you actually remember? It’s a critical question. If you study data science, but forget everything that you learn, you’ll be in big trouble when you go in for an interview. Or, you’ll be in big trouble if you actually get a data science job, but you’ve forgotten the essential sk...
17649 sym
Why R is the best data science language to learn today
In last week’s blog, I explained why you should Master R (even if it may eventually become obsolete). I wrote that article to address people who claim mastering R is a bit of a waste of time (because it will eventually become obsolete). But when I suggested that R may eventually become obsolete, this seemed to provoke fear that R is becoming ...
14964 sym 2 img
The best R package for learning to “think about visualization”
As a beginning data scientist, you’ll have quite a few subject areas that you need to learn (and eventually master). While you’ll certainly need to learn some math and statistics, math and stats are not the first things I recommend to most beginners. Almost always, I recommend that people start with data visualization. The reason for this is...
14123 sym R (1698 sym/4 pcs) 12 img
How to really do an analysis in R (part 1, data manipulation)
For a couple of years, I’ve been writing about the importance of data analysis, saying that data analysis is essentially the foundation of data science itself. While I admit that there’s room for argument (and I admit that the reality is a little more nuanced), I still firmly believe that data analysis is the true foundation of practical dat...
19921 sym R (16059 sym/16 pcs) 2 img
How to do an analysis in R (part 2, visualization and analysis)
In several recent blog posts, I’ve emphasized the importance of data analysis. My main point has been that if you want to learn data science, you need to learn data analysis. Data analysis is the foundation of practical data science. With that statement in mind, my most recent blog post showed you “part one” of an example analysis. In t...
24185 sym R (11498 sym/16 pcs) 22 img
Mapping unemployment data, 2016
The last few posts at Sharp Sight about data analysis have been long and fairly intense. Let’s do something a little more fun. Let’s make a quick map. How to make a compelling map, in a few dozen lines of code The code to create this map is surprisingly brief. #============== # LOAD PACKAGES #============== library(ggplot2) library(vir...
5229 sym R (1382 sym/2 pcs) 4 img