Publications by Sharp Sight Labs

Why you should learn R first for data science

26.01.2015

Over and over, when talking with people who are starting to learn data science, there’s a frustration that comes up: “I don’t know which programming language to start with.” And it’s not just programming languages; it’s also software systems like Tableau, SPSS, etc. There is an ever-widening range of tools and programming languages and ...

11674 sym 2 img

Why you should start by learning data visualization and manipulation

10.02.2015

One of the biggest issues that comes up when I talk to people who want to get started learning data science is the following: I don’t know where to get started! Recently, I argued that R is the best programming language to learn when you’re getting started with data science. While this helps you select a programming language, it still doesn...

9312 sym

Mapping Paris bike stands

03.03.2015

A Sharp Sight Labs reader (and now student), Jason P., recently started learning data science. He has a background in data analysis (primarily with Excel and related tools in the Microsoft ecosystem) but he wanted to start learning some of the harder skills of data science. He contacted me after he had diligently reviewed past blog posts on dat...

1988 sym R (978 sym/1 pcs) 2 img

A quick introduction to machine learning in R with caret

06.04.2016

If you’ve been using R for a while, and you’ve been working with basic data visualization and data exploration techniques, the next logical step is to start learning some machine learning. To help you begin learning about machine learning in R, I’m going to introduce you to an R package: the caret package. We’ll build a very simple machi...

13457 sym R (806 sym/3 pcs) 12 img

The one machine learning concept you need to know

25.04.2016

Machine learning is hard. Some people spend weeks, months, even years trying to learn machine learning without any success. They play around with datasets, buy books, compete on Kaggle, but ultimately make little progress. One of the big problems is that many people just want to “dive in and build something.” I admire the ambition of...

10591 sym R (1109 sym/7 pcs) 50 img

What’s the difference between machine learning, statistics, and data mining?

09.05.2016

Over the last few blog posts, I’ve discussed some of the basics of what machine learning is and why it’s important: – Why machine learning will reshape software engineering – What is the core task of machine learning – How to get started in machine learning in R Throughout those posts, I’ve been using the following definition of mach...

15729 sym 2 img

The real prerequisite for machine learning isn’t math, it’s data analysis

16.05.2016

When beginners get started with machine learning, the inevitable question is “what are the prerequisites? What do I need to know to get started?” And once they start researching, beginners frequently find well-intentioned but disheartening advice, like the following: You need to master math. You need all of the following: – Calculus – ...

15575 sym 6 img

How to use data analysis for machine learning (example, part 1)

31.05.2016

In my last article, I stated that for practitioners (as opposed to theorists), the real prerequisite for machine learning is data analysis, not math. One of the main reasons for making this statement is that data scientists spend an inordinate amount of time on data analysis. The traditional statement is that data scientists “spend 80% of the...

14880 sym R (642 sym/7 pcs) 16 img

How to use data analysis for machine learning, part 2

21.06.2016

In part 1, we went over how to use data visualization and data analysis prior to machine learning. For example, we discussed how to visualize the data to identify potential issues in the dataset, examine the variable distributions, etc. In this blog post, we’ll continue by building a very simple model and using data visualization to examine th...

7487 sym R (1192 sym/4 pcs) 10 img

Mapping global venture capital investment

14.09.2016

Claims of “the end of geography” and the flatness of the world notwithstanding, place still matters today. Discussing why place matters is somewhat beyond the scope of this post, so I will direct you to the excellent work of Parag Khanna and his book Connectography. To put it simply, the future of business and international relations wil...

2784 sym R (3030 sym/1 pcs) 4 img