Publications by Teja Kodali

Visualizing MLS Player Salaries with ggplot2

23.11.2015

Recently, I came across this great visualization of MLS player salaries. I tried to do something similar with ggplot2, and while I was unable to replicate the interactivity or the tree-map nature of the graph, the graph still looks pretty cool. The data is contained in this PDF file. I obtained a CSV file extracted from the PDF file by using...

Using Decision Trees to Predict Infant Birth Weights

16.12.2015

In this article, I will show you how to use decision trees to predict whether the birth weights of infants will be low or not. We will use the birthwt data from the MASS library. What is a decision tree? A decision tree is an algorithm that builds a flowchart-like graph to illustrate the possible outcomes of a decision. To build the tree, the alg...
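
Since the excerpt stops before any code, here is a minimal sketch of the idea, not the post's own code. It assumes the rpart package for the tree, an arbitrary 70/30 train/test split and seed, and it drops `bwt` (the gram weight that `low` is derived from) so the tree cannot simply read off the answer.

```r
# A minimal sketch (not the post's code): classify low birth weight with rpart
library(MASS)   # provides the birthwt data set
library(rpart)  # recursive partitioning (CART-style) decision trees

bw <- birthwt
# 'low' is coded 0/1; as a factor it makes rpart grow a classification tree
bw$low  <- factor(bw$low, levels = c(0, 1), labels = c("normal", "low"))
bw$race <- factor(bw$race)  # race is a category (1, 2, 3), not a number
# 'bwt' is the birth weight in grams that 'low' is derived from (bwt < 2500 g),
# so keeping it as a predictor would leak the answer
bw$bwt <- NULL

set.seed(1)  # assumed seed for a reproducible 70/30 split
train_idx <- sample(nrow(bw), size = round(0.7 * nrow(bw)))
train <- bw[train_idx, ]
test  <- bw[-train_idx, ]

fit  <- rpart(low ~ ., data = train, method = "class")
pred <- predict(fit, newdata = test, type = "class")

# Confusion matrix and accuracy on the held-out rows
conf_mat <- table(predicted = pred, actual = test$low)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(conf_mat)
print(accuracy)
```

Calling `plot(fit); text(fit)` afterwards draws the fitted tree as the flowchart described above.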

K Means Clustering in R

28.12.2015

Hello everyone, hope you had a wonderful Christmas! In this post, I will show you how to do k-means clustering in R. We will use the iris dataset from the datasets library. What is k-means clustering? K-means clustering is an unsupervised learning algorithm that groups data points into clusters based on their similarity. Unsupervised learning means that ther...
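
As an illustration (a sketch, not the post's exact code), base R's `kmeans()` can cluster the four numeric iris measurements. The choice of 3 centers, `nstart = 20` and the seed are assumptions; the species labels are only used afterwards to see how well the unsupervised clusters line up with them.

```r
# Sketch: k-means on the iris measurements with base R's stats::kmeans
data(iris)  # from the datasets package, attached by default

features <- iris[, 1:4]  # drop Species: the algorithm never sees the labels
set.seed(20)             # k-means starts from random centers
km <- kmeans(features, centers = 3, nstart = 20)  # best of 20 random starts

# Cross-tabulate discovered clusters against the known species
print(table(cluster = km$cluster, species = iris$Species))
print(km$centers)  # mean of each measurement within each cluster
```

Using several random starts (`nstart`) reduces the chance of settling on a poor local optimum, since k-means only converges to the nearest one from its starting centers.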

Data manipulation with tidyr

06.01.2016

Hello everyone! In this article, I will show you how you can use tidyr for data manipulation. tidyr is a package by Hadley Wickham that makes it easy to tidy your data. It is often used in conjunction with dplyr. Data is said to be tidy when each column represents a variable, and each row represents an observation. I will demonstrate the usage of...
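
To make the definition of tidy data concrete, here is a small sketch using tidyr's `gather()` and `spread()`. The `scores` table is hypothetical, made up for illustration: the wide version stores one observation (a test score) per cell across several columns, and `gather()` reshapes it so that each row is one student/test observation.

```r
library(tidyr)

# Hypothetical untidy table: one column per test
scores <- data.frame(
  student = c("A", "B", "C"),
  test1   = c(80, 72, 95),
  test2   = c(85, 70, 91)
)

# gather(): turn the test columns into key/value pairs (one row per observation)
long <- gather(scores, key = "test", value = "score", test1, test2)
print(long)

# spread(): the inverse, back to one column per test
wide <- spread(long, key = test, value = score)
print(wide)
```

In tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`, though they still work.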

Hierarchical Clustering in R

22.01.2016

Hello everyone! In this post, I will show you how to do hierarchical clustering in R. We will use the iris dataset again, as we did for k-means clustering. What is hierarchical clustering? As you may recall from the post about k-means clustering, that algorithm requires us to specify the number of clusters, and finding the optimal number of clusters can often ...
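
A sketch of the contrast with k-means, assuming base R's `dist()` and `hclust()` with their defaults (Euclidean distance, complete linkage); the post's own distance and linkage choices are not visible in the excerpt. The whole tree is built first, and the number of clusters is chosen only afterwards with `cutree()`.

```r
# Sketch: agglomerative hierarchical clustering of iris with base R
data(iris)

features <- iris[, 1:4]
d  <- dist(features, method = "euclidean")  # pairwise distance matrix
hc <- hclust(d, method = "complete")        # merge closest clusters step by step

plot(hc, labels = FALSE, main = "Complete-linkage dendrogram of iris")

# Decide on the number of clusters after the fact, e.g. 3
clusters <- cutree(hc, k = 3)
print(table(cluster = clusters, species = iris$Species))
```

Because `hc` stores every merge (n - 1 = 149 of them for 150 flowers), trying a different number of clusters only needs another `cutree()` call, not a refit.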

Predicting wine quality using Random Forests

04.02.2016

Hello everyone! In this article, I will show you how to run the random forest algorithm in R. We will use the wine quality data set (white) from the UCI Machine Learning Repository. What is the Random Forest Algorithm? In a previous post, I outlined how to build decision trees in R. While decision trees are easy to interpret, they tend to be rathe...
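
The excerpt does not show which package the post uses, so the following is only a sketch assuming the randomForest package. To stay self-contained and avoid a download, it uses the built-in iris data as a stand-in for the wine quality data; the split size, seed, `ntree` and `mtry` values are illustrative assumptions.

```r
# Sketch: random forest classification with the randomForest package
# (iris stands in for the UCI wine data so no download is needed)
library(randomForest)

data(iris)
set.seed(42)  # bootstrap samples and feature subsets are random
train_idx <- sample(nrow(iris), size = 100)
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# ntree: number of trees, each grown on a bootstrap sample
# mtry:  number of predictors tried at each split
rf <- randomForest(Species ~ ., data = train, ntree = 500, mtry = 2,
                   importance = TRUE)

print(rf)  # includes the out-of-bag (OOB) error estimate
pred <- predict(rf, newdata = test)
print(table(predicted = pred, actual = test$Species))
print(importance(rf))  # which measurements the forest relied on most
```

Averaging many decorrelated trees is what reduces the high variance of a single decision tree, at the cost of the single tree's easy interpretability.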

Interactive plotting with rbokeh

17.02.2016

Hello everyone! In this post, I will show you how you can use rbokeh to build interactive graphs and maps in R. What is bokeh? Bokeh is a popular Python library used for building interactive plots and maps, and now it is also available in R, thanks to Ryan Hafen. It is very powerful for creating good-looking plots for the web easi...