Publications by Teja Kodali
Visualizing MLS Player Salaries with ggplot2
Recently, I came across this great visualization of MLS player salaries. I tried to do something similar with ggplot2, and while I was unable to replicate the interactivity or the treemap layout of the original, the result still looks pretty cool. Data: The data is contained in this PDF file. I obtained a CSV file extracted from the PDF file by using...
Using Decision Trees to Predict Infant Birth Weights
In this article, I will show you how to use decision trees to predict whether the birth weights of infants will be low or not. We will use the birthwt data from the MASS library. What is a decision tree? A decision tree is an algorithm that builds a flowchart-like graph to illustrate the possible outcomes of a decision. To build the tree, the alg...
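As a rough illustration of the idea in this excerpt, here is a minimal sketch that fits a classification tree to birthwt. It assumes the rpart package (not named in the excerpt) and an arbitrary choice of predictors, so it is not necessarily the model built in the full article.

```r
library(MASS)   # provides the birthwt data
library(rpart)  # recursive partitioning trees (assumed package)

data(birthwt)

# 'low' is 1 when the birth weight is under 2.5 kg; treat it as a class label.
# 'bwt' is left out on purpose, because 'low' is derived directly from it.
# The predictor set below is an illustrative assumption.
fit <- rpart(factor(low) ~ age + lwt + smoke + ht + ui,
             data = birthwt, method = "class")

# Predicted class for each infant in the training data.
pred <- predict(fit, type = "class")

# Confusion table of predicted versus actual labels.
table(predicted = pred, actual = birthwt$low)
```

Calling print(fit) lists the splits as text, which is the flowchart the excerpt refers to.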
K Means Clustering in R
Hello everyone, hope you had a wonderful Christmas! In this post I will show you how to do k means clustering in R. We will use the iris dataset from the datasets library. What is K Means Clustering? K Means Clustering is an unsupervised learning algorithm that tries to cluster data based on their similarity. Unsupervised learning means that ther...
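A minimal sketch of what this looks like on iris, using kmeans from base R. The choice of three centers, the seed, and nstart = 20 are assumptions made for illustration; the full article may use different settings.

```r
set.seed(20)  # k-means starts from random centers, so fix the seed

# Cluster on the four numeric measurements only; Species is not used,
# which is what makes this unsupervised.
km <- kmeans(iris[, 1:4], centers = 3, nstart = 20)

# Compare the discovered clusters with the known species afterwards.
table(cluster = km$cluster, species = iris$Species)
```

The cluster numbers themselves are arbitrary labels, so only the grouping in the table is meaningful, not which cluster is called 1, 2, or 3.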
Data manipulation with tidyr
Hello everyone! In this article, I will show you how you can use tidyr for data manipulation. tidyr is a package by Hadley Wickham that makes it easy to tidy your data. It is often used in conjunction with dplyr. Data is said to be tidy when each column represents a variable, and each row represents an observation. I will demonstrate the usage of...
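To make the definition of tidy data concrete, here is a small sketch. The data frame is hypothetical, and it assumes tidyr 1.0.0 or later for pivot_longer; the full article may rely on other tidyr functions.

```r
library(tidyr)

# Hypothetical wide table: one row per student, one column per exam.
# Here the exam number is hidden in the column names, so the data is not tidy.
wide <- data.frame(
  student = c("a", "b"),
  exam1 = c(90, 75),
  exam2 = c(85, 80)
)

# Reshape so that each row is one observation (student, exam, score)
# and each column is one variable.
long <- pivot_longer(wide, cols = c(exam1, exam2),
                     names_to = "exam", values_to = "score")

long
```

The long form has one row per student and exam pair, which is the shape that dplyr verbs such as group_by and summarise expect.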
Hierarchical Clustering in R
Hello everyone! In this post, I will show you how to do hierarchical clustering in R. We will use the iris dataset again, like we did for K means clustering. What is hierarchical clustering? If you recall from the post about k means clustering, it requires us to specify the number of clusters, and finding the optimal number of clusters can often ...
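A minimal sketch of the contrast with k-means, using hclust from base R. The tree is built first and the number of groups is chosen only when cutting it. The petal columns, complete linkage, and k = 3 are assumptions for illustration, not necessarily the choices made in the full article.

```r
# Pairwise Euclidean distances between flowers, using petal measurements.
d <- dist(iris[, c("Petal.Length", "Petal.Width")])

# Agglomerative clustering with complete linkage (the hclust default).
hc <- hclust(d, method = "complete")

# The number of groups is decided after the fact by cutting the tree.
groups <- cutree(hc, k = 3)

# Compare the groups with the known species.
table(group = groups, species = iris$Species)
```

Calling plot(hc) draws the dendrogram, which can help when deciding where to cut.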
Predicting wine quality using Random Forests
Hello everyone! In this article I will show you how to run the random forest algorithm in R. We will use the wine quality data set (white) from the UCI Machine Learning Repository. What is the Random Forest Algorithm? In a previous post, I outlined how to build decision trees in R. While decision trees are easy to interpret, they tend to be rathe...
Interactive plotting with rbokeh
Hello everyone! In this post, I will show you how you can use rbokeh to build interactive graphs and maps in R. What is bokeh? Bokeh is a popular Python library used for building interactive plots and maps, and now it is also available in R, thanks to Ryan Hafen. It is very powerful for creating good-looking plots for the web easi...