Publications by Daniel Emaasit

Kick-off Meetup (Intro to using R for data analysis)

02.09.2014

In our first meetup, we were introduced to R as a data analysis tool. Dr. Dennis Murphy gave a brief history of R and then a short introduction to using R for data analysis, aimed at beginners. You can find slides for this meetup in our GitHub repository or click here to launch the presentation. The post Kick-...

858 sym

dplyr: Data Manipulation in R

07.10.2014

dplyr is a package by Hadley Wickham for efficient data manipulation, built on a grammar of data manipulation. It works efficiently with data frames, data tables, databases and more. Dr. Dennis Murphy gave an interactive presentation of the elements that make dplyr the go-to package for data munging in R. He also provided Rscri...

1007 sym

R for in-Hadoop Analytics: with Big Data Developer meetup Group

26.10.2014

We were honoured to have a joint event with the Big Data Developer Meetup Group, where we were introduced to IBM's BigR package for in-Hadoop analytics. Mr. Rafeal Coss and Mr. Brandon MacKenzie demonstrated the workings of BigR, the integration of R into Hadoop using IBM BigInsights. You can download the slides of this presentation by clicking...

1417 sym

ggplot2: Elegant Graphics for Data Analysis

05.11.2014

ggplot2 is a plotting system for R by Hadley Wickham, based on the grammar of graphics. ggplot2 tries to take the good parts of base and lattice graphics and none of the bad parts. As a contributor to the package, Dr. Dennis Murphy was able to paint a clear picture of how ggplot2 takes care of many of the fiddly details that make plotting a hass...

1209 sym

R and Science of Predictive Analytics

03.12.2014

Predictive analytics is the practice of extracting information from existing data sets in order to determine patterns and predict future outcomes and trends. We began with an overview of prediction methods in R, and then discussed some case studies of how R is being used for real-world problems. The post R and Science of Predictive Analytics ...

811 sym

Classification and Regression Trees using R

07.01.2015

Recursive partitioning is a fundamental tool in data mining. It helps us explore the structure of a data set while developing easy-to-visualize decision rules for predicting a categorical (classification tree) or continuous (regression tree) outcome. Classification and regression trees can be generated through the rpart package. The post Cl...

852 sym

Launching DataScience.Vegas Blog

23.02.2015

We are glad to announce the launch of DataScience.Vegas as a blog that aggregates all the events, news and information impacting the Las Vegas data science community. Our community has witnessed the birth and steady growth of several data science meetup groups with a very enthusiastic group of devoted members. We are a community of data scienti...

2302 sym

Scalable Machine Learning for Big Data Using R and H2O

28.02.2015

H2O is an open source parallel processing engine for machine learning on Big Data. This prediction engine is designed by H2O, a Mountain View-based startup that has implemented a number of impressive statistical and machine learning algorithms to run on HDFS, S3, SQL and NoSQL. We were honored to have Tom Kraljevic (Vice Preside...

2577 sym

Launch Apache Spark on AWS EC2 and Initialize SparkR Using RStudio

10.11.2015

This post was first published on SparkIQ Labs’ blog and re-posted on my personal blog. Introduction: In this blog post, we shall learn how to launch a Spark standalone cluster on Amazon Web Services (AWS) Elastic Compute Cloud (EC2) for analysis of Big Data. This is a continuation of our previous blog post, which showed us how to download Apac...

6653 sym 50 img

Launching Data Science Africa Blog

26.01.2016

We are glad to announce the launch of datascience-africa.org as a blog that aggregates all the events, news and information impacting the data science community in some of the major cities in Africa. Our community has witnessed the birth and steady growth of several data science meetup groups with a very enthusiastic group of devoted members. ...

2640 sym