Publications by Perceptive Analytics

Machine Learning Using Support Vector Machines

19.04.2017

Support Vector Machines (SVM) is a data classification method that separates data using hyperplanes. The concept of SVM is very intuitive and easily understandable. If we have labeled data, SVM can be used to generate multiple separating hyperplanes such that the data space is divided into segments and each segment contains only one kind of data....

8719 sym R (3604 sym/9 pcs) 14 img

Implementing Parallel Processing in R

07.08.2017

If a task takes less time when done through parallel processing, why not do it and save time? Modern laptops and PCs have multi-core processors with a sufficient amount of memory available, and one can use them to generate outputs quickly. Parallelizing your code has numerous advantages. Instead of waiting several minutes o...

7362 sym R (8460 sym/16 pcs)

Exploring Assumptions of K-means Clustering using R

07.08.2017

K-Means Clustering is a well-known technique based on unsupervised learning. As the name suggests, it forms ‘K’ clusters over the data using the mean of the data. Unsupervised algorithms are a class of algorithms one should approach carefully. Using the wrong algorithm will give completely botched results and all the effort will go...

8040 sym R (4607 sym/12 pcs) 20 img

How to Perform Hierarchical Clustering using R

18.12.2017

What is Hierarchical Clustering? Clustering is a technique to group similar data points together and separate dissimilar observations into different groups or clusters. In Hierarchical Clustering, clusters are created such that they have a predetermined ordering, i.e. a hierarchy. For example, consider the concept hierarchy of a library....

12420 sym R (2873 sym/9 pcs) 24 img

How to implement Random Forests in R

09.01.2018

Imagine you were to buy a car: would you just go to a store and buy the first one that you see? No, right? You usually consult a few people around you, take their opinions, add your own research and then make the final decision. Let’s take a simpler scenario: whenever you go for a movie, do you ask your friends for reviews about th...

8683 sym R (7388 sym/9 pcs)

Understanding Naïve Bayes Classifier Using R

22.01.2018

The Best Algorithms are the Simplest: The field of data science has progressed from simple linear regression models to complex ensembling techniques, but the most preferred models are still the simplest and most interpretable. Among them are regression, logistic, trees and naive Bayes techniques. The Naive Bayes algorithm, in particular, is a logic bas...

12183 sym R (4725 sym/10 pcs)

Whys and Hows of Apply Family of Functions in R

22.02.2018

Introduction to Looping system: Imagine you were to perform a simple task, say calculating the column sums of a 3×3 matrix. What do you think is the best way? Calculating it directly using traditional methods such as a calculator or even pen and paper doesn’t sound like a bad approach. A lot of us may prefer to just calculate it manually in...

9026 sym R (3612 sym/14 pcs)

Steps to Perform Survival Analysis in R

26.03.2018

Another way of analysis? When there are so many tools and techniques for predictive modelling, why do we have another field known as survival analysis? As one of the most popular branches of statistics, survival analysis is a way of making predictions at various points in time. This is to say, while other prediction models make predictions of whether an ev...

9030 sym R (8684 sym/11 pcs) 8 img

Discriminant Analysis: Statistics All The Way

27.03.2018

Discriminant analysis is used when the variable to be predicted is categorical in nature. This analysis requires that the way data points are assigned to their respective categories is known, which makes it different from cluster analysis, where the classification criteria are not known. It works by calculating a score based on all the predict...

9625 sym R (4074 sym/10 pcs) 8 img

Exploratory Factor Analysis in R

10.05.2018

Changing Your Viewpoint for Factors: In real life, data tends to follow some patterns, but the reasons are not apparent right from the start of the data analysis. Taking the common example of a demographics-based survey, many people will answer questions in a particular ‘way’. For example, all married men will have higher expenses than single me...

9411 sym R (4121 sym/6 pcs) 2 img