Publications by Nagdev Amruthnath
Testing the Effect of Data Imputation on Model Accuracy
Most of us have come across situations where we do not have enough data to build reliable models, for reasons such as the expense of collecting data (human studies), limited resources, or a lack of historical data (earthquakes). Before we begin talking about how to overcome the challenge, let’s first talk about w...
9613 sym R (14433 sym/26 pcs) 4 img 8 tbl
Data Science Application in Manufacturing
Last week, I had a great opportunity to give a talk on data science applications in manufacturing at Acharya Institute of Technology (AIT), Bangalore. As I am an alumnus, AIT has a special place in my heart. The many curious young minds who attended my session had great questions. Some highlights of the Q&A session follow. Questions: What is the differe...
5414 sym
Why balancing your data set is important?
In the real world, it’s not uncommon to come across unbalanced data sets where you might have class A with 90 observations and class B with 10 observations. One of the rules of machine learning is that it’s important to balance the data set, or at least get it close to balanced. The main reason for this is to give equal priority to each class in laym...
4107 sym R (1045 sym/2 pcs) 8 img
Visualizing Principle Components for Images
Principal Component Analysis (PCA) is a great tool for data analysis projects for a lot of reasons. If you have never heard of PCA, in simple words, it performs a linear transformation of your features using covariance or correlation. I will add a few links below if you want to know more about it. Some of the applications of PCA are dimensional redu...
4413 sym R (1075 sym/3 pcs) 6 img
How to become a data scientist in 30 days?
Late one evening, I was scrolling through Reddit and came across a news article titled “Why Bill Gates wants us all to get vaccinated?”. The news site looked legitimate. Halfway through the article, I noticed quite a few grammatical errors. Being a lurker, I switched to the comments and saw a few of them mention the article being AI generate...
12716 sym R (670 sym/2 pcs) 4 img
Will Netflix Renew the Show?
In the last couple of years, Netflix has become a part of my lifestyle. At the end of my day, when I turn on my TV, I’m tuned by default to check out Netflix. I always look forward to Friday, when they release their original content, and I make sure I binge it by the end of my weekend. My wife and I recently binged their reality TV show called “Ind...
6765 sym R (4882 sym/6 pcs) 2 img
How to use CI/CD for your ML Projects?
The term CI/CD stands for Continuous Integration and Continuous Delivery (or Deployment). Before we jump into how all of this works, let’s take a step back and walk through the process of ML. Most data scientists do their data analytics on their laptops. For every data analytics project there are various steps involved and most common one...
9704 sym R (1104 sym/3 pcs) 18 img
Benford’s Law: Applying to Existing Data
Benford’s Law is an underrated yet widely used technique with a variety of applications. The United States IRS neither confirms nor denies its use of Benford’s Law to detect manipulated numbers in income tax filings. Across the Atlantic, the EU is very open about its use of Benford’s Law and proudly claims it. Toda...
10938 sym R (3873 sym/6 pcs) 10 img 1 tbl
Big Data Ignite 2020 Webinar Series
Big Data Ignite (BDI) was born out of a shared vision: to foster a local center of excellence in advanced computing technologies and practice. After initial success in organizing local Meetup groups, co-founders Elliott and Tuhin realized that to achieve their goal, the scope and scale of their activism would need to grow. So, in 2016, the Big Data Ign...
2538 sym
Sentiment Analysis on Reddit using R
According to Wikipedia, Reddit is an American social news aggregation, web content rating, and discussion website. Registered members submit content to the site such as links, text posts, images, and videos, which are then voted up or down by other members. Posts are organized by subject into user-created boards called “communities” or ...
3259 sym R (867 sym/4 pcs)