Publications by Nagdev Amruthnath

Testing the Effect of Data Imputation on Model Accuracy

20.06.2020

Most of us have come across situations where we do not have enough data to build reliable models, for reasons such as the expense of collecting data (human studies), limited resources, or a lack of historical data (earthquakes). Before we begin talking about how to overcome this challenge, let’s first talk about w...

9613 sym R (14433 sym/26 pcs) 4 img 8 tbl

Data Science Application in Manufacturing

22.06.2020

Last week, I had a great opportunity to give a talk on data science application in manufacturing at Acharya Institute of Technology (AIT), Bangalore. As an alumnus, I hold a special place in my heart for AIT. Many of the curious young minds who attended my session had great questions. Some of the highlights of the Q&A session are Questions What is the differe...

5414 sym

Why balancing your data set is important?

24.06.2020

In the real world, it’s not uncommon to come across unbalanced data sets where you might have class A with 90 observations and class B with 10 observations. One of the rules in machine learning is that it’s important to balance the data set, or at least get it close to balanced. The main reason for this is to give equal priority to each class in laym...

4107 sym R (1045 sym/2 pcs) 8 img

Visualizing Principle Components for Images

28.06.2020

Principal Component Analysis (PCA) is a great tool for data analysis projects for a lot of reasons. If you have never heard of PCA, in simple words, it performs a linear transformation of your features using covariance or correlation. I will add a few links below if you want to know more about it. Some of the applications of PCA are dimensional redu...

4413 sym R (1075 sym/3 pcs) 6 img

How to become a data scientist in 30 days?

29.06.2020

Late one evening, I was scrolling through Reddit and came across a news article about “Why Bill Gates wants us all to get vaccinated?”. The news site looked legitimate. I was halfway through the article and saw quite a few grammatical errors. Being a lurker, I switched to the comments and saw a few of them mention the article being AI generate...

12716 sym R (670 sym/2 pcs) 4 img

Will Netflix Renew the Show?

07.08.2020

In the last couple of years, Netflix has become a part of my lifestyle. At the end of my day, when I turn on my TV, by default I’m tuned to check out Netflix. I always look forward to Friday, when they release their original content, and make sure I binge it by the end of my weekend. My wife and I recently binged their reality TV show called “Ind...

6765 sym R (4882 sym/6 pcs) 2 img

How to use CI/CD for your ML Projects?

13.08.2020

The term CI/CD stands for Continuous Integration and Continuous Delivery/Deployment. Before we jump into how all of this works, let’s take a step back and walk through the process of ML. Most data scientists do their data analytics on their laptops. For every data analytics project there are various steps involved and most common one...

9704 sym R (1104 sym/3 pcs) 18 img

Benford’s Law: Applying to Existing Data

18.08.2020

Benford’s Law is one of the most underrated yet widely used techniques across a variety of applications. The United States IRS neither confirms nor denies its use of Benford’s law to detect number manipulation in income tax filings. Across the Atlantic, the EU is very open and proudly claims its use of Benford’s law. Toda...

10938 sym R (3873 sym/6 pcs) 10 img 1 tbl

Big Data Ignite 2020 Webinar Series

15.09.2020

Big Data Ignite (BDI) was born out of a shared vision: To foster a local center of excellence in advanced computing technologies and practice. After initial success in organizing local Meetup groups, co-founders Elliott and Tuhin realized that to achieve their goal, the scope and scale of activism would need to grow. So, in 2016, the Big Data Ign...

2538 sym

Sentiment Analysis on Reddit using R

24.06.2021

According to Wikipedia, Reddit is an American social news aggregation, web content rating, and discussion website. Registered members submit content to the site such as links, text posts, images, and videos, which are then voted up or down by other members. Posts are organized by subject into user-created boards called “communities” or ...

3259 sym R (867 sym/4 pcs)