Publications by Anoop
Reproducible Research Assignment 1
This assignment makes use of data from a personal activity monitoring device. This device collects data at 5 minute intervals throughout the day. The data consists of two months of data from an anonymous individual collected during the months of October and November, 2012 and includes the number of steps taken in 5 minute intervals each...
1504 sym R (4376 sym/32 pcs) 4 img
Automatic spam detection using a logistic regression model
library(kernlab)
data(spam)
Subsampling spam dataset
# perform the subsampling
trainIndicator = rbinom(4601, size = 1, prob = 0.5)
table(trainIndicator)
## trainIndicator
##    0    1
## 2332 2269
trainSpam = spam[trainIndicator == 1, ]
testSpam = spam[trainIndicator == 0, ]
Exploratory Data Analysis
head(trainSpam)
table(trainSpam$type...
1152 sym R (7119 sym/26 pcs) 4 img
Collaborative Filtering in R
Recommendation systems (sometimes called recommender systems) are a collection of algorithms used to recommend items to users based on information taken from the user. These systems have become ubiquitous and can commonly be seen in online stores, movie databases and job finders. In this notebook, we will explore recommendation systems based on ...
6110 sym R (19352 sym/55 pcs)
Linear Regression in R
The work of a data scientist goes way beyond performing basic tasks such as analyzing and visualizing data. Now, I will explore another exciting tool of data analysis: predictions and estimates. Get to know the data: this data corresponds to the percentage of the population with access to improved sanitation facilities and life expectancy (in year...
1005 sym R (5221 sym/25 pcs) 5 img
Decision Trees in R with Mushrooms dataset
Hello, and welcome to Decision Trees in R. Here, we will go over what Decision Trees are, what they are used for, and how to use them in the R environment. The Classification Problem: Suppose we are lost in a forest and are very hungry. Unable to go on without eating something first, we take a look around, only t...
4526 sym R (9375 sym/29 pcs) 1 img
Random Forests in R
We will be going over what Random Forests are, what they are used for, and how to use them in an R environment. Why do we need Random Forests? You might be familiar with the concept of Decision Trees, a probabilistic predictive model which can be used to classify data in a wide array of applications. Decision Trees are crea...
4329 sym R (2258 sym/21 pcs) 1 img
K Nearest Neighbours using R
We are going to introduce the 𝐾-nearest neighbors (KNN) algorithm and show some practical ways of using it in R with the knn function that exists in the class library. I...
3974 sym R (9533 sym/42 pcs)
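As a rough illustration of the knn function from the class library mentioned in the excerpt above, here is a minimal sketch. It is not taken from the publication itself: the iris dataset, the 70/30 split, the seed, and k = 5 are all assumptions chosen only to keep the example self-contained.

```r
# Minimal KNN sketch using class::knn.
# Assumptions (not from the original post): iris data, 70/30 split, k = 5.
library(class)  # recommended package shipped with standard R installations

set.seed(42)  # arbitrary seed so the random split is repeatable
n <- nrow(iris)
train_idx <- sample(n, size = round(0.7 * n))

# knn() expects numeric feature matrices/data frames and a factor of labels
train_x <- iris[train_idx, 1:4]
test_x  <- iris[-train_idx, 1:4]
train_y <- iris$Species[train_idx]
test_y  <- iris$Species[-train_idx]

# Classify each test row by majority vote among its 5 nearest training rows
pred <- knn(train = train_x, test = test_x, cl = train_y, k = 5)

# Confusion table and overall accuracy on the held-out rows
print(table(predicted = pred, actual = test_y))
accuracy <- mean(pred == test_y)
print(accuracy)
```

Because knn relies on Euclidean distance, features on very different scales would normally be standardized first (for example with scale()); the four iris measurements are all in centimetres, so that step is skipped here.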
Exploratory Data Analysis Project 2
Fine particulate matter (PM2.5) is an ambient air pollutant for which there is strong evidence that it is harmful to human health. In the United States, the Environmental Protection Agency (EPA) is tasked with setting national ambient air quality standards for fine PM and for tracking the emissions of this pollutant into the atmosphere. Approxima...
1449 sym R (4452 sym/21 pcs) 6 img
Electric Power Consumption
This assignment uses data from the UC Irvine Machine Learning Repository, a popular repository for machine learning datasets. In particular, we will be using the “Individual household electric power consumption Data Set” which I have made available on the course web site...
150 sym R (3468 sym/8 pcs) 4 img
Logistic Regression in R
What is the difference between Linear and Logistic Regression? While Linear Regression is suited for estimating continuous values (e.g. estimating house price), it isn’t the best tool for predicting the class of an observed data point. In order to estimate a classification, we need some sort of guidance on what would be the most probable class f...
1802 sym R (15873 sym/68 pcs) 6 img
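The contrast drawn in the excerpt above, between estimating a continuous value and estimating the most probable class, can be sketched with base R. This is an assumed example, not the publication's own code: the built-in mtcars data, the am ~ wt formula, the three hypothetical car weights, and the 0.5 cutoff are chosen purely for illustration.

```r
# Linear vs logistic regression on a binary outcome (illustrative sketch).
# Assumptions (not from the original post): mtcars data, am ~ wt, cutoff 0.5.
# am: 0 = automatic, 1 = manual; wt: car weight in 1000 lbs.

lin_fit <- lm(am ~ wt, data = mtcars)                      # treats 0/1 as a number
log_fit <- glm(am ~ wt, data = mtcars, family = binomial)  # models P(am = 1)

# Hypothetical new cars: one light, one mid-weight, one heavy
new_cars <- data.frame(wt = c(1.5, 3.0, 5.5))

# Linear predictions are unbounded, so they may leave the [0, 1] range
lin_pred <- predict(lin_fit, newdata = new_cars)

# type = "response" returns probabilities, which always lie in [0, 1]
log_prob <- predict(log_fit, newdata = new_cars, type = "response")

# Turn each probability into a class label with a 0.5 cutoff
log_class <- ifelse(log_prob > 0.5, 1, 0)

print(data.frame(wt = new_cars$wt, lin_pred, log_prob, log_class))
```

The logistic fit passes the linear predictor through the sigmoid function, which is what keeps its output interpretable as a class probability.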