Publications by Amit Kapoor
Data607 - Project2B
About the data The data comes from the R package fueleconomy, which sources it from the EPA (Environmental Protection Agency). Within the package, the data is stored in the vehicles dataset, which covers all cars sold in the US from 1984 to 2015. The vehicles dataset has 33,442 rows and 12 va...
3013 sym R (4851 sym/25 pcs) 4 img 3 tbl
Data605 - Assignment6
Problem 1 A box contains 54 red marbles, 9 white marbles, and 75 blue marbles. If a marble is randomly selected from the box, what is the probability that it is red or blue? Express your answer as a fraction or a decimal number rounded to four decimal places. Solution # function to calc probability of given marble Prob <- function(m, total) { r...
4237 sym R (3669 sym/31 pcs)
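The excerpt above is cut off before the solution finishes, so the following is only a minimal sketch of the arithmetic and not the author's `Prob` function. The variable names are chosen here for illustration. Because a marble cannot be both red and blue, the two probabilities simply add.

```r
# Counts taken from the problem statement
red   <- 54
white <- 9
blue  <- 75
total <- red + white + blue   # 138 marbles in the box

# Red and blue are mutually exclusive events, so P(red or blue) = P(red) + P(blue)
p_red_or_blue <- (red + blue) / total

# Rounded to four decimal places, as the problem asks
round(p_red_or_blue, 4)
# [1] 0.9348
```

As a fraction, the result is 129/138, which reduces to 43/46.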
Data605 - Assignment5
Choose independently two numbers B and C at random from the interval [0, 1] with uniform density. Prove that B and C are proper probability distributions. Note that the point (B,C) is then chosen at random in the unit square. set.seed(67) n <- 10000 B <- runif(n, min = 0, max = 1) C <- runif(n, min = 0, max = 1) cat("min(B)->", format(min(B), sci...
456 sym R (1375 sym/17 pcs) 2 img
Data 607 - Assignment 5
Introduction Data manipulation is one of the most important parts of Data Science. The purpose of this assignment is to perform data manipulation using the R packages tidyr and dplyr. Data manipulation involves rearranging, transforming, and analyzing data to make it ready for modeling. Problem Statement We have been provided the data for...
3379 sym R (8050 sym/23 pcs) 4 img
Data 607 - Project 1
Project Overview This project processes a text file of chess tournament results as shown below. The tournament results follow a regular structure, and below is a glimpse of the data from the given file. The goal of this project is to generate a .CSV file (which could for example be imported into a SQL database) with the following information of the ...
3535 sym R (19904 sym/45 pcs) 3 img 1 tbl
Data 605 - Assignment 4
Problem set 1 1. In this problem, we’ll verify using R that SVD and Eigenvalues are related as worked out in the weekly module. Given a 2x3 matrix A \(A = \left( \begin{matrix} 1&2&3 \\ -1&0&4 \end{matrix} \right)\) write code in R to compute X = A\(\mathbf{A}^\intercal\) and Y = \(\mathbf{A}^\intercal\)A. Then, compute the eigenvalues and ei...
2430 sym R (4326 sym/41 pcs)
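The problem statement above is truncated, so the following is a hedged sketch of one way the relationship could be checked in base R, not the author's solution. It uses the matrix A given in the excerpt and relies on the standard fact that the squared singular values of A equal the non-zero eigenvalues of both X and Y.

```r
# Matrix A from the problem statement (2 rows, 3 columns)
A <- matrix(c( 1, 2, 3,
              -1, 0, 4), nrow = 2, byrow = TRUE)

X <- A %*% t(A)   # 2 x 2
Y <- t(A) %*% A   # 3 x 3

eig_X <- eigen(X)   # eigenvalues and eigenvectors of X
eig_Y <- eigen(Y)   # eigenvalues and eigenvectors of Y
sv    <- svd(A)     # singular value decomposition of A

# Squared singular values should match the eigenvalues of X
# and the two non-zero eigenvalues of Y (up to floating-point error)
sv$d^2
eig_X$values
eig_Y$values[1:2]
```

The third eigenvalue of Y should be zero up to rounding, since A has rank 2. The singular vectors in `sv$u` and `sv$v` may differ from the eigenvectors in sign, so a comparison of vectors has to allow for a factor of -1.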
Data 607 - Assignment 3
library(dplyr) ## ## Attaching package: 'dplyr' ## The following objects are masked from 'package:stats': ## ## filter, lag ## The following objects are masked from 'package:base': ## ## intersect, setdiff, setequal, union library(stringr) theLink <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/majors-l...
1794 sym R (2936 sym/21 pcs)
Data 605 - HW2
1. Problem set 1 (1) Show that \(A\) \(\mathbf{A}^\intercal\) \(\neq\) \(\mathbf{A}^\intercal\) \(A\) in general. (Proof and demonstration.) Proof:- Let A be an m x n matrix. By definition its transpose \(\mathbf{A}^\intercal\) will be an n x m matrix. Given the definition of matrix multiplication, multiplying A by \(\mathbf{A}^\intercal\) will ret...
2798 sym R (875 sym/5 pcs)
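The proof above is truncated, so the following is only an illustrative sketch of the demonstration step, with matrices chosen here as examples and not taken from the author's write-up. It shows two cases: a non-square matrix, where the two products do not even share dimensions, and a square matrix, where the dimensions match but the entries differ.

```r
# Case 1: non-square matrix (2 x 3); the products have different shapes
B <- matrix(1:6, nrow = 2)
dim(B %*% t(B))   # 2 x 2
dim(t(B) %*% B)   # 3 x 3

# Case 2: square matrix; same shape, different entries
A   <- matrix(c(1, 2,
                3, 4), nrow = 2, byrow = TRUE)
AAt <- A %*% t(A)   # rows: (5, 11), (11, 25)
AtA <- t(A) %*% A   # rows: (10, 14), (14, 20)

# FALSE here means the two products are not equal
isTRUE(all.equal(AAt, AtA))
```

A single counterexample is enough to show the products differ in general. Equality can still hold for particular matrices, for example symmetric ones.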
Data 607 - Assignment 1
Overview This data originates from “Where People Go To Check The Weather”. The source of the data is a Survey Monkey Audience poll commissioned by FiveThirtyEight and conducted from April 6 to April 10, 2015. Data Source https://fivethirtyeight.com/features/weather-forecast-news-app-habits/ Data Description RespondentID Do you typically ch...
2177 sym R (10071 sym/18 pcs) 2 img
Data 605 - HW1
1. Problem set 1 You can think of vectors as representing many dimensions of related information. For instance, Netflix might store all the ratings a user gives to movies in a vector. This is clearly a vector of very large dimensions (in the millions) and very sparse as the user might have rated only a few movies. Similarly, Amazon might store the i...
2196 sym R (1138 sym/15 pcs)