Publications by Orli Khaimova

DATA 607 Project 1: “Data Analysis: Chess Tournament”

20.09.2020

Reading the Data URL <- "https://raw.githubusercontent.com/okhaimova/DATA-607/master/Project1/tournamentinfo" tournamenttemp <- read.csv(URL, header = FALSE, sep = "|") Cleaning up the Data After reading the data, we have to clean it up. I removed the dashes and then separated the data into two separate data frames. One consists of the pair num...

1717 sym R (3946 sym/9 pcs)

DATA 607 Assignment 3

13.09.2020

Problem 1 Using the 173 majors listed in fivethirtyeight.com’s [College Majors dataset](https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/), provide code that identifies the majors that contain either “DATA” or “STATISTICS”. URL <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/college-ma...

1999 sym R (1873 sym/14 pcs)
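
A minimal sketch of the kind of filter Problem 1 of Assignment 3 asks for, assuming the major names are held in a character vector. The `majors` vector below is a small illustrative sample rather than the 173-row dataset, and `matches` is a hypothetical name; this is not necessarily how the assignment itself solved the problem.

```r
# Illustrative sample only; the real dataset lists 173 majors.
majors <- c(
  "GENERAL AGRICULTURE",
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "STATISTICS AND DECISION SCIENCE",
  "MECHANICAL ENGINEERING",
  "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
)

# grepl() returns TRUE for every element that matches the pattern.
# The "|" alternation matches either word, so a single pass is enough.
matches <- majors[grepl("DATA|STATISTICS", majors)]

print(matches)
```

With the full dataset, the same `grepl()` call could be applied to the column that holds the major names.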

Week 5 Homework

27.09.2020

Loading Libraries library(tidyverse) ## -- Attaching packages --------------------------------------------------------------------------------- tidyverse 1.3.0 -- ## v ggplot2 3.3.2 v purrr 0.3.4 ## v tibble 3.0.3 v dplyr 1.0.0 ## v tidyr 1.1.0 v stringr 1.4.0 ## v readr 1.3.1 v forcats 0.5.0 ## -- Conflicts ----------...

1084 sym R (4436 sym/20 pcs) 3 img

Week 9 Assignment – Web APIs

25.10.2020

Description Your task is to choose one of the New York Times APIs, construct an interface in R to read in the JSON data, and transform it into an R DataFrame. Current best sellers lists for hardcover fiction books I pulled data from the list of current hardcover fiction books and transformed it into an R data frame. Since the data frame was lar...

909 sym R (3217 sym/3 pcs)

DATA 607 Final Project - COVID Rates vs Election Results Presentation

09.12.2020

COVID rates vs. Election Results in NYC DATA 607 Final Project Shana Green, Mark Gonsalves, Dominika Markowska-Desvallons, Orli Khaimova, John Mazon Introduction As a group, we worked with public CSV data from NYC Health Dept relating to positive cases by zip code in New York City. Secondly, we utilized our shared GitHub to upload a CSV with informat...

7386 sym R (11265 sym/25 pcs) 12 img

DATA 606 Project

04.12.2020

Overview Forced expiratory volume (FEV) is a measure of lung capacity. It measures how much a person can exhale during a forced breath. During the 1970s, data was collected from youths in Boston. This particular data is a cross-sectional subset of the larger study, which also examined second-hand smoke. The data contains 654 cases, each one being a ...

3854 sym R (2833 sym/9 pcs) 11 img

Week 10 Assignment – Sentiment Analysis

01.11.2020

Overview The task at hand was to get the primary example code from Chapter 2 about sentiment analysis in Text Mining with R. Then, we had to extend the code by either working with a different corpus or another sentiment lexicon. Sentiments dataset We are loading the sentiments data set with the AFINN, bing, and nrc lexicons. They are based on si...

4096 sym R (11320 sym/64 pcs) 8 img

DATA 607 Project 4: “Document Classification”

15.11.2020

Introduction As a group, we worked with two files containing spam and ham to predict if a document is spam or not. By utilizing our ‘training’ documents, our group was able to classify the “test” documents. We were able to communicate via Zoom meetings and collaborate through GitHub. For this project, we started with a spam/ham dataset, the...

4665 sym R (5686 sym/16 pcs)

DATA 607 Final Project - COVID Rates vs Election Results

06.12.2020

Introduction As a group, we worked with public CSV data from NYC Health Dept relating to positive cases by zip code in New York City. Secondly, we utilized our shared GitHub to upload a CSV with information regarding Presidential Voting Results by election district which we were able to find here. We were able to communicate via phone call, text ...

6873 sym R (12602 sym/27 pcs) 12 img

DATA 605 Final

26.05.2021

Problem 1 Using R, generate a random variable X that has 10,000 random uniform numbers from 1 to N, where N can be any number of your choosing greater than or equal to 6. Then generate a random variable Y that has 10,000 random normal numbers with a mean of \(\mu = \sigma = \frac{N+1}{2}\) set.seed(1234) N = 10 X = runif(10000, 1, N) Y = rnor...

10507 sym R (32267 sym/72 pcs) 18 img 5 tbl