Publications by Sung Lee

Data 607 Week 7 Assignment

14.03.2020

Assignment on RPubs Rmd on GitHub Introduction This week’s assignment is to work with three formats that are ubiquitous on the web: HTML, XML, and JSON. The assignment works with, and identifies the differences between, three files: books.html, books.xml, and books.json. The books in the brief inventory are from the dystopian science fiction genre. One book Look...

2365 sym R (3640 sym/24 pcs)

Data 605 Final Project

03.12.2020

The YouTube video of the presentation is here Problem 1 Using R, generate a random variable X that has 10,000 random uniform numbers from 1 to N, where N can be any number of your choosing greater than or equal to 6. Then generate a random variable Y that has 10,000 random normal numbers with a mean of \(\mu = \sigma = (N+1)/2\). set.seed(2020) ...

7909 sym R (29668 sym/71 pcs) 7 img

Data 605 Final Project

02.12.2020

Problem 1 Using R, generate a random variable X that has 10,000 random uniform numbers from 1 to N, where N can be any number of your choosing greater than or equal to 6. Then generate a random variable Y that has 10,000 random normal numbers with a mean of \(\mu = \sigma = (N+1)/2\). set.seed(2020) N <- 8 # mu = sigma = (N + 1)/2 mu <- (N+1)...

7863 sym R (29668 sym/71 pcs) 7 img

Data 608 Homework 2

22.09.2021

Introduction Author: Sung Lee Semester: Fall 2021 This Jupyter Notebook is for my Data 608 course. GitHub Link In [1]: # The following lines were added to get the Jupyter Notebook working in Colab !pip install datashader !pip install pyproj # Be sure to update IPython, otherwise GeoJSON will not work # Also be sure to restart the runtime envir...

7954 sym R (15827 sym/17 pcs) 4 img 2 tbl

Data 608 Homework 1

03.09.2021

Original source for this assignment is from https://github.com/charleyferrari/CUNY_DATA_608/blob/master/module1/hw1.rmd Principles of Data Visualization and Introduction to ggplot2 I have provided you with data about the 5,000 fastest growing companies in the US, as compiled by Inc. magazine. Let's read this in: inc <- read.csv("https://raw.gi...

2720 sym R (5044 sym/19 pcs) 3 img 1 tbl

Data 621: Homework 1

18.09.2021

Introduction The purpose of this homework assignment is to build a multiple linear regression model on the training data to predict the number of wins for the team given in the data set. Data Exploration Describe the size and the variables in the moneyball training data set. Consider that too much detail will cause a manager to lose interest whil...

7605 sym R (43754 sym/97 pcs) 7 img 1 tbl

Data 621 Homework 2

01.10.2021

Overview In this homework assignment, you will work through various classification metrics. You will be asked to create functions in R to carry out the various calculations. You will also investigate some functions in packages that will let you obtain the equivalent results. Finally, you will create graphical output that also can be used to evalu...

5835 sym R (12372 sym/26 pcs) 4 img 2 tbl

Data 622 Homework 1

11.03.2022

Introduction This assignment is the first homework for Data 622. The following is the assignment: Visit the following website and explore the range of sizes of this dataset (from 100 to 5 million records). https://eforexcel.com/wp/downloads-18-sample-csv-files-data-sets-for-testing-sales/ Based on your computer’s capabilities (memory, CPU), s...

6225 sym R (16395 sym/50 pcs) 2 img

Data 622 Homework 2

26.03.2022

Assignment Based on the latest topics presented, bring a dataset of your choice and create a Decision Tree where you can solve a classification or regression problem and predict the outcome of a particular feature or detail of the data used. Switch variables to generate 2 decision trees and compare the results. Create a random forest for regr...

3824 sym R (13492 sym/44 pcs) 3 img

Data 622 Homework 4

04.05.2022

RPubs Link GitHub Link Assignment You get to decide which dataset you want to work on. The data set must be different. You can work on a problem from your work, or something you are interested in. You may also obtain a dataset from sites such as Kaggle, Data.Gov, Census Bureau, USGS or another open data portal. Select one of the methodologies...

7966 sym R (17549 sym/48 pcs) 82 img 87 tbl