Publications by Rongen Zhang

UnstructuredDataMgmt-Summer2021-Lab2

10.06.2021

CIS 4730 Unstructured Data Management Lab: Data types Rongen Zhang Recap The RStudio environment Basic operators Basic data types Define variables variable_name <- some_value target <- data Agenda Advanced data types Vector List Matrix Data frame R Data Types Numeric Character Logic Factor Vector: a set of values, all of the same data typ...

4600 sym R (6780 sym/135 pcs)

Web Crawling

06.07.2021

CIS 4730 Unstructured Data Management Lab: Web scraping Rongen Zhang Getting data from the Web There are many ways to obtain data from the Internet; let’s consider four categories: click-and-download on the internet as a “flat” file, such as .csv, .xls install-and-play an API for which someone has written a handy R package API-query publi...

4951 sym R (15709 sym/30 pcs)

Publish Document

06.07.2021

CIS 4730 Unstructured Data Management Lab: Web scraping Rongen Zhang Getting data from the Web There are many ways to obtain data from the Internet; let’s consider four categories: click-and-download on the internet as a “flat” file, such as .csv, .xls install-and-play an API for which someone has written a handy R package API-query publi...

4951 sym R (15713 sym/30 pcs)

Lab 05 Data Manipulation

24.06.2021

CIS 4730 Unstructured Data Management Lab: Data manipulation Rongen Zhang The tidyverse The tidyverse is a collection of R packages designed for data science. https://www.tidyverse.org/ Install the complete tidyverse with: install.packages("tidyverse") Load the tidyverse into the R environment library(tidyverse) ## ── Attaching package...

5548 sym R (8340 sym/47 pcs) 1 img

CIS 4730 - Lab 03

15.06.2021

CIS 4730 Unstructured Data Management Lab: Data input, output, and summary Rongen Zhang Agenda Data input Data output Data summary Summarizing data with figures Getting data into R Importing data into R is fairly simple. We can use built-in functions or libraries to read data from the following sources: Text file (.txt) Comma-separated values...

6663 sym R (10506 sym/50 pcs) 14 img

Lab 04 Flow Control

20.06.2021

CIS 4730 Unstructured Data Management Lab: Flow control Rongen Zhang Flow control if-else for while function if-else An if statement consists of a logic condition (TRUE or FALSE) followed by one or more statements. # Template in words if(a logic condition) { Get inside the curly brackets and run this block when the condition is true } #...

3378 sym R (2911 sym/37 pcs) 1 tbl

Lab 06 Text Processing

29.06.2021

CIS 4730 Unstructured Data Management Lab: Text processing Rongen Zhang Agenda String manipulation Regular expression Package stringr The stringr package provides a cohesive set of functions designed to make working with strings as easy as possible. stringr is included in the tidyverse but it is not loaded automatically with library(tidyverse)...

5353 sym R (4260 sym/89 pcs)

Classification and Clustering

08.07.2021

CIS 4730 Unstructured Data Management Text Classification and Clustering Rongen Zhang Text Classification and Clustering Supervised Learning: Text classification k-nearest neighbor (KNN) Support vector machine (SVM) Unsupervised Learning: Text clustering K-means Hierarchical clustering Data for this lab session We will use the iris data set...

3115 sym R (6313 sym/37 pcs) 7 img

Web Crawling

18.10.2021

CIS 4730 Unstructured Data Management Lab: Web scraping Rongen Zhang Getting data from the Web There are many ways to obtain data from the Internet; let’s consider four categories: click-and-download on the internet as a “flat” file, such as .csv, .xls install-and-play an API for which someone has written a handy R package API-query publi...

4929 sym R (14230 sym/29 pcs) 1 img

Lab 03 Data Input Output and Summary

09.09.2021

CIS 4730 Unstructured Data Management Lab: Data input, output, and summary Rongen Zhang Agenda Data input Data output Data summary Summarizing data with figures Getting data into R Importing data into R is fairly simple. We can use built-in functions or libraries to read data from the following sources: Text file (.txt) Comma-separated values...

6639 sym R (10181 sym/48 pcs) 14 img