Publications by Shin Lee

BDJ21-W10-1

25.10.2021

Regular Expression: A regular expression is a special string that describes a text pattern. Character classes: Regex provides another useful construct, called character classes, which match a certain class of characters. The most common character classes in most regex engines are (Character | Matches | Same as): \\d | any digit | [0-9] ...
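The character classes can be tried directly in base R with grepl() and regmatches(). The following is a minimal sketch; the sample strings are made up for illustration. Note the doubled backslash: inside an R string literal, "\\d" is what passes \d to the regex engine.

```r
# Hypothetical sample strings, used only for illustration
x <- c("room 101", "no digits here", "tab\there", "snake_case")

# \\d matches any digit; same as [0-9]
has_digit <- grepl("\\d", x)

# \\s matches any whitespace character (space, tab, newline, ...)
has_space <- grepl("\\s", x)

# \\w matches word characters (letters, digits, underscore);
# anchored with ^ and $ so the whole string must consist of them
only_word <- grepl("^\\w+$", x)

# Extract every run of digits from each string (a list, one element per string)
digits <- regmatches(x, gregexpr("[0-9]+", x))

print(has_digit)    # TRUE FALSE FALSE FALSE
print(has_space)    # TRUE TRUE TRUE FALSE
print(only_word)    # FALSE FALSE FALSE TRUE
print(digits[[1]])  # "101"
```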

2353 sym R (1214 sym/17 pcs) 5 tbl

BDJ21-W9-1

25.10.2021

What is stringr? stringr is a package designed specifically for text preprocessing. It provides three main families of functions that make string processing more consistent, simpler, and easier: Character manipulation: these functions manipulate individual characters within the strings in character vectors. Whitespace tools to add, re...
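A minimal sketch of the two families named above, assuming the stringr package is installed; the input strings are made up. The third family is cut off in the teaser, so it is not illustrated here.

```r
library(stringr)

# Hypothetical input vector
s <- c("  Hello,   World  ", "stringr")

# Character manipulation: length, substrings, case
n_chars <- str_length(s)              # characters per element: 18 and 7
first3  <- str_sub("stringr", 1, 3)   # "str"
upper   <- str_to_upper("stringr")    # "STRINGR"

# Whitespace tools: trim the ends, collapse inner runs, pad to a width
trimmed  <- str_trim(s[1])            # "Hello,   World"
squished <- str_squish(s[1])          # "Hello, World"
padded   <- str_pad("7", width = 3, side = "left", pad = "0")  # "007"
```

str_trim() only removes leading and trailing whitespace, while str_squish() also collapses repeated inner spaces into one.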

2668 sym R (37931 sym/46 pcs) 1 tbl

BDJ21-W7-2

18.10.2021

Example of Web Scraping from NAVER News. Fields: News Headline, News Company Name, Upload Time, News Highlight, Main Text. library(XML) library(httr) news <- readLines("https://news.naver.com/main/read.naver?mode=LSD&mid=shm&sid1=104&oid=001&aid=0012726579") parsed_news <- htmlParse(news) parsed_news '//*[@id="articleTitle"]' headline <- xpathSAppl...
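The XPath extraction step can be tried offline with the XML package's htmlParse() and xpathSApply(). This minimal sketch assumes a small hand-written HTML string in place of the live NAVER page. Apart from id="articleTitle", which appears in the original code, the ids, classes, and text values are hypothetical; a real page's selectors should be checked with the browser's element inspector.

```r
library(XML)

# Hypothetical, simplified stand-in for a downloaded article page
html <- '<html><body>
<h3 id="articleTitle">Sample headline</h3>
<div class="press-name">Sample Press</div>
<span class="upload-time">2021.10.18. 09:00</span>
<div id="articleBody">First sentence of the article.</div>
</body></html>'

# asText = TRUE tells htmlParse() that the argument is HTML content, not a path
parsed <- htmlParse(html, asText = TRUE)

# Each field is one XPath query; xmlValue() returns the node's text content
headline    <- xpathSApply(parsed, '//*[@id="articleTitle"]', xmlValue)
company     <- xpathSApply(parsed, '//div[@class="press-name"]', xmlValue)
upload_time <- xpathSApply(parsed, '//span[@class="upload-time"]', xmlValue)
main_text   <- xpathSApply(parsed, '//*[@id="articleBody"]', xmlValue)
```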

291 sym R (3875 sym/2 pcs)

BDJ21-W7-1

13.10.2021

Parsing library(httr) url <- "http://www.r-datacollection.com/materials/ch-4-xpath/fortunes/fortunes.html" fortune <- httr::GET(url) fortune ## Response [http://www.r-datacollection.com/materials/ch-4-xpath/fortunes/fortunes.html] ## Date: 2021-10-13 06:56 ## Status: 200 ## Content-Type: text/html; charset=UTF-8 ## Size: 776 B ## ...

1701 sym R (5553 sym/47 pcs)

BDJ21-W6-2

10.10.2021

Parsing library(httr) url <- "http://www.r-datacollection.com/materials/ch-4-xpath/fortunes/fortunes.html" fortune <- httr::GET(url) fortune ## Response [http://www.r-datacollection.com/materials/ch-4-xpath/fortunes/fortunes.html] ## Date: 2021-10-10 14:36 ## Status: 200 ## Content-Type: text/html; charset=UTF-8 ## Size: 776 B ## ...

908 sym R (7062 sym/51 pcs) 1 img

BDJ21_W5-2

30.09.2021

2.2 Syntax rules. 1. Tags, elements, and attributes. Elements: <title>First HTML</title> Attributes: <a href="https://en.wikipedia.org/wiki/Main_Page">Link to Wikipedia!</a> https://en.wikipedia.org/wiki/Main_Page 2. Tree structure: <html> <head> <title>First HTML</title> </head> <body> <p>I am your first HTML file!</p> ...
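A minimal sketch of how these syntax rules look to a parser, assuming the XML package is installed. It combines the example document and the link element above into one page: element content is read with xmlValue(), an attribute with xmlGetAttr(), and the tree structure shows up as the children of the root html element.

```r
library(XML)

# The example document from the syntax rules, with the link placed in <body>
page <- '<html><head><title>First HTML</title></head><body><p>I am your first HTML file!</p><a href="https://en.wikipedia.org/wiki/Main_Page">Link to Wikipedia!</a></body></html>'

doc <- htmlParse(page, asText = TRUE)

# Element: the content between the opening and closing tags
page_title <- xpathSApply(doc, "/html/head/title", xmlValue)

# Attribute: a name/value pair inside the opening tag
href <- xpathSApply(doc, "//a", xmlGetAttr, "href")

# Tree structure: names of the direct children of the root <html> element
top_level <- xpathSApply(doc, "/html/*", xmlName)
```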

2441 sym R (1430 sym/11 pcs) 1 img

BDJ21_W4-1

27.09.2021

Data Storytelling Scenarios. Questions to be considered: What type of data is best suited to answering your question? Is the quality of the data high enough to answer your question? Is the information systematically flawed? Web data quality: origin of online data. What are the primary sources of secondary data? There may be situations wh...

3742 sym R (245 sym/1 pcs) 3 img

BDJ21_W5-1

25.09.2021

Before we start… Please note that you will work with R Markdown documents. An R Markdown document consists of three parts: 1) content; 2) code; 3) output (results). First, the content parts describe what you are learning and what you are asked to work on. Second, the code parts appear in grey boxes; this is what you enter in the source window of RStudio. T...

2336 sym R (3205 sym/78 pcs) 1 img

BDM21_W4

23.09.2021

Ch2. HTML. 2.1 Browser presentation and source code: HTML is marked-up, structured text, and the tags are the markup definitions. Web content is an interpreted version of the source code, which shows how the document is structured and the function of its various parts: headlines, links, tables, etc. Element inspector. 2.2 Syntax rules: Tags, elements, and attributes...

4768 sym R (43334 sym/85 pcs) 6 img 1 tbl

BDM21_W3

16.09.2021

Big Data Methodology. Large amounts of data will not overcome the selection problems that make causal inference so difficult. Causal Inference and Selection Bias. Why causal inference? Simple correlation does not imply causality. Among many different factors, it is important to identify the most significant cause. Some facts: In Korea, the coronavi...
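Why correlation alone does not imply causality can be shown with a small simulation in base R. This is a hypothetical sketch of confounding, a related but distinct problem from selection bias: a made-up common cause z drives both x and y, while x has no effect on y at all.

```r
set.seed(2021)
n <- 10000

# z is a hypothetical common cause (confounder) of both x and y
z <- rnorm(n)
x <- z + rnorm(n)   # x depends on z only
y <- z + rnorm(n)   # y depends on z only; x has NO causal effect on y

# Naive correlation is clearly positive (0.5 in theory for this setup)
r_xy <- cor(x, y)

# Once the confounder is controlled for, the coefficient on x is near zero
fit <- lm(y ~ x + z)
b_x <- unname(coef(fit)["x"])
```

With more data, r_xy only gets more precisely estimated around 0.5; it does not move toward the true causal effect of zero, which mirrors the point that large amounts of data alone do not solve the identification problem.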

5818 sym R (67032 sym/68 pcs) 4 img