Publications by Shin Lee

Big Data Methodology: Week2

12.09.2020

Big Data Methodology. What is Computational Social Science? Lazer D. et al. (2009) “Computational Social Science”, Science. Digital traces -> pictures of individual and group behavior; one-time, self-reported data -> moment-by-moment, unobtrusive observation. Possible questions: Can the diversity of news and content we receive predict our ...

6692 sym R (1442 sym/24 pcs) 3 img

R Markdown for Week 2-2

11.09.2020

2+2 ## [1] 4 Vectors We can simply consider a vector to be an ordered sequence of values of the same data type. A sequence is ordered such that the two sequences represented below are treated as two different entities by R: Vectors c(100,20,40,15,90) ## [1] 100 20 40 15 90 vector <- c(100,20,40,15,90) Type Example numeric c(1,2,3) logi...

769 sym R (1633 sym/25 pcs) 2 img

html_ex

22.09.2020

173 sym 1 img

BDM-Week4

23.09.2020

Ch2. HTML 2.1 Browser presentation and source code HTML HTML’s marked up structured Markup definitions: the tags Web content is an interpreted version of the source code How the document is structured and the function of its various parts: headlines, links, tables, etc… Element inspector 2.2 Syntax rules Tags, elements, and attributes...

4042 sym R (2360 sym/19 pcs) 3 img

OurFirstHTML

27.09.2020

I am your first HTML file! Link to Wikipedia! ...

64 sym

BDJ20-W6

05.10.2020

Parsing Loading and representing the contents of HTML/XML files in an R session Inspecting content on the Web: browser to display HTML content nicely Importing HTML files into R and extracting info. from them: parser in R to construct useful representations of HTML documents What is parsing? Reading vs. Parsing Reading does not care to underst...

2832 sym R (9301 sym/68 pcs) 2 img

BDM-Week6

07.10.2020

XPath Web scraping process Asking what information we are interested in and identifying where the information is located in a specific document Tailoring a query to the document and obtaining the desired information (Re)casting the extracted values into a format that facilitates further analysis XPath - a query language for web documents XPath...

1919 sym R (7541 sym/61 pcs) 3 img

Document

09.10.2020

library(httr) url <- "http://www.r-datacollection.com/materials/ch-4-xpath/fortunes/fortunes.html" fortune <- httr::GET(url) library(XML) parsed_fortune <- htmlParse(fortune, encoding = "UTF-8") XPath - a query language for web documents XPath is a query language that is useful for addressing and extracting parts from HTML/XML documents. Te...

1415 sym R (28387 sym/74 pcs) 1 img

BDM-Week9

28.10.2020

The circle of web scraping Information identification Choice of strategy Data retrieval Information extraction Data preparation Data validation Debugging and maintenance Generalization Retrieval scenarios library(RCurl) library(XML) library(stringr) Downloading ready-made files Data in CSV files. CSV election results data The Maryland State...

2101 sym R (164547 sym/146 pcs)

BDJ20-W9-1

26.10.2020

What is stringr? stringr is a package designed specifically for text pre-processing. It provides three main families of functions that make string processing more consistent, simpler, and easier: Character manipulation: these functions manipulate individual characters within the strings in character vector objects Whitespace tools to add, re...

3039 sym R (37892 sym/47 pcs)