Publications by Shin Lee
Big Data Methodology: Week2
Big Data Methodology. What is Computational Social Science? Lazer D. et al. (2009) “Computational Social Science”, Science. digital traces -> pictures of individual and group behavior one-time, self-reported data -> moment-by-moment, unobtrusive observation possible questions: Can the diversity of news and content we receive predict our ...
6692 sym R (1442 sym/24 pcs) 3 img
R Markdown for Week 2-2
2+2 ## [1] 4 Vectors We can simply consider a vector to be an ordered sequence of values of the same data type. A sequence is ordered such that the two sequences represented below are treated as two different entities by R: Vectors c(100,20,40,15,90) ## [1] 100 20 40 15 90 vector <- c(100,20,40,15,90) Type Example numeric c(1,2,3) logi...
769 sym R (1633 sym/25 pcs) 2 img
html_ex
Sorry, Disney+ is not available in your region. Terms of Use, Privacy Policy, Your California Privacy Rights, Children's Online Privacy Policy, Careers. © Disney. All Rights Reserved. ...
173 sym 1 img
BDM-Week4
Ch2. HTML 2.1 Browser presentation and source code HTML HTML’s marked-up structure Markup definitions: the tags Web content is an interpreted version of the source code How the document is structured and the function of its various parts: headlines, links, tables, etc… Element inspector 2.2 Syntax rules Tags, elements, and attributes...
4042 sym R (2360 sym/19 pcs) 3 img
BDJ20-W6
Parsing Loading and representing the contents of HTML/XML files in an R session Inspecting content on the Web: browser to display HTML content nicely Importing HTML files into R and extracting info. from them: parser in R to construct useful representations of HTML documents What is parsing? Reading vs. Parsing Reading does not care to underst...
2832 sym R (9301 sym/68 pcs) 2 img
BDM-Week6
XPath Web scraping process Asking what information we are interested in and identifying where the information is located in a specific document Tailoring a query to the document and obtaining the desired information (Re)casting the extracted values into a format that facilitates further analysis XPath - a query language for web documents XPath...
1919 sym R (7541 sym/61 pcs) 3 img
Document
library(httr) url <- "http://www.r-datacollection.com/materials/ch-4-xpath/fortunes/fortunes.html" fortune <- httr::GET(url) library(XML) parsed_fortune <- htmlParse(fortune, encoding = "UTF-8") XPath - a query language for web documents XPath is a query language that is useful for addressing and extracting parts from HTML/XML documents. Te...
1415 sym R (28387 sym/74 pcs) 1 img
BDM-Week9
The circle of web scraping Information identification Choice of strategy Data retrieval Information extraction Data preparation Data validation Debugging and maintenance Generalization Retrieval scenarios library(RCurl) library(XML) library(stringr) Downloading ready-made files Data in CSV files. CSV election results data The Maryland State...
2101 sym R (164547 sym/146 pcs)
BDJ20-W9-1
What is stringr? stringr is a package designed specifically for text pre-processing. It provides three main families of functions that make string processing simpler, easier, and more consistent: Character manipulation: these functions manipulate individual characters within the strings in character vector objects Whitespace tools to add, re...
3039 sym R (37892 sym/47 pcs)