Publications by Shin Lee
BDJ20-W9-2
Regular Expression Last time, we learned some basic functions from the stringr package for handling and working with text in R. But in this course, we want to unleash the power of string manipulation. So we are going to learn about regular expressions. What are Regular Expressions? The name “Regular Expression” does not say much. However, r...
13116 sym R (4152 sym/30 pcs)
BDM-TwitterAPI
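Mining Social Web Digital text, as a form of big data, can be collected in many different ways, and the tools for collecting it keep expanding. One of the most widely used big data collection tools is the Twitter API (Application Programming Interface). Twitter is a social networking service and a micro...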
5069 sym R (522 sym/2 pcs)
BDM-W12
tinytex::install_tinytex() Review on web scraping library(XML) library(stringr) baseurl <- "https://movie.naver.com/movie/point/af/list.nhn?&page=" pages <- seq(from=1, to=1000, by=1) # Maximum of 1000 pages urls <- str_c(baseurl, pages) class(urls) length(urls) head(urls) tail(urls) Things to remember before web scraping Different XPATH expr...
3626 sym R (3511 sym/13 pcs)
rtweet tutorial
This R Markdown document comes from the R documentation for the rtweet package at https://www.rdocumentation.org/packages/rtweet/versions/0.7.0 rtweet rtweet provides users a range of functions designed to extract data from Twitter’s REST and streaming APIs. Installation To get the current released version from CRAN: #install.packages("httpuv") library(htt...
4039 sym R (3979 sym/8 pcs)
BDJ20-W11-2
Web scraping in practice library(XML) library(rvest) ## Loading required package: xml2 ## ## Attaching package: 'rvest' ## The following object is masked from 'package:XML': ## ## xml library(stringr) page <- readLines("https://news.daum.net/ranking/popular") page_parsed <- htmlParse(page) First Headline XPath: ‘//[@id="mArticle"]/di...
4368 sym R (102327 sym/188 pcs) 1 img
BDM-Week11
Web scraping in practice library(XML) library(rvest) ## Loading required package: xml2 ## ## Attaching package: 'rvest' ## The following object is masked from 'package:XML': ## ## xml library(stringr) page <- readLines("https://news.daum.net/ranking/popular") page_parsed <- htmlParse(page) First Headline XPath: ‘//[@id="mArticle"]/di...
4273 sym R (101686 sym/152 pcs) 1 img
BDJ20-W11-1
Web scraping in practice library(XML) library(rvest) ## Loading required package: xml2 ## ## Attaching package: 'rvest' ## The following object is masked from 'package:XML': ## ## xml library(stringr) page <- readLines("https://news.daum.net/ranking/popular") page_parsed <- htmlParse(page) First Headline XPath: ‘//[@id="mArticle"]/di...
4273 sym R (89338 sym/144 pcs) 1 img
BDJ20-Week10-1
Regular Expression A regular expression is a special string for describing a certain text pattern. Character classes Regex provides another useful construct called character classes, which are used to match a certain class of characters. The most common character classes in most regex engines are: Character Matches Same as \\d any digit [0-9] ...
4596 sym R (1211 sym/17 pcs)
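The teaser above notes that `\\d` matches any digit and is the same as `[0-9]`. A minimal sketch of that equivalence in base R follows; the sample strings and variable names are hypothetical and are not taken from the listed notebook.

```r
# Hypothetical sample text (an assumption for illustration only)
x <- c("Room 101", "no digits here", "call 555-0199")

# "\\d" is the digit character class; perl = TRUE selects PCRE syntax
has_digit_d <- grepl("\\d", x, perl = TRUE)

# "[0-9]" is the explicit range that "\\d" abbreviates
has_digit_class <- grepl("[0-9]", x)

# Pull out every run of consecutive digits from each string
digit_runs <- regmatches(x, gregexpr("[0-9]+", x))
```

Both `grepl()` calls flag the same strings, which is what "Same as" in the table header means. `regmatches()` with `gregexpr()` then returns the matched digit runs, one list element per input string.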
BDM-Week10
Web scraping in practice library(XML) library(rvest) ## Loading required package: xml2 ## ## Attaching package: 'rvest' ## The following object is masked from 'package:XML': ## ## xml library(stringr) page <- readLines("https://news.daum.net/ranking/popular") page_parsed <- htmlParse(page) First Headline XPath: ‘//[@id="mArticle"]/di...
4273 sym R (87011 sym/126 pcs) 1 img
BDM-Week11-rtweet
This R Markdown document comes from the R documentation for the rtweet package at https://www.rdocumentation.org/packages/rtweet/versions/0.7.0 rtweet rtweet provides users a range of functions designed to extract data from Twitter’s REST and streaming APIs. Installation To get the current released version from CRAN: #install.packages("httpuv") library(ht...
871 sym R (985 sym/8 pcs)