Publications by Shin Lee

HMI-W7-1

25.04.2020

Learning Objectives: Understand the task of supervised machine learning, and learn about feature representation. Learn about the way in which textual data are applied to machine learning algorithms. Introduce tidy data principles and see how to make data tidy with the functions from the magrittr and dplyr packages. See how the tidytext package appli...

11170 sym R (18855 sym/38 pcs) 1 img 3 tbl

Human Media Interaction: Week 6

21.04.2020

Ch 5: Basic Text Processing. Learning Goals: Understand some of the basic text processing steps such as tokenization, stop word removal, stemming, and lemmatization. Basic Text (Pre-)Processing: Automated text analysis always requires some form of text processing. Consider the following example of a tweet: Today’s the day, ladies and gents. Mr. ...

9202 sym R (103004 sym/66 pcs) 1 img 2 tbl

ITM-Week6-2

20.04.2020

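Tokenization using tidytext. Now we will look at what “tidy” data is and the principles behind it, and introduce the “tidytext” package, which builds on those principles to provide efficient functions for word frequency analysis. In particular, this package's unnest_tokens() function makes text preprocessing very ...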

6855 sym R (32950 sym/98 pcs) 12 img

Week5

14.04.2020

Ch. 4: Lexical Resources. Learning Objectives: Learn about the representation and content of two lexical resources, LIWC and Bing. Learn how to tokenize texts into words using the stringr package. Learn what regex is and how it is used for tokenization. What is a lexical resource? A lexical resource is a collection of lexical items such as...

21427 sym R (4512 sym/47 pcs) 5 tbl

Week5

13.04.2020

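Text mining workflow. 1. Choosing the software to use for data collection and analysis. There are many software tools for analyzing large volumes of text by computer, but among them R is a programming environment that is widely used in the field of “data science.” The reason is that R has the advantage of being free and open source...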

4147 sym R (34346 sym/107 pcs) 2 img

TM_Week4

12.04.2020

CSV data. I have uploaded the tweet data I collected to the course materials board, so please download it. Move the downloaded file into RStudio's Working Directory, that is, your working folder. Now, using the load() function, we will load the file...

656 sym R (9997604 sym/5 pcs)

Week 4: R script

07.04.2020

R Base Functions for Text Pre-Processing. Text mining begins with understanding text data in natural language. It is the act of pre-processing text into data that are appropriate for analysis. Today, we will see how R can be used for text pre-processing. However, we will not install any package for text analysis, nothing but a couple of ones for pr...

7509 sym R (370682 sym/117 pcs) 1 img 1 tbl

Introduction to Text Mining Week 2

23.03.2020

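Using projects in R. RStudio projects make it easy to manage your workspace and documents. Creating a project: An RStudio project is associated with an R working directory. You can create an RStudio project in the following places: in a new directory, or where you already have R co...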

3596 sym R (5815 sym/96 pcs)

Introduction to Text Mining Week 3

30.03.2020

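Feature Analysis? The task of deciding which features, properties, or dimensions best reveal the meaning and hidden structure of the text to be analyzed. Examples: vocabulary, synonyms, characters and their relationship maps, semantic structure. Most of the topics we will cover from here on are “feature extraction” and “knowledge engineering,” that is, ...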

8093 sym R (71438 sym/140 pcs) 1 img 2 tbl

HMI-W7-2

28.04.2020

Learning Objectives: Understand the task of supervised machine learning, and learn about feature representation. Learn about the way in which textual data are applied to machine learning algorithms. Introduce tidy data principles and see how to make data tidy with the functions from the magrittr and dplyr packages. See how the tidytext package appli...

3043 sym R (11653 sym/43 pcs) 1 img