Publications by Nikita Sleptcov

Documents_example

05.05.2020

## Corpus consisting of 39 documents, showing 39 documents:
##
##                                  Text Types Tokens Sentences num
## 054_AnnualReport_HuskyEnergy_2019.txt  6996  88257      2349  54
## 055_AnnualReport_HuskyEnergy_2018.txt  6756  88275      2368  55
## 056_AnnualReport_HuskyEnergy_2017.txt  6901 ...

34 sym R (16045 sym/8 pcs) 11 img

WP2

01.05.2020

library(pdftools)
library(readtext)
library(quanteda)
## Package version: 2.0.1
## Parallel computing: 2 of 4 threads used.
## See https://quanteda.io for tutorials and examples.
##
## Attaching package: 'quanteda'
## The following object is masked from 'package:utils':
##
##     View
library(tm)
## Loading required package: NLP
##
## Attachin...

39 sym R (21528 sym/52 pcs) 12 img

KeyATM Example

11.09.2020

#
# Work Package 2: Textual Analysis
#
# Read in data, turn into a df
aif_corpus <- corpus(text_df, text_field = "text", docvars = c("num", "type", "company", "date"))
aif_tok <- tokens(aif_corpus, remove_numbers = TRUE, split_hyphens = TRUE, remove_punct = TRUE, ...

1617 sym R (87569 sym/49 pcs) 6 img

Uncertainty draft project

08.10.2020

This document will present some basic statistical treatments performed on a corpus of Annual Information Forms (randomly selected by me, n=132).
text_df <- readtext(paste0("~/Google Drive File Stream/My Drive/R/Projects/Work package 2/Frames/AIF", "*"), encoding = "UTF-8", docvarsfrom = "filenames", ...

2403 sym R (75834 sym/22 pcs) 4 img

KeyATM example model for Uni- and Bigrams

07.10.2020

This document will present some basic statistical treatments performed on a corpus of Annual Information Forms (randomly selected by me, n=132).
text_df <- readtext(paste0("~/Google Drive File Stream/My Drive/R/Projects/Work package 2/Frames/AIF", "*"), encoding = "UTF-8", docvarsfrom = "filenames", ...

2216 sym R (75984 sym/23 pcs) 4 img

Code for NOTs

08.10.2020

This document will present some basic statistical treatments performed on a corpus of Annual Information Forms (n=132). This time, the corpus is reduced to n-grams (phrases) containing NOT.
text_df <- readtext(paste0("~/Google Drive File Stream/My Drive/R/Projects/Work package 2/Frames/AIF", "*"), encoding = "UTF-8", ...

1533 sym R (240786 sym/8 pcs) 7 img

Presentation on categorical data analysis

20.12.2020

Unfortunately, for categorical variables such as gender (in most surveys), we can test only a few things. Let’s look at frequencies for gender identity, for example.

Frequencies
sncs_data$gender
Type: Character

gender   Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
Man       177     40.78          40.78     36.27          36.27
Woman     257     59.22         100.00     52.66          88.93
<NA>       54                              11.0...

2236 sym

Trump's Twitter Quick Look

07.11.2020

Quick look at the Twitter account of @RealDonaldTrump
Mining Trump’s 3000 most recent tweets
What Trump tweets from
Time of day Trump usually tweets
Day of the week Trump tweets the most
The most frequently used words in his tweets
Sentiment Analysis
The first graph shows a sentiment analysis of his tweets over a period of time. The second graph sho...

1018 sym 7 img

Sample of the docs (33%)

01.11.2020

This document will present some basic statistics computed on a corpus of strategic corporate communication (Work Package 2: Textual Data). I randomly sampled 33% (this share can be changed) of the total number of documents, which comes down to n=1398. Because the method is probabilistic, I need to set a seed for “replicability” of results. This table sh...

2051 sym R (15055 sym/11 pcs) 4 img