Publications by Nikita Sleptcov
Documents_example
## Corpus consisting of 39 documents, showing 39 documents:
##
##                                   Text Types Tokens Sentences num
##  054_AnnualReport_HuskyEnergy_2019.txt  6996  88257      2349  54
##  055_AnnualReport_HuskyEnergy_2018.txt  6756  88275      2368  55
##  056_AnnualReport_HuskyEnergy_2017.txt  6901 ...
WP2
library(pdftools)
library(readtext)
library(quanteda)
## Package version: 2.0.1
## Parallel computing: 2 of 4 threads used.
## See https://quanteda.io for tutorials and examples.
##
## Attaching package: 'quanteda'
## The following object is masked from 'package:utils':
##
##     View
library(tm)
## Loading required package: NLP
##
## Attachin...
KeyATM Example
#
# Work Package 2: Textual Analysis
#
# Read in data, turn into a df
aif_corpus <- corpus(text_df, text_field = "text", docvars = c("num", "type", "company", "date"))
aif_tok <- tokens(aif_corpus, remove_numbers = TRUE, split_hyphens = TRUE, remove_punct = TRUE, ...
Uncertainty draft project
This document will present some basic statistical treatments performed on a corpus of Annual Information Forms (randomly selected by me, n=132).
text_df <- readtext(paste0("~/Google Drive File Stream/My Drive/R/Projects/Work package 2/Frames/AIF", "*"),
                    encoding = "UTF-8",
                    docvarsfrom = "filenames", ...
KeyATM example model for Uni- and Bigrams
This document will present some basic statistical treatments performed on a corpus of Annual Information Forms (randomly selected by me, n=132).
text_df <- readtext(paste0("~/Google Drive File Stream/My Drive/R/Projects/Work package 2/Frames/AIF", "*"),
                    encoding = "UTF-8",
                    docvarsfrom = "filenames", ...
Code for NOTs
This document will present some basic statistical treatments performed on a corpus of Annual Information Forms (n=132). This time, the corpus is reduced to n-grams (phrases) containing NOT.
text_df <- readtext(paste0("~/Google Drive File Stream/My Drive/R/Projects/Work package 2/Frames/AIF", "*"),
                    encoding = "UTF-8", ...
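Reducing a corpus to n-grams that contain NOT can be sketched in base R as follows. This is a minimal illustration, not the code behind the entry above: the two sentences are made up, and `tokenize` and `make_bigrams` are hypothetical helpers introduced only for this example.

```r
# Made-up example sentences; the AIF corpus itself is not reproduced here.
texts <- c(
  "The company does not expect material losses.",
  "Results may not be indicative of future performance."
)

# Hypothetical helper: lowercase, keep only letters and spaces, split on whitespace.
tokenize <- function(x) {
  x <- gsub("[^a-z ]", " ", tolower(x))
  strsplit(trimws(x), " +")[[1]]
}

# Hypothetical helper: join each token with its successor, e.g. "does_not".
make_bigrams <- function(tok) {
  if (length(tok) < 2) return(character(0))
  paste(head(tok, -1), tail(tok, -1), sep = "_")
}

all_bigrams <- unlist(lapply(texts, function(x) make_bigrams(tokenize(x))))

# Keep only bigrams in which one of the two tokens is exactly "not".
not_bigrams <- all_bigrams[grepl("(^|_)not(_|$)", all_bigrams)]
print(not_bigrams)
```

The anchored pattern `(^|_)not(_|$)` matches "not" only as a whole token, so a word such as "note" or "cannot" would not be picked up.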
Presentation on categorical data analysis
For categorical variables such as gender (in most surveys), unfortunately, only a few things can be tested. Let’s look at the frequencies for gender identity, for example.
Frequencies
sncs_data$gender
Type: Character

            Freq   % Valid   % Valid Cum.   % Total   % Total Cum.
Man          177     40.78          40.78     36.27          36.27
Woman        257     59.22         100.00     52.66          88.93
<NA>          54                              11.0...
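The valid and total percentages in a frequency table like this one can be recomputed from the counts alone. The sketch below uses base R rather than whichever package produced the output above; the `gender` vector is rebuilt from the three counts shown (177, 257 and 54 missing), and the variable names are assumptions made for this example.

```r
# Rebuild the variable from the counts shown in the frequency table.
gender <- c(rep("Man", 177), rep("Woman", 257), rep(NA_character_, 54))

# Counts including missing values.
counts <- table(gender, useNA = "ifany")

# "Valid" percentages ignore missing values; "total" percentages include them.
pct_valid <- round(100 * prop.table(table(gender)), 2)
pct_total <- round(100 * prop.table(counts), 2)

print(counts)
print(pct_valid)
print(pct_total)
```

The two percentage columns differ only in their denominator: 434 valid responses versus all 488 rows.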
Trump's Twitter Quick Look
Quick look at the Twitter account of @RealDonaldTrump
Mining Trump’s 3000 most recent tweets
What Trump tweets from
Time of day Trump usually tweets
Day of the week Trump tweets the most
The most frequently used words in his tweets
Sentiment Analysis
The first graph shows a sentiment analysis of his tweets over a period of time. The second graph sho...
Sample of the docs (33%)
This document will present some basic statistics computed on a corpus of strategic corporate communication (Work Package 2: Textual Data). I randomly sampled 33% of the documents (the share can be changed), which comes to n=1398. Because the sampling is random, I need to set a seed so that the results can be replicated. This table sh...
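Drawing a reproducible 33% sample can be sketched in base R as follows. This is an illustration under stated assumptions: `doc_ids` is a hypothetical stand-in of 1000 document names (the real corpus is not shown here), and the seed value is arbitrary.

```r
# Hypothetical stand-in for the list of documents.
doc_ids <- sprintf("doc_%04d", 1:1000)

# Share of documents to keep; change this value to draw a different fraction.
sample_share <- 0.33
n_sample <- round(length(doc_ids) * sample_share)

# Fixing the seed makes the random draw repeatable.
set.seed(123)
sampled_ids <- sample(doc_ids, size = n_sample)

# Resetting the same seed reproduces exactly the same sample.
set.seed(123)
resampled_ids <- sample(doc_ids, size = n_sample)
print(identical(sampled_ids, resampled_ids))
```

Without the `set.seed()` call, each run would select a different subset, and any statistics computed on the sample would vary from run to run.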