Publications by Anna Shirokanova

Tidy predict

10.03.2020

mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
head(mydata)
##   admit gre  gpa rank
## 1     0 380 3.61    3
## 2     1 660 3.67    3
## 3     1 800 4.00    1
## 4     1 640 3.19    4
## 5     0 520 2.93    4
## 6     1 760 3.00    2
mydata$rank <- factor(mydata$rank)
mylogit <- glm(admit ~ gre + gpa + rank, data = mydata...

193 sym R (3400 sym/15 pcs) 2 img

ANOVA in R

01.03.2020

One-way analysis of variance (ANOVA) in R. Anna Shirokanova and Olesya Volchenko, March 2, 2020 + update 2021. This seminar: recap of one-way ANOVA; example of one-way ANOVA; assumptions; post hoc tests; non-parametric equivalent of ANOVA; effect size. ANOVA is used: to compare means of three or more independent (unrelated) groups and determine whe...

9527 sym R (8795 sym/48 pcs) 11 img 1 tbl
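
The topics named in the "ANOVA in R" preview above (one-way ANOVA, post hoc tests, a non-parametric equivalent, effect size) can be sketched in base R as follows. This is a minimal illustration, not the seminar's own code: the built-in `iris` data and the choice of `Sepal.Length ~ Species` are assumptions made here for the sake of a runnable example.

```r
# One-way ANOVA on the built-in iris data
# (an assumed stand-in dataset, not the one used in the seminar)
fit <- aov(Sepal.Length ~ Species, data = iris)
anova_table <- summary(fit)[[1]]
print(anova_table)

# Post hoc pairwise comparisons with Tukey's HSD
tukey <- TukeyHSD(fit)
print(tukey)

# Non-parametric equivalent: Kruskal-Wallis rank sum test
kw <- kruskal.test(Sepal.Length ~ Species, data = iris)
print(kw)

# Effect size: eta squared = SS_between / SS_total
ss <- anova_table[["Sum Sq"]]
eta_sq <- ss[1] / sum(ss)
print(eta_sq)
```

Eta squared is computed by hand here to avoid depending on an extra package; dedicated effect-size packages would be a reasonable alternative.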

Comparing Two Means in R

18.02.2020

Comparing Two Means. Anna Shirokanova and Olesya Volchenko, February 17, 2020. Let’s compare mean values. Mean values refer to continuous variables. Mean values can be compared across groups. If there are two groups, the t-test is the parametric test you need to draw a conclusion about the two populations. If there are more than two groups, use one...

9639 sym R (6884 sym/33 pcs) 18 img
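
The two-group comparison described in the "Comparing Two Means in R" preview above can be sketched with base R's `t.test()`. The built-in `mtcars` data and the variables `mpg` and `am` are assumptions chosen here for illustration; they are not taken from the original seminar.

```r
# Compare mean fuel economy (mpg) between two transmission groups (am = 0 or 1)
# mtcars is an assumed example dataset, not the one used in the seminar
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
print(group_means)

# t.test() with a formula runs Welch's two-sample t-test by default
# (it does not assume equal variances in the two groups)
res <- t.test(mpg ~ am, data = mtcars)
print(res)
```

Passing `var.equal = TRUE` would switch to the classic Student's t-test, which is only appropriate when the group variances can be treated as equal.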

Clusters 102

11.02.2020

Clusters 102. Anna Shirokanova, 10 02 2020. Example 1: Fisher’s Irises. We know the species from the beginning and thus can compare solutions. IRL, ‘species’ is unknown, and the solution has to be tested against reality. In this dataset, the object features are metric only. Start by describing the data and looking into most different variabl...

11266 sym R (31068 sym/112 pcs) 35 img 1 tbl
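
The idea in the "Clusters 102" preview above, comparing a cluster solution against the known iris species, might look like the sketch below. The preview does not say which clustering method the seminar uses; k-means with three centers is an assumption made here purely for illustration.

```r
# Cluster the four metric iris features and compare against the known species
# (k-means with k = 3 is an assumed method, not necessarily the seminar's)
set.seed(42)                       # k-means uses random starts
features <- scale(iris[, 1:4])     # standardise so no variable dominates
km <- kmeans(features, centers = 3, nstart = 25)

# Cross-tabulate cluster membership against the true species labels
tab <- table(cluster = km$cluster, species = iris$Species)
print(tab)
```

Cluster numbers are arbitrary labels, so the table should be read by looking at which species dominates each row rather than by matching row and column positions.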

Logistic Regression Practice in R

28.01.2020

Logistic Regression. Anna, 27 January 2020. ‘Look, it’s binary!’ Recall the logistic regression assumptions. Sample size: logit models require more cases than OLS regression because they use maximum likelihood estimation techniques, not OLS (recommended n >= 400, both for train and test). Maximum likelihood means that “the parameter estim...

15548 sym R (18540 sym/107 pcs) 18 img

Data Manipulation and Basic Stats in R

26.01.2020

Behold, ye creatures, basic data manipulation. Act I. Volchenko, Shirokanova, January 26, 2020. Goals of this class: load the data; select relevant variables; name the variables the way you like; create a subset based on criteria; recode variables when you need this; summarise the dataset as a whole and by groups; create suitable graphs to summarize o...

4581 sym R (16782 sym/62 pcs) 13 img

Binary Logistic Regression

02.10.2020

Logistic Regression. Anna, 01 November, 2020. Linear regression vs. binary logistic regression. See: https://thestatsgeek.com/2014/02/08/r-squared-in-logistic-regression/ Learning objectives for this class: go through an example of binary logistic regression; its differences from linear regression; interpretation of results. Look, it’s binary...

18744 sym R (23362 sym/108 pcs) 23 img
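
The learning objectives in the "Binary Logistic Regression" preview above (fitting a binary logit model and interpreting its results) can be sketched as follows. The built-in `mtcars` data and the model `am ~ wt` are assumptions chosen here so the example runs offline; with only 32 rows it is far too small for real inference and serves as an illustration only.

```r
# Binary logistic regression: is a car manual (am = 1) given its weight (wt)?
# mtcars and this model are assumed for illustration, not taken from the class
fit <- glm(am ~ wt, data = mtcars, family = binomial)
print(summary(fit))

# Coefficients are on the log-odds scale; exponentiate to get odds ratios
odds_ratios <- exp(coef(fit))
print(odds_ratios)

# Predicted probability of a manual transmission for a car with wt = 3
# (wt is recorded in units of 1000 lbs)
pred <- predict(fit, newdata = data.frame(wt = 3), type = "response")
print(pred)
```

Unlike linear regression, the fitted values here are probabilities bounded between 0 and 1, and a one-unit change in `wt` multiplies the odds by the corresponding odds ratio rather than adding a fixed amount to the outcome.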

Scrape + clean + visualize web data

02.10.2020

Inspired by: https://www.youtube.com/watch?v=l37n_HDD1qs
library(tidyverse)
library(stringr)
library(purrr)
library(rvest)
library(robotstxt)
# paths_allowed(paths = c("https://www.amazon.com/Best-Sellers-Unlocked-Cell-Phones/zgbs/wireless/2407749011"))
phones <- read_html("https://www.amazon.com/Best-Sellers-Unlocked-Cell-Phones/zgbs/wir...

1379 sym R (17822 sym/14 pcs) 2 img

An Artsy Corrplot in Plotly

25.01.2021

Source: https://towardsdatascience.com/beautiful-correlation-plots-in-r-a-new-approach-d3b93d9c77be
Whenever you think about heatmaps or coloured corrplots, there is a place between science and art. Learn to adjust plotly objects according to your needs by exploring this script (see the source). Mind the differences it takes to adjust the looks o...

915 sym R (6303 sym/17 pcs) 1 img

Python Integration into an R Script

22.01.2021

R part
Here is your main project in R:
data <- mtcars
str(data)
## 'data.frame': 32 obs. of 11 variables:
##  $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num 160 160 108 258 360 ...
##  $ hp  : num 110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num 3.9 3.9 ...

1100 sym R (3130 sym/24 pcs) 2 img