Publications by Anna Shirokanova
Tidy predict
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
head(mydata)
##   admit gre  gpa rank
## 1     0 380 3.61    3
## 2     1 660 3.67    3
## 3     1 800 4.00    1
## 4     1 640 3.19    4
## 5     0 520 2.93    4
## 6     1 760 3.00    2
mydata$rank <- factor(mydata$rank)
mylogit <- glm(admit ~ gre + gpa + rank, data = mydata...
193 sym R (3400 sym/15 pcs) 2 img
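A minimal base-R sketch of the "predict" step: fit the same kind of model and get predicted admission probabilities per rank, holding gre and gpa at their means. Assumption: the data are simulated (same column names as the UCLA file) so the example runs offline; the coefficients in the simulation are made up and are not estimates from the real data.

```r
# Simulated stand-in for the UCLA admissions data (hypothetical values)
set.seed(42)
n <- 400
sim <- data.frame(
  gre  = round(rnorm(n, mean = 590, sd = 115)),
  gpa  = round(pmin(pmax(rnorm(n, mean = 3.4, sd = 0.38), 2), 4), 2),
  rank = factor(sample(1:4, n, replace = TRUE))
)
# Hypothetical data-generating model, used only to create a binary outcome
lin <- -3.5 + 0.002 * sim$gre + 0.8 * sim$gpa - 0.5 * (as.integer(sim$rank) - 1)
sim$admit <- rbinom(n, size = 1, prob = plogis(lin))

fit <- glm(admit ~ gre + gpa + rank, data = sim, family = binomial)

# New cases: each rank at the mean gre and gpa
newdata <- data.frame(
  gre  = mean(sim$gre),
  gpa  = mean(sim$gpa),
  rank = factor(1:4, levels = levels(sim$rank))
)
# type = "response" returns probabilities instead of log-odds
pred <- predict(fit, newdata = newdata, type = "response", se.fit = TRUE)
tidy_pred <- cbind(newdata, prob = pred$fit, se = pred$se.fit)
print(tidy_pred)
```

One row per new case with its probability and standard error is already a "tidy" table that can be passed straight to plotting functions.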
ANOVA in R
One-way analysis of variance (ANOVA) in R. Anna Shirokanova and Olesya Volchenko, March 2, 2020 (updated 2021). This seminar: recap of one-way ANOVA; example of one-way ANOVA; assumptions; post hoc tests; non-parametric equivalent of ANOVA; effect size. ANOVA is used: to compare means of three or more independent (unrelated) groups and determine whe...
9527 sym R (8795 sym/48 pcs) 11 img 1 tbl
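The seminar outline (ANOVA, assumptions, post hoc tests, non-parametric equivalent, effect size) can be sketched in base R as below. Assumption: the built-in PlantGrowth data (plant weight in three groups) stand in for the seminar's own dataset.

```r
data(PlantGrowth)  # weight ~ group (ctrl, trt1, trt2)

# 1. One-way ANOVA
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)

# 2. Assumptions: normality of residuals, homogeneity of variances
shapiro.test(residuals(fit))
bartlett.test(weight ~ group, data = PlantGrowth)

# 3. Post hoc pairwise comparisons with Tukey's HSD
TukeyHSD(fit)

# 4. Non-parametric equivalent: Kruskal-Wallis rank sum test
kw <- kruskal.test(weight ~ group, data = PlantGrowth)
kw

# 5. Effect size: eta squared = SS_between / SS_total
y <- PlantGrowth$weight
g <- PlantGrowth$group
grand_mean <- mean(y)
ss_total   <- sum((y - grand_mean)^2)
ss_between <- sum(tapply(y, g, length) * (tapply(y, g, mean) - grand_mean)^2)
eta_sq <- ss_between / ss_total
eta_sq
```

Computing eta squared by hand keeps the formula visible; packages such as effectsize report the same quantity.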
Comparing Two Means in R
Comparing Two Means. Anna Shirokanova and Olesya Volchenko, February 17, 2020. Let’s compare mean values. Mean values refer to continuous variables. Mean values can be compared across groups. If there are two groups, the t-test is the parametric test you need to draw a conclusion about the two populations. If there are more than two groups, use one...
9639 sym R (6884 sym/33 pcs) 18 img
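A minimal sketch of comparing two group means in base R, assuming mtcars as the example data (fuel economy of automatic vs. manual cars). It shows the variance check, Welch's and Student's t-tests, and the rank-based alternative.

```r
df_cars <- mtcars
df_cars$am <- factor(df_cars$am, levels = c(0, 1),
                     labels = c("automatic", "manual"))

# Group means
tapply(df_cars$mpg, df_cars$am, mean)

# F test for equality of variances (informs the choice of t-test)
var.test(mpg ~ am, data = df_cars)

# Welch two-sample t-test: R's default, does not assume equal variances
tt <- t.test(mpg ~ am, data = df_cars)
tt

# Student's t-test, assuming equal variances
t.test(mpg ~ am, data = df_cars, var.equal = TRUE)

# Non-parametric alternative: Wilcoxon rank-sum (Mann-Whitney) test;
# exact = FALSE because mpg contains ties
wilcox.test(mpg ~ am, data = df_cars, exact = FALSE)
```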
Clusters 102
Clusters 102. Anna Shirokanova, 10 02 2020. Example 1: Fisher’s Irises. We know the species from the beginning and thus can compare solutions; in real life, ‘species’ is unknown, and the solution has to be tested against reality. In this dataset, the object features are metric only. Start by describing the data and looking into most different variabl...
11266 sym R (31068 sym/112 pcs) 35 img 1 tbl
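Because the species are known in the iris data, a clustering solution can be cross-tabulated against them. A minimal sketch, assuming k-means and Ward hierarchical clustering on standardised features (the choice of methods and of k = 3 is illustrative, not necessarily the one used in the publication):

```r
data(iris)
features <- scale(iris[, 1:4])  # standardise the four metric features

# k-means with several random starts for a stable solution
set.seed(123)
km <- kmeans(features, centers = 3, nstart = 25)
table(cluster = km$cluster, species = iris$Species)

# Hierarchical clustering (Ward's method) cut into three groups
hc <- hclust(dist(features), method = "ward.D2")
hc_groups <- cutree(hc, k = 3)
table(cluster = hc_groups, species = iris$Species)
```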
Logistic Regression Practice in R
Logistic Regression. Anna, 27 January 2020. ‘Look, it’s binary!’ Recall the logistic regression assumptions: Sample size: logit models require more cases than OLS regression because they use maximum likelihood estimation techniques, not OLS (recommended n >= 400, both for train and test). Maximum likelihood means that “the parameter estim...
15548 sym R (18540 sym/107 pcs) 18 img
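The train/test logic behind the sample-size recommendation can be sketched as follows. Assumption: simulated data with n = 1000, so that both halves exceed the recommended 400 cases; the predictors x1, x2 and their coefficients are hypothetical.

```r
set.seed(2020)
n <- 1000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.2 * d$x1 - 0.8 * d$x2))

# Random 50/50 split into training and test sets
train_idx <- sample(seq_len(n), size = 500)
train <- d[train_idx, ]
test  <- d[-train_idx, ]

# Maximum likelihood fit (glm computes it by iteratively reweighted least squares)
fit <- glm(y ~ x1 + x2, data = train, family = binomial)
summary(fit)

# Predicted probabilities on unseen data, classified at a 0.5 cut-off
p_test <- predict(fit, newdata = test, type = "response")
pred_class <- ifelse(p_test > 0.5, 1, 0)
table(predicted = pred_class, observed = test$y)
accuracy <- mean(pred_class == test$y)
accuracy
```

The 0.5 cut-off is a convention; with unbalanced outcomes a different threshold may classify better.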
Data Manipulation and Basic Stats in R
Behold, ye creatures, basic data manipulation. Act I. Volchenko, Shirokanova, January 26, 2020. Goals of this class: load the data; select relevant variables; name the variables the way you like; create a subset based on criteria; recode variables when you need to; summarise the dataset as a whole and by groups; create suitable graphs to summarize o...
4581 sym R (16782 sym/62 pcs) 13 img
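The class goals map onto a handful of base-R calls. A minimal sketch, assuming the built-in mtcars data in place of the class dataset; the new names (horsepower, transmission, powerful) are illustrative.

```r
df <- mtcars                                     # load the data
df <- df[, c("mpg", "cyl", "hp", "am")]          # select relevant variables
names(df)[names(df) == "hp"] <- "horsepower"     # rename a variable
powerful <- subset(df, horsepower > 150)         # subset based on a criterion
df$transmission <- factor(df$am, levels = c(0, 1),
                          labels = c("automatic", "manual"))  # recode
summary(df)                                      # summary of the whole dataset
aggregate(mpg ~ transmission, data = df, FUN = mean)  # summary by groups
boxplot(mpg ~ transmission, data = df)           # graph by group
```

The same steps have tidyverse equivalents (select, rename, filter, mutate, group_by plus summarise).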
Binary Logistic Regression
Logistic Regression. Anna, 01 November, 2020. Linear regression vs. binary logistic regression (see: https://thestatsgeek.com/2014/02/08/r-squared-in-logistic-regression/). Learning objectives for this class: go through an example of binary logistic regression, its differences from linear regression, and the interpretation of results. Look, it’s binary...
18744 sym R (23362 sym/108 pcs) 23 img
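Interpretation usually involves odds ratios and a pseudo-R², since the OLS R² does not carry over to logit models. A minimal sketch, assuming mtcars with transmission (am) as the binary outcome and McFadden's pseudo-R² as the fit measure (one of several discussed in the linked post):

```r
fit  <- glm(am ~ wt, data = mtcars, family = binomial)
null <- glm(am ~ 1,  data = mtcars, family = binomial)

coef(fit)       # effects on the log-odds scale
exp(coef(fit))  # odds ratios: multiplicative change in odds per 1-unit increase

# McFadden's pseudo-R^2 = 1 - logLik(model) / logLik(null model)
r2_mcfadden <- 1 - as.numeric(logLik(fit)) / as.numeric(logLik(null))
r2_mcfadden

# For contrast: a linear probability model fitted by OLS on the same outcome
lpm <- lm(am ~ wt, data = mtcars)
summary(lpm)$r.squared
```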
Scrape + clean + visualize web data
Inspired by: https://www.youtube.com/watch?v=l37n_HDD1qs
library(tidyverse)
library(stringr)
library(purrr)
library(rvest)
library(robotstxt)
# paths_allowed(paths = c("https://www.amazon.com/Best-Sellers-Unlocked-Cell-Phones/zgbs/wireless/2407749011"))
phones <- read_html("https://www.amazon.com/Best-Sellers-Unlocked-Cell-Phones/zgbs/wir...
1379 sym R (17822 sym/14 pcs) 2 img
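The "clean" step usually means turning scraped text into numbers. A minimal base-R sketch; assumption: the strings below are hypothetical examples of what html_text() might return from a listing page, not real scraped data.

```r
raw <- data.frame(
  title  = c("  Phone A 64GB ", "Phone B 128GB", "Phone C 32GB  "),
  price  = c("$199.99", "$1,049.00", "$89.50"),
  rating = c("4.5 out of 5 stars", "4.1 out of 5 stars", "3.8 out of 5 stars"),
  stringsAsFactors = FALSE
)

clean <- data.frame(
  title   = trimws(raw$title),                                  # strip whitespace
  price   = as.numeric(gsub("[$,]", "", raw$price)),            # drop $ and commas
  rating  = as.numeric(sub(" out of 5 stars", "", raw$rating)), # keep the number
  # lazy .*? so the whole number before "GB" is captured
  storage = as.numeric(sub(".*?(\\d+)GB.*", "\\1", raw$title, perl = TRUE)),
  stringsAsFactors = FALSE
)
clean
```

stringr offers the same operations (str_trim, str_remove_all, str_extract) with a more consistent interface.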
An Artsy Corrplot in Plotly
Source: https://towardsdatascience.com/beautiful-correlation-plots-in-r-a-new-approach-d3b93d9c77be Whenever you think about heatmaps or coloured corrplots, you find yourself somewhere between science and art. Learn to adjust plotly objects according to your needs by exploring this script (see the source). Mind the differences it takes to adjust the looks o...
915 sym R (6303 sym/17 pcs) 1 img
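Before styling, a correlation matrix has to be reshaped into one row per variable pair. A minimal sketch, assuming mtcars columns and a plain plotly heatmap (the sized-marker design of the source article is not reproduced here); the plot is only built when plotly is installed.

```r
cm <- cor(mtcars[, c("mpg", "cyl", "disp", "hp", "wt", "qsec")])
cm[upper.tri(cm)] <- NA                      # keep the lower triangle only
long <- as.data.frame(as.table(cm), stringsAsFactors = FALSE)
names(long) <- c("var1", "var2", "r")
long <- long[!is.na(long$r), ]

# Build the heatmap only if plotly is available; print(p) to display it
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(long, x = ~var1, y = ~var2, z = ~r,
                       type = "heatmap", zmin = -1, zmax = 1)
}
```

Fixing zmin and zmax at -1 and 1 keeps the colour scale symmetric, so equal colours mean equal correlations across plots.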
Python Integration into an R Script
R part
Here is your main project in R:
data <- mtcars
str(data)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 ...
1100 sym R (3130 sym/24 pcs) 2 img
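One common bridge between R and Python is the reticulate package. A minimal sketch, assuming reticulate and a working Python installation (the publication's own setup is not shown in the excerpt): an R vector is read in Python through the `r` object, and the Python result comes back through `py`. The Python part is skipped when either is missing.

```r
data <- mtcars
mpg <- data$mpg
mean_mpg_r <- mean(mpg)

if (requireNamespace("reticulate", quietly = TRUE) &&
    reticulate::py_available(initialize = TRUE)) {
  # Inside Python, objects from the R session are reachable via `r`
  reticulate::py_run_string("mean_mpg = sum(r.mpg) / len(r.mpg)")
  # Python's main-module objects are reachable from R via `py`
  mean_mpg_py <- reticulate::py$mean_mpg
  stopifnot(isTRUE(all.equal(mean_mpg_r, mean_mpg_py)))
}
mean_mpg_r
```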