Publications by BayesianN

important functions for string manipulation

18.12.2023

Introduction: Most data cleaning processes involve working with structured and unstructured character/string data types. The ability to manipulate string data can be a superpower. Load in the necessary packages: library(tidyverse) library(odbc) library(DBI) library(RSQLite) Create a fake dataset in R using tribble(): original_table<-tribble(~co...

Answering complex business questions using SQL

14.12.2023

tools evolve CREATE TABLE sales ( "customer_id" VARCHAR(1), "order_date" DATE, "product_id" INTEGER ); INSERT INTO sales ("customer_id", "order_date", "product_id") VALUES ('A', '2021-01-01', '1'), ('A', '2021-01-01', '2'), ('A', '2021-01-07', '2'), ('A', '2021-01-10', '3'), ('A', '2021-01-11', '3'), ('A', '2021-0...

job retention analysis

13.12.2023

Library Setup library(tidyverse) library(knitr) library(gtsummary) library(ggpubr) library(RColorBrewer) library(lemon) library(paletteer) library(survival) library(survminer) library(cowplot) library(rms) library(car) library(patchwork) options("encoding" = "UTF-8") knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE...

probability distributions in R

29.11.2023

Introduction to Distribution Theory Probability theory Bongani Ncube 2023-11-29 Probability concepts The probability of an event (E) is the number of ways event E can occur divided by the total number of possible outcomes. We live in a world where decisions are made under conditions of uncertainty. It is therefore important to know the chanc...
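The definition of probability quoted in the excerpt above can be written out as a short worked example. This is only a sketch that assumes all outcomes are equally likely; the symbols n(E) and n(S) and the die example are illustrative and are not taken from the original post.

```latex
\[
P(E) = \frac{n(E)}{n(S)},
\qquad
\text{fair six-sided die, } E = \{2, 4, 6\}:\quad
P(E) = \frac{3}{6} = \frac{1}{2}
\]
```

Here n(E) counts the outcomes in which E occurs and n(S) counts all outcomes in the sample space S.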

R and SQL, a bit about both

27.11.2023

0.0.1 Load in required packages library(odbc) library(DBI) library(RSQLite) library(tidyverse) options(scipen = 999) 0.0.2 Read in data sales<-readr::read_csv("sales.csv") 0.0.3 What variables do we have names(sales) ## [1] "Order ID" "Product" "Quantity Ordered" "Price Each" ## [5] "Order Date" "Purchase Addr...

INTRODUCTION TO SQL window functions

22.11.2023

0.1 Window functions A window function performs an aggregate-like operation on a set of query rows. However, whereas an aggregate operation groups query rows into a single result row, a window function produces a result for each query row: 0.1.1 Anatomy of a window function FUNCTION_NAME() OVER() ORDER BY PARTITION BY ROWS/RANGE PRECEDING/F...
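The contrast drawn in the excerpt above, one result row per group for an aggregate versus one result per query row for a window function, can be sketched in R. This is a minimal illustration rather than the post's own code: the in-memory table `sales_demo` and its columns are made up for the example, and it assumes an RSQLite build whose bundled SQLite supports window functions (SQLite 3.25 or later).

```r
library(DBI)

# Hypothetical demo data (not from the original post)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "sales_demo", data.frame(
  customer_id = c("A", "A", "B"),
  amount      = c(10, 20, 5)
))

# Aggregate: rows are collapsed to one result row per customer
agg <- dbGetQuery(con, "
  SELECT customer_id, SUM(amount) AS total
  FROM sales_demo
  GROUP BY customer_id
  ORDER BY customer_id
")

# Window function: every input row is kept, and each row
# carries the total for its own partition (customer)
win <- dbGetQuery(con, "
  SELECT customer_id, amount,
         SUM(amount) OVER (PARTITION BY customer_id) AS total
  FROM sales_demo
  ORDER BY customer_id, amount
")

dbDisconnect(con)

nrow(agg)  # 2 rows: one per customer
nrow(win)  # 3 rows: one per original row
```

The only difference between the two queries is `GROUP BY` versus `OVER (PARTITION BY ...)`, which makes the row-count behaviour easy to compare.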

more sql queries

20.11.2023

library(tidyverse) library(odbc) library(DBI) library(RSQLite) ## read in the dataset df <- readr::read_csv("recipe_site_traffic_2212.csv") ## sample 25 observations and select columns 1, 2, 3 and 6 set.seed(1123) data1<- df |> sample_n(size=25) |> select(1,2,3,6) ## sample 100 observations and select subsequent 3 variables(including...

airbnb dashboard in progress

20.11.2023

AIRBNB: Data Exploration Sidebar In this study I explored the AIRBNB dataset through data visualisation. The other goal was to expand my knowledge of using flexdashboard for dashboard design. Overview Most listed neighborhoods neighborhood counts Measure Distribution correlations boxplots key findings Statistical correlations As expe...

SQL joins

16.11.2023

0.0.1 Introduction Greetings, I hope you will enjoy SQL JOINS coming from an R aficionado who has dealt mainly with R joins. These notes are based on how much I have understood, and I spent some time looking for pictures from the internet to aid in the presentation. > The presentation assumes you are already familiar with a bit of SQL 0.0.2 Mut...

movie rating in SQL

12.11.2023

0.0.1 Exploratory data analysis I am still learning to work with date formats in SQL, so I will start my analysis using R again. Since updating a database using the DBI package is still not trivial for me, I have resorted to creating new columns using R instead; thus I am calculating the return on investment as worldwide_gross/production_budget. dat_new<-dat_...
