Publications by Phuong Linh

osmdata & ggplot: HCMC map

25.04.2022

HCMC Streetmap
1 Loading packages
    library(tidyverse)
    library(osmdata)
    library(extrafont)
2 Location definition
    # Area for which we want to collect spatial data
    city_location <- "HCMC Vietnam"
3 OSM objects
    # OSM objects for which we want spatial data
    all_street_types <- c("motorway", "primary", "secondary", "tertiary")
    secondary_streets <- c("reside...

210 sym R (2959 sym/8 pcs) 2 img

cohort analysis by Month

22.05.2022

Clear Workspace
    rm(list = ls())
    library(dplyr)
    library(readxl)
    library(xlsx)
    library(tidyr)
    library(data.table)
Load data
    data <- read_excel("Data Track.xlsx")
    data2 <- read_excel("Data Track 2.xlsx")
    df <- data %>% full_join(data2, by = "UserPhone")
Data cleaning
    df <- df %>% mutate_if(is.numeric, replace_na, replace = 0)
    df_unique <- df %>% d...

456 sym R (8244 sym/27 pcs)

churn rate by week - part 1

14.05.2022

1 Clear Workspace
    rm(list = ls())
    library(ggplot2)
    library(dplyr)
    library(pastecs)
    library(fpc)
    library(FactoMineR)
    library(readxl)
    library(xlsx)
    data <- read_excel("data_cohort analysis.xlsx")
    library(data.table)
    mydata <- subset(data, select = -c(1, 4))
    df <- melt(setDT(mydata), id.vars = c("profile_uuid", "Start Date"), variable.name = "active_d...

451 sym R (10925 sym/35 pcs) 1 img 3 tbl

loop with apply family

13.05.2022

1 Objective
    Creation of Example Data
    Example 1: apply() Function
    Example 2: lapply() Function
    Example 3: sapply() Function
    Example 4: vapply() Function
    Example 5: tapply() Function
    Example 6: mapply() Function
2 Creation of Example Data
    my_data <- data.frame(x1 = 1:5, x2 = 2:6, x3 = 3)
    my_data
    ## x...

363 sym R (1153 sym/22 pcs)
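
The six functions listed in the entry above can be compared side by side on the preview's own `my_data`. This is a minimal sketch in base R, not the entry's actual examples (the preview is cut off before them); the grouping vector `group` and the anonymous function passed to `mapply()` are assumptions made up for illustration.

```r
# Example data, copied from the entry's preview
my_data <- data.frame(x1 = 1:5, x2 = 2:6, x3 = 3)

# apply(): MARGIN = 1 works over rows, MARGIN = 2 over columns
row_sums <- apply(my_data, 1, sum)
col_sums <- apply(my_data, 2, sum)

# lapply() always returns a list, one element per column
col_means_list <- lapply(my_data, mean)

# sapply() simplifies the result to a named numeric vector when it can
col_means_vec <- sapply(my_data, mean)

# vapply() is like sapply() but checks each result against a template
col_max <- vapply(my_data, max, FUN.VALUE = numeric(1))

# tapply() applies a function within groups; `group` is a hypothetical factor
group <- c("a", "a", "b", "b", "b")
group_sums <- tapply(my_data$x1, group, sum)

# mapply() walks several vectors in parallel, element by element
pair_sums <- mapply(function(a, b) a + b, my_data$x1, my_data$x2)
```

The main practical difference is the return type: `lapply()` gives a list, `sapply()` guesses a simpler shape, and `vapply()` fails loudly if a result does not match the declared template.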

Loop in R

13.05.2022

1 Loop Through Vector in R
1.1 Example 1
    for(i in 1:10) {
      x1 <- i^2
      print(x1)
    }
    ## [1] 1
    ## [1] 4
    ## [1] 9
    ## [1] 16
    ## [1] 25
    ## [1] 36
    ## [1] 49
    ## [1] 64
    ## [1] 81
    ## [1] 100
1.2 Example 2
    vec <- c(6, 3, 0, 9, 5)
    vec
    ## [1] 6 3 0 9 5
    length(vec)
    ## [1] 5
    for(i in 1:length(vec)) {
      out <- vec[i] + 10
      print(out)
    }
    ## [1] 16
    ## [1] 13
    #...

522 sym R (2679 sym/37 pcs)
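
The second loop in the entry above indexes with `1:length(vec)`. A possible variation, sketched here as a suggestion rather than as the entry's own code, is to index with `seq_along()`, which behaves the same for non-empty vectors but is safer when the vector is empty. The names `empty`, `risky_index`, `safe_index` and `out_vectorised` are hypothetical.

```r
# Vector copied from the entry's preview
vec <- c(6, 3, 0, 9, 5)

# seq_along(vec) yields 1, 2, ..., length(vec), the same as 1:length(vec) here
out <- numeric(length(vec))
for (i in seq_along(vec)) {
  out[i] <- vec[i] + 10
}

# For an empty vector, 1:length(x) counts down as c(1, 0),
# while seq_along(x) is empty, so a loop over it never runs
empty <- numeric(0)
risky_index <- 1:length(empty)
safe_index <- seq_along(empty)

# The same result without a loop, since arithmetic in R is vectorised
out_vectorised <- vec + 10
```

Storing results in a preallocated vector such as `out` also keeps them available after the loop, whereas printing inside the loop only displays them.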

Text Analysis: numerical statistic tf-idf

30.04.2022

1 Clear Workspace
    rm(list = ls())
2 Objective
    Project Gutenberg offers over 53,000 free books. This project analyzes four of Twain’s best novels: Roughing It, Life on the Mississippi, The Adventures of Tom Sawyer, and Adventures of Huckleberry Finn.
3 Load library
    library(tidyverse)
    library(tidyr)
    library(ggplot2)
    library(tidyt...

1236 sym R (7329 sym/24 pcs) 4 img 3 tbl
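
For readers unfamiliar with the statistic named in the entry title above, the sketch below computes tf-idf by hand in base R. It assumes the common definition (term frequency as a term's share of a document's words, inverse document frequency as the natural log of the number of documents divided by the number of documents containing the term); the entry's preview is cut off before its own computation, so this is not necessarily how it is done there. The toy corpus `docs` is invented for illustration.

```r
# Hypothetical toy corpus: three tiny "documents" as word vectors
docs <- list(
  doc1 = c("river", "boat", "river"),
  doc2 = c("boat", "town", "river"),
  doc3 = c("river", "town", "town", "mine")
)

terms <- sort(unique(unlist(docs)))

# Term frequency: count of a term in a document / total words in that document
tf <- vapply(docs, function(words) {
  as.numeric(table(factor(words, levels = terms))) / length(words)
}, FUN.VALUE = numeric(length(terms)))
rownames(tf) <- terms

# Document frequency: number of documents that contain each term
doc_freq <- rowSums(tf > 0)

# Inverse document frequency (assumed definition): ln(n_docs / doc_freq)
idf <- log(length(docs) / doc_freq)

# tf-idf: each row (term) of tf is scaled by that term's idf
tf_idf <- tf * idf
```

In this toy corpus "river" occurs in every document, so its idf and therefore its tf-idf is zero everywhere, while "mine" occurs in only one document and gets the highest idf.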

Unsupervised learning: Clustering method

28.04.2022

Note: The objective is not to predict a given outcome but to discover useful patterns in the data.
1 Clear Workspace
    rm(list = ls())
2 Load packages
    library(ggplot2)
    library(dplyr)
    library(pastecs)
    library(fpc)
    library(FactoMineR)
    library(readxl)
3 Load data
    raw.data <- read_excel("segmentationdata.xlsx")
    data <- raw.data
4 Data wrangling
4....

1995 sym R (8830 sym/54 pcs) 13 img

Data Mining: Decision Tree vs Logistic Regression

29.04.2022

1 Load library
    library(ggplot2)
    library(dplyr)
    library(gridExtra)
    library(corrplot)
    diabetes <- read.csv('diabetes.csv')
    dim(diabetes) # number of rows and columns
    ## [1] 768 9
    str(diabetes) # data structure
    ## 'data.frame': 768 obs. of 9 variables:
    ## $ Pregnancies : int 6 1 8 1 0 5 3 10 2 8 ...
    ## $ Glucose ...

1480 sym R (8564 sym/27 pcs) 4 img

Data Mining: Linear model vs Non-linear model

29.04.2022

1 Objective
    Compare model performance between a linear model (OLS) and non-linear models (e.g. Support Vector Machine, Random Forest)
    Performance metrics considered: root mean squared error (RMSE) and R²
    Select the best model for prediction
2 Clear Workspace
    rm(list = ls())
3 Load library
    library(ggplot2)
    library(dplyr)
    library(pastecs)
    library(car)
    ...

2444 sym R (19816 sym/116 pcs) 8 img
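
The two metrics named in the entry above can be computed in a few lines of base R. This is a minimal sketch under stated assumptions: the built-in `mtcars` data and the formula `mpg ~ wt` are stand-ins chosen for illustration (the entry's own data and models are not visible in the preview), and the metrics are computed in-sample rather than on a held-out test set.

```r
# Stand-in data and model (assumption): built-in mtcars, simple OLS fit
fit <- lm(mpg ~ wt, data = mtcars)

predicted <- predict(fit, newdata = mtcars)
actual <- mtcars$mpg

# RMSE: square root of the mean squared prediction error
rmse <- sqrt(mean((actual - predicted)^2))

# R squared: 1 - (residual sum of squares / total sum of squares)
ss_res <- sum((actual - predicted)^2)
ss_tot <- sum((actual - mean(actual))^2)
r2 <- 1 - ss_res / ss_tot
```

Because both metrics depend only on `actual` and `predicted`, the same two formulas can score any model that produces numeric predictions, which is what makes a like-for-like comparison between model families possible.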

Data Wrangling: Happy Planet Index

30.04.2022

1 Objective
    Data wrangling
    Correlation test
    PCA
    PAM clustering
    World map
2 Clear Workspace
    rm(list = ls())
3 Setup
3.1 Load packages
    library(xlsx)
    library(dplyr)
    library(plotly)
    library(stringr)
    library(cluster)
    library(FactoMineR)
    library(factoextra)
    library(ggplot2)
    library(reshape2)
    library(ggthemes)
    library(NbClust)
3.2 Load data
    hpi <- ...

1196 sym R (21322 sym/53 pcs) 14 img 2 tbl