Publications by Nguyen Chi Dung
Bayesian Optimization for searching optimal Recall for XGBoost Classifier (Python)
Motivations Tuning parameters to find the optimal settings for a model is time-consuming and laborious work. Bayesian Optimization is an effective approach to tuning parameters for Machine Learning models. Data used and results First, train and evaluate a series of Machine Lea...
9840 sym R (5580 sym/3 pcs)
Should we impute missing data? (Python)
Motivations Real-world data often has many missing values. Missing values can be caused by data corruption or by a failure to record data. Handling missing data is important because many machine learning algorithms do not support data with missing values. However, in the case of XGBoost we may not need to impute missing data before training XGbo...
4396 sym R (1735 sym/1 pcs)
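The reason imputation may be unnecessary is that XGBoost learns, for each split, a default direction in which rows with a missing feature value are sent. The following is a minimal pure-Python sketch of that idea, not XGBoost's actual implementation: it uses a single split on one feature, scores candidates with the sum of squared errors instead of XGBoost's gradient-based gain, and represents missing values as `None`. The function names and the toy data are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_with_missing(xs, ys):
    """Return (loss, threshold, missing_goes_left) for one feature.

    Rows whose feature value is None are tried on both sides of every
    candidate threshold; the side with the lower loss becomes the
    default direction for missing values.
    """
    present = [(x, y) for x, y in zip(xs, ys) if x is not None]
    missing_y = [y for x, y in zip(xs, ys) if x is None]
    thresholds = sorted({x for x, _ in present})
    best = None
    # Skip the largest value so the right side always keeps observed rows.
    for t in thresholds[:-1]:
        left = [y for x, y in present if x <= t]
        right = [y for x, y in present if x > t]
        for missing_left in (True, False):
            l = left + missing_y if missing_left else left
            r = right if missing_left else right + missing_y
            loss = sse(l) + sse(r)
            if best is None or loss < best[0]:
                best = (loss, t, missing_left)
    return best


# Toy data (hypothetical): rows with a missing feature have high targets,
# so they fit better with the high-valued rows on the right.
xs = [1.0, 2.0, None, 8.0, 9.0, None]
ys = [0.0, 0.1, 1.0, 0.9, 1.0, 1.1]
loss, threshold, missing_left = best_split_with_missing(xs, ys)
print(threshold, missing_left)
```

On this toy data the split lands at 2.0 and the missing rows are routed to the right, so no imputed value is ever needed.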
Data pre-processing for Kalapa Credit Scoring Challenge
Motivations According to an article by New York Times: Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets. A case from Kalapa Credit Scoring Challenge # Clear...
16492 sym R (10816 sym/5 pcs)
Compare Maximum Profit between Random Forest and Logistic (tidymodels)
Introduction I will write something later… # https://github.com/tidymodels/themis # https://www.tidymodels.org/ # https://www.tmwr.org/ # https://juliasilge.com/blog/xgboost-tune-volleyball/ # Clear our workspace: rm(list = ls()) # Load hmeq.csv dataset: library(tidyverse) hmeq <- read_csv("http://www.creditriskanalytics.net/uploads...
11663 sym R (8078 sym/4 pcs) 3 img
Shapes of Histograms
Introduction The purpose of drawing histograms, like that of all other statistical techniques, is to acquire information. Once we have the information, we frequently need to describe what we’ve learned to others. We describe the shape of histograms on the basis of the following characteristics. Symmetry: A histogram is said to be symmetric if, ...
6769 sym R (1956 sym/1 pcs) 1 img
Infographics Using R
Introduction to Infographics Infographics are graphic visual representations of information, data, or knowledge intended to present information quickly and clearly. They can improve cognition by utilizing graphics to enhance the human visual system’s ability to see patterns and trends. Similar pursuits are information visualization, data visual...
11584 sym R (5588 sym/1 pcs) 1 img
Hierarchical Clustering
Introduction The previous post provided a practical application of K-means Clustering to a real-world dataset. Hierarchical clustering is an alternative approach to k-means clustering for identifying groups in a dataset. It does not require us to pre-specify the number of clusters to be generated, as the k-means approach does. Furt...
10940 sym R (2047 sym/5 pcs) 3 img 1 tbl
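To illustrate why hierarchical clustering does not need the number of clusters up front, here is a minimal pure-Python sketch of agglomerative clustering. It assumes single linkage and Euclidean distance, which the excerpt does not specify, and the function names and points are hypothetical. The merge history is built once, and any number of clusters can be read off afterwards.

```python
import math
from itertools import combinations


def agglomerative(points):
    """Single-linkage agglomerative clustering.

    Returns the merge history as a list of (cluster_a, cluster_b, distance),
    where each cluster is a sorted tuple of point indices.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the two clusters whose closest members are nearest.
        for a, b in combinations(clusters, 2):
            d = min(math.dist(points[i], points[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges


def cut(points, merges, k):
    """Replay the merge history until only k clusters remain."""
    clusters = {(i,) for i in range(len(points))}
    for a, b, _ in merges:
        if len(clusters) <= k:
            break
        clusters -= {a, b}
        clusters.add(tuple(sorted(a + b)))
    return sorted(clusters)


# Hypothetical 2-D points forming two tight pairs and one outlier.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
merges = agglomerative(points)
print(cut(points, merges, 3))
print(cut(points, merges, 2))
```

The same `merges` list serves both calls to `cut`, which is the practical difference from k-means: the number of clusters is chosen after the hierarchy is built, not before.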
Reshaping Data from Wide to Long (and Vice Versa)
Introduction Reshaping data is about changing the way data is organized into rows and columns. There are situations when we need to convert data from Wide Form to Long Form (and vice versa). Examples of Reshaping Data We use tidyr::pivot_longer() for converting from wide to long: # Generate fake data for the purpose of explanation: set.seed(29) ...
3245 sym R (887 sym/3 pcs) 3 tbl
Administrative Map of Vietnam + Spratly and Paracel Islands
Introduction Choropleth maps are among the most useful and widely used data visualization tools. This kind of visual can convey information quickly and memorably, for example, what Coca Cola's market share looks like across the provinces of Vietnam, or the rate of hunger and pov...
21271 sym R (10000 sym/14 pcs) 8 img 1 tbl
Tidy Tuesday Project for Data Science
R Codes creating Plot # Some nice projects: 1. https://github.com/cnicault/tidytuesday # 2. https://github.com/rfordatascience/tidytuesday # 3. https://github.com/zhiiiyang/tidytuesday # 4. https://github.com/jack-davison/TidyTuesday # Load some libraries: library(tidyverse) lib...
9800 sym R (6772 sym/1 pcs) 1 img