Publications by Liam

Rfast

11.11.2022

Rfast tutorial Mi Lin 2022-11-12 What is Rfast? Rfast provides a set of efficient analysis functions. Take R's built-in outer function as an example: it computes the outer product of two vectors. require(Rfast) ## Loading required package: Rfast ## Loading required package: Rcpp ## Loading required package: RcppZiggurat x <- 1:9; names(x) <- x y <- 2:...
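To make "outer product" concrete, here is a minimal base-R sketch. It uses only the built-in `outer()` and its operator form `%o%`, so Rfast itself is not needed. The vectors `a` and `b` are illustrative and are not taken from the post.

```r
# Two illustrative vectors (not from the original post)
a <- 1:3
b <- 1:4

# outer(a, b) builds a length(a) x length(b) matrix with m[i, j] = a[i] * b[j]
m <- outer(a, b)

# %o% is the operator form of outer() with the default FUN = "*"
stopifnot(identical(m, a %o% b))

# Any vectorised binary function can replace "*", e.g. addition
s <- outer(a, b, "+")
print(m)
print(s)
```

Here `m[2, 3]` is `2 * 3 = 6` and `s[3, 4]` is `3 + 4 = 7`.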

interpretable Machine Learning

18.11.2022

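Book: https://ema.drwhy.ai/introduction.html Which variables contribute to the selected prediction? Break-down plots for additive attributions. When trying to understand a model's prediction for a single observation, the most common question is probably: which variables contribute most to this result? There is no single best method for answering this question. We introduce break-down (BD) plots, ...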
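For a linear model without interactions, the break-down attribution has a simple closed form. The contribution of variable j is `beta_j * (x_j - mean(x_j))`, and these contributions plus the mean prediction add up to the prediction for the chosen observation. The sketch below checks this in base R. The model `mpg ~ wt + hp` on `mtcars` is an illustrative assumption. For models with interactions, the attributions can depend on the order in which variables are considered.

```r
# Illustrative linear model on the built-in mtcars data
fit  <- lm(mpg ~ wt + hp, data = mtcars)
vars <- c("wt", "hp")

# Observation to explain (first row of mtcars)
x0 <- unlist(mtcars[1, vars])

# Baseline: mean model prediction over the data
baseline <- mean(predict(fit))

# Additive contribution of each variable: beta_j * (x0_j - mean(x_j))
contrib <- coef(fit)[vars] * (x0 - colMeans(mtcars[, vars]))

# Baseline + contributions reproduces the prediction for x0
pred <- unname(predict(fit, newdata = mtcars[1, ]))
stopifnot(isTRUE(all.equal(baseline + sum(contrib), pred)))
print(round(contrib, 3))
```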

dealwithdataframe

24.11.2022

Tidy correlation library(inspectdf) inspect_cor(iris) ## # A tibble: 6 × 7 ## col_1 col_2 corr p_value lower upper pcnt_nna ## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 Petal.Width Petal.Length 0.963 1.73e-31 0.949 0.973 100 ## 2 Petal.Length Sepal.Length 0.872 4.13e-26 0.827 0....
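The pairwise table printed by `inspect_cor()` can be approximated in base R. This is a sketch, not inspectdf's implementation. It uses Pearson `cor()` and `cor.test()` on the numeric columns of `iris`, and it omits the confidence-interval and `pcnt_nna` columns.

```r
# Numeric columns of iris
num <- iris[, sapply(iris, is.numeric)]
cm  <- cor(num)  # Pearson correlation matrix

# Upper-triangle indices give each unordered pair once (4 columns -> 6 pairs)
idx <- which(upper.tri(cm), arr.ind = TRUE)
pairs <- data.frame(
  col_1 = colnames(cm)[idx[, 2]],
  col_2 = rownames(cm)[idx[, 1]],
  corr  = cm[idx]
)

# p-value of the correlation test for each pair
pairs$p_value <- mapply(function(a, b) cor.test(num[[a]], num[[b]])$p.value,
                        pairs$col_1, pairs$col_2)

# Sort by descending absolute correlation
pairs <- pairs[order(-abs(pairs$corr)), ]
rownames(pairs) <- NULL
print(pairs)
```

The first row should be the Petal.Width / Petal.Length pair with a correlation of about 0.963, matching the first row of the `inspect_cor()` output above.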

fst and arrow

25.11.2022

fst (https://www.fstpackage.org/) provides a fast, easy and flexible way to serialize data frames, with fast reading and writing of files. Basic usage: # Generate some random data frame with 10 million rows and various column types nr_of_rows <- 1e7 df <- data.frame( Logical = sample(c(TRUE, FALSE, NA), prob = c(0.85, 0.1, 0.05), nr_of_r...
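Here is a smaller, self-contained round trip with fst, using `write_fst()` and `read_fst()`. The data frame, the temporary file path and the compression level are illustrative choices. It also shows that `read_fst()` can load a subset of columns and a range of rows without reading the whole file.

```r
library(fst)

set.seed(1)

# Small illustrative data frame with several column types
df <- data.frame(
  Integer = 1:1000,
  Real    = runif(1000),
  Logical = sample(c(TRUE, FALSE, NA), 1000, replace = TRUE),
  Factor  = factor(sample(c("a", "b", "c"), 1000, replace = TRUE))
)

path <- tempfile(fileext = ".fst")

# compress ranges from 0 to 100; 50 is the package default
write_fst(df, path, compress = 50)

# Read everything back, then only two columns and the first 10 rows
full <- read_fst(path)
part <- read_fst(path, columns = c("Integer", "Real"), from = 1, to = 10)
```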

Automatic feature selection

25.11.2022

The rSAFE package can extract features from a complex model and use them to fit a simpler, interpretable model, improving that model's overall performance. library(rSAFE) ## Welcome to rSAFE (version: 0.1.4). head(apartments) ## m2.price construction.year surface floor no.rooms district ## 1 5897 1953 25 3 1 Srodmiescie ## 2 1818 ...

factorMerger

25.11.2022

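factorMerger is a set of tools that supports post-hoc testing and extracts a hierarchical structure of factor levels with respect to a given response. Its basic workflow is shown in the figure (cheatsheet: https://raw.githubusercontent.com/ModelOriented/factorMerger/master/materials/factorMerger-cheatsheet.png). Let's look at an example; the code is as follows. library(factorMerger) ## Welcome to ...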
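factorMerger's own merging algorithms are not reproduced here. As a reference point for the post-hoc testing it builds on, base R's Tukey HSD test compares every pair of factor levels. This is a different, standard technique, and the `iris` model is only an illustration. The "mergeable" rule at the end is a hypothetical heuristic, not factorMerger's criterion.

```r
# One-way ANOVA of sepal length on species (illustrative data)
fit <- aov(Sepal.Length ~ Species, data = iris)

# Tukey's honest significant differences for all pairs of species
tk <- TukeyHSD(fit, "Species", conf.level = 0.95)

# Matrix with columns diff, lwr, upr, p adj; one row per pair of levels
res <- tk$Species
print(round(res, 4))

# Hypothetical heuristic: pairs with adjusted p > 0.05 as merge candidates
mergeable <- rownames(res)[res[, "p adj"] > 0.05]
```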

Variable importance with vip

25.11.2022

Model-based variable importance, permutation-based variable importance, SHAP-based variable importance, PDP/ICE-based variable importance. Common functions (https://koalaverse.github.io/vip/reference/index.html): model metrics: the metric_ functions; extract a formula: get_formula(); list metrics: list_metrics(); variable importance: vi(); ICE-based variable importance: v...
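The permutation-based approach listed above can be written out by hand. vip exposes it through `vi()` with `method = "permute"`. This is a minimal sketch, not vip's implementation. The model, data, metric (RMSE) and number of repetitions are all assumptions. A variable's importance is the average increase in RMSE after its column is shuffled.

```r
set.seed(42)

# Illustrative model: linear regression on mtcars
fit  <- lm(mpg ~ wt + hp + qsec, data = mtcars)
vars <- c("wt", "hp", "qsec")

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
base_rmse <- rmse(mtcars$mpg, predict(fit, mtcars))

# Importance of v = mean RMSE over n_rep shuffles of column v, minus baseline
perm_importance <- function(v, n_rep = 20) {
  mean(replicate(n_rep, {
    d <- mtcars
    d[[v]] <- sample(d[[v]])  # break the link between v and mpg
    rmse(mtcars$mpg, predict(fit, d))
  })) - base_rmse
}

imp <- sort(sapply(vars, perm_importance), decreasing = TRUE)
print(round(imp, 3))
```

Averaging over several shuffles reduces the noise that a single permutation would introduce.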

Interpretable machine learning book

25.11.2022

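Interpretability https://christophm.github.io/interpretable-ml-book/interpretability.html It is difficult to define interpretability (mathematically). I like the (non-mathematical) definition of interpretability by Miller (2017) [3]: interpretability is the degree to which a human can understand the cause of a decision. Another one is: interpretability is the degree to which a human can consistently predict the model's result [4]. Machine learning ...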

Model validation with auditor

25.11.2022

auditor is a package for the visual exploration, explanation and debugging of predictive models. rm(list = ls()) library(auditor) library(randomForest) ## randomForest 4.7-1.1 ## Type rfNews() to see new features/changes/bug fixes. library(DALEX) ## Welcome to DALEX (version: 2.4.2). ## Find examples and detailed introduction at: http://ema.drwhy.ai/ ## ## Attaching package: '...

tidyfst

25.11.2022

tidyfst is a toolkit of tidy data-manipulation verbs with data.table as its backend. GitHub: https://github.com/hope-data-science/tidyfst Cheatsheet: https://github.com/hope-data-science/tidyfst/blob/master/docs/tidyfst_cheatsheet.pdf If needed, you can also use data.table directly. #basic usage filter row library(tidyfst) ## ## Life's short, use R. l...
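Because tidyfst runs on data.table, the same kind of row filter can also be written directly in data.table syntax. The sketch uses only data.table, and the filter condition and grouped summary are illustrative. It shows the two parts of `dt[i, j, by]`: `i` selects rows and `j` with `by` computes per group.

```r
library(data.table)

dt <- as.data.table(iris)

# Filter rows: i takes a logical condition evaluated inside dt
setosa_long <- dt[Species == "setosa" & Sepal.Length > 5]

# Grouped summary: j computes a new column, by defines the groups
means <- dt[, .(mean_sepal = mean(Sepal.Length)), by = Species]
print(means)
```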
