Publications by YoungStatS

Regularization by Noise for Stochastic Differential and Stochastic Partial Differential Equations

02.06.2022

The regularizing effects of noisy perturbations of differential equations are a central subject of stochastic analysis. Recent breakthroughs initiated a new wave of interest, particularly concerning non-Markovian, infinite-dimensional, and rough-stoc...

Making an Online Risk Calculator

10.11.2020

We first started making risk calculators with the website http://nomograms.org. There are several on that site, and they are patient-friendly. Later, we launched rcalc.ccf.org, which has many more risk calculators, although its intended audience is clinicians. It is considerably more expensive and time-consuming to make these patient-friendly. There ...

Online cash register data in the measurement of retail trade turnover

02.12.2020

The Hungarian Central Statistical Office uses online cash register data for the measurement of retail trade and food services. This data source allows the statistical office to considerably cut the administrative burden on data providers without any loss of quality in retail trade statistics. Online cash register data is widely used in tracking the latest de...

Functional Regression Control Chart: a New Framework for Profile Monitoring

03.12.2020

Introduction: New statistical process control (SPC) methods must be developed to handle increasingly complex data, which are becoming available thanks to new data acquisition technologies. In particular, in many practical situations the quality characteristic of a process can be modelled as a function defined on a compact domain...

Causal discovery in the presence of discrete latent variables

14.12.2020

We address the problem of causal structure learning in the presence of hidden variables. Given a target variable and a vector of covariates, we are trying to infer the set of observable causal parents of the target variable. There are many good reasons for being interested in causal predictors. Given a target variable $Y$, and a vector $X = (X^1,...

The Multiple Latent Block Model for mixed data

04.01.2021

Abstract: Co-clustering techniques, which group observations and features simultaneously, have proven efficient at summarizing data sets. They exploit the duality between rows and columns, and the data set is summarized in blocks (the crossing of a row-cluster and a column-cluster). However, in the case of mixed data sets (with features of di...
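To make "summarized in blocks" concrete, the sketch below takes a row partition and a column partition as given and reports the mean of each block, i.e. of each crossing of a row-cluster with a column-cluster. The function name `block_means` and the toy matrix are hypothetical. The model described in the post estimates the partitions from the data and handles features of different types; this numeric-only sketch does neither and only shows what a block summary looks like.

```python
import numpy as np

def block_means(data, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Hypothetical helper: mean of each (row-cluster, column-cluster) block.

    Partitions are assumed known here; empty blocks are reported as NaN.
    """
    summary = np.full((n_row_clusters, n_col_clusters), np.nan)
    for g in range(n_row_clusters):
        for m in range(n_col_clusters):
            # np.ix_ with boolean masks selects the rows of cluster g and columns of cluster m.
            block = data[np.ix_(row_labels == g, col_labels == m)]
            if block.size:
                summary[g, m] = block.mean()
    return summary

# Toy numeric data with an obvious 2 x 2 block structure.
data = np.array([
    [1.0, 1.2, 5.0, 5.1],
    [0.9, 1.1, 4.8, 5.2],
    [7.0, 6.9, 2.0, 2.1],
])
rows = np.array([0, 0, 1])     # row-cluster of each observation
cols = np.array([0, 0, 1, 1])  # column-cluster of each feature
print(block_means(data, rows, cols, 2, 2))
```

The 3 x 4 matrix is thus reduced to a 2 x 2 table of block means, which is the kind of compression co-clustering aims for.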

Machine learning for causal inference that works

25.01.2021

I’ve kindly been invited to share a few words about a recent paper my colleagues and I published in Bayesian Analysis: “Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects”. In that paper, we motivate and describe a method that we call Bayesian causal forests (BCF), which is now imple...

PLS for Big Data: A unified parallel algorithm for regularised group PLS

27.01.2021

We look at the problem of learning latent structure between two blocks of data through the partial least squares (PLS) approach. These methods include approaches for supervised and unsupervised statistical learning. We review these methods and present approaches to decrease the computation time and scale the method to big data. Given two blocks of...
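As a rough illustration of how PLS extracts latent structure shared by two blocks, the sketch below computes the first pair of PLS weight vectors as the leading left and right singular vectors of the cross-product matrix X^T Y of the column-centered blocks. These vectors maximize the covariance between the scores Xu and Yv under unit-norm constraints. This is the textbook first PLS component, not the parallel regularized group algorithm from the post; the function name `first_pls_component` and the simulated data are hypothetical.

```python
import numpy as np

def first_pls_component(X, Y):
    """Hypothetical helper: unit vectors (u, v) maximizing cov(X u, Y v), plus the scores."""
    # Column-center both blocks so X^T Y is proportional to the cross-covariance matrix.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    M = Xc.T @ Yc
    # The leading singular vectors of X^T Y solve the PLS covariance criterion.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    u, v = U[:, 0], Vt[0, :]
    # Latent scores (one per observation) for each block.
    xi, omega = Xc @ u, Yc @ v
    return u, v, xi, omega

rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)  # simulated latent variable shared by both blocks
X = np.outer(latent, [1.0, 0.5, 0.0, 0.0]) + 0.3 * rng.normal(size=(n, 4))
Y = np.outer(latent, [0.0, 1.0, -1.0]) + 0.3 * rng.normal(size=(n, 3))
u, v, xi, omega = first_pls_component(X, Y)
print(np.round(u, 2), np.round(v, 2))
print(np.corrcoef(xi, omega)[0, 1])
```

Because the SVD is computed on the small p x q matrix X^T Y rather than on the data themselves, the cost after forming X^T Y does not grow with the number of observations. That cross-product is also a sum over observations, which is one reason PLS lends itself to splitting the data across workers.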

Locally adaptive k-nearest neighbour classification

30.01.2021

Abstract: Binary classification is one of the cornerstones of modern data science, but, until recently, our understanding of classical methods such as the k-nn algorithm was limited to settings where feature vectors were compactly supported. Based on a new analysis of this classifier, we propose a variant with significantly lower risk for heavy-ta...
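For reference, the classical k-nn rule discussed above can be sketched as follows. A test point receives the majority label among its k nearest training points in Euclidean distance, with the same k used everywhere. This is the standard fixed-k classifier, not the locally adaptive variant proposed in the post. The function name `knn_classify` and the two Gaussian toy classes are hypothetical.

```python
import numpy as np

def knn_classify(X_train, y_train, X_test, k):
    """Hypothetical helper: majority vote among the k nearest training points.

    Labels are assumed binary in {0, 1}; ties (possible for even k) go to class 1.
    """
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k nearest training points for each test point.
    nn = np.argsort(d2, axis=1)[:, :k]
    # Fraction of the k neighbours labelled 1, then majority vote.
    p1 = y_train[nn].mean(axis=1)
    return (p1 >= 0.5).astype(int)

rng = np.random.default_rng(1)
X0 = rng.normal(loc=-2.0, size=(50, 2))  # class 0 centered at (-2, -2)
X1 = rng.normal(loc=2.0, size=(50, 2))   # class 1 centered at (2, 2)
X_train = np.vstack([X0, X1])
y_train = np.array([0] * 50 + [1] * 50)
X_test = np.array([[-2.0, -2.0], [2.0, 2.0]])
print(knn_classify(X_train, y_train, X_test, k=5))
```

The single global k is exactly what a locally adaptive method revisits. In regions where training points are sparse, such as the tails of a heavy-tailed feature distribution, the k nearest neighbours can lie far from the test point.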

Give me an adequate correlation: assessing relationships in percentage (or proportional) data

03.02.2021

Correlations and negative bias: We assume you are familiar with the following problem. Consider a data set where the information is expressed in percentages or proportions. An example is household expenditures, given as the average amounts (in euros) that households spend on food, housing, transportation, etc. Since the expenditures ...
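The negative bias can be reproduced in a few lines. In the sketch below, independent positive amounts are converted to proportions of each row total (closure), and the correlations before and after closure are compared. The lognormal amounts for three spending categories are a simulated assumption, not data from the post. The sketch illustrates only the problem and does not implement the post's proposed solution.

```python
import numpy as np

rng = np.random.default_rng(42)
n_households = 1000
# Simulated, mutually independent spending amounts (in euros) on three categories,
# e.g. food, housing and transportation, all drawn from the same lognormal distribution.
amounts = rng.lognormal(mean=6.0, sigma=0.5, size=(n_households, 3))

# Closure: express each household's spending as proportions of its total.
proportions = amounts / amounts.sum(axis=1, keepdims=True)

raw_corr = np.corrcoef(amounts, rowvar=False)
prop_corr = np.corrcoef(proportions, rowvar=False)
print(np.round(raw_corr, 2))   # off-diagonal entries close to 0
print(np.round(prop_corr, 2))  # off-diagonal entries pushed well below 0
```

The mechanism is arithmetic rather than behavioural. Each row of proportions sums to 1, so each part's covariances with the other parts must add up to minus its own variance. With three exchangeable parts, the correlations therefore sit around -1/(3 - 1) = -0.5 even though the underlying amounts are unrelated.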