Zhehao Li

Learning under Selective Labels with Data from Heterogeneous Decision-makers: An Instrumental Variable Approach

Jun 24, 2023
Jian Chen, Zhehao Li, Xiaojie Mao

We study the problem of learning with selectively labeled data, which arises when outcomes are labeled only for some units because of historical decision-making. The distribution of the labeled data may differ substantially from that of the full population, especially when the historical decisions and the target outcome are both affected by unobserved factors. Consequently, prediction rules learned from the labeled data alone may be severely biased when deployed on the full population. Our paper tackles this challenge by exploiting the fact that in many applications the historical decisions were made by a set of heterogeneous decision-makers. In particular, we analyze this setup in a principled instrumental variable (IV) framework. We establish conditions under which the full-population risk of any given prediction rule is point-identified from the observed data, and we provide sharp risk bounds when point identification fails. We further propose a weighted learning approach that learns prediction rules robust to the label selection bias in both the point-identified and the partially identified settings. Finally, we apply our proposed approach to a semi-synthetic financial dataset and demonstrate its superior performance in the presence of selection bias.

Via arXiv
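To make the idea of "sharp risk bounds when point identification fails" concrete, the sketch below computes the classical Manski-style IV intersection bounds for a binary outcome, treating the decision-maker id as the instrument. This is a swapped-in textbook illustration, not the paper's estimator or its weighted learning method. It assumes a single covariate cell (covariates are ignored), a 0/1 decision `d` that reveals the label only when `d == 1`, and that the decision-maker is independent of the outcome. The function names `iv_intersection_bounds` and `zero_one_risk_bounds` are hypothetical.

```python
import numpy as np


def iv_intersection_bounds(d, y, z):
    """Hypothetical helper: Manski-style IV bounds on P(Y=1) in one covariate cell.

    d: 0/1 decisions; the label is observed only when d == 1
    y: 0/1 outcomes; entries with d == 0 are placeholders and are ignored
    z: decision-maker ids, used as the instrument (assumed independent of Y)
    """
    d, y, z = np.asarray(d), np.asarray(y), np.asarray(z)
    lowers, uppers = [], []
    for k in np.unique(z):
        dk, yk = d[z == k], y[z == k]
        # Worst case: every unlabeled unit is negative -> P(D=1, Y=1 | Z=k)
        lower = np.mean((dk == 1) & (yk == 1))
        # Best case: every unlabeled unit is positive -> add P(D=0 | Z=k)
        upper = lower + np.mean(dk == 0)
        lowers.append(lower)
        uppers.append(upper)
    # If P(Y=1) does not depend on the decision-maker, it must lie in every
    # per-decision-maker interval, so intersect them. An empty intersection
    # (lo > hi) would contradict the IV assumption.
    return float(max(lowers)), float(min(uppers))


def zero_one_risk_bounds(lo, hi, h):
    """Bounds on the 0-1 risk P(Y != h) of always predicting label h."""
    return (1.0 - hi, 1.0 - lo) if h == 1 else (lo, hi)


# Decision-maker 0 labels half of its cases; decision-maker 1 labels all of them.
z = [0, 0, 0, 0, 1, 1, 1, 1]
d = [1, 1, 0, 0, 1, 1, 1, 1]
y = [1, 0, 0, 0, 1, 0, 1, 0]
lo, hi = iv_intersection_bounds(d, y, z)
print(lo, hi)                               # 0.5 0.5
print(zero_one_risk_bounds(lo, hi, h=1))    # (0.5, 0.5)
```

On its own, decision-maker 0 only pins P(Y=1) to [0.25, 0.75]. Because decision-maker 1 labels every case, the intersection collapses to a point. This illustrates how heterogeneity across decision-makers can narrow the bounds, although the paper's own point-identification conditions may differ from this special case.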

Learning under Selective Labels with Heterogeneous Decision-makers: An Instrumental Variable Approach

Jun 13, 2023
Jian Chen, Zhehao Li, Xiaojie Mao

Via arXiv