Finding Most Influential Sets

Published in Proceedings of the International Conference on Machine Learning (ICML), 2026

Abstract: Identifying \emph{most influential sets} (MIS) — size-k subsets whose removal maximally changes a target estimand — is typically infeasible because it requires searching over n choose k subsets. For estimands with linear-fractional leave-set-out effects, we show that MIS selection reduces to a one-parameter sequence of top-k problems. Dinkelbach’s method yields an algorithm with O(n) cost per iteration and finite termination. For fixed residualized inputs, the algorithm returns a globally optimal set for the univariate ratio objective, including the oracle-residualized partial linear model. With estimated nuisance functions, uniform denominator and generated-score stability imply approximation to the first-order oracle orthogonal-score objective; exact set recovery follows under a separation condition. Simulations and applications show that the method recovers exact MIS that were previously computationally inaccessible.

Status: published

Co-author: Nikolas Kuschnig

Recommended citation: Konrad, L.D., Kuschnig, N. (2026). " Finding Most Influential Sets." Proceedings of the International Conference on Machine Learning (ICML).
Download Paper | Download Bibtex