-
Evaluating Fairness in Black-box Algorithmic Markets: A Case Study of Ride Sharing in Chicago
Authors:
Yuhan Liu,
Yuhan Zheng,
Siyuan Zhang,
Lydia T. Liu
Abstract:
This study examines fairness within the rideshare industry, focusing on both drivers' wages and riders' trip fares. Through quantitative analysis, we found that drivers' hourly wages are significantly influenced by factors such as race/ethnicity, health insurance status, tenure to the platform, and working hours. Despite platforms' policies not intentionally embedding biases, disparities persist b…
▽ More
This study examines fairness within the rideshare industry, focusing on both drivers' wages and riders' trip fares. Through quantitative analysis, we found that drivers' hourly wages are significantly influenced by factors such as race/ethnicity, health insurance status, tenure to the platform, and working hours. Despite platforms' policies not intentionally embedding biases, disparities persist based on these characteristics. For ride fares, we propose a method to audit the pricing policy of a proprietary algorithm by replicating it; we conduct a hypothesis test to determine if the predicted rideshare fare is greater than the taxi fare, taking into account the approximation error in the replicated model. Challenges in accessing data and transparency hinder our ability to isolate discrimination from other factors, underscoring the need for collaboration with rideshare platforms and drivers to enhance fairness in algorithmic wage determination and pricing.
△ Less
Submitted 29 July, 2024;
originally announced July 2024.
-
On the Actionability of Outcome Prediction
Authors:
Lydia T. Liu,
Solon Barocas,
Jon Kleinberg,
Karen Levy
Abstract:
Predicting future outcomes is a prevalent application of machine learning in social impact domains. Examples range from predicting student success in education to predicting disease risk in healthcare. Practitioners recognize that the ultimate goal is not just to predict but to act effectively. Increasing evidence suggests that relying on outcome predictions for downstream interventions may not ha…
▽ More
Predicting future outcomes is a prevalent application of machine learning in social impact domains. Examples range from predicting student success in education to predicting disease risk in healthcare. Practitioners recognize that the ultimate goal is not just to predict but to act effectively. Increasing evidence suggests that relying on outcome predictions for downstream interventions may not have desired results.
In most domains there exists a multitude of possible interventions for each individual, making the challenge of taking effective action more acute. Even when causal mechanisms connecting the individual's latent states to outcomes is well understood, in any given instance (a specific student or patient), practitioners still need to infer -- from budgeted measurements of latent states -- which of many possible interventions will be most effective for this individual. With this in mind, we ask: when are accurate predictors of outcomes helpful for identifying the most suitable intervention?
Through a simple model encompassing actions, latent states, and measurements, we demonstrate that pure outcome prediction rarely results in the most effective policy for taking actions, even when combined with other measurements. We find that except in cases where there is a single decisive action for improving the outcome, outcome prediction never maximizes "action value", the utility of taking actions. Making measurements of actionable latent states, where specific actions lead to desired outcomes, considerably enhances the action value compared to outcome prediction, and the degree of improvement depends on action costs and the outcome model. This analysis emphasizes the need to go beyond generic outcome prediction in interventional settings by incorporating knowledge of plausible actions and latent states.
△ Less
Submitted 8 September, 2023;
originally announced September 2023.
-
Restarted Nonnegativity Preserving Tensor Splitting Methods via Relaxed Anderson Acceleration for Solving Multi-linear Systems
Authors:
Dongdong Liu Ting Hua nd Xifu Liu
Abstract:
Multilinear systems play an important role in scientific calculations of practical problems. In this paper, we consider a tensor splitting method with a relaxed Anderson acceleration for solving multilinear systems. The new method preserves nonnegativity for every iterative step and improves the existing ones. Furthermore, the convergence analysis of the proposed method is given. The new algorithm…
▽ More
Multilinear systems play an important role in scientific calculations of practical problems. In this paper, we consider a tensor splitting method with a relaxed Anderson acceleration for solving multilinear systems. The new method preserves nonnegativity for every iterative step and improves the existing ones. Furthermore, the convergence analysis of the proposed method is given. The new algorithm performs effectively for numerical experiments.
△ Less
Submitted 17 October, 2024; v1 submitted 19 November, 2022;
originally announced November 2022.
-
Lost in Translation: Reimagining the Machine Learning Life Cycle in Education
Authors:
Lydia T. Liu,
Serena Wang,
Tolani Britton,
Rediet Abebe
Abstract:
Machine learning (ML) techniques are increasingly prevalent in education, from their use in predicting student dropout, to assisting in university admissions, and facilitating the rise of MOOCs. Given the rapid growth of these novel uses, there is a pressing need to investigate how ML techniques support long-standing education principles and goals. In this work, we shed light on this complex lands…
▽ More
Machine learning (ML) techniques are increasingly prevalent in education, from their use in predicting student dropout, to assisting in university admissions, and facilitating the rise of MOOCs. Given the rapid growth of these novel uses, there is a pressing need to investigate how ML techniques support long-standing education principles and goals. In this work, we shed light on this complex landscape drawing on qualitative insights from interviews with education experts. These interviews comprise in-depth evaluations of ML for education (ML4Ed) papers published in preeminent applied ML conferences over the past decade. Our central research goal is to critically examine how the stated or implied education and societal objectives of these papers are aligned with the ML problems they tackle. That is, to what extent does the technical problem formulation, objectives, approach, and interpretation of results align with the education problem at hand. We find that a cross-disciplinary gap exists and is particularly salient in two parts of the ML life cycle: the formulation of an ML problem from education goals and the translation of predictions to interventions. We use these insights to propose an extended ML life cycle, which may also apply to the use of ML in other domains. Our work joins a growing number of meta-analytical studies across education and ML research, as well as critical analyses of the societal impact of ML. Specifically, it fills a gap between the prevailing technical understanding of machine learning and the perspective of education researchers working with students and in policy.
△ Less
Submitted 8 September, 2022;
originally announced September 2022.
-
Strategic Ranking
Authors:
Lydia T. Liu,
Nikhil Garg,
Christian Borgs
Abstract:
Strategic classification studies the design of a classifier robust to the manipulation of input by strategic individuals. However, the existing literature does not consider the effect of competition among individuals as induced by the algorithm design. Motivated by constrained allocation settings such as college admissions, we introduce strategic ranking, in which the (designed) individual reward…
▽ More
Strategic classification studies the design of a classifier robust to the manipulation of input by strategic individuals. However, the existing literature does not consider the effect of competition among individuals as induced by the algorithm design. Motivated by constrained allocation settings such as college admissions, we introduce strategic ranking, in which the (designed) individual reward depends on an applicant's post-effort rank in a measurement of interest. Our results illustrate how competition among applicants affects the resulting equilibria and model insights. We analyze how various ranking reward designs, belonging to a family of step functions, trade off applicant, school, and societal utility, as well as how ranking design counters inequities arising from disparate access to resources. In particular, we find that randomization in the reward design can mitigate two measures of disparate impact, welfare gap and access.
△ Less
Submitted 21 February, 2022; v1 submitted 16 September, 2021;
originally announced September 2021.
-
Bandit Learning in Decentralized Matching Markets
Authors:
Lydia T. Liu,
Feng Ruan,
Horia Mania,
Michael I. Jordan
Abstract:
We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience. Also, we assume the players have no direct means of communication. This model extends the standard stochastic multi-armed bandit framework to a decentralized multiple player s…
▽ More
We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience. Also, we assume the players have no direct means of communication. This model extends the standard stochastic multi-armed bandit framework to a decentralized multiple player setting with competition. We introduce a new algorithm for this setting that, over a time horizon $T$, attains $\mathcal{O}(\log(T))$ stable regret when preferences of the arms over players are shared, and $\mathcal{O}(\log(T)^2)$ regret when there are no assumptions on the preferences on either side. Moreover, in the setting where a single player may deviate, we show that the algorithm is incentive compatible whenever the arms' preferences are shared, but not necessarily so when preferences are fully general.
△ Less
Submitted 21 June, 2021; v1 submitted 14 December, 2020;
originally announced December 2020.
-
Balancing Competing Objectives with Noisy Data: Score-Based Classifiers for Welfare-Aware Machine Learning
Authors:
Esther Rolf,
Max Simchowitz,
Sarah Dean,
Lydia T. Liu,
Daniel Björkegren,
Moritz Hardt,
Joshua Blumenstock
Abstract:
While real-world decisions involve many competing objectives, algorithmic decisions are often evaluated with a single objective function. In this paper, we study algorithmic policies which explicitly trade off between a private objective (such as profit) and a public objective (such as social welfare). We analyze a natural class of policies which trace an empirical Pareto frontier based on learned…
▽ More
While real-world decisions involve many competing objectives, algorithmic decisions are often evaluated with a single objective function. In this paper, we study algorithmic policies which explicitly trade off between a private objective (such as profit) and a public objective (such as social welfare). We analyze a natural class of policies which trace an empirical Pareto frontier based on learned scores, and focus on how such decisions can be made in noisy or data-limited regimes. Our theoretical results characterize the optimal strategies in this class, bound the Pareto errors due to inaccuracies in the scores, and show an equivalence between optimal strategies and a rich class of fairness-constrained profit-maximizing policies. We then present empirical results in two different contexts -- online content recommendation and sustainable abalone fisheries -- to underscore the applicability of our approach to a wide range of practical decisions. Taken together, these results shed light on inherent trade-offs in using machine learning for decisions that impact social welfare.
△ Less
Submitted 15 July, 2020; v1 submitted 14 March, 2020;
originally announced March 2020.
-
The Disparate Equilibria of Algorithmic Decision Making when Individuals Invest Rationally
Authors:
Lydia T. Liu,
Ashia Wilson,
Nika Haghtalab,
Adam Tauman Kalai,
Christian Borgs,
Jennifer Chayes
Abstract:
The long-term impact of algorithmic decision making is shaped by the dynamics between the deployed decision rule and individuals' response. Focusing on settings where each individual desires a positive classification---including many important applications such as hiring and school admissions, we study a dynamic learning setting where individuals invest in a positive outcome based on their group's…
▽ More
The long-term impact of algorithmic decision making is shaped by the dynamics between the deployed decision rule and individuals' response. Focusing on settings where each individual desires a positive classification---including many important applications such as hiring and school admissions, we study a dynamic learning setting where individuals invest in a positive outcome based on their group's expected gain and the decision rule is updated to maximize institutional benefit. By characterizing the equilibria of these dynamics, we show that natural challenges to desirable long-term outcomes arise due to heterogeneity across groups and the lack of realizability. We consider two interventions, decoupling the decision rule by group and subsidizing the cost of investment. We show that decoupling achieves optimal outcomes in the realizable case but has discrepant effects that may depend on the initial conditions otherwise. In contrast, subsidizing the cost of investment is shown to create better equilibria for the disadvantaged group even in the absence of realizability.
△ Less
Submitted 4 October, 2019;
originally announced October 2019.
-
Competing Bandits in Matching Markets
Authors:
Lydia T. Liu,
Horia Mania,
Michael I. Jordan
Abstract:
Stable matching, a classical model for two-sided markets, has long been studied with little consideration for how each side's preferences are learned. With the advent of massive online markets powered by data-driven matching platforms, it has become necessary to better understand the interplay between learning and market objectives. We propose a statistical learning model in which one side of the…
▽ More
Stable matching, a classical model for two-sided markets, has long been studied with little consideration for how each side's preferences are learned. With the advent of massive online markets powered by data-driven matching platforms, it has become necessary to better understand the interplay between learning and market objectives. We propose a statistical learning model in which one side of the market does not have a priori knowledge about its preferences for the other side and is required to learn these from stochastic rewards. Our model extends the standard multi-armed bandits framework to multiple players, with the added feature that arms have preferences over players. We study both centralized and decentralized approaches to this problem and show surprising exploration-exploitation trade-offs compared to the single player multi-armed bandits setting.
△ Less
Submitted 12 July, 2020; v1 submitted 12 June, 2019;
originally announced June 2019.
-
Steerable $e$PCA: Rotationally Invariant Exponential Family PCA
Authors:
Zhizhen Zhao,
Lydia T. Liu,
Amit Singer
Abstract:
In photon-limited imaging, the pixel intensities are affected by photon count noise. Many applications, such as 3-D reconstruction using correlation analysis in X-ray free electron laser (XFEL) single molecule imaging, require an accurate estimation of the covariance of the underlying 2-D clean images. Accurate estimation of the covariance from low-photon count images must take into account that p…
▽ More
In photon-limited imaging, the pixel intensities are affected by photon count noise. Many applications, such as 3-D reconstruction using correlation analysis in X-ray free electron laser (XFEL) single molecule imaging, require an accurate estimation of the covariance of the underlying 2-D clean images. Accurate estimation of the covariance from low-photon count images must take into account that pixel intensities are Poisson distributed, hence the classical sample covariance estimator is sub-optimal. Moreover, in single molecule imaging, including in-plane rotated copies of all images could further improve the accuracy of covariance estimation. In this paper we introduce an efficient and accurate algorithm for covariance matrix estimation of count noise 2-D images, including their uniform planar rotations and possibly reflections. Our procedure, steerable $e$PCA, combines in a novel way two recently introduced innovations. The first is a methodology for principal component analysis (PCA) for Poisson distributions, and more generally, exponential family distributions, called $e$PCA. The second is steerable PCA, a fast and accurate procedure for including all planar rotations for PCA. The resulting principal components are invariant to the rotation and reflection of the input images. We demonstrate the efficiency and accuracy of steerable $e$PCA in numerical experiments involving simulated XFEL datasets and rotated Yale B face data.
△ Less
Submitted 17 December, 2019; v1 submitted 20 December, 2018;
originally announced December 2018.
-
The implicit fairness criterion of unconstrained learning
Authors:
Lydia T. Liu,
Max Simchowitz,
Moritz Hardt
Abstract:
We clarify what fairness guarantees we can and cannot expect to follow from unconstrained machine learning. Specifically, we characterize when unconstrained learning on its own implies group calibration, that is, the outcome variable is conditionally independent of group membership given the score. We show that under reasonable conditions, the deviation from satisfying group calibration is upper b…
▽ More
We clarify what fairness guarantees we can and cannot expect to follow from unconstrained machine learning. Specifically, we characterize when unconstrained learning on its own implies group calibration, that is, the outcome variable is conditionally independent of group membership given the score. We show that under reasonable conditions, the deviation from satisfying group calibration is upper bounded by the excess risk of the learned score relative to the Bayes optimal score function. A lower bound confirms the optimality of our upper bound. Moreover, we prove that as the excess risk of the learned score decreases, it strongly violates separation and independence, two other standard fairness criteria.
Our results show that group calibration is the fairness criterion that unconstrained learning implicitly favors. On the one hand, this means that calibration is often satisfied on its own without the need for active intervention, albeit at the cost of violating other criteria that are at odds with calibration. On the other hand, it suggests that we should be satisfied with calibration as a fairness criterion only if we are at ease with the use of unconstrained machine learning in a given application.
△ Less
Submitted 25 January, 2019; v1 submitted 29 August, 2018;
originally announced August 2018.
-
On the Local Minima of the Empirical Risk
Authors:
Chi Jin,
Lydia T. Liu,
Rong Ge,
Michael I. Jordan
Abstract:
Population risk is always of primary interest in machine learning; however, learning algorithms only have access to the empirical risk. Even for applications with nonconvex nonsmooth losses (such as modern deep networks), the population risk is generally significantly more well-behaved from an optimization point of view than the empirical risk. In particular, sampling can create many spurious loca…
▽ More
Population risk is always of primary interest in machine learning; however, learning algorithms only have access to the empirical risk. Even for applications with nonconvex nonsmooth losses (such as modern deep networks), the population risk is generally significantly more well-behaved from an optimization point of view than the empirical risk. In particular, sampling can create many spurious local minima. We consider a general framework which aims to optimize a smooth nonconvex function $F$ (population risk) given only access to an approximation $f$ (empirical risk) that is pointwise close to $F$ (i.e., $\|F-f\|_{\infty} \le ν$). Our objective is to find the $ε$-approximate local minima of the underlying function $F$ while avoiding the shallow local minima---arising because of the tolerance $ν$---which exist only in $f$. We propose a simple algorithm based on stochastic gradient descent (SGD) on a smoothed version of $f$ that is guaranteed to achieve our goal as long as $ν\le O(ε^{1.5}/d)$. We also provide an almost matching lower bound showing that our algorithm achieves optimal error tolerance $ν$ among all algorithms making a polynomial number of queries of $f$. As a concrete example, we show that our results can be directly used to give sample complexities for learning a ReLU unit.
△ Less
Submitted 17 October, 2018; v1 submitted 25 March, 2018;
originally announced March 2018.
-
Delayed Impact of Fair Machine Learning
Authors:
Lydia T. Liu,
Sarah Dean,
Esther Rolf,
Max Simchowitz,
Moritz Hardt
Abstract:
Fairness in machine learning has predominantly been studied in static classification settings without concern for how decisions change the underlying population over time. Conventional wisdom suggests that fairness criteria promote the long-term well-being of those groups they aim to protect.
We study how static fairness criteria interact with temporal indicators of well-being, such as long-term…
▽ More
Fairness in machine learning has predominantly been studied in static classification settings without concern for how decisions change the underlying population over time. Conventional wisdom suggests that fairness criteria promote the long-term well-being of those groups they aim to protect.
We study how static fairness criteria interact with temporal indicators of well-being, such as long-term improvement, stagnation, and decline in a variable of interest. We demonstrate that even in a one-step feedback model, common fairness criteria in general do not promote improvement over time, and may in fact cause harm in cases where an unconstrained objective would not.
We completely characterize the delayed impact of three standard criteria, contrasting the regimes in which these exhibit qualitatively different behavior. In addition, we find that a natural form of measurement error broadens the regime in which fairness criteria perform favorably.
Our results highlight the importance of measurement and temporal modeling in the evaluation of fairness criteria, suggesting a range of new challenges and trade-offs.
△ Less
Submitted 7 April, 2018; v1 submitted 12 March, 2018;
originally announced March 2018.
-
$e$PCA: High Dimensional Exponential Family PCA
Authors:
Lydia T. Liu,
Edgar Dobriban,
Amit Singer
Abstract:
Many applications, such as photon-limited imaging and genomics, involve large datasets with noisy entries from exponential family distributions. It is of interest to estimate the covariance structure and principal components of the noiseless distribution. Principal Component Analysis (PCA), the standard method for this setting, can be inefficient when the noise is non-Gaussian.
We develop $e$PCA…
▽ More
Many applications, such as photon-limited imaging and genomics, involve large datasets with noisy entries from exponential family distributions. It is of interest to estimate the covariance structure and principal components of the noiseless distribution. Principal Component Analysis (PCA), the standard method for this setting, can be inefficient when the noise is non-Gaussian.
We develop $e$PCA (exponential family PCA), a new methodology for PCA on exponential family distributions. $e$PCA can be used for dimensionality reduction and denoising of large data matrices. $e$PCA involves the eigendecomposition of a new covariance matrix estimator, constructed in a simple and deterministic way using moment calculations, shrinkage, and random matrix theory.
We provide several theoretical justifications for our estimator, including the finite-sample convergence rate, and the Marchenko-Pastur law in high dimensions. $e$PCA compares favorably to PCA and various PCA alternatives for exponential families, in simulations as well as in XFEL and SNP data analysis. An open-source implementation is available.
△ Less
Submitted 6 March, 2017; v1 submitted 16 November, 2016;
originally announced November 2016.
-
Dielectron production in Ar+KCl collisions at 1.76A GeV
Authors:
G. Agakishiev,
A. Balanda,
D. Belver,
A. Belyaev,
A. Blanco,
M. Böhmer,
J. L. Boyard,
P. Cabanelas,
E. Castro,
S. Chernenko,
T. Christ,
M. Destefanis,
F. Dohrmann,
A. Dybczak,
T. Eberl,
E. Epple,
L. Fabbbietti,
O. Fateev,
P. Finocchiaro,
P. Fonte,
J. Friese,
I. Fröhlich,
T. Galatyuk,
J. A. Garzon,
R. Gernhäuser
, et al. (76 additional authors not shown)
Abstract:
We present results on dielectron production in Ar+KCl collisions at 1.76A GeV. For the first time $ω$ mesons could be reconstructed in a heavy-ion reaction at a bombarding energy which is well below the production threshold in free nucleon-nucleon collisions. The omega multiplicity has been extracted and compared to the yields of other particles, in particular of the phi meson. At intermediate e+e…
▽ More
We present results on dielectron production in Ar+KCl collisions at 1.76A GeV. For the first time $ω$ mesons could be reconstructed in a heavy-ion reaction at a bombarding energy which is well below the production threshold in free nucleon-nucleon collisions. The omega multiplicity has been extracted and compared to the yields of other particles, in particular of the phi meson. At intermediate e+e- invariant masses, we find a strong enhancement of the pair yield over a reference spectrum from elementary nucleon-nucleon reactions suggesting the onset of non-trivial effects of the nuclear medium. Transverse-mass spectra and angular distributions have been reconstructed in three invariant mass bins. In the former unexpectedly large slopes are found for high-mass pairs. The latter, in particular the helicity-angle distributions, are largely consistent with expectations for a pair cocktail dominated at intermediate masses by delta Dalitz decays.
△ Less
Submitted 24 June, 2011; v1 submitted 4 March, 2011;
originally announced March 2011.