-
Fair-OBNC: Correcting Label Noise for Fairer Datasets
Authors:
Inês Oliveira e Silva,
Sérgio Jesus,
Hugo Ferreira,
Pedro Saleiro,
Inês Sousa,
Pedro Bizarro,
Carlos Soares
Abstract:
Data used by automated decision-making systems, such as Machine Learning models, often reflects discriminatory behavior that occurred in the past. These biases in the training data are sometimes related to label noise, such as in COMPAS, where more African-American offenders are wrongly labeled as having a higher risk of recidivism when compared to their White counterparts. Models trained on such…
▽ More
Data used by automated decision-making systems, such as Machine Learning models, often reflects discriminatory behavior that occurred in the past. These biases in the training data are sometimes related to label noise, such as in COMPAS, where more African-American offenders are wrongly labeled as having a higher risk of recidivism when compared to their White counterparts. Models trained on such biased data may perpetuate or even aggravate the biases with respect to sensitive information, such as gender, race, or age. However, while multiple label noise correction approaches are available in the literature, these focus on model performance exclusively. In this work, we propose Fair-OBNC, a label noise correction method with fairness considerations, to produce training datasets with measurable demographic parity. The presented method adapts Ordering-Based Noise Correction, with an adjusted criterion of ordering, based both on the margin of error of an ensemble, and the potential increase in the observed demographic parity of the dataset. We evaluate Fair-OBNC against other different pre-processing techniques, under different scenarios of controlled label noise. Our results show that the proposed method is the overall better alternative within the pool of label correction methods, being capable of attaining better reconstructions of the original labels. Models trained in the corrected data have an increase, on average, of 150% in demographic parity, when compared to models trained in data with noisy labels, across the considered levels of label noise.
△ Less
Submitted 14 October, 2024; v1 submitted 8 October, 2024;
originally announced October 2024.
-
Show Me What's Wrong!: Combining Charts and Text to Guide Data Analysis
Authors:
Beatriz Feliciano,
Rita Costa,
Jean Alves,
Javier Liébana,
Diogo Duarte,
Pedro Bizarro
Abstract:
Analyzing and finding anomalies in multi-dimensional datasets is a cumbersome but vital task across different domains. In the context of financial fraud detection, analysts must quickly identify suspicious activity among transactional data. This is an iterative process made of complex exploratory tasks such as recognizing patterns, grouping, and comparing. To mitigate the information overload inhe…
▽ More
Analyzing and finding anomalies in multi-dimensional datasets is a cumbersome but vital task across different domains. In the context of financial fraud detection, analysts must quickly identify suspicious activity among transactional data. This is an iterative process made of complex exploratory tasks such as recognizing patterns, grouping, and comparing. To mitigate the information overload inherent to these steps, we present a tool combining automated information highlights, Large Language Model generated textual insights, and visual analytics, facilitating exploration at different levels of detail. We perform a segmentation of the data per analysis area and visually represent each one, making use of automated visual cues to signal which require more attention. Upon user selection of an area, our system provides textual and graphical summaries. The text, acting as a link between the high-level and detailed views of the chosen segment, allows for a quick understanding of relevant details. A thorough exploration of the data comprising the selection can be done through graphical representations. The feedback gathered in a study performed with seven domain experts suggests our tool effectively supports and guides exploratory analysis, easing the identification of suspicious information.
△ Less
Submitted 2 October, 2024; v1 submitted 1 October, 2024;
originally announced October 2024.
-
RIFF: Inducing Rules for Fraud Detection from Decision Trees
Authors:
João Lucas Martins,
João Bravo,
Ana Sofia Gomes,
Carlos Soares,
Pedro Bizarro
Abstract:
Financial fraud is the cause of multi-billion dollar losses annually. Traditionally, fraud detection systems rely on rules due to their transparency and interpretability, key features in domains where decisions need to be explained. However, rule systems require significant input from domain experts to create and tune, an issue that rule induction algorithms attempt to mitigate by inferring rules…
▽ More
Financial fraud is the cause of multi-billion dollar losses annually. Traditionally, fraud detection systems rely on rules due to their transparency and interpretability, key features in domains where decisions need to be explained. However, rule systems require significant input from domain experts to create and tune, an issue that rule induction algorithms attempt to mitigate by inferring rules directly from data. We explore the application of these algorithms to fraud detection, where rule systems are constrained to have a low false positive rate (FPR) or alert rate, by proposing RIFF, a rule induction algorithm that distills a low FPR rule set directly from decision trees. Our experiments show that the induced rules are often able to maintain or improve performance of the original models for low FPR tasks, while substantially reducing their complexity and outperforming rules hand-tuned by experts.
△ Less
Submitted 23 August, 2024;
originally announced August 2024.
-
Deep-Graph-Sprints: Accelerated Representation Learning in Continuous-Time Dynamic Graphs
Authors:
Ahmad Naser Eddin,
Jacopo Bono,
David Aparício,
Hugo Ferreira,
Pedro Ribeiro,
Pedro Bizarro
Abstract:
Continuous-time dynamic graphs (CTDGs) are essential for modeling interconnected, evolving systems. Traditional methods for extracting knowledge from these graphs often depend on feature engineering or deep learning. Feature engineering is limited by the manual and time-intensive nature of crafting features, while deep learning approaches suffer from high inference latency, making them impractical…
▽ More
Continuous-time dynamic graphs (CTDGs) are essential for modeling interconnected, evolving systems. Traditional methods for extracting knowledge from these graphs often depend on feature engineering or deep learning. Feature engineering is limited by the manual and time-intensive nature of crafting features, while deep learning approaches suffer from high inference latency, making them impractical for real-time applications. This paper introduces Deep-Graph-Sprints (DGS), a novel deep learning architecture designed for efficient representation learning on CTDGs with low-latency inference requirements. We benchmark DGS against state-of-the-art feature engineering and graph neural network methods using five diverse datasets. The results indicate that DGS achieves competitive performance while improving inference speed up to 12x compared to other deep learning approaches on our tested benchmarks. Our method effectively bridges the gap between deep representation learning and low-latency application requirements for CTDGs.
△ Less
Submitted 23 July, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
Aequitas Flow: Streamlining Fair ML Experimentation
Authors:
Sérgio Jesus,
Pedro Saleiro,
Inês Oliveira e Silva,
Beatriz M. Jorge,
Rita P. Ribeiro,
João Gama,
Pedro Bizarro,
Rayid Ghani
Abstract:
Aequitas Flow is an open-source framework for end-to-end Fair Machine Learning (ML) experimentation in Python. This package fills the existing integration gaps in other Fair ML packages of complete and accessible experimentation. It provides a pipeline for fairness-aware model training, hyperparameter optimization, and evaluation, enabling rapid and simple experiments and result analysis. Aimed at…
▽ More
Aequitas Flow is an open-source framework for end-to-end Fair Machine Learning (ML) experimentation in Python. This package fills the existing integration gaps in other Fair ML packages of complete and accessible experimentation. It provides a pipeline for fairness-aware model training, hyperparameter optimization, and evaluation, enabling rapid and simple experiments and result analysis. Aimed at ML practitioners and researchers, the framework offers implementations of methods, datasets, metrics, and standard interfaces for these components to improve extensibility. By facilitating the development of fair ML practices, Aequitas Flow seeks to enhance the adoption of these concepts in AI technologies.
△ Less
Submitted 9 May, 2024;
originally announced May 2024.
-
Cost-Sensitive Learning to Defer to Multiple Experts with Workload Constraints
Authors:
Jean V. Alves,
Diogo Leitão,
Sérgio Jesus,
Marco O. P. Sampaio,
Javier Liébana,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
Learning to defer (L2D) aims to improve human-AI collaboration systems by learning how to defer decisions to humans when they are more likely to be correct than an ML classifier. Existing research in L2D overlooks key real-world aspects that impede its practical adoption, namely: i) neglecting cost-sensitive scenarios, where type I and type II errors have different costs; ii) requiring concurrent…
▽ More
Learning to defer (L2D) aims to improve human-AI collaboration systems by learning how to defer decisions to humans when they are more likely to be correct than an ML classifier. Existing research in L2D overlooks key real-world aspects that impede its practical adoption, namely: i) neglecting cost-sensitive scenarios, where type I and type II errors have different costs; ii) requiring concurrent human predictions for every instance of the training dataset; and iii) not dealing with human work-capacity constraints. To address these issues, we propose the \textit{deferral under cost and capacity constraints framework} (DeCCaF). DeCCaF is a novel L2D approach, employing supervised learning to model the probability of human error under less restrictive data requirements (only one expert prediction per instance) and using constraint programming to globally minimize the error cost, subject to workload limitations. We test DeCCaF in a series of cost-sensitive fraud detection scenarios with different teams of 9 synthetic fraud analysts, with individual work-capacity constraints. The results demonstrate that our approach performs significantly better than the baselines in a wide array of scenarios, achieving an average $8.4\%$ reduction in the misclassification cost. The code used for the experiments is available at https://github.com/feedzai/deccaf
△ Less
Submitted 19 August, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
DiConStruct: Causal Concept-based Explanations through Black-Box Distillation
Authors:
Ricardo Moreira,
Jacopo Bono,
Mário Cardoso,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
Model interpretability plays a central role in human-AI decision-making systems. Ideally, explanations should be expressed using human-interpretable semantic concepts. Moreover, the causal relations between these concepts should be captured by the explainer to allow for reasoning about the explanations. Lastly, explanation methods should be efficient and not compromise the performance of the predi…
▽ More
Model interpretability plays a central role in human-AI decision-making systems. Ideally, explanations should be expressed using human-interpretable semantic concepts. Moreover, the causal relations between these concepts should be captured by the explainer to allow for reasoning about the explanations. Lastly, explanation methods should be efficient and not compromise the performance of the predictive task. Despite the rapid advances in AI explainability in recent years, as far as we know to date, no method fulfills these three properties. Indeed, mainstream methods for local concept explainability do not produce causal explanations and incur a trade-off between explainability and prediction performance. We present DiConStruct, an explanation method that is both concept-based and causal, with the goal of creating more interpretable local explanations in the form of structural causal models and concept attributions. Our explainer works as a distillation model to any black-box machine learning model by approximating its predictions while producing the respective explanations. Because of this, DiConStruct generates explanations efficiently while not impacting the black-box prediction task. We validate our method on an image dataset and a tabular dataset, showing that DiConStruct approximates the black-box models with higher fidelity than other concept explainability baselines, while providing explanations that include the causal relations between the concepts.
△ Less
Submitted 26 January, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
FiFAR: A Fraud Detection Dataset for Learning to Defer
Authors:
Jean V. Alves,
Diogo Leitão,
Sérgio Jesus,
Marco O. P. Sampaio,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
Public dataset limitations have significantly hindered the development and benchmarking of learning to defer (L2D) algorithms, which aim to optimally combine human and AI capabilities in hybrid decision-making systems. In such systems, human availability and domain-specific concerns introduce difficulties, while obtaining human predictions for training and evaluation is costly. Financial fraud det…
▽ More
Public dataset limitations have significantly hindered the development and benchmarking of learning to defer (L2D) algorithms, which aim to optimally combine human and AI capabilities in hybrid decision-making systems. In such systems, human availability and domain-specific concerns introduce difficulties, while obtaining human predictions for training and evaluation is costly. Financial fraud detection is a high-stakes setting where algorithms and human experts often work in tandem; however, there are no publicly available datasets for L2D concerning this important application of human-AI teaming. To fill this gap in L2D research, we introduce the Financial Fraud Alert Review Dataset (FiFAR), a synthetic bank account fraud detection dataset, containing the predictions of a team of 50 highly complex and varied synthetic fraud analysts, with varied bias and feature dependence. We also provide a realistic definition of human work capacity constraints, an aspect of L2D systems that is often overlooked, allowing for extensive testing of assignment systems under real-world conditions. We use our dataset to develop a capacity-aware L2D method and rejection learning approach under realistic data availability conditions, and benchmark these baselines under an array of 300 distinct testing scenarios. We believe that this dataset will serve as a pivotal instrument in facilitating a systematic, rigorous, reproducible, and transparent evaluation and comparison of L2D methods, thereby fostering the development of more synergistic human-AI collaboration in decision-making systems. The public dataset and detailed synthetic expert information are available at: https://github.com/feedzai/fifar-dataset
△ Less
Submitted 20 December, 2023;
originally announced December 2023.
-
Adversarial training for tabular data with attack propagation
Authors:
Tiago Leon Melo,
João Bravo,
Marco O. P. Sampaio,
Paolo Romano,
Hugo Ferreira,
João Tiago Ascensão,
Pedro Bizarro
Abstract:
Adversarial attacks are a major concern in security-centered applications, where malicious actors continuously try to mislead Machine Learning (ML) models into wrongly classifying fraudulent activity as legitimate, whereas system maintainers try to stop them. Adversarially training ML models that are robust against such attacks can prevent business losses and reduce the work load of system maintai…
▽ More
Adversarial attacks are a major concern in security-centered applications, where malicious actors continuously try to mislead Machine Learning (ML) models into wrongly classifying fraudulent activity as legitimate, whereas system maintainers try to stop them. Adversarially training ML models that are robust against such attacks can prevent business losses and reduce the work load of system maintainers. In such applications data is often tabular and the space available for attackers to manipulate undergoes complex feature engineering transformations, to provide useful signals for model training, to a space attackers cannot access. Thus, we propose a new form of adversarial training where attacks are propagated between the two spaces in the training loop. We then test this method empirically on a real world dataset in the domain of credit card fraud detection. We show that our method can prevent about 30% performance drops under moderate attacks and is essential under very aggressive attacks, with a trade-off loss in performance under no attacks smaller than 7%.
△ Less
Submitted 28 July, 2023;
originally announced July 2023.
-
The GANfather: Controllable generation of malicious activity to improve defence systems
Authors:
Ricardo Ribeiro Pereira,
Jacopo Bono,
João Tiago Ascensão,
David Aparício,
Pedro Ribeiro,
Pedro Bizarro
Abstract:
Machine learning methods to aid defence systems in detecting malicious activity typically rely on labelled data. In some domains, such labelled data is unavailable or incomplete. In practice this can lead to low detection rates and high false positive rates, which characterise for example anti-money laundering systems. In fact, it is estimated that 1.7--4 trillion euros are laundered annually and…
▽ More
Machine learning methods to aid defence systems in detecting malicious activity typically rely on labelled data. In some domains, such labelled data is unavailable or incomplete. In practice this can lead to low detection rates and high false positive rates, which characterise for example anti-money laundering systems. In fact, it is estimated that 1.7--4 trillion euros are laundered annually and go undetected. We propose The GANfather, a method to generate samples with properties of malicious activity, without label requirements. We propose to reward the generation of malicious samples by introducing an extra objective to the typical Generative Adversarial Networks (GANs) loss. Ultimately, our goal is to enhance the detection of illicit activity using the discriminator network as a novel and robust defence system. Optionally, we may encourage the generator to bypass pre-existing detection systems. This setup then reveals defensive weaknesses for the discriminator to correct. We evaluate our method in two real-world use cases, money laundering and recommendation systems. In the former, our method moves cumulative amounts close to 350 thousand dollars through a network of accounts without being detected by an existing system. In the latter, we recommend the target item to a broad user base with as few as 30 synthetic attackers. In both cases, we train a new defence system to capture the synthetic attacks.
△ Less
Submitted 25 July, 2023;
originally announced July 2023.
-
From random-walks to graph-sprints: a low-latency node embedding framework on continuous-time dynamic graphs
Authors:
Ahmad Naser Eddin,
Jacopo Bono,
David Aparício,
Hugo Ferreira,
João Ascensão,
Pedro Ribeiro,
Pedro Bizarro
Abstract:
Many real-world datasets have an underlying dynamic graph structure, where entities and their interactions evolve over time. Machine learning models should consider these dynamics in order to harness their full potential in downstream tasks. Previous approaches for graph representation learning have focused on either sampling k-hop neighborhoods, akin to breadth-first search, or random walks, akin…
▽ More
Many real-world datasets have an underlying dynamic graph structure, where entities and their interactions evolve over time. Machine learning models should consider these dynamics in order to harness their full potential in downstream tasks. Previous approaches for graph representation learning have focused on either sampling k-hop neighborhoods, akin to breadth-first search, or random walks, akin to depth-first search. However, these methods are computationally expensive and unsuitable for real-time, low-latency inference on dynamic graphs. To overcome these limitations, we propose graph-sprints a general purpose feature extraction framework for continuous-time-dynamic-graphs (CTDGs) that has low latency and is competitive with state-of-the-art, higher latency models. To achieve this, a streaming, low latency approximation to the random-walk based features is proposed. In our framework, time-aware node embeddings summarizing multi-hop information are computed using only single-hop operations on the incoming edges. We evaluate our proposed approach on three open-source datasets and two in-house datasets, and compare with three state-of-the-art algorithms (TGN-attn, TGN-ID, Jodie). We demonstrate that our graph-sprints features, combined with a machine learning classifier, achieve competitive performance (outperforming all baselines for the node classification tasks in five datasets). Simultaneously, graph-sprints significantly reduce inference latencies, achieving close to an order of magnitude speed-up in our experimental setting.
△ Less
Submitted 16 February, 2024; v1 submitted 17 July, 2023;
originally announced July 2023.
-
Fairness-Aware Data Valuation for Supervised Learning
Authors:
José Pombal,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
Data valuation is a ML field that studies the value of training instances towards a given predictive task. Although data bias is one of the main sources of downstream model unfairness, previous work in data valuation does not consider how training instances may influence both performance and fairness of ML models. Thus, we propose Fairness-Aware Data vauatiOn (FADO), a data valuation framework tha…
▽ More
Data valuation is a ML field that studies the value of training instances towards a given predictive task. Although data bias is one of the main sources of downstream model unfairness, previous work in data valuation does not consider how training instances may influence both performance and fairness of ML models. Thus, we propose Fairness-Aware Data vauatiOn (FADO), a data valuation framework that can be used to incorporate fairness concerns into a series of ML-related tasks (e.g., data pre-processing, exploratory data analysis, active learning). We propose an entropy-based data valuation metric suited to address our two-pronged goal of maximizing both performance and fairness, which is more computationally efficient than existing metrics. We then show how FADO can be applied as the basis for unfairness mitigation pre-processing techniques. Our methods achieve promising results -- up to a 40 p.p. improvement in fairness at a less than 1 p.p. loss in performance compared to a baseline -- and promote fairness in a data-centric way, where a deeper understanding of data quality takes center stage.
△ Less
Submitted 29 March, 2023;
originally announced March 2023.
-
Turning the Tables: Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation
Authors:
Sérgio Jesus,
José Pombal,
Duarte Alves,
André Cruz,
Pedro Saleiro,
Rita P. Ribeiro,
João Gama,
Pedro Bizarro
Abstract:
Evaluating new techniques on realistic datasets plays a crucial role in the development of ML research and its broader adoption by practitioners. In recent years, there has been a significant increase of publicly available unstructured data resources for computer vision and NLP tasks. However, tabular data -- which is prevalent in many high-stakes domains -- has been lagging behind. To bridge this…
▽ More
Evaluating new techniques on realistic datasets plays a crucial role in the development of ML research and its broader adoption by practitioners. In recent years, there has been a significant increase of publicly available unstructured data resources for computer vision and NLP tasks. However, tabular data -- which is prevalent in many high-stakes domains -- has been lagging behind. To bridge this gap, we present Bank Account Fraud (BAF), the first publicly available privacy-preserving, large-scale, realistic suite of tabular datasets. The suite was generated by applying state-of-the-art tabular data generation techniques on an anonymized,real-world bank account opening fraud detection dataset. This setting carries a set of challenges that are commonplace in real-world applications, including temporal dynamics and significant class imbalance. Additionally, to allow practitioners to stress test both performance and fairness of ML methods, each dataset variant of BAF contains specific types of data bias. With this resource, we aim to provide the research community with a more realistic, complete, and robust test bed to evaluate novel and existing methods.
△ Less
Submitted 28 November, 2022; v1 submitted 23 November, 2022;
originally announced November 2022.
-
LaundroGraph: Self-Supervised Graph Representation Learning for Anti-Money Laundering
Authors:
Mário Cardoso,
Pedro Saleiro,
Pedro Bizarro
Abstract:
Anti-money laundering (AML) regulations mandate financial institutions to deploy AML systems based on a set of rules that, when triggered, form the basis of a suspicious alert to be assessed by human analysts. Reviewing these cases is a cumbersome and complex task that requires analysts to navigate a large network of financial interactions to validate suspicious movements. Furthermore, these syste…
▽ More
Anti-money laundering (AML) regulations mandate financial institutions to deploy AML systems based on a set of rules that, when triggered, form the basis of a suspicious alert to be assessed by human analysts. Reviewing these cases is a cumbersome and complex task that requires analysts to navigate a large network of financial interactions to validate suspicious movements. Furthermore, these systems have very high false positive rates (estimated to be over 95\%). The scarcity of labels hinders the use of alternative systems based on supervised learning, reducing their applicability in real-world applications.
In this work we present LaundroGraph, a novel self-supervised graph representation learning approach to encode banking customers and financial transactions into meaningful representations. These representations are used to provide insights to assist the AML reviewing process, such as identifying anomalous movements for a given customer. LaundroGraph represents the underlying network of financial interactions as a customer-transaction bipartite graph and trains a graph neural network on a fully self-supervised link prediction task. We empirically demonstrate that our approach outperforms other strong baselines on self-supervised link prediction using a real-world dataset, improving the best non-graph baseline by $12$ p.p. of AUC. The goal is to increase the efficiency of the reviewing process by supplying these AI-powered insights to the analysts upon review. To the best of our knowledge, this is the first fully self-supervised system within the context of AML detection.
△ Less
Submitted 25 October, 2022;
originally announced October 2022.
-
FairGBM: Gradient Boosting with Fairness Constraints
Authors:
André F Cruz,
Catarina Belém,
Sérgio Jesus,
João Bravo,
Pedro Saleiro,
Pedro Bizarro
Abstract:
Tabular data is prevalent in many high-stakes domains, such as financial services or public policy. Gradient Boosted Decision Trees (GBDT) are popular in these settings due to their scalability, performance, and low training cost. While fairness in these domains is a foremost concern, existing in-processing Fair ML methods are either incompatible with GBDT, or incur in significant performance loss…
▽ More
Tabular data is prevalent in many high-stakes domains, such as financial services or public policy. Gradient Boosted Decision Trees (GBDT) are popular in these settings due to their scalability, performance, and low training cost. While fairness in these domains is a foremost concern, existing in-processing Fair ML methods are either incompatible with GBDT, or incur in significant performance losses while taking considerably longer to train. We present FairGBM, a dual ascent learning framework for training GBDT under fairness constraints, with little to no impact on predictive performance when compared to unconstrained GBDT. Since observational fairness metrics are non-differentiable, we propose smooth convex error rate proxies for common fairness criteria, enabling gradient-based optimization using a ``proxy-Lagrangian'' formulation. Our implementation shows an order of magnitude speedup in training time relative to related work, a pivotal aspect to foster the widespread adoption of FairGBM by real-world practitioners.
△ Less
Submitted 3 March, 2023; v1 submitted 16 September, 2022;
originally announced September 2022.
-
Lightweight Automated Feature Monitoring for Data Streams
Authors:
João Conde,
Ricardo Moreira,
João Torres,
Pedro Cardoso,
Hugo R. C. Ferreira,
Marco O. P. Sampaio,
João Tiago Ascensão,
Pedro Bizarro
Abstract:
Monitoring the behavior of automated real-time stream processing systems has become one of the most relevant problems in real world applications. Such systems have grown in complexity relying heavily on high dimensional input data, and data hungry Machine Learning (ML) algorithms. We propose a flexible system, Feature Monitoring (FM), that detects data drifts in such data sets, with a small and co…
▽ More
Monitoring the behavior of automated real-time stream processing systems has become one of the most relevant problems in real world applications. Such systems have grown in complexity relying heavily on high dimensional input data, and data hungry Machine Learning (ML) algorithms. We propose a flexible system, Feature Monitoring (FM), that detects data drifts in such data sets, with a small and constant memory footprint and a small computational cost in streaming applications. The method is based on a multi-variate statistical test and is data driven by design (full reference distributions are estimated from the data). It monitors all features that are used by the system, while providing an interpretable features ranking whenever an alarm occurs (to aid in root cause analysis). The computational and memory lightness of the system results from the use of Exponential Moving Histograms. In our experimental study, we analyze the system's behavior with its parameters and, more importantly, show examples where it detects problems that are not directly related to a single feature. This illustrates how FM eliminates the need to add custom signals to detect specific types of problems and that monitoring the available space of features is often enough.
△ Less
Submitted 19 July, 2022; v1 submitted 18 July, 2022;
originally announced July 2022.
-
Understanding Unfairness in Fraud Detection through Model and Data Bias Interactions
Authors:
José Pombal,
André F. Cruz,
João Bravo,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
In recent years, machine learning algorithms have become ubiquitous in a multitude of high-stakes decision-making applications. The unparalleled ability of machine learning algorithms to learn patterns from data also enables them to incorporate biases embedded within. A biased model can then make decisions that disproportionately harm certain groups in society -- limiting their access to financial…
▽ More
In recent years, machine learning algorithms have become ubiquitous in a multitude of high-stakes decision-making applications. The unparalleled ability of machine learning algorithms to learn patterns from data also enables them to incorporate biases embedded within. A biased model can then make decisions that disproportionately harm certain groups in society -- limiting their access to financial services, for example. The awareness of this problem has given rise to the field of Fair ML, which focuses on studying, measuring, and mitigating unfairness in algorithmic prediction, with respect to a set of protected groups (e.g., race or gender). However, the underlying causes for algorithmic unfairness still remain elusive, with researchers divided between blaming either the ML algorithms or the data they are trained on. In this work, we maintain that algorithmic unfairness stems from interactions between models and biases in the data, rather than from isolated contributions of either of them. To this end, we propose a taxonomy to characterize data bias and we study a set of hypotheses regarding the fairness-accuracy trade-offs that fairness-blind ML algorithms exhibit under different data bias settings. On our real-world account-opening fraud use case, we find that each setting entails specific trade-offs, affecting fairness in expected value and variance -- the latter often going unnoticed. Moreover, we show how algorithms compare differently in terms of accuracy and fairness, depending on the biases affecting the data. Finally, we note that under specific data bias conditions, simple pre-processing interventions can successfully balance group-wise error rates, while the same techniques fail in more complex settings.
△ Less
Submitted 13 July, 2022;
originally announced July 2022.
-
On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods
Authors:
Kasun Amarasinghe,
Kit T. Rodolfa,
Sérgio Jesus,
Valerie Chen,
Vladimir Balayan,
Pedro Saleiro,
Pedro Bizarro,
Ameet Talwalkar,
Rayid Ghani
Abstract:
Most existing evaluations of explainable machine learning (ML) methods rely on simplifying assumptions or proxies that do not reflect real-world use cases; the handful of more robust evaluations on real-world settings have shortcomings in their design, resulting in limited conclusions of methods' real-world utility. In this work, we seek to bridge this gap by conducting a study that evaluates thre…
▽ More
Most existing evaluations of explainable machine learning (ML) methods rely on simplifying assumptions or proxies that do not reflect real-world use cases; the handful of more robust evaluations on real-world settings have shortcomings in their design, resulting in limited conclusions of methods' real-world utility. In this work, we seek to bridge this gap by conducting a study that evaluates three popular explainable ML methods in a setting consistent with the intended deployment context. We build on a previous study on e-commerce fraud detection and make crucial modifications to its setup relaxing the simplifying assumptions made in the original work that departed from the deployment context. In doing so, we draw drastically different conclusions from the earlier work and find no evidence for the incremental utility of the tested methods in the task. Our results highlight how seemingly trivial experimental design choices can yield misleading conclusions, with lessons about the necessity of not only evaluating explainable ML methods using tasks, data, users, and metrics grounded in the intended deployment contexts but also developing methods tailored to specific applications. In addition, we believe the design of this experiment can serve as a template for future study designs evaluating explainable ML methods in other real-world contexts.
△ Less
Submitted 21 February, 2023; v1 submitted 24 June, 2022;
originally announced June 2022.
-
Human-AI Collaboration in Decision-Making: Beyond Learning to Defer
Authors:
Diogo Leitão,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
Human-AI collaboration (HAIC) in decision-making aims to create synergistic teaming between human decision-makers and AI systems. Learning to defer (L2D) has been presented as a promising framework to determine who among humans and AI should make which decisions in order to optimize the performance and fairness of the combined system. Nevertheless, L2D entails several often unfeasible requirements…
▽ More
Human-AI collaboration (HAIC) in decision-making aims to create synergistic teaming between human decision-makers and AI systems. Learning to defer (L2D) has been presented as a promising framework to determine who among humans and AI should make which decisions in order to optimize the performance and fairness of the combined system. Nevertheless, L2D entails several often unfeasible requirements, such as the availability of predictions from humans for every instance or ground-truth labels that are independent from said humans. Furthermore, neither L2D nor alternative approaches tackle fundamental issues of deploying HAIC systems in real-world settings, such as capacity management or dealing with dynamic environments. In this paper, we aim to identify and review these and other limitations, pointing to where opportunities for future research in HAIC may lie.
△ Less
Submitted 13 July, 2022; v1 submitted 27 June, 2022;
originally announced June 2022.
-
Prisoners of Their Own Devices: How Models Induce Data Bias in Performative Prediction
Authors:
José Pombal,
Pedro Saleiro,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
The unparalleled ability of machine learning algorithms to learn patterns from data also enables them to incorporate biases embedded within. A biased model can then make decisions that disproportionately harm certain groups in society. Much work has been devoted to measuring unfairness in static ML environments, but not in dynamic, performative prediction ones, in which most real-world use cases o…
▽ More
The unparalleled ability of machine learning algorithms to learn patterns from data also enables them to incorporate biases embedded within. A biased model can then make decisions that disproportionately harm certain groups in society. Much work has been devoted to measuring unfairness in static ML environments, but not in dynamic, performative prediction ones, in which most real-world use cases operate. In the latter, the predictive model itself plays a pivotal role in shaping the distribution of the data. However, little attention has been heeded to relating unfairness to these interactions. Thus, to further the understanding of unfairness in these settings, we propose a taxonomy to characterize bias in the data, and study cases where it is shaped by model behaviour. Using a real-world account opening fraud detection case study as an example, we study the dangers to both performance and fairness of two typical biases in performative prediction: distribution shifts, and the problem of selective labels.
△ Less
Submitted 27 June, 2022;
originally announced June 2022.
-
ConceptDistil: Model-Agnostic Distillation of Concept Explanations
Authors:
João Bento Sousa,
Ricardo Moreira,
Vladimir Balayan,
Pedro Saleiro,
Pedro Bizarro
Abstract:
Concept-based explanations aims to fill the model interpretability gap for non-technical humans-in-the-loop. Previous work has focused on providing concepts for specific models (eg, neural networks) or data types (eg, images), and by either trying to extract concepts from an already trained network or training self-explainable models through multi-task learning. In this work, we propose ConceptDis…
▽ More
Concept-based explanations aims to fill the model interpretability gap for non-technical humans-in-the-loop. Previous work has focused on providing concepts for specific models (eg, neural networks) or data types (eg, images), and by either trying to extract concepts from an already trained network or training self-explainable models through multi-task learning. In this work, we propose ConceptDistil, a method to bring concept explanations to any black-box classifier using knowledge distillation. ConceptDistil is decomposed into two components:(1) a concept model that predicts which domain concepts are present in a given instance, and (2) a distillation model that tries to mimic the predictions of a black-box model using the concept model predictions. We validate ConceptDistil in a real world use-case, showing that it is able to optimize both tasks, bringing concept-explainability to any black-box model.
△ Less
Submitted 7 May, 2022;
originally announced May 2022.
-
Data+Shift: Supporting visual investigation of data distribution shifts by data scientists
Authors:
João Palmeiro,
Beatriz Malveiro,
Rita Costa,
David Polido,
Ricardo Moreira,
Pedro Bizarro
Abstract:
Machine learning on data streams is increasingly more present in multiple domains. However, there is often data distribution shift that can lead machine learning models to make incorrect decisions. While there are automatic methods to detect when drift is happening, human analysis, often by data scientists, is essential to diagnose the causes of the problem and adjust the system. We propose Data+S…
▽ More
Machine learning on data streams is increasingly more present in multiple domains. However, there is often data distribution shift that can lead machine learning models to make incorrect decisions. While there are automatic methods to detect when drift is happening, human analysis, often by data scientists, is essential to diagnose the causes of the problem and adjust the system. We propose Data+Shift, a visual analytics tool to support data scientists in the task of investigating the underlying factors of shift in data features in the context of fraud detection. Design requirements were derived from interviews with data scientists. Data+Shift is integrated with JupyterLab and can be used alongside other data science tools. We validated our approach with a think-aloud experiment where a data scientist used the tool for a fraud detection use case.
△ Less
Submitted 29 April, 2022;
originally announced April 2022.
-
Anti-Money Laundering Alert Optimization Using Machine Learning with Graphs
Authors:
Ahmad Naser Eddin,
Jacopo Bono,
David Aparício,
David Polido,
João Tiago Ascensão,
Pedro Bizarro,
Pedro Ribeiro
Abstract:
Money laundering is a global problem that concerns legitimizing proceeds from serious felonies (1.7-4 trillion euros annually) such as drug dealing, human trafficking, or corruption. The anti-money laundering systems deployed by financial institutions typically comprise rules aligned with regulatory frameworks. Human investigators review the alerts and report suspicious cases. Such systems suffer…
▽ More
Money laundering is a global problem that concerns legitimizing proceeds from serious felonies (1.7-4 trillion euros annually) such as drug dealing, human trafficking, or corruption. The anti-money laundering systems deployed by financial institutions typically comprise rules aligned with regulatory frameworks. Human investigators review the alerts and report suspicious cases. Such systems suffer from high false-positive rates, undermining their effectiveness and resulting in high operational costs. We propose a machine learning triage model, which complements the rule-based system and learns to predict the risk of an alert accurately. Our model uses both entity-centric engineered features and attributes characterizing inter-entity relations in the form of graph-based features. We leverage time windows to construct the dynamic graph, optimizing for time and space efficiency. We validate our model on a real-world banking dataset and show how the triage model can reduce the number of false positives by 80% while detecting over 90% of true positives. In this way, our model can significantly improve anti-money laundering operations.
△ Less
Submitted 17 June, 2022; v1 submitted 14 December, 2021;
originally announced December 2021.
-
GUDIE: a flexible, user-defined method to extract subgraphs of interest from large graphs
Authors:
Maria Inês Silva,
David Aparício,
Beatriz Malveiro,
João Tiago Ascensão,
Pedro Bizarro
Abstract:
Large, dense, small-world networks often emerge from social phenomena, including financial networks, social media, or epidemiology. As networks grow in importance, it is often necessary to partition them into meaningful units of analysis. In this work, we propose GUDIE, a message-passing algorithm that extracts relevant context around seed nodes based on user-defined criteria. We design GUDIE for…
▽ More
Large, dense, small-world networks often emerge from social phenomena, including financial networks, social media, or epidemiology. As networks grow in importance, it is often necessary to partition them into meaningful units of analysis. In this work, we propose GUDIE, a message-passing algorithm that extracts relevant context around seed nodes based on user-defined criteria. We design GUDIE for rich, labeled graphs, and expansions consider node and edge attributes. Preliminary results indicate that GUDIE expands to insightful areas while avoiding unimportant connections. The resulting subgraphs contain the relevant context for a seed node and can accelerate and extend analysis capabilities in finance and other critical networks.
△ Less
Submitted 20 August, 2021;
originally announced August 2021.
-
Finding NeMo: Fishing in banking networks using network motifs
Authors:
Xavier Fontes,
David Aparício,
Maria Inês Silva,
Beatriz Malveiro,
João Tiago Ascensão,
Pedro Bizarro
Abstract:
Banking fraud causes billion-dollar losses for banks worldwide. In fraud detection, graphs help understand complex transaction patterns and discovering new fraud schemes. This work explores graph patterns in a real-world transaction dataset by extracting and analyzing its network motifs. Since banking graphs are heterogeneous, we focus on heterogeneous network motifs. Additionally, we propose a no…
▽ More
Banking fraud causes billion-dollar losses for banks worldwide. In fraud detection, graphs help understand complex transaction patterns and discovering new fraud schemes. This work explores graph patterns in a real-world transaction dataset by extracting and analyzing its network motifs. Since banking graphs are heterogeneous, we focus on heterogeneous network motifs. Additionally, we propose a novel network randomization process that generates valid banking graphs. From our exploratory analysis, we conclude that network motifs extract insightful and interpretable patterns.
△ Less
Submitted 10 August, 2021;
originally announced August 2021.
-
Active learning for imbalanced data under cold start
Authors:
Ricardo Barata,
Miguel Leite,
Ricardo Pacheco,
Marco O. P. Sampaio,
João Tiago Ascensão,
Pedro Bizarro
Abstract:
Modern systems that rely on Machine Learning (ML) for predictive modelling, may suffer from the cold-start problem: supervised models work well but, initially, there are no labels, which are costly or slow to obtain. This problem is even worse in imbalanced data scenarios, where labels of the positive class take longer to accumulate. We propose an Active Learning (AL) system for datasets with orde…
▽ More
Modern systems that rely on Machine Learning (ML) for predictive modelling, may suffer from the cold-start problem: supervised models work well but, initially, there are no labels, which are costly or slow to obtain. This problem is even worse in imbalanced data scenarios, where labels of the positive class take longer to accumulate. We propose an Active Learning (AL) system for datasets with orders of magnitude of class imbalance, in a cold start streaming scenario. We present a computationally efficient Outlier-based Discriminative AL approach (ODAL) and design a novel 3-stage sequence of AL labeling policies where ODAL is used as warm-up. Then, we perform empirical studies in four real world datasets, with various magnitudes of class imbalance. The results show that our method can more quickly reach a high performance model than standard AL policies without ODAL warm-up. Its observed gains over random sampling can reach 80% and be competitive with policies with an unlimited annotation budget or additional historical data (using just 2% to 10% of the labels).
△ Less
Submitted 22 October, 2021; v1 submitted 16 July, 2021;
originally announced July 2021.
-
Railgun: managing large streaming windows under MAD requirements
Authors:
Ana Sofia Gomes,
João Oliveirinha,
Pedro Cardoso,
Pedro Bizarro
Abstract:
Some mission critical systems, e.g., fraud detection, require accurate, real-time metrics over long time sliding windows on applications that demand high throughput and low latencies. As these applications need to run 'forever' and cope with large, spiky data loads, they further require to be run in a distributed setting. We are unaware of any streaming system that provides all those properties. I…
▽ More
Some mission critical systems, e.g., fraud detection, require accurate, real-time metrics over long time sliding windows on applications that demand high throughput and low latencies. As these applications need to run 'forever' and cope with large, spiky data loads, they further require to be run in a distributed setting. We are unaware of any streaming system that provides all those properties. Instead, existing systems take large simplifications, such as implementing sliding windows as a fixed set of overlapping windows, jeopardizing metric accuracy (violating regulatory rules) or latency (breaching service agreements). In this paper, we propose Railgun, a fault-tolerant, elastic, and distributed streaming system supporting real-time sliding windows for scenarios requiring high loads and millisecond-level latencies. We benchmarked an initial prototype of Railgun using real data, showing significant lower latency than Flink and low memory usage independent of window size. Further, we show that Railgun scales nearly linearly, respecting our msec-level latencies at high percentiles (<250ms @ 99.9%) even under a load of 1 million events per second.
△ Less
Submitted 23 June, 2021;
originally announced June 2021.
-
Weakly Supervised Multi-task Learning for Concept-based Explainability
Authors:
Catarina Belém,
Vladimir Balayan,
Pedro Saleiro,
Pedro Bizarro
Abstract:
In ML-aided decision-making tasks, such as fraud detection or medical diagnosis, the human-in-the-loop, usually a domain-expert without technical ML knowledge, prefers high-level concept-based explanations instead of low-level explanations based on model features. To obtain faithful concept-based explanations, we leverage multi-task learning to train a neural network that jointly learns to predict…
▽ More
In ML-aided decision-making tasks, such as fraud detection or medical diagnosis, the human-in-the-loop, usually a domain-expert without technical ML knowledge, prefers high-level concept-based explanations instead of low-level explanations based on model features. To obtain faithful concept-based explanations, we leverage multi-task learning to train a neural network that jointly learns to predict a decision task based on the predictions of a precedent explainability task (i.e., multi-label concepts). There are two main challenges to overcome: concept label scarcity and the joint learning. To address both, we propose to: i) use expert rules to generate a large dataset of noisy concept labels, and ii) apply two distinct multi-task learning strategies combining noisy and golden labels. We compare these strategies with a fully supervised approach in a real-world fraud detection application with few golden labels available for the explainability task. With improvements of 9.26% and of 417.8% at the explainability and decision tasks, respectively, our results show it is possible to improve performance at both tasks by combining labels of heterogeneous quality.
△ Less
Submitted 26 April, 2021;
originally announced April 2021.
-
Promoting Fairness through Hyperparameter Optimization
Authors:
André F. Cruz,
Pedro Saleiro,
Catarina Belém,
Carlos Soares,
Pedro Bizarro
Abstract:
Considerable research effort has been guided towards algorithmic fairness but real-world adoption of bias reduction techniques is still scarce. Existing methods are either metric- or model-specific, require access to sensitive attributes at inference time, or carry high development or deployment costs. This work explores the unfairness that emerges when optimizing ML models solely for predictive p…
▽ More
Considerable research effort has been guided towards algorithmic fairness but real-world adoption of bias reduction techniques is still scarce. Existing methods are either metric- or model-specific, require access to sensitive attributes at inference time, or carry high development or deployment costs. This work explores the unfairness that emerges when optimizing ML models solely for predictive performance, and how to mitigate it with a simple and easily deployed intervention: fairness-aware hyperparameter optimization (HO). We propose and evaluate fairness-aware variants of three popular HO algorithms: Fair Random Search, Fair TPE, and Fairband. We validate our approach on a real-world bank account opening fraud case-study, as well as on three datasets from the fairness literature. Results show that, without extra training cost, it is feasible to find models with 111% mean fairness increase and just 6% decrease in performance when compared with fairness-blind HO.
△ Less
Submitted 11 October, 2021; v1 submitted 23 March, 2021;
originally announced March 2021.
-
GuiltyWalker: Distance to illicit nodes in the Bitcoin network
Authors:
Catarina Oliveira,
João Torres,
Maria Inês Silva,
David Aparício,
João Tiago Ascensão,
Pedro Bizarro
Abstract:
Money laundering is a global phenomenon with wide-reaching social and economic consequences. Cryptocurrencies are particularly susceptible due to the lack of control by authorities and their anonymity. Thus, it is important to develop new techniques to detect and prevent illicit cryptocurrency transactions. In our work, we propose new features based on the structure of the graph and past labels to…
▽ More
Money laundering is a global phenomenon with wide-reaching social and economic consequences. Cryptocurrencies are particularly susceptible due to the lack of control by authorities and their anonymity. Thus, it is important to develop new techniques to detect and prevent illicit cryptocurrency transactions. In our work, we propose new features based on the structure of the graph and past labels to boost the performance of machine learning methods to detect money laundering. Our method, GuiltyWalker, performs random walks on the bitcoin transaction graph and computes features based on the distance to illicit transactions. We combine these new features with features proposed by Weber et al. and observe an improvement of about 5pp regarding illicit classification. Namely, we observe that our proposed features are particularly helpful during a black market shutdown, where the algorithm by Weber et al. was low performing.
△ Less
Submitted 21 July, 2021; v1 submitted 10 February, 2021;
originally announced February 2021.
-
How can I choose an explainer? An Application-grounded Evaluation of Post-hoc Explanations
Authors:
Sérgio Jesus,
Catarina Belém,
Vladimir Balayan,
João Bento,
Pedro Saleiro,
Pedro Bizarro,
João Gama
Abstract:
There have been several research works proposing new Explainable AI (XAI) methods designed to generate model explanations having specific properties, or desiderata, such as fidelity, robustness, or human-interpretability. However, explanations are seldom evaluated based on their true practical impact on decision-making tasks. Without that assessment, explanations might be chosen that, in fact, hur…
▽ More
There have been several research works proposing new Explainable AI (XAI) methods designed to generate model explanations having specific properties, or desiderata, such as fidelity, robustness, or human-interpretability. However, explanations are seldom evaluated based on their true practical impact on decision-making tasks. Without that assessment, explanations might be chosen that, in fact, hurt the overall performance of the combined system of ML model + end-users. This study aims to bridge this gap by proposing XAI Test, an application-grounded evaluation methodology tailored to isolate the impact of providing the end-user with different levels of information. We conducted an experiment following XAI Test to evaluate three popular post-hoc explanation methods -- LIME, SHAP, and TreeInterpreter -- on a real-world fraud detection task, with real data, a deployed ML model, and fraud analysts. During the experiment, we gradually increased the information provided to the fraud analysts in three stages: Data Only, i.e., just transaction data without access to model score nor explanations, Data + ML Model Score, and Data + ML Model Score + Explanations. Using strong statistical analysis, we show that, in general, these popular explainers have a worse impact than desired. Some of the conclusion highlights include: i) showing Data Only results in the highest decision accuracy and the slowest decision time among all variants tested, ii) all the explainers improve accuracy over the Data + ML Model Score variant but still result in lower accuracy when compared with Data Only; iii) LIME was the least preferred by users, probably due to its substantially lower variability of explanations from case to case.
△ Less
Submitted 22 January, 2021; v1 submitted 21 January, 2021;
originally announced January 2021.
-
Wigner-Weyl description of massless Dirac plasmas
Authors:
J. L. Figueiredo,
J. P. S. Bizarro,
H. Terças
Abstract:
We derive a quantum kinetic model describing the dynamics of graphene electrons in phase space based on the Wigner--Weyl formalism. To take into account the quantum nature of the carriers, we make use of the quantum Liouville equation for the density matrix. By relating the density matrix elements with the Wigner function, the equation of motion for the latter is established, with the Coulomb inte…
▽ More
We derive a quantum kinetic model describing the dynamics of graphene electrons in phase space based on the Wigner--Weyl formalism. To take into account the quantum nature of the carriers, we make use of the quantum Liouville equation for the density matrix. By relating the density matrix elements with the Wigner function, the equation of motion for the latter is established, with the Coulomb interaction being introduced self-consistently (i.e., in the Hartree approximation). The long-wavelength limit for the plasmon dispersion relation is obtained, for both ungated and gated situations. As an application, we derive the corresponding fluid equations from first principles and discuss the correct value of the effective hydrodynamic mass of the carriers. This constitutes a crucial point in establishing the appropriate fluid description of Dirac electrons, thus paving the way to a more comprehensive description of graphene plasmonics.
△ Less
Submitted 25 August, 2021; v1 submitted 30 December, 2020;
originally announced December 2020.
-
Teaching the Machine to Explain Itself using Domain Knowledge
Authors:
Vladimir Balayan,
Pedro Saleiro,
Catarina Belém,
Ludwig Krippahl,
Pedro Bizarro
Abstract:
Machine Learning (ML) has been increasingly used to aid humans to make better and faster decisions. However, non-technical humans-in-the-loop struggle to comprehend the rationale behind model predictions, hindering trust in algorithmic decision-making systems. Considerable research work on AI explainability attempts to win back trust in AI systems by developing explanation methods but there is sti…
▽ More
Machine Learning (ML) has been increasingly used to aid humans to make better and faster decisions. However, non-technical humans-in-the-loop struggle to comprehend the rationale behind model predictions, hindering trust in algorithmic decision-making systems. Considerable research work on AI explainability attempts to win back trust in AI systems by developing explanation methods but there is still no major breakthrough. At the same time, popular explanation methods (e.g., LIME, and SHAP) produce explanations that are very hard to understand for non-data scientist persona. To address this, we present JOEL, a neural network-based framework to jointly learn a decision-making task and associated explanations that convey domain knowledge. JOEL is tailored to human-in-the-loop domain experts that lack deep technical ML knowledge, providing high-level insights about the model's predictions that very much resemble the experts' own reasoning. Moreover, we collect the domain feedback from a pool of certified experts and use it to ameliorate the model (human teaching), hence promoting seamless and better suited explanations. Lastly, we resort to semantic mappings between legacy expert systems and domain taxonomies to automatically annotate a bootstrap training set, overcoming the absence of concept-based human annotations. We validate JOEL empirically on a real-world fraud detection dataset. We show that JOEL can generalize the explanations from the bootstrap dataset. Furthermore, obtained results indicate that human teaching can further improve the explanations prediction quality by approximately $13.57\%$.
△ Less
Submitted 27 November, 2020;
originally announced December 2020.
-
TimeSHAP: Explaining Recurrent Models through Sequence Perturbations
Authors:
João Bento,
Pedro Saleiro,
André F. Cruz,
Mário A. T. Figueiredo,
Pedro Bizarro
Abstract:
Although recurrent neural networks (RNNs) are state-of-the-art in numerous sequential decision-making tasks, there has been little research on explaining their predictions. In this work, we present TimeSHAP, a model-agnostic recurrent explainer that builds upon KernelSHAP and extends it to the sequential domain. TimeSHAP computes feature-, timestep-, and cell-level attributions. As sequences may b…
▽ More
Although recurrent neural networks (RNNs) are state-of-the-art in numerous sequential decision-making tasks, there has been little research on explaining their predictions. In this work, we present TimeSHAP, a model-agnostic recurrent explainer that builds upon KernelSHAP and extends it to the sequential domain. TimeSHAP computes feature-, timestep-, and cell-level attributions. As sequences may be arbitrarily long, we further propose a pruning method that is shown to dramatically decrease both its computational cost and the variance of its attributions. We use TimeSHAP to explain the predictions of a real-world bank account takeover fraud detection RNN model, and draw key insights from its explanations: i) the model identifies important features and events aligned with what fraud analysts consider cues for account takeover; ii) positive predicted sequences can be pruned to only 10% of the original length, as older events have residual attribution values; iii) the most recent input event of positive predictions only contributes on average to 41% of the model's score; iv) notably high attribution to client's age, suggesting a potential discriminatory reasoning, later confirmed as higher false positive rates for older clients.
△ Less
Submitted 26 June, 2021; v1 submitted 30 November, 2020;
originally announced December 2020.
-
A Bandit-Based Algorithm for Fairness-Aware Hyperparameter Optimization
Authors:
André F. Cruz,
Pedro Saleiro,
Catarina Belém,
Carlos Soares,
Pedro Bizarro
Abstract:
Considerable research effort has been guided towards algorithmic fairness but there is still no major breakthrough. In practice, an exhaustive search over all possible techniques and hyperparameters is needed to find optimal fairness-accuracy trade-offs. Hence, coupled with the lack of tools for ML practitioners, real-world adoption of bias reduction methods is still scarce. To address this, we pr…
▽ More
Considerable research effort has been guided towards algorithmic fairness but there is still no major breakthrough. In practice, an exhaustive search over all possible techniques and hyperparameters is needed to find optimal fairness-accuracy trade-offs. Hence, coupled with the lack of tools for ML practitioners, real-world adoption of bias reduction methods is still scarce. To address this, we present Fairband, a bandit-based fairness-aware hyperparameter optimization (HO) algorithm. Fairband is conceptually simple, resource-efficient, easy to implement, and agnostic to both the objective metrics, model types and the hyperparameter space being explored. Moreover, by introducing fairness notions into HO, we enable seamless and efficient integration of fairness objectives into real-world ML pipelines. We compare Fairband with popular HO methods on four real-world decision-making datasets. We show that Fairband can efficiently navigate the fairness-accuracy trade-off through hyperparameter optimization. Furthermore, without extra training cost, it consistently finds configurations attaining substantially improved fairness at a comparatively small decrease in predictive accuracy.
△ Less
Submitted 22 October, 2020; v1 submitted 7 October, 2020;
originally announced October 2020.
-
BreachRadar: Automatic Detection of Points-of-Compromise
Authors:
Miguel Araujo,
Miguel Almeida,
Jaime Ferreira,
Luis Silva,
Pedro Bizarro
Abstract:
Bank transaction fraud results in over $13B annual losses for banks, merchants, and card holders worldwide. Much of this fraud starts with a Point-of-Compromise (a data breach or a skimming operation) where credit and debit card digital information is stolen, resold, and later used to perform fraud. We introduce this problem and present an automatic Points-of-Compromise (POC) detection procedure.…
▽ More
Bank transaction fraud results in over $13B annual losses for banks, merchants, and card holders worldwide. Much of this fraud starts with a Point-of-Compromise (a data breach or a skimming operation) where credit and debit card digital information is stolen, resold, and later used to perform fraud. We introduce this problem and present an automatic Points-of-Compromise (POC) detection procedure. BreachRadar is a distributed alternating algorithm that assigns a probability of being compromised to the different possible locations. We implement this method using Apache Spark and show its linear scalability in the number of machines and transactions. BreachRadar is applied to two datasets with billions of real transaction records and fraud labels where we provide multiple examples of real Points-of-Compromise we are able to detect. We further show the effectiveness of our method when injecting Points-of-Compromise in one of these datasets, simultaneously achieving over 90% precision and recall when only 10% of the cards have been victims of fraud.
△ Less
Submitted 24 September, 2020;
originally announced September 2020.
-
Railgun: streaming windows for mission critical systems
Authors:
João Oliveirinha,
Ana Sofia Gomes,
Pedro Cardoso,
Pedro Bizarro
Abstract:
Some mission critical systems, such as fraud detection, require accurate, real-time metrics over long time windows on applications that demand high throughputs and low latencies. As these applications need to run "forever", cope with large and spiky data loads, they further require to be run in a distributed setting. Unsurprisingly, we are unaware of any distributed streaming system that provides…
▽ More
Some mission critical systems, such as fraud detection, require accurate, real-time metrics over long time windows on applications that demand high throughputs and low latencies. As these applications need to run "forever", cope with large and spiky data loads, they further require to be run in a distributed setting. Unsurprisingly, we are unaware of any distributed streaming system that provides all those properties. Instead, existing systems take large simplifications, such as implementing sliding windows as a fixed set of partially overlapping windows, jeopardizing metric accuracy (violating financial regulator rules) or latency (breaching service agreements).
In this paper, we propose Railgun, a fault-tolerant, elastic, and distributed streaming system supporting real-time sliding windows for scenarios requiring high loads and millisecond-level latencies. We benchmarked an initial prototype of Railgun using real data, showing significant lower latency than Flink, and low memory usage, independent of window size.
△ Less
Submitted 10 November, 2020; v1 submitted 1 September, 2020;
originally announced September 2020.
-
Machine learning methods to detect money laundering in the Bitcoin blockchain in the presence of label scarcity
Authors:
Joana Lorenz,
Maria Inês Silva,
David Aparício,
João Tiago Ascensão,
Pedro Bizarro
Abstract:
Every year, criminals launder billions of dollars acquired from serious felonies (e.g., terrorism, drug smuggling, or human trafficking) harming countless people and economies. Cryptocurrencies, in particular, have developed as a haven for money laundering activity. Machine Learning can be used to detect these illicit patterns. However, labels are so scarce that traditional supervised algorithms a…
▽ More
Every year, criminals launder billions of dollars acquired from serious felonies (e.g., terrorism, drug smuggling, or human trafficking) harming countless people and economies. Cryptocurrencies, in particular, have developed as a haven for money laundering activity. Machine Learning can be used to detect these illicit patterns. However, labels are so scarce that traditional supervised algorithms are inapplicable. Here, we address money laundering detection assuming minimal access to labels. First, we show that existing state-of-the-art solutions using unsupervised anomaly detection methods are inadequate to detect the illicit patterns in a real Bitcoin transaction dataset. Then, we show that our proposed active learning solution is capable of matching the performance of a fully supervised baseline by using just 5\% of the labels. This solution mimics a typical real-life situation in which a limited number of labels can be acquired through manual annotation by experts.
△ Less
Submitted 5 October, 2021; v1 submitted 29 May, 2020;
originally announced May 2020.
-
ARMS: Automated rules management system for fraud detection
Authors:
David Aparício,
Ricardo Barata,
João Bravo,
João Tiago Ascensão,
Pedro Bizarro
Abstract:
Fraud detection is essential in financial services, with the potential of greatly reducing criminal activities and saving considerable resources for businesses and customers. We address online fraud detection, which consists of classifying incoming transactions as either legitimate or fraudulent in real-time. Modern fraud detection systems consist of a machine learning model and rules defined by h…
▽ More
Fraud detection is essential in financial services, with the potential of greatly reducing criminal activities and saving considerable resources for businesses and customers. We address online fraud detection, which consists of classifying incoming transactions as either legitimate or fraudulent in real-time. Modern fraud detection systems consist of a machine learning model and rules defined by human experts. Often, the rules performance degrades over time due to concept drift, especially of adversarial nature. Furthermore, they can be costly to maintain, either because they are computationally expensive or because they send transactions for manual review. We propose ARMS, an automated rules management system that evaluates the contribution of individual rules and optimizes the set of active rules using heuristic search and a user-defined loss-function. It complies with critical domain-specific requirements, such as handling different actions (e.g., accept, alert, and decline), priorities, blacklists, and large datasets (i.e., hundreds of rules and millions of transactions). We use ARMS to optimize the rule-based systems of two real-world clients. Results show that it can maintain the original systems' performance (e.g., recall, or false-positive rate) using only a fraction of the original rules (~ 50% in one case, and ~ 20% in the other).
△ Less
Submitted 14 February, 2020;
originally announced February 2020.
-
Interleaved Sequence RNNs for Fraud Detection
Authors:
Bernardo Branco,
Pedro Abreu,
Ana Sofia Gomes,
Mariana S. C. Almeida,
João Tiago Ascensão,
Pedro Bizarro
Abstract:
Payment card fraud causes multibillion dollar losses for banks and merchants worldwide, often fueling complex criminal activities. To address this, many real-time fraud detection systems use tree-based models, demanding complex feature engineering systems to efficiently enrich transactions with historical data while complying with millisecond-level latencies.
In this work, we do not require thos…
▽ More
Payment card fraud causes multibillion dollar losses for banks and merchants worldwide, often fueling complex criminal activities. To address this, many real-time fraud detection systems use tree-based models, demanding complex feature engineering systems to efficiently enrich transactions with historical data while complying with millisecond-level latencies.
In this work, we do not require those expensive features by using recurrent neural networks and treating payments as an interleaved sequence, where the history of each card is an unbounded, irregular sub-sequence. We present a complete RNN framework to detect fraud in real-time, proposing an efficient ML pipeline from preprocessing to deployment.
We show that these feature-free, multi-sequence RNNs outperform state-of-the-art models saving millions of dollars in fraud detection and using fewer computational resources.
△ Less
Submitted 17 June, 2020; v1 submitted 14 February, 2020;
originally announced February 2020.
-
Automatic Model Monitoring for Data Streams
Authors:
Fábio Pinto,
Marco O. P. Sampaio,
Pedro Bizarro
Abstract:
Detecting concept drift is a well known problem that affects production systems. However, two important issues that are frequently not addressed in the literature are 1) the detection of drift when the labels are not immediately available; and 2) the automatic generation of explanations to identify possible causes for the drift. For example, a fraud detection model in online payments could show a…
▽ More
Detecting concept drift is a well known problem that affects production systems. However, two important issues that are frequently not addressed in the literature are 1) the detection of drift when the labels are not immediately available; and 2) the automatic generation of explanations to identify possible causes for the drift. For example, a fraud detection model in online payments could show a drift due to a hot sale item (with an increase in false positives) or due to a true fraud attack (with an increase in false negatives) before labels are available. In this paper we propose SAMM, an automatic model monitoring system for data streams. SAMM detects concept drift using a time and space efficient unsupervised streaming algorithm and it generates alarm reports with a summary of the events and features that are important to explain it. SAMM was evaluated in five real world fraud detection datasets, each spanning periods up to eight months and totaling more than 22 million online transactions. We evaluated SAMM using human feedback from domain experts, by sending them 100 reports generated by the system. Our results show that SAMM is able to detect anomalous events in a model life cycle that are considered useful by the domain experts. Given these results, SAMM will be rolled out in a next version of Feedzai's Fraud Detection solution.
△ Less
Submitted 12 August, 2019;
originally announced August 2019.
-
A stable semi-implicit algorithm
Authors:
João P. S. Bizarro,
L. Venâncio,
R. Vilela Mendes
Abstract:
When the singular values of the evolution operator are all smaller or all greater than one, stable integration algorithms are obtained either by explicit or implicit methods. When the singular spectrum mixes greater and smaller than one values, neither explicit nor implicit methods insure stabilty. The problem is solved by using a splitting of the evolution operator and a semi-implicit scheme. The…
▽ More
When the singular values of the evolution operator are all smaller or all greater than one, stable integration algorithms are obtained either by explicit or implicit methods. When the singular spectrum mixes greater and smaller than one values, neither explicit nor implicit methods insure stabilty. The problem is solved by using a splitting of the evolution operator and a semi-implicit scheme. The method is illustrated in the study of a two-field model of the tokamak scrape-off layer.
△ Less
Submitted 16 January, 2021; v1 submitted 11 May, 2019;
originally announced May 2019.