Keywords

1 Introduction

Distributed Artificial Intelligence is a subfield of Artificial Intelligence that studies the coordination among several semi-autonomous agents called participants. Such systems are able to solve more complex problems involving a large amount of data, but there are privacy concerns about sharing sensitive information.

Federated Learning is a novel approach to Distributed Artificial Intelligence that enables privacy-preserving communications by sharing the model (or gradients) instead of the data. A central server sends a model to be trained by the participants with their local data, who send the parameters of the model back to the server to be aggregated. After iterating this process, the output is a model that has been trained with the private information of all participants.

This method is especially useful when dealing with sensitive data, from domains such as finance or healthcare. In this paper, the authors propose a Federated Fuzzy Cognitive Map approach to help diagnose malignant breast tumor cells.

The contributions of this paper can be summarized as follows:

  • Distributed learning. The authors propose a PSO-based FCM learning in a distributed way.

  • Privacy-preserving machine learning. The authors design a training scheme for collaborative FCM learning that offers data privacy. This proposal enables multiple participants to learn a FCM model on their own inputs, preserving the privacy of their own data and complying with data privacy regulations.

  • Implementation. The authors evaluate the performance of the proposal with a well-known dataset of cancer diagnosis. The experimental results show that the proposal achieve a similar performance to other non-distributed methods and improves the performance of the non-collaborative approach.

The rest of this paper is organized as follows. We discuss existing fundamentals of FCM and the learning approach in Sect. 2. Distributed Artificial Intelligence is described in Sect. 3. Then, we present the methodological proposal in Sect. 3. Section 4 describes the details of the experimental approach and the results. Finally, we draw a conclusion in Sect. 5.

2 Fuzzy Cognitive Maps

2.1 Fundamentals

Fuzzy Cognitive Maps (FCMs) were initially proposed by Kosko [3]. FCMs represent concepts, variables or features as nodes, the relationships between them as arcs, and the strengths of those relations as weights. It means that a weight assesses how much node X causes node Y. The fuzzy weights for arcs are normalised on the range \(\{[0,+1]|[-1,+1]\}\), depending if it includes negative values or not. The maximum negative influence is \(- 1\) and the maximum positive influence is \(+1\). The value zero shows that there is no relationship between the concepts. For computational purposes, FCMs can be described via a weight matrix (connection or adjacency matrix) which contains all weight values of edges between the concepts.

The relationships between the nodes are expressed by their weights. That is, if there is a positive causality between two nodes, then \(\varpi _{ij}>0\). If there is a negative causality, then \(\varpi _{ij}< 0\) and if there is no relationship between the two nodes, then \(\varpi _{ij}=0\). The state of the nodes together is shown in the state vector \(c =[c_1,c_2,\ldots ,c_N]\) that gives a snapshot of nodes at any point of the instant in the scenario.

From a formal point of view, it is possible to represent a FCM as a 4-tuple \(\Phi = \langle c, \mathcal {W}, f, r \rangle \), where \(c=\{c_i\}_{i=1}^n\) is the state of the nodes with n as the number of nodes, \(\mathcal {W} = [\varpi _{ij}]_{n\times n}\) is the adjacency matrix representing the weights between the nodes, f is the activation function, and r is the nodes’ range.

FCMs are dynamical systems involving feedback, where the effect of change in the state of a node may affect the state of other nodes, which in turn can affect the former node [7].

The dynamic starts with an initial vector state \(c(0)=\big (c_1(0),\ldots , c_n(0)\big )\), which represents the initial state (value) of each node. The new state of the nodes is computed as an iterative process. It includes an activation function [1] for mapping monotonically the node state into a normalized range \(\{[0, +1]|[-1, +1]\}\). If the range is \([0, +1]\), the unipolar sigmoid is the most used activation function, but hyperbolic tangent is the most used when the range is \([-1, +1]\).

The component i of the vector state at time t\(c_i(t),\) can be computed as

$$\begin{aligned} c_i(t) = f\Bigg (\sum _{j=1}^n \varpi _{ji}\cdot c_j(t-1)\Bigg ). \end{aligned}$$
(1)

Some systems include nodes whose states should be steady because their states are not related with the dynamics of the system but their state has some influence on the state of the other nodes (i.e. sun radiation, wind speed and so on). In such cases, the state of the node is the same along the dynamics \(c_i(t)=c(t-1)\;|\; c_i\in \mathcal {O},\) where \(\mathcal {O}\) is the set of output concepts.

If the activation function f is unipolar sigmoid, then the component i of the vector state \(c_i(t)\) at the instant t is computed as follows

$$\begin{aligned} c_i(t) = \Big (1+e^{-\lambda \cdot \sum _{j=1}^n \varpi _{ji}\cdot c_j(t-1)}\Big )^{-1} \end{aligned}$$
(2)

If the activation function f is hyperbolic tangent, then the component i of the vector state \(c_i(t)\) at the instant t is computed as follows

$$\begin{aligned} c_i(t) = \frac{e^{\lambda \cdot \sum _{j=1}^n \varpi _{ji}\cdot c_j(t-1)}-e^{-\lambda \cdot \sum _{j=1}^n \varpi _{ji}\cdot c_j(t-1)}}{e^{\lambda \cdot \sum _{j=1}^n \varpi _{ji}\cdot c_j(t-1)}+e^{-\lambda \cdot \sum _{j=1}^n \varpi _{ji}\cdot c_j(t-1)}} \end{aligned}$$
(3)

After the dynamics, the FCM reaches one of the three following states after a number of iterations: it settles down to either a fixed pattern of node values (the so-called hidden pattern), a limited cycle, or a fixed-point attractor.

2.2 Augmented FCMs

According to the FCM literature [4], an augmented adjacency matrix is built by aggregating the adjacency matrix of each FCM. The elements’ aggregation depends on whether there are common nodes. If the adjacency matrices had no common nodes, the elements \(\varpi _{ij}\) in the augmented matrix (\(\displaystyle \otimes _{i=1}^{N}\)) are computed by adding the adjacency matrix of each FCM model (\(\mathcal {W}_{i}\)).

The addition method when the adjacency matrices have not common nodes is known as direct sum of matrices, and the augmented matrix is denoted as \(\otimes _{i=1}^{N}\varpi _{i}\). Given a couple of FCMs with no common nodes and even different number of nodes with adjacency matrices \([\varpi _{ij}]_{n\times n}\) and \([\varpi _{kl}]_{m\times m}\), the resulting augmented adjacency matrix is as follows

(4)

where N is the number or adjacency matrices to add, zeroes are actually zero matrices and the dimension of \(\displaystyle \otimes _{i=1}^{N}\varpi _{i}\) is \([\cdot ]_{m+r\times m+r}\). In the case of common nodes, they would be computed as the average or weighted average of the states of the nodes in each adjacency matrix.

2.3 FCM for Classification

FCMs classification capabilities have been analysed by [8]. In general terms, the main goal of a conventional classifier is the mapping of an input to a specific output according to a pattern. In this proposal, the input concepts represent the features of the dataset, while the output concepts are the classes’ labels where the patterns belong.

Figure 1 shows the typical topology of a FCM classifier where the state of the concepts \(c_{1}\) and \(c_{2}\) defines the class where the input vector state belongs. In that sense, if \(c_{1}>c_{2}\) the input vector state belongs to class 1 but if \(c_{1}<c_{2}\) the input vector state belongs to class 2. Note that \(c_{i}\in \{[-1,+1],[0,+1]\}\), therefore if \(c_{1}=0.03\) and \(c_{2}=0.1\), then the input vector state belong to class 2.

Fig. 1.
figure 1

Fuzzy Cognitive Maps classifier

2.4 PSO-Based FCM Learning

FCM learning endeavours are commonly focused on building the adjacency matrix based either on the available historical raw data or on expert knowledge. FCM learning approaches could be divided into three categories [10]: Hebbian, population-based, and hybrid, mixing the main aspects of Hebbian-based and population-based learning algorithm.

The goal of the Hebbian-based FCM learning approaches is to modify adjacency matrices leading the FCM model to either achieve a steady state or converge into an acceptable region for the target system.

Population-based approaches do not need the human intervention. They compute adjacency matrices from historical raw data that best fit the sequence of input state vectors (the instances of the dataset). The learning goal of FCM evolutionary learning is to generate optimal adjacency matrix for modeling systems behaviour.

In this sense, Salmeron et al. [11] proposed an advanced decision support tool based on consultations with a group of experienced medical professionals using FCMs trained with Particle Swarm Optimization (PSO). Also, Salmeron and Froelich [9] apply PSO for time series forecasting.

PSO is a bio-inspired, population-based and stochastic optimization algorithm. The PSO algorithm generates a swarm of particles moving in an n-dimensional search space which must include all potential candidate solutions.

In order to train the FCM adjacency matrices we take into account the \(k^{th}\) particle’s position (a candidate solution or adjacency matrix), denoted as \(\varpi _k=(\varpi _{k_1},\ldots ,\varpi _{k_j})\) and its velocity, \(v_k=(v_{k_1},\ldots ,v_{k_j})\). Note that each particle is a potential solution or FCM candidate and its position \(\varpi _k\) represents its adjacency matrix.

Each particle’s velocity and position are updated at each time step. The position and the velocity of each particle is computed as follows

$$\begin{aligned} \varpi _k(t+1)&= \varpi _k(t) + v_k(t)\end{aligned}$$
(5a)
$$\begin{aligned} v_k(t+1)&= v_k(t) + U(0,\phi _1)\otimes (\dot{\varpi }_k -\varpi _k(t)) + U(0,\phi _2)\otimes (\ddot{\varpi }_k -\varpi _k(t)) \end{aligned}$$
(5b)

where \(U(0,\phi _i)\) is a vector of random numbers generated from a uniform distribution within \([0,\phi _i],\) generated at each iteration and for each particle. Also, \(\dot{\varpi }_k\) is the best position of particle k in all former iterations and \(\ddot{\varpi }_k\) the best position of the whole population in all previous iterations and \(\otimes \) is the component-wise multiplication.

The PSO algorithm’s goal is to locate all the particles in the global optima to a multidimensional hyper-volume. The fitness function used in this research is the complement of the Jaccard similarity coefficient (\(\overline{J}=(Y\times \hat{Y})\setminus J\)). The Jaccard score computes the average of Jaccard similarity coefficients between pairs of the sets of labels. The Jaccard similarity coefficient of the i-th samples, with a ground truth label set and a predicted label set. The complement operation is needed in terms of minimization of the fitness function. The Jaccard similarity coefficient’s complement is computed as follows

$$\begin{aligned} \overline{J}(y_{i},\hat{y}_{i}) = 1- \frac{\vert y_{i} \cap \hat{y}_{i}\vert }{\vert y_{i} \cup \hat{y}_{i}\vert } \end{aligned}$$
(6)

The fitness function is sampled after each particle position update and is the objective function used to compute how close a given particle is in order to be able to achieve the set aims.

3 Methodological Proposal

3.1 Fundamentals

Distributed Artificial Intelligence is a subset of Artificial Intelligence that allows the sharing of information among several agents or participants that interact by cooperation, by coexistence or by competition. Such system manages the distribution of tasks, being therefore more apt to solve complex problems, especially if they involve a large amount of data.

One of the methods available to construct a distributed artificial intelligence system is Federated Learning, proposed by McMahan et al. [5] and further developed in Konecny et al. [2] and McMahan and Ramage [6]. In such system, a central server constructs a model, usually a neural network, and sends it to the participants, who train the model in their private data. Their data never leaves their local devices, therefore ensuring privacy and security. The parameters of the participant’s model are then averaged to obtain a global model. This process may be iterated till convergence.

Described in a formal way, a Federated Learning project is composed by a central server and the participants. The central server is responsible for managing the federated model and the communications with the participants. The participants own the datasets and train the partial models. The whole process is described in Fig. 2 and it is as follows:

  1. 1.

    The central server sends a federated model to each participant. If it is the initial iteration the federated model is proposed by the central server.

  2. 2.

    Each participant trains the received model with their own private dataset.

  3. 3.

    After the partial model is trained, each participant sends the parameters of the model or its gradients to the central server, encrypted to ensure privacy.

  4. 4.

    The central server aggregates the partial model and builds the federated model.

  5. 5.

    The central server checks the termination condition and if it is accomplished the federated model is finished, otherwise the process goes back to step 1.

Fig. 2.
figure 2

Federated Learning process

When the researchers at Google first defined Federated Learning, their initial idea was to allow Android mobile phones to collaborative construct a prediction model without migrating the training data from the phone (see McMahan et al. [6] from the Google AI Blog). A first application they had was to use FL in Gboard on Android, the Google Keyboard, which predicts the most probable next phrase or word based on the user-generated preceding text. Recently, Federated Learning has improved this process, allowing the use of more accurate models with lower latency, ensuring privacy and less power consumption.

One of the main advantages of Federated Learning is the promise of secure and private distributed machine learning, but there are risks associated with sharing data among several agents, such as the reconstruction of training examples from the neural network parameters, the uploading of private data from the agents to the central server, and the protection of the models as intellectual property of the companies. There is a large research interest in privacy-preserving methods applied to Federated Learning, such as the application of Differential Privacy, Secure Multi-Party Computation or Homomorphic encryption.

3.2 FCM Distributed Learning

The proposed methodology combines Federated Learning with learning FCMs using Particle Swarm Optimization. The process is shown in Fig. 3 and it is explained as follows.

  1. 1.

    Triggering the Federated Learning process. The central server triggers the process in the participants machines.

  2. 2.

    Training FCM in the local dataset. Each participant trains a local FCM with their own dataset. The authors apply PSO but this methodology is agnostic to the learning approach. The FCM dynamics is considered steady when the difference between two consecutive vector states is under \(tol= 0.00001\)

  3. 3.

    Sending the trained adjacency matrices and local accuracy for this stage to the central server. The local FCM is stored in the participant devices.

  4. 4.

    Weighting local FCMs using accuracy. The central server aggregates the local FCMs weighting by the accuracy. The aggregation method have been detailed as Sect. 2.2.

  5. 5.

    Aggregating Federated and Local FCMs. The participants aggregate the Federated FCM from the central server and their own local FCM.

  6. 6.

    Sending adjacency matrices and accuracy. Participants send again the local adjacency matrices and the new local accuracy.

  7. 7.

    Checking termination condition. The central server checks if the Federated process has been run 20 iterations as termination condition. If it is not accomplished then it goes back to the step 4.

  8. 8.

    If the termination condition is accomplished then a Federated FCM is achieved.

Fig. 3.
figure 3

Proposed methodology

The main contribution of this paper is the application of Federation Learning paradigm for privacy-preserving FCM distributed and coorperative learning.

4 Experimental Approach

4.1 Dataset

Breast cancer is one of the most common cancers among women, accounting for 25% of all cancer cases that affect women worldwide. According to the American Cancer Society, when breast cancer is detected early, and is in the localized stage, the 5-year relative survival rate is 99%, which makes the early diagnosis of breast cancer a main key in the prognosis and chance of survival of such types of cancer.

Table 1. Dataset details

In recent years the use of Machine Learning algorithms in medicine has increased exponentially, with applications such as EEG analysis and Cancer detection. For example, automatized algorithms have been use to examine biological data such as DNA methylation and RNA sequencing to infer which genes can cause cancer and which genes can instead be able to suppress its expression.

In this paper the authors will use the Breast Cancer Wisconsin Dataset, created by Dr. William H. Wolberg, physician at the University Of Wisconsin Hospital at Madison, and made publicly available at the UC Irvine Machine Learning Repository. The dataset comprises data from digitized images of the fine-needle aspirate of a breast mass that describes features of the nucleus of the current image of 569 patients, of which 212 are malignant and 357 are benign cases.

The first two features correspond to the identifier number and the diagnosis status (our target). The remaining attributes are thirty real attributes that measure the mean, the standard error, and the worst radius, texture, perimeter, area, smoothness, compactness, concave points, concavity, symmetry, and fractal dimension of the nucleus of the solid breast mass (see Table 1). These data were obtained using a graphical computer program called Xcyt, which is capable of perform the analysis of cytological features based on a digital scan. More details can be found in [12, 13].

4.2 Results

After 20 iterations of the Federated Learning process, the Fuzzy Cognitive Map-based classifier is able to predict whether the tumor is malignant with an average accuracy of 0.9383 across all participants, improving the accuracy of a single Fuzzy Cognitive Map trained in the whole data, and the accuracy in each participant before the federation.

The goal of this paper is not the accuracy of the proposal but a distributed and privacy-preserving approach. Nevertheless, our results are similar to the ones found in literature [14] (Table 2).

Table 2. Results of the experiments

5 Conclusions

This paper proposes an innovative methodology for learning Fuzzy Cognitive Maps with Federated Learning. It is a step forward for Distributed Artificial Intelligence and accomplishes the privacy-preserving requirements of the society.

In addition, the authors have developed a method for distributed Fuzzy Cognitive Maps that improves the accuracy of both the algorithm trained in the whole dataset in a local node and the participant’s algorithms before the Federated Learning process.

This method was applied to a cancer detection problem, obtaining an accuracy of 0.9383. The participants in this process do not share their private data, therefore forming a privacy-preserving distributed system.