Research on federated learning method for fault diagnosis in multiple working conditions

As rolling bearings are among the critical components of rotating machinery, their fault diagnosis is of great significance. Although deep learning is useful for diagnosing rolling bearing faults, it struggles to diagnose bearing faults under multiple operating conditions. To overcome this problem, this paper designs a modular federated learning network for fault diagnosis under multiple working conditions, using dynamic routing technology as the federation strategy for federated learning of multiple modular neural networks. First, according to the different working conditions, the collected multi-working-condition data are divided into groups and fed to the modular networks to extract local features under each working condition. Then, an additional deep neural network is constructed to extract the features contained in the data without working condition division. Finally, global adaptive feature extraction for each working condition is obtained by designing a federation strategy based on dynamic routing technology that realizes the weight allocation scheme of the modular neural network. The bearing dataset of Case Western Reserve University is taken as a benchmark to verify the effectiveness of the proposed method.


INTRODUCTION
As an indispensable core component in the connection and transmission chains of mechanical equipment, rolling bearings play an important role in aerospace, electric power, metallurgy, and other industrial fields [1]. Mechanical equipment working in a complex environment is prone to a higher fault rate related to rolling bearings; thus, the fault diagnosis of rolling bearings is significant. During the operation of mechanical equipment, load changes are inevitable, and a change of load causes a change of motor speed, so the collected data contain vibration signals under different loads. Therefore, fault diagnosis under multiple working conditions is of much practical significance.
In recent years, various methods for fault diagnosis of rolling bearings have emerged. Data-driven fault diagnosis methods do not require precise physical models or expert knowledge, and they can directly extract data features for fault diagnosis. Therefore, data-driven fault diagnosis methods have attracted much attention from experts in the field of fault diagnosis [2][3].
Deep learning is a data-driven fault diagnosis method. According to the network structure, it can be classified into four categories: convolutional neural network (CNN)-based methods, long short-term memory network (LSTM)-based methods, deep neural network (DNN)-based methods, and deep belief network (DBN)-based methods [4][5][6][7]. DNNs and their variants have become some of the most commonly used methods for bearing fault diagnosis due to their simple structure and advantages in processing sequence data. Kong et al. [8] designed a feature fusion layer to fuse the different types of features extracted by DNNs with different activation functions. Liu et al. [9] performed a short-time Fourier transform on the sound signal of the rolling bearing to generate a spectrogram, and a DNN was used to extract features from the spectrogram, which improved the fault diagnosis capability of the model. Shao et al. [10] used a denoising auto-encoder (DAE) to extract low-level features from raw vibration signals polluted with Gaussian noise, and then used a stacked denoising auto-encoder (SDAE) to extract high-level features from the low-level features to improve fault diagnosis performance. Although the above-mentioned deep learning fault diagnosis methods can achieve satisfactory results, they do not take the situation of multiple working conditions into account.
Due to changes in the external environment, load variation, etc., bearings usually operate under different working conditions whose process characteristics are quite different; the statistical distributions of the data collected from the different working conditions are also significantly distinguishable, which violates the constraint of independent and identical distribution required by traditional deep learning algorithms. At present, statistical methods, variational mode decomposition, and decision tree methods are often used for fault diagnosis under multiple working conditions [11][12][13]. Sun et al. [11] first used PCA to reduce the dimensionality of the data and then constructed a decision tree to implement a multi-sensor, multi-condition fault diagnosis method. Song et al. [12] used a recursive local outlier factor algorithm for adaptive pattern recognition, obtained the principal components according to the cumulative data contribution rate, and then analyzed the obtained critical components. The above-mentioned methods require the design of a feature extractor and professional knowledge to process the collected data signals, and the real-time performance of fault diagnosis cannot be guaranteed.
Existing fault diagnosis methods based on deep learning cannot effectively extract features from multi-working condition data. Therefore, some researchers first classify the working conditions and then perform fault diagnosis under multiple working conditions. Zhou et al. [14] established a DNN model separately for each working condition's data, which can realize fault diagnosis under multiple working conditions. Chen et al. [15] proposed a hierarchical fault diagnosis method based on CNN, in which the first layer performs working condition recognition and the second layer performs fault type diagnosis. These fault diagnosis methods place high requirements on the accuracy of working condition division, and the fault diagnosis result depends on the efficiency of that division. On the other hand, some scholars use data preprocessing techniques to eliminate modal differences between data before performing fault diagnosis. Gu et al. [16] used local nearest-neighbor standardization to eliminate the differences between multimodal data and obtain accurate fault diagnosis results. Che et al. [17] used horizontal and vertical analysis methods to obtain the amplitude, variance, and standard deviation of the bearing signal; PCA was then used to extract the latent features, and a decision-level fusion of CNN and DBN was designed to achieve multi-working condition bearing fault diagnosis. Some researchers also use transfer learning methods to solve fault diagnosis under multiple working conditions. Qiao et al. [18] proposed an adaptive convolutional neural network, which used a small amount of labeled bearing data to train a deep learning model, and then increased the sample size according to a sequential tracking method to improve the generalization ability of the model. Zhao et al. [19] proposed a transfer learning framework based on deep multi-scale convolutional neural networks.
The model trained in the source domain can be used for multi-working condition fault diagnosis of equipment after fine-tuning. Han et al. [20] proposed a deep transfer learning based on a joint domain adaptation algorithm. When the training set and the test set belong to different working conditions, they can be mapped to the same feature subspace to process the training set such that fault diagnosis for variable working conditions can be well accomplished. However, transfer learning requires a large number of labeled training datasets to train the network.
In engineering practice, due to environmental constraints and equipment complexity, the working conditions of a bearing are not easy to recognize. Therefore, it is necessary to combine the existing single-condition data and multi-working condition data into a unified model to obtain more robust and accurate fault diagnosis results. Federated learning is a distributed machine learning technology that trains a unified network model by cooperatively using data features provided by different organizers. Through modularized federation of data under different working conditions, the local machine learning models can be jointly optimized. Robert et al. [21] designed a special modular neural network structure using the idea of modularity, using gate networks for task allocation and corresponding sub-networks to solve related problems; this modular deep learning method outperforms a traditional neural network, with improved generalization ability. Andreas et al. [22] combined different neural network modules into a deep neural network to solve visual question answering. Wei et al. [23] designed a deep one-dimensional convolutional neural network based on the idea of modularization, which can perform fault diagnosis under multiple working conditions in a noisy environment. Zhao et al. [24] used fuzzy c-means clustering to divide the measurement space into multiple subspaces and obtained the characteristics of each subspace through a local network; this method has good generalization ability. Bo et al. [25] realized modularization through fuzzy decision; this fuzzy-decision-based subnet method can effectively improve the accuracy of the model. Yan et al. [26] designed a hierarchical CNN, with spectral clustering for hierarchical categorization. Geng et al. [27] first used wavelet analysis to process the original data, and then an extreme learning machine was used as a classifier to identify rolling bearing faults.
However, rules such as maximum pooling used in the previous work discard some features and cannot make full use of the extracted features. Therefore, Sabour et al. [28] designed a capsule network model based on dynamic routing rules. The capsule network is a feature-based modular approach: a group of neurons forms an output capsule, each capsule expresses a different characteristic, and the positional relationships between different characteristics are established by the routing algorithm, which makes the network more robust to changes in the target's angle. Chen et al. [29] proposed a capsule network with a normalization criterion that obeys the Gaussian distribution to diagnose bearing faults; this network overcomes the defects of the CNN pooling layer, since it uses all features extracted by the convolution layer. Li et al. [30] proposed an end-to-end scheme to combine two-channel signals using a capsule network. The effect of rotation speed can be eliminated by fusing vertical and horizontal vibration signals, so invariant features can be automatically extracted by the capsule network. Zhu et al. [31] proposed a new capsule network with an inception block and a regression branch, which first converted a one-dimensional signal into a time-frequency graph, and then used two convolution layers to extract more abstract features from the time-frequency graph. The inception block was applied to the output feature map to improve the capsule's nonlinearity. Two branches with different functions were designed: one branch uses the longest capsule to determine the damage size, and the other branch reconstructs the time-frequency graph to overcome the over-fitting problem. However, the above-mentioned network suffers from a large computational burden.
To overcome the above shortcomings, this paper designs a deep learning fault diagnosis network based on modular federation for bearings under multiple working conditions. A DNN is first used to extract features layer by layer, and then a dynamic routing algorithm is adopted to adaptively federate the features extracted by multiple DNNs, establishing a new modular federated neural network. The main contributions of this paper are as follows: (1) This paper designs a new modular federated neural network based on the idea of federated learning.
(2) Dynamic routing technology is used to propose a new modular federation mechanism such that the features extracted from data of different working conditions can be effectively federated in the case when there are no additional working condition labels.
(3) Modular federated learning increases the feature expression capability of the network to realize online fault diagnosis of bearings operating in any working condition.
The remainder of this paper is organized as follows: the related works of the proposed method are introduced in Section 2. The proposed modular federated learning method (MFLM) for fault diagnosis in the situation of multiple conditions is elaborated in Section 3. The experimental verification is demonstrated in Section 4. Finally, we draw the conclusions in Section 5.

DNN stacked with multiple Auto-Encoders
In this paper, the architecture of DNN is stacked with multiple AEs together. Rumelhart et al. [32] proposed the training process of DNN, as shown in Figure 1. First, the greedy layered training method is used to perform unsupervised pre-training layer by layer, and then supervised reverse fine-tuning is designed to optimize the entire network. The output of the previous AE is fed into the input of the AE on the next layer, thus the pretraining of DNN can be accomplished layer by layer.
After layer-by-layer feature extraction, the Softmax classifier is added and applied. The network parameters can be fine-tuned with labeled data by using back propagation.
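The greedy layer-wise pre-training described above can be illustrated with a minimal NumPy sketch: each auto-encoder (with tied weights, an assumption made here for brevity) is trained to reconstruct its input, and its hidden code becomes the input of the next AE. This is an illustrative toy, not the authors' implementation; the supervised Softmax fine-tuning stage is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class AutoEncoder:
    """A single AE with tied weights, trained by plain gradient descent."""
    def __init__(self, n_in, n_hidden):
        self.W = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b = np.zeros(n_hidden)   # encoder bias
        self.b_out = np.zeros(n_in)   # decoder bias

    def encode(self, x):
        return sigmoid(x @ self.W + self.b)

    def train_step(self, x, lr=0.05):
        n = x.shape[0]
        h = self.encode(x)
        x_hat = sigmoid(h @ self.W.T + self.b_out)    # reconstruction
        d_out = (x_hat - x) * x_hat * (1.0 - x_hat)   # output-layer delta
        d_hid = (d_out @ self.W) * h * (1.0 - h)      # hidden-layer delta
        # tied weights: gradient combines encoder and decoder contributions
        self.W -= lr * (x.T @ d_hid + (h.T @ d_out).T) / n
        self.b -= lr * d_hid.sum(0) / n
        self.b_out -= lr * d_out.sum(0) / n
        return float(np.mean((x_hat - x) ** 2))

# Greedy layer-wise pre-training: the output of one AE feeds the next.
X = rng.random((64, 20))
layers = [AutoEncoder(20, 10), AutoEncoder(10, 5)]
history, inp = [], X
for ae in layers:
    history.append([ae.train_step(inp) for _ in range(300)])
    inp = ae.encode(inp)   # deep features, later fine-tuned with labels
```

After this loop, `inp` holds the top-level features that would be passed to a Softmax classifier for supervised fine-tuning via back propagation.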

Batch normalization
Batch normalization (BN) is a neural network optimization method proposed by Ioffe and Szegedy [33]. For data with different covariances, BN can renormalize the layer outputs to a standard Gaussian distribution [34], as shown in Equations (1)-(3):

$$\mu_B = \mathrm{E}\big[z^{l(i)}\big], \qquad \sigma_B^2 = \mathrm{Var}\big[z^{l(i)}\big] \tag{1}$$

$$\hat{z}^{l(i)} = \frac{z^{l(i)} - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}} \tag{2}$$

$$\tilde{z}^{l(i)} = \gamma\,\hat{z}^{l(i)} + \beta \tag{3}$$

where $z^{l(i)}$ represents the $i$th input of the $l$th BN layer; $\gamma$ and $\beta$ indicate the scale and offset of the layer, which are updated during training; $\mathrm{E}[z^{l(i)}]$ and $\mathrm{Var}[z^{l(i)}]$ represent the mean and variance of the input; $\tilde{z}^{l(i)}$ is the output of the mini-batch in the BN layer; and $\varepsilon$ is a small constant preventing the denominator from being zero.
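Equations (1)-(3) can be sketched in a few lines of NumPy; the function below normalizes a mini-batch to zero mean and unit variance per feature and then rescales it. Here `gamma` and `beta` are shown as fixed arguments for illustration, whereas in training they are learned parameters.

```python
import numpy as np

def batch_norm(z, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization of a mini-batch z of shape (batch, features):
    standardize per feature, then apply learned scale/offset (Eqs. (1)-(3))."""
    mean = z.mean(axis=0)                 # Eq. (1): batch statistics
    var = z.var(axis=0)
    z_hat = (z - mean) / np.sqrt(var + eps)   # Eq. (2): eps avoids division by 0
    return gamma * z_hat + beta               # Eq. (3): rescale and shift

rng = np.random.default_rng(1)
z = rng.normal(5.0, 3.0, (256, 4))   # batch with non-standard mean/variance
out = batch_norm(z)                  # per-feature mean ~0, std ~1
```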

Squashing function
For a vector neuron, the squashing function can be defined as the nonlinear activation function that compresses the length of the vector neuron to within 0-1:

$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2}\,\frac{s_j}{\|s_j\|} \tag{4}$$

where $s_j$ and $v_j$ are the input and output of squashing, respectively. A long vector is compressed to a length slightly below 1, while a short vector is reduced to almost zero length. The squashing function does not change the vector's direction, so the feature can be transferred well to the next-level network.
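The behavior of the squashing function, shrinking length into [0, 1) while preserving direction, can be checked with a small NumPy sketch (the small `eps` term is an implementation detail added here for numerical safety near zero):

```python
import numpy as np

def squash(s, eps=1e-8):
    """Squashing nonlinearity of Eq. (4): length -> [0, 1), direction kept."""
    norm_sq = np.sum(s * s, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

long_vec = np.array([3.0, 4.0])     # length 5 -> squashed to length 25/26
short_vec = np.array([0.01, 0.0])   # nearly zero -> squashed to almost zero
v_long = squash(long_vec)
v_short = squash(short_vec)
```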

Federated learning
As shown in Figure 2, federated learning is essentially a distributed machine learning technology to train a unified network model by using data features provided by different organizers cooperatively. This paper uses the idea of federated learning to optimize federation of modular deep learning networks established under multiple working conditions and realizes federation between features through vertical federation.

MULTI-WORKING CONDITION FAULT DIAGNOSIS BASED ON MFLM
Due to variation of operating load, rolling bearings may operate in multiple working conditions. Figure 3 shows the normal probability plot of inner race fault data collected in a single working condition and multiple working conditions ( Figure 3A and B, respectively). The x-axis of Figure 3 represents the data sorted in ascending order, while the y-axis represents the probability of a normal distribution. In the normal probability graph, if all sample points are near the solid line, the corresponding data obey normal distribution. Comparing Figure 3A and B, it can be seen that the data collected in multi-working condition violates the independent and identical distribution (i.i.d) assumption of machine learning algorithms, which is the basis of DNN for accurate feature extraction.
On the other hand, the existing modular neural network can only synthesize the sub-modules' diagnosis results rather than integrate modules.

Constructed multi-working condition fault diagnosis network
To solve the above-mentioned problems, this section proposes an MFLM for multi-working condition fault diagnosis by designing a federation mechanism using dynamic routing technology. The overall framework of the MFLM-based method proposed in this paper is shown in Figure 4. The detailed steps of the proposed fault diagnosis for multiple working conditions are as follows:

Step 1. Data preprocessing

For the 1D vibration data collected by an acceleration sensor, the sliding time window technique is used to reshape the original data as in Equation (5), where l is the length of the time window and the step length is step = 10:

$$X_{Train} = \begin{bmatrix} x_1 & x_2 & \cdots & x_l \\ x_{1+step} & x_{2+step} & \cdots & x_{l+step} \\ \vdots & \vdots & & \vdots \\ x_{1+(n-1)\cdot step} & x_{2+(n-1)\cdot step} & \cdots & x_{l+(n-1)\cdot step} \end{bmatrix} \tag{5}$$

Here, n indicates that the dataset X_Train contains n samples; each row of Equation (5) is one sample, and the vector length of each sample equals the length of the sliding window.

Step 2. Construct a neural network for feature pre-extraction of each module

The data collected from each working condition are fed into the input of a modular network, and an additional modular network is established for the combined data X_Train' without working condition labels. The BN technique is adopted in DNN training. Taking two hidden layers as an example, the layer-by-layer feature extraction using BN can be formulated as Equation (6):

$$h_{i,1} = f\big(BN(W_{i,1}\,x + b_{i,1})\big), \qquad h_{i,2} = f\big(BN(W_{i,2}\,h_{i,1} + b_{i,2})\big) \tag{6}$$

where x is the input sequence data, W_{i,1} and W_{i,2} represent the weights between the input and output of the ith DNN, b_{i,1} and b_{i,2} are the corresponding biases, and f(·) is the nonlinear activation function ReLU.
The features extracted by multiple DNNs are spliced into Feature mul . Feature global is the global feature extracted by Equations (7) and (8). As shown in Equations (9) and (10), spliced feature Feature mul is back propagated to each DNN for optimization, and the updated Feature local can represent Feature mul as much as possible.
where f_{θ,1}(·) represents the encoding function of the first AE in the SAE, f_{θ,2}(·) represents the encoding function of the second AE, θ represents the encoding parameters, and σ represents the nonlinear activation function.
Step 3. Batch normalize data of different scales

Batch normalization is performed using Equations (11) and (12):

$$\widehat{Feature}_s = \frac{Feature_s - E(Feature_s)}{\sqrt{Var(Feature_s) + \varepsilon}} \tag{11}$$

$$Feature_{BN} = \gamma\,\widehat{Feature}_s + \beta \tag{12}$$

where E(·) and Var(·) are the mean and variance of the input Feature_s, γ and β are trainable parameters, and ε is a minimal number that ensures the denominator is not zero.
Step 4. Federation of features at different scales

According to the local features of the single working conditions extracted in Step 2 and the global feature of the multiple conditions extracted by the global feature extraction network, a federation mechanism for features at different scales is proposed. First, the local features of the labeled working conditions are federated into Feature_mul. Then, after multi-layer feature optimization, federated learning is performed with Feature_fed. The two features of different scales are merged to obtain Feature_s, and the normalized features after federation are divided into modules. Each module is a vector neuron, which carries more information than a traditional scalar neuron. The squashing function is applied after each module to scale the length of the obtained vector neuron to within 0-1. The modules of the fused feature are then adaptively federated through the dynamic routing strategy to realize adaptive allocation of the weights of the top-level modules.
The propagation of capsules between two layers involves two stages: linear transformation and dynamic routing. Unlike in fully connected neural networks, each capsule is multiplied by an independent weight matrix to predict each high-level capsule, as shown in Equation (13):

$$\hat{u}_{j|i} = W_{ij}\,u_i \tag{13}$$

where $u_i$ denotes the $i$th input capsule, $W_{ij}$ is the weight matrix, and $\hat{u}_{j|i}$ denotes the prediction vector.
Next, each prediction $\hat{u}_{j|i}$ propagates its features to all high-level capsules $s_j$ through the dynamic routing technique, while each high-level capsule is the weighted sum over all its predictions, as shown in Equation (14):

$$s_j = \sum_i c_{ij}\,\hat{u}_{j|i} \tag{14}$$

where $c_{ij}$ are coupling coefficients that satisfy the restriction of Equation (15):

$$c_{ij} = \frac{\exp(b_{ij})}{\sum_k \exp(b_{ik})} \tag{15}$$

where $b_{ij}$ are the log prior probabilities that prediction $\hat{u}_{j|i}$ should be coupled to the high-level capsule $s_j$. Then, the squashing function shown in Equation (16) is used:

$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2}\,\frac{s_j}{\|s_j\|} \tag{16}$$

where $s_j$ and $v_j$ are the input and output of squashing, respectively. The parameter $b_{ij}$ is updated as shown in Equation (17):

$$b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_j \tag{17}$$
By iterating Equations (13)-(17) n times, the final feature v j is obtained.
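The routing iteration of Equations (13)-(17) can be sketched in NumPy as follows. The capsule counts and dimensions below are illustrative assumptions, not the network sizes used in the paper; the sketch only shows the mechanics of routing-by-agreement.

```python
import numpy as np

def squash(s, eps=1e-8):
    """Squashing nonlinearity of Eq. (16)."""
    norm_sq = np.sum(s * s, axis=-1, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, n_iter=3):
    """Routing-by-agreement over prediction vectors u_hat of shape
    (n_low, n_high, dim); iterates Equations (14)-(17)."""
    n_low, n_high, _ = u_hat.shape
    b = np.zeros((n_low, n_high))                              # log priors b_ij
    for _ in range(n_iter):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # coupling, Eq. (15)
        s = (c[..., None] * u_hat).sum(axis=0)                 # weighted sum, Eq. (14)
        v = squash(s)                                          # Eq. (16)
        b = b + np.einsum('ijd,jd->ij', u_hat, v)              # agreement, Eq. (17)
    return v

rng = np.random.default_rng(2)
u = rng.normal(size=(8, 16))                    # 8 low-level capsules, dim 16
W = rng.normal(0.0, 0.1, size=(8, 4, 16, 16))   # one weight matrix per pair
u_hat = np.einsum('ijod,id->ijo', W, u)         # predictions u_hat_{j|i}, Eq. (13)
v = dynamic_routing(u_hat)                      # 4 high-level capsules
```

Capsules whose predictions agree (large dot product with the output) receive larger coupling coefficients in the next iteration, which is what realizes the adaptive weight allocation described above.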
Step 5. Multi-working condition fault diagnosis

The new loss function defined in Equation (18) is designed for training the network:

$$L_c = R_c \max\big(0,\, m^+ - \|v_j\|\big)^2 + \mu\,(1 - R_c)\max\big(0,\, \|v_j\| - m^-\big)^2 \tag{18}$$

where $\|v_j\|$ represents the length of the federation module as well as the probability of the fault diagnosis result, $R_c$ equals 1 when the target category of the current fault is $c$ and 0 otherwise, $m^+ = 0.9$, and $m^- = 0.1$. The coefficient $\mu = 0.6$ reduces the weight of the loss for absent fault categories, so that initial learning does not shrink the lengths of all modules' activation vectors.
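The margin loss of Equation (18) can be written directly from the formula; the example values below are made-up module lengths used only to show that a correct prediction yields a much smaller loss than an incorrect one.

```python
import numpy as np

def margin_loss(v_len, target, m_pos=0.9, m_neg=0.1, mu=0.6):
    """Margin loss of Eq. (18): v_len holds the lengths ||v_j|| of the
    top-level modules, target is a one-hot vector encoding R_c."""
    present = target * np.maximum(0.0, m_pos - v_len) ** 2
    absent = mu * (1.0 - target) * np.maximum(0.0, v_len - m_neg) ** 2
    return float(np.sum(present + absent))

v_len = np.array([0.95, 0.05, 0.2, 0.1])         # network predicts class 0
loss_good = margin_loss(v_len, np.array([1.0, 0.0, 0.0, 0.0]))  # true class 0
loss_bad = margin_loss(v_len, np.array([0.0, 0.0, 1.0, 0.0]))   # true class 2
```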

Online fault diagnosis
Once online observation data X(t) are collected, they can be fed into the well-trained MFLM model. The specific steps are as follows: (1) Use the trained DNN_5 to perform global feature extraction on X(t), and fuse it with the trained Feature_local to obtain Feature_s.
(2) Send the fusion data Feature s to the trained MFLM to realize fault diagnosis.

Bearing data description and experimental description
A benchmark dataset downloaded from the Case Western Reserve University bearing data center [35] was used to test the effectiveness of the proposed method. The experimental platform is shown in Figure 6, including a 2 hp motor, a power meter, an electronic controller, a torque sensor, and a load motor. An acceleration sensor was used to collect the vibration signal of the motor drive-end bearing under different load conditions. The bearing's health state was divided into four types: inner race fault (IF), outer race fault (OF), ball fault (BF), and normal. The multiple working conditions of the bearing are shown in Table 1.
Since the vibration data collected by the sensor are 1D sequence data, a sliding window with size 400 was used to preprocess the original sequence data. Each sliding window is a sample, and the sliding step length is 20.
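The sliding-window preprocessing described above can be sketched in NumPy; the window size (400) and step (20) match this section, and the integer "signal" below is a stand-in for a recorded vibration sequence.

```python
import numpy as np

def sliding_window(signal, window=400, step=20):
    """Reshape a 1D vibration sequence into overlapping samples (cf. Eq. (5)).
    Each row is one sample of length `window`; consecutive rows start
    `step` points apart."""
    n = (len(signal) - window) // step + 1
    return np.stack([signal[i * step : i * step + window] for i in range(n)])

signal = np.arange(10000, dtype=float)   # stand-in for a recorded signal
X_train = sliding_window(signal)         # (10000 - 400) // 20 + 1 = 481 samples
```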
To verify the effectiveness of the proposed method, Experiments 1-9 were designed. The proposed method was compared with three existing fault diagnosis methods: the traditional DNN fault diagnosis method, the hierarchical DNN (HDNN) designed in [14], and the fault diagnosis method (TICNN) proposed in [24]. Each network was trained on a single working condition, and the trained network was used to test bearing faults under the other working conditions. The network structure parameters are shown in Table 2.
To verify the influence of multi-condition data on the experimental results, Experiments 1-6 were designed, among which Experiments 1-3 are single-condition fault diagnosis experiments, while Experiments 4-6 are multi-condition fault diagnosis experiments. To verify the influence of the number of training samples on the fault diagnosis results, Experiments 7-9 were designed. The experimental design is shown in Table 3.

Analysis of the experimental results of single working conditions and multiple working conditions
The experimental results of single working conditions and multiple working conditions are shown in Table 4. Experiments 1-3 are single working condition experiments with fault sizes of 0.007, 0.014, and 0.021 inches, respectively. Experiments 4-6 are multiple working condition experiments, also with fault sizes of 0.007, 0.014, and 0.021 inches, respectively.

Table 1. Working conditions of rolling bearings

Working condition     Load (hp)   Rotating speed (rpm)
Working condition 1   0           1797
Working condition 2   1           1772
Working condition 3   2           1750
Working condition 4   3           1730

Figure 6. Bearing data collection equipment [35]: (A) bearing data collection experiment platform diagram; and (B) location diagram of the acceleration sensor.
Comparing Rows 2 and 5 in Table 4, it can be seen that, for the same fault size, the multi-condition fault diagnosis accuracy of each network is lower than its single-condition accuracy. The same pattern is found when comparing Rows 3 and 6, and Rows 4 and 7. This shows that multi-working condition data complicate feature extraction, which affects the diagnosis results; hence, fault diagnosis research under multiple working conditions is of great practical significance. Comparing Columns 2 and 5 of Row 4, the MFLM method proposed in this paper achieves a fault diagnosis accuracy 7.39% higher than that of the traditional DNN method when the fault size is 0.021 inches. In industrial fault diagnosis, it is difficult to raise the accuracy beyond 95%. MFLM not only exceeds 95% but reaches 99.75%, so it is more likely to be valued by engineers.
To improve the readability of the experimental results listed in Table 4, Figure 7 shows the fault diagnosis classification chart for Experiment 6. The classification results are represented by blue stars, and the red circles represent the true fault categories of the samples; the coincidence of a red circle and a blue star indicates a correct classification. Figure 7A-D corresponds to Row 7 in Table 4: Figure 7A shows the result of traditional DNN fault diagnosis, Figure 7B that of HDNN, Figure 7C that of TICNN, and Figure 7D that of MFLM. In Figure 7A, the blue stars are dense, indicating a relatively high misclassification rate. Figure 7B has fewer blue stars than Figure 7A, indicating that HDNN classifies better than the traditional DNN; this is because HDNN adopts a hierarchical fault diagnosis method, first diagnosing the working condition category and then the fault category. Figure 7D shows that MFLM has the fewest blue stars, i.e., the lowest misclassification rate, which demonstrates the superiority of the modular neural network in fault diagnosis under multiple working conditions.

Analysis of fault diagnosis results of multiple working conditions with different sample sizes
The number of training samples is an important factor affecting the effectiveness of fault diagnosis, and high-quality labeled fault samples are difficult to obtain in industrial sites. Therefore, to verify the effect of different sample sizes on multi-condition fault diagnosis, the number of samples for each type of fault in Experiments 7-9 is 1000, 2000, and 3000, respectively, with fault sizes of 0.007, 0.014, and 0.021 inches, respectively. The experimental results are shown in Table 5.
Comparing Columns 2 and 5 of Row 4 of Table 5, it can be seen that, when the fault size is 0.007 inches and the number of samples without working condition labels is 48,000, the multi-working condition diagnosis accuracy of MFLM reaches 95.03%, whereas the diagnostic accuracy of the traditional DNN method is below 80%. Comparing Columns 4 and 5 of Row 4, although the existing TICNN method achieves more than 90% accuracy in multi-condition diagnosis, it is still lower than the MFLM method, which illustrates the effectiveness of modular federation in MFLM. Comparing Rows 4 and 10, when the fault size is 0.021 inches and the number of samples without working condition labels is 16,000 and 48,000, respectively, the greater the number of samples, the higher the fault diagnosis accuracy; in fault diagnosis, the number of samples is thus an important factor affecting the result. Comparing Columns 4 and 5 of Row 8, when the fault size is 0.021 inches and the number of samples without working condition labels is 16,000, the diagnostic accuracy of MFLM is 97.11%, which is 2.79% higher than that of TICNN; raising the accuracy above 95% is a very important improvement in engineering. Row 10 of Table 5 shows that, when the fault size is 0.021 inches and the number of samples without working condition labels is 48,000, the diagnostic accuracy of all fault diagnosis methods improves, and that of MFLM reaches 99.07%, which illustrates the superiority of the method proposed in this paper.
To improve the readability of the experimental results listed in Table 5, Figure 8 shows the fault diagnosis classification chart for Experiment 9. Figure 8A-D corresponds to Row 10 in Table 5: Figure 8A shows the result of traditional DNN fault diagnosis, Figure 8B that of HDNN, Figure 8C that of TICNN, and Figure 8D that of MFLM. In Figure 8A, the blue stars are dense, indicating a relatively high misclassification rate. Figure 8B has fewer blue stars than Figure 8A, indicating that HDNN classifies better than the traditional DNN; this is because HDNN adopts a hierarchical fault diagnosis method, first diagnosing the working condition category and then the fault category. Figure 8D shows that MFLM has the fewest blue stars, i.e., the lowest misclassification rate, which demonstrates the superiority of the modular neural network in fault diagnosis under multiple working conditions. Figure 9 shows a comparative bar graph of all experimental results.

CONCLUSIONS
Aiming at the low generalization ability of traditional neural networks when solving multi-working condition problems, this paper designs a modular federated neural network that uses dynamic routing technology as the federation mechanism for multiple modular networks. Using the proposed method, bearing faults under multiple working conditions can be accurately diagnosed without requiring an additional working condition recognition stage.

Authors' contributions
Made substantial contributions to the conception and design of the study and performed data analysis and interpretation: Zhou F, Li S

Performed data acquisition, as well as providing administrative, technical, and material support: Zhang Z

Availability of data and materials
Experimental data source: Case Western Reserve University rolling bearing data, website: http://csegroups.case.edu/bearingdatacenter/home.

Financial support and sponsorship
This research was supported in part by the National Natural Science Foundation of China (Grant Nos. 62073213, U1604158, U1804163, and 61751304).