Review

Genetic diversity and utilization of ginger (Zingiber officinale) for varietal improvement: A review

  • Received: 17 October 2023 Revised: 09 January 2024 Accepted: 09 January 2024 Published: 29 January 2024
  • Ginger is widely cultivated globally and considered the third most important spice crop due to its medicinal properties. It is cultivated for its therapeutic potential in treating different medical conditions and has been extensively researched for its pharmacological and biochemical properties. Despite its significant value, its potential for genetic improvement and sustainable cultivation has been largely ignored compared with other crop species. Moreover, ginger cultivation is affected by various biotic stresses such as viral, bacterial, and fungal infections, leading to a significant reduction in potential yields. Several techniques, such as micropropagation, germplasm conservation, mutation breeding, and transgenic approaches, have been extensively researched for enhancing sustainable ginger production. These techniques have been utilized to enhance the quality of ginger, primarily because of its vegetative mode of propagation. However, ginger breeding programs have encountered challenges due to limited genetic diversity. In the selection process, it is imperative to have a broad range of genetic variation to allow an efficient search for the most effective plant types. Despite a decline in the prominence of traditional mutation breeding, induced mutations remain extremely important, aided by a range of biotechnological tools. The utilization of in vitro culture techniques serves as a viable alternative for the propagation of plants and as a mechanism for enhancing varietal improvement. This review synthesizes knowledge on limitations to ginger cultivation, conservation, utilization of cultivated ginger, and the prospects for varietal improvement.

    Citation: Yusuff Oladosu, Mohd Y Rafii, Fatai Arolu, Suganya Murugesu, Samuel Chibuike Chukwu, Monsuru Adekunle Salisu, Ifeoluwa Kayode Fagbohun, Taoheed Kolawole Muftaudeen, Asma Ilyani Kadar. Genetic diversity and utilization of ginger (Zingiber officinale) for varietal improvement: A review[J]. AIMS Agriculture and Food, 2024, 9(1): 183-208. doi: 10.3934/agrfood.2024011




    Federated learning (FL) is an emerging artificial intelligence technology, first proposed by Google in 2016 to tackle the issue of updating local models for Android phone users [1]. It is an important machine learning paradigm in which a federation of clients collaboratively builds a centralized global model without transferring their local data. More specifically, a central server coordinates the training process, and each client communicates only the model parameters to the central server, keeping local data private and reducing the risk of data leakage.

    Nevertheless, FL still faces a variety of challenges [2]. Among them, the heterogeneity problem is the most prominent [3,4]. Our work focuses on four heterogeneities:

    (1) Data heterogeneity. The data among clients is frequently distributed across the network in a heterogeneous manner, making the independently and identically distributed (IID) assumption invalid. Simply averaging the parameters uploaded by clients may lead to a significant accuracy reduction due to large aggregation errors and severe local forgetting [5].

    (2) Model heterogeneity. In the original federated framework, all participants agree on a particular architecture for a shared global model [6]. This may be acceptable for a large number of low-capacity devices such as mobile phones, but it is not appropriate for business- or industry-oriented settings. For example, participants may come from different organizations with different hardware and storage capabilities [7,8], each desiring to design its own unique model. More importantly, the model architecture itself involves significant privacy and intellectual property issues, so participants are reluctant to share their personalized models.

    (3) Objective heterogeneity. The objectives of FL are distinct: the global model and the local models serve different purposes. On the one hand, the server intends to train a single generalized global model that fits the joint global data distribution for all clients and new participants. On the other hand, each client intends to train a personalized model that fits its own local data distribution. Furthermore, although clients may share similar features (e.g., visual features), they may have different tasks (e.g., 10/100 classification) [9]. It is evident that FedAvg cannot meet these requirements, as it can only produce a shared global model of a specific architecture.

    (4) Systems heterogeneity. Bandwidth and computational power vary widely among participants, leading to different update rates; this has been improved by asynchronous [10] and semi-synchronous [11] schemes. However, these methods converge slowly, especially when the data samples among clients are Non-IID: though a single round may take less time, more rounds are required to reach satisfactory accuracy.

    Recent works have studied the heterogeneities inherent in real-world FL settings. Zhang et al. [12] introduced a personalized federated contrastive learning model to handle the heterogeneity of features and samples. Concretely, they constructed a double optimization problem comprising maximizing the agreement between the shared parts of local models and the global model and minimizing the agreement between the personalized parts of local models and the global model. Xiao et al. [13] proposed a new federated learning algorithm, Accuracy Based Averaging (ABAVG), which improves the server-side aggregation process in Non-IID situations by setting weights based on local accuracies. To deal with model heterogeneity, Li et al. [3] introduced FedMD, based on knowledge distillation, which uses the probability distribution outputs on public samples as the interaction information. Hu et al. [14] proposed MHAT, which exploits knowledge distillation to extract the update information of heterogeneous local models and trains an auxiliary model on the server to realize information aggregation. Aiming at tackling data and systems heterogeneities, Li et al. [15] introduced FedProx, which adds a proximal term $\frac{\mu}{2}\|\theta-\theta^t\|_2^2$ to the local objective function and allows each participating client to perform a variable amount of work; the term restricts the training parameters to stay close to the initial global parameters, with the scale parameter $\mu$ controlling the degree of restriction. In addition, Li et al. [16] proposed a similar reparameterization idea inspired by contrastive learning. More recently, Mendieta et al. [17] did not focus on reparameterization tricks but utilized regularization techniques instead. Other works have introduced Bayesian inference into federated optimization to improve the aggregation algorithm under heterogeneous data. Maruan et al. [18] analyzed the FL problem by inferring a global posterior distribution from each client's posterior over its local data, noting that the round at which to start local posterior sampling and the shrinkage coefficient $\rho$ should be carefully tuned. Liu et al. [5] proposed a novel FL framework that uses online Laplace approximation to approximate posteriors on both the client and server sides; they utilize the diagonal form of the Fisher information matrix to approximate the local covariance matrix, which effectively reduces the computational complexity. However, the methods above address only some types of heterogeneity.
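    For concreteness, the FedProx proximal term mentioned above can be sketched in a few lines of PyTorch. The function name and the way the round-start parameters are passed in are our own illustration, not FedProx's reference code:

```python
import torch

def fedprox_local_loss(model, global_params, base_loss, mu=0.01):
    """FedProx-style local objective [15]: base loss + (mu/2) * ||theta - theta^t||_2^2.
    `global_params` is a list of tensors holding theta^t received at the start of the round."""
    prox = sum(((p - g.detach()) ** 2).sum()
               for p, g in zip(model.parameters(), global_params))
    return base_loss + 0.5 * mu * prox
```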

    Shen et al. [9] proposed a novel paradigm for FL, named Federated Mutual Learning (FML), in response to DOM (data, objective, and model) heterogeneities. FML trains a personalized local model on the client side and a generalized global model on the server side, which solves the objective heterogeneity well. Hence, each client is entitled to determine the size and type of its model, no longer depending on the server or the consensus of clients. Moreover, the Non-IIDness of the data improves the personalized model. However, the accuracy of the global model decreases to some extent when the clients' parameters are directly averaged under heterogeneously distributed data. Based on this, we design and implement a new FL framework, named Heterogenous Bayesian Federated Learning (HBFL), aiming to cope with all four heterogeneities. By formulating FL as a Bayesian inference problem, we find a better solution for the global model, which in turn improves the learning process of each client's personalized model. Furthermore, knowledge distillation on the client side can find a more stable (robust) extremum and allows each client to design a customized local model. Our work is the first to integrate both knowledge distillation and Bayesian inference into FL to address this multi-dimensional heterogeneity problem. The contributions of this paper are summarized as follows:

    ● We propose a novel FL scheme from a probabilistic perspective in response to these four heterogeneities, which broadens the analysis of FL beyond pure optimization techniques.

    ● We express the objective function of the classification problem in the form of a likelihood function and temper the likelihood part of the posterior, which significantly improves the performance of both the global and local models. Moreover, we analyze why the tempered likelihood function can be approximated by a scaled multi-dimensional Gaussian PDF.

    ● Our proposed method is efficient in computation and communication. We show through theoretical analysis that the communication and computational complexity of our framework match those of FedAvg.

    ● Finally, we conduct various experiments to verify the effectiveness of our scheme, including investigating its performance on different hyperparameters and different datasets. We also compare with FML and other methods and present detailed analysis for our experiments.

    The remainder of this paper is organized as follows. Section 2 describes some necessary concepts for our framework. Section 3 describes how our approach is conceived and defines a new global objective function, while Section 4 introduces the proposed framework in detail. Finally, we report and analyze the experimental results in Section 5 and draw our conclusions in Section 6.

    FL enables participating clients to jointly train a centralized global model by communicating only the model parameters to the server, an ingenious technique for protecting privacy. Eventually, each client obtains a global model that performs better than one trained on its own data alone. The Federated Averaging (FedAvg) [6] algorithm provides a general template for the original FL process.

    In the original FL scenario, there are $N$ clients collaborating on training a model whose architecture is decided by the central server or the consensus of clients. At the beginning of training, the server initializes the model parameters $\theta$ and distributes them to all participating clients. Client $k$ ($k=1,2,\ldots,N$) owns a private dataset $D_k$ containing $n_k$ samples $X_k:=\{x_i\}_{i=1}^{n_k}$ with corresponding label set $Y_k:=\{y_i\}_{i=1}^{n_k}$. Each client then uses its local data to update the parameters by a gradient descent optimization algorithm for several epochs:

    $F_k(\theta_k)=\frac{1}{n_k}\sum_{i=1}^{n_k}f_i(\theta_k),\qquad \theta_k\leftarrow\theta_k-\alpha\nabla F_k(\theta_k)$, (2.1)

    where $f_i(\theta_k)$ is the sample loss function, $F_k(\theta_k)$ is the local objective function on the dataset $D_k$ with respect to $\theta_k$, $\alpha$ is the learning rate, and $\nabla F_k(\theta_k)$ is the gradient of $F_k(\theta_k)$. After a period of local updates, the parameters are communicated to the server for aggregation:

    $\theta_{global}=\sum_{k=1}^{N}\frac{n_k}{n}\theta_k$, (2.2)

    where $\theta_{global}$ is the global model and $n:=\sum_{k=1}^{N}n_k$. The process above repeats until the global model converges or the pre-configured number of communication rounds expires. However, the trained global model is identical across clients and known to all of them. A new mechanism is needed to let each client design a personalized model in practical FL settings, and knowledge distillation fills this role.
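    A minimal sketch of this update-then-average loop may help fix the notation. The client data structure and helper names are our own, and models with integer buffers (e.g., BatchNorm counters) would need extra care:

```python
import copy
import torch
import torch.nn.functional as F

def fedavg_round(global_model, clients, lr=0.01, epochs=5):
    """One FedAvg round: E epochs of local SGD per client (Eq 2.1),
    then a weighted element-wise average of the parameters (Eq 2.2)."""
    n = sum(len(c["dataset"]) for c in clients)              # n := sum_k n_k
    local_states = []
    for c in clients:
        model = copy.deepcopy(global_model)                  # client forks theta
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        loader = torch.utils.data.DataLoader(c["dataset"], batch_size=128, shuffle=True)
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                F.cross_entropy(model(x), y).backward()
                opt.step()
        local_states.append((len(c["dataset"]) / n, model.state_dict()))
    # theta_global = sum_k (n_k / n) * theta_k
    avg = {key: sum(w * s[key] for w, s in local_states) for key in local_states[0][1]}
    global_model.load_state_dict(avg)
    return global_model
```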

    Knowledge distillation was first proposed in [19], aiming to transfer the knowledge from a complex model or multiple models (the teacher model) to another lightweight model (the student model) without significant loss of performance. For a data sample, the hard label is the ground-truth label, and the soft label is denoted as $q$, whose $i$-th component $q_i$ is

    $q_i=\frac{\exp(z_i/T)}{\sum_j \exp(z_j/T)}$, (2.3)

    where $z_i$ is the value of the $i$-th category output of the last layer of the teacher or student model, and $T>0$ is the distillation temperature hyperparameter. The soft label is clearly a smoother predictive probability distribution than the hard label.

    Unlike ordinary training, the student model not only fits the hard label but also learns the knowledge conveyed by the soft label from the teacher model. In this way, it learns which category a sample belongs to and which categories are most similar to it. In addition to fitting the hard label, the other goal is to make the student soft label $q_{student}$ close to the teacher soft label $q_{teacher}$, that is, to minimize the Kullback-Leibler (KL) divergence between $q_{teacher}$ and $q_{student}$ (the KL divergence measures the distance between probability distributions). Hence, the objective function only needs one extra term beyond the original one:

    $L=\alpha L_{CE}+(1-\alpha)D_{KL}(q_{teacher}\,\|\,q_{student})$, (2.4)

    where $\alpha$ controls the proportion of knowledge derived from the data and from the teacher model. $L_{CE}$ denotes the cross entropy on the client's dataset, $L_{CE}(p,q)=-\sum_{i=1}^{m}p_i\log(q_i)$, where $m$ is the dimension of the vectors $p$ and $q$; here $p$ is the hard label (one-hot encoding) of the sample and $q$ is the soft label produced by the local model. $D_{KL}$ denotes the KL divergence, $D_{KL}(p\|q)=\sum_{i=1}^{m}p_i\log\frac{p_i}{q_i}$, where $p$ is the distribution approximated by $q$; here $p$ is the soft label produced by the teacher model and $q$ is the soft label produced by the student model. In fact, the cross entropy can be viewed as a special case of the KL divergence, since $\sum_{i=1}^{m}p_i\log p_i$ is a constant. Adding a KL divergence term to the original cross entropy term acts like a regularization term that mitigates overfitting. The goal of knowledge distillation is to make the student model learn the generalization ability of the teacher model; in theory, the result is better than that of simply fitting the training data.
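    A hedged PyTorch sketch of Eqs (2.3)-(2.4) follows. We implement the paper's formula directly and omit the $T^2$ gradient rescaling used in some distillation implementations:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    """L = alpha * L_CE + (1 - alpha) * KL(q_teacher || q_student), Eq (2.4),
    with soft labels given by temperature-scaled softmaxes, Eq (2.3)."""
    ce = F.cross_entropy(student_logits, labels)                    # hard-label term
    log_q_student = F.log_softmax(student_logits / T, dim=1)
    q_teacher = F.softmax(teacher_logits.detach() / T, dim=1)       # teacher is not updated here
    kl = F.kl_div(log_q_student, q_teacher, reduction="batchmean")  # KL(q_teacher || q_student)
    return alpha * ce + (1 - alpha) * kl
```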

    As a one-way transfer, knowledge distillation works only if there is a well-trained teacher model. Related papers have adapted knowledge distillation to the FL scenario. A strategy named deep mutual learning is presented in [20]: an ensemble of students learns collaboratively, teaching each other throughout the training process. If deep mutual learning is introduced into FL during each client's local update, the global model and the personalized model can learn from each other. A novel federated learning paradigm named Federated Mutual Learning (FML) is presented in [9]. Two models are trained on the client side: the meme model, which inherits from the previous global model, and the local model, which remains on the client continuously. The objective functions of the two models at the local update stage are given by

    $L_{local}=\alpha L_{CE}^{local}+(1-\alpha)D_{KL}(q_{meme}\,\|\,q_{local}),\qquad L_{meme}=\beta L_{CE}^{meme}+(1-\beta)D_{KL}(q_{local}\,\|\,q_{meme})$. (2.5)

    As we can see, the knowledge transfer between the meme model and the local model is two-way. Most importantly, the two models can have different architectures, so each client is allowed to design a customized model. However, similarly to FedAvg, directly averaging the local parameters can create problems such as aggregation error and local forgetting [5] when the data among clients is heterogeneous. A new aggregation method is therefore needed to address these issues.
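    One mini-batch of the two-way transfer in Eq (2.5) could look like the following sketch, reusing the distillation_loss sketch above (our own arrangement, not FML's reference code):

```python
def mutual_learning_step(meme_model, local_model, x, y,
                         opt_meme, opt_local, alpha=0.5, beta=0.5, T=2.0):
    """Two-way distillation between the meme and local models, Eq (2.5)."""
    z_meme, z_local = meme_model(x), local_model(x)
    loss_local = distillation_loss(z_local, z_meme, y, alpha, T)  # meme teaches local
    loss_meme = distillation_loss(z_meme, z_local, y, beta, T)    # local teaches meme
    opt_local.zero_grad()
    loss_local.backward()
    opt_local.step()
    opt_meme.zero_grad()
    loss_meme.backward()
    opt_meme.step()
```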

    A natural question arises: how should parameters be aggregated on the server side to ensure the accuracy of the global model in the case of data heterogeneity? To answer this question, this section first introduces a new idea, analyzing FL from the perspective of Bayesian inference, and derives a new objective function applicable to our framework. Finally, we propose a novel aggregation method.

    From the perspective of optimization, the goal of FL is typically formulated as follows:

    $\min_\theta F(\theta):=\sum_{k=1}^{N}\frac{n_k}{n}F_k(\theta),\qquad F_k(\theta):=\frac{1}{n_k}\sum_{i=1}^{n_k}f_i(\theta)$, (3.1)

    where $\theta\in\mathbb{R}^d$. First, consider the linear regression problem, where the sample objective function is $f_i(\theta)=\frac{1}{2}\|\theta^Tx_i-y_i\|_2^2$. Hence, the global objective function is $F(\theta)=\frac{1}{2n}\sum_{i=1}^{n}\|\theta^Tx_i-y_i\|_2^2$. In this way, FL is formulated as an optimization problem. Now we treat FL from the viewpoint of statistics. Supposing $y_i=\theta^Tx_i+\epsilon_i$, where $\epsilon_i$ follows the Gaussian distribution $N(0,\sigma)$, we can easily show that the global objective is to minimize $-\frac{1}{n}\ln p(Y|X,\theta)$, where $Y:=\{y_i\}_{i=1}^{n}$ and $X:=\{x_i\}_{i=1}^{n}$. Since

    $p(D|\theta)=p(X|\theta)\,p(Y|X,\theta)=p(X)\,p(Y|X,\theta)$, (3.2)

    where $D:=\{(x_i,y_i)\}_{i=1}^{n}=(X,Y)$, the factor $p(X)$ does not depend on $\theta$, and $p(D|\theta)$ is the likelihood function of the local data $D$ parameterized by $\theta$. So the global objective can be rewritten as $\min_\theta -\frac{1}{n}\ln p(D|\theta)$. We can further express it as $\max_\theta p(D|\theta)^{\frac{1}{n}}$, where the likelihood can be denoted as a scaled multi-dimensional Gaussian PDF with respect to $\theta$. We prove this conclusion in Supplementary A.

    Similarly to the linear regression problem, the global objective of the classification problem can also be expressed as $\min_\theta -\ln p(D|\theta)$. Hopefully, the likelihood function $p(D|\theta)$ can also be approximated by a scaled multi-dimensional Gaussian PDF; in fact, this is to be expected. According to Bayes' rule,

    $p(D|\theta)=\frac{p(\theta|D)\int p(D|\theta)p(\theta)\,d\theta}{p(\theta)}$, (3.3)

    and the posterior probability is commonly assumed to follow a multi-dimensional Gaussian distribution in variational Bayes for Bayesian neural networks [21,22]. Additionally, the authors of [23] put forward a systematic way to improve the Bayes posterior in many models by tempering its likelihood part. That is,

    $p(\theta|D)=\frac{p(D|\theta)^{\frac{1}{T}}p(\theta)}{\int p(D|\theta)^{\frac{1}{T}}p(\theta)\,d\theta}$, (3.4)

    where T is the temperature which is much less than 1. Hence,

    $p(D|\theta)^{\frac{1}{T}}=\frac{p(\theta|D)\int p(D|\theta)^{\frac{1}{T}}p(\theta)\,d\theta}{p(\theta)}$. (3.5)

    Then, we rewrite the global objective as

    $\max_\theta p(D|\theta)^{\frac{1}{T}}=\max_\theta \frac{p(\theta|D)}{p(\theta)}\int p(D|\theta)^{\frac{1}{T}}p(\theta)\,d\theta$. (3.6)

    Both the posterior and the prior probability can be assumed to follow multi-dimensional Gaussian distributions whose covariance matrices are diagonal:

    $p(\theta|D)=p_1(\theta)\sim N(\theta|\mu_1,\Sigma_1)=\exp[\zeta_1+\eta_1^T\theta-\frac{1}{2}\theta^T\Lambda_1\theta],\qquad p(\theta)=p_2(\theta)\sim N(\theta|\mu_2,\Sigma_2)=\exp[\zeta_2+\eta_2^T\theta-\frac{1}{2}\theta^T\Lambda_2\theta]$, (3.7)

    where $\Lambda_i=\Sigma_i^{-1}$, $\eta_i=\Lambda_i\mu_i$ and $\zeta_i=-\frac{1}{2}\big(d\log 2\pi-\log|\Lambda_i|+\eta_i^T\Lambda_i^{-1}\eta_i\big)$, $i=1,2$. So,

    $\frac{p_1(\theta)}{p_2(\theta)}=\rho\exp\big[\zeta+\eta^T\theta-\frac{1}{2}\theta^T\Lambda\theta\big]$, (3.8)

    where $\Lambda=\Lambda_1-\Lambda_2$, $\eta=\eta_1-\eta_2$, $\zeta=-\frac{1}{2}\big(d\log 2\pi-\log|\Lambda|+\eta^T\Lambda^{-1}\eta\big)$ and $\rho=\exp[\zeta_1-\zeta_2-\zeta]$; note that $\rho$ does not depend on $\theta$. Because the variance of each parameter under the posterior is generally smaller than under the prior, the diagonal elements of $\Lambda_1$ are all greater than those of $\Lambda_2$. Hence, $\Lambda$ is a diagonal matrix whose diagonal elements are all positive, and the same holds for $\Sigma$. In conclusion, the posterior divided by the prior yields a scaled Gaussian PDF. Moreover, $\int p(D|\theta)^{\frac{1}{T}}p(\theta)\,d\theta$ does not depend on $\theta$. Therefore, it is reasonable to approximate the global objective $p(D|\theta)^{\frac{1}{T}}$ by a scaled multi-dimensional Gaussian PDF:

    $p(D|\theta)^{\frac{1}{T}}\approx\gamma\, N(\theta|\mu,\Sigma)$, (3.9)

    where $\gamma$ is a constant. Finally, after the above theoretical analysis, we can approximate the global objective as $\max_\theta N(\theta|\mu,\Sigma)$. Hence, the extremum of the global objective can be approached by the mode of the Gaussian distribution (i.e., $\mu$).
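    As a toy 1-D illustration of (3.9) under our own assumptions (this is not the paper's training procedure), a tempered likelihood can be fitted by a scaled Gaussian via the Laplace approximation: take the mode as $\mu$ and the inverse curvature of the tempered negative log-likelihood as the variance:

```python
import numpy as np

def laplace_fit(nll, theta0, T=1e-4, lr=0.005, steps=500, eps=1e-4):
    """Fit gamma * N(theta | mu, sigma^2) to p(D|theta)^(1/T) in 1-D.
    `nll` is the negative log-likelihood -ln p(D|theta)."""
    theta = theta0
    for _ in range(steps):                        # gradient descent to the mode
        grad = (nll(theta + eps) - nll(theta - eps)) / (2 * eps)
        theta -= lr * grad
    # curvature of (1/T) * nll at the mode gives the precision 1 / sigma^2
    curv = (nll(theta + eps) - 2 * nll(theta) + nll(theta - eps)) / eps**2
    return theta, T / curv                        # mu, sigma^2

data = np.random.default_rng(0).normal(1.7, 1.0, size=100)
nll = lambda t: 0.5 * np.sum((data - t) ** 2)     # Gaussian likelihood with sigma = 1
mu, var = laplace_fit(nll, theta0=0.0)            # mu ~ data.mean(); var = T / n
```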

    The global objective function $p(D|\theta)^{\frac{1}{T}}$ can obviously be decomposed into a product of the $N$ clients' objective functions:

    $p\big(D=\bigcup_{k=1}^{N}D_k\,\big|\,\theta\big)^{\frac{1}{T}}=\prod_{k=1}^{N}p(D_k|\theta)^{\frac{1}{T}}$. (3.10)

    In the same way, each $p(D_k|\theta)^{\frac{1}{T}}$ can also be approximated by a scaled multi-dimensional Gaussian PDF. So,

    $p(D|\theta)^{\frac{1}{T}}\approx\prod_{k=1}^{N}\gamma_k\, N(\theta|\mu_k,\Sigma_k)$, (3.11)

    where $\gamma_k$ is a constant.

    It is easy to prove that a product of $N$ Gaussian PDFs is again a scaled Gaussian PDF, which is consistent with (3.9); its covariance matrix is $\big(\sum_{k=1}^{N}\Sigma_k^{-1}\big)^{-1}$. It is worth noting that the variance of each parameter under the global Gaussian distribution decreases as the number of clients increases, whereas we expect it to be of the same order of magnitude as the local ones. We therefore rewrite the approximation of the global objective as

    $\max_\theta \prod_{k=1}^{N}p(D_k|\theta)^{\frac{q_k}{T}}$, (3.12)

    where $q_k$ is set to the proportion of the data owned by client $k$. This is identical to the view proposed in [5] that a product of local posteriors approximates the global posterior. Consequently, the global covariance is $\Sigma=\big(\sum_{k=1}^{N}q_k\Sigma_k^{-1}\big)^{-1}$ and the global mean is $\mu=\big(\sum_{k=1}^{N}q_k\Sigma_k^{-1}\big)^{-1}\big(\sum_{k=1}^{N}q_k\Sigma_k^{-1}\mu_k\big)$.

    So the extremum of the global objective function can easily be obtained from the local means and covariances uploaded by the clients. Importantly, this also means that FedAvg can be viewed as the special case of our approach in which every local covariance matrix is identical, yielding the global extremum $\sum_{k=1}^{N}q_k\mu_k$, where $\mu_k$ is both the mean vector of the approximating multi-dimensional Gaussian distribution and the extremum of the local objective function.
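    A hedged sketch of this server-side rule, with the diagonal covariances stored as variance vectors (the function and argument names are ours):

```python
import torch

def bayesian_aggregate(mus, variances, qs):
    """Mode of the weighted product of diagonal Gaussians:
    Sigma = (sum_k q_k Sigma_k^-1)^-1,
    mu    = Sigma * sum_k q_k Sigma_k^-1 mu_k.
    mus[k], variances[k]: 1-D tensors; qs[k]: data proportion of client k."""
    precision = sum(q / v for q, v in zip(qs, variances))
    mu = sum(q * m / v for q, m, v in zip(qs, mus, variances)) / precision
    return mu, 1.0 / precision

# If every client uploads the same covariance, mu reduces to sum_k q_k * mu_k,
# i.e., plain FedAvg, matching the remark above.
```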

    We start from the global objective function and analyze it through Bayes' rule. Fitting the tempered likelihood function with a scaled multi-dimensional Gaussian distribution is a novel approach. We prove the feasibility of our aggregation method in theory below and verify it experimentally in Section 5.

    We take the one-dimensional linear regression problem in a federation of two clients as an example. It has been shown that in this case the local objective function of each client can be expressed as a scaled one-dimensional Gaussian PDF. Hence, the global objective function can be rewritten as a product of two Gaussian PDFs, which is still a scaled Gaussian PDF. For simplicity, we use normalized Gaussian distributions to represent the local and global objective functions in Figure 1. From the perspective of aggregation on the server, it is reasonable to take a weighted element-wise average of the parameters of all local models if the data among clients follows the same distribution. In the classical FL framework, the local objective function

    $E_{(x,y)\sim P_k(x,y)}[f((x,y);\theta)]=\int f((x,y);\theta)\,P_k(x,y)\,d(x,y)$
    Figure 1.  The objective function curve.

    is approximated by $\frac{1}{n_k}\sum_{i=1}^{n_k}f_i((x_i,y_i);\theta)$. With the same distribution, each client has the same objective function, identical to the global objective; hence they share the same extremum. However, the weighted average of the solutions of the local objectives obviously deviates from the solution of the global objective when the data among clients is Non-IID, because the local objective functions and their extremums differ. As shown in Figure 1, the modes of the two Gaussian distributions differ in the case of heterogeneous data. In FedAvg, each client aims to find the mode of its own Gaussian distribution, and Figure 1 shows that the weighted average of the two local modes is very likely to deviate from the global mode. Unlike the previous method, our Bayesian aggregation treats a product of Gaussian distributions as the global objective function. The extremums of the local objective functions need not coincide; instead, each client computes the mean vector and covariance matrix approximated from its own likelihood function, from which the global mode, i.e., the extremum of the global objective function, is obtained. Therefore, we no longer have to worry about heterogeneous data. Additionally, from the perspective of training on the client, our approach has a significant advantage over FedAvg. In detail, after receiving the aggregated global model, each client takes it as the initialization for further local training. In FedAvg, a large number of local epochs may drive each device toward the extremum of its local objective rather than the global objective in the case of heterogeneous data, which potentially hurts convergence or even causes the method to diverge [5]. In our approach, deviation from the global extremum on the client side is no longer a defect but an advantage: the client can estimate its personal mean and covariance more accurately.
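    The deviation is easy to see numerically. In this tiny example (our own numbers, not from the paper), two clients with equal data shares have local objectives peaked at different modes:

```python
# Client 1: N(0.0, 0.5^2) (sharply peaked); client 2: N(2.0, 1.0^2); q1 = q2 = 0.5.
mu1, var1, mu2, var2, q = 0.0, 0.25, 2.0, 1.0, 0.5
fedavg_mode = q * mu1 + q * mu2                               # = 1.0, ignores curvature
precision = q / var1 + q / var2                               # = 2.5
product_mode = (q * mu1 / var1 + q * mu2 / var2) / precision  # = 0.4
# The product's mode leans toward the more confident (lower-variance) client,
# while the plain parameter average lands between the modes regardless of curvature.
```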

    Inspired by Bayesian neural networks, we adopt the KL divergence to measure the difference between two probability distributions on the client side. Since each $p(D_k|\theta)^{\frac{1}{T}}$ can be approximated by a scaled multi-dimensional Gaussian PDF, our goal is to minimize the KL divergence between the two normalized distributions, defined as

    $\min_{\mu_k,\Sigma_k} D_{KL}^{meme}:=D_{KL}\Big(N(\theta|\mu_k,\Sigma_k)\,\Big\|\,\frac{1}{\gamma_k}p(D_k|\theta)^{\frac{1}{T}}\Big)$, (4.1)

    where $p(D_k|\theta)=p(X_k)\,p(Y_k|X_k,\theta)$ as in (3.2), $\gamma_k$ is the normalization factor, and $p(X_k)$ and $\gamma_k$ are independent of $\theta$.

    We adopt a gradient descent optimization algorithm to minimize this objective on the client side. In each round, after several epochs of training, the client obtains the final iterates $\mu_k^t$ and $\Sigma_k^t$, which are the training parameters of the meme model. Additionally, the personalized local model $\theta_{local}^t$ is updated via knowledge distillation, which we introduce into our framework so that each client can design its own customized model. Specifically, each local model is trained with two losses: a conventional cross entropy loss and a KL divergence loss that aligns the student's class posterior with the class probabilities of the teacher. Since the local $\mu_k^t$ can be regarded as the extremum of the local objective, the meme model, acting as the teacher, provides $q_{meme}$ computed from $\mu_k^t$ to the client's local model, which plays the role of the student. Through the meme model, the local model obtains knowledge from the aggregated global model. The loss function of the local model is

    $L_{local}=\alpha L_{CE}^{local}+(1-\alpha)D_{KL}(q_{meme}\,\|\,q_{local})$. (4.2)
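    Expanding (4.1) gives, up to additive constants, $\frac{1}{T}$ times the expected negative log-likelihood over $D_k$ minus the entropy of $N(\mu_k,\Sigma_k)$. The following is a hedged single-sample, reparameterized sketch of one such step; the flat-parameter layout and the model_fn helper are our own simplifications, and in practice the $1/T$ scale would be absorbed into the learning rate:

```python
import torch
import torch.nn.functional as F

def meme_step(mu, log_sigma, model_fn, x, y, n_k, T=1e-4):
    """One stochastic step on Eq (4.1), dropping theta-independent constants:
    (1/T) * NLL over D_k (estimated from a mini-batch)  -  entropy of N(mu, diag(sigma^2)).
    mu, log_sigma: flat tensors with requires_grad=True;
    model_fn(theta, x): forward pass using sampled weights theta (hypothetical helper)."""
    eps = torch.randn_like(mu)
    theta = mu + torch.exp(log_sigma) * eps              # theta ~ N(mu, sigma^2)
    nll_batch = F.cross_entropy(model_fn(theta, x), y, reduction="sum")
    nll_full = nll_batch * (n_k / x.shape[0])            # rescale the batch to the whole D_k
    entropy = log_sigma.sum()                            # Gaussian entropy up to a constant
    loss = nll_full / T - entropy
    loss.backward()                                      # then step an optimizer on mu, log_sigma
    return loss
```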

    Previous aggregation algorithms mainly rely on a weighted average of the information uploaded by the clients to obtain the global information. We consider this choice suboptimal, especially when the data among clients is generated in a non-identically distributed fashion. Instead, we propose to aggregate information by computing the mean of a product of multi-dimensional Gaussian distributions. Our scheme aims to make the aggregation result more accurate and the model convergence faster while allowing clients to design their model architectures independently. Consequently, we place two models on the client side. The meme model comprises $\mu_k^t$ and $\Sigma_k^t$, where $\mu_k^t$ inherits from the previous global model, whose architecture is determined by the server, while the local model $\theta_{local}^t$ can be designed flexibly by its client. The process of the Bayesian federated learning framework is shown in Figure 2, where each client uses its local dataset to train the meme model and the local model respectively. Concretely, the meme model is trained by minimizing $D_{KL}^{meme}$, while the local model exploits the soft labels provided by the meme model for knowledge distillation and is therefore trained by minimizing a weighted combination of $D_{KL}(q_{meme}\|q_{local})$ and $L_{CE}^{local}$. It is worth noting that each participant is able to design a customized local model, so the models may differ across participants. At the same time, a client never needs to share its local model, which helps protect its privacy.

    Figure 2.  The whole process of our framework for N clients.

    An overview of our proposed algorithm is shown in Algorithm 1. The entire interaction process consists of four steps. First, the server initializes the global $\mu^t$ and distributes it to all participating clients. Second, each client forks the global $\mu^t$ as the $\mu_k^t$ of its meme model, which is a Bayesian neural network, treats the global $\mu^t$ as the initial value of the local $\mu_k^t$, and initializes the local $\Sigma_k^t$. The client then updates the parameters of the meme model by gradient descent for several epochs to minimize the KL divergence, while also training the parameters $\theta_{local}^t$ of the personalized local model through knowledge distillation. Third, each participant uploads its interactive information to the server. Finally, the global $\mu^{t+1}$ is calculated on the server side by aggregating each client's information. The process repeats from the first step until the global model converges or the pre-configured number of communication rounds expires.

    Algorithm 1 Heterogenous Bayesian Federated Learning (HBFL)
    $\theta$: the model parameters
    $\mu$: the mean vector of the multi-dimensional Gaussian distribution
    $\Sigma$: the covariance matrix of the multi-dimensional Gaussian distribution
    $q_k$: the proportion of the data owned by client $k$
    Server executes:
    1:  Initialize the global model $\theta_{global}^t$ ($\mu_{global}^t$)
    2:  for each round $t=1,2,\ldots$ do
    3:     for each client $k$ from 1 to $N$ in parallel do
    4:       $\mu_k^{t+1},\Sigma_k^{t+1}=\text{ClientUpdate}(\mu_k^t,\Sigma_k^t)$
    5:     $\theta_{global}^{t+1}=\mu_{global}^{t+1}=\big(\sum_{k=1}^{N}q_k(\Sigma_k^{t+1})^{-1}\big)^{-1}\big(\sum_{k=1}^{N}q_k(\Sigma_k^{t+1})^{-1}\mu_k^{t+1}\big)$
    ClientUpdate:
    1:  Initialize the local model $\theta_{local}^t$ and $\Sigma_k^t$
    2:  Fork the $\theta_{global}^t$ ($\mu_{global}^t$) as its $\mu_k^t$
    3:  for each local epoch $e$ from 1 to $E$ do
    4:     $B$ = split $D_k$ into minibatches
    5:     for $b$ in $B$ do
    6:       Update $\mu_k^t$, $\Sigma_k^t$ by minimizing $D_{KL}^{meme}$ (4.1)
    7:       Update the predictions $q_{meme}$ by (2.3) for the current mini-batch
    8:       Update $\theta_{local}^t$ by minimizing $L_{local}$ (4.2)
    9:  Return $\mu_k^{t+1}$, $\Sigma_k^{t+1}$ to the server


    From the perspective of the server, the mean of a product of multi-dimensional Gaussian distributions can be calculated by aggregating the information uploaded by the participants, and the global $\mu$ can also be regarded as the global objective extremum. Hence, the global $\mu$ is a generalized model that fits the joint distribution $P_{joint}(x,y)$ over all participants' data. From the perspective of the client, in each communication round the personalized local model keeps training on the client's private data while distilling knowledge from the meme model. On the one hand, the local model fits the personalized distribution $P_k(x,y)$ over the private data; on the other hand, it also learns how to generalize to new samples. Additionally, we allow partial work to be performed across clients to deal with systems heterogeneity in Section 5.3.3. In conclusion, our framework can handle all four heterogeneities discussed in Section 1.

    In our algorithm, each client uploads $\mu_k^t$ and $\Sigma_k^t$ in each communication round. Since $\Sigma_k^t$ is a diagonal matrix, the communication complexity is $O(d)$, the same as that of FedAvg.

    Although our method requires more computation steps, it has the same computational complexity as FedAvg. On the client side, the back-propagation algorithm is used to train the neural network in standard FedAvg, whose computational cost is $O(d)$. During local training, our algorithm trains a meme model and a local model, whose complexities are $O(2\times d)$ and $O(d)$ respectively, so the overall computational complexity of our method on the client side is $O(3\times d)$. On the server side, FedAvg takes a weighted element-wise average over $N$ neural networks to aggregate local models, which has a complexity of $O(N\times d)$. In our algorithm, the server first inverts each $\Sigma_k^t$; since $\Sigma_k^t$ is assumed diagonal, this costs $O(N\times d)$. It then computes each product of the inverse of $\Sigma_k^t$ with the corresponding $\mu_k^t$, which again costs $O(N\times d)$. A weighted element-wise average over these products requires $O(N\times d)$, and likewise for the weighted average over the inverses of $\Sigma_k^t$. Finally, the multiplication of the inverse of the weighted matrix with the weighted vector requires $O(2\times d)$, since the inverse of the weighted matrix is also diagonal. These operations cost $O(4N\times d+5\times d)$ in total, which is of the same order as FedAvg's server-side cost of $O(N\times d)$.

    In summary, the communication complexity and the computational complexity of our framework are the same as those of FedAvg.

    Our proposed framework, like many existing FL methods, requires communicating the models between the server and each client, which creates a risk of privacy disclosure. A number of existing protection mechanisms can be added to our framework to protect clients from malicious attacks, including secure multi-party computation [24], homomorphic encryption [25] and differential privacy [26]. Among them, differential privacy is the most widely used in FL; it is a standard approach to protecting clients' private information from inference attacks. We leave further exploration of this aspect to future work.

    In this section, we evaluate the performance of our algorithm on three commonly used image classification datasets in various settings, implemented in PyTorch.

    We test our framework in a simulated cross-silo federated learning environment. The clients in the cross-silo setting are a small number of organizations (e.g., financial and medical) with reliable communications and abundant computing resources in datacenters [27,28]. Compared with the cross-device FL setting, such clients are more likely to design their own customized local models and more capable of training both a Bayesian neural network and a conventional neural network. Therefore, we focus on cross-silo FL in this paper. There are 10 clients ($N=10$) under the orchestration of a central server. Moreover, the total number of communication rounds is $T=100$ and the number of local epochs is $E=5$.

    Three datasets widely employed in FL are used in our experiments. MNIST [29] is a 10-class handwritten digit dataset with digits 0 to 9; it consists of 28×28 gray-scale images split into 60000 training and 10000 test images. CIFAR10/100 [30] are 10- and 100-class datasets respectively of 32×32 3-channel color images. CIFAR10 consists of 50000 training and 10000 test pictures in 10 classes, while CIFAR100 has the same number of pictures but is divided into 100 classes. The models we apply are similar to [9]. We use four types of models in our experiments: a multi-layer perceptron (MLP) [6], LeNet5 [31], a convolutional neural network (CNN1) with two 5×5 convolution layers (the first with 6 channels, the second with 16 channels, each followed by 2×2 max pooling and ReLU activation) and two FC layers, and a convolutional neural network (CNN2) with three 3×3 convolution layers (the first with 32 channels followed by 2×2 max pooling and ReLU activation, the second with 64 channels followed by 2×2 max pooling and ReLU activation, the third with 64 channels followed by ReLU activation) and two FC layers. For the conventional neural networks we use the SGD optimizer with learning rate 0.01, momentum 0.9, weight decay $5\times10^{-4}$ and batch size 128, while for the Bayesian neural network we use the Adam optimizer with learning rate 0.001, weight decay $5\times10^{-4}$ and batch size 128. We do not decay the learning rate across rounds. Additionally, we conduct experiments with an extra hyperparameter, the low temperature $T$; as in [32], we set $T$ to $10^{-4}$. In our proposed framework, the global model and the local model are conventional neural networks, while the meme model, which inherits the parameters of the previous global model as its initial $\mu_k^t$, is a Bayesian neural network. Moreover, the trained $\mu_k^t$ of the meme model and the trained $\theta_{local}^t$ of the local model are used for inference on the test sets, and we record the better accuracy of the two, as in FML, except that the meme model in FML is a conventional neural network.

    Since we assume 10 clients in the network, we divide each dataset into 10 parts in both IID and Non-IID settings. We use the Dirichlet distribution, as in [33,34], to create disjoint Non-IID client training and test data. The value of $\alpha$ controls the degree of Non-IIDness: the smaller $\alpha$ is, the more likely each client is to hold examples from only one class; we use $\alpha$ = 100 to simulate the IID setting. It is worth noting that we use the same Dirichlet sample for each client's training and test datasets. We conduct experiments with $\alpha$ = 1, 0.1, 0.05 to analyze our algorithm's performance on heterogeneous data.
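    One common way to produce such splits (a sketch under our assumptions, mirroring [33,34] rather than reproducing their exact code) is to draw, for each class, a Dirichlet vector over clients:

```python
import numpy as np

def dirichlet_partition(labels, n_clients=10, alpha=0.1, seed=0):
    """Assign sample indices to clients with one Dirichlet draw per class.
    Smaller alpha -> more skewed per-client class proportions (more Non-IID)."""
    rng = np.random.default_rng(seed)
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        props = rng.dirichlet(alpha * np.ones(n_clients))    # class c's share per client
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            client_idx[k].extend(part.tolist())
    return client_idx  # reuse the same draw for train and test, as the paper notes
```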

    We compare our algorithm with others, including ABAVG [13], FedAvg [6], FedProx [15], and FML [9]. Among them, ABAVG, FedAvg and FedProx require each client's local model to be identical, while FML allows each client to design a customized model. Moreover, FedProx adds a proximal term that mitigates the oscillation of the global accuracy curve but fails to reach as high an accuracy as FedAvg. The following experiments assume all clients are sampled in each round.

    We first assume that each client's model has the same architecture, the most general case in which all clients simply intend to train a shared global model. We apply FedAvg, FedProx and FML as baselines and compare their performance with our algorithm in both IID and Non-IID settings. We run four types of models (MLP, LeNet5, CNN1, CNN2) over three datasets (MNIST, CIFAR10, CIFAR100) in four data settings ($\alpha$ = 100, 1, 0.1, 0.05). Since all clients in this case aim to acquire a shared global model, we only measure the accuracy of the global model. Figure 3 shows the global accuracy curves of the different FL frameworks, including our proposed method, ABAVG, FedAvg, FedProx and FML.

    Figure 3.  Global accuracy curves with different levels of data heterogeneity on four datasets.

    We can see from Figure 3 that as $\alpha$ decreases (i.e., as the degree of data heterogeneity grows), the accuracy of the global model drops. However, our algorithm still performs well. Specifically, in most settings the global accuracy of our method is higher in the initial phase and converges faster than the baselines. What's more, the improvement of our algorithm over FML is visually apparent. In the Non-IID setting, our method yields a more stable curve than ABAVG and FedAvg and a more accurate curve than FedProx. Taken together, our algorithm combines the benefits of the other baselines. It is worth noting that our method performs better on CIFAR10 and MNIST than on CIFAR100, because fitting the likelihood function with a Gaussian PDF is more effective on data with fewer classes (small-scale data).

    In the following experiments, we set the Non-IID degree $\alpha$ to 100 and 0.05 to represent the IID and Non-IID settings respectively. Data heterogeneity is present in every experiment, since we set up both IID and Non-IID environments. The model, objective and systems heterogeneities of FL are common in realistic scenarios: clients wish to design customized models according to their requirements, which leads to different model architectures across clients; the server tends to train a generalized model that fits the whole data distribution, while each client tends to train a personalized model that fits only its own data distribution (a client that desires a generalized model can still receive one from the server); and the systems characteristics of devices vary significantly.

    Aiming to handle the heterogeneity problem, knowledge distillation is introduced into our framework. We set CNN2 as the global model and 2×MLP, 2×LeNet5, 3×CNN1 and 3×CNN2 as the local models. Our experiments are conducted on CIFAR10 and CIFAR100 with four neural architectures in the IID and Non-IID settings. We test the performance of the products of FL (the personalized local models and the generalized global model) on the validation set, presented in Figures 4-6, and report the best Top-1 test accuracy in Tables 1-3.

    Figure 4.  Global accuracy curves in IID and Non-IID settings on CIFAR10/100.
    Figure 5.  No.1 to No.5 client accuracy curves in IID and Non-IID settings on CIFAR10/100.
    Figure 6.  No.6 to No.10 client accuracy curves in IID and Non-IID settings on CIFAR10/100.
    Table 1.  Top-1 test accuracy of global model in IID and Non-IID settings on CIFAR10/100.
    Datasets Settings FML Ours
    CIFAR10 IID 80.71 82.10
    Non-IID 51.26 62.69
    CIFAR100 IID 47.96 51.19
    Non-IID 38.95 40.01

    Table 2.  Top-1 test accuracy of No.1 to No.5 client models in IID and Non-IID settings on CIFAR10/100.
    Datasets Settings Client1 Client2 Client3 Client4 Client5
    FML Ours FML Ours FML Ours FML Ours FML Ours
    CIFAR10 IID 80.16 80.88 78.91 79.30 80.40 81.21 78.98 80.30 80.83 83.67
    Non-IID 98.19 99.16 99.72 100 99.81 99.81 100 100 98.64 98.93
    CIFAR100 IID 45.03 47.96 43.33 47.74 45.51 48.74 43.61 47.77 43.34 45.73
    Non-IID 77.47 80.10 60.20 62.97 74.13 74.51 80.41 79.38 72.98 74.93

    Table 3.  Top-1 test accuracy of No.6 to No.10 client models in IID and Non-IID settings on CIFAR10/100.
    Datasets Settings Client6 Client7 Client8 Client9 Client10
    FML Ours FML Ours FML Ours FML Ours FML Ours
    CIFAR10 IID 77.78 80.90 79.39 80.81 79.52 81.82 78.74 81.44 78.55 79.36
    Non-IID 91.19 93.27 94.57 94.28 94.14 95.05 97.25 97.45 91.89 92.75
    CIFAR100 IID 45.62 48.31 43.61 45.39 44.42 46.33 42.46 49.25 44.53 47.24
    Non-IID 70.87 72.45 79.27 78.27 80.13 80.75 71.43 71.88 66.55 66.31


    We can see that, compared with FML, the global and local curves of our method perform better in terms of convergence speed, stability and accuracy when encountering model heterogeneity, and the best Top-1 test accuracy of our method is higher in most cases.

    Then, as in [9], we study multi-task FL, where clients have different tasks to handle. In our experiment, two clients run CIFAR10 and CIFAR100 tasks with LeNet5 and CNN1 respectively, which share the same convolution layers. We use the joint convolution layers as the global model. When clients receive the global model, FC layers are spliced onto it as required to handle the 10/100 classification tasks. The local curves and best results are shown in Figure 7 and Table 4.

    Figure 7.  Client accuracy curves by independent training, FML and our method respectively.
    Table 4.  Top-1 test accuracy of client models by independent training, FML and our method respectively.
    Method Client1(CIFAR10) Client2(CIFAR100)
    Independent 73.99 35.16
    FML 73.94 37.12
    Ours 75.20 40.13


    We can observe that our algorithm benefits all clients with different tasks, as opposed to running each task independently, and clearly outperforms FML.

    Finally, we simulate FL settings with systems heterogeneity. As described in [15], we allow partial solutions to be sent to deal with systems heterogeneity. In particular, we fix the number of local epochs $E$ at 5 and force some clients to perform fewer than 5 epochs given their current systems constraints. Furthermore, we assign $x$ epochs (chosen uniformly at random from [1, 5]) to 50% and 90% of the selected devices respectively, for varying heterogeneous settings in each round. For valid comparisons, all methods share the same experimental setup in each round. Figures 8 and 9 show the results.

    Figure 8.  Global accuracy curves in IID and Non-IID settings on CIFAR10/100, where there are 50% stragglers.
    Figure 9.  Global accuracy curves in IID and Non-IID settings on CIFAR10/100, where there are 90% stragglers.

    We can observe that our method consistently converges faster than the other baselines under various degrees of systems heterogeneity. In particular, our method outperforms FML in all settings. However, a certain gap remains compared with FedProx when the data is IID, while our method becomes superior to FedProx as the degree of Non-IIDness increases.

    Motivated by Bayesian neural networks and probability theory, we propose a novel Bayesian aggregation method. In our work, a new algorithm is designed that includes a new training pattern on the client side and a new aggregation strategy on the server side. Since the objective function of the classification problem can be denoted in the form of a likelihood, we assume that it can be approximated by a scaled multi-dimensional Gaussian PDF. Therefore, the aggregation process is treated as a product of scaled Gaussian PDFs. Because a product of Gaussian distributions is still a Gaussian distribution, it is easy to evaluate the mode of the global distribution from the parameters communicated by the clients. Additionally, to obtain a better fit, we temper the likelihood part of the posterior, which significantly improves the performance of the global and local models. Moreover, knowledge distillation is introduced into our framework to deal with the heterogeneity problem. The experiments show that our method can properly tackle the data, model, objective and systems heterogeneities inherent in FL. In addition, convergence is accelerated compared with commonly used baselines. In particular, our method achieves state-of-the-art results when data is generated in a Non-IID fashion, which is common in practical scenarios. Finally, we note that the local likelihood could be approximated by other forms, considering that the prior or posterior probability may not follow a Gaussian distribution, which we leave as future work.

    This work was supported by the Fundamental Research Funds for the Central Universities (22CX03015A, 20CX05012A), the Major Scientific and Technological Projects of CNPC under Grant (ZD2019-183-008) and the National Natural Science Foundation of China (61902429).

    The authors declare no conflicts of interest.

    In this section, we prove that the likelihood function of the linear regression problem can be denoted as a scaled multi-dimensional Gaussian PDF. Since we have shown $p(D|\theta)=p(X)\,p(Y|X,\theta)$ in (3.2), we only need to express the term $p(Y|X,\theta)$ in the form of a Gaussian PDF with respect to $\theta$.

    For the linear regression problem, according to the assumption that random error follows the Gaussian distribution:

    $p(Y|X,\theta)=\prod_{i=1}^{n}p(y_i|x_i,\theta)=\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\sigma}\exp\Big(-\frac{(y_i-\theta^Tx_i)^2}{2\sigma^2}\Big)=\Big(\frac{1}{\sqrt{2\pi}\sigma}\Big)^{n}\exp\Big(-\frac{\|X\theta-Y\|_2^2}{2\sigma^2}\Big)=\Big(\frac{1}{\sqrt{2\pi}\sigma}\Big)^{n}\exp\Big(-\frac{\theta^TX^TX\theta-2Y^TX\theta+Y^TY}{2\sigma^2}\Big)$, (A.1)

    where $X\in\mathbb{R}^{n\times d}$ is the design matrix and $Y\in\mathbb{R}^{n}$ is the response vector. As in (3.7), the Gaussian PDF can be written in canonical notation as

    $N(\theta|\mu,\Sigma)=\exp\big[\zeta+\eta^T\theta-\frac{1}{2}\theta^T\Lambda\theta\big]$, (A.2)

    where $\Lambda=\Sigma^{-1}$, $\eta=\Lambda\mu$ and $\zeta=-\frac{1}{2}\big(d\log 2\pi-\log|\Lambda|+\eta^T\Lambda^{-1}\eta\big)$. So,

    $\exp(\eta^T\theta)=\exp\Big(\frac{Y^TX\theta}{\sigma^2}\Big),\qquad \exp\Big(-\frac{1}{2}\theta^T\Lambda\theta\Big)=\exp\Big(-\frac{\theta^TX^TX\theta}{2\sigma^2}\Big)$. (A.3)

    We can conclude from the above:

    $\eta=\frac{X^TY}{\sigma^2}$, (A.4)
    $\Lambda=\frac{X^TX}{\sigma^2}$. (A.5)

    As in (A.1), it is easy to see that only a constant term remains in $p(Y|X,\theta)$, independent of $\theta$; that term can obviously be written as a multiple of $\exp\zeta$. In conclusion, $p(Y|X,\theta)$ can be denoted as a scaled multi-dimensional Gaussian PDF, and hence so can $p(D|\theta)$.

    Hence, our claim is true.
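    As a quick numerical sanity check of (A.4) and (A.5) (our own, not part of the proof), the theta-dependent part of $\ln p(Y|X,\theta)$ matches the canonical form exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, sigma = 50, 3, 0.7
X, Y, theta = rng.normal(size=(n, d)), rng.normal(size=n), rng.normal(size=d)

log_lik = -np.sum((Y - X @ theta) ** 2) / (2 * sigma**2)   # log p(Y|X,theta) + n*log(sqrt(2*pi)*sigma)
eta, Lam = X.T @ Y / sigma**2, X.T @ X / sigma**2          # (A.4), (A.5)
canonical = eta @ theta - 0.5 * theta @ Lam @ theta - Y @ Y / (2 * sigma**2)
assert np.isclose(log_lik, canonical)
```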



    [1] Kizhakkayil J, Sasikumar B (2011) Diversity, characterization and utilization of ginger: A review. Plant Genet Resour 9: 464–477. https://doi.org/10.1017/S1479262111000670 doi: 10.1017/S1479262111000670
    [2] Ravindran PN, Nirmal Babu K, Shiva KN (2005) Botany and crop improvement of ginger. In: Ravindran PN, Nirmal Babu K (Eds.), Ginger: The Genus Zingiber, CRC Press, New York, 15–85. https://doi.org/10.1201/9781420023367
    [3] Kiyama R (2020) Nutritional implications of ginger: Chemistry, biological activities and signaling pathways. J Nutr Biochem 86: 108486. https://doi.org/10.1016/j.jnutbio.2020.108486 doi: 10.1016/j.jnutbio.2020.108486
    [4] FAOSTAT Database Collections (2024) Food and Agriculture Organization of the United Nations, Rome, Italy. Available from: http://www.fao.org/faostat/en/#data/QC.
    [5] Nair KP (2019) Production, marketing, and economics of ginger. In: Turmeric (Curcuma longa L.) and Ginger (Rosc.)—World's Invaluable Medicinal Spices: The Agronomy and Economy of Turmeric and Ginger, 493–518. https://doi.org/10.1007/978-3-030-29189-1_24
    [6] Padulosi S, Leaman D, Quek P (2002) Challenges and opportunities in enhancing the conservation and use of medicinal and aromatic plants. J Herbs, Spices Med Plants 9: 243–267. https://doi.org/10.1300/J044v09n04_01 doi: 10.1300/J044v09n04_01
    [7] Shao X, Lishuang L, Tiffany P, et al. (2010) Quantitative analysis of ginger components in commercial products using liquid chromatography with electrochemical array detection. J Agric Food Chem 58: 12608–12614. https://doi.org/10.1021/jf1029256 doi: 10.1021/jf1029256
    [8] Sangwan A, Kawatra A, Sehgal S (2014) Nutritional composition of ginger powder prepared using various drying methods. J Food Sci Technol 51: 2260–2262. https://doi.org/10.1007/s13197-012-0703-2 doi: 10.1007/s13197-012-0703-2
    [9] Bischoff-Kont I, Fürst R (2021) Benefits of ginger and its constituent 6-shogaol in inhibiting inflammatory processes. Pharmaceuticals 14: 571. https://doi.org/10.3390/ph14060571 doi: 10.3390/ph14060571
    [10] Russo R, Costa MA, Lampiasi N, et al. (2023) A new ginger extract characterization: Immunomodulatory, antioxidant effects and differential gene expression. Food Biosci 53: 102746. https://doi.org/10.1016/j.fbio.2023.102746 doi: 10.1016/j.fbio.2023.102746
    [11] Eleazu CO, Amadi CO, Iwo G, et al. (2013) Chemical composition and free radical scavenging activities of 10 elite accessions of ginger (Zingiber officinale Roscoe). J Clinic Toxicol 3: 155. https://doi.org/10.4172/2161-0495.1000155 doi: 10.4172/2161-0495.1000155
    [12] Wang J, Ke W, Bao R, et al. (2017) Beneficial effects of ginger Zingiber officinale Roscoe on obesity and metabolic syndrome: A review. Ann N Y Acad Sci 1398: 83–98. https://doi.org/10.1111/nyas.13375 doi: 10.1111/nyas.13375
    [13] Lakshmi BVS, Sudhakar MA (2010) Protective effect of Z. officinale on gentamicin induced nephrotoxicity in rats. Int J Pharmacol 6: 58–62. https://doi.org/10.3923/ijp.2010.58.62 doi: 10.3923/ijp.2010.58.62
    [14] Nammi S, Satyanarayana S, Roufogalis BD (2009) Protective effects of ethanolic extract of Zingiber officinale rhizome on the development of metabolic syndrome in high-fat diet-fed rats. Basic Clin Pharmacol Toxicol 104: 366–373. https://doi.org/10.1111/j.1742-7843.2008.00362.x doi: 10.1111/j.1742-7843.2008.00362.x
    [15] Grant KL, Lutz RB (2000) Alternative therapies: Ginger. Am J Health Syst Pharm 57: 945–947. https://doi.org/10.4236/ojmm.2012.23013 doi: 10.4236/ojmm.2012.23013
    [16] Iqbal Z, Lateef M, Akhtar MS, et al. (2006) In vivo anthelmintic activity of ginger against gastrointestinal nematodes of sheep. J Ethnopharmacol 106: 285–287. https://doi.org/10.1016/j.jep.2005.12.031 doi: 10.1016/j.jep.2005.12.031
    [17] El-Baroty GS, Abd El-Baky HH, Farag RS, et al. (2010) Characterization of antioxidant and antimicrobial compounds of cinnamon and ginger essential oils. Afr J Biochem Res 4: 167–174.
    [18] Hsu YL, Chen CY, Hou MF, et al. (2010) 6‐Dehydrogingerdione, an active constituent of dietary ginger, induces cell cycle arrest and apoptosis through reactive oxygen species/c‐Jun N‐terminal kinase pathways in human breast cancer cells. Mol Nutr Food Res 54: 1307–1317. https://doi.org/10.1002/mnfr.200900125 doi: 10.1002/mnfr.200900125
    [19] Koh EM, Kim HJ, Kim S, et al. (2008) Modulation of macrophage functions by compounds isolated from Zingiber officinale. Planta Med 75: 148–151. https://doi.org/10.1055/s-0028-1088347 doi: 10.1055/s-0028-1088347
    [20] Imm J, Zhang G, Chan LY, et al. (2010) [6]-Dehydroshogaol, a minor component in ginger rhizome, exhibits quinone reductase inducing and anti-inflammatory activities that rival those of curcumin. Food Res Int 43: 2208–2213. https://doi.org/10.1016/j.foodres.2010.07.028
    [21] Yang G, Zhong L, Jiang L, et al. (2010) Genotoxic effect of 6-gingerol on human hepatoma G2 cells. Chem Biol Interact 185: 12–17. https://doi.org/10.1016/j.cbi.2010.02.017
    [22] Paret ML, Cabos R, Kratky BA, et al. (2010) Effect of plant essential oils on Ralstonia solanacearum race 4 and bacterial wilt of edible ginger. Plant Dis 94: 521–527. https://doi.org/10.1094/PDIS-94-5-0521
    [23] Sharma BR, Dutta S, Roy S, et al. (2010) The effect of soil physicochemical properties on rhizome rot and wilt disease complex incidence of ginger under hill agro-climatic region of West Bengal. Plant Pathol J 26: 198–202. https://doi.org/10.5423/PPJ.2010.26.2.198
    [24] So IY (1980) Studies on ginger mosaic virus. Korean J Appl Entomol 19: 67–72.
    [25] Hull R (1977) The grouping of small spherical plant viruses with single RNA components. J Gen Virol 36: 289–295. https://doi.org/10.1099/0022-1317-36-2-289
    [26] Janse J (1996) Potato brown rot in Western Europe: History, present occurrence and some remarks on possible origin, epidemiology and control strategies. Bull OEPP/EPPO Bull 26: 679–695. https://doi.org/10.1111/j.1365-2338.1996.tb01512.x
    [27] Swanson JK, Yao J, Tans-Kersten J, et al. (2005) Behavior of Ralstonia solanacearum race 3 biovar 2 during latent and active infection of geranium. Phytopathology 95: 136–143. https://doi.org/10.1094/PHYTO-95-0136
    [28] Meenu G, Jebasingh T (2019) Diseases of ginger. In: Wang H (Ed.), Ginger Cultivation and Its Antimicrobial and Pharmacological Potentials, IntechOpen, 1–31. https://doi.org/10.5772/intechopen.88839
    [29] Dohroo NP (2001) Etiology and management of storage rot of ginger in Himachal Pradesh. Indian Phytopathol 54: 49–54.
    [30] Joshi LK, Sharma ND (1980) Diseases of ginger and turmeric. In: Nair MK, Premkumar T, Ravindran PN, et al. (Eds.), Proceedings of National Seminar on Ginger Turmeric, Calicut: CPCRI, 104–119.
    [31] Dohroo NP (2005) Diseases of ginger. In: Ravindran PN, Babu KN (Eds.), Ginger: The Genus Zingiber, Boca Raton: CRC Press, 305–340.
    [32] ISPS (2005) Experiences in collaboration. Ginger pests and diseases. Indo-Swiss Project Sikkim Series 1, 75.
    [33] Moreira SI, Dutra DC, Rodrigues AC, et al. (2013) Fungi and bacteria associated with post-harvest rot of ginger rhizomes in Espírito Santo, Brazil. Trop Plant Pathol 38: 218–226. https://doi.org/10.1590/S1982-56762013000300006
    [34] Dake JN (1995) Diseases of ginger (Zingiber officinale Rosc.) and their management. J Spices Aromat Crops 4: 40–48.
    [35] Le DP, Smith M, Hudler GW, et al. (2014) Pythium soft rot of ginger: Detection and identification of the causal pathogens and their control. Crop Prot 65: 153–167. https://doi.org/10.1016/j.cropro.2014.07.021
    [36] Bhai RS, Sasikumar B, Kumar A (2013) Evaluation of ginger germplasm for resistance to soft rot caused by Pythium myriotylum. Indian Phytopathol 66: 93–95.
    [37] Yang KD, Kim HM, Lee WH, et al. (1988) Studies on rhizome rot of ginger caused by Fusarium oxysporum f.sp. zingiberi and Pythium zingiberum. Plant Pathol J 4: 271–277.
    [38] Ram J, Thakore BBL (2009) Management of storage rot of ginger by using plant extracts and biocontrol agents. J Mycol Plant Pathol 39: 475–479.
    [39] Jadhav SN, Aparadh VT, Bhoite AS (2013) Plant extract using for management of storage rot of ginger in Satara Tehsil (M.S.). Int J Pharm Phytopharm Res 4: 1–2.
    [40] Babu N, Suraby EJ, Cissin J, et al. (2013) Status of transgenics in Indian spices. J Trop Agric 51: 1–14.
    [41] Shivakumar N (2019) Biotechnology and crop improvement of ginger (Zingiber officinale Rosc.). In: Wang H (Ed.), Ginger Cultivation and Its Antimicrobial and Pharmacological Potentials, IntechOpen. https://doi.org/10.5772/intechopen.88574
    [42] Deme K, Konate M, Ouedraogo HM, et al. (2021) Importance, genetic diversity and prospects for varietal improvement of ginger (Zingiber officinale Roscoe) in Burkina Faso. World J Agric Res 9: 92–99. https://doi.org/10.12691/wjar-9-3-3
    [43] Doveri S, Powell W, Maheswaran M, et al. (2007) Molecular markers: History, features and application. In: Kole C, Abbott AG (Eds.), Principles and Practices of Plant Genomics, New York: Science Publishing Group, 23–67. Available from: www.scipub.net/botany/principlespractices-plant-genomics.html.
    [44] Poczai P, Varga I, Bell NE, et al. (2012) Genomics meets biodiversity: Advances in molecular marker development and their applications in plant genetic diversity assessment. In: The Molecular Basis of Plant Genetic Diversity, Rijeka: InTech.
    [45] Nayak S, Naik PK, Acharya L, et al. (2005) Assessment of genetic diversity among 16 promising cultivars of ginger using cytological and molecular markers. Z Naturforsch C 60: 485–492. https://doi.org/10.1515/znc-2005-5-618
    [46] Huang H, Layne DR, Kulisiak TL (2000) RAPD inheritance and diversity in pawpaw (Asimina triloba). J Am Soc Hortic Sci 125: 454–459. https://doi.org/10.21273/JASHS.125.4.454
    [47] Zambrano Blanco E, Baldin Pinheiro J (2017) Agronomic evaluation and clonal selection of ginger genotypes (Zingiber officinale Roscoe) in Brazil. Agron Colomb 35: 275–284. https://doi.org/10.15446/agron.colomb.v35n3.62454
    [48] Das A, Sahoo RK, Barik DP, et al. (2020) Identification of duplicates in ginger germplasm collection from Odisha using morphological and molecular characterization. Proc Natl Acad Sci, India Sect B: Biol Sci 90: 1057–1066. https://doi.org/10.1007/s40011-020-01178-y
    [49] Wang L, Gao FS, Xu K, et al. (2014) Natural occurrence of mixoploid ginger (Zingiber officinale Rosc.) in China and its morphological variations. Sci Hortic 172: 54–60. https://doi.org/10.1016/j.scienta.2014.03.043
    [50] Ismail NA, Rafii MY, Mahmud TMM, et al. (2016) Molecular markers: A potential resource for ginger genetic diversity studies. Mol Biol Rep 43: 1347–1358. https://doi.org/10.1007/s11033-016-4070-3
    [51] Henry RJ (1997) Practical applications of plant molecular biology. Chapman & Hall, London.
    [52] Sarwat M, Nabi G, Das S, et al. (2012) Molecular markers in medicinal plant biotechnology: Past and present. Crit Rev Biotechnol 32: 74–92. https://doi.org/10.3109/07388551.2011.551872
    [53] Powell W, Morgante M, Andre C, et al. (1996) The comparison of RFLP, RAPD, AFLP and SSR (microsatellite) markers for germplasm analysis. Mol Breed 2: 225–238. https://doi.org/10.1007/BF00564200
    [54] Varshney RK, Hoisington DA, Nayak SN, et al. (2009) Molecular plant breeding: Methodology and achievements. Plant Genomics: Methods Protoc 513: 283–304. https://doi.org/10.1007/978-1-59745-427-8_15
    [55] Barcaccia G (2010) Molecular markers for characterizing and conserving crop plant germplasm. In: Jain SM, Brar DS (Eds.), Molecular techniques in crop improvement, Springer, Dordrecht, 231–253. https://doi.org/10.1007/978-90-481-2967-6_10
    [56] Shivakumar N, Agrawal P (2018) The effect of chemical mutagens upon morphological characters of ginger in M0 generation. Asian J Microbiol Biotechnol Environ Sci 20: 126–135.
    [57] Ravindran PN, Nirmal Babu K, Shiva KN (2005) Botany and crop improvement of ginger. In: Ravindran PN, Nirmal Babu K (Eds.), Ginger: The Genus Zingiber, Boca Raton: CRC Press, 15–85.
    [58] Das A, Kesari V, Satyanarayana VM, et al. (2011) Genetic relationship of Curcuma species from Northeast India using PCR-based markers. Mol Biotechnol 49: 65–76. https://doi.org/10.1007/s12033-011-9379-5
    [59] Zou X, Dai Z, Ding C, et al. (2011) Relationships among six medicinal species of Curcuma assessed by RAPD markers. J Med Plant Res 5: 1349–1354.
    [60] Kaewsri W, Paisooksantivatana Y, Veesommai U, et al. (2007) Phylogenetic analysis of Thai Amomum (Alpinioideae: Zingiberaceae) using AFLP markers. Agric Natl Resour 41: 213–226.
    [61] Sigrist MS, Pinheiro JB, Azevedo-Filho JA, et al. (2010) Development and characterization of microsatellite markers for turmeric (Curcuma longa). Plant Breed 129: 570–573. https://doi.org/10.1111/j.1439-0523.2009.01720.x
    [62] Pandotra P, Gupta AP, Husain MK, et al. (2013) Evaluation of genetic diversity and chemical profile of ginger cultivars in north-western Himalayas. Biochem Syst Ecol 48: 281–287. https://doi.org/10.1016/j.bse.2013.01.004
    [63] Jatoi SA, Kikuchi A, San SY, et al. (2006) Use of rice SSR markers as RAPD markers for genetic diversity analysis in Zingiberaceae. Breed Sci 56: 107–111. https://doi.org/10.1270/jsbbs.56.107
    [64] Oladosu Y, Rafii MY, Abdullah N, et al. (2016) Principle and application of plant mutagenesis in crop improvement: A review. Biotechnol Biotechnol Equip 30: 1–16. https://doi.org/10.1080/13102818.2015.1087333
    [65] Aisha AH, Rafii MY, Rahim HA, et al. (2018) Radio-sensitivity test of acute gamma irradiation of two varieties of chili pepper, Chili Bangi 3 and Chili Bangi 5. Int J Sci Technol Res 7: 90–95.
    [66] Prasath D, Bhai RS, Nair RR (2015) Induction of variability in ginger through induced mutation for disease resistance. In: National Symposium on Spices and Aromatic Crops, 16–18.
    [67] Oladosu Y, Rafii MY, Abdullah N, et al. (2014) Genetic variability and selection criteria in rice mutant lines as revealed by quantitative traits. Sci World J 2014: 190531. https://doi.org/10.1155/2014/190531
    [68] Christensen AH, Quail PH (1996) Ubiquitin promoter-based vectors for high-level expression of selectable and/or screenable marker genes in monocotyledonous plants. Transgenic Res 5: 213–218. https://doi.org/10.1007/BF01969712
    [69] Suma B, Keshavachandran R, Nybe EV (2008) Agrobacterium tumefaciens mediated transformation and regeneration of ginger (Zingiber officinale Rosc.). J Trop Agric 46: 38–44.
    [70] Fujisawa M, Harada H, Kenmoku H, et al. (2010) Cloning and characterization of a novel gene that encodes (S)-β-bisabolene synthase from ginger, Zingiber officinale. Planta 232: 121–130.
    [71] Deslandes L, Pileur F, Liaubet L, et al. (1998) Genetic characterization of RRS1, a recessive locus in Arabidopsis thaliana that confers resistance to the bacterial soil borne pathogen Ralstonia solanacearum. Mol Plant-Microbe Interact 11: 659–667. https://doi.org/10.1094/MPMI.1998.11.7.659
    [72] Aswati Nair R, Kiran AG, Sivakumar KC, et al. (2010) Molecular characterization of an oomycete-responsive PR-5 protein gene from Zingiber zerumbet. Plant Mol Biol Rep 28: 128–135. https://doi.org/10.1007/s11105-009-0132-1
    [73] Priya RS, Subramanian RB (2008) Isolation and molecular analysis of R-gene in resistant Zingiber officinale (ginger) varieties against Fusarium oxysporum f.sp. zingiberi. Bioresour Technol 99: 4540–4543. https://doi.org/10.1016/j.biortech.2007.06.053
    [74] Kavitha PG, Thomas G (2006) Zingiber zerumbet, a potential donor for soft rot resistance in ginger: Genetic structure and functional genomics. Extended Abstract, XVIII Kerala Science Congress, 169–171.
    [75] Renner T, Bragg J, Driscoll HE, et al. (2009) Virus-induced gene silencing in the culinary ginger (Zingiber officinale): An effective mechanism for down-regulating gene expression in tropical monocots. Mol Plant 2: 1084–1094. https://doi.org/10.1093/mp/ssp033
    [76] Chen ZH, Kai GY, Liu XJ, et al. (2005) cDNA cloning and characterization of a mannose-binding lectin from Zingiber officinale Roscoe (ginger) rhizomes. J Biosci 30: 213–220. https://doi.org/10.1007/BF02703701
    [77] Yu F, Harada H, Yamasaki K, et al. (2008) Isolation and functional characterization of a β-eudesmol synthase, a new sesquiterpene synthase from Zingiber zerumbet Smith. FEBS Lett 582: 565–572. https://doi.org/10.1016/j.febslet.2008.01.020
    [78] Huang JL, Cheng LL, Zhang ZX (2007) Molecular cloning and characterization of violaxanthin de-epoxidase (VDE) in Zingiber officinale. Plant Sci 172: 228–235.
    [79] Nirmal Babu K, Samsudeen K, Divakaran M, et al. (2016) Protocols for in vitro propagation, conservation, synthetic seed production, embryo rescue, microrhizome production, molecular profiling, and genetic transformation in ginger (Zingiber officinale Roscoe). In: Mohan Jain S (Ed.), Protocols for In Vitro Cultures and Secondary Metabolite Analysis of Aromatic and Medicinal Plants, 2nd Edition, 403–426. https://doi.org/10.1007/978-1-4939-3332-7_28
    [80] Seran TH (2013) In vitro propagation of ginger (Zingiber officinale) through direct organogenesis: A review. Pak J Biol Sci 16: 1826–1835. https://doi.org/10.3923/pjbs.2013.1826.1835
    [81] El-Nabarawy MA, El-Kafafi SH, Hamza MA, et al. (2015) The effect of some factors on stimulating the growth and production of active substances in Zingiber officinale callus cultures. Ann Agric Sci 60: 1–9. https://doi.org/10.1016/j.aoas.2014.11.020
    [82] Shivakumar N, Agrawal P (2014) Callus induction and regeneration from adventitious buds of Zingiber officinale. Asian J Microbiol Biotechnol Environ Sci 16: 881–885.
    [83] Nery FC, Goulart VLA, Paiva PDO, et al. (2015) Micropropagation and chemical composition of Zingiber spectabile callus. Acta Hortic 1083: 197–204. https://doi.org/10.17660/ActaHortic.2015.1083.23
    [84] Rostiana O, Syahid SF (2008) Somatic embryogenesis from meristem explants of ginger. Biotropia 15: 12–16. https://doi.org/10.11598/btb.2008.15.1.2
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)