Stability analysis and optimal control of a rumor propagation model based on two communication modes: friends and marketing account pushing

Ying Yu; Jiaomin Liu; Jiadong Ren; Cuiyi Xiao

doi:10.3934/mbe.2022204

Mathematical Biosciences and Engineering

2022, Volume 19, Issue 5: 4407-4428.


Research article


1. College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei, China
2. Liren College of Yanshan University, Qinhuangdao, Hebei, China
3. Computer Virtual Technology and System Integration Laboratory of Hebei Province, China
4. College of Mathematics and Information Technology, Hebei Normal University of Science and Technology, Qinhuangdao, Hebei, China

Academic Editor: Cristiana J. Silva

Received: 28 November 2021 Revised: 15 February 2022 Accepted: 17 February 2022 Published: 02 March 2022

In addition to spreading among friends, information can also be pushed to non-friends through marketing accounts. Based on these two dissemination channels, this paper establishes a Susceptible-Infection-Marketing-Removed (SIMR) rumor propagation model. First, we obtain the basic reproduction number $R_0$ using the next-generation matrix method. Second, we prove that the solutions of the model are uniformly bounded and analyze the asymptotic stability of the rumor-free equilibrium and the rumor-prevailing equilibrium. Third, we propose an optimal control strategy to curb the spread of rumors in the network. Finally, the theoretical results are verified by numerical simulations, and conclusions are drawn.


Citation: Ying Yu, Jiaomin Liu, Jiadong Ren, Cuiyi Xiao. Stability analysis and optimal control of a rumor propagation model based on two communication modes: friends and marketing account pushing[J]. Mathematical Biosciences and Engineering, 2022, 19(5): 4407-4428. doi: 10.3934/mbe.2022204

Related Papers:

[1]	Lorna Katusiime . Time-Frequency connectedness between developing countries in the COVID-19 pandemic: The case of East Africa. Quantitative Finance and Economics, 2022, 6(4): 722-748. doi: 10.3934/QFE.2022032
[2]	Chikashi Tsuji . The historical transition of return transmission, volatility spillovers, and dynamic conditional correlations: A fresh perspective and new evidence from the US, UK, and Japanese stock markets. Quantitative Finance and Economics, 2024, 8(2): 410-436. doi: 10.3934/QFE.2024016
[3]	Yonghong Zhong, Richard I.D. Harris, Shuhong Deng . The spillover effects among offshore and onshore RMB exchange rate markets, RMB Hibor market. Quantitative Finance and Economics, 2020, 4(2): 294-309. doi: 10.3934/QFE.2020014
[4]	Cunyi Yang, Li Chen, Bin Mo . The spillover effect of international monetary policy on China's financial market. Quantitative Finance and Economics, 2023, 7(4): 508-537. doi: 10.3934/QFE.2023026
[5]	Yanhong Feng, Shuanglian Chen, Wang Xuan, Tan Yong . Time-varying impact of U.S. financial conditions on China's inflation: a perspective of different types of events. Quantitative Finance and Economics, 2021, 5(4): 604-622. doi: 10.3934/QFE.2021027
[6]	Isaac O. Ajao, Hammed A. Olayinka, Moruf A. Olugbode, OlaOluwa S. Yaya, Olanrewaju I. Shittu . Long memory cointegration and dynamic connectedness of volatility in US dollar exchange rates, with FOREX portfolio investment strategy. Quantitative Finance and Economics, 2023, 7(4): 646-664. doi: 10.3934/QFE.2023031
[7]	Samuel Kwaku Agyei, Ahmed Bossman . Investor sentiment and the interdependence structure of GIIPS stock market returns: A multiscale approach. Quantitative Finance and Economics, 2023, 7(1): 87-116. doi: 10.3934/QFE.2023005
[8]	Rubaiyat Ahsan Bhuiyan, Tanusree Chakravarty Mukherjee, Kazi Md Tarique, Changyong Zhang . Hedge asset for stock markets: Cryptocurrency, Cryptocurrency Volatility Index (CVI) or Commodity. Quantitative Finance and Economics, 2025, 9(1): 131-166. doi: 10.3934/QFE.2025005
[9]	OlaOluwa S. Yaya, Miao Zhang, Han Xi, Fumitaka Furuoka . How do leading stock markets in America and Europe connect to Asian stock markets? Quantile dynamic connectedness. Quantitative Finance and Economics, 2024, 8(3): 502-531. doi: 10.3934/QFE.2024019
[10]	Ke Liu, Changqing Luo, Zhao Li . Investigating the risk spillover from crude oil market to BRICS stock markets based on Copula-POT-CoVaR models. Quantitative Finance and Economics, 2019, 3(4): 754-771. doi: 10.3934/QFE.2019.4.754


1. Introduction

Thoracic diseases are diverse and exhibit complex interrelationships. For example, extensive clinical experience ^[1,2] has demonstrated that pulmonary atelectasis and effusion often lead to infiltrate development, and pulmonary edema often leads to cardiac hypertrophy. This strong correlation between pathologies, known as label co-occurrence, is common in clinical diagnosis and is not coincidental ^[3], as shown in Figure 1. At the time of diagnosis, radiologists must examine the lesion area while integrating pathologic relationships to arrive at the most likely diagnosis. Diagnosing a massive number of chest X-ray (CXR) images is therefore a time-consuming and laborious reasoning task, which has inspired researchers to use deep learning techniques to analyze CXR images automatically and reduce radiologists' workloads. Because multiple abnormalities may be present simultaneously in a single CXR image, clinical chest radiograph examination is a classic multi-label classification problem, in which a sample can belong to multiple categories (or labels) and different categories can be related. Relationships between pathology labels are expressed differently in different data modalities. As Figure 1 shows, pathology regions that appear together in an image express label relationships as image features, whereas in the word embeddings of pathology labels, the label relationships are implicit in the semantic information of each label. In recent years, several advanced deep learning methods have been developed for this task ^[4,5,6,7,8,9]. According to our survey, existing methods can be divided into two classes: 1) label-independent learning methods and 2) label-correlation learning methods.

Figure 1. Illustration of pathology relationships and alignment problems in different data modalities. Left: the pathology correlations within each modality. Right: we align the representations of pathology across modalities. An arrow "Pathology A $\rightarrow$ Pathology B" means that when Pathology A appears, Pathology B is likely to have occurred, but the converse does not necessarily hold.


The label-independent learning method transforms the multi-label CXR recognition task into multiple independent, nonintersecting binary recognition tasks: a separate binary classifier is trained for each label. Early on, some researchers ^[2,10,11,12] applied convolutional neural networks and their variants to this task with some success, designing elaborate network structures to improve recognition accuracy. Despite these efforts and breakthroughs, such methods have clear limitations. Because each label is treated as an independent learning object, training is susceptible to missing and incorrect sample labels. In addition, these methods use only the image as the carrier of the learning object, and the image is only one of the modalities in which label relationships are expressed. They neither consider interlabel correlations nor exploit the representation of label relationships in other data modalities.

Clinical experience has further shown that some abnormalities in CXR images may be strongly correlated. The literature ^[3] suggests that this is not a coincidence but a label relationship that can be called co-occurrence. Yao et al. ^[1] found that pulmonary edema tends to trigger cardiomegaly, and Wang et al. ^[2] indicate that lung infiltrates are often associated with atelectasis and effusion. This label relationship motivates label-correlation learning approaches to CXR recognition. Moreover, such interdependence can be used to infer missing or noisy labels from co-occurrence relationships, which improves the robustness and recognition performance of the model.

Existing label-correlation learning methods fall into two types: image-based unimodal learning methods and methods that additionally consider textual data while learning from images. The most common technique among image-based unimodal methods is attention guidance. These attention-guided methods ^[13,14,15] focus on the most discriminative lesion-region features in each CXR image. They capture the interdependence between labels and lesion regions implicitly, i.e., by designing attention models with different mechanisms to correlate lesion regions with the whole image. However, these methods establish label correlations only locally within the imaging modality and ignore global label co-occurrence. Methods that additionally consider textual data are either Recurrent Neural Network (RNN)-based or Graph Convolutional Network (GCN)-based. RNN-based methods ^[1,16,17] rely on state variables to encode label-related information and use the RNN as a decoder to predict the sequence of abnormalities in each image; however, this approach often requires complex computations. In addition, some researchers ^[18,19] extract valuable textual embedding information from radiology reports to assist classification. In contrast, GCN-based methods ^[6,20,21,22] represent label-correlation information, such as label co-occurrence, as undirected graph data. These methods treat each label as a graph node and use the semantic word embeddings of labels as node features. However, although these methods learn label relations in additional modalities, they ignore the alignment between the label-relation representations of different modalities, as shown on the right side of Figure 1. Moreover, because they model pathological relationships with undirected graphs, they discard directional information: an undirected graph cannot accurately represent all pathological relationships, many of which are asymmetric.

In this paper, we propose a multi-label CXR classification model called MRChexNet that jointly learns pathology information in different modalities and models interpathology correlations more comprehensively. It consists of a representation learning module (RLM), a multi-modal bridge module (MBM), and a pathology graph learning module (PGL). In RLM, we obtain image-level pathology-specific representations for the lesion regions in every image. In MBM, we bridge the pathology representations of different modalities, aligning the image-level pathology-specific representations from RLM with the rich semantic information in pathology word embeddings. In PGL, we first model, in a data-driven manner, an undirected pathology correlation matrix containing all pathology relations. Second, to capture the directed information between nodes, we construct an in-degree matrix and an out-degree matrix as directed graphs, taking the out-degree and in-degree of each node as the object of study, respectively. Third, we design a graph learning block in PGL that integrates the learning of pathological information from multiple modalities. Its front end is a graph convolution block with a symmetric two-branch structure that learns the two directed graphs, which contain label relations in opposite directions. Its back end stacks graph attention layers that comprehensively learn all label relations on the undirected pathology correlation matrix. Finally, the framework is optimized end-to-end with a multi-label loss function.

In summary, our contributions are fourfold:

1) A new RLM is proposed to obtain image-level pathology-specific representation and global image representation for image lesion regions.

2) A novel MBM is proposed that aligns pathology information in different modal representations.

3) In the proposed PGL, more accurate pathological relationships are modeled as directed graphs by considering directed information between nodes on the graph. An effective graph learning block is designed to learn the pathology information of different modalities comprehensively.

4) We developed and evaluated MRChexNet on two large-scale CXR datasets (ChestX-ray14 ^[2] and CheXpert ^[23]), obtaining average AUC scores of 0.8503 and 0.8649, respectively, over 14 pathologies. Our method achieves state-of-the-art performance in terms of classification accuracy and generalizability.

2. Related work

This section summarizes the relevant literature from two perspectives. First, previous work on the automatic analysis of CXR images is introduced. Second, several representative works on cross-modal fusion are presented.

2.1. Multi-label chest X-ray image recognition

To improve efficiency and reduce the workloads of radiologists, researchers have begun to apply the latest advances in deep learning to chest X-ray analysis. In the early days of deep learning for CXR recognition, researchers divided the multi-label CXR recognition task into multiple independent, disjoint binary labeling problems, training an independent binary classifier for each abnormality that may be present in the image. Wang et al. ^[2] used classical convolutional neural networks and transfer learning to classify CXR images. Rajpurkar et al. ^[10] modified the DenseNet-121 ^[11] architecture and proposed CheXNet for abnormality classification in CXR images, which achieved good performance in detecting pneumonia. Li et al. ^[24] performed thoracic disease identification and localization with additional supervision from location annotations. Shen et al. ^[12] designed a novel training mechanism for efficiently training CNN-based automatic chest disease detection models. To dynamically capture more discriminative features for thoracic disease classification, Chen et al. ^[25] used a dual asymmetric architecture based on ResNet and DenseNet. However, as mentioned above, these methods do not account for the correlations between labels.

When diagnosing, radiologists need to view the lesion area while integrating pathological relationships to make the most likely diagnosis. This necessity inspired researchers to consider label dependencies. For example, Wang et al. ^[16] used an RNN to model label relevance sequentially. Yao et al. ^[1] treated multi-label classification as a fixed-length sequence prediction task, employed long short-term memory (LSTM) ^[26], and presented initial results indicating that exploiting label dependency can enhance classification performance. Ypsilantis et al. ^[17] used an RNN-based bidirectional attention model that focuses on information-rich regions of an image and samples the entire CXR image sequentially. Other approaches use different attention mechanisms to correlate labels with attended regions. Zhu et al. ^[13] and Wang et al. ^[14] both use attention mechanisms that address only a limited number of local correlations between regions of an image. Guan et al. ^[15] used CNNs to learn high-level image features and designed attention-learning modules that provide additional attention guidance for chest disease recognition. As graph data structures have become a popular research topic, some approaches use graphs to model label relationships. Chen et al. ^[22] introduced a framework in which each label is a node, the word vector of each label serves as the node feature, and a GCN learns the connections among labels in an undirected graph. Li et al. ^[27] developed A-GCN, which captures label dependencies by constructing an adaptive label structure and has demonstrated strong performance. Lee et al. ^[20] described label relationships using a knowledge graph, which improves the accuracy of image representations. Chen et al. ^[6] employed an undirected graph to represent the relationships between pathologies; they designed CheXGCN by using the word vectors of labels as node features, and their experiments showed promising results.

2.2. Cross-modal fusion

Researchers often fuse cross-modal features by concatenation or elementwise summation. Fukui et al. ^[28] proposed fusing multi-modal features with bilinear models by taking the outer product of two vectors from different modalities; however, this method yields high-dimensional fusion vectors. Hu et al. ^[29] used data from the first 24 hours after admission to build simpler machine-learning models for early risk stratification of acute kidney injury (AKI) and obtained good results. Xu et al. ^[30] encouraged the data of both the attribute and imaging modalities to be discriminative in order to improve attribute-image person reidentification. To reduce the high-dimensional computation, Kim et al. ^[31] designed a method that achieves performance comparable to that of Fukui et al. by computing the Hadamard product of two feature vectors, but it converges slowly. Zhou et al. ^[32] introduced a method with stable performance and faster convergence for fusing image features and text embeddings. Chen et al. ^[22] used ResNet to learn image features and a GCN to learn the semantic information in label word embeddings, and finally fused the two with a simple dot product. Similarly, Wang et al. ^[33] designed a sum-pooling method to fuse the vectors of the two modalities after learning the image features and the semantic information of label word embeddings, which both reduces the dimensionality of the vectors and speeds up model convergence.
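The dimensionality argument above can be made concrete with a small NumPy sketch. It fuses two random vectors of a hypothetical size d = 512 with each of the strategies mentioned and reports the length of the fused vector. Nothing here is taken from the cited works' code.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512                          # hypothetical feature size for both modalities
img = rng.standard_normal(d)     # image-modality feature vector
txt = rng.standard_normal(d)     # text-modality (word embedding) feature vector

fused = {
    "concatenation": np.concatenate([img, txt]),   # 2d values
    "elementwise_sum": img + txt,                  # d values
    "hadamard_product": img * txt,                 # d values
    "outer_product": np.outer(img, txt).ravel(),   # d * d values (full bilinear interaction)
}
for name, vec in fused.items():
    print(f"{name:>16}: {vec.size}")
```

The outer product grows quadratically with d, which is why Hadamard-based and low-rank bilinear variants are attractive; note that the Hadamard product is exactly the diagonal of the outer product.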

3. Materials and methods

This section presents the proposed multi-label CXR recognition framework, MRChexNet, which consists of three main modules: the representation learning module (RLM), the multi-modal bridge module (MBM), and the pathology graph learning module (PGL). We first introduce the overall framework in Figure 2 and then detail the workflow of each module. Finally, we describe the datasets, implementation details, and evaluation metrics.

Figure 2. The overall framework of our proposed MRChexNet.


3.1. Representation learning module

In principle, any CNN-based model can be used to learn image features. In our experiments, following ^[1,6,25], we use DenseNet-169 ^[11] as the backbone for fair comparison. Given an input image I with 224 $\times$ 224 resolution, we obtain 1664 $\times$ 7 $\times$ 7 feature maps from the "Dense Block_4" layer of DenseNet-169. As shown in , we perform global average pooling to obtain the image-level global feature $\boldsymbol{x} = f_{GAP}\left(f_{backbone}(\mathrm{I})\right)$, where $f_{GAP}(\cdot)$ represents the global average pooling (GAP) ^[34] operation. We first apply a multilayer perceptron (MLP) to $x$ to obtain an initial diagnostic score for the image, $Y_{MLP}$. Specifically, the MLP here consists of a single fully connected (FC) layer followed by a sigmoid activation function.

$\begin{equation} Y_{MLP} = f_{MLP}(x; \theta_{MLP}), \end{equation}$

(3.1)

where $f_{MLP}(\cdot)$ represents the MLP layer and $\theta_{MLP}\in\mathbb{R}^{C\times D}$ is its parameter, with $C$ the number of pathologies and $D$ the dimension of $x$. Each row of $\theta_{MLP}$ acts as a diagnoser for one disease, filtering a set of disease-specific features from the global feature $x$: the diagnoser $\theta^{c}_{MLP}\in \mathbb{R}^{D}$ extracts information related to disease $c$ and predicts the likelihood that disease $c$ appears in the image. The pathology-related feature $F_{pr}$ is then disentangled by Eq (3.2).

$\begin{equation} F_{pr} = f_{repeat}(x)\odot\theta_{MLP}. \end{equation}$

(3.2)

The operation $f_{repeat}(\cdot)$ copies $x\in \mathbb{R}^{D}$ $C$ times to form $[x, \cdots, x]^{T}\in\mathbb{R}^{C\times D}$, and $\odot$ denotes the Hadamard product. Adjusting the global feature $x$ in this way yields, for each disease, a reweighted copy of $x$ that captures more information relevant to that disease.
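The following NumPy sketch traces the shapes in Eqs (3.1) and (3.2). It assumes C = 14 pathologies, replaces the DenseNet-169 backbone with a random 1664 $\times$ 7 $\times$ 7 tensor, and omits the FC bias so that the layer matches $\theta_{MLP}\in\mathbb{R}^{C\times D}$. It illustrates the computation rather than reproducing the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
C, D = 14, 1664   # 14 pathologies; "Dense Block_4" of DenseNet-169 outputs 1664 channels

# Stand-in for f_backbone(I) on a 224 x 224 image: a random 1664 x 7 x 7 feature map.
feature_map = rng.standard_normal((D, 7, 7))

# f_GAP: average each channel over the 7 x 7 grid -> global feature x in R^D.
x = feature_map.mean(axis=(1, 2))

# Eq (3.1): a single FC layer (theta_MLP in R^{C x D}, bias omitted) followed by a sigmoid.
theta_mlp = rng.standard_normal((C, D)) * 0.01
y_mlp = sigmoid(theta_mlp @ x)        # one score in (0, 1) per pathology

# Eq (3.2): f_repeat stacks C copies of x; the Hadamard product with theta_MLP
# reweights x separately for each diagnoser (row c corresponds to disease c).
F_pr = np.tile(x, (C, 1)) * theta_mlp  # shape (C, D)
```

Because row $c$ of $F_{pr}$ is $x$ weighted elementwise by diagnoser $c$, summing that row gives exactly the pre-sigmoid logit the same diagnoser produces in Eq (3.1), which is one way to read $F_{pr}$ as a disentangled view of the MLP computation.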

3.2. Multi-modal bridge module

In this section, we design the MBM module to efficiently align the image features of each disease with its semantic word embedding. As Figure 3 shows, the MBM module has two phases: alignment + fusion, and squeeze. The input of the MBM module consists of two parts: $modal_{1}\in \mathbb{R}^{D_1}$, which represents the image features, and $modal_{2}\in\mathbb{R}^{D_2}$, which is the word embedding. First, we use two FC layers to convert $modal_{1}$ into $M_{1}\in\mathbb{R}^{D_3}$ and $modal_{2}$ into $M_{2}\in\mathbb{R}^{D_3}$, respectively:

$\begin{equation} \left\{ \begin{aligned} & \mathit{\boldsymbol{M_1}} = FC_{1}(\mathit{\boldsymbol{modal_1}})\in\mathbb{R}^{D_{3}}\\ & \mathit{\boldsymbol{M_2}} = FC_{2}(\mathit{\boldsymbol{modal_2}})\in\mathbb{R}^{D_{3}} \end{aligned}. \right. \end{equation}$

(3.3)

Figure 3. Architecture of multi-modal bridge module.


We apply a separate dropout layer to $M_{2}$ to prevent redundant semantic information from causing overfitting. With the two inputs $M_{1}$ and $M_{2}$ of the same dimension, the basic bilinear pooling ^[35] is defined as follows:

$\begin{equation} \mathit{\boldsymbol{F_{i}}} = \mathit{\boldsymbol{M^{T}_{1}S_{i}M_2}}, \end{equation}$

(3.4)

where $\mathit{\boldsymbol{F_{i}}}$ is the $i$-th element of the output fusion feature $\mathit{\boldsymbol{F}}\in \mathbb{R}^{o}$ of the MBM module and $\mathit{\boldsymbol{S_{i}}} \in \mathbb{R}^{ D_{3}\times D_{3} }$ is the corresponding bilinear mapping matrix with bias terms included. The full mapping $\mathit{\boldsymbol{S}} = \left[S_1, \cdots, S_o\right]\in \mathbb{R}^{D_{3} \times D_{3} \times o}$ is high-dimensional, so each $\mathit{\boldsymbol{S_{i}}}$ is factorized into two low-rank matrices, $\mathit{\boldsymbol{S_{i}}} = \mathit{\boldsymbol{u_{i}v_{i}^{T}}}$, with $\mathit{\boldsymbol{u_{i}}} = \left[u_1, \cdots, u_G\right]\in \mathbb{R}^{D_{3} \times G}$ and $\mathit{\boldsymbol{v_{i}}} = \left[v_1, \cdots, v_G\right]\in \mathbb{R}^{D_{3} \times G}$. Therefore, Eq (3.4) can be rewritten as follows:

$\begin{equation} \begin{aligned} \mathit{\boldsymbol{F_{i}}} = \mathit{\boldsymbol{1^{T}}} \left(\mathit{\boldsymbol{u_{i}^{T}M_{1}}} \circ \mathit{\boldsymbol{v_{i}^{T}M_{2}}} \right), \end{aligned} \end{equation}$

(3.5)

where $G$ is the latent dimension (factor) of the two low-rank matrices and $\mathit{\boldsymbol{1}}\in \mathbb{R}^{G}$ is an all-one vector. To obtain the final $\boldsymbol{F}$, two three-dimensional tensors $\mathit{\boldsymbol{u}}\in \mathbb{R}^{D_{3} \times G \times o}$ and $\mathit{\boldsymbol{v}}\in \mathbb{R}^{D_{3} \times G \times o}$ must be learned. Without loss of generality, the two learnable tensors $\boldsymbol{u}, \boldsymbol{v}$ are reshaped into two-dimensional matrices, $\mathit{\boldsymbol{u}} \rightarrow \mathit{\boldsymbol{\tilde{u}}}\in \mathbb{R}^{D_{3}\times Go}$ and $\mathit{\boldsymbol{v}} \rightarrow \mathit{\boldsymbol{\tilde{v}}}\in \mathbb{R}^{D_{3}\times Go}$, and Eq (3.5) simplifies to:

$\begin{equation} \begin{aligned} \mathit{\boldsymbol{F}} = f_{GroupSum} \left( \mathit{\boldsymbol{\tilde{u}^{T}M_{1}}} \circ \mathit{\boldsymbol{\tilde{v}^{T}M_{2}}} , G \right), \end{aligned} \end{equation}$

(3.6)

where the function $f_{GroupSum}\left(\mathit{\boldsymbol{vector}}, G \right)$ divides the $Go$ elements of $\boldsymbol{vector}$ into $o$ groups of $G$ consecutive elements and sums each group, so that $\boldsymbol{F} \in \mathbb{R}^{o}$. A dropout layer is added after the elementwise multiplication layer to avoid overfitting. Because the elementwise multiplication can make the magnitudes of the output neurons vary drastically, the model may converge to an unsatisfactory local minimum. Therefore, a normalization layer ($\boldsymbol{F}\leftarrow \boldsymbol{F}/\|\boldsymbol{F}\|$) and a power normalization layer ($\boldsymbol{F}\leftarrow \operatorname{sign}(\boldsymbol{F})|\boldsymbol{F}|^{0.5}$) are appended. Finally, $\boldsymbol{F}$ is copied $C$ times through the operation $f_{Repeat}(\cdot)$, giving $\boldsymbol{F}\in\mathbb{R}^{C\times o}$ as the final MBM output.
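A minimal NumPy sketch of Eqs (3.3)-(3.6), written in the style of multi-modal factorized bilinear pooling. The sizes (D1 = 1664, D2 = 300, D3 = 512, G = 5, o = 256) are hypothetical, the FC layers are plain matrices without bias, dropout is omitted (inference mode), and applying power normalization before L2 normalization is our assumption, since the text does not fix the order.

```python
import numpy as np

rng = np.random.default_rng(0)
D1, D2, D3, G, o, C = 1664, 300, 512, 5, 256, 14   # hypothetical sizes

def low_rank_bilinear(M1, M2, U, V, G):
    """Eqs (3.5)-(3.6): F_i = 1^T (u_i^T M1 o v_i^T M2) for every i at once.

    U, V have shape (D3, G*o); columns i*G ... i*G+G-1 hold the factors u_i, v_i.
    """
    joint = (U.T @ M1) * (V.T @ M2)           # Hadamard product, length G*o
    return joint.reshape(-1, G).sum(axis=1)   # f_GroupSum: sum each block of G -> length o

def normalize(F, eps=1e-12):
    F = np.sign(F) * np.sqrt(np.abs(F))       # power normalization
    return F / (np.linalg.norm(F) + eps)      # L2 normalization

modal1 = rng.standard_normal(D1)              # image feature
modal2 = rng.standard_normal(D2)              # word embedding
W1 = rng.standard_normal((D3, D1)) * 0.02     # stands in for FC_1 (bias omitted)
W2 = rng.standard_normal((D3, D2)) * 0.02     # stands in for FC_2 (bias omitted)
U = rng.standard_normal((D3, G * o)) * 0.02   # reshaped factor u~
V = rng.standard_normal((D3, G * o)) * 0.02   # reshaped factor v~

M1, M2 = W1 @ modal1, W2 @ modal2             # Eq (3.3): project both modalities to D3
F = normalize(low_rank_bilinear(M1, M2, U, V, G))
F_out = np.tile(F, (C, 1))                    # f_Repeat: C x o MBM output
```

Column block $i$ of $\tilde{u}$ and $\tilde{v}$ plays the role of $u_i$ and $v_i$, so the un-normalized $i$-th output equals $M_1^{T}u_iv_i^{T}M_2$, the bilinear form of Eq (3.4) with the factorized mapping $S_i = u_iv_i^{T}$, computed without ever forming a $D_3 \times D_3$ matrix.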

3.3. Pathology graph learning module

Our PGL module is built on graph learning. In traditional graph learning, the node-level output is the predicted score of each node; in contrast, the final output of our graph learning block serves as the classifier for the corresponding label. We use the fused features output by MBM as the node features for graph learning. Furthermore, whereas the graph structure (i.e., the correlation matrix) is typically predefined in other tasks, it is not provided in multi-label CXR image recognition and must be constructed. We therefore devise a new method for constructing the correlation matrix that takes the directed information of graph nodes into account.

First, we capture the pathological dependencies from the label statistics of the entire dataset and construct the pathology correlation matrix $A_{pc}$. Specifically, we count the number of occurrences ($T_i$) of the $i$-th pathological label ($L_i$) and the number of co-occurrences of $L_i$ and $L_j$ ($T_{ij} = T_{ji}$). The label dependency can then be expressed as a conditional probability:

$\begin{equation} P_{ij} = P \left(L_i|L_j \right) = \frac{T_{ij}}{T_j}, \quad \forall i, j \in \left[1, C\right], \end{equation}$

(3.7)

where ${P}_{ij}$ denotes the probability that $L_i$ occurs given that $L_j$ occurs. Because conditional probabilities are asymmetric, in general ${P}_{ij} \neq {P}_{ji}$. Each element ${A_{pc}}_{ij}$ of this matrix is set to ${P}_{ij}$. Then, to account for directed information in the graph structure, we derive an in-degree matrix $\boldsymbol{A}_{pc}^{in}$ and an out-degree matrix $\boldsymbol{A}_{pc}^{out}$, defined as follows:

$\begin{equation} \left(\boldsymbol{A}_{pc}^{in}\right)_{ij} = \sum\limits_{k=1}^{C} \frac{A_{pc_{ki}} A_{pc_{kj}}}{\sum_{v=1}^{C} A_{pc_{kv}}}, \quad \forall i, j \in \left[1, C\right], \end{equation}$

(3.8)

$\begin{equation} \left(\boldsymbol{A}_{pc}^{out}\right)_{ij} = \sum\limits_{k=1}^{C} \frac{A_{pc_{ik}} A_{pc_{jk}}}{\sum_{v=1}^{C} A_{pc_{vk}}}, \quad \forall i, j \in \left[1, C\right]. \end{equation}$

(3.9)
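The counting in Eq (3.7) and the normalizations in Eqs (3.8) and (3.9) reduce to a few matrix products. In the NumPy sketch below (the toy label matrix is invented for illustration), $A^{in}$ is computed as $A^{T}D_{out}^{-1}A$ and $A^{out}$ as $AD_{in}^{-1}A^{T}$, where $D_{out}$ and $D_{in}$ are the diagonal matrices of row and column sums of $A_{pc}$. It assumes every pathology occurs at least once, so that no degree is zero.

```python
import numpy as np

def pathology_correlation(labels):
    """Eq (3.7): A_pc[i, j] = P(L_i | L_j) = T_ij / T_j from an (N, C) 0/1 label matrix."""
    labels = np.asarray(labels, dtype=float)
    T_pair = labels.T @ labels                # T_pair[i, j] = T_ij (co-occurrence counts)
    T = np.diag(T_pair).copy()                # T_j = number of images carrying label j
    return np.divide(T_pair, T[None, :], out=np.zeros_like(T_pair), where=T[None, :] > 0)

def directed_matrices(A):
    """Eqs (3.8)-(3.9) in matrix form: A_in = A^T D_out^{-1} A, A_out = A D_in^{-1} A^T."""
    A = np.asarray(A, dtype=float)
    d_out = A.sum(axis=1)                     # sum_v A[k, v]: weighted out-degree of node k
    d_in = A.sum(axis=0)                      # sum_v A[v, k]: weighted in-degree of node k
    A_in = A.T @ (A / d_out[:, None])         # sum_k A[k, i] A[k, j] / d_out[k]
    A_out = (A / d_in[None, :]) @ A.T         # sum_k A[i, k] A[j, k] / d_in[k]
    return A_in, A_out

# Toy example: 4 images, 3 pathologies.
labels = np.array([[1, 1, 0],
                   [1, 0, 0],
                   [1, 1, 1],
                   [0, 0, 1]])
A_pc = pathology_correlation(labels)
A_in, A_out = directed_matrices(A_pc)
```

In the toy data every image with label 1 also has label 0, so $P(L_0|L_1) = 1$ while $P(L_1|L_0) = 2/3$, illustrating the asymmetry that motivates the two directed matrices.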

Then, in our PGL, the dual-branch learning of the graph learning block is specifically defined as:

$\begin{equation} \boldsymbol{{Z}^{in}} = f_{gc}^{in}(A_{pc}^{in}\boldsymbol{F}\theta_{gc}^{in}), \end{equation}$

(3.10)

$\begin{equation} \boldsymbol{{Z}^{out}} = f_{gc}^{out}(A_{pc}^{out}\boldsymbol{F}\theta_{gc}^{out}), \end{equation}$

(3.11)

where $\boldsymbol{{Z}^{in}}$ and $\boldsymbol{{Z}^{out}}$ are the outputs of the in-degree branch and the out-degree branch, respectively. $f_{gc}(\cdot)$ denotes the graph convolutional operation, and $\theta_{gc}$ denotes the corresponding trainable transformation matrix.
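The two branches of Eqs (3.10) and (3.11), together with the merge into $\boldsymbol{Z^{all}}$ described next, can be sketched as below. This NumPy sketch makes several assumptions: $f_{gc}$ is taken to be the plain product $AF\theta$, batch normalization is simplified to standardization over the node axis without learnable parameters, the LeakyReLU slope is 0.2, and the adjacency matrices and sizes are random stand-ins.

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def standardize(z, eps=1e-5):
    # Simplified stand-in for batch normalization: zero mean and unit variance per
    # feature over the node axis, with no learnable scale/shift and no running statistics.
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

rng = np.random.default_rng(0)
C, o, h = 14, 256, 512                        # hypothetical sizes
F = rng.standard_normal((C, o))               # MBM output: one feature row per pathology node
A_in = rng.random((C, C))
A_in = (A_in + A_in.T) / 2                    # symmetric stand-in for Eq (3.8)
A_out = rng.random((C, C))
A_out = (A_out + A_out.T) / 2                 # symmetric stand-in for Eq (3.9)
theta_in = rng.standard_normal((o, h)) * 0.05
theta_out = rng.standard_normal((o, h)) * 0.05

Z_in = A_in @ F @ theta_in                    # Eq (3.10), with f_gc taken as A F theta
Z_out = A_out @ F @ theta_out                 # Eq (3.11)
Z_all = leaky_relu(standardize(Z_in)) + leaky_relu(standardize(Z_out))  # input to graph attention
```

Each branch propagates the same node features along one direction of the pathology relations; adding the normalized branch outputs lets the subsequent attention layers see both directions at once.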

To better capture the correlations between different pathological features, we use a graph attention network (GAT) ^[36] to process $\boldsymbol{{Z}^{in}}$ and $\boldsymbol{{Z}^{out}}$ jointly, taking $\boldsymbol{Z^{all}} = f^{'}(\boldsymbol{Z^{in}}) + f^{'}(\boldsymbol{Z^{out}})$ as the input feature to graph attention, where $f^{'}(\cdot)$ denotes a batch normalization layer followed by a LeakyReLU activation. The graph attention layer transforms the hidden features of the input nodes and aggregates neighborhood information into each central node, strengthening the correlation between the central node and its neighbors. The input $\boldsymbol{Z^{all}}$ to the graph attention layer is the set of node features $\left\{ \boldsymbol{Z^{all}_1}, \boldsymbol{Z^{all}_2}, \cdots, \boldsymbol{Z^{all}_n} \right\}$ with each $\boldsymbol{Z^{all}_i} \in \mathbb{R}^{d}$, where $d$ is the number of feature dimensions of each node. The attention coefficient $e_{i, j}$ between node $i$ and each neighbor $j \in NB_{i}$ is computed with a learnable linear transformation matrix $W$ shared by all nodes, as shown in Eq (3.12).

$\begin{equation} e_{i, j} = \boldsymbol{a}^{T}\left[W \boldsymbol{Z^{all}_{i}} \| W \boldsymbol{Z^{all}_{j}}\right], \end{equation}$

(3.12)

where $\|$ is the concatenation operation, $W \in \mathbb{R}^{\acute{d} \times d}$ is the learnable weight matrix, $\boldsymbol{a} \in \mathbb{R}^{2\acute{d}}$ is a learnable attention vector, and $\acute{d}$ denotes the dimensionality of the output features. The graph attention layer allows each node to attend to every other node. The coefficients $e_{i, j}$ are passed through a LeakyReLU activation and normalized over the neighborhood with the softmax function:

$\begin{equation} \alpha_{i, j} = {softmax}_j\left(e_{i, j}\right) = \frac{\exp \left({LeakyReLU}\left(e_{i, j}\right)\right)}{\sum_{k \in NB_i} \exp \left({LeakyReLU}\left(e_{i, k}\right)\right)}. \end{equation}$

(3.13)
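A single-head NumPy sketch of Eqs (3.12) and (3.13). It uses the standard GAT decomposition of $\boldsymbol{a}$ into two halves, so that $\boldsymbol{a}^{T}[WZ_i \| WZ_j]$ becomes a sum of two dot products that can be computed for all node pairs at once. The sizes, the 0.2 LeakyReLU slope, and the fully connected neighborhood are assumptions for illustration.

```python
import numpy as np

def leaky_relu(z, slope=0.2):
    return np.where(z > 0, z, slope * z)

def gat_attention(Z, W, a, adj):
    """Attention weights alpha[i, j] over the neighbours j of each node i.

    Z: (n, d) node features; W: (d_out, d); a: (2 * d_out,); adj: (n, n) boolean mask
    (each row must contain at least one True entry).
    """
    H = Z @ W.T                                        # W Z_i for every node, shape (n, d_out)
    d_out = H.shape[1]
    # a^T [W Z_i || W Z_j] = a[:d_out] . W Z_i + a[d_out:] . W Z_j
    e = (H @ a[:d_out])[:, None] + (H @ a[d_out:])[None, :]
    e = np.where(adj, leaky_relu(e), -np.inf)          # keep only neighbours j in NB_i
    e = e - e.max(axis=1, keepdims=True)               # numerical stability; softmax is unchanged
    alpha = np.exp(e)
    return alpha / alpha.sum(axis=1, keepdims=True)    # softmax over j

rng = np.random.default_rng(0)
n, d, d_out = 14, 32, 8                                # hypothetical sizes
Z_all = rng.standard_normal((n, d))
W = rng.standard_normal((d_out, d)) * 0.1
a = rng.standard_normal(2 * d_out) * 0.1
adj = np.ones((n, n), dtype=bool)                      # every node attends to every node
alpha = gat_attention(Z_all, W, a, adj)
```

Each row of `alpha` is a probability distribution over the neighbors of one node, which is what the softmax in Eq (3.13) guarantees.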

To stabilize the learning process of graph attention in the PGL module, we extend it with multi-head attention as follows:

$\begin{equation} Y_{PGL} = \|_{k = 1}^K {ReLU}\left(\alpha^{(k)} \boldsymbol{Z^{all}} \boldsymbol{W^k}\right), \end{equation}$

(3.14)

where $Y_{PGL}\in\mathbb{R}^{K\acute{d}}$ denotes the output features incorporating the pathology-correlated features, $K$ denotes the number of attention heads, $\alpha^{(k)}$ denotes the normalized attention coefficient matrix of the $k$-th head, and $W^{k}$ denotes the learnable weight matrix of the $k$-th head. Finally, the outputs of the $K$ heads are averaged:

$\begin{equation} Y_{PGL} = {ReLU}\left(\frac{1}{K} \sum\limits_{k = 1}^{K} \alpha^{(k)} \boldsymbol{Z^{all}} \boldsymbol{W^k}\right). \end{equation}$

(3.15)
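Eqs (3.14) and (3.15) differ only in how the $K$ heads are combined. The NumPy sketch below uses random row-stochastic matrices as stand-ins for $\alpha^{(k)}$ and hypothetical sizes; it applies ReLU to each head before concatenation and ReLU after averaging, mirroring the two equations.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def multi_head(alphas, Z_all, Ws, concat=True):
    """Eq (3.14) when concat=True, Eq (3.15) when concat=False.

    alphas: K attention matrices of shape (n, n); Ws: K weight matrices of shape (d, d_out).
    """
    heads = [alpha @ Z_all @ W for alpha, W in zip(alphas, Ws)]   # each (n, d_out)
    if concat:
        return np.concatenate([relu(h) for h in heads], axis=1)    # (n, K * d_out)
    return relu(sum(heads) / len(heads))                           # (n, d_out)

rng = np.random.default_rng(0)
K, n, d, d_out = 4, 14, 32, 8                                      # hypothetical sizes
Z_all = rng.standard_normal((n, d))
raw = rng.random((K, n, n))
alphas = [r / r.sum(axis=1, keepdims=True) for r in raw]           # row-stochastic stand-ins
Ws = [rng.standard_normal((d, d_out)) * 0.1 for _ in range(K)]
Y_concat = multi_head(alphas, Z_all, Ws, concat=True)
Y_avg = multi_head(alphas, Z_all, Ws, concat=False)
```

Concatenation keeps every head's view (width $K\acute{d}$), while averaging returns to width $\acute{d}$, which suits a final layer whose outputs are used as per-label scores.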

Our empirical studies show that PGL can detect potentially strong correlations between pathological features, improving the model's ability to learn implicit relationships between pathologies.

After obtaining $Y_{MLP}$ and $Y_{PGL}$, we set the final output of our model to $Y_{Out}$ = $Y_{MLP}$ + $Y_{PGL}$ and feed it into the loss function. The entire network is updated end-to-end with the MultiLabelSoftMargin loss (referred to as the multi-label loss) ^[37], defined as:

$\begin{equation} \begin{aligned} \mathcal{L}\left(Y_{Out}, L\right) = -\frac{1}{C} \sum\limits_{j = 1}^{C} \Big[ & L_{j} \log \left(\left(1+\exp \left(-Y_{out_j}\right)\right)^{-1}\right) \\ &+\left(1-L_{j}\right) \log \left(\frac{\exp \left(-Y_{out_j}\right)}{1+\exp \left(-Y_{out_j}\right)}\right) \Big], \end{aligned} \end{equation}$

(3.16)

where $Y_{Out}$ and $L$ denote the predicted pathology scores and the ground-truth pathology labels of the sample image, respectively, and $Y_{out_j}$ and $L_{j}$ denote their $j$-th elements.
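Eq (3.16) is the mean over the $C$ labels of the binary cross-entropy computed on logits. The NumPy sketch below evaluates it for a single image, using `np.logaddexp` to compute $\log(1+e^{\pm y})$ without overflow; it is an illustration rather than the training code. PyTorch's `torch.nn.MultiLabelSoftMarginLoss` computes the same per-sample quantity and, by default, averages it over the batch.

```python
import numpy as np

def multilabel_soft_margin(y_out, labels):
    """Eq (3.16) for one image: y_out are logits, labels are 0/1 targets of equal length."""
    y_out = np.asarray(y_out, dtype=float)
    labels = np.asarray(labels, dtype=float)
    log_p = -np.logaddexp(0.0, -y_out)       # log((1 + exp(-y))^-1)
    log_not_p = -np.logaddexp(0.0, y_out)    # log(exp(-y) / (1 + exp(-y)))
    return -np.mean(labels * log_p + (1.0 - labels) * log_not_p)

loss = multilabel_soft_margin([2.0, -1.0, 0.5], [1, 0, 1])
```

With all logits at zero, every term equals $\log 0.5$, so the loss is $\log 2$ regardless of the labels.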

4. Experiments

In this section, we report and discuss the results on two benchmark multi-label CXR recognition datasets. Ablation experiments were also conducted to explore the effects of different parameters and components on MRChexNet. Finally, a visual analysis was performed.

4.1. Datasets

ChestX-Ray14 is a large CXR dataset containing 78,466 training images, 11,220 validation images and 22,434 test images. On average, approximately 1.6 pathology labels from 14 semantic categories are assigned to each image, and each image is labeled with one or more pathologies, as illustrated in Figure 4. We strictly follow the official split of ChestX-Ray14 provided by Wang et al. ^[2], so that our results are directly comparable with most published baselines. We train our model on the training and validation sets and evaluate its performance on the test set.

Figure 4. Example images and the corresponding labels in the ChestX-Ray14 and CheXpert datasets. Each image is labeled with one or more pathologies. In CheXpert, the uncertain pathology is marked in red.


CheXpert is a popular dataset for recognizing, detecting and segmenting common chest and lung diseases. It contains 224,616 images with 12 pathology labels and two nonpathology labels (no finding and support devices). Each image is assigned one or more findings, each labeled as positive, negative or uncertain, as illustrated in Figure 4; if no positive finding is present, the image is labeled 'no finding'. Uncertain labels can be treated as positive (CheXpert_1s) or negative (CheXpert_0s). On average, each image has 2.9 pathology labels in CheXpert_1s and 2.3 in CheXpert_0s. Because the official test set has not been published, we re-split the dataset into training, validation and test sets at a ratio of 7:1:2.
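The two uncertainty-label policies and the 7:1:2 re-split can be sketched as follows. This sketch assumes the label encoding of the released CheXpert CSV files (1 positive, 0 negative, -1 uncertain, blank for unmentioned), and it treats blanks as negative. The encoding handling, the seed and the helper names are assumptions. The exact split procedure is also not stated, including whether all images of one patient are kept in the same subset, which is a common way to avoid train/test overlap.

```python
import random


def map_uncertain(labels, policy):
    """Map CheXpert-style labels (1, 0, -1 uncertain, None blank) to 0/1."""
    if policy not in ("ones", "zeros"):
        raise ValueError("policy must be 'ones' or 'zeros'")
    fill = 1 if policy == "ones" else 0
    out = []
    for v in labels:
        if v is None:
            out.append(0)            # blank / unmentioned: treated as negative (assumption)
        elif v == -1:
            out.append(fill)         # uncertain: CheXpert_1s -> 1, CheXpert_0s -> 0
        else:
            out.append(int(v))
    return out


def split_7_1_2(items, seed=0):
    """Shuffle and split into train/val/test subsets at a 7:1:2 ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train, n_val = n * 7 // 10, n // 10
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


print(map_uncertain([1, 0, -1, None], "ones"))    # [1, 0, 1, 0]
train, val, test = split_7_1_2(range(224616))
print(len(train), len(val), len(test))           # 157231 22461 44924
```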

As described earlier, the proposed PGL module globally models all pathologies on the basis of co-occurrence pairs, which helps identify the potential pathologies present in each image. As shown in Figure 5, counting the occurrences of all pathologies separately in both datasets reveals many pathology pairs with co-occurrence relationships. For example, lung disease is frequently associated with pleural effusion, and atelectasis is frequently associated with infiltration. This phenomenon serves as the basis for constructing the pathology correlation matrix $A_{pc}$ and provides initial evidence for the feasibility of the proposed PGL module.
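One common way to turn co-occurrence counts into a label correlation matrix, used for example by ML-GCN ^[22], is to estimate the conditional probability $P(L_j \mid L_i) = M_{ij}/N_i$ and binarize it with a threshold $\tau$. Here $M_{ij}$ counts images containing both pathologies and $N_i$ counts images containing pathology $i$. This section does not spell out how $A_{pc}$ is normalized, so the sketch below is an assumption, not MRChexNet's exact construction; the threshold value and function names are hypothetical.

```python
def cooccurrence_counts(label_vectors, num_classes):
    """M[i][j] = #images with both i and j (i != j); N[i] = #images with i."""
    m = [[0] * num_classes for _ in range(num_classes)]
    n = [0] * num_classes
    for vec in label_vectors:
        present = [i for i, v in enumerate(vec) if v == 1]
        for i in present:
            n[i] += 1
            for j in present:
                if i != j:
                    m[i][j] += 1
    return m, n


def conditional_correlation(label_vectors, num_classes, tau=0.4):
    """A[i][j] = 1 if P(j | i) = M[i][j] / N[i] >= tau, else 0 (not symmetric in general)."""
    m, n = cooccurrence_counts(label_vectors, num_classes)
    return [[1 if n[i] > 0 and m[i][j] / n[i] >= tau else 0 for j in range(num_classes)]
            for i in range(num_classes)]


# Toy multi-hot labels for 4 images and 3 pathologies.
labels = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 1, 0]]
print(conditional_correlation(labels, 3))   # [[0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

In the toy data, pathology 2 always co-occurs with pathology 1, but not the other way round, which is why the resulting matrix is asymmetric.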

Figure 5. Graph representations of the pathology correlation extracted from the ChestX-Ray14, CheXpert_1s and CheXpert_0s datasets.


4.2. Implementation details

All experiments were run on an Intel 8268 CPU and an NVIDIA Tesla V100 32 GB GPU, and the model was implemented in PyTorch. First, we resize all images to 256 $\times$ 256 and normalize them with the mean and standard deviation of the ImageNet dataset. We then apply random cropping to 224 $\times$ 224, random horizontal flipping and random rotation, since some images in the dataset may already be flipped or rotated. The output feature dimension $D_{1}$ of the backbone is 1664. In the PGL module, the graph learning block consists of a symmetric dual-branch GCN with one layer per branch (1-1), stacked with two graph attention layers of two attention heads each (2(2)). Each GCN branch has 1024 output channels. In the 2-layer GAT, the first layer uses $K$ = 2 attention heads, each computing 512 features (1024 features in total), followed by an exponential linear unit (ELU) ^[46] nonlinearity; the second layer does the same but averages the head features, followed by a logistic sigmoid activation. In addition, we use LeakyReLU with a negative slope of 0.2 as the nonlinear activation function in the PGL module. The input pathology label word embedding is a 300-dimensional vector generated by the GloVe model pretrained on Wikipedia. When a pathology label consists of multiple words, we use the average of all its word vectors as the label embedding. In the MBM, we set $D_{3}$ = 14,336 to bridge the vectors of the two modalities, and we set $G$ = 1024 and $g$ = 14 for the GroupSum method. The dropout rates of dropout1 and dropout2 are 0.3 and 0.1, respectively. The whole network is trained with AdamW using betas of (0.9, 0.999) and a weight decay of 1e-4. The initial learning rate is 0.001 and is divided by 10 every 10 epochs.
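The step learning-rate schedule and the ImageNet input normalization described above can be sketched in plain Python. This is a minimal illustration, not the authors' training code. It assumes epochs are counted from 0 and pixel values are already scaled to [0, 1], and the helper names are hypothetical. The mean and standard deviation are the standard ImageNet RGB statistics. In PyTorch, the same setup would typically use `torch.optim.AdamW(params, lr=1e-3, betas=(0.9, 0.999), weight_decay=1e-4)` together with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)`.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def learning_rate(epoch, base_lr=1e-3, step=10, factor=0.1):
    """Step decay: divide the learning rate by 10 every 10 epochs (epoch counted from 0)."""
    return base_lr * factor ** (epoch // step)


def normalize_pixel(rgb):
    """Normalize one RGB pixel with values in [0, 1] using ImageNet statistics."""
    return tuple((c - m) / s for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


for epoch in (0, 9, 10, 20):
    print(epoch, learning_rate(epoch))   # 1e-3 for epochs 0-9, then about 1e-4, 1e-5, ...
```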

In our experiments, we used the area under the receiver operating characteristic (ROC) curve (AUC) ^[38] for each pathology, together with the mean AUC across all pathologies, to measure the performance of MRChexNet. There was no data overlap between the training and test subsets. The ground-truth labels of each image are represented as $L = \left[L_1, L_2, \dots, L_C \right]$, where $C$ = 14 for both CXR datasets and each element $L_c$ indicates the presence (1) or absence (0) of the $c$-th pathology. For each image, a label is predicted as positive if its confidence score is greater than 0.5.
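For reference, the per-pathology AUC equals the Mann-Whitney statistic: the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative one, with ties counting one half. The mean AUC is the unweighted average over the $C$ labels. The pure-Python sketch below is quadratic in the number of samples and only makes the metric concrete; in practice a library routine such as `sklearn.metrics.roc_auc_score` would normally be used. The function names are illustrative.

```python
def roc_auc(scores, labels):
    """AUC = P(random positive scores above random negative); ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def mean_auc(score_matrix, label_matrix):
    """Per-pathology (column-wise) AUCs and their unweighted mean."""
    num_classes = len(label_matrix[0])
    aucs = [roc_auc([row[c] for row in score_matrix], [row[c] for row in label_matrix])
            for c in range(num_classes)]
    return aucs, sum(aucs) / num_classes


def predict_labels(confidences, threshold=0.5):
    """Binary prediction: positive when the confidence exceeds the threshold."""
    return [1 if c > threshold else 0 for c in confidences]


aucs, mean_value = mean_auc([[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.1]],
                            [[1, 0], [0, 1], [1, 1], [0, 0]])
print(aucs, mean_value)                    # [0.75, 1.0] 0.875
print(predict_labels([0.5, 0.51, 0.2]))    # [0, 1, 0]
```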

4.3. Comparison with existing methods

In this section, we conduct experiments on ChestX-Ray14 and CheXpert to compare the performance of MRChexNet with existing methods.

Results from ChestX-Ray14 and discussion: We compared MRChexNet with a variety of existing methods, including U-DCNN ^[2], LSTM-Net ^[1], CheXNet ^[10], DNet ^[39], AGCL ^[19], DR-DNN ^[12], CRAL ^[15], DualCheXN ^[25] and CheXGCN ^[6]. Table 1 reports the per-pathology and mean AUC scores for all 14 pathology labels of ChestX-Ray14. MRChexNet outperformed all candidate methods on most pathology labels, and Figure 6 shows its ROC curves for the 14 pathologies on ChestX-Ray14. MRChexNet achieved the highest mean AUC (0.850), improving on U-DCNN (0.745) and LSTM-Net (0.798) by 10.5% and 5.2%, respectively. It also outperformed DualCheXN (0.823), improving the AUC for consolidation (0.819 vs. 0.746) by 7.3% and for pneumonia (0.783 vs. 0.727) by 5.6%. Notably, its mean AUC was 2.4% higher than that of CheXGCN (0.826), with clear gains on several pathologies, e.g., cardiomegaly (0.923 vs. 0.893), consolidation (0.819 vs. 0.751), edema (0.904 vs. 0.850) and atelectasis (0.824 vs. 0.786). However, our model was less competitive on the nodule and fibrosis labels. The pathogenesis of these diseases is systemic, yet we generated their label word embeddings from the noun meanings alone, without additional semantics describing their sites of pathogenesis; we believe this contributed to the weaker performance of MRChexNet on these pathologies. Overall, MRChexNet improved multi-label recognition performance on ChestX-Ray14 and outperformed existing methods.

Table 1. AUC comparisons of MRChexNet with existing methods on ChestX-Ray14.

Method	atel	card	effu	infi	mass	nodu	pne1	pne2	cons	edem	emph	fibr	pt	hern	Mean AUC
U-DCNN ^[2]	0.700	0.810	0.759	0.661	0.693	0.669	0.658	0.799	0.703	0.805	0.833	0.786	0.684	0.872	0.745
LSTM-Net ^[1]	0.772	0.904	0.859	0.695	0.792	0.717	0.713	0.841	0.788	0.882	0.829	0.767	0.765	0.914	0.798
DR-DNN ^[12]	0.766	0.801	0.797	0.751	0.760	0.741	0.778	0.800	0.787	0.820	0.773	0.765	0.759	0.748	0.775
AGCL ^[19]	0.756	0.887	0.819	0.689	0.814	0.755	0.729	0.850	0.728	0.848	0.906	0.818	0.765	0.875	0.803
CheXNet ^[10]	0.769	0.885	0.825	0.694	0.824	0.759	0.715	0.852	0.745	0.842	0.906	0.821	0.766	0.901	0.807
DNet ^[39]	0.767	0.883	0.828	0.709	0.821	0.758	0.731	0.846	0.745	0.835	0.895	0.818	0.761	0.896	0.807
CRAL ^[15]	0.781	0.880	0.829	0.702	0.834	0.773	0.729	0.857	0.754	0.850	0.908	0.830	0.778	0.917	0.816
DualCheXN ^[25]	0.784	0.888	0.831	0.705	0.838	0.796	0.727	0.876	0.746	0.852	0.942	0.837	0.796	0.912	0.823
CheXGCN ^[6]	0.786	0.893	0.832	0.699	0.840	0.800	0.739	0.876	0.751	0.850	0.944	0.834	0.795	0.929	0.826
MRChexNet (Ours)	0.824	0.923	0.894	0.719	0.857	0.779	0.783	0.888	0.819	0.904	0.920	0.835	0.808	0.946	0.850
Note: The 14 pathologies in ChestX-Ray14 are atelectasis (atel), cardiomegaly (card), effusion (effu), infiltration (infi), mass, nodule (nodu), pneumonia (pne1), pneumothorax (pne2), consolidation (cons), edema (edem), emphysema (emph), fibrosis (fibr), pleural thickening (pt) and hernia (hern).
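The mean AUC in Table 1 is the unweighted average of the 14 per-pathology AUCs, and the improvements quoted in the text are absolute differences between mean AUCs. The short check below recomputes both for MRChexNet and CheXGCN from the Table 1 values; the variable names are illustrative.

```python
# Per-pathology AUCs copied from Table 1 (order: atel ... hern).
MRCHEXNET = [0.824, 0.923, 0.894, 0.719, 0.857, 0.779, 0.783,
             0.888, 0.819, 0.904, 0.920, 0.835, 0.808, 0.946]
CHEXGCN = [0.786, 0.893, 0.832, 0.699, 0.840, 0.800, 0.739,
           0.876, 0.751, 0.850, 0.944, 0.834, 0.795, 0.929]


def macro_mean(values):
    """Unweighted mean over pathology labels."""
    return sum(values) / len(values)


mr, cg = macro_mean(MRCHEXNET), macro_mean(CHEXGCN)
print(f"MRChexNet {mr:.3f}, CheXGCN {cg:.3f}, gain {100 * (mr - cg):.1f}%")
# prints: MRChexNet 0.850, CheXGCN 0.826, gain 2.4%
```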


Figure 6. ROC curves of MRChexNet on ChestX-Ray14 and CheXpert. The corresponding AUC scores are given in Tables 1-3.


Results from CheXpert and discussion: To our knowledge, the official CheXpert test set has not been publicly released, so the dataset has to be re-split (Section 4.1), and few state-of-the-art results are available for direct comparison. We therefore compared our model with the uncertainty-label treatments proposed in the original dataset paper (U_Ones and U_Zeros). As shown in Table 2, MRChexNet_1s obtained a higher mean AUC over the 14 labels of CheXpert_1s, 1.5% higher than U_Ones from the original paper and 3.8% higher than the vanilla DenseNet-169. As shown in Table 3, MRChexNet_0s obtained a higher mean AUC over the 14 labels of CheXpert_0s, 2.1% higher than U_Zeros from the original paper and 3.1% higher than the vanilla DenseNet-169. These results suggest that our two proposed modules work better when they reinforce each other. Overall, the mean AUC of MRChexNet_1s was 0.3% higher than that of MRChexNet_0s, with larger gains for lung lesion (3.5%, 0.788 $\rightarrow$ 0.823), atelectasis (2.5%, 0.707 $\rightarrow$ 0.732) and fracture (2.7%, 0.793 $\rightarrow$ 0.820). A likely explanation is that the uncertain labels of these pathologies are more often truly positive, so mapping them to 1 is the better choice; conversely, for pathologies where MRChexNet_0s performs better, the uncertain labels are more likely to be truly negative. Figure 6 shows the ROC curves of MRChexNet on ChestX-Ray14, CheXpert_1s and CheXpert_0s for the 14 pathologies.

Table 2. AUC comparisons of MRChexNet with previous baselines on CheXpert_1s.

Method	nofi	enla	card	opac	lesi	edem	cons	pne1	atel	pne2	pleu1	pleu2	frac	supp	Mean AUC
ML-GCN ^[22]	0.879	0.630	0.841	0.723	0.773	0.856	0.692	0.740	0.713	0.829	0.873	0.802	0.762	0.868	0.784
U_Ones ^[23]	0.890	0.659	0.856	0.735	0.778	0.847	0.701	0.756	0.722	0.855	0.871	0.798	0.789	0.878	0.795
DenseNet-169 ^[11]	0.916	0.717	0.895	0.770	0.783	0.882	0.710	0.774	0.728	0.871	0.916	0.817	0.805	0.909	0.821
MRChexNet_1s (Ours)	0.976	0.738	0.900	0.887	0.940	0.884	0.701	0.719	0.759	0.925	0.924	0.852	0.958	0.944	0.865
Note: The 14 labels in CheXpert are no finding (nofi), enlarged cardiomediastinum (enla), cardiomegaly (card), lung opacity (opac), lung lesion (lesi), edema (edem), consolidation (cons), pneumonia (pne1), atelectasis (atel), pneumothorax (pne2), pleural effusion (pleu1), pleural other (pleu2), fracture (frac) and support devices (supp).


Table 3. AUC comparisons of MRChexNet with previous baselines on CheXpert_0s.

Method	nofi	enla	card	opac	lesi	edem	cons	pne1	atel	pne2	pleu1	pleu2	frac	supp	Mean AUC
ML-GCN ^[22]	0.864	0.673	0.831	0.681	0.802	0.770	0.713	0.758	0.654	0.845	0.841	0.764	0.754	0.838	0.771
U_Zeros ^[23]	0.885	0.678	0.865	0.730	0.760	0.853	0.735	0.740	0.700	0.872	0.880	0.775	0.743	0.877	0.792
DenseNet-169 ^[11]	0.912	0.715	0.884	0.738	0.780	0.861	0.753	0.770	0.711	0.860	0.904	0.830	0.758	0.878	0.811
MRChexNet_0s (Ours)	0.914	0.808	0.894	0.748	0.913	0.827	0.801	0.868	0.744	0.928	0.876	0.909	0.915	0.859	0.858


4.4. Ablation experiments and discussion

MRChexNet with its different components on ChestX-Ray14: We evaluated the contribution of each component of MRChexNet; the results are shown in Table 4. In baseline + PGL, we replace the MBM with a simple element-wise summation to fuse the visual feature vectors and the semantic word vectors of the pathologies, and use the resulting fusion vectors as the node features of the graph learning block. Compared with the DenseNet-169 baseline, baseline + PGL raised the mean AUC by 3.6% (0.782 $\rightarrow$ 0.818), with the largest gains on atelectasis (0.775 $\rightarrow$ 0.820), cardiomegaly (0.879 $\rightarrow$ 0.920), effusion (0.826 $\rightarrow$ 0.888) and nodule (0.689 $\rightarrow$ 0.769), an average improvement of 5.7% over the vanilla DenseNet-169 on these four labels. These results indicate that the proposed PGL module plays a key role in mining the global co-occurrence between pathologies. In the baseline + MBM model, the second input $input_{2}$ of the MBM is fixed to the word vectors of the 14 pathology labels, which carry their initial semantic information. The MBM aligns the visual features of each pathology with its semantic word vector, and the resulting cross-modal fusion vectors are passed through a single FC layer to produce the output. Compared with the DenseNet-169 baseline, baseline + MBM raised the mean AUC by 2.7% (0.782 $\rightarrow$ 0.809), notably on atelectasis (0.775 $\rightarrow$ 0.800), effusion (0.826 $\rightarrow$ 0.860), pneumothorax (0.823 $\rightarrow$ 0.859) and mass (0.766 $\rightarrow$ 0.856), an average improvement of 4.6% on these four labels. With both the MBM and PGL modules, MRChexNet improved the mean AUC by 6.8%, with marked gains on atelectasis (0.775 $\rightarrow$ 0.824), pneumothorax (0.823 $\rightarrow$ 0.888) and emphysema (0.838 $\rightarrow$ 0.920).
This indicates that the MBM and PGL modules in our framework reinforce and complement each other, and that MRChexNet performs best when both are used.
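The average per-label gains quoted in the ablation are plain means of the absolute AUC differences over the four highlighted pathologies. The check below recomputes them from Table 4; the dictionary names are illustrative.

```python
# (baseline, variant) AUC pairs from Table 4 for the pathologies highlighted in the text.
PGL_PAIRS = {"atel": (0.775, 0.820), "card": (0.879, 0.920),
             "effu": (0.826, 0.888), "nodu": (0.689, 0.769)}
MBM_PAIRS = {"atel": (0.775, 0.800), "effu": (0.826, 0.860),
             "pne2": (0.823, 0.859), "mass": (0.766, 0.856)}


def average_gain(pairs):
    """Mean absolute AUC improvement over the listed labels, in percent (AUC x 100)."""
    return 100 * sum(after - before for before, after in pairs.values()) / len(pairs)


print(f"PGL: {average_gain(PGL_PAIRS):.1f}%, MBM: {average_gain(MBM_PAIRS):.1f}%")
# prints: PGL: 5.7%, MBM: 4.6%
```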

Table 4. Comparison of AUC of MRChexNet with its different components on ChestX-Ray14.

Method	atel	card	effu	infi	mass	nodu	pne1	pne2	cons	edem	emph	fibr	pt	hern	Mean AUC
Baseline: DenseNet-169 ^[11]	0.775	0.879	0.826	0.685	0.766	0.689	0.725	0.823	0.788	0.841	0.838	0.767	0.742	0.811	0.782
Baseline + MBM	0.800	0.892	0.860	0.707	0.856	0.760	0.741	0.859	0.810	0.870	0.883	0.711	0.781	0.796	0.809
Baseline + PGL	0.820	0.920	0.888	0.710	0.784	0.769	0.756	0.873	0.808	0.896	0.874	0.744	0.799	0.804	0.818
MRChexNet (Ours)	0.824	0.923	0.894	0.719	0.857	0.779	0.783	0.888	0.819	0.904	0.920	0.835	0.808	0.946	0.850


Testing time for different components in MRChexNet: We measured the inference time of each component of MRChexNet; the results are shown in Table 5. Inference time is reported in seconds and measured as the time needed to process one image. We first timed the baseline on one image and used this as the reference. We then timed Baseline + MBM and Baseline + PGL and subtracted the baseline time to obtain the additional inference time attributable to each module. The results show that MBM and PGL add 12.1 $\times \; 10^{-6}$ s and 20.3 $\times \; 10^{-6}$ s of inference time, respectively. Together, the two modules achieve satisfactory recognition performance, and the added cost remains negligible compared with the time a radiologist needs to read an image manually.
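The subtraction-based timing protocol described above can be sketched with the standard library's `time.perf_counter`. This is a sketch under stated assumptions: `model` is any callable, the stand-in models are hypothetical, and a few untimed warm-up calls are added, which the text does not mention. On a GPU, kernels run asynchronously, so with PyTorch one would call `torch.cuda.synchronize()` before reading the clock. Because timings are noisy, the measured overhead of a very cheap module can even come out slightly negative.

```python
import time


def seconds_per_image(model, images, warmup=3):
    """Average wall-clock inference time per image for any callable `model`."""
    if not images:
        raise ValueError("need at least one image")
    for img in images[:warmup]:
        model(img)                          # warm-up calls are not timed
    start = time.perf_counter()
    for img in images:
        model(img)
    return (time.perf_counter() - start) / len(images)


def module_overhead(variant_seconds, baseline_seconds):
    """Extra time attributed to a module: (Baseline + module) - Baseline."""
    return variant_seconds - baseline_seconds


# Hypothetical stand-ins for the baseline and the baseline + module networks.
baseline = lambda x: sum(range(100))
with_module = lambda x: sum(range(100)) + sum(range(200))
images = list(range(1000))
print(module_overhead(seconds_per_image(with_module, images),
                      seconds_per_image(baseline, images)))
```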

Table 5. Comparison of the test time of MRChexNet with its different components.

Method	Test time (1 image)
Baseline: DenseNet-169	2.5 $\times \; 10^{-6}$ s
(Baseline + MBM) - Baseline	12.1 $\times \; 10^{-6}$ s
(Baseline + PGL) - Baseline	20.3 $\times \; 10^{-6}$ s
MRChexNet (Ours)	33.7 $\times \; 10^{-6}$ s


MRChexNet under different types of word embeddings: By default, we use GloVe ^[40] word embeddings as the label representations fed into the multi-modal bridge module (MBM). In this section, we evaluate MRChexNet with other popular word representations. Specifically, we investigate four different word embedding methods, including GloVe ^[40], FastText ^[41] and simple one-hot word embedding. Figure 7 shows the results with different word embeddings on ChestX-Ray14 and CheXpert. Thoracic disease recognition accuracy is not significantly affected by the choice of word embeddings used as inputs to the MBM. Moreover, the results (especially those with one-hot embeddings) suggest that the accuracy improvement achieved by our approach does not come entirely from the semantics of the word embeddings. Nevertheless, more powerful word embeddings lead to better performance. One possible reason is that word embeddings learned from a large text corpus preserve some semantic topology; that is, embeddings of semantically related concepts are close in the embedding space. Our model can exploit these implicit dependencies, which further benefits thoracic disease recognition.
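The label representations compared here can be illustrated as follows. Multi-word labels are averaged over their word vectors, as stated in Section 4.2, while one-hot vectors carry no semantic information at all. The 3-dimensional toy vectors stand in for 300-dimensional GloVe vectors; loading real embeddings is not shown, and the function names are hypothetical.

```python
def label_embedding(label, word_vectors):
    """Average the word vectors of a (possibly multi-word) pathology label."""
    vectors = [word_vectors[w] for w in label.lower().split()]   # KeyError for unknown words
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def one_hot_embeddings(labels):
    """One-hot label representation: one dimension per label, no semantic information."""
    return {lab: [1.0 if j == i else 0.0 for j in range(len(labels))]
            for i, lab in enumerate(labels)}


# Toy 3-d vectors standing in for 300-d GloVe vectors (values are made up).
toy_vectors = {"pleural": [1.0, 0.0, 2.0], "thickening": [3.0, 2.0, 0.0]}
print(label_embedding("Pleural Thickening", toy_vectors))   # [2.0, 1.0, 1.0]
print(one_hot_embeddings(["Atelectasis", "Cardiomegaly"]))
```

One-hot vectors of different labels are all orthogonal and equidistant, so any gain obtained with them cannot come from semantic similarity between label names.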

Figure 7. Effects of different pathology word embedding approaches. Different pathology word embeddings have little effect on accuracy, which suggests that our improvements do not stem mainly from the semantic meanings of the pathology word embeddings but from the design of MRChexNet.


Groups $G$ and elements $g$ in GroupSum: In this section, we evaluate the performance of the MBM in MRChexNet with different numbers of groups $G$ and different numbers of elements $g$ per group. With GroupSum in the MBM, each $D_{3}$-dimensional vector is converted into a $G$-dimensional vector. We evaluate $(G, g) \in \{(2048, 7), (1024, 14), (512, 28), (256, 56), (128, 112)\}$, each satisfying $G \times g = D_{3} = 14{,}336$, to generate the low-dimensional bridging vector. As shown in Figure 8, MRChexNet performs best on ChestX-Ray14 with $G$ = 1024 and $g$ = 14, while the mean AUC changes only slightly on CheXpert. We believe that $G$ = 1024 and $g$ = 14 better preserve the original semantic information of the pathology word embeddings. Other values of $G$-$g$ give similar results and do not affect the model much.
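GroupSum is sketched below under the assumption that each group consists of $g$ consecutive elements of the $D_3$-dimensional vector, i.e., sum pooling over non-overlapping windows. The section does not spell out the grouping order, and the function name is hypothetical. The assertion confirms that every evaluated $(G, g)$ pair tiles $D_3$ = 14,336 exactly.

```python
D3 = 14336
CONFIGS = [(2048, 7), (1024, 14), (512, 28), (256, 56), (128, 112)]


def group_sum(vector, num_groups, group_size):
    """Sum each block of `group_size` consecutive elements (D_3 = G * g -> G)."""
    if len(vector) != num_groups * group_size:
        raise ValueError("len(vector) must equal num_groups * group_size")
    return [sum(vector[k * group_size:(k + 1) * group_size]) for k in range(num_groups)]


assert all(G * g == D3 for G, g in CONFIGS)        # every evaluated (G, g) pair tiles D_3
print(group_sum([0, 1, 2, 3, 4, 5], num_groups=2, group_size=3))   # [3, 12]
bridged = group_sum([1.0] * D3, num_groups=1024, group_size=14)
print(len(bridged), bridged[0])                     # 1024 14.0
```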

Figure 8. Change in mean AUC for different values of $G$-$g$.


Different numbers of GCN layers and GAT layers of the graph learning block in PGL: Because the front end of our graph learning block is a GCN with a dual-branch symmetric structure, we focus on the number of GCN layers in each branch. The graph attention layers are placed at the end of the graph learning block, and to keep the block symmetric, the number of GAT layers is set equal to the number of attention heads per layer. Table 6 reports the performance of our model for different numbers of GCN layers. For the 1-1, 2-2 and 3-3 GCN configurations, every layer in each branch has an output dimension of 1024. For the 1-layer GAT model, the single layer uses $K$ = 1 attention head, which computes 1024 features. For the 2-layer GAT model, the first layer uses $K$ = 2 attention heads, each computing 512 features (1024 features in total), and the second layer does the same. As shown in Table 6, the best performance on all three datasets is obtained with the 1-1 GCN and 2(2) GAT configuration, and performance declines as more GCN layers are stacked; with 2-2 or 3-3 GCN layers, adding a second GAT layer lowers performance further. We attribute this degradation to oversmoothing: with more GCN and GAT layers, information is repeatedly propagated between nodes, and the node representations become increasingly similar.
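The oversmoothing effect can be illustrated with a toy example. The sketch below repeatedly applies weight-free mean aggregation over each node and its neighbours, $H \leftarrow \tilde{D}^{-1}(A+I)H$, on a hypothetical 4-node path graph. It is not the PGL block itself, which also has learnable weights, nonlinearities and attention. As propagation steps accumulate, the node features converge to a common value, so the nodes become indistinguishable.

```python
def propagate(adj, features):
    """One weight-free GCN-style step: H' = D~^-1 (A + I) H (mean over neighbours and self)."""
    n = len(adj)
    out = []
    for i in range(n):
        neighbours = [j for j in range(n) if adj[i][j] or i == j]   # self-loop included
        out.append([sum(features[j][d] for j in neighbours) / len(neighbours)
                    for d in range(len(features[0]))])
    return out


def spread(features):
    """Max minus min of the first feature across nodes (0 means fully smoothed)."""
    column = [row[0] for row in features]
    return max(column) - min(column)


# Hypothetical 4-node path graph 0-1-2-3 with one distinctive node.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
h = [[0.0], [0.0], [0.0], [4.0]]
for step in range(1, 51):
    h = propagate(adj, h)
    if step in (1, 5, 50):
        print(step, round(spread(h), 6))   # the gap between nodes shrinks toward 0
```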

Table 6. Effect of the number of GCN layers and GAT layers of the graph learning block in PGL.

Dual-branch GCN	GAT (heads)	ChestX-Ray14 (mean AUC)	CheXpert_1s (mean AUC)	CheXpert_0s (mean AUC)
1-1	1(1)	0.8417	0.8493	0.8366
1-1	2(2)	0.8503	0.8649	0.8575
2-2	1(1)	0.8342	0.8402	0.8309
2-2	2(2)	0.8251	0.8323	0.8187
3-3	1(1)	0.8187	0.8238	0.8194
3-3	2(2)	0.8063	0.8109	0.8057


4.5. Visualization of lesion areas for qualitative assessment

In Figure 9, we visualize the original images and the corresponding label-specific activation maps produced by MRChexNet. MRChexNet captures the discriminative semantic regions of the images for different chest diseases. Figure 10 illustrates multi-label CXR recognition results: for each test image, the top-eight predicted scores are listed from highest to lowest. As shown in Figure 10, compared with the vanilla DenseNet-169 model, MRChexNet improves multi-label CXR recognition. By modeling global label relationships, MRChexNet raises the confidence scores of associated pathologies and suppresses those of unassociated ones. For example, in column 1, row 2, MRChexNet takes into account the relationship between effusion and atelectasis: in the presence of effusion, the confidence score for atelectasis rises from 0.5210 (vanilla DenseNet-169) to 0.9319, an increase of about 0.4109. For weakly correlated labels, in column 2, row 3, effusion is ranked first by DenseNet-169, but once MRChexNet accounts for global inter-label relationships, its confidence score falls outside the top eight. To some extent, this demonstrates our model's ability to suppress the confidence scores of irrelevant pathologies.
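The top-eight lists in Figure 10 amount to sorting each image's per-label confidence scores in descending order and keeping the first eight. The sketch below shows this ranking step and a membership check of the kind used in the effusion example above. The scores are made up for illustration and are not taken from Figure 10; the function names are hypothetical.

```python
def top_k(scores, k=8):
    """Return the k (label, score) pairs with the highest scores, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


def in_top_k(scores, label, k=8):
    """True if `label` is among the k highest-scoring labels."""
    return label in {name for name, _ in top_k(scores, k)}


# Made-up scores for illustration only (not taken from Figure 10).
scores = {"Atelectasis": 0.93, "Effusion": 0.88, "Infiltration": 0.41, "Nodule": 0.05}
print(top_k(scores, k=2))               # [('Atelectasis', 0.93), ('Effusion', 0.88)]
print(in_top_k(scores, "Nodule", k=3))  # False
```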

Figure 9. Visualization results of pathology correlation activation maps on ChestX-Ray14 dataset. The three columns on the right are three samples with different diseases and their corresponding activation maps.


Figure 10. Visualization results of our model scoring the highest pathology on the images to be tested in the ChestX-Ray14 dataset. We present the top-eight predicted pathology labels and the corresponding probability scores. The ground truth labels are highlighted in red.


5. Conclusions

Improving multi-label CXR recognition in clinical settings requires two capabilities: aligning pathology representations across modalities, which exploits the correspondence between pathology labels in different modalities, and learning the correlations among related pathologies within each modality. In this paper, we propose a multi-modal bridge and relational learning method, MRChexNet, that aligns pathology representations across modalities and learns pathology relationships within each modality. Specifically, our model first extracts pathology-specific feature representations in the imaging modality with a practical RLM. Then, an efficient MBM aligns pathology word embeddings with image-level pathology-specific feature representations. Finally, a novel PGL module comprehensively learns the correlations among pathologies within each modality. Extensive experiments on ChestX-Ray14 and CheXpert show that the proposed MBM and PGL effectively reinforce each other and substantially improve multi-label CXR recognition performance. In the future, we will introduce a relation weight parameter into pathology relation modeling to learn more accurate pathology relations and further improve multi-label CXR recognition performance.

We also plan to extend the proposed method to other imaging modalities, such as optical coherence tomography (OCT), a noninvasive optical imaging modality that provides histopathology images with microscopic resolution ^[42,43,44,45]; adapting the method to OCT-based pathology image analysis is one of our next research directions. In addition, model interpretability and readability have become active research topics in bringing deep learning techniques into clinical diagnosis, and another direction of our work is to make our model easier for clinicians to understand and trust.

Use of AI tools declaration

The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (No. 61872225), Introduction and Cultivation Program for Young Creative Talents in Colleges and Universities of Shandong Province (No. 2019-173), the Natural Science Foundation of Shandong Province (No. ZR2020KF013, No. ZR2020ZD44, No. ZR2019ZD04, No. ZR2020QF043, No. ZR2023QF094) and the Special fund of Qilu Health and Health Leading Talents Training Project.

Conflict of interest

The authors declare there is no conflict of interest.

References

[1]	R. K. Garrett, Troubling consequences of online political rumoring, Hum. Commun. Res., 37 (2011), 255–274. https://doi.org/10.1111/j.1468-2958.2010.01401.x doi: 10.1111/j.1468-2958.2010.01401.x
[2]	L. Zhu, G. Guan, Dynamical analysis of a rumor spreading model with self-discrimination and time delay in complex networks, Physica A, 533, (2019). https://doi.org/10.1016/j.physa.2019.121953
[3]	L. Zhu, M. Liu, Y. Li, The dynamics analysis of a rumor propagation model in online social networks, Physica A, 520 (2019), 118–137. https://doi.org/10.1016/j.physa.2019.01.013 doi: 10.1016/j.physa.2019.01.013
[4]	H. Sun, Y. Sheng, Q. Cui, An uncertain SIR rumor spreading model, Adv. Differ. Equations, 2021 (2021). https://doi.org/10.1186/s13662-021-03386-w
[5]	L. Zhu, L. Li, Dynamic analysis of rumor-spread-delaying model based on rumor-refuting mechanism, Acta Phys. Sin., 69 (2020), 020501. https://doi.org/10.7498/aps.69.20191503 doi: 10.7498/aps.69.20191503
[6]	L. Zhu, F. Yang, G. Guan, Z. Zhang, Modeling the dynamics of rumor diffusion over complex networks, Inf. Sci., 562 (2021). https://doi.org/10.1016/j.ins.2020.12.071
[7]	R. Li, Y. Li, Z. Meng, Y. Song, G. Jiang, Rumor spreading model considering individual activity and refutation mechanism simultaneously, IEEE Access, 8 (2020), 63065–63076. https://doi.org/10.1109/ACCESS.2020.2983249 doi: 10.1109/ACCESS.2020.2983249
[8]	G. Chen, ILSCR rumor spreading model to discuss the control of rumor spreading in emergency, Physica A, 522 (2019), 88–97. https://doi.org/10.1016/j.physa.2018.11.068 doi: 10.1016/j.physa.2018.11.068
[9]	Z. He, Z. Cai, J. Yu, X. Wang, Y. Sun, Y. Li, Cost-efficient strategies for restraining rumor spreading in mobile social networks, IEEE Trans. Veh. Technol., 66 (2017), 2789–2800. https://doi.org/10.1109/TVT.2016.2585591 doi: 10.1109/TVT.2016.2585591
[10]	J. Chen, L. Yang, X. Yang, Y. Tang, Cost-effective anti-rumor message-pushing schemes, Physica A, 540 (2020), 123085. https://doi.org/10.1016/j.physa.2019.123085 doi: 10.1016/j.physa.2019.123085
[11]	L. Ding, P. Hu, Z. Guan, An efficient hybrid control strategy for restraining rumor spreading, IEEE Trans. Syst. Man Cybern.: Syst., 51 (2021), 6779–6791. https://doi.org/10.1109/TSMC.2019.2963418 doi: 10.1109/TSMC.2019.2963418
[12]	L. Zhu, M. Zhou, Z. Zhang, Dynamical analysis and control strategies of rumor spreading models in both homogeneous and heterogeneous networks, J. Nonlinear Sci., 30 (2020) 2545–2576. https://doi.org/10.1007/s00332-020-09629-6
[13]	S. Chen, H. Jiang, L. Li, J. Li, Dynamical behaviors and optimal control of rumor propagation model with saturation incidence on heterogeneous networks, Chaos, Solitons Fractals, 140 (2020), 110206. https://doi.org/10.1016/j.chaos.2020.110206 doi: 10.1016/j.chaos.2020.110206
[14]	D. Daley, D. Kendall, Epidemics and rumours, Nature, 204 (1964), 1118. https://doi.org/10.1038/2041118a0
[15]	D. Daley, D. Kendall, Stochastic rumours, J. Appl. Math., 1 (1965), 42–55. https://doi.org/10.1093/imamat/1.1.42
[16]	D. Zanette, H. Damián, Dynamics of rumor propagation on small-world networks, Phys. Rev. E Stat. Nonlin. Soft Matter Phys., 65 (2001), 041908. https://doi.org/10.1103/PhysRevE.65.041908 doi: 10.1103/PhysRevE.65.041908
[17]	D. Zanette, Critical behavior of propagation on small-world network, Phys. Rev. E, 64 (2001), 050901. https://doi.org/10.1103/PhysRevE.64.050901 doi: 10.1103/PhysRevE.64.050901
[18]	Y. Moreno, M. Nekovee, Dynamics of rumor spreading in complex networks, Phys. Rev. E, 69 (2004), 066130. https://doi.org/10.1103/PhysRevE.69.066130 doi: 10.1103/PhysRevE.69.066130
[19]	Y. Wang, J. Cao, Z. Jin, H. Zhang, G. Sun, Impact of media coverage on epidemic spreading in complex networks, Physica A, 392 (2013), 5824–5835. https://doi.org/10.1016/j.physa.2013.07.067 doi: 10.1016/j.physa.2013.07.067
[20]	L. Zheng, L. Tang, A node-based SIRS epidemic model with infective media on complex networks, Complexity, 2019 (2019), 1–14. https://doi.org/10.1155/2019/2849196 doi: 10.1155/2019/2849196
[21]	L. Xia, G. Jiang, B. Song, Y. Song, Rumor spreading model considering hesitating mechanism in complex social networks, Physica A, 437 (2015), 295–303. https://doi.org/10.1016/j.physa.2015.05.113 doi: 10.1016/j.physa.2015.05.113
[22]	Q. Liu, T. Li, M. Sun, The analysis of an SEIR rumor propagation model on heterogeneous network, Physica A, 469 (2017), 372–380. https://doi.org/10.1016/j.physa.2016.11.067 doi: 10.1016/j.physa.2016.11.067
[23]	H. Zhu, J. Ma, Analysis of SHIR rumor propagation in random heterogeneous networks with dynamic friendships, Physica A, 513 (2019), 257–271. https://doi.org/10.1016/j.physa.2018.09.015 doi: 10.1016/j.physa.2018.09.015
[24]	Y. Zhang, J. Zhu, Stability analysis of I2S2R rumor spreading model in complex networks, Physica A, 503 (2018), 862–881. https://doi.org/10.1016/j.physa.2018.02.087 doi: 10.1016/j.physa.2018.02.087
[25]	D. Yang, T. Chow, L. Zhong, Q. Zhang, The competitive information spreading over multiplex social networks, Physica A, 503 (2018), 981–990. https://doi.org/10.1016/j.physa.2018.08.096 doi: 10.1016/j.physa.2018.08.096
[26]	W. Liu, T. Li, X. Cheng, H. Xu, X. Liu, Spreading dynamics of a cyber violence model on scale-free networks, AIMS Math., 531 (2019), 121752. https://doi.org/10.1016/j.physa.2019.121752 doi: 10.1016/j.physa.2019.121752
[27]	Y. Yu, J. Liu, J. Ren, Q. Wang, C. Xiao, Minimize the impact of rumors by optimizing the control of comments on the complex network, AIMS Math., 6 (2021), 6140–6159. https://doi.org/10.3934/math.2021360 doi: 10.3934/math.2021360
[28]	P. van den Driessche, J. Watmough, Reproduction numbers and sub-threshold endemic equilibria for compartmental models of diseases transmission, Math. Biosci., 180 (2002), 29–48. https://doi.org/10.1016/S0025-5564(02)00108-6 doi: 10.1016/S0025-5564(02)00108-6
[29]	O. Diekmann, J. Heesterbeek, J. Metz, On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations, J. Math. Biol., 28 (1990), 365–382. https://doi.org/10.1007/BF00178324 doi: 10.1007/BF00178324
[30]	C. Barril, A. Calsina, J. Ripoll, A practical approach to $R_0$ in continuous-time ecological models, Math. Methods Appl. Sci., 41 (2017), 8432–8445. https://doi.org/10.1002/mma.4673 doi: 10.1002/mma.4673
[31]	D. Breda, F. Florian, J. Ripoll, R. Vermiglio, Efficient numerical computation of the basic reproduction number for structured populations, J. Comput. Appl. Math., 384 (2021), 113165. https://doi.org/10.1016/j.cam.2020.113165 doi: 10.1016/j.cam.2020.113165
[32]	E. X. DeJesus, C. Kaufman, Routh-Hurwitz criterion in the examination of eigenvalues of a system of nonlinear ordinary differential equations, Phys. Rev. A, 35 (1987), 5288. https://doi.org/10.1103/PhysRevA.35.5288 doi: 10.1103/PhysRevA.35.5288
[33]	C. Castillo-Chavez, S. Blower, P. van den Driessche, D. Kirschner, A. A. Yakubu, Mathematical Approaches for Emerging and Re-Emerging Infectious Diseases: An Introduction, Springer, New York, 2002. https://doi.org/10.1007/978-1-4613-0065-6
[34]	E. M. Stein, R. Shakarchi, Real Analysis: Measure Theory, Integration, and Hilbert Spaces, Princeton University Press, 2005. Available from: https://www.academia.edu/25753953/Real_Analysis_Measure_Theory_Integration_and_Hilbert_Spaces_2005_.
[35]	L. Pontryagin, V. Boltyanskii, R. Gamkrelidze, E. Mishchenko, The Mathematical Theory of Optimal Processes, John Wiley & Sons, Inc, 1962.
[36]	L. Zhao, W. Qin, J. Cheng, Y. Chen, J. Wang, H. Wei, Rumor spreading model with consideration of forgetting mechanism: A case of online blogging LiveJournal, Physica A, 390 (2011), 2619–2625. https://doi.org/10.1016/j.physa.2011.03.010 doi: 10.1016/j.physa.2011.03.010
[37]	L. Zhao, H. Cui, X. Qiu, X. Wang, J. Wang, SIR rumor spreading model in the new media age, Physica A, 392 (2013). https://doi.org/10.1016/j.physa.2012.09.030 doi: 10.1016/j.physa.2012.09.030

© 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)