Research article

Cytokine profile and oxidative stress parameters in women with initial manifestations of pelvic venous insufficiency

  • Received: 05 March 2022 Revised: 27 June 2022 Accepted: 03 July 2022 Published: 27 July 2022
  • Pelvic venous insufficiency (PVI) in women is widespread and is closely associated with the risk of reproductive disorders (in 15–25% of patients) and a high rate of disease recurrence after treatment. The factors involved in venous wall damage include atherogenic stimuli and chronic endotoxin aggression due to inflammatory processes. The changes in the initial stages of the disease are usually minor and selective. There is currently an urgent need to identify initial markers of these changes to develop preventive measures for their correction. Therefore, the aim of this study was to determine the levels of cytokine profile parameters, as well as the activity of lipid peroxidation (LPO) and antioxidant defense (AOD) reactions, in women with initial manifestations of PVI. Thirty-nine female patients with PVI (mean age 37.4 ± 9.1 years) were the subjects of the study. The diagnosis was verified by clinical and instrumental examination, including ultrasound angioscanning of the pelvic veins and therapeutic and diagnostic laparoscopy, and was finally confirmed histologically. The control group included 30 apparently healthy women (mean age 33.5 ± 6.3 years) who underwent surgical sterilization by laparoscopic access. Spectrophotometric, fluorometric and immunoassay methods were used in the study. The cytokine profile in female patients with PVI, as compared to the control group, was characterized by increased concentrations of proinflammatory (interleukin (IL)-6 and IL-8) and anti-inflammatory cytokines (IL-4 and IL-10) and higher ratio values (IL-6/IL-10). The level of primary LPO products, conjugated dienes, was significantly increased, and the level of final products (TBARs) was decreased in comparison to the control. The activity of the main AOD enzyme, superoxide dismutase (SOD), was decreased, while catalase activity was increased. In patients with PVI, the concentration of reduced glutathione was lower than in the control group.
The results of the study in women with PVI suggest negative changes in the cytokine profile and multidirectional changes in the indicators of the LPO system state in the initial stages of the disease. The control of these changes in patients with PVI should probably be an important component of preventive measures in the initial stages of the disease.

    Citation: Marina A. Darenskaya, Dmitry A. Stupin, Andrey A. Semendyaev, Sergey I. Kolesnikov, Natalya V. Semenova, Lyubov I. Kolesnikova. Cytokine profile and oxidative stress parameters in women with initial manifestations of pelvic venous insufficiency[J]. AIMS Medical Science, 2022, 9(3): 414-423. doi: 10.3934/medsci.2022020



    Abbreviations

    PVI: Pelvic venous insufficiency; LPO: Lipid peroxidation; AOD: Antioxidant defense; TNF: Tumor necrosis factor-alpha; IL: Interleukin; LH: Lipid hydroperoxides; CDs: Conjugated dienes; TBARs: Thiobarbituric acid reactants; SOD: Superoxide dismutase; GSH: Reduced glutathione; GR: Glutathione reductase; GPO: Glutathione peroxidase; G-S-T: Glutathione-S-transferase; SEM: Standard error of the mean

    With the development of information technology and networks, big data with "high dimensionality" as their main feature appear widely in medicine, economics, engineering, and other fields. In general, when the dimension p of the covariates increases exponentially with the sample size n, we call such data ultrahigh-dimensional data [1]. Because the covariate dimension of ultrahigh-dimensional data far exceeds the sample size, traditional penalized variable selection methods suffer from high computational complexity, poor statistical accuracy, and weak algorithmic stability; it is therefore urgent to develop variable screening methods suitable for ultrahigh-dimensional data. Missing data is another major problem in data analysis, and it occurs in ultrahigh-dimensional data as well. Even a small number of missing samples for a covariate can make it impossible to calculate that variable's significance, which leads to the omission of important information and misjudgment in subsequent analysis. Therefore, determining how to find important variables directly in ultrahigh-dimensional data with randomly missing observations plays an important role in efficient data analysis.

    To solve the problem of statistically modeling ultrahigh-dimensional data, J. Fan and J. Lv [2] first proposed sure independence screening (SIS), which measures the importance of each covariate according to the Pearson correlation coefficient between the response variable and a single covariate, thereby reducing the dimension of the covariates to a suitable range. To extend SIS to more general assumptions, P. Hall and H. Miller [3] introduced generalized correlation coefficients, and G. Li et al. [4] proposed the robust rank correlation coefficient screening method to address transformed regression models. Noting that SIS depends heavily on significant covariates having large marginal correlations with the response, X. Y. Wang and C. L. Leng [5] proposed a feature screening method based on high-dimensional least-squares projection to further improve screening performance. For the model-free setting, which is better suited to the era of big data, L. P. Zhu et al. [6] proposed a covariance-based feature screening method, namely sure independent ranking screening; because it makes no model assumption, it can be used for linear models, generalized linear models, partial linear models, single index models, and others. R. Li et al. [7] proposed a feature screening method based on the distance correlation coefficient, employing distance covariance as an index of whether two arbitrary variables are independent. On this basis, X. Shao and J. Zhang [8] proposed the martingale difference correlation coefficient screening method to measure the departure from independence between two random variables. On the topic of feature screening for ultrahigh-dimensional discrete data, Q. Mai and H. Zou [9] focused on binary response variables, introduced the Kolmogorov-Smirnov statistic into the feature screening framework, and proposed a variable selection method based on the Kolmogorov filter. D. Huang et al. [10] proposed a feature screening method based on Pearson chi-square statistics that handles ultrahigh-dimensional discrete covariate data as well as continuous data under multiple response variables. L. Ni et al. [11] further developed adjusted Pearson chi-square SIS (APC-SIS) to analyze multiclass response data. P. Lai et al. [12] introduced Fisher linear projection and marginal score tests to construct a linear projection feature screening method for linear discriminant modeling. With the emergence of group variables, the above single-variable feature screening methods are no longer applicable. W. C. Song and J. Xie [13] proposed a feature screening method for group data based on F statistics. Under linear model assumptions, D. Qiu and J. Ahn [14] proposed three methods: group SIS, group high-dimensional least squares projection, and group adjusted R-square screening. H. J. He and G. M. Deng [15] focused on the feature screening of discrete group data and constructed a group feature screening method based on joint information entropy. Z. Z. Wang et al. [16,17] further extended the role of information theory in group feature screening by using the information gain ratio and the Gini coefficient. Y. L. Sang and X. Dang [18] used the Gini distance coefficient to measure the independence between discrete response variables and continuous covariates and constructed a Gini distance group feature screening method.

    Missing data constitutes a widespread problem in data analysis, and ultrahigh-dimensional data are no exception. P. Lai et al. [19] applied the Kolmogorov filter to screen important covariates for constructing propensity score functions, proposed a feature screening method for ultrahigh-dimensional data with response variables missing at random, and used inverse probability weighting to build a marginal feature screening procedure. Q. H. Wang and Y. J. Li [20] proposed missing-indicator imputation screening and feature screening methods based on a Venn diagram, using the missing-indicator information of the response variables. X. X. Li et al. [21] identified key covariates by using the marginal Spearman rank correlation coefficient for conditional estimation. L. Y. Zou et al. [22] used imputation to handle the distribution function of missing responses and adopted the distance correlation between the response distribution function and the covariate distribution functions as the feature screening index. L. Ni et al. [23] proposed two-stage feature screening with covariates missing at random (IMCSIS), based on adjusted Pearson chi-square statistics, and proved its theoretical properties.

    Feature screening with the response variable or covariates missing at random has been well studied for ultrahigh-dimensional data. Because group data are common, they also need to be accommodated within the feature screening framework. We therefore extend a group feature screening method originally designed for complete data to the case of covariates missing at random, broadening the application scope of existing group feature screening methods. In addition, the ultrahigh-dimensional data considered in this paper are discrete, and the application scenarios are mostly classification tasks; hence, we adopt the adjusted Pearson chi-square statistic and a two-stage screening procedure to improve effectiveness on multi-classification problems encountered in practice.

    In this paper, we construct an ultrahigh-dimensional group feature screening method for data with randomly missing covariates and extend feature screening methods generally suited to classification models. First, we define indicator variables for the missing covariates, assuming that any missing variables occur within a group structure. Second, we propose a two-stage group feature screening method with covariates missing at random (GIMCSIS) that uses adjusted Pearson chi-square statistics as the basic screening index. We prove that GIMCSIS satisfies the sure screening property. Furthermore, the performance of GIMCSIS is demonstrated via numerical simulation and empirical analysis. Specifically, compared with IMCSIS, GIMCSIS can be applied to group data and improves the feature screening performance for ultrahigh-dimensional data with covariates missing at random. In the empirical analysis, we focus on a classification model, where GIMCSIS outperforms IMCSIS in terms of various classification indices.

    This paper is organized as follows. Section 2 introduces two-stage group feature screening based on adjusted Pearson chi-square statistics. Then, we establish a group sure screening property. The simulation studies are given in Section 3. Section 4 provides a classification analysis, and the paper concludes with a discussion in Section 5.

    First, we define the group-structured data. When covariates are missing at random, group covariates exist among both the fully and the partially observed covariates, so we need new notation. Suppose that $Y$ is a multiclass response variable with $R$ classes and that the covariate matrix $X$ consists of $G$ covariate groups, each containing one or more covariates. Accounting for randomly missing covariates, the set of fully observed variables is defined as $U=(u_1,\ldots,u_{G_1})^T$, and the set of partially observed variables is defined as $V=(v_1,\ldots,v_{G_2})^T$. The covariate matrix $X$ is represented by

    $$X=(u_1,\ldots,u_{G_1},v_1,\ldots,v_{G_2})^T,\quad P=\sum_{k=1}^{G_1}p_k+\sum_{l=1}^{G_2}q_l,\quad G=G_1+G_2,$$

    where $1\le k\le G_1$, $1\le l\le G_2$, and $G_1$ and $G_2$ are the numbers of fully and partially observed covariate groups, respectively. $P$ represents the total dimension of the covariates, $p_k$ represents the dimension of the $k$th fully observed covariate group, and $q_l$ represents the dimension of the $l$th partially observed covariate group. For the partially observed covariates, $\delta_l$ denotes the missing indicator variable. A single variable has only two missing states, 1 or 0, while the missing state of a group of variables is more complicated. In this paper, the missing indicator variable $\delta_l$ equals 1 only when all covariates in the group are observed:

    $$\delta_l=\begin{cases}1,&\mathrm{sum}(\delta_l)=q_l;\\0,&\mathrm{sum}(\delta_l)\neq q_l.\end{cases}$$
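As a concrete illustration, the group-level missing indicator can be computed from the per-covariate observed flags. The following is a minimal Python sketch (the paper's experiments used R; all names here are illustrative):

```python
import numpy as np

def group_indicator(delta_mat):
    """Group missing indicator: delta_l = 1 for a sample only when every
    covariate in the l-th group is observed (row sum equals group size q_l)."""
    q_l = delta_mat.shape[1]
    return (delta_mat.sum(axis=1) == q_l).astype(int)

# Observed flags for a group of q_l = 3 covariates over 3 samples
flags = np.array([[1, 1, 1],   # fully observed -> delta_l = 1
                  [1, 0, 1],   # one covariate missing -> delta_l = 0
                  [0, 0, 0]])  # all missing -> delta_l = 0
print(group_indicator(flags))  # -> [1 0 0]
```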

    Second, it is assumed that every covariate component of the covariate matrix $X$ takes $J$ classes. $J_g$ denotes the number of class combinations of the covariates in the $g$th group, and $j_g$ indexes these combinations: $j_g=1$ is the first class combination, and so on, up to $J_g$, the $p_g$-fold class combination; a given class combination is represented by the classification vector $(j_1,\ldots,j_{p_g})$.

    Let the probability function of the response variable be $p_r=P(Y=r)$. A probability function with a group-structured covariate can be expressed as $w_{j_k}=w_{(j_1,\ldots,j_{p_k})}=P(u_{k1}=j_1,\ldots,u_{kz}=j_z,\ldots,u_{kp_k}=j_{p_k})$ and $w_{j_l}=w_{(j_1,\ldots,j_{p_l})}=P(v_{l1}=j_1,\ldots,v_{lz}=j_z,\ldots,v_{lp_l}=j_{p_l})$, i.e., the joint probability function of the variables within a group; $w_{j_k}$ is the joint probability function for fully observed covariates, and $w_{j_l}$ is its form for partially observed covariates. Similarly, the joint probability functions of the response variable with group-structured covariates are $p_{J_k r}=p_{(j_1,\ldots,j_{p_k})r}=P(Y=r,u_{k1}=j_1,\ldots,u_{kp_k}=j_{p_k})$ and $p_{J_l r}=p_{(j_1,\ldots,j_{p_l})r}=P(Y=r,v_{l1}=j_1,\ldots,v_{lp_l}=j_{p_l})$, where $1\le k\le G_1$, $1\le l\le G_2$, $r=1,\ldots,R$, $z=1,\ldots,p_k$ or $1,\ldots,p_l$, and $j_1,\ldots,j_{p_g}\in\{1,\ldots,J\}$.
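The within-group joint probability $w_{j_k}$ can be estimated empirically as the fraction of samples whose group covariates equal a given class combination. A small Python sketch (illustrative names, not from the paper):

```python
import numpy as np

def joint_w(U_group, j):
    """Empirical estimate of w_{j_k}: the proportion of rows of the group
    covariate matrix U_group equal to the class combination j."""
    return float(np.mean(np.all(U_group == np.asarray(j), axis=1)))

# A group of p_k = 2 categorical covariates over 4 samples
U = np.array([[1, 1], [1, 2], [1, 1], [2, 1]])
print(joint_w(U, (1, 1)))  # -> 0.5
```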

    Based on the two-stage feature screening process, we propose a two-stage feature screening method for group structure data [23]. Specifically, we use APC-SIS to construct the group feature screening process.

    The APC-SIS method first uses the adjusted Pearson chi-square statistic as the feature screening index [11]. The adjusted Pearson chi-square statistic for univariate analysis is given as follows:

    $$\Delta_k=\frac{1}{\log J_k}\sum_{r=1}^{R}\sum_{j=1}^{J_k}\frac{\left(p_r w^{(k)}_j-\pi^{(k)}_{r,j}\right)^2}{p_r w^{(k)}_j} \quad (2.1)$$

    where $p_r=P(Y=r)$, $w^{(k)}_j=P(X_k=j)$ and $\pi^{(k)}_{r,j}=P(Y=r,X_k=j)$, for $r=1,2,\ldots,R$, $j=1,2,\ldots,J$ and $k=1,2,\ldots,K$. When the response variable $Y$ is independent of the covariate $X_k$, the product of the marginal probabilities equals the joint probability, so $p_r w^{(k)}_j=\pi^{(k)}_{r,j}$. When $Y$ and $X_k$ are not independent, the product of the marginal probabilities differs from the joint probability; the larger the difference, the stronger the correlation between $Y$ and $X_k$. Hence $\Delta_k$ has two properties: $\Delta_k\ge 0$, and $\Delta_k=0$ if and only if $Y$ is independent of $X_k$.
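A sample estimate of $\Delta_k$ illustrates both properties. The sketch below (Python rather than the R used in the paper) computes the statistic for a dependent and an independent covariate:

```python
import numpy as np

def apc(y, x):
    """Sample version of the adjusted Pearson chi-square statistic (Eq 2.1):
    sums (p_r * w_j - pi_rj)^2 / (p_r * w_j) over all cells, divided by log J."""
    stat = 0.0
    for r in np.unique(y):
        p_r = np.mean(y == r)
        for j in np.unique(x):
            w_j = np.mean(x == j)
            pi_rj = np.mean((y == r) & (x == j))
            stat += (p_r * w_j - pi_rj) ** 2 / (p_r * w_j)
    return stat / np.log(len(np.unique(x)))

y = np.repeat([1, 2], 50)       # binary response, 100 samples
x_dep = y.copy()                # perfectly dependent covariate
x_ind = np.tile([1, 2], 50)     # covariate independent of y
print(apc(y, x_dep), apc(y, x_ind))  # dependent statistic >> independent (~0)
```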

    By applying the above definition, we can obtain the adjusted Pearson chi-square statistic of a group of covariates under the random missing mechanism. For fully observed covariate data, we can directly construct the adjusted Pearson chi-square statistic of the response variable $Y$ and the fully observed covariate $U_k$:

    $$APC_g(Y,U_k)=\frac{1}{\log J_k}\sum_{r=1}^{R}\sum_{j=1}^{J_k}\frac{\left(p_{J_k r}-p_r w_{j_k}\right)^2}{p_r w_{j_k}}. \quad (2.2)$$

    For partially observed covariate data, the adjusted Pearson chi-square statistic of the response variable $Y$ and the partially observed covariate $V_l$ is calculated as:

    $$APC_g(Y,V_l)=\frac{1}{\log J_l}\sum_{r=1}^{R}\sum_{j=1}^{J_l}\frac{\left(p_{J_l r}-p_r w_{j_l}\right)^2}{p_r w_{j_l}}. \quad (2.3)$$

    Because $w_{j_l}=P(v_{l1}=j_1,\ldots,v_{lz}=j_z,\ldots,v_{lp_l}=j_{p_l})=\sum_{r=1}^{R}P(Y=r,v_{l1}=j_1,\ldots,v_{lp_l}=j_{p_l})=\sum_{r=1}^{R}p_{J_l r}$, when estimating $APC_g(Y,V_l)$, compared with Eq (2.2) we need to estimate only $p_{J_l r}$. Then,

    $$\begin{aligned}p_{J_l r}&=P(Y=r,v_{l1}=j_1,\ldots,v_{lp_l}=j_{p_l})\\&=\sum_{u}P(Y=r,U_{M_l^g}=u)\,P(v_{l1}=j_1,\ldots,v_{lp_l}=j_{p_l}\mid Y=r,U_{M_l^g}=u)\\&=\sum_{u}P(Y=r,U_{M_l^g}=u)\,P(v_{l1}=j_1,\ldots,v_{lp_l}=j_{p_l}\mid Y=r,U_{M_l^g}=u,\delta_l=1)\\&=\sum_{u}P(Y=r,U_{M_l^g}=u)\,\frac{P(v_{l1}=j_1,\ldots,v_{lp_l}=j_{p_l},Y=r,U_{M_l^g}=u,\delta_l=1)}{P(Y=r,U_{M_l^g}=u,\delta_l=1)} \quad (2.4)\end{aligned}$$

    where $U_{M_l^g}$ links the fully observed covariates with the partially observed covariate, and the set of important fully observed covariates $M_l^g$ can be obtained by computing $APC_g(\delta_l,U_k)$; that is, the fully observed covariates associated with the missing information are used in place of the partially observed covariate. Therefore, $\hat p_{J_l r}$ can be obtained from $\hat M_l^g$ and Eq (2.4).
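Equation (2.4) can be estimated by stratifying on $(Y, U_{M_l^g})$ and reweighting the complete cases. Below is a simplified Python sketch for a single partially observed covariate with a scalar surrogate u (illustrative names, not the paper's R code):

```python
import numpy as np

def p_hat(y, u, v, delta, r, j):
    """Estimate P(Y=r, V_l=j) following Eq (2.4): within each stratum (Y=r, U=u),
    the conditional law of V_l is estimated from complete cases (delta=1)."""
    n = len(y)
    total = 0.0
    for uval in np.unique(u):
        cell = (y == r) & (u == uval)   # stratum (Y=r, U=u)
        obs = cell & (delta == 1)       # complete cases in the stratum
        if obs.sum() == 0:
            continue                    # stratum carries no information
        total += cell.sum() * ((v == j) & obs).sum() / obs.sum()
    return total / n

# With no missingness, the estimate reduces to the empirical joint probability
y = np.array([1, 1, 2, 2]); u = np.array([1, 2, 1, 2])
v = np.array([1, 2, 2, 1]); delta = np.ones(4, dtype=int)
print(p_hat(y, u, v, delta, r=1, j=1))  # -> 0.25
```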

    The two-stage group feature screening with covariates missing at random (GIMCSIS) uses the missing indicator variable as a bridge between the partially observed covariate and the response variable. Feature screening of the fully observed covariates and partially observed covariates is carried out, and the fully observed covariate information is used to replace the partially observed covariate information to realize group feature screening of the partially observed covariates. The screening process is divided into two steps:

    Step 1: To map the information of the partially observed variables onto the fully observed covariates, the fully observed covariates associated with each missing indicator variable are identified, and the partially observed covariates are represented through them. Specifically, for each missing indicator variable $\delta_l$ of the partially observed covariates, the adjusted Pearson chi-square statistic with each fully observed covariate is calculated as follows:

    $$\widehat{APC}_g(\delta_l,U_k)=\frac{1}{\log J_k}\sum_{r=0}^{1}\sum_{j=1}^{J_k}\frac{\left(\sum_{i=1}^{n}I(\delta_{i,l}=r,u_{k1}=j_1,\ldots,u_{kp_k}=j_{p_k})-\sum_{i=1}^{n}I(\delta_{i,l}=r)\,\hat w_{j_k}\right)^{2}}{n\sum_{i=1}^{n}I(\delta_{i,l}=r)\,\hat w_{j_k}} \quad (2.5)$$

    where $w_{j_k}=w_{(j_1,\ldots,j_{p_k})}=P(u_{k1}=j_1,\ldots,u_{kz}=j_z,\ldots,u_{kp_k}=j_{p_k})$ and $r=0,1$. The active covariates are estimated by applying the following threshold:

    $$\hat M_l^g=\{k:\widehat{APC}_g(\delta_l,U_k)>c_{\delta_l}n^{-\tau_{\delta_l}},\ 1\le k\le G_1\}$$

    where $c_{\delta_l}$ and $\tau_{\delta_l}$ are predetermined constants defined in Condition (4) in Section 2.4.

    Step 2: After obtaining $\hat M_l^g$, we use $U_{\hat M_l^g}$ to replace the partially observed covariates, and the adjusted Pearson chi-square statistic of the response variable and each partially observed covariate is obtained via the following process:

    $$\widehat{APC}_g(Y,V_l)=\frac{1}{\log J_l}\sum_{r=1}^{R}\sum_{j=1}^{J_l}\frac{\left(\hat p_{J_l r}-\hat p_r\hat w_{j_l}\right)^2}{\hat p_r\hat w_{j_l}} \quad (2.6)$$

    where

    $$\hat p_{J_l r}=\frac{1}{n}\sum_{u}\frac{\sum_{i=1}^{n}I(y_i=r,u_i^{\hat M_l^g}=u)\,\sum_{i=1}^{n}I(v_{l1}=j_1,\ldots,v_{lp_l}=j_{p_l},y_i=r,u_i^{\hat M_l^g}=u,\delta_{i,l}=1)}{\sum_{i=1}^{n}I(y_i=r,u_i^{\hat M_l^g}=u,\delta_{i,l}=1)}$$

    $\hat w_{j_l}=\sum_{r=1}^{R}\hat p_{J_l r}$ and $\hat p_r=n^{-1}\sum_{i=1}^{n}I(y_i=r)$. The sum defining $\hat p_{J_l r}$ runs over all possible values of $u$ in the set $U_{\hat M_l}$. In practice, for a given value of $u$, the summation term $\sum_{i=1}^{n}I(y_i=r,u_i^{\hat M_l^g}=u,\delta_{i,l}=1)$ may be 0 when the number of covariates in $\hat M_l$ is large. Thus, $\log J_l$ is used to adjust the Pearson chi-square statistic, and the estimates satisfy $\sum_{r=1}^{R}\sum_{j=1}^{J_l}\hat p_{J_l r}=1$.

    For the fully observed covariates, we obtain the active covariate directly by using the adjusted Pearson chi-square statistic. For the partially observed covariates, we obtain the active covariate according to Steps 1 and 2. Therefore, the active covariates in the dataset can be estimated as follows:

    $$(U,V)_{\hat D}=\{U_k,V_l:\widehat{APC}_g(Y,U_k)>cn^{-\tau},\ \widehat{APC}_g(Y,V_l)>cn^{-\tau},\ 1\le k\le p,\ 1\le l\le q\}$$

    where c and τ are predetermined constants.

    In practice, we replace $\hat M_l^g$ with

    $$\hat M_l^g=\{k:\widehat{APC}_g(\delta_l,U_k)\ \text{is among the largest}\ d_l\ \text{of all}\ U_k\}$$

    and replace $(U,V)_{\hat D}$ as follows:

    $$(U,V)_{\hat D}=\{U_k,V_l:\widehat{APC}_g(Y,U_k)\ \text{or}\ \widehat{APC}_g(Y,V_l)\ \text{is among the largest}\ d\ \text{of all}\ U_k\ \text{or}\ V_l\}$$
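The practical top-$d$ rule can be sketched as follows, ignoring the group structure for brevity (Python; function names are illustrative):

```python
import numpy as np

def apc(a, b):
    """Adjusted Pearson chi-square between two categorical sample vectors."""
    stat = 0.0
    for r in np.unique(a):
        pa = np.mean(a == r)
        for j in np.unique(b):
            pb = np.mean(b == j)
            pab = np.mean((a == r) & (b == j))
            stat += (pa * pb - pab) ** 2 / (pa * pb)
    return stat / np.log(len(np.unique(b)))

def top_d(y, X, d):
    """Indices of the d covariates with the largest APC with the response."""
    scores = np.array([apc(y, X[:, k]) for k in range(X.shape[1])])
    return np.argsort(scores)[::-1][:d]

y = np.repeat([0, 1], 20)
X = np.column_stack([y,                                      # informative covariate
                     np.tile([0, 1], 20),                    # noise
                     (np.arange(40) % 4 < 2).astype(int)])   # noise
print(top_d(y, X, 1))  # the informative column (index 0) is ranked first
```

For the partially observed covariates, the same ranking would be applied to the statistic of Eq (2.6) rather than to a complete-case statistic.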

    Next, we establish the theoretical property of the proposed GIMCSIS. The sure screening property, proposed by J. Fan and J. Lv [2], is essential for feature screening: after applying the screening procedure, all important variables are retained with probability tending to 1. It is important to identify the conditions under which the sure screening property holds, i.e.,

    $$P(M\subset M_\gamma)\to 1\quad\text{as}\ n\to\infty,$$

    where $M_\gamma$ is the final model after feature screening and $M$ is the true model.

    Therefore, to explore the sure screening property of GIMCSIS, the following regularity conditions are assumed.

    (1) There are two positive constants $c_1$ and $c_2$, such that $c_1/R\le p_r\le c_2/R$, $0\le w^{U_k}_{j_g}\le c_2/J$ and $0\le w^{V_l}_{j_g}\le c_2/J$, for $r=1,\ldots,R$, $j=1,\ldots,J_{U_k}$, $l=1,\ldots,q$, $k=1,\ldots,p$, and $J=\max_{1\le k\le p,\,1\le l\le q}\{J_{U_k},J_{V_l}\}$.

    (2) There are two constants $c>0$ and $0<\tau<1/2$, such that

    $$\min_{U_k,V_l\in(U,V)_D}\{APC_g(Y,U_k),\,APC_g(Y,V_l)\}>2cn^{-\tau}.$$

    (3) There are two positive constants $c_3$ and $c_4$, such that $0<c_3\le P(\delta_l=r)\le c_4<1$, for $r=0,1$ and $l=1,\ldots,q$.

    (4) For each $1\le l\le q$, there are two constants $c_{\delta_l}>0$ and $0<\tau_{\delta_l}<1/2$, such that

    $$\min_{k\in M_l}APC_g(\delta_l,U_k)>\frac{3}{2}c_{\delta_l}n^{-\tau_{\delta_l}},$$

    where $APC_g(\delta_l,U_k)=(\log J_k)^{-1}\sum_{j=1}^{J_g}\sum_{r=0}^{1}\{P(\delta_l=r,U_k=j)-P(\delta_l=r)P(U_k=j)\}^2/\{P(\delta_l=r)P(U_k=j)\}$.

    (5) There are two positive constants $c_5$ and $c_6$, such that $\frac{c_5}{2RJ^{\bar m}}\le P(Y=r,U_{\bar M_l}=u,\delta_l=1)\le\frac{c_6}{2RJ^{\bar m}}$, where $\bar M_l=\{k:APC_g(\delta_l,U_k)>\frac{1}{2}c_{\delta_l}n^{-\tau_{\delta_l}},\ 1\le k\le G_1\}$, $\bar m=\max_{1\le l\le q}\|U_{\bar M_l}\|_0$, $r=1,\ldots,R$, and $l=1,\ldots,q$.

    (6) $R=O(n^{\xi})$ and $J=O(n^{\kappa})$, where $\xi\ge 0$, $\kappa\ge 0$, $1-2\tau-6\xi-18\kappa>0$, $1-2\tau_{\delta}-18\kappa>0$, $1-2\tau-10\xi-(18\bar m+18)\kappa>0$, and $\tau_{\delta}=\max_{1\le l\le G_2}\tau_{\delta_l}$.

    Conditions (1)–(5) are commonly used in the study of feature screening [2,7,10,11,23]. Condition (1) ensures that the proportion of each class of the response variable and the covariates is neither too small nor too large, i.e., the classes are balanced. Condition (2) requires that the smallest true signal converge to zero no faster than the rate $n^{-\tau}$ as the sample size tends to infinity. Condition (3) requires that the missing proportion be bounded away from 0 and 1. Condition (4) ensures the sure screening performance of APC-SIS in the first screening step. Condition (5) ensures that the denominator in Eq (2.4) is not 0; this condition can be satisfied when the cardinality of $\bar M_l$ is small. Condition (6) requires that $R$ and $J$ diverge much more slowly than $n$.

    Theorem 1. (Sure screening property) Under Conditions (1)–(6), if $\log p=O(n^{\alpha})$, $\log q=O(n^{\beta})$, $\alpha<1-2\tau-6\xi-18\kappa$, $\beta<1-2\tau-10\xi-(18\bar m+18)\kappa$ and $\alpha+\beta<1-2\tau_{\delta}-18\kappa$, then

    $$\begin{aligned}P\left((U,V)_D\subset(U,V)_{\hat D}\right)\ge 1&-O\left(p\exp\left(-b_1 n^{1-2\tau-6\xi-18\kappa}+(\xi+\kappa)\log n\right)\right)\\&-O\left(pq\exp\left(-b_2 n^{1-2\tau_{\delta}-18\kappa}+(\xi+2\kappa)\log n\right)\right)\\&-O\left(q\exp\left(-b_3 n^{1-2\tau-10\xi-(18\bar m+18)\kappa}+(\xi+(\bar m+1)\kappa)\log n\right)\right),\end{aligned}$$

    where $b_1$, $b_2$ and $b_3$ are constants. Therefore, GIMCSIS has the sure screening property.

    Remark 1. Studying feature screening with missing covariates in ultrahigh-dimensional data with a group structure makes better use of the information in covariates missing at random. Inspired by L. Ni et al. [23], we propose group feature screening for ultrahigh-dimensional data with categorical covariates missing at random (GIMCSIS). GIMCSIS expands the scope of IMCSIS and further improves the performance of classification learning. In terms of screening theory, compared with IMCSIS, GIMCSIS has a higher probability of retaining the important variables (see Theorem 1). Theorem 1 is also confirmed empirically in Section 3.

    To verify the feature screening performance of GIMCSIS, we generated a series of simulated datasets. Simulations 1 and 2 compare GIMCSIS with IMCSIS [23] from two perspectives: a binary response variable and a multiclass response variable. The computer configuration is as follows: CPU, Intel i5-3230M (2.6 GHz); memory, 16 GB; operating system, Windows 10. Feature screening was implemented in R version 4.2.2, using the RStudio interactive programming interface.

    The following indices are used to evaluate feature screening performance:

    MMS (minimum model size to include all active covariates): the rank of the last active covariate among all covariates. It is summarized by the 5%, 25%, 50%, 75% and 95% quantiles of the MMS over repeated experiments; the closer the MMS is to the number of active covariates, the better the screening result.

    CP (coverage probability): the proportion of active covariates covered within a given model size. Three model sizes, $[n/\log(n)]$, $2[n/\log(n)]$ and $3[n/\log(n)]$, are used, and the corresponding coverage probabilities are denoted CP1, CP2 and CP3; they illustrate the convergence performance of the method.

    CPa (all-coverage probability): the proportion of experiments in which all active covariates fall within the model size $3[n/\log(n)]$.
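These indices are straightforward to compute from a screening ranking. A minimal Python sketch under the definitions above (names are illustrative):

```python
def mms(ranking, active):
    """Minimum model size: 1-based position of the last active covariate."""
    return max(list(ranking).index(a) + 1 for a in active)

def coverage(ranking, active, size):
    """CP: fraction of active covariates among the top `size` of the ranking."""
    top = set(list(ranking)[:size])
    return sum(a in top for a in active) / len(active)

ranking = [3, 1, 4, 2, 0]   # covariate indices ordered by decreasing APC
active = [1, 4]
print(mms(ranking, active))          # -> 3
print(coverage(ranking, active, 2))  # -> 0.5
```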

    On the basis of the complete covariates, 40% of the covariates were designated as partially observed. We define a simple model in which all covariates are multiclass and the response variable is binary. The settings for the response variable $y_i$, the latent variable $z_i$ and the covariates $x_i$ are as follows:

     

    Response variable $y_i$:
    - Balanced data: $p_r=P(y_i=1)=P(y_i=2)=1/2$.
    - Unbalanced data: $p_r=2[1+(R-r)/(R-1)]/(3R)$ with $\max_{1\le r\le R}p_r=2\min_{1\le r\le R}p_r$.

    Latent variable $z_i$: $z_{i,k}\sim N(\mu_{rk},1)$, $1\le k\le p$, with
    - $\mu_{rk}=0$ for $k>d_0$;
    - $\mu_{rk}=0.5$ for $k\le d_0$ and $r=1$ (active covariate);
    - $\mu_{rk}=-0.5$ for $k\le d_0$ and $r=2$ (active covariate).

    Covariate $x_i$:
    - $x_{i,k}=I(z_{i,k}>z_{(1/2)})+1$ if $k$ is odd;
    - $x_{i,k}=I(z_{i,k}>z_{(1/5)})+1$ if $k$ is even.

    where $d_0$ is the number of active covariates, and the first to tenth covariates are active. In the IMCSIS method, the number of active covariates is $d_0=10$. In the GIMCSIS method, each group contains 3 variables, and the number of active groups is $d_0^G=4$. When $\mu_{rk}=0.5$ or $\mu_{rk}=-0.5$, the $k$th covariate is active. $z_{(\alpha)}$ is the $\alpha$th quantile of the standard normal distribution, and it is used to discretize the covariates $x_i$.

    Therefore, among the p covariates, half are balanced categorical variables and the other half are unbalanced categorical variables. The ratio of complete to partial covariates is 6:4; the first 60% of the covariates constitute the complete data, and the remainder constitute the missing data. The random missing proportions mp were set to 10%, 25% and 40%, and the missing indicator variable $\delta_{i,l}$ was generated from the Bernoulli distribution $\delta_{i,l}\sim B(1,1-mp)$. The covariate dimension was set to 1000, with 600 fully observed and 400 partially observed covariates, and the sample sizes were 100, 120 and 150 (see Table 1).
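The data-generating process above can be sketched as follows (Python rather than the paper's R; dimensions are scaled down for speed, and the 20% normal cut point $z_{(1/5)}\approx -0.8416$ is hardcoded):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p, d0, mp = 100, 50, 10, 0.25      # scaled down from the paper's p = 1000

# Balanced binary response: classes 1 and 2 with equal probability
y = rng.integers(1, 3, size=n)

# Latent Gaussians: the first d0 covariates get a +/-0.5 mean shift by class
mu = np.zeros((2, p))
mu[0, :d0], mu[1, :d0] = 0.5, -0.5
z = rng.standard_normal((n, p)) + mu[y - 1]

# Discretize: alternate columns split at the 50% and 20% normal quantiles
cut = np.where(np.arange(p) % 2 == 0, 0.0, -0.8416)
x = (z > cut).astype(int) + 1

# Last 40% of covariates are partially observed; delta = 1 means observed
q_cols = int(0.4 * p)
delta = rng.binomial(1, 1 - mp, size=(n, q_cols))
```

Screening methods are then compared on how early the first d0 columns appear in the resulting ranking.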

    Table 1.  Results for binary response variables and multi-classification covariates.
    mp Method CP1 CP2 CP3 CPa MMS(5%) MMS(25%) MMS(50%) MMS(75%) MMS(95%)
    n=100, P=1000, p=600, q=400, balanced
    10% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 5.00 6.00
    IMCSIS 0.85 0.90 0.90 0.04 55.90 73.25 99.00 119.25 140.05
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 5.00 5.55
    IMCSIS 0.79 0.92 0.95 0.56 21.90 30.00 41.50 61.75 105.40
    25% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 5.00 6.00
    IMCSIS 0.85 0.90 0.90 0.04 55.90 73.25 99.00 119.25 140.05
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 6.10
    IMCSIS 0.72 0.87 0.92 0.36 25.45 41.25 63.00 87.50 156.40
    40% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 5.00 6.00
    IMCSIS 0.85 0.90 0.90 0.04 55.90 73.25 99.00 119.25 140.05
    Step 2 GIMCSIS 0.99 1.00 1.00 1.00 4.00 4.00 4.00 5.00 13.65
    IMCSIS 0.62 0.78 0.85 0.10 44.80 61.00 102.50 136.00 187.75
    n=120, P=1000, p=600, q=400, balanced
    10% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 5.00
    IMCSIS 0.94 1.00 1.00 1.00 13.45 17.00 19.50 22.00 27.00
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 5.00 6.00 7.00
    IMCSIS 0.89 0.98 0.99 0.94 15.45 18.25 24.00 29.00 52.30
    25% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 5.00
    IMCSIS 0.94 1.00 1.00 1.00 13.45 17.00 19.50 22.00 27.00
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 5.75 8.55
    IMCSIS 0.80 0.94 0.98 0.80 18.90 27.25 35.00 49.75 81.65
    40% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 5.00
    IMCSIS 0.94 1.00 1.00 1.00 13.45 17.00 19.50 22.00 27.00
    Step 2 GIMCSIS 0.97 0.99 0.99 0.96 4.00 4.00 6.00 11.75 29.05
    IMCSIS 0.68 0.86 0.92 0.34 33.90 43.50 72.00 103.00 151.50
    n=150, P=1000, p=600, q=400, balanced
    10% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 5.00
    IMCSIS 1.00 1.00 1.00 1.00 10.00 10.00 10.50 11.00 12.00
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 5.00
    IMCSIS 0.99 1.00 1.00 1.00 12.00 14.00 15.50 18.00 23.55
    25% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 5.00
    IMCSIS 1.00 1.00 1.00 1.00 10.00 10.00 10.50 11.00 12.00
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 5.55
    IMCSIS 0.93 0.99 1.00 1.00 13.45 17.00 23.00 30.00 46.20
    40% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 5.00
    IMCSIS 1.00 1.00 1.00 1.00 10.00 10.00 10.50 11.00 12.00
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 5.00 9.00
    IMCSIS 0.84 0.95 0.98 0.78 18.45 25.00 34.50 57.25 124.90
    n=100, P=1000, p=600, q=400, unbalanced
    10% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 5.00 6.55
    IMCSIS 0.77 0.85 0.88 0.00 63.00 84.50 120.00 159.75 181.55
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 5.00 6.00 8.00
    IMCSIS 0.71 0.89 0.94 0.52 22.90 40.25 47.00 66.00 82.50
    25% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 5.00 6.55
    IMCSIS 0.77 0.85 0.88 0.00 63.00 84.50 120.00 159.75 181.55
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 5.00 6.55
    IMCSIS 0.65 0.82 0.89 0.28 29.35 46.00 68.50 94.50 210.25
    40% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 5.00 6.55
    IMCSIS 0.77 0.85 0.88 0.00 63.00 84.50 120.00 159.75 181.55
    Step 2 GIMCSIS 0.94 0.98 1.00 1.00 5.00 6.25 9.00 15.50 33.20
    IMCSIS 0.53 0.75 0.83 0.10 40.90 65.75 96.00 157.75 245.85
    n=120, P=1000, p=600, q=400, unbalanced
    10% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 5.00
    IMCSIS 0.99 1.00 1.00 1.00 11.00 13.00 14.00 15.75 19.55
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 5.00 6.00 8.00
    IMCSIS 0.82 0.97 0.99 0.94 20.00 23.00 28.50 36.50 62.45
    25% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 5.00
    IMCSIS 0.99 1.00 1.00 1.00 11.00 13.00 14.00 15.75 19.55
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 5.00 11.10
    IMCSIS 0.74 0.92 0.97 0.74 21.45 29.00 38.50 54.50 77.50
    40% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 5.00
    IMCSIS 0.99 1.00 1.00 1.00 11.00 13.00 14.00 15.75 19.55
    Step 2 GIMCSIS 0.96 0.98 0.99 0.96 4.00 5.25 8.00 17.00 38.65
    IMCSIS 0.61 0.81 0.89 0.24 37.70 55.50 72.00 94.00 206.60
    n=150, P=1000, p=600, q=400, unbalanced
    10% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.55
    IMCSIS 1.00 1.00 1.00 1.00 10.00 10.00 10.00 11.00 12.00
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.00
    IMCSIS 0.92 0.99 1.00 0.98 17.00 20.00 24.50 33.25 50.55
    25% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.55
    IMCSIS 1.00 1.00 1.00 1.00 10.00 10.00 10.00 11.00 12.00
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.00
    IMCSIS 0.86 0.97 0.99 0.86 15.45 22.25 31.50 44.25 148.40
    40% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.55
    IMCSIS 1.00 1.00 1.00 1.00 10.00 10.00 10.00 11.00 12.00
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 5.00 6.00
    IMCSIS 0.79 0.90 0.94 0.52 23.00 40.00 61.00 101.75 261.25


    Table 1 reports the values of each index in 50 experiments simulated with a normal distribution. The active covariate size based on the GIMCSIS model is 4, and the active covariate size based on the IMCSIS model is 10. In the first stage, regardless of the missing proportions, GIMCSIS outperforms IMCSIS. In the second stage, the covariate is affected by random missing data, with the following results:

    (1) Comparison of different sample sizes: As the sample size increases, the minimum model size (MMS) of GIMCSIS approaches the active covariate size d0G = 4, and that of IMCSIS approaches the active covariate size d0 = 10. However, GIMCSIS converges to the set of active covariates faster: its MMS quantiles are almost equal to 4 when n = 150, while the MMS quantiles of IMCSIS remain significantly larger than the active covariate size. In both models, the four coverage probability indicators tend to 1, but the coverage probability of IMCSIS is much weaker than that of GIMCSIS: GIMCSIS achieves a good coverage probability already at n = 100, whereas IMCSIS requires n = 150.

    (2) Comparison of different response variables: Both balanced and unbalanced response structures are considered in order to compare the anti-interference capacities of the methods. In general, the finite-sample performance under balanced data is better than that under unbalanced data. With unbalanced data, GIMCSIS already achieves an excellent MMS and coverage probability when n = 100, while IMCSIS only screens well when n = 150. Thus, GIMCSIS has a stronger anti-interference capacity than IMCSIS.

    (3) Comparison of different missing proportions: As the missing proportion increases, the MMS quantiles of both methods increase, but the variation for IMCSIS is much larger than that for GIMCSIS. Moreover, the coverage probability of IMCSIS decreases significantly, while that of GIMCSIS remains close to 1. These results indicate that the performance of IMCSIS deteriorates rapidly as the missing proportion grows, while that of GIMCSIS remains relatively stable.

    In summary, GIMCSIS screens active covariates better than IMCSIS in ultrahigh-dimensional data with binary response variables and covariates missing at random. The screening performance of IMCSIS decreases significantly in the second stage, and the smaller the sample size, the worse it becomes. GIMCSIS performs similarly in both stages and maintains a high coverage probability. With unbalanced data, the performance of both methods decreases, but GIMCSIS is significantly more robust.

    Simulation 2 considers a more complex model with more covariate categories and a four-class response variable. Two yi distributions (balanced and unbalanced) are considered, analogous to Simulation 1.

    The first to eleventh covariates are active. In the IMCSIS method, the active covariate indicator is d0 = 11; in the GIMCSIS method, with 3 variables per group, the active covariate indicator is d0G = 4, the same screening target as in Simulation 1, although the composition of the group variables differs. Given yi, the latent variable data and active covariates in Simulation 2 are generated in the same way as in Simulation 1.

    Therefore, the p-dimensional covariates are evenly divided into two-class and five-class categorical variables. The ratio of fully observed to partially observed covariates is again 6:4: the first 60% of the covariates are fully observed, and the remainder are partially observed. The random missing proportions mp are set to 10%, 25% and 40%, and the missing indicator variable δi,l is generated from the Bernoulli distribution δi,l ∼ B(1, 1 − mp). The covariate dimension is 1000, with 600 full covariates and 400 partial covariates, and the sample sizes are 50, 80 and 100 (see Table 2).

    Table 2.  Results for multivariate response variables and multi-classification covariates.
    mp Method CP1 CP2 CP3 CPa MMS(5%) MMS(25%) MMS(50%) MMS(75%) MMS(95%)
    n=50, P=1000, p=600, q=400, balanced
    10% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.00
    IMCSIS 0.82 1.00 1.00 1.00 11.00 11.00 11.00 11.00 11.00
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.55
    IMCSIS 0.65 0.87 0.94 0.60 13.00 17.25 26.00 35.00 64.55
    25% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.00
    IMCSIS 0.82 1.00 1.00 1.00 11.00 11.00 11.00 11.00 11.00
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 5.00 7.55
    IMCSIS 0.56 0.76 0.86 0.24 14.35 31.00 42.00 63.00 211.50
    40% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.00
    IMCSIS 0.82 1.00 1.00 1.00 11.00 11.00 11.00 11.00 11.00
    Step 2 GIMCSIS 0.78 0.90 0.91 0.66 4.00 8.25 12.00 46.00 103.50
    IMCSIS 0.46 0.63 0.72 0.04 32.45 54.25 86.50 151.00 259.10
    n=80, P=1000, p=600, q=400, balanced
    10% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.00
    IMCSIS 1.00 1.00 1.00 1.00 11.00 11.00 11.00 11.00 11.00
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.00
    IMCSIS 0.98 1.00 1.00 1.00 11.00 11.00 12.00 13.00 16.55
    25% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.00
    IMCSIS 1.00 1.00 1.00 1.00 11.00 11.00 11.00 11.00 11.00
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.00
    IMCSIS 0.93 0.99 1.00 1.00 11.00 11.25 13.00 19.00 27.00
    40% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.00
    IMCSIS 1.00 1.00 1.00 1.00 11.00 11.00 11.00 11.00 11.00
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 5.00
    IMCSIS 0.85 0.98 0.99 0.92 11.00 14.00 21.00 25.00 40.55
    n=100, P=1000, p=600, q=400, balanced
    10% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.00
    IMCSIS 1.00 1.00 1.00 1.00 11.00 11.00 11.00 11.00 11.00
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.00
    IMCSIS 1.00 1.00 1.00 1.00 11.00 11.00 11.00 12.00 12.00
    25% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.00
    IMCSIS 1.00 1.00 1.00 1.00 11.00 11.00 11.00 11.00 11.00
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.00
    IMCSIS 0.99 1.00 1.00 1.00 11.00 11.00 11.00 12.00 15.55
    40% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.00
    IMCSIS 1.00 1.00 1.00 1.00 11.00 11.00 11.00 11.00 11.00
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 4.00 4.00
    IMCSIS 0.99 1.00 1.00 1.00 11.00 12.00 13.00 15.00 20.55
    n=50, P=1000, p=600, q=400, unbalanced
    10% Step 1 GIMCSIS 0.88 1.00 1.00 1.00 5.45 7.00 9.00 11.00 14.55
    IMCSIS 0.35 0.53 0.60 0.00 218.90 277.25 360.00 411.50 455.75
    Step 2 GIMCSIS 0.87 0.99 1.00 1.00 5.00 7.00 9.50 12.75 17.55
    IMCSIS 0.34 0.41 0.47 0.00 177.75 226.50 268.00 348.75 387.95
    25% Step 1 GIMCSIS 0.88 1.00 1.00 1.00 5.45 7.00 9.00 11.00 14.55
    IMCSIS 0.35 0.53 0.60 0.00 218.90 277.25 360.00 411.50 455.75
    Step 2 GIMCSIS 0.63 0.87 0.93 0.78 7.45 12.00 17.00 25.00 82.65
    IMCSIS 0.29 0.36 0.42 0.00 187.00 248.00 286.00 323.50 379.85
    40% Step 1 GIMCSIS 0.88 1.00 1.00 1.00 5.45 7.00 9.00 11.00 14.55
    IMCSIS 0.35 0.53 0.60 0.00 218.90 277.25 360.00 411.50 455.75
    Step 2 GIMCSIS 0.13 0.28 0.49 0.02 34.00 50.00 75.00 141.00 246.80
    IMCSIS 0.26 0.31 0.36 0.00 153.00 254.25 300.50 359.75 394.10
    n=80, P=1000, p=600, q=400, unbalanced
    10% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 5.00 5.00 7.00
    IMCSIS 0.64 0.81 0.89 0.08 37.90 63.25 86.00 111.75 183.30
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 5.00 6.00
    IMCSIS 0.52 0.68 0.75 0.02 44.90 61.25 84.00 138.75 227.05
    25% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 5.00 5.00 7.00
    IMCSIS 0.64 0.81 0.89 0.08 37.90 63.25 86.00 111.75 183.30
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 5.00 7.00 11.55
    IMCSIS 0.44 0.61 0.72 0.02 49.25 90.50 147.00 218.00 321.05
    40% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 5.00 5.00 7.00
    IMCSIS 0.64 0.81 0.89 0.08 37.90 63.25 86.00 111.75 183.30
    Step 2 GIMCSIS 0.84 0.94 0.98 0.92 7.00 9.00 11.00 26.00 44.10
    IMCSIS 0.38 0.50 0.60 0.00 90.45 123.50 192.50 277.25 353.35
    n=100, P=1000, p=600, q=400, unbalanced
    10% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 5.00 6.55
    IMCSIS 0.75 0.82 0.88 0.02 59.15 79.00 100.50 126.75 186.75
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 5.00 6.55
    IMCSIS 0.68 0.85 0.93 0.36 27.45 42.00 53.50 65.00 111.10
    25% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 5.00 6.55
    IMCSIS 0.75 0.82 0.88 0.02 59.15 79.00 100.50 126.75 186.75
    Step 2 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 5.00 8.55
    IMCSIS 0.59 0.77 0.86 0.14 38.80 59.25 83.50 126.00 215.95
    40% Step 1 GIMCSIS 1.00 1.00 1.00 1.00 4.00 4.00 4.00 5.00 6.55
    IMCSIS 0.75 0.82 0.88 0.02 59.15 79.00 100.50 126.75 186.75
    Step 2 GIMCSIS 0.98 1.00 1.00 1.00 4.00 5.00 7.00 9.00 25.10
    IMCSIS 0.49 0.66 0.77 0.02 59.95 85.75 137.50 198.75 328.85


    Table 2 reports the values of each index in 50 experiments simulated with a normal distribution. The active covariate size based on the GIMCSIS model is 4, and the active covariate size based on the IMCSIS model is 11. In the first stage, GIMCSIS outperforms IMCSIS, regardless of the missing proportions. In the second stage, covariates are affected by random missing data, with the following results:

    (1) Comparison of different sample sizes: As the sample size increases, the MMS of GIMCSIS approaches the active covariate size d0G = 4, and the MMS of IMCSIS approaches the active covariate size d0 = 11. However, GIMCSIS converges to the set of active covariates faster: its MMS quantiles are almost equal to 4 when n = 100, while the MMS quantiles of IMCSIS remain significantly larger than the active covariate size. Compared with the binary-response simulation, the coverage probability of IMCSIS is much lower even as the four indicators converge to 1, and the screened active variables exhibit an obvious trailing phenomenon. Although GIMCSIS also exhibits a similar phenomenon, its four coverage probability indicators are almost 1 when n = 80.

    (2) Comparison of different response variables: Both balanced and unbalanced response structures are considered in order to compare the anti-interference capacities of the methods. In general, the finite-sample performance under balanced data is better than that under unbalanced data. With unbalanced data, GIMCSIS achieves a good MMS and coverage probability when n = 80, while the coverage probability of IMCSIS is far from adequate even when n = 100. Thus, GIMCSIS has a stronger anti-interference capacity than IMCSIS.

    (3) Comparison of different missing proportions: As the missing proportion increases, the MMS quantiles of both methods increase. The variation in MMS for IMCSIS is larger than that for GIMCSIS, and the 75% and 95% MMS quantiles of IMCSIS fall far outside the range of the active covariates. Moreover, the coverage probability of IMCSIS decreases significantly, while that of GIMCSIS remains close to 1. These results indicate that the performance of IMCSIS deteriorates rapidly as the missing proportion grows, while GIMCSIS remains relatively stable and obtains stable screening results more effectively.

    (4) Comparison with binary response variables: In Simulation 1, although IMCSIS performs worse than GIMCSIS, it achieves acceptable performance for large sample sizes. In Simulation 2, merely increasing the response from two to four categories prevents IMCSIS from achieving the screening goal. This shows that GIMCSIS has a distinct advantage in the more general multi-class response setting.

    In summary, GIMCSIS also exhibits better screening performance than IMCSIS for ultrahigh-dimensional data with a multi-class response and covariates missing at random. The screening performance of GIMCSIS remains robust; compared with IMCSIS, it has advantages with small sample sizes, unbalanced responses and high missing proportions.

    Section 3 illustrated the performance of GIMCSIS on simulated data. In practice, whether the important variables selected by GIMCSIS are useful in downstream data analysis requires further verification. In this section, we apply GIMCSIS as a preprocessing step for imbalanced data classification to test whether the selected variables improve classification performance.

    The empirical data were obtained from the Arizona State University feature selection database (http://featureselection.asu.edu/), which contains colon cancer data consisting of 62 instances and 2000 covariates. Forty samples are negative for colon cancer and 22 are positive, an imbalance ratio of 1.81:1. The 2000 covariates are gene expression levels, corresponding to the 2000 most differentially expressed of 6500 genes. Thus, the response is binary, the covariates are continuous, and there are group correlations among the gene expression levels.

    To evaluate the effectiveness of the feature screening methods for classification, the average values of the evaluation indices were computed via fivefold cross-validation. The 62 samples were randomly divided into training and test data at a ratio of 4:1: 80% of the samples (n = 50) were used for training and the remaining 12 for testing, with a covariate dimension of 2000 in both sets. The feature screening methods were GIMCSIS and IMCSIS, where the number of active covariates in univariate feature screening was d0 = 22 and the number of active group covariates in group feature screening was d0G = 8. To assess the effect of the screened covariates on classification, we chose three classification models: support vector machine (SVM) [24], decision tree (DT) [25] and k-nearest neighbor (KNN) [26].
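    A minimal sketch of this evaluation pipeline is given below. Synthetic data stand in for the colon cancer download, and a univariate F-test stands in for the chi-square-based screening; both are assumptions for illustration, since the actual analysis uses GIMCSIS/IMCSIS with d0 = 22.

```python
# Hypothetical screen-then-classify pipeline on a (62 x 2000) matrix.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(1)
X = rng.standard_normal((62, 2000))
y = np.array([0] * 40 + [1] * 22)        # 40 negative, 22 positive (1.81:1)
X[y == 1, :22] += 1.0                    # 22 informative covariates (assumption)

# Stand-in for the screening stage: keep the d0 = 22 top-ranked covariates.
screen = SelectKBest(f_classif, k=22)

accs = {"SVM": [], "DT": [], "KNN": []}
models = {"SVM": SVC(), "DT": DecisionTreeClassifier(random_state=0),
          "KNN": KNeighborsClassifier(n_neighbors=5)}
for tr, te in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    Xtr = screen.fit_transform(X[tr], y[tr])   # screen on training data only
    Xte = screen.transform(X[te])
    for name, m in models.items():
        accs[name].append(m.fit(Xtr, y[tr]).score(Xte, y[te]))
mean_acc = {k: float(np.mean(v)) for k, v in accs.items()}
print(mean_acc)
```

    Note that the screening statistic is refit inside each fold so that no test information leaks into the selection step.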

    The evaluation indices for the classification effect are derived from the confusion matrix, which is given below.

                                Actual
                        Positive    Negative    Total
    Prediction Positive   TP          FP        TP+FP
               Negative   FN          TN        FN+TN
    Total                TP+FN       FP+TN      TP+FP+FN+TN

    where TP, FP, FN and TN denote the numbers of true positives, false positives, false negatives and true negatives, respectively. Table 3 shows all of the evaluation indices based on the confusion matrix.

    Table 3.  Description of evaluation index.
    Index Description
    Accuracy Accuracy = (TP + TN) / (TP + FP + TN + FN)
    Precision Precision = TP / (TP + FP)
    Recall Recall = TP / (TP + FN)
    Specificity Specificity = TN / (TN + FP)
    G-mean G-mean = √(Recall × Specificity)
    F-measure F-measure = (2 × Precision × Recall) / (Precision + Recall)

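    The Table 3 indices can be computed directly from the four confusion-matrix counts. The helper below illustrates the formulas; the counts in the example call are hypothetical.

```python
# Computing the six evaluation indices from confusion-matrix counts.
import math

def evaluation_indices(tp, fp, fn, tn):
    """Return the evaluation indices derived from the confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                 # a.k.a. sensitivity
    specificity = tn / (tn + fp)
    g_mean = math.sqrt(recall * specificity)
    f_measure = 2 * precision * recall / (precision + recall)
    return {"Accuracy": accuracy, "Precision": precision, "Recall": recall,
            "Specificity": specificity, "G-mean": g_mean, "F-measure": f_measure}

# Example with hypothetical counts:
print(evaluation_indices(tp=18, fp=2, fn=4, tn=26))
```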

    Table 4 reports the classification performance of the feature screening methods on the training and test data. Overall, the classification performance of GIMCSIS was better than that of IMCSIS, with outstanding results in terms of recall, G-mean and F-measure. The best-performing classifier was KNN: the average accuracy based on GIMCSIS exceeded 98% on the training set and 88% on the test set. On the test data, the G-mean and F-measure based on GIMCSIS were superior to those based on IMCSIS. For the SVM, the average evaluation index for GIMCSIS exceeded that for IMCSIS by 1.64% on the training data and by 5.84% on the test data, indicating better classification performance for GIMCSIS. For DT, the average evaluation index for GIMCSIS was only 0.22% higher on the training data but 9.55% higher on the test data, indicating that IMCSIS exacerbated overfitting. For KNN, the average evaluation index for GIMCSIS was 3.14% higher on the training data and 1.65% higher on the test data, indicating that GIMCSIS alleviated the underfitting of the KNN model on these data to a certain extent.

    Table 4.  Colon cancer data analysis results.
    Screening method Response
    Accuracy Precision Recall Specificity G-mean F-measure
    Classification method SVM
    Train data IMCSIS 0.8961 0.9339 0.9025 0.8843 0.8932 0.9178
    GIMCSIS 0.9158 0.9385 0.9305 0.8885 0.9091 0.9344
    Test data IMCSIS 0.8048 0.8883 0.7850 0.8667 0.8168 0.8217
    GIMCSIS 0.8690 0.9050 0.8600 0.8917 0.8682 0.8746
    Classification method DT
    Train data IMCSIS 0.8670 0.8981 0.9111 0.7817 0.8339 0.8977
    GIMCSIS 0.8629 0.8954 0.8929 0.8057 0.8466 0.8931
    Test data IMCSIS 0.7282 0.8033 0.7770 0.6600 0.6956 0.7759
    GIMCSIS 0.7910 0.8583 0.8270 0.7600 0.7821 0.8367
    Classification method KNN
    Train data IMCSIS 0.9515 0.9691 0.9558 0.9418 0.9484 0.9621
    GIMCSIS 0.9878 0.9818 1.0000 0.9654 0.9824 0.9908
    Test data IMCSIS 0.8885 0.8992 0.9214 0.8300 0.8735 0.9097
    GIMCSIS 0.8872 0.9278 0.9083 0.8800 0.8884 0.9099


    Existing group feature screening methods mainly address continuous data, discrete response variables, discrete covariates and other settings, but feature screening with covariates missing at random had not been discussed. Considering the missingness of ultrahigh-dimensional data, this paper extends two-stage feature screening under a random missing mechanism to ultrahigh-dimensional data with a group structure and presents a two-stage group feature screening method with covariates missing at random. In the first stage, group feature screening based on adjusted Pearson chi-square statistics finds the fully observed covariates on which each missing indicator variable depends. In the second stage, the information in each partially observed covariate is supplemented by the fully observed covariates identified in the first stage, so that partially observed covariates that are dependent on the response can be found. Finally, the important features are selected by comparing the dependence between the covariates and the response variables. Compared with existing methods, GIMCSIS can efficiently extract important variables from ultrahigh-dimensional grouped data with covariates missing at random. In practice, the variables selected by GIMCSIS improve the classification performance for imbalanced data, which helps expand the toolkit for imbalanced data analysis.
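    A highly simplified sketch of this two-stage idea is shown below. It uses a plain Pearson chi-square statistic in place of the adjusted group statistic, keeps a single surrogate covariate per missing indicator, and ignores the group structure, so it illustrates the flow of the procedure rather than reproducing GIMCSIS itself; all function names are illustrative.

```python
# Two-stage screening sketch: stage 1 links missing indicators to full
# covariates; stage 2 ranks partial covariates by dependence with y.
import numpy as np

def chi2_stat(a, b):
    """Pearson chi-square statistic between two categorical vectors."""
    n = len(a)
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)
    expected = np.outer(table.sum(1), table.sum(0)) / n
    mask = expected > 0
    return float(((table - expected)[mask] ** 2 / expected[mask]).sum())

def two_stage_screen(U, V, delta, y, n_keep):
    """U: full covariates, V: partial covariates, delta: missing indicators
    (0 = missing), y: categorical response. Returns indices of the n_keep
    top-ranked partial covariates."""
    q = V.shape[1]
    scores = np.empty(q)
    for l in range(q):
        # Stage 1: full covariate most dependent on the missing indicator.
        dep = [chi2_stat(U[:, k], delta[:, l]) for k in range(U.shape[1])]
        surrogate = U[:, int(np.argmax(dep))]
        # Stage 2: dependence on y of the observed part of V_l combined
        # with the stage-1 surrogate.
        obs = delta[:, l] == 1
        combined = (V[obs, l] * (surrogate.max() + 1) + surrogate[obs]).astype(int)
        scores[l] = chi2_stat(combined, y[obs])
    return np.argsort(scores)[::-1][:n_keep]
```

    On synthetic data where one partial covariate coincides with the response, this sketch ranks that covariate first, which is the qualitative behavior the sure screening property guarantees for the full method.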

    Specifically, GIMCSIS requires no model assumptions and satisfies the sure screening property. In our numerical simulations, the finite-sample performance of GIMCSIS is better than that of IMCSIS, for both binary and multi-class response variables. The computational complexity of GIMCSIS is similar to that of IMCSIS, with similar simulation run times for the same sample sizes. In the empirical analysis, we applied GIMCSIS to classification models to improve the classification of ultrahigh-dimensional data with randomly missing covariates. The results show that GIMCSIS identifies more of the important covariates and yields better classification performance than IMCSIS.

    Group feature screening of ultrahigh-dimensional data under other missing data mechanisms remains to be discussed. In addition, group feature screening when both the response variables and the covariates are missing is a more challenging problem. For empirical research, discretizing continuous data to meet the requirements of discrete feature screening is key to popularizing group feature screening for ultrahigh-dimensional discrete data.

    We have not used Artificial Intelligence (AI) tools in the creation of this article.

    The authors are grateful to the editor and anonymous referee for their constructive comments that led to significant improvements in the paper.

    The National Natural Science Foundation of China [grant number 71963008] funded this research.

    The authors declare that there is no conflict of interest in the publication of this paper.

    Lemma 1. Similar to the derivation of Lemma 1 in [23], for the categorical response Y and the categorical and fully observed covariate Uk, under Condition (1), we have

    $$P\left(\left|\widehat{\mathrm{APC}}_g(Y,U_k)-\mathrm{APC}_g(Y,U_k)\right|>\varepsilon\right)\le O(RJ^3)\exp\left\{-e_1 n\varepsilon^2 R^{-6}J^{-18}\right\} \quad (S1)$$

    where e1 is a constant.

    Corollary 1. For the missing indicator variable δl and the fully observed discrete covariate Uk, under Conditions (1) and (3), we have

    $$P\left(\left|\widehat{\mathrm{APC}}_g(\delta_l,U_k)-\mathrm{APC}_g(\delta_l,U_k)\right|>\varepsilon\right)\le O(J^3)\exp\left\{-e_2 n\varepsilon^2 J^{-18}\right\} \quad (S2)$$

    where e2 is a constant.

    Lemma 2. Under Conditions (1), (3), (4) and (5), we have the following two inequalities

    $$P\left(M_{lg}\subseteq\widehat{M}_{lg}\right)\ge 1-O(G_1J^3)\exp\left\{-e_3 n^{1-2\tau_{\delta_l}}J^{-18}\right\} \quad (S3)$$
    $$P\left(\widehat{M}_{lg}\subseteq\overline{M}_{lg}\right)\ge 1-O(G_1J^3)\exp\left\{-e_3 n^{1-2\tau_{\delta_l}}J^{-18}\right\} \quad (S4)$$

    where e3 is a positive constant.

    Proof. $\widehat{M}_{lg}$ and $\overline{M}_{lg}$ have been defined in Section 2.1 and Condition (5), respectively:

    $$\widehat{M}_{lg}=\left\{k:\widehat{\mathrm{APC}}_g(\delta_l,U_k)>c_{\delta_l}n^{-\tau_{\delta_l}},\ 1\le k\le G_1\right\}$$
    $$\overline{M}_{lg}=\left\{k:\mathrm{APC}_g(\delta_l,U_k)>\tfrac{1}{2}c_{\delta_l}n^{-\tau_{\delta_l}},\ 1\le k\le G_1\right\}.$$

    Under Conditions (1), (3) and (4), it is easy to obtain the following:

    $$\begin{aligned}P\left(M_{lg}\subseteq\widehat{M}_{lg}\right)&\ge P\left(\left|\widehat{\mathrm{APC}}_g(\delta_l,U_k)-\mathrm{APC}_g(\delta_l,U_k)\right|\le\tfrac{1}{2}c_{\delta_l}n^{-\tau_{\delta_l}},\ \forall\, U_k\in M_{lg}\right)\\&\ge P\left(\left|\widehat{\mathrm{APC}}_g(\delta_l,U_k)-\mathrm{APC}_g(\delta_l,U_k)\right|\le\tfrac{1}{2}c_{\delta_l}n^{-\tau_{\delta_l}},\ \forall\,1\le k\le G_1\right)\\&\ge 1-\sum_{k=1}^{G_1}P\left(\left|\widehat{\mathrm{APC}}_g(\delta_l,U_k)-\mathrm{APC}_g(\delta_l,U_k)\right|>\tfrac{1}{2}c_{\delta_l}n^{-\tau_{\delta_l}}\right)\\&\ge 1-O(G_1J^3)\exp\left\{-e_3n^{1-2\tau_{\delta_l}}J^{-18}\right\},\end{aligned}$$

    and, by the same argument,

    $$P\left(\widehat{M}_{lg}\subseteq\overline{M}_{lg}\right)\ge 1-\sum_{k=1}^{G_1}P\left(\left|\widehat{\mathrm{APC}}_g(\delta_l,U_k)-\mathrm{APC}_g(\delta_l,U_k)\right|>\tfrac{1}{2}c_{\delta_l}n^{-\tau_{\delta_l}}\right)\ge 1-O(G_1J^3)\exp\left\{-e_3n^{1-2\tau_{\delta_l}}J^{-18}\right\}.$$

    Lemma 3. For the discrete response variable Y and the random missing covariate Vl, under Conditions (1), (3), (4) and (5), we have

    $$P\left(\left|\hat p_{j_l r}-p_{j_l r}\right|>t\right)\le O(G_1J^3)\exp\left\{-e_3n^{1-2\tau_{\delta_l}}J^{-18}\right\}+O\left(J^{3\bar m}\right)\exp\left\{-e_5nt^2R^{-4}J^{-12\bar m}\right\} \quad (S5)$$

    where e3 and e5 are positive constants.

    Proof. Section 2.2 gives the joint probability of group data with randomly missing data:

    $$\hat p_{j_l r}=\frac{1}{n}\sum_{u}\sum_{i=1}^{n}I\left(y_i=r,\ u_i^{\widehat{M}_{lg}}=u\right)\cdot\frac{\sum_{i=1}^{n}I\left(v_{l1}=j_1,\dots,v_{lp_l}=j_{p_l},\ y_i=r,\ u_i^{\widehat{M}_{lg}}=u,\ \delta_{i,l}=1\right)}{\sum_{i=1}^{n}I\left(y_i=r,\ u_i^{\widehat{M}_{lg}}=u,\ \delta_{i,l}=1\right)}.$$

    Then it is easy to get

    $$P\left(\left|\hat p_{j_l r}-p_{j_l r}\right|>t\right)\le P\left(\left|\hat p_{j_l r}-p_{j_l r}\right|>t\ \middle|\ M_{lg}\subseteq\widehat{M}_{lg}\subseteq\overline{M}_{lg}\right)+P\left(M_{lg}\nsubseteq\widehat{M}_{lg}\right)+P\left(\widehat{M}_{lg}\nsubseteq\overline{M}_{lg}\right).$$

    By Lemma 2, neither of the last two terms above exceeds $O(G_1J^3)\exp\{-e_3n^{1-2\tau_{\delta_l}}J^{-18}\}$, so only the first term needs to be bounded. For ease of presentation, we introduce the following notation:

    Let $\phi_{r,u}=P\left(Y=r,\ U^{\widehat{M}_{lg}}=u\right)$, $\varphi_{r,u}=P\left(Y=r,\ U^{\widehat{M}_{lg}}=u,\ \delta_l=1\right)$ and $\gamma_{j_l,r,u}=P\left(V_{l1}=j_1,\dots,V_{lp_l}=j_{p_l},\ Y=r,\ U^{\widehat{M}_{lg}}=u,\ \delta_l=1\right)$.

    The corresponding estimators are as follows:

    $$\hat\phi_{r,u}=n^{-1}\sum_{i=1}^{n}I\left(y_i=r,\ u_i^{\widehat{M}_{lg}}=u\right)$$
    $$\hat\varphi_{r,u}=n^{-1}\sum_{i=1}^{n}I\left(y_i=r,\ u_i^{\widehat{M}_{lg}}=u,\ \delta_{i,l}=1\right)$$
    $$\hat\gamma_{j_l,r,u}=n^{-1}\sum_{i=1}^{n}I\left(v_{l1}=j_1,\dots,v_{lp_l}=j_{p_l},\ y_i=r,\ u_i^{\widehat{M}_{lg}}=u,\ \delta_{i,l}=1\right).$$

    Because $M_{lg}\subseteq\widehat{M}_{lg}\subseteq\overline{M}_{lg}$, we have

    $$\begin{aligned}\hat p_{j_l r}-p_{j_l r}&=\sum_{u}\frac{\hat\phi_{r,u}\hat\gamma_{j_l,r,u}}{\hat\varphi_{r,u}}-\sum_{u}\frac{\phi_{r,u}\gamma_{j_l,r,u}}{\varphi_{r,u}}\\&=\sum_{u}\frac{\hat\gamma_{j_l,r,u}}{\hat\varphi_{r,u}}\left(\hat\phi_{r,u}-\phi_{r,u}\right)+\sum_{u}\phi_{r,u}\hat\gamma_{j_l,r,u}\left(\frac{1}{\hat\varphi_{r,u}}-\frac{1}{\varphi_{r,u}}\right)+\sum_{u}\frac{\phi_{r,u}}{\varphi_{r,u}}\left(\hat\gamma_{j_l,r,u}-\gamma_{j_l,r,u}\right)\\&\le\sum_{u}\left(\hat\phi_{r,u}-\phi_{r,u}\right)+\sum_{u}\left(\frac{1}{\hat\varphi_{r,u}}-\frac{1}{\varphi_{r,u}}\right)+\sum_{u}\frac{2RJ^{3\bar m}}{e_4}\left(\hat\gamma_{j_l,r,u}-\gamma_{j_l,r,u}\right)=:I_{41}+I_{42}+I_{43}.\end{aligned}$$

    For ease of writing, the conditioning event is suppressed below: $P_M(\cdot)$ is used instead of $P\left(\cdot\mid M_{lg}\subseteq\widehat{M}_{lg}\subseteq\overline{M}_{lg}\right)$; hence

    $$P_M\left(\left|\hat p_{j_l r}-p_{j_l r}\right|>t\right)\le P_M\left(I_{41}>t\right)+P_M\left(I_{42}>t\right)+P_M\left(I_{43}>t\right).$$

    Considering I41,

    $$P_M\left(I_{41}>t\right)\le P_M\left(\sum_{u}\left|\hat\phi_{r,u}-\phi_{r,u}\right|>t\right)\le\sum_{u}P_M\left(\left|\hat\phi_{r,u}-\phi_{r,u}\right|>\frac{t}{3J^{3\bar m}}\right)\le 2J^{3\bar m}\exp\left\{-\frac{6n\left(\frac{t}{3J^{3\bar m}}\right)^2}{3+4\left(\frac{t}{3J^{3\bar m}}\right)}\right\}.$$

    Similarly,

    $$P_M\left(I_{42}>t\right)\le 2J^{3\bar m}\exp\left\{-\frac{6n\left(\frac{t}{3J^{3\bar m}}\right)^2}{3+4\left(\frac{t}{3J^{3\bar m}}\right)}\right\}$$
    $$P_M\left(I_{43}>t\right)\le 2J^{3\bar m}\exp\left\{-\frac{6n\left(\frac{e_4^2t}{48R^2J^{6\bar m}}\right)^2}{3+4\left(\frac{e_4^2t}{48R^2J^{6\bar m}}\right)}\right\}+2J^{3\bar m}\exp\left\{-\frac{6n\left(\frac{e_4}{4RJ^{3\bar m}}\right)^2}{3+4\left(\frac{e_4}{4RJ^{3\bar m}}\right)}\right\}.$$

    Hence,

    $$P\left(\left|\hat p_{j_l r}-p_{j_l r}\right|>t\right)\le O(G_1J^3)\exp\left\{-e_3n^{1-2\tau_{\delta_l}}J^{-18}\right\}+O\left(J^{3\bar m}\right)\exp\left\{-e_5nt^2R^{-4}J^{-12\bar m}\right\}$$

    where e3 and e5 are constants.

    Corollary 2. For the discrete response variable Y and the random missing covariate Vl, under Conditions (1), (3), (4) and (5), we have

    $$P\left(\left|\hat w_{j_l}-w_{j_l}\right|>t\right)\le O(G_2RJ^3)\exp\left\{-e_3n^{1-2\tau_{\delta_l}}J^{-18}\right\}+O\left(RJ^{3\bar m}\right)\exp\left\{-e_5nt^2R^{-6}J^{-12\bar m}\right\} \quad (S6)$$

    where e3 and e5 are constants.

    Proof. Section 2.2 gives $\hat w_{j_l}=\sum_{r=1}^{R}\hat p_{j_l r}$. Using the result of Lemma 3, we can see that

    $$\begin{aligned}P\left(\left|\hat w_{j_l}-w_{j_l}\right|>t\right)&=P\left(\left|\sum_{r=1}^{R}\left(\hat p_{j_l r}-p_{j_l r}\right)\right|>t\right)\le P\left(\sum_{r=1}^{R}\left|\hat p_{j_l r}-p_{j_l r}\right|>t\right)\le\sum_{r=1}^{R}P\left(\left|\hat p_{j_l r}-p_{j_l r}\right|>\frac{t}{R}\right)\\&\le O(G_2RJ^3)\exp\left\{-e_3n^{1-2\tau_{\delta_l}}J^{-18}\right\}+O\left(RJ^{3\bar m}\right)\exp\left\{-e_5nt^2R^{-6}J^{-12\bar m}\right\}.\end{aligned}$$

    Lemma 4. For the discrete response variable Y and the random missing covariate Vl, under Conditions (1), (3), (4) and (5), we have

    $$P\left(\left|\widehat{\mathrm{APC}}_g(Y,V_l)-\mathrm{APC}_g(Y,V_l)\right|>\varepsilon\right)\le O(G_2RJ^6)\exp\left\{-e_3n^{1-2\tau_{\delta_l}}J^{-18}\right\}+O\left(RJ^{3(\bar m+1)}\right)\exp\left\{-e_5n\varepsilon^2R^{-10}J^{-18(\bar m+1)}\right\} \quad (S7)$$

    where e3 and e5 are positive constants.

    Proof. The proof process is similar to Lemma 1, so it is omitted here.

    Theorem 1. Under Conditions (1)–(6), we have

    $$\begin{aligned}P\left((U,V)_{D}\subseteq(U,V)_{\widehat D}\right)\ge\ &1-O\left(p\exp\left(-b_1n^{1-2\tau-6\xi-18\kappa}+(\xi+\kappa)\log n\right)\right)\\&-O\left(pq\exp\left(-b_2n^{1-2\tau_\delta-18\kappa}+(\xi+2\kappa)\log n\right)\right)\\&-O\left(q\exp\left(-b_3n^{1-2\tau-10\xi-(18\bar m+18)\kappa}+(\xi+(\bar m+1)\kappa)\log n\right)\right)\end{aligned} \quad (S8)$$

    where $b_1$, $b_2$ and $b_3$ are constants. If $\log p=O(n^{\alpha})$, $\log q=O(n^{\beta})$, $\alpha<1-2\tau-6\xi-18\kappa$, $\beta<1-2\tau-10\xi-(18\bar m+18)\kappa$ and $\alpha+\beta<1-2\tau_\delta-18\kappa$, then GIMCSIS has the sure screening property.

    Proof. Define four covariate sets as follows:

    $$U_D=(U,V)_D\cap\{u_1,\dots,u_{p_g}\};\quad V_D=(U,V)_D\cap\{v_1,\dots,v_{q_g}\}$$
    $$U_{\widehat D}=(U,V)_{\widehat D}\cap\{u_1,\dots,u_{p_g}\};\quad V_{\widehat D}=(U,V)_{\widehat D}\cap\{v_1,\dots,v_{q_g}\}.$$

    It is obvious that $(U,V)_D=U_D\cup V_D$ and $(U,V)_{\widehat D}=U_{\widehat D}\cup V_{\widehat D}$.

    According to Lemmas 1 and 4, we have

    $$\begin{aligned}P\left((U,V)_D\subseteq(U,V)_{\widehat D}\right)&=P\left(\left(U_D\cup V_D\right)\subseteq\left(U_{\widehat D}\cup V_{\widehat D}\right)\right)\ge P\left(\left(U_D\subseteq U_{\widehat D}\right)\cap\left(V_D\subseteq V_{\widehat D}\right)\right)\\&\ge P\left(\left\{\left|\widehat{\mathrm{APC}}_g(Y,U_k)-\mathrm{APC}_g(Y,U_k)\right|\le cn^{-\tau},\ \forall\, U_k\in U_D\right\}\right.\\&\qquad\left.\cap\left\{\left|\widehat{\mathrm{APC}}_g(Y,V_l)-\mathrm{APC}_g(Y,V_l)\right|\le cn^{-\tau},\ \forall\, V_l\in V_D\right\}\right)\\&\ge 1-\sum_{k=1}^{p}P\left(\left|\widehat{\mathrm{APC}}_g(Y,U_k)-\mathrm{APC}_g(Y,U_k)\right|>cn^{-\tau}\right)-\sum_{l=1}^{q}P\left(\left|\widehat{\mathrm{APC}}_g(Y,V_l)-\mathrm{APC}_g(Y,V_l)\right|>cn^{-\tau}\right)\\&\ge 1-p\,O(RJ^3)\exp\left\{-e_1c^2n^{1-2\tau}R^{-6}J^{-18}\right\}-q\,O(G_2RJ^6)\exp\left\{-e_3n^{1-2\tau_{\delta_l}}J^{-18}\right\}\\&\qquad-q\,O\left(RJ^{3(\bar m+1)}\right)\exp\left\{-e_5c^2n^{1-2\tau}R^{-10}J^{-18(\bar m+1)}\right\}\\&\ge 1-O\left(p\exp\left(-b_1n^{1-2\tau-6\xi-18\kappa}+(\xi+\kappa)\log n\right)\right)-O\left(pq\exp\left(-b_2n^{1-2\tau_\delta-18\kappa}+(\xi+2\kappa)\log n\right)\right)\\&\qquad-O\left(q\exp\left(-b_3n^{1-2\tau-10\xi-(18\bar m+18)\kappa}+(\xi+(\bar m+1)\kappa)\log n\right)\right)\end{aligned}$$

    where $\tau_\delta=\max_{1\le l\le G_2}\tau_{\delta_l}$, and $b_1$, $b_2$ and $b_3$ are constants.



  • © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)