A Review on Low-Rank Models in Data Analysis

Zhouchen Lin

doi:10.3934/bdia.2016001

Big Data and Information Analytics

2016, Volume 1, Issue 2: 139-161. doi: 10.3934/bdia.2016001



Zhouchen Lin

1. Key Lab. of Machine Perception (MOE), School of EECS, Peking University, Beijing, China;
2. Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, Shanghai, China

Received: 01 October 2015 Revised: 01 December 2016 Published: 01 July 2016

We are now in the era of big data. The high dimensionality of data poses a major challenge to processing them effectively and efficiently. Fortunately, in practice data are not unstructured: their samples usually lie around low-dimensional manifolds and are highly correlated with one another. Such characteristics can be effectively captured by low rankness. As an extension of the sparsity of first-order data, such as voices, low rankness is also an effective measure of the sparsity of second-order data, such as images. In this paper, I review the representative theories, algorithms, and applications of low-rank subspace recovery models in data processing.


Citation: Zhouchen Lin. A Review on Low-Rank Models in Data Analysis[J]. Big Data and Information Analytics, 2016, 1(2): 139-161. doi: 10.3934/bdia.2016001



1. Introduction

Zika virus (ZIKV) is a member of the family Flaviviridae, and is the viral cause of Zika fever, which, when symptomatic, involves only mild symptoms including fever, red eyes, joint pain, headache, and a maculopapular rash; Zika resembles a mild form of dengue fever ^[1]. The virus is named for the Zika Forest, in Uganda, where the virus was first isolated in 1947 from the serum of a rhesus monkey ^[2]. Aedes species, especially Aedes aegypti and Aedes albopictus, serve as vectors for ZIKV, allowing transmission between Aedes and humans. The transmission is autochthonous in many South American countries ^[3,4,5]. Historically, human ZIKV infection is less common than dengue fever, but international spread accelerated the global epidemic, and epidemics were observed in 2007 in Micronesia, in 2013-14 in French Polynesia, and, since 2015, in Brazil and across the world ^[6,7,8,9]. Following the detection of a large number of cases in Brazil in late 2015, an excessive number of microcephaly cases was reported approximately 30 weeks later in 2016, in northeastern Brazil, which attracted global attention to this virus ^[10,11,12]. Although ZIKV infections tend to be mild, infected pregnant women are at risk of delivering babies with microcephaly or other congenital disorders. Additionally, the infection can result in Guillain-Barré syndrome in adults ^[13,14].

The 2015 global epidemic was initially concentrated in northeastern Brazil; however, the infection rapidly spread throughout Latin America and the Caribbean and subsequently to many countries around the world ^[15,16,17]. More than 45 American countries reported ZIKV importation ^[18,19], and, of these, all countries except Canada and the US reported ZIKV importation after the Brazil epidemic. While many European countries have reported ZIKV importation from Brazil since the first imported case was reported in 2015, only a few African countries reported importation after the 2015 Brazil epidemic ^[20]. There are two possible explanations for the timing and the presence of imported cases in a given country, including why some countries did not experience importation. First, due to a country's effective distance from Brazil or any other epidemic locations, importation may not actually occur. Second, due to limited laboratory capacity, imported cases could have gone undetected, resulting in reporting delays and underreporting. The status and quality of ZIKV diagnostic capacity are known to be highly variable, even among European countries ^[21], to say nothing of the limitations healthcare workers navigate in low-income countries.

Previous mathematical models were developed to describe vector-borne disease transmission dynamics ^[22,23], and, after the 2015 Brazil epidemic, mathematical models were applied to ZIKV to describe both local ^[24,25,26,27] and international transmission ^[28,29,30]. Both deterministic and stochastic models for ZIKV were formulated to fit data from Colombia, El Salvador, and Suriname in ^[5]. Massad et al. ^[31] identified high-risk countries in Europe by using mathematical models to estimate the importation risk from Brazil according to important factors, such as travel volume. In fact, there is an associated question with respect to the predictability of importation time for ZIKV: real-time forecasting for emerging infectious disease epidemics can be achieved by employing meta-population models ^[32,33], and, as a possible shortcut, the so-called "effective distance," which can be computed from airline network data, has been proposed ^[34,35]. Use of effective distance indicates that the relative arrival time for imported cases is determined by the path length and adjacency matrix from the origin of the epidemic to the destination. However, in the case of ZIKV, effective distance alone is not necessarily an adequate predictor. Figure 1A shows arrival time as a function of the effective distance from Brazil. While a positive correlation is certainly observed, the resulting importation times were highly variable. Such variation was not identified during the severe acute respiratory syndrome (SARS) epidemic or the H1N1 influenza pandemic of 2009 ^[34]. Figure 1B classifies countries into three different groups by gross domestic product (GDP) per capita. It seems that the variation in arrival time and the deviation from the linear predictor in Figure 1 are partly explained by GDP or other variables that influence the underlying diagnostic capacity in a given country.

Figure 1. The relationship between effective distance from Brazil to importing country and the first reported importation time as a function of weeks since Brazil reported its first ZIKV case. Linear regression lines passing through the origin, i.e., counting from the reporting week in Brazil as time zero, are shown. A. Original data. B. Countries classified into 3 groups, A, B, and C, based on tertiles of gross domestic product per capita—high, intermediate, and low, respectively.


To explore the mechanisms of ZIKV importation, we argued that it is crucial to account for possible inter-country variations in testing, diagnostics, and reporting ZIKV. In this study, we statistically estimated the actual arrival time of ZIKV importation in each country using airline network data, while adjusting for reporting delays. Describing the data generating process of imported cases, we mathematically explored mechanisms behind observed variations in ZIKV arrival time.

2. Materials and method

2.1. Epidemiological data

The time from Brazil to "reported" importation of ZIKV is defined as the time interval from the first notification of ZIKV infection in Brazil to the time at which each country reported the first ZIKV-infected case. The observed (reported) arrival time was collected from publicly available data sources for a total of 219 countries. Following a recent similar study that was conducted elsewhere ^[36], we updated our arrival time data. As of September 25, 2018, 110 countries reported ZIKV importation (see below for non-imported countries). Of these, we excluded 40 countries that experienced importation before the 2015 Brazil epidemic. In addition to arrival time data, airline transportation data were obtained from the OpenFlights database ^[37], which uses the Global Flights Network (2016) that includes 230 airports and 4600 flight routes. We selected datasets that can be considered to reflect the diagnostic and reporting capacity of affected countries ^[38,39]. Specifically, we collected datasets of GDP per capita and several types of health expenditure metrics, including government expenditure on health, per capita government expenditure on health, and private expenditure on health. Moreover, we categorized countries into geographic regions using two different classification systems: first, by World Health Organization (WHO) region, and, second, by continent. Other variables that we examined included religion (Christian, Muslim, and others) and language (English, Spanish, and others). See Otsuki and Nishiura ^[40] for data collection methods for those explanatory variables.

2.2. Importation risk

To determine actual ZIKV arrival times, we constructed a mathematical model that convolves two functions (Figure 2). For the first function, we modeled the time from Brazil to actual importation, in which we exploited the relationship between arrival time and effective distance. We derived the effective distance, which was introduced by Brockmann and Helbing ^[34], from the abovementioned flight network, and we calculated it as the minimum, over all possible paths from the origin country (i.e., Brazil) to each destination country $i$ , of the path length minus the logarithm of the product of transition probabilities along the path. A key feature of effective distance is that, for emerging infectious diseases, it has shown a very strong correlation with arrival time, demonstrating itself to be an excellent predictor of arrival time. The validity of the predictive performance of effective distance has been described elsewhere ^[34]; it has been used in studies exploring data from the 2003 SARS epidemic and the influenza A(H1N1-2009) pandemic, and it has been widely used to analyze a variety of global infectious disease pandemics ^[36,40,41]. Let $m_i$ be the effective distance from Brazil to country $i = 1, 2, \cdots, 170$ . Due to the linear relationship between arrival time and effective distance, the hazard function of importation for country $i$ is modeled as an inverse of the effective distance:

$\begin{equation} \lambda_{i} = \frac{k}{m_i} \end{equation}$

(2.1)

Figure 2. Two functions were defined to account for the actual arrival time and reporting delay from actual importation in Brazil: $(i)$ the time to importation from Brazil and $(ii)$ the time from importation to reporting. We set $t = 0$ as April 1, 2015, when Brazil officially announced the importation. For Brazil, we manually set the reporting delay at $t_0 = -48$ weeks according to molecular clock analysis ^[18]. For the 70 countries that reported ZIKV importation after the 2015 Brazil epidemic, we count the number of weeks since April 2015. Countries that never reported ZIKV importation were handled as censored data, and we defined the end of 2017, i.e., week 144, as the censoring week.


where $k$ is a constant parameter (to be estimated below). The country-specific hazard function yields the probability density function for ZIKV importation from Brazil to a country $i$ at time $t$ , written as

$\begin{equation} f_i(t,t_0;k) = \lambda_i e^{-\lambda_i (t-t_0)} \end{equation}$

(2.2)

where $t_0$ is the time at which the global epidemic started at the origin, i.e., Brazil. In this study, we set $t_0$ as 48 weeks prior to the date that Brazil reported their first ZIKV importation, referring to Faria et al.'s molecular clock analysis ^[18].
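As an illustration of Section 2.2, the following minimal Python sketch computes a shortest-path effective distance and then evaluates Eqs. (2.1) and (2.2). It assumes the Brockmann-Helbing edge length $1 - \ln P_{mn}$, which is consistent with the verbal definition above but is not spelled out in the text; the toy network, the node names, and the function names are hypothetical and do not come from the study data.

```python
import heapq
import math

def effective_distances(P, origin):
    """Shortest-path effective distance from `origin` to each reachable node.

    P[m][n] is the transition probability from m to n (0 < P <= 1).
    Assumption: each edge has length 1 - ln P[m][n], so a path with L legs
    costs L minus the log of the product of its transition probabilities.
    """
    dist = {origin: 0.0}
    heap = [(0.0, origin)]
    while heap:
        d, m = heapq.heappop(heap)
        if d > dist.get(m, math.inf):
            continue  # stale queue entry, a shorter path was already found
        for n, p in P.get(m, {}).items():
            nd = d + 1.0 - math.log(p)
            if nd < dist.get(n, math.inf):
                dist[n] = nd
                heapq.heappush(heap, (nd, n))
    return dist

def hazard(k, m_i):
    """Eq. (2.1): importation hazard, inversely proportional to distance."""
    return k / m_i

def importation_density(t, t0, k, m_i):
    """Eq. (2.2): exponential density of the importation time, from t0 on."""
    if t < t0:
        return 0.0
    lam = hazard(k, m_i)
    return lam * math.exp(-lam * (t - t0))

# Hypothetical three-node network: the two-leg route to Y beats the direct one.
P = {"BRA": {"X": 0.5, "Y": 0.1}, "X": {"Y": 0.8}}
d = effective_distances(P, "BRA")
density_at_start = importation_density(-48.0, -48.0, k=0.2, m_i=2.0)
```

Dijkstra's algorithm applies here because every edge length is at least 1 and hence non-negative. In the toy network the indirect route BRA to X to Y is shorter than the direct low-probability link, which is the kind of effect the shortest-path definition is meant to capture.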

2.3. Time from importation to reporting

For the second function, we modeled the time from importation to reporting, accounting for reporting delays that differ by country (Figure 2). The country-specific mean reporting delay $\mu_{i}$ was modeled with a linear predictor using a combination of two variables: (1) financial capacity for diagnosis and reporting ( $x_{1}$ ), which was modeled by either GDP per capita or health expenditure (per capita government expenditure on health), and (2) region ( $x_{2}$ ) according to the WHO classification. For $x_{1}$ , we divided 170 countries into 3 groups, i.e., $G_1 = \left\lbrace A, B, \text{and} \ C \right\rbrace$ , based on tertiles of high, intermediate, and low, respectively. Three-group discretization was adopted because we did not observe GDP per capita as having precise predictor performance when handled as a continuous variable. Additionally, we classified countries into three groups based on WHO region for the variable $x_2$ : region of the Americas (AMR), African region (AFR), and countries which belong to neither AMR nor AFR ( $Others$ ), i.e., $G_2 = \left\lbrace AMR, AFR, \text{and} \ Others \right\rbrace$ . Again, we employed discrete grouping to approximately capture the observed patterns, including the tendency that countries in the AMR reported ZIKV early, while those in the AFR reported relatively late. The linear predictor of $\mu_{i}$ for country $i$ is

$\begin{equation} \mu_i = \beta_0+ \beta_{1G_1(J_1(i))}x_{1,G_1(J_1(i))}+\beta_{2G_2(J_2(i))}x_{2,G_2(J_2(i))} \end{equation}$

(2.3)

where $\beta$ s are coefficients to be estimated, and $x_{y, G_y(j)}$ represents the dummy variable $x_y$ that belongs to group $j$ , defined as

$\begin{equation} \begin{array}{cc} x_{1, G_1(j)} = \begin{cases} 0, & \text{if $j = 1$}\\ 1, & \text{if $j = 2$}\\ 1, & \text{if $j = 3$}, \end{cases} & x_{2, G_2(j)} = \begin{cases} 1, & \text{if $j = 1$}\\ 1, & \text{if $j = 2$}\\ 0, & \text{if $j = 3$} \end{cases} \end{array} \end{equation}$

(2.4)

That is, for $x_{1, G_1(j)}$ , $j = 1$ represents the high income group, $j = 2$ the intermediate group, and $j = 3$ the low income group. For $x_{2, G_2(j)}$ , $j = 1$ represents countries that belong to AMR, $j = 2$ indicates countries that belong to AFR, and $j = 3$ represents other countries. The country grouping $j$ is determined according to country $i$ . Let $J_y (i)$ be the group of variable $x_y$ for country $i$ . Then, $j$ corresponds to $J_y (i)$ . It should be noted that, while AMR is a mix of North America and other countries, the USA and Canada imported ZIKV cases prior to Brazil and were excluded from our analysis. We calculated the correlations between $x_{1, G_1}$ and $x_{2, G_2}$ using the Kendall rank correlation coefficient. If significant correlations ( $p < 0.05$ ) were identified, we attempted to address the interaction using an interaction term in the linear predictor. However, if the interaction was not adjusted by the interaction term alone, we removed the candidate model with two correlating variables prior to model comparisons.
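The dummy coding in Eqs. (2.3) and (2.4) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and the coefficient values are made-up placeholders, not estimates from the study.

```python
def dummy_x1(j):
    """Eq. (2.4), left: GDP group j (1 = high, 2 = intermediate, 3 = low)."""
    return 0 if j == 1 else 1

def dummy_x2(j):
    """Eq. (2.4), right: WHO group j (1 = AMR, 2 = AFR, 3 = Others)."""
    return 0 if j == 3 else 1

def mean_delay(beta0, beta1, beta2, j1, j2):
    """Eq. (2.3): mean reporting delay for a country in groups (j1, j2).

    beta1 and beta2 map a group index to its coefficient. Groups whose dummy
    variable is 0 act as the reference level, so no coefficient is needed.
    """
    b1 = beta1.get(j1, 0.0)
    b2 = beta2.get(j2, 0.0)
    return beta0 + b1 * dummy_x1(j1) + b2 * dummy_x2(j2)

# Placeholder coefficients (in weeks), chosen only to exercise the function.
beta0 = 10.0
beta1 = {2: 5.0, 3: 8.0}    # intermediate and low GDP relative to high GDP
beta2 = {1: -4.0, 2: 6.0}   # AMR and AFR relative to Others

reference = mean_delay(beta0, beta1, beta2, j1=1, j2=3)  # high GDP, Others
mid_amr = mean_delay(beta0, beta1, beta2, j1=2, j2=1)    # intermediate, AMR
```

Under this coding, high GDP and the $Others$ region form the reference category, so the model has one intercept and four group coefficients; together with a single standard deviation this gives six parameters, which matches $n_a = 6$ for Model 1 in Table 1.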

For a country $i$ , groups of two variables can be described as $g_i = \lbrace G_1(J_1(i)), G_2(J_2(i)) \rbrace$ . We assumed that the reporting delay followed a gamma distribution. Supposing that country $i$ belongs to group ${g_i}$ , the reporting delay was assumed to be the result of an independent random sampling from the probability density function $g(t; \mu_{i}, \sigma_{i, g_i}^{2})$ with a country-specific mean $\mu_{i}$ and variance $\sigma_{i, g_i}^{2}$ . To obtain group-specific variance $\sigma_{i, g_i}^{2}$ , we first estimated the coefficient of variation in two different ways: either using a group-specific standard deviation $\sigma_{g_i}$ (i.e., $cv_{g_i} = \frac{\sigma_{g_i}}{\mu_{g_i}}$ ) or a constant standard deviation over all the groups $\sigma$ (i.e., $cv_{g_i}^{'} = \frac{\sigma}{\mu_{g_i}}$ ). Consequently, group-specific variance for gamma distribution obeyed the formula $\sigma_{i, {g_i}}^{2} = cv_{g_i}\mu_i$ or $\sigma_{i, g_i}^{2} = cv_{g_i}^{'}\mu_i$ .
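A gamma density specified by its mean and variance, as in $g(t; \mu_{i}, \sigma_{i, g_i}^{2})$ , can be evaluated by converting to the usual shape and scale parameters (shape $= \mu^2/\sigma^2$ , scale $= \sigma^2/\mu$ ). The sketch below uses only the Python standard library; the function names and the numerical values are illustrative assumptions, and the variance is computed with the formula exactly as printed above.

```python
import math

def gamma_shape_scale(mean, variance):
    """Moment matching: shape * scale = mean, shape * scale**2 = variance."""
    return mean ** 2 / variance, variance / mean

def gamma_pdf(t, mean, variance):
    """Gamma density g(t; mean, variance); zero for t <= 0."""
    if t <= 0:
        return 0.0
    shape, scale = gamma_shape_scale(mean, variance)
    log_pdf = ((shape - 1.0) * math.log(t) - t / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)

# Placeholder values: mean delay of 12 weeks and a coefficient of variation
# of 2, with the variance formed as cv * mean, as printed in the text.
mu_i = 12.0
cv = 2.0
var_i = cv * mu_i
shape, scale = gamma_shape_scale(mu_i, var_i)
density_at_10_weeks = gamma_pdf(10.0, mu_i, var_i)
```

Working on the log scale with `math.lgamma` avoids overflow for large shape values. Note that with the variance written as $cv \cdot \mu_i$ , the scale parameter equals the coefficient of variation and the shape equals $\mu_i / cv$ .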

2.4. Likelihood functions

A total of 219 countries were divided into three different groups:

(ⅰ) 56 countries that reported ZIKV importation after the 2015 Brazil epidemic; the country $i$ in group $g_i$ is defined by $i \in C^{RP}_{g_i}$ where RP stands for "reported."

(ⅱ) 89 countries that reported no ZIKV importations by the end of 2017; the country $i$ in group $g_i$ is defined by $i \in C^{NRP}_{g_i}$ where NRP stands for "never reported."

(ⅲ) 74 excluded countries, including 40 countries that reported ZIKV importation before the 2015 Brazil epidemic and 34 countries for which there are no available data for either GDP per capita or health expenditure.

For the countries in group (ⅰ), the probability density function of the reporting time $h_{i} (t, t_0;k, \mu_i, \sigma_{i, g_i}^{2})$ obeys the formula:

$\begin{equation} h_{i}(t,t_0;k,\mu_i,\sigma_{i,g_i}^{2}) = \int_0^{t}g(t-s,t_0;\mu_i,\sigma_{i,g_i}^{2})f_i(s,t_0;k)ds \end{equation}$

(2.5)

For simplicity, we write $h_{i}(t)$ for $h_i(t, t_0;k, \mu_i, \sigma_{i, g_i}^{2})$ . The likelihood for observing $t_i$ in this group is

$\begin{equation} L_{1}(k,\mu_i,\sigma_{i,g_i}^{2};t_i,t_0) = \prod\limits_{i \in C^{RP}_{g_i}}h_i(t_i) \end{equation}$

(2.6)

where $t_i$ is the time of reported ZIKV importation for each country $i$ . For the countries in group (ⅱ), the probability $w_{i} (t, t_0;k, \mu_i, \sigma_{i, g_i}^{2})$ that no importation has been reported by time $t$ is calculated as

$\begin{equation} w_{i}(t,t_0;k,\mu_i,\sigma_{i,g_i}^{2}) = 1-\int_0^{t}h_{i}(y)dy \end{equation}$

(2.7)

As was done for group (ⅰ), we write $w_i (t)$ for $w_{i}(t, t_0;k, \mu_i, \sigma_{i, g_i}^{2})$ . The likelihood function for observing censoring for this group is

$\begin{equation} L_{2}(k,\mu_i,\sigma_{i,g_i}^{2};t_c,t_0) = \prod\limits_{i \in C^{NRP}_{g_i}}w_i(t_c) \end{equation}$

(2.8)

where $t_c$ is the censoring time substituted for the time of reported ZIKV importation into country $i$ , which we assumed was long enough to represent no reported cases. Finally, the full likelihood function is

$\begin{equation} L(k,\mu_i,\sigma_{i,g_i}^{2};t_i,t_c,t_0) = L_1 L_2 \end{equation}$

(2.9)

Profile-likelihood based confidence intervals were computed to obtain the 95% confidence interval (CI). We computed the second order Akaike information criterion (AICc) for model comparison.
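To make the structure of Eqs. (2.5) to (2.9) concrete, the following Python sketch evaluates the log of the full likelihood by numerical integration. It is an illustration under stated assumptions rather than the authors' implementation: the integrals use a simple midpoint rule, the lower integration limit 0 is taken from the equations as printed, and the country records, parameter values, and function names are hypothetical.

```python
import math

def gamma_pdf(t, mean, variance):
    """Reporting-delay density g with the given mean and variance."""
    if t <= 0:
        return 0.0
    shape, scale = mean ** 2 / variance, variance / mean
    return math.exp((shape - 1.0) * math.log(t) - t / scale
                    - math.lgamma(shape) - shape * math.log(scale))

def importation_pdf(s, t0, lam):
    """Eq. (2.2): exponential density of the importation time."""
    return lam * math.exp(-lam * (s - t0)) if s >= t0 else 0.0

def midpoint(fn, a, b, n):
    """Composite midpoint rule; never evaluates fn at the end points."""
    step = (b - a) / n
    return step * sum(fn(a + (i + 0.5) * step) for i in range(n))

def h(t, t0, lam, mean, variance, n=80):
    """Eq. (2.5): density of the reported importation time (a convolution)."""
    return midpoint(lambda s: gamma_pdf(t - s, mean, variance)
                    * importation_pdf(s, t0, lam), 0.0, t, n)

def w(t, t0, lam, mean, variance, n=80):
    """Eq. (2.7): probability that nothing has been reported by time t."""
    return 1.0 - midpoint(lambda y: h(y, t0, lam, mean, variance, n),
                          0.0, t, n)

def log_likelihood(reported, censored, k, t0, t_c, n=80):
    """Log of Eq. (2.9): log L1 (Eq. 2.6) plus log L2 (Eq. 2.8)."""
    total = 0.0
    for c in reported:
        total += math.log(h(c["t"], t0, k / c["m"], c["mean"], c["var"], n))
    for c in censored:
        total += math.log(w(t_c, t0, k / c["m"], c["mean"], c["var"], n))
    return total

# Hypothetical records: effective distance m, delay mean and variance in
# weeks, and the reported week t for countries that reported importation.
reported = [{"m": 4.0, "mean": 12.0, "var": 24.0, "t": 30.0},
            {"m": 6.0, "mean": 30.0, "var": 60.0, "t": 55.0}]
censored = [{"m": 9.0, "mean": 35.0, "var": 70.0}]
ll = log_likelihood(reported, censored, k=0.2, t0=-48.0, t_c=144.0)
```

Summing logarithms instead of multiplying densities keeps the computation numerically stable when many countries are included. In practice the resulting function would be handed to an optimizer to estimate $k$ and the coefficients of the linear predictor, and profiled to obtain confidence intervals.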

2.5. Ethical considerations

Herein, we only analyzed publicly available data. As such, the datasets were de-identified and fully anonymized in advance, and the analysis of publicly available data without identifying information does not require ethical approval.

2.6. Data sharing policy

This study fully relies on published data, essential components of which we made available as supplementary material.

3. Results

Models with variable combinations of explanatory parameters for describing the time from importation to reporting were optimized using the same datasets, yielding a total of 56 imported and 89 non-imported countries during the study period. We compared 10 models and selected the one that generated the lowest AICc value. Among the financial variables that influenced diagnostic and reporting capacities, GDP per capita and health expenditure were correlated, indicating that they could not jointly explain the reporting delay. Accordingly, when either GDP per capita or health expenditure was included, the other variable was excluded from the final model. Table 1 compares the goodness-of-fit of those 10 different models. Of these, the model with GDP and WHO region yielded the lowest AICc value of 681.7 (Table 1). We compared the mean reporting delay according to GDP per capita tertiles (Figure 3) using the best model (i.e., Model 1 in Table 1). We found that countries with the highest GDPs yielded the shortest reporting delays. It can be interpreted that countries with relatively high GDPs were more likely to report ZIKV infection earlier than countries with lower GDPs. The median reporting delays for first, second, and third GDP tertiles were 12 weeks (95% CI: 12-22), 30 weeks (95% CI: 20-30), and 35 weeks (95% CI: 24-35), respectively. On average, approximately 4 months after the countries in group A reported a first case, countries in group B reported ZIKV importation, and countries in group C subsequently reported ZIKV importation approximately 1 month later. Figure 4 shows the linear relationship between the effective distance and the time from actual importation in Brazil to actual arrival in each country, according to GDP tertile grouping. As the effective distance became longer, the time required for importation increased. Moreover, comparing the time since actual arrival by GDP groups, the countries in the highest GDP tertile (A) experienced a longer time to actual arrival, even up to 75 weeks, since initial importation in Brazil. The majority of countries in the lowest tertile group experienced importation within 25 weeks of actual importation in Brazil.

Table 1. Comparison of ten models based on second order Akaike Information Criterion (AICc).

Model (a)	Variable*	$n_{\sigma}^{**}$	$n_a^{\dagger}$	AICc $_a^{\ddagger}$	$\Delta$ AICc $_a^{\ddagger}$
1	GDP, WHO region	1	6	681.7	0
2	Constant	3	2	685.0	3.3
3	GDP	1	4	685.5	3.8
4	HEALTH, WHO region	3	6	686.3	4.6
5	GDP	3	4	687.8	6.1
6	GDP, WHO region	3	6	688.7	7.0
7	HEALTH	3	4	688.8	7.1
8	HEALTH, WHO region	1	6	688.9	7.2
9	Constant	1	2	690.6	8.9
10	HEALTH	1	4	690.7	9.0
* Variables of ten models: GDP - three groups based on gross domestic product per capita; HEALTH - three groups based on per capita government expenditure on health; WHO region - three groups based on WHO region: Region of the Americas (AMR), African region (AFR), and $Others$ . $**$ : $n_{\sigma}$ is the number of standard deviations, $\sigma$ , in model $a$ . $\dagger$ : $n_a$ is the number of parameters for model $a$ . $\ddagger$ : AIC for each model $a$ is calculated as $AIC_a = -2\ln(L_a)+2n_a$ and subsequently, AICc $_a$ is calculated as $AIC_a+\frac{2n_a^2+2n_a}{m-n_a-1}$ where $m$ is the sample size. $\Delta AICc_a = AICc_a - \min(AICc_1)$ , where $L_a$ is the likelihood value for model $a$ .
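The AICc formula in the table footnote can be written as a short Python helper. The log-likelihood and sample size used in the example call are arbitrary placeholders and are not taken from the fitted models; only the three AICc values passed to the difference function are copied from Table 1.

```python
def aicc(log_lik, n_params, m):
    """Second order AIC: AIC plus the small-sample correction term."""
    aic = -2.0 * log_lik + 2.0 * n_params
    return aic + (2.0 * n_params ** 2 + 2.0 * n_params) / (m - n_params - 1)

def delta_aicc(values):
    """Difference between each AICc value and the smallest one."""
    best = min(values)
    return [v - best for v in values]

# Placeholder inputs: log-likelihood -100 with 2 parameters and 10 samples.
example = aicc(log_lik=-100.0, n_params=2, m=10)

# AICc values of Models 1 to 3 as listed in Table 1.
deltas = delta_aicc([681.7, 685.0, 685.5])
```

The correction term grows as the number of parameters approaches the sample size, which penalizes the six-parameter models more strongly than the two-parameter constant models.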


Figure 3. Comparison of the estimated reporting delay by tertile of gross domestic product (GDP) per capita. The vertical axis measures the weeks from importation to reporting. Beginning with the highest tertile, A, B, and C are labeled on the horizontal axis. Left: the box size corresponds to the interquartile range (IQR). The horizontal bold line in the box represents the median value. The lower and upper boxes correspond to the first and third quartiles. The upper whisker extends from the box to the largest value of no more than third quartile plus 1.5 times the IQR. Dots represent outliers. Right: error bars indicate the minimum and maximum values. Solid squares represent the median reporting delay.


Figure 4. Relationship between effective distance from Brazil and the estimated actual arrival time of ZIKV in each country. Time 0 indicates the time at which Brazil experienced actual ZIKV importation. Countries are classified into tertile groups, A, B, and C based on gross domestic product per capita tertile (high, intermediate, and low, respectively). The linear regression line passing through the origin, i.e., Brazil, is shown.


4. Discussion and conclusion

We statistically estimated ZIKV arrival time around the world after it first appeared in Brazil in 2015. Taking a similar approach to our previous study that we conducted in real time ^[36], we modeled inter-country infection spread using effective distance, but, as Figure 1 indicates, there was substantial variation among observed importation times, and we determined that effective distance alone was not an adequate predictor of ZIKV arrival time. To better decipher the importation mechanisms, we also accounted for reporting delay, which we regressed by plausible indicators of country-specific laboratory capacity and reporting, including GDP per capita, health expenditure per capita, and geographic regions. We found that high GDP is a good predictor of short reporting delays. Additionally, reporting delay was dependent on WHO geographic region. ZIKV infection is generally mild and, without substantial laboratory capacity, cases can be underestimated. Herein, we successfully highlighted this feature and identified several variables as important to the data generating process of time from Brazil to reported importation around the world.

There are two major findings from our regression models of reporting delay, which yielded smaller AICs than models using a constant to explain delay. First, the reporting delays were shorter in countries with higher GDP per capita. We chose GDP per capita to reflect the capacity of laboratory testing and surveillance. For similar reasons, we also evaluated health expenditure per capita. Because GDP and health expenditure were correlated, the weaker predictor, health expenditure, was not included in the final model. Because of this finding, the estimated actual arrival time from the actual start of the Brazil epidemic was longest in countries in the highest tertile of GDP per capita. To the best of our knowledge, this study is the first to identify and decipher country-dependent mechanisms behind reporting delays.

Second, we found that South American countries had shorter reporting delays than other countries. Owing to their geographic proximity to Brazil and an elevated awareness of the virus among South American countries, this is an intuitive finding. We found longer reporting delays among the African countries. This is also in line with our expectation that the laboratory capacity in the African region may be lower relative to other countries. Moreover, owing to the mild nature of ZIKV infection, which is historically endemic in African countries ^[13], it is probable that there is a high rate of underreporting in this region.

As shown in Figure 4, our study does not fully explain the global variation in arrival time. Rather, owing to the discrete grouping of GDP per capita and other variables, the estimated variation in actual arrival time was inflated compared with the observed (reported) arrival time. As a response to country-specific variations, herein, we showed that laboratory capacity for testing and surveillance, as well as elevated illness awareness and geographic proximity to the origin, are likely to drive diagnosis and reporting rates. Rather than reducing the arrival-time variance, our contribution with this study was to identify the effect of reporting delays on country-specific observed arrival times of the ZIKV epidemic. A critical lesson learned from modeling the 2015-16 ZIKV pandemic is that the use of effective distance alone is not sufficiently precise to capture the observed patterns of country-specific arrival times; rather, we need to additionally account for case ascertainment, laboratory capacity, and virus awareness to address possible variations in reporting delays and laboratory coverage.

Several technical limitations of our study should be noted. First, even though our linear regression model explored reporting delay by country, ZIKV epidemics include asymptomatic infections; thus, it is possible that several importation events were missed. Second, even though country-specific variation was regressed, our approach adopted discrete classification of GDP per capita into three groups, but this approach could not fully explain variations in observed arrival times. Additional mechanisms need to be identified via country-specific analyses. Third, we used static network data to capture human mobility patterns, but it is possible that global spread in 2016 was enhanced due to the Olympic Games, which were held in Rio de Janeiro that year ^[42]. Lastly, we estimated the ZIKV importation risk in each country as a risk of importation from Brazil. Importations from other areas that experienced a contemporaneous ZIKV epidemic (e.g., South Pacific) were ignored.

Despite these limitations, we argue that, with this study, we have successfully elucidated key importation mechanisms, including diffusion by human migration and reporting delay. The time required for ZIKV to arrive and be reported in a country was determined not only by global airline travel patterns, but also by other factors, most notably heterogeneous laboratory and reporting capacities.

Acknowledgments

HN received funding support from the Japan Agency for Medical Research and Development (JP18fk0108050); Japan Society for the Promotion of Science KAKENHI (Grant Numbers 16KT0130, 17H04701, 17H05808 and 18H04895); Health and Labour Sciences Research Grant (H28-AIDS-General-001); the Inamori Foundation; the Telecommunication Advancement Foundation; and the Japan Science and Technology Agency (JST) CREST program (JPMJCR1413). HL received financial support from the Japan Society for the Promotion of Science, Program for Advancing Strategic International Networks to Accelerate the Circulation of Talented Researchers; and the Japan Society for the Promotion of Science KAKENHI (Grant number 18H06385). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. We also thank Ashley Hazel, PhD, from Edanz Group (www.edanzediting.com/ac) for editing a draft of this manuscript.

Conflict of interest

The authors declare that they have no conflict of interest.

References

[1]	A. Adler, M. Elad and Y. Hel-Or, Probabilistic subspace clustering via sparse representations, IEEE Signal Processing Letters, 20(2013), 63-66.
[2]	A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM Journal on Imaging Sciences, 2(2009), 183-202.
[3]	J. Cai, E. Candès and Z. Shen, A singular value thresholding algorithm for matrix completion, SIAM Journal on Optimization, 20(2010), 1956-1982.
[4]	E. Candès, X. Li, Y. Ma and J. Wright, Robust principal component analysis?, Journal of the ACM, 58(2011), Art. 11, 37 pp.
[5]	E. Candès and Y. Plan, Matrix completion with noise, Proceedings of the IEEE, 98(2010), 925-936.
[6]	E. Candès and B. Recht, Exact matrix completion via convex optimization, Foundations of Computational Mathematics, 9(2009), 717-772.
[7]	V. Chandrasekaran, S. Sanghavi, P. Parrilo and A. Willsky, Sparse and low-rank matrix decompositions, Annual Allerton Conference on Communication, Control, and Computing, 2009, 962-967.
[8]	C. Chen, B. He, Y. Ye and X. Yuan, The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent, Mathematical Programming, 155(2016), 57-79.
[9]	Y. Chen, H. Xu, C. Caramanis and S. Sanghavi, Robust matrix completion with corrupted columns, International Conference on Machine Learning, 2011, 873-880.
[10]	B. Cheng, G. Liu, J. Wang, Z. Huang and S. Yan, Multi-task low-rank affinity pursuit for image segmentation, International Conference on Computer Vision, 2011, 2439-2446.
[11]	A. Cichocki, R. Zdunek, A. H. Phan and S.-I. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation, 1st edition, Wiley, 2009.
[12]	Y. Cui, C.-H. Zheng and J. Yang, Identifying subspace gene clusters from microarray data using low-rank representation, PLoS One, 8(2013), e59377.
[13]	P. Drineas, R. Kannan and M. Mahoney, Fast Monte Carlo algorithms for matrices II: Computing a low rank approximation to a matrix, SIAM Journal on Computing, 36(2006), 158-183.
[14]	E. Elhamifar and R. Vidal, Sparse subspace clustering, IEEE Conference on Computer Vision and Pattern Recognition, 2009, 2790-2797.
[15]	E. Elhamifar and R. Vidal, Sparse subspace clustering: Algorithm, theory, and applications, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(2013), 2765-2781.
[16]	P. Favaro, R. Vidal and A. Ravichandran, A closed form solution to robust subspace estimation and clustering, IEEE Conference on Computer Vision and Pattern Recognition, 2011, 1801-1807.
[17]	J. Feng, Z. Lin, H. Xu and S. Yan, Robust subspace segmentation with block-diagonal prior, IEEE Conference on Computer Vision and Pattern Recognition, 2014, 3818-3825.
[18]	M. Frank and P. Wolfe, An algorithm for quadratic programming, Naval Research Logistics Quarterly, 3(1956), 95-110.
[19]	Y. Fu, J. Gao, D. Tien and Z. Lin, Tensor LRR based subspace clustering, International Joint Conference on Neural Networks, 2014, 1877-1884.
[20]	A. Ganesh, Z. Lin, J. Wright, L. Wu, M. Chen and Y. Ma, Fast algorithms for recovering a corrupted low-rank matrix, International Workshop on Computational Advances in MultiSensor Adaptive Processing, 2009, 213-216.
[21]	H. Gao, J.-F. Cai, Z. Shen and H. Zhao, Robust principal component analysis-based four-dimensional computed tomography, Physics in Medicine and Biology, 56(2011), 3181-3198.
[22]	M. Grant and S. Boyd, CVX: Matlab software for disciplined convex programming (web page and software), http://cvxr.com/cvx/, 2009.
[23]	S. Gu, L. Zhang, W. Zuo and X. Feng, Weighted nuclear norm minimization with application to image denoising, IEEE Conference on Computer Vision and Pattern Recognition, 2014, 2862-2869.
[24]	H. Hu, Z. Lin, J. Feng and J. Zhou, Smooth representation clustering, IEEE Conference on Computer Vision and Pattern Recognition, 2014, 3834-3841.
[25]	Y. Hu, D. Zhang, J. Ye, X. Li and X. He, Fast and accurate matrix completion via truncated nuclear norm regularization, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(2013), 2117-2130.
[26]	M. Jaggi, Revisiting Frank-Wolfe: Projection-free sparse convex optimization, International Conference on Machine Learning, 2013, 427-435.
[27]	M. Jaggi and M. Sulovský, A simple algorithm for nuclear norm regularized problems, International Conference on Machine Learning, 2010, 471-478.
[28]	I. Jhuo, D. Liu, D. Lee and S. Chang, Robust visual domain adaptation with low-rank reconstruction, IEEE Conference on Computer Vision and Pattern Recognition, 2012, 2168-2175.
[29]	H. Ji, C. Liu, Z. Shen and Y. Xu, Robust video denoising using low rank matrix completion, IEEE Conference on Computer Vision and Pattern Recognition, 2010, 1791-1798.
[30]	Y. Jin, Q. Wu and L. Liu, Unsupervised upright orientation of man-made models, Graphical Models, 74(2012), 99-108.
[31]	T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Review, 51(2009), 455-500.
[32]	C. Lang, G. Liu, J. Yu and S. Yan, Saliency detection by multitask sparsity pursuit, IEEE Transactions on Image Processing, 21(2012), 1327-1338.
[33]	R. M. Larsen, http://sun.stanford.edu/~rmunk/PROPACK/, 2004.
[34]	D. Lee and H. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, 401(1999), 788.
[35]	X. Liang, X. Ren, Z. Zhang and Y. Ma, Repairing sparse low-rank texture, European Conference on Computer Vision, 7576(2012), 482-495.
[36]	Z. Lin, R. Liu and H. Li, Linearized alternating direction method with parallel splitting and adaptive penalty for separable convex programs in machine learning, Machine Learning, 99(2015), 287-325.
[37]	Z. Lin, R. Liu and Z. Su, Linearized alternating direction method with adaptive penalty for low-rank representation, Advances in Neural Information Processing Systems, 2011, 612-620.
[38]	G. Liu, Z. Lin, S. Yan, J. Sun and Y. Ma, Robust recovery of subspace structures by low-rank representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(2013), 171-184.
[39]	G. Liu, Z. Lin and Y. Yu, Robust subspace segmentation by low-rank representation, International Conference on Machine Learning, 2010, 663-670.
[40]	G. Liu, H. Xu and S. Yan, Exact subspace segmentation and outlier detection by low-rank representation, International Conference on Artificial Intelligence and Statistics, 2012, 703-711.
[41]	G. Liu and S. Yan, Latent low-rank representation for subspace segmentation and feature extraction, IEEE International Conference on Computer Vision, 2011, 1615-1622.
[42]	J. Liu, P. Musialski, P. Wonka and J. Ye, Tensor completion for estimating missing values in visual data, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(2013), 208-220.
[43]	R. Liu, Z. Lin, Z. Su and J. Gao, Linear time principal component pursuit and its extensions using l1 filtering, Neurocomputing, 142(2014), 529-541.
[44]	R. Liu, Z. Lin, F. Torre and Z. Su, Fixed-rank representation for unsupervised visual learning, IEEE Conference on Computer Vision and Pattern Recognition, 2012, 598-605.
[45]	C. Lu, J. Feng, Z. Lin and S. Yan, Correlation adaptive subspace segmentation by trace lasso, International Conference on Computer Vision, 2013, 1345-1352.
[46]	C. Lu, Z. Lin and S. Yan, Smoothed low rank and sparse matrix recovery by iteratively reweighted least squares minimization, IEEE Transactions on Image Processing, 24(2015), 646-654.
[47]	C. Lu, H. Min, Z. Zhao, L. Zhu, D. Huang and S. Yan, Robust and efficient subspace segmentation via least squares regression, European Conference on Computer Vision, 7578(2012), 347-360.
[48]	C. Lu, C. Zhu, C. Xu, S. Yan and Z. Lin, Generalized singular value thresholding, AAAI Conference on Artificial Intelligence, 2015, 1805-1811.
[49]	X. Lu, Y. Wang and Y. Yuan, Graph-regularized low-rank representation for destriping of hyperspectral images, IEEE Transactions on Geoscience and Remote Sensing, 51(2013), 4009-4018.
[50]	Y. Ma, S. Soatto, J. Kosecka and S. Sastry, An Invitation to 3-D Vision: From Images to Geometric Models, 1st edition, Springer, 2004.
[51]	K. Min, Z. Zhang, J. Wright and Y. Ma, Decomposing background topics from keywords by principal component pursuit, ACM International Conference on Information and Knowledge Management, 2010, 269-278.
[52]	Y. Ming and Q. Ruan, Robust sparse bounding sphere for 3D face recognition, Image and Vision Computing, 30(2012), 524-534.
[53]	L. Mukherjee, V. Singh, J. Xu and M. Collins, Analyzing the subspace structure of related images: Concurrent segmentation of image sets, European Conference on Computer Vision, 7575(2012), 128-142.
[54]	Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k^2), (Russian) Dokl. Akad. Nauk SSSR, 269(1983), 543-547.
[55]	Y. Panagakis and C. Kotropoulos, Automatic music tagging by low-rank representation, International Conference on Acoustics, Speech, and Signal Processing, 2012, 497-500.
[56]	Y. Peng, A. Ganesh, J. Wright, W. Xu and Y. Ma, RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(2012), 2233-2246.
[57]	J. Qian, J. Yang, F. Zhang and Z. Lin, Robust low-rank regularized regression for face recognition with occlusion, The Workshop of IEEE Conference on Computer Vision and Pattern Recognition, 2014, 21-26.
[58]	X. Ren and Z. Lin, Linearized alternating direction method with adaptive penalty and warm starts for fast solving transform invariant low-rank textures, International Journal of Computer Vision, 104(2013), 1-14.
[59]	A. P. Singh and G. J. Gordon, A unified view of matrix factorization models, Proceedings of Machine Learning and Knowledge Discovery in Databases, 5212(2008), 358-373.
[60]	H. Tan, J. Feng, G. Feng, W. Wang and Y. Zhang, Traffic volume data outlier recovery via tensor model, Mathematical Problems in Engineering, 2013(2013), 164810.
[61]	M. Tso, Reduced-rank regression and canonical analysis, Journal of the Royal Statistical Society, Series B (Methodological), 43(1981), 183-189.
[62]	R. Vidal, Y. Ma and S. Sastry, Generalized principal component analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(2005), 1945-1959.
[63]	R. Vidal, Subspace clustering, IEEE Signal Processing Magazine, 28(2011), 52-68.
[64]	J. Wang, V. Saligrama and D. Castanon, Structural similarity and distance in learning, Annual Allerton Conference on Communication, Control, and Computing, 2011, 744-751.
[65]	Y.-X. Wang and Y.-J. Zhang, Nonnegative matrix factorization: A comprehensive review, IEEE Transactions on Knowledge and Data Engineering, 25(2013), 1336-1353.
[66]	S. Wei and Z. Lin, Analysis and improvement of low rank representation for subspace segmentation, arXiv:1107.1561.
[67]	Z. Wen, W. Yin and Y. Zhang, Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm, Mathematical Programming Computation, 4(2012), 333-361.
[68]	J. Wright, A. Ganesh, S. Rao, Y. Peng and Y. Ma, Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization, Advances in Neural Information Processing Systems, 2009, 2080-2088.
[69]	L. Wu, A. Ganesh, B. Shi, Y. Matsushita, Y. Wang and Y. Ma, Robust photometric stereo via low-rank matrix completion and recovery, Asian Conference on Computer Vision, 2010, 703-717.
[70]	L. Yang, Y. Lin, Z. Lin and H. Zha, Low rank global geometric consistency for partial-duplicate image search, International Conference on Pattern Recognition, 2014, 3939-3944.
[71]	M. Yin, J. Gao and Z. Lin, Laplacian regularized low-rank representation and its applications, IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2016), 504-517.
[72]	Y. Yu and D. Schuurmans, Rank/norm regularization with closed-form solutions: Application to subspace clustering, Uncertainty in Artificial Intelligence, 2011, 778-785.
[73]	H. Zhang, Z. Lin and C. Zhang, A counterexample for the validity of using nuclear norm as a convex surrogate of rank, European Conference on Machine Learning, 8189(2013), 226-241.
[74]	H. Zhang, Z. Lin, C. Zhang and E. Chang, Exact recoverability of robust PCA via outlier pursuit with tight recovery bounds, AAAI Conference on Artificial Intelligence, 2015, 3143-3149.
[75]	H. Zhang, Z. Lin, C. Zhang and J. Gao, Robust latent low rank representation for subspace clustering, Neurocomputing, 145(2014), 369-373.
[76]	H. Zhang, Z. Lin, C. Zhang and J. Gao, Relation among some low rank subspace recovery models, Neural Computation, 27(2015), 1915-1950.
[77]	T. Zhang, B. Ghanem, S. Liu and N. Ahuja, Low-rank sparse learning for robust visual tracking, European Conference on Computer Vision, 7577(2012), 470-484.
[78]	Z. Zhang, A. Ganesh, X. Liang and Y. Ma, TILT: Transform invariant low-rank textures, International Journal of Computer Vision, 99(2012), 1-24.
[79]	Z. Zhang, X. Liang and Y. Ma, Unwrapping low-rank textures on generalized cylindrical surfaces, International Conference on Computer Vision, 2011, 1347-1354.
[80]	Z. Zhang, Y. Matsushita and Y. Ma, Camera calibration with lens distortion from low-rank textures, IEEE Conference on Computer Vision and Pattern Recognition, 2011, 2321-2328.
[81]	Y. Zheng, X. Zhang, S. Yang and L. Jiao, Low-rank representation with local constraint for graph construction, Neurocomputing, 122(2013), 398-405.
[82]	X. Zhou, C. Yang, H. Zhao and W. Yu, Low-rank modeling and its applications in image analysis, ACM Computing Surveys, 47(2014), Art. 36.
[83]	G. Zhu, S. Yan and Y. Ma, Image tag refinement towards low-rank, content-tag prior and error sparsity, International Conference on Multimedia, 2010, 461-470.
[84]	L. Zhuang, H. Gao, Z. Lin, Y. Ma, X. Zhang and N. Yu, Non-negative low rank and sparse graph for semi-supervised learning, IEEE Conference on Computer Vision and Pattern Recognition, 2012, 2328-2335.
[85]	W. Zuo and Z. Lin, A generalized accelerated proximal gradient approach for total-variation-based image restoration, IEEE Transactions on Image Processing, 20(2011), 2748-2759.

This article has been cited by:

1.	Wei Gao, Behzad Ghanbari, Haci Mehmet Baskonus, New numerical simulations for some real world problems with Atangana–Baleanu fractional derivative, 2019, 128, 09600779, 34, 10.1016/j.chaos.2019.07.037
2.	Clarisse Lins de Lima, Ana Clara Gomes da Silva, Giselle Machado Magalhães Moreno, Cecilia Cordeiro da Silva, Anwar Musah, Aisha Aldosery, Livia Dutra, Tercio Ambrizzi, Iuri V. G. Borges, Merve Tunali, Selma Basibuyuk, Orhan Yenigün, Tiago Lima Massoni, Ella Browning, Kate Jones, Luiza Campos, Patty Kostkova, Abel Guilhermino da Silva Filho, Wellington Pinheiro dos Santos, Temporal and Spatiotemporal Arboviruses Forecasting by Machine Learning: A Systematic Review, 2022, 10, 2296-2565, 10.3389/fpubh.2022.900077

© 2016 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)