Editorial Special Issues

Autism and neuro-immune-gut link

  • Received: 28 February 2018 Accepted: 20 June 2018 Published: 27 June 2018
  • Recent evidences sustain the hypothesis that host-bacteria interactions play a critical role in regulating tissue and body homeostasis. Gut microbiota and the brain are strongly interconnected and share communication pathways. Modifications in gut bacteria compositions are correlated to changes in behaviors. Indeed, autism spectrum disorders (ASD) are linked to dysfunctions of the gut bacteria-brain axis. Possible therapeutic strategies in ASD management will aim to restore dysbiosis and gut bacteria imbalance.

    Citation: Dario Siniscalco, Anna Lisa Brigida, Nicola Antonucci. Autism and neuro-immune-gut link[J]. AIMS Molecular Science, 2018, 5(2): 166-172. doi: 10.3934/molsci.2018.2.166

    Related Papers:

    [1] Pawan Lingras, Farhana Haider, Matt Triff . Fuzzy temporal meta-clustering of financial trading volatility patterns. Big Data and Information Analytics, 2017, 2(3): 219-238. doi: 10.3934/bdia.2017018
    [2] Ricky Fok, Agnieszka Lasek, Jiye Li, Aijun An . Modeling daily guest count prediction. Big Data and Information Analytics, 2016, 1(4): 299-308. doi: 10.3934/bdia.2016012
    [3] Bill Huajian Yang . Modeling path-dependent state transitions by a recurrent neural network. Big Data and Information Analytics, 2022, 7(0): 1-12. doi: 10.3934/bdia.2022001
    [4] Elnaz Delpisheh, Aijun An, Heidar Davoudi, Emad Gohari Boroujerdi . Time aware topic based recommender System. Big Data and Information Analytics, 2016, 1(2): 261-274. doi: 10.3934/bdia.2016008
    [5] Prince Peprah Osei, Ajay Jasra . Estimating option prices using multilevel particle filters. Big Data and Information Analytics, 2018, 3(2): 24-40. doi: 10.3934/bdia.2018005
    [6] Ioannis D. Schizas, Vasileios Maroulas, Guohua Ren . Regularized kernel matrix decomposition for thermal video multi-object detection and tracking. Big Data and Information Analytics, 2018, 3(2): 1-23. doi: 10.3934/bdia.2018004
    [7] Yaru Cheng, Yuanjie Zheng . Frequency filtering prompt tuning for medical image semantic segmentation with missing modalities. Big Data and Information Analytics, 2024, 8(0): 109-128. doi: 10.3934/bdia.2024006
    [8] Jiaqi Ma, Hui Chang, Xiaoqing Zhong, Yueli Chen . Risk stratification of sepsis death based on machine learning algorithm. Big Data and Information Analytics, 2024, 8(0): 26-42. doi: 10.3934/bdia.2024002
    [9] J. Kent Poots, Nick Cercone . First steps in the investigation of automated text annotation with pictures. Big Data and Information Analytics, 2017, 2(2): 97-106. doi: 10.3934/bdia.2017001
    [10] John A. Doucette, Robin Cohen . A testbed to enable comparisons between competing approaches for computational social choice. Big Data and Information Analytics, 2016, 1(4): 309-340. doi: 10.3934/bdia.2016013
  • Recent evidences sustain the hypothesis that host-bacteria interactions play a critical role in regulating tissue and body homeostasis. Gut microbiota and the brain are strongly interconnected and share communication pathways. Modifications in gut bacteria compositions are correlated to changes in behaviors. Indeed, autism spectrum disorders (ASD) are linked to dysfunctions of the gut bacteria-brain axis. Possible therapeutic strategies in ASD management will aim to restore dysbiosis and gut bacteria imbalance.


    1. Introduction

    With the development of wireless communication and positioning technology, more and more data about locations are collected and used for many applications, such as location-based services. In particular, the format of the data could be generalized to ($location$, $time$, $texts$), where the $location$ represents the geo-coordinates where event happens (or some object lies), and $time$ means the valid time when the event happens at $location$ (or the object is at $location$), and $texts$ are used to describe or record the state of the event (or object). For instance, someone publishes a micro-blog associated with geo-location tag at 6:30 pm, in which some comments about a restaurant are attached. Another example about real-estate advertising is that every advertisement contains geo-coordinates of the property, the publishing time and texts describing the property. Users are often interested in the spatial, temporal and textual attributes of the data, and retrieve useful information through the combination of these three dimensions. Such queries are called spatio-temporal keyword (STK) queries. For instance, find the users publishing micro-blogs containing words pasta and pizza in the range of this shopping mall yesterday.

    However, tens of billions of spatio-temporal data would be a serious problem for applications. HBase [1] is a distributed non-sql, key-value database, and is capable to store such a huge amount of data. However, due to the non-sql characteristics of HBase, it is inefficient to process spatio-temporal keyword queries using the built-in functions of HBase. Consequently, it is highly necessary to design efficient algorithms for processing spatio-temporal keyword queries in HBase. A previous work [9] about processing spatial or multi-attributes queries in HBase is different from ours, because they merely consider spatial dimension in query processing. To the best of our knowledge, this is the first work for processing spatio-temporal keyword queries in HBase.

    Our motivation is to adopt HBase to process STK queries efficiently. First, a space filling curve, Hilbert curve [8], is used to encode the spatial dimension into one-dimensional codes, i.e., for a location referred by geo-coordinates, it is represented by a number denoting the relative position in the original space. Based on this, we design a suitable access model for HBase, containing row keys for indexing spatio-temporal dimensions and Bloom filters [3] for fast detecting the existence of query keywords. After that, we develop two algorithms, one is suitable for the query with ordinary selectivity, the other is a parallel algorithm for the large range query. We evaluate our algorithms on a real dataset, and the results show that our algorithms are capable to handle spatio-temporal keyword queries efficiently. In summary, we make the following contributions:

    ● We propose spatio-temporal keyword queries processing problem in HBase, which is a new challenge for HBase platform.

    ● We design a novel access model for HBase to store spatial, temproal and textual information.

    ● We propose efficient algorithms for processing spatio-temporal keyword queries in HBase.

    The rest of this paper is organized as follows. Section 2 reviews related works on spatio-temporal queries and HBase queries. Section 3 formally defines spatio-temporal keyword problem. Section 4 presents an access model for HBase. Algorithms for spatio-temporal keyword queries are presented in section 5. And we experimentally evaluate our algorithms in section 6. Finally, section 7 concludes the paper with directions for future works.


    2. Related work

    To our knowledge, the state of the art for spatio-temporal keyword queries in HBase is less well studied. However, some researches focusing on distributed index could be referenced. As an attractive choice for large-scale data processing, Cloud storage system currently adopts a hash-like approach to retrieve data that only support simple keyword-based queries, but lacks various forms of information search. To overcome this disadvantage, Zhou et al. [10] propose a novel SkipNet and B$^{+}$-tree based index structure, called SNB-index. SNB-index uses the B$^{+}$-tree to construct efficient local index in the lower layer, while it selects local index nodes to form a SkipNet-based global overlay in the upper layer. SNB-index also can be employed in systems such as GFS and Hadoop [2], which ensures it scalable and flexible. Experimental results show that SNB-index is valid and can be an alternative approach for constructing an auxiliary index in Cloud computing systems.

    For supporting similarity search in cloud system, Cheng et al. [4] propose VF-CAN, a novel indexing scheme, which integrates content addressable network (CAN) based routing protocol and the improved vector approximation file (VA-file) index. There are two index levels in this scheme: global index and local index. In the local index, VA-file approximation vectors are clustered by k-means according to their degree of proximity. In the global index, storage nodes are organized into an overlay network CAN, and only clustering information of local index is issued to the entire overlay network through the CAN interface. The experimental results show that VF-CAN reduces the index storage space and improves query performance effectively.

    For multi-dimensional data applications, Nishimura et al. [9] proposes MD-HBase, a scalable multi-dimensional data store supporting efficient multi-dimensional range and nearest neighbor queries. MD-HBase layers a multi-dimensional index structure over a range partitioned key-value store. Using a design based on linearization, its implementation layers standard index structures like K-D trees and Quad trees.

    As we have mentioned above, many applications not only require finding objects closest to a specified location, but also that containing a set of keywords. Felipe et al. [6] present an efficient method to answer top-k spatial keyword queries. It adopts a structure called IR$^2$-Tree (Information Retrieval R-Tree) which combines an R-Tree with superimposed text signatures. Algorithms are proposed to construct and maintain an IR$^2$-Tree, and answer top-k spatial keyword queries. With experimentally comparison with current methods, superior performance and excellent scalability of the algorithms are shown.

    Cong et al. [5] propose a novel indexing framework for location top-k text retrieval. It integrates the inverted file for text retrieval and the R-tree for spatial proximity query. Several hybrid indexing approaches are explored within the framework. Algorithms utilize the proposed indexes to compute the top-k query by taking into account both text relevancy and location proximity to prune the search space.

    In practical applications, it is need to consider the nature of the movement of objects. Traditional indexes have good query performance but can not handle this data processing problem in that they are not efficient for update which is crucial for an index for moving objects, as they change their position frequently. Jensen et al. [7] represent moving-object locations as vectors that are time stamped based on their update time. Then B$^+$-tree based data structure is adopted to index moving objects according to their time stamp and otherwise preserves spatial proximity. This method supports range and nearest neighbor queries, as well as continuous queries. B$^x$-tree which is based on the B$^+$-tree, outperforms the TPR-tree in range queries and kNN queries concerning the current or near-future positions of the objects.


    3. Problem definition

    In this section, we first present the definitions for spatio-temporal keyword data and queries, and then describe the HBase logical table. For simplicity, only two-dimensional space is considered in this paper, however, our method can be directly extended into higher dimensional space.

    Spatio-temporal data. A record $r$ of spatio-temporal data can be denoted as $\langle x$, $y$, $t$, $W$, $A \rangle$, where ($x$, $y$) means the geo-location of the record, $t$ means the valid time when the data is produced, $W$ is a set of keywords, $W$ = {$w_{1}$, $w_{2}$, $\ldots $, $w_{n}$}, $A$ represents other attributes, such as user-id, object's shape, etc.

    Spatio-temporal keyword query. Given a set $U_{st}$ of records of spatio-temporal data, a spatio-temporal keyword (STK) query $Q$ = $\langle R_{q}, t_{s}, t_{e}, W_{q}\rangle$, where $R_{q}$ = ($x_{l}$, $y_{l}$, $x_{u}$, $y_{u}$) means a query range with lower-left coordinate ($x_{l}$, $y_{l}$) and upper-right coordinate ($x_{u}$, $y_{u}$), $W_{q}$ = {$w_{q_{1}}$, $w_{q_{2}}$, $\ldots $, $w_{q_{m}}$} is a set of query keywords, aims to find a subset $S_{st}$ = {$r_{s}$ $|$ $r_{s}$ $\in$ $U_{st}$} satisfying that:

    $ r_{s}.(x, y) \in R_{q} $
    $ r_{s}.t \in [t_{s}, t_{e}] $
    $ r_{s}.W \cap W_{q} \neq \phi $

    Logical view of HBase table. Without loss of generality, we give the descriptions for HBase table. HBase is a distributed key-value database which consists of a number of computing nodes cooperatively processing large-scale data. A physical table in HBase is partitioned into several regions each of which is maintained by a node. From the logical view, a table is similar to a grid, where a cell can be located by the given row identifier and column identifier. Row identifiers are implemented by row keys ($rk$) which are index by B$^{+}$-tree, and the column identifier is represented by column family ($cf$) + column key ($ck$), where a column family consists of several column keys. The value in a cell can be referred to as the format ($rk$, $cf:ck$). Table 1 shows a logical view of a table in HBase. For instance, value $v_{1}$ can be referred to as ($rk_{1}$, $cf_{1}$:$ck_{1}$).

    Table 1. A HBase Logical Table.
    $cf_{1}$$cf_{2}$
    $ck_{1}$$ck_{2}$$ck_{3}$$ck_{a}$$ck_{b}$
    $rk_{1}$$v_{1}$$v_{2}$$v_{3}$$v_{4}$$v_{5}$
    $rk_{2}$$v_{6}$$v_{7}$$v_{8}$$v_{9}$$v_{10}$
     | Show Table
    DownLoad: CSV

    4. Spatio-temporal keyword access model for HBase

    In this section, we describe a spatio-temporal keyword access (STKA) model for the logical view of HBase table. The model is built based on Hilbert curve and Bloom filter. We first introduce Hilbert curve, which maps high-dimensional data into one-dimensional space and then describe the access model.


    4.1. Hilbert curve

    Hilbert curve is a kind of space filling curve which maps multi-dimensional space into one-dimensional space. In particular, the whole space is partitioned into equal-size cells and then a curve is passed through each cell for only once in term of some sequence, so that every cell is assigned a sequence number. Different space filling curves are distinguished by different sequencing methods. Due to information loss in the transformation, different space filling curves are evaluated by the criteria, locality preservation, meaning that how much the change of proximities is from original space to one-dimensional space. Hilbert curve is proved to be the best locality preserved space filling curve. With Hilbert curve, any object in the original space is transformed into [0, $2^{2\lambda}-1$] space, where $\lambda$ is called the order of Hilbert curve. Figure 1 shows four Hilbert curves in two-dimensional space with $\lambda$ = 1, 2, 3 and 4. In Figure 1 (b), there are two points $p_{1}$ and $p_{2}$, and the sequence number 2 and 13 is obtained through the curve for them, respectively. Note that $p_{1}$ and $p_{2}$ are neighbors in the original space, but apart from each other by 11 unit distances in one-dimensional space.

    Figure 1. Hilbert Curves.

    We describe two functions for Hilbert curve, one is mapping a point in the original space to a value in one-dimensional space, the other is mapping a range window to a series of intervals. Specifically, for a Hilbert curve with order = $\lambda$,

    $coorToCode(p)$. Given a point $p$ = ($x_{1}$, $x_{2}$, $\ldots $, $x_{n}$) in $n$-dimensional space $\mathbb{S}$, $coorToCode(p)$ returns a cell number (between $0$ and $2^{2\lambda}-1$) referring the cell where $p$ lies within $\mathbb{S}$.

    $rectToIntervals(R)$. Given a range window $R$ = ($x^{l}_{1}$, $x^{l}_{2}$, $\ldots $, $x^{l}_{n}$, $x^{u}_{1}$, $x^{u}_{2}$, $\ldots $, $x^{u}_{n}$) in $n$-dimensional space $\mathbb{S}$, where $x^{l}_{i}$ and $x^{u}_{i}$ ($1 \leq i \leq n$) are the lower and upper bound of the $i$th-dimension, respectively, $rectToIntervals(R)$ returns a series of intervals representing the cells intersecting with $R$ in $\mathbb{S}$.

    For instance, in Figure 1 (b), $coorToCode(p_{1})$ = 2, $coorToCode(p_{2})$ = 13, and $rectToIntervals(R_{1})$ = {[1,2], [7,8], [11,15]}.


    4.2. STKA model

    For each record $r$ = $\langle x$, $y$, $t$, $W$, $A \rangle$, the row key is generated as $t\oplus coorToCode(x, y)$. Note that for different records, they may share the same row key, which means a row key might correspond to several records. And we use Bloom filter to denote the union of keyword sets of records. A Bloom filter is a space-efficient probabilistic data structure which is used to test whether an element is a member of a set. If the returned value is false for testing whether element $e$ belongs to set $C$, then it is definite that $e$ is not in $C$. However, if the result is true, $e$ may be in $C$, which means false-positives are possible and further inspection should be taken. In particular, for a set of records $S$ = {$r_{1}$, $r_{2}$, $\ldots $, $r_{m}$}, the Bloom filter $B_{S}$ = $addSet$({$w_{i, j}$ $|$ $w_{i, j}$ $\in$ $W_{i}$, $1\leq j \leq |W_{i}|$, $1\leq i \leq m$}), where function $addset()$ means inserting keyword list into Bloom filter structure, and $W_{i}$ is the keyword set of $r_{i}$ ($1\leq i \leq m$).

    The STKA model is $\langle$$rowkey$, $B$, $listof$($x$, $y$, $W$, $A$)$\rangle$, where $rowkey$ identifies a set of records $S$, each of which has $t\oplus coorToCode(x, y)$ = $rowkey$, and $B$ is the Bloom filter of union of the keywords in $S$, and $listof$($x$, $y$, $W$, $A$) is a list, each element in which contains geo-location, keyword set, and other attributes of each records in $S$.

    The STKA model is suitable for the logical view of HBase table. Table 2 shows an example of the model. For storing $listof$($x$, $y$, $W$, $A$), multiple column groups ($loc$, $wordlist$, $attr$) are used.

    Table 2. STKA model for HBase.
    $stdata$
    $bf$$loc_{1}$$wordlist_{1}$$attr_{1}$$loc_{2}$$wordlist_{2}$$attr_{2}$$\ldots $
    $rowkey_{1}$$B_{1}$$(x_{1}, y_{1})$ $W_{1}$$A_{1}$$(x_{2}, y_{2})$ $W_{2}$$A_{2}$$\ldots $
    $\ldots $$\ldots $$\ldots $$\ldots $$\ldots $$\ldots $$\ldots $$\ldots $$\ldots $
    $rowkey_{k}$$B_{k}$$(x_{m}, y_{m})$ $W_{m}$$A_{m}$$(x_{n}, y_{n})$ $W_{n}$$A_{n}$$\ldots $
     | Show Table
    DownLoad: CSV

    5. Algorithms for STK queries

    In this section, we first describe the processing for STK queries on original model. Considering massive data processing, we use MapReduce [2] framework to design parallel algorithm for STK queries.


    5.1. STK queries

    For a STK query $Q$ = $\langle R_{q}, t_{s}, t_{e}, W_{q}\rangle$, where $R_{q}$ = ($x_{l}$, $y_{l}$, $x_{u}$, $y_{u}$) means a query range, $W_{q}$ = {$w_{q_{1}}$, $w_{q_{2}}$, $\ldots $, $w_{q_{m}}$} is a set of query keywords, the basic idea for spatio-temporal range queries processing is to transform the spatial range and temproal range into a series of row keys, and using these row keys to retrieve the records and examin them whether they satisfy the query condition. To be more specific, firstly, the query spatial range is converted into a series of Hilbert value intervals. Then, according to row key design patterns, row keys are generated by time concatenating the Hilbert value. After that, we use $scan$, an interface of HBase, to retrieve the records within the row key range, and then we use Bloom filter to test whether these records intersect with the query keywords. Finally, to avoid false-positive, those satisfied records in the previous step must be further examined. Algorithm 1 describes the processing for STK queries in detail.

    Table . .
    Algorithm 1 Spatio-temporal Keyword Query
    Input:
    $(t_{s}, t_{e})$ //temporal range of query
    $R_{q} (x_{l}, y_{l}, x_{u}, y_{u})$ //spatial range of query
    $W_{q}$ //query keywords
    Output:
    $Qlist$ //result list
    1: $intervals$ = $rectToIntervals(R_{q})$;
    2: $table = connectHTable()$; //connect to table in HBase for query
    3: while $t_{s} < t_{e}$ do
    4: for each $interval$ in $intervals$ do
    5: $startrowkey = t_{s} + interval.Henter$; //starting row key
    6: $endrowkey = t_{s} + interval.Hexit$; //ending row key
    7: $rowkeyRange=(startrowkey, endrowkey)$; //row key range for scanning
    8: $Scan=scan(rowkeyRange)$; //scan table
    9: $ResultScanner=table.getScanner(Scan)$; //results of scanning
    10: for each $Result$ in $ResultScanner$ do
    11: if $Result.bf.intersect(W_{q}) \neq null$ then
    12:while $(str=Result.getNextSTRecord()) \neq null$ do
    13: if $(str.loc \in R_{q}) && (str.wordlist \cap W_{q}\neq \phi)$ then
    14: $Qlist.add(str)$; //add result to query result list
    15: end if
    16: end while
    17: end if
    18: end for
    19: end for
    20: $t_{s}=t_{s} + 1$;
    21: end while
     | Show Table
    DownLoad: CSV

    For the sake of legibility, some functions and variables of the algorithm are explained below:

    1. $rectToIntervals(R)$ returns a series of intervals, each of which is denoted as $[Henter, Hexit]$;

    2. $connectHTable()$ connects to HBase and returns the table;

    3. $rowkeyRange$ is a range of row keys, represented by a starting row key and an ending row key;

    4. $ResultScanner$ is a collection of rows retrieved by scanning table, whose row keys are in $rowkeyRange$;

    5. $bf.intersect(W_{q})$ is the function of Bloom filter $bf$ to test whether $bf$ intersects with $W_{q}$;

    6. $getNextSTRecord()$ returns next spatio-temporal record in a row;

    7. $Qlist$ is the result list for the query.

    In line 1, $R_{q}$ is transformed into a series of intervals. From line 3 to 20, the involved rows are retrieved and examined. In line 9, a collection of rows $ResultScanner$ is retrieved, corresponding to a time stamp in $(t_{s}, t_{e})$ and an area in $R_{q}$. The Bloom filter of the union of keywords in each row is examined whether it intersects with $W_{q}$ in line 11. If the intersection is not null, every spatio-temporal record in the row is further examined on location and word list (from line 12 to 16). For the satisfied record, the result is added into $Qlist$ (line 14).


    5.2. Parallel STK queries

    As we have mentioned above, with the development of wireless communication and positioning technology, more and more data about locations are collected, so it is necessary to consider the problems of massive data processing. For a STK query with large range predicates, a great number of rows are retrieved, which is a bottleneck of performance. Consequently, it is necessary to design parallel algorithm which involved computing nodes to process the query simultaneously, so that throughput and efficiency increase. Based on MapReduce framework, we design a parallel algorithm for STK queries.

    The basic work flow is that, firstly, an intermediate file containing all the involved row key ranges for the query is generated, in which each record corresponds to a row key range in HBase. Then Map procedure reads the records in the file and outputs them to Reduce procedure. And then in the Reduce, corresponding nodes retrieve and examine rows simultaneously to get results. Algorithm 2 describes the parallel algorithm for STK queries based on MapReduce framework.

    Table . .
    Algorithm 2 Parallel Spatio-temporal Keyword Query
    Input:
    $(t_{s}, t_{e})$ //temporal range of query
    $R_{q} (x_{l}, y_{l}, x_{u}, y_{u})$ //spatial range of query
    $W_{q}$ //query keywords
    Output:
    $Qlist$ //result list
    1: /*generating intermediate file $Filerecord$ for Map and Reduce*/
    2: $intervals$ = $rectToIntervals(R_{q})$;
    3: while $t_{s} < t_{e}$ do
    4: for each $interval$ in $intervals$ do
    5:$startrowkey = t_{s} + interval.Henter$; //starting row key
    6:$endrowkey = t_{s} + interval.Hexit$; //ending row key
    7:$rowkeyRange=(startrowkey, endrowkey)$; //row key range for scanning
    8:$Filerecord.write(rowkeyRange)$; //write $rowkeyRange$ range to $Filerecord$
    9:end for
    10:$t_{s}=t_{s} + 1$;
    11: end while
    12: /*Map input: file $Filerecord$ */
    13: while $(record=Filerecord.getNextRecord()) \neq null$ do
    14: $context.write(record)$; //output the record
    15: end while
    16: /*Reduce input: output of Map */
    17: $rowkeyRange=parse(record)$; //parse record into $rowkey$ range
    18: $table=connectHTable()$; //connect to table in HBase for query
    19: $Scan=scan(rowkeyRange)$; //start scanning
    20: $ResultScanner=table.getScanner(Scan)$; //results of scanning
    21: for each $Result$ in $ResultScanner$ do
    22:if $Result.bf.intersect(W_{q}) \neq null$ then
    23:while $(str=Result.getNextSTRecord()) \neq null$ do
    24:if $(str.loc \in R_{q}) && (str.wordlist \cap W_{q}\neq \phi)$ then
    25:$Qlist.add(str)$; //add result to query result list
    26:end if
    27:end while
    28:end if
    29:end for
     | Show Table
    DownLoad: CSV

    Pseudo-codes between line 2 and 11 describe the generation for the intermediate file $Filerecord$. In line 8, the row key range is written into the file in terms of file record in the loops in line 3 and 4. Map procedure is showed between line 13 and 15, and $context.write(record)$ means outputting $record$ to Reduce step. Pseudo-codes between line 2 and line 11 describe Reduce procedure. In line 17, the record passed by Map is parsed into the row key range. The remaining codes are similar to Algorithm 1.


    6. Experimental evaluation

    We evaluate our algorithms on a real dataset, which contains trajectories of taxis in Beijing. In particular, each record in the dataset contains vehicle ID, geo-location, recording time stamp, etc. For each record, we randomly assign it a list of keywords with size varied from 7 to 15 words. For comparison, we extract 5 datasets in different sizes from the original one in terms of temporal range. Table 3 shows the datasets in detail.

    Table 3. Dataset.
    Dataset Temporal Range Size (records)
    dataset10:00-0:48, Nov. 1st1 million
    dataset20:00-8:25, Nov. 1st10 million
    dataset30:00-23:59, Nov. 1st30 million
    dataset4Nov. 1st & Nov. 2nd60 million
    dataset5Nov. 1st, Nov. 2nd & Nov. 3rd100 million
     | Show Table
    DownLoad: CSV

    Our algorithms are implemented in Java, and run on a three-node cluster with Hadoop 2.5.1 and HBase 0.98.6, in which each node is equipped with Intel(R) Core(TM) i3 CPU @ 3.40GHz, 4GB main memory, and 500GB storage, and operating system is CentOS release 6.5 64bit, and network bandwidth is 10Mbps.


    6.1. STK queries

    First, we evaluate the algorithm for STK queries. And we introduce two parameters to test the algorithm under various conditions. One is selectivity $\theta$ defined as:

    $ \theta = \frac{L_{(t_{s}, t_{e})}}{L_{t}} \cdot \frac{A_{R_{q}}}{A_{\mathbb{S}}} $

    where $L_{(t_{s}, t_{e})}$ means the length of query temporal range $(t_{s}, t_{e})$, $L_{t}$ means the length of temporal extent of the dataset, $A_{R_{q}}$ means the area of query spatial range $R_{q}$, and $A_{\mathbb{S}}$ means the area of the whole space. Selectivity specifies the size of the query range, and the larger $\theta$ is, the more spatio-temporal records are involved. The other parameter is the number of query keywords $\kappa$, and a larger $\kappa$ will recall more records.

    Firstly, we fix $\kappa$ = 5(in reality, the number of keywords users usually query is less than 5), and vary $\theta$ from 3 % to 50 %. For each value of $\theta$, we issue 10 queries with different temporal ranges and spatial ranges, and collect the average response time as the measurement of performance. Figure 2 shows the results.

    Figure 2. Response Time vs. Selectivity ($\kappa$ = 5).

    We can see that response time increases with $\theta$ for all the datasets. This is because a larger selectivity would access more records to be retrieved and examined, which increases the processing time. Furthermore, the response time grows linearly with $\theta$. This can be explained by the transformation from spatio-temporal dimensions into one-dimensional space, and by the fast detection of Bloom filter for keywords matching. Another observation is that with the enlargement of dataset size(the amount of data from dataset1 to dataset5 increases), the response time also increases. The explanation is that for a fixed selectivity $\theta$, STK query would recall more records with the increase of data amount.

    We fix $\theta$ = 15 %, and vary $\kappa$ from 3 to 10. The average response time is computed similarly to the previous one. Figure 3 shows the results.

    Figure 3. Response Time vs. Number of Query Keywords ($\theta$ = 15 %).

    Similarly, the response time increases with $\kappa$. The explanation is similar to the previous case, i.e., a larger $\kappa$ would involve more records to be accessed.


    6.2. Parallel STK queries

    After studying the performance of STK queries, we compare parallel STK query with the original one to observe the improvement. For both of them, we issue 5 different queries on dataset 1, and Table 4 shows the conditions in detail.

    Table 4. Query Conditions.
    Conditions Temporal Range(min) Spatial Range(km) Number of Keywords Number of Scanned Records
    c110133987
    c2202518786
    c3303542457
    c4404799789
    c550510108040
     | Show Table
    DownLoad: CSV

    Figure 4 shows the results. We can see that at the beginning, i.e, the number of scanned records is not much, STK query is faster than the parallel one, because the cost of writing and parsing the intermediate file in the MapReduce procedure impacts the performance. However, with the query range enlarged, the parallel STK query is more efficient than the original one. This can be explained by the parallelism of processing the query, making accessing and comparing to operate simultaneously. Therefore, the parallel STK queries are applicable for the condition with large selectivity.

    Figure 4. STK Query vs. Parallel STK Query.

    7. Conclusion

    In this paper, we propose the spatio-temporal keyword queries in HBase, which is a new problem for HBase platform. We devise a proper access model for HBase, utilized Hilbert curve and Bloom filter. Two algorithms are developed suitable for ordinary and large query selectivities, respectively. We conduct experiments on a real dataset, and the results show our methods are capable for large scale spatio-temporal keyword queries.

    In the future, we plan to extend our work to the inner structure of HBase index table, in order to improve efficiency of spatio-temporal keyword queries.


    Acknowledgments

    This work is supported by NSF of China grant 61303062. We would like to thank Peijun He for helping with the implementation.


    [1] Blikslager AT, Moeser AJ, Gookin JL, et al. (2007) Restoration of barrier function in injured intestinal mucosa. Physiol Rev 87: 545. doi: 10.1152/physrev.00012.2006
    [2] Podolsky DK (1999) V. Innate mechanisms of mucosal defense and repair: The best offense is a good defense. Am J Physiol 277: G495–G499.
    [3] Kunzelmann K, Mall M (2002) Electrolyte transport in the mammalian colon: Mechanisms and Implications for disease. Physiol Rev 82: 245–289. doi: 10.1152/physrev.00026.2001
    [4] Ferraris RP, Diamond J (1997) Regulation of intestinal sugar transport. Physiol Rev 77: 257–302. doi: 10.1152/physrev.1997.77.1.257
    [5] Groschwitz KR, Hogan SP (2009) Intestinal barrier function: Molecular regulation and disease pathogenesis. J Allergy Clin Immunol 124: 3–22. doi: 10.1016/j.jaci.2009.05.038
    [6] Bischoff S, Barbara G, Buurman W, et al. (2014) Intestinal permeability-A new target for disease prevention and therapy. BMC Gastroenterol 14: 189–214. doi: 10.1186/s12876-014-0189-7
    [7] Van Itallie CM, Holmes J, Bridges A, et al. (2008) The density of small tight junction pores varies among cell types and is increased by expression of claudin-2. J Cell Sci 121: 298–305. doi: 10.1242/jcs.021485
    [8] Ulluwishewa D, Anderson RC, Mcnabb WC, et al. (2011) Regulation of tight junction permeability by intestinal bacteria and dietary components. J Nutr 141: 769–776. doi: 10.3945/jn.110.135657
    [9] De Magistris L, Picardi A, Sapone A, et al. (2014) Intestinal barrier in autism, In: Patel VB (ed.), Comprehensive guide to autism, New York: Springer, 123.
    [10] Catassi C, Fasano A (2008) Celiac disease. Curr Opin Gastroenterol 24: 687–691. doi: 10.1097/MOG.0b013e32830edc1e
    [11] Bjarnason I, Macpherson A, Hollander D (1995) Intestinal permeability: An overview. Gastroenterology 108: 1566–1581. doi: 10.1016/0016-5085(95)90708-4
    [12] Fasano A (2011) Zonulin and its regulation of intestinal barrier function: The biological door to inflammation, autoimmunity and cancer. Physiol Rev 91: 151–175. doi: 10.1152/physrev.00003.2008
    [13] Lerner A, Matthias T (2015) Changes in intestinal tight junction permeability associated with industrial food additives explain the rising incidence of autoimmune disease. Autoimmun Rev 14: 479–489. doi: 10.1016/j.autrev.2015.01.009
    [14] Siniscalco D, Cirillo A, Bradstreet JJ, et al. (2013) Epigenetic findings in autism: New perspectives for therapy. Int J Environ Res Public Health 10: 4261–4273. doi: 10.3390/ijerph10094261
    [15] Siniscalco D, Antonucci N (2013) Possible use of Trichuris suis ova in autism spectrum disorders therapy. Med Hypotheses 81: 1–4. doi: 10.1016/j.mehy.2013.03.024
    [16] Wakefield AJ (2002) The gut-brain axis in childhood developmental disorders. J Pediatr Gastroenterol Nutr 34: S14–S17. doi: 10.1097/00005176-200205001-00004
    [17] Siniscalco D (2014) Gut bacteria-brain axis in autism. Autism 4: e124.
    [18] Siniscalco D, Antonucci N (2013) Involvement of dietary bioactive proteins and peptides in autism spectrum disorders. Curr Protein Pept Sci 14: 674–679.
    [19] Trivedi MS, Shah JS, Al-Mughairy S, et al. (2014) Food-derived opioid peptides inhibit cysteine uptake with redox and epigenetic consequences. J Nutr Biochem 25: 1011–1018. doi: 10.1016/j.jnutbio.2014.05.004
    [20] Frustaci A, Neri M, Cesario A, et al. (2012) Oxidative stress-related biomarkers in autism: Systematic review and meta-analyses. Free Radic Biol Med 52: 2128–2141. doi: 10.1016/j.freeradbiomed.2012.03.011
    [21] Melnyk S, Fuchs GJ, Schulz E, et al. (2012) Metabolic imbalance associated with methylation dys-regulation and oxidative damage in children with autism. J Autism Dev Disord 42: 367–377. doi: 10.1007/s10803-011-1260-7
    [22] Shattock P, Whiteley P (2002) Biochemical aspects in autism spectrum disorders: Updating the opioid-excess theory and presenting new opportunities for biomedical intervention. Expert Opin Ther Targets 6: 175–183. doi: 10.1517/14728222.6.2.175
    [23] Siniscalco D, Sapone A, Giordano C, et al. (2013) Cannabinoid receptor type 2, but not type 1, is up-regulated in peripheral blood mononuclear cells of children affected by autistic disorders. J Autism Dev Disord 43: 2686–2695. doi: 10.1007/s10803-013-1824-9
    [24] Siniscalco D, Bradstreet JJ, Cirillo A, et al. (2014) The in vitro GcMAF effects on endocannabinoid system transcriptionomics, receptor formation, and cell activity of autism-derived macrophages. J Neuroinflammation 11: 78. doi: 10.1186/1742-2094-11-78
    [25] Fiorentino M, Sapone A, Senger S, et al. (2016) Blood-brain barrier and intestinal epithelial barrier alterations in autism spectrum disorders. Mol Autism 7: 49. doi: 10.1186/s13229-016-0110-z
    [26] Rose DR, Yang H, Serena G, et al. (2018) Differential immune responses and microbiota profiles in children with autism spectrum disorders and co-morbid gastrointestinal symptoms. Brain Behav Immun 70: 354–368. doi: 10.1016/j.bbi.2018.03.025
    [27] Lionetti E, Leonardi S, Franzonello C, et al. (2015) Gluten psychosis: Confirmation of a new clinical entity. Nutrients 7: 5532–5539. doi: 10.3390/nu7075235
    [28] O'Mahony SM, Clarke G, Borre YE, et al. (2015) Serotonin, tryptophan metabolism and the brain-gut-microbiome axis. Behav Brain Res 277: 32–48. doi: 10.1016/j.bbr.2014.07.027
    [29] Theije CGMD, Koelink PJ, Korte-Bouws GA, et al. (2014) Intestinal inflammation in a murine model of autism spectrum disorders. Brain Behav Immun 37: 240–247. doi: 10.1016/j.bbi.2013.12.004
    [30] Baganz NL, Blakely RD (2013) A dialogue between the immune system and brain, spoken in the language of serotonin. ACS Chem Neurosci 4: 48–63. doi: 10.1021/cn300186b
    [31] Van Elst K, Bruining H, Birtoli B, et al. (2014) Food for thought: Dietary changes in essential fatty acid ratios and the increase in autism spectrum disorders. Neurosci Biobehav Rev 45: 369–738. doi: 10.1016/j.neubiorev.2014.07.004
    [32] Halliwell B (2006) Oxidative stress and neurodegeneration: Where are we now? J Neurochem 97: 1634–1658. doi: 10.1111/j.1471-4159.2006.03907.x
    [33] Cartocci V, Catallo M, Tempestilli M, et al. (2018) Altered brain cholesterol/isoprenoid metabolism in a rat model of autism spectrum disorders. Neuroscience 372: 27–37. doi: 10.1016/j.neuroscience.2017.12.053
    [34] Brigida AL, Schultz S, Cascone M, et al. (2017) Endocannabinod signal dysregulation in autism spectrum disorders: A correlation link between inflammatory state and neuro-immune alterations. Int J Mol Sci 18: 1425. doi: 10.3390/ijms18071425
    [35] Acharya N, Penukonda S, Shcheglova T, et al. (2017) Endocannabinoid system acts as a regulator of immune homeostasis in the gut. Proc Natl Acad Sci USA 114: 5005–5010. doi: 10.1073/pnas.1612177114
    [36] Gyires K, Zádori ZS (2016) Role of cannabinoids in gastrointestinal mucosal defense and inflammation. Curr Neuropharmacol 14: 935–951. doi: 10.2174/1570159X14666160303110150
    [37] Maríbauset S, Llopisgonzález A, Zazpe I, et al. (2016) Nutritional Impact of a Gluten-Free Casein-Free Diet in Children with Autism Spectrum Disorder. J Autism Dev Disord 46: 673–684. doi: 10.1007/s10803-015-2582-7
    [38] Iovene MR, Bombace F, Maresca R, et al. (2017) Intestinal Dysbiosis and Yeast Isolation in Stool of Subjects with Autism Spectrum Disorders. Mycopathologia 182: 349–363. doi: 10.1007/s11046-016-0068-6
    [39] Siniscalco D, Mijatovic T, Bosmans E, et al. (2016) Decreased Numbers of CD57 + CD3- Cells Identify Potential Innate Immune Differences in Patients with Autism Spectrum Disorder. Vivo 30: 83–89.
    [40] de Theije CG, Wopereis H, Ramadan M, et al. (2014) Altered gut microbiota and activity in a murine model of autism spectrum disorders. Brain Behav Immun 37: 197–206. doi: 10.1016/j.bbi.2013.12.005
    [41] Needham BD, Tang W, Wu WL (2018) Searching for the gut microbial contributing factors to social behavior in rodent models of autism spectrum disorder. Dev Neurobiol 78: 474–499. doi: 10.1002/dneu.22581
    [42] Fung TC, Olson CA, Hsiao EY (2017) Interactions between the microbiota, immune and nervous systems in health and disease. Nat Neurosci 20: 145–155. doi: 10.1038/nn.4476
    [43] Santocchi E, Guiducci L, Fulceri F, et al. (2016) Gut to brain interaction in Autism Spectrum Disorders: A randomized controlled trial on the role of probiotics on clinical, biochemical and neurophysiological parameters. BMC Psychiatry 16: 183. doi: 10.1186/s12888-016-0887-5
  • This article has been cited by:

    1. Tiantian Liu, Shu Gao, Xiumin Chu, Cong Lu, 2017, Storing and querying AIS data in HBase, 978-1-5090-6792-3, 88, 10.1109/ACIRS.2017.7986071
  • Reader Comments
  • © 2018 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
通讯作者: 陈斌, bchen63@163.com
  • 1. 

    沈阳化工大学材料科学与工程学院 沈阳 110142

  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

Metrics

Article views(7836) PDF downloads(1932) Cited by(8)

Article outline

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return

Catalog