A review of the postharvest characteristics and pre-packaging treatments of citrus fruit

Alaika Kassim; Tilahun S. Workneh; Mark D. Laing

doi:10.3934/agrfood.2020.3.337

AIMS Agriculture and Food

2020, Volume 5, Issue 3: 337-364. doi: 10.3934/agrfood.2020.3.337




1. Bioresources Engineering, School of Engineering, College of Agriculture, Engineering and Science, University of KwaZulu-Natal, Private Bag X01, Pietermaritzburg, Scottsville, 3209, South Africa
2. Discipline of Plant Pathology, School of Agricultural, Earth and Environmental Science, University of KwaZulu-Natal, Private Bag X01, Pietermaritzburg, Scottsville, 3209, South Africa

Received: 25 February 2020 Accepted: 08 July 2020 Published: 14 July 2020

Once harvested, fruit continue to respire, which is further exacerbated by elevated temperatures in the field and during transport to packhouses. This favors the proliferation of pathogens, which is detrimental to the postharvest fruit quality and, consequently, results in a decrease in the fruit shelf life. The aim of this review is to highlight the common citrus postharvest disorders and the various pre-packaging treatments that can be used to alleviate such disorders and promote fruit quality. Hot water, surface coatings, ultra-violet irradiation, chlorine (hypochlorous), salt treatments and microbial antagonists have been beneficial in maintaining the citrus quality and reducing the prevalence of postharvest decay. Environmentally friendly anolyte water has also proven to be a favourable postharvest treatment. Integrated treatments, such as hot water treatments and chlorine disinfection, have been successfully used in the global citrus industry. The use of integrated pre-packaging treatments improved the quality and shelf life of citrus, compared to individual treatments. An effective combination of pre-packaging treatments should include: (1) disinfectant; (2) curative and (3) preventive treatments to control pre- and postharvest pathogens.


Citation: Alaika Kassim, Tilahun S. Workneh, Mark D. Laing. A review of the postharvest characteristics and pre-packaging treatments of citrus fruit[J]. AIMS Agriculture and Food, 2020, 5(3): 337-364. doi: 10.3934/agrfood.2020.3.337

Related Papers:

[1]	Xing Tan, Giri Kumar Tayi . CERTONTO: TOWARDS AN ONTOLOGICAL REPRESENTATION OF FAIR TRADE CERTIFICATION STANDARDS. Big Data and Information Analytics, 2017, 2(3&4): 255-264. doi: 10.3934/bdia.2017022
[2]	Xiaoying Chen, Chong Zhang, Zonglin Shi, Weidong Xiao . Spatio-temporal Keywords Queries in HBase. Big Data and Information Analytics, 2016, 1(1): 81-91. doi: 10.3934/bdia.2016.1.81
[3]	Jinyuan Zhang, Aimin Zhou, Guixu Zhang, Hu Zhang . A clustering based mate selection for evolutionary optimization. Big Data and Information Analytics, 2017, 2(1): 77-85. doi: 10.3934/bdia.2017010
[4]	Bill Huajian Yang . Modeling path-dependent state transitions by a recurrent neural network. Big Data and Information Analytics, 2022, 7(0): 1-12. doi: 10.3934/bdia.2022001
[5]	Pankaj Sharma, David Baglee, Jaime Campos, Erkki Jantunen . Big data collection and analysis for manufacturing organisations. Big Data and Information Analytics, 2017, 2(2): 127-139. doi: 10.3934/bdia.2017002
[6]	Yaguang Huangfu, Guanqing Liang, Jiannong Cao . MatrixMap: Programming abstraction and implementation of matrix computation for big data analytics. Big Data and Information Analytics, 2016, 1(4): 349-376. doi: 10.3934/bdia.2016015
[7]	Subrata Dasgupta . Disentangling data, information and knowledge. Big Data and Information Analytics, 2016, 1(4): 377-390. doi: 10.3934/bdia.2016016
[8]	Marco Tosato, Jianhong Wu . An application of PART to the Football Manager data for players clusters analyses to inform club team formation. Big Data and Information Analytics, 2018, 3(1): 43-54. doi: 10.3934/bdia.2018002
[9]	Yiwen Tao, Zhenqiang Zhang, Bengbeng Wang, Jingli Ren . Motality prediction of ICU rheumatic heart disease with imbalanced data based on machine learning. Big Data and Information Analytics, 2024, 8(0): 43-64. doi: 10.3934/bdia.2024003
[10]	Guojun Gan, Kun Chen . A Soft Subspace Clustering Algorithm with Log-Transformed Distances. Big Data and Information Analytics, 2016, 1(1): 93-109. doi: 10.3934/bdia.2016.1.93

Abstract

1. Introduction

The term "NoSQL" was coined in 1998 by Carlo Strozzi for his RDBMS, Strozzi NoSQL [37]. Strozzi, however, used the term simply to distinguish his solution from other relational database management systems (RDBMS), which utilize SQL: his database did not expose a SQL interface. Today the term NoSQL (meaning 'Not only SQL') describes a larger class of databases that do not have the same properties as traditional relational databases and are generally not queried with SQL.

The term "NoSQL" has since been revived in a different context, generally known as the era of Big Data. Big Data is defined as a collection of data sets so large and complex that conventional database systems cannot process them within the desired time [9]. For instance, storing and processing the daily tweets at Twitter demands significant data storage, processing, and data analytics capabilities. Although conventional SQL-based databases have proven to be highly efficient, reliable, and consistent in storing and processing structured (or relational) data, they fall short of processing Big Data, which is characterized by large volume, variety, velocity, openness, absence of structure, and high visualization demands, among others [9]. Internet-born companies like Google, Amazon and Facebook have invented their own data stores to cope with the big data that appear in their applications, and have inspired other vendors and open-source communities to do similarly for other use cases [16].

Figure 1 shows the Big Data challenges and the corresponding features of NoSQL systems that try to address them. On one hand, in the domain of Big Data we are talking about very large data sets that should be available to a large number of users at the same time (volume). There is also a need for fast data retrieval to enable real-time, highly efficient data analysis (velocity), and the data come in a much greater variety of formats beyond structured and relational data (variety). On the other hand, NoSQL data stores can accommodate the large volume of data and users by partitioning the data across many storage nodes and virtual structures, thus overcoming infrastructure constraints and ensuring basic availability. Additionally, NoSQL data stores relax the transactional properties of user queries by abandoning ACID for BASE (Basically Available, Soft state, Eventually consistent) [32]. This way there is less blocking between user queries accessing particular data structures, which data partitioning also helps with. Finally, NoSQL data stores come in many flavors (i.e., data models) to accommodate the data variety present in real problems.

Figure 1. Big Data characteristics and NoSQL System features.


In this paper, we survey the domain of NoSQL to present an overview of the relevant technologies for storing and processing big data. We group these technologies according to their features and characteristics. For each category, we give a historical overview of how it came to be, review its key features and competitive characteristics, discuss some of its most popular implementations and, finally, identify some future directions and trends for the evolution and advancement of NoSQL systems. In the second part of the paper, we demonstrate how to customize and leverage the Yahoo! Cloud Serving Benchmark (YCSB) [12] for performance evaluation.

Our work aims to provide a comprehensive overview of state-of-the-art solutions for big data stores, and also to promote and motivate further research by concisely presenting what both academia and industry have identified as important issues to be addressed in the future. Additionally, through qualitative comparison and, more importantly, by enabling quantitative comparison, we aim to guide developers in selecting a solution that fits their needs. Finally, this paper contributes several extensions to YCSB: new drivers for specific implementations, updates to existing drivers, and a new write-intensive workload. These extensions are made public to facilitate and further advance the benchmarking and evaluation capabilities of YCSB for the rest of the research community.

The rest of the paper is organized as follows. In Section 2, we describe the common concepts, data models and terminology of the NoSQL world. In Section 3, we first briefly introduce the classical classes of NoSQL solutions; then we elaborate on each class along with top-rated sample data stores. In Section 5, we demonstrate how to customize YCSB for new solutions or for users' specific requirements. Finally, in Section 6, we conclude the paper with a summary of our findings and observations on future trends in the area of NoSQL data stores.

2. NoSQL technologies

NoSQL systems range in functionality from the simplest distributed hashing, as supported by the popular Memcached [22,15], a distributed and open-source memory caching system, to highly scalable partitioned tables, as supported by Google's BigTable [11]. In fact, BigTable, Memcached, and Amazon's DynamoDB [36] provided a "proof of concept" that inspired many of the data stores we describe here [12]. Memcached demonstrated that in-memory indexes can be highly scalable, distributing and replicating objects over multiple nodes. DynamoDB pioneered the idea of eventual consistency as a way to achieve higher availability and scalability: data fetched are not guaranteed to be up-to-date, but updates are guaranteed to be propagated to all nodes eventually. BigTable demonstrated that persistent record storage could be scaled to thousands of nodes, a feat that most of the other systems aspire to.

NoSQL systems generally have six key characteristics [10]:

1. the ability to horizontally scale the throughput of CRUD operations over many servers,

2. the ability to replicate and to distribute (i.e., partition or shard) data over many servers,

3. a simple call level interface or protocol (in contrast to a SQL binding),

4. a weaker concurrency model than the ACID transactions of most relational (SQL) database systems,

5. efficient use of distributed indexes and RAM for data storage, and

6. the ability to dynamically add new attributes to data records.

2.1. Transactional properties and performance

To guarantee the integrity of data, most classical database systems support transactions, which ensure consistency of data at all levels of data management. These transactional characteristics are known as ACID (Atomicity, Consistency, Isolation, and Durability). However, scaling out ACID-compliant systems has proven to be problematic. This is due to conflicts between different aspects of high availability in distributed systems that cannot be fully resolved, as captured by the CAP theorem [28]:

● Strong Consistency implies that a system is in a consistent state after the execution of an operation. A distributed system is typically considered consistent if, after an update operation by some writer, all readers see the update in some shared data source.

● Availability means that all clients can always find at least one copy of the requested data, even if some of the machines in a cluster are down.

● Partition-tolerance is understood as the ability of the system to continue operating normally in the presence of network partitions. These occur if two or more "islands" of network nodes arise that (temporarily or permanently) cannot connect to each other. Partition tolerance is also understood as the ability of a system to cope with the dynamic addition and removal of nodes (e.g., for maintenance purposes; nodes that are removed and added again are considered a network partition of their own in this notion).

The CAP theorem states that only two of these three properties can be fully achieved at the same time (see Fig. 2).

Figure 2. Visualization of CAP theorem.


Many NoSQL databases have loosened the requirements on consistency in order to achieve better availability and partition tolerance. The resulting systems are known as BASE (Basically Available, Soft state, Eventually consistent). They have no transactions in the classical sense and introduce constraints on the data model to enable better partitioning schemes. Cattell [10] classifies NoSQL databases according to the CAP theorem.

While NoSQL stores routinely claim to outperform RDBMSs, especially for big data, not all NoSQL data stores are created alike where performance is concerned. Users such as system architects and IT managers would be wise to compare NoSQL databases in their own environments, using data and user interactions representative of their expected production workloads, before deciding which NoSQL database to use for a new application [1]. Performance evaluation tools designed specifically for NoSQL systems can, however, narrow the list of candidates dramatically.

In this paper, we aim to provide both a theoretical and an experimental comparison of popular NoSQL solutions to assist users in choosing the right technology for their specific needs. We employ the Yahoo! Cloud Serving Benchmark (YCSB) [12] for the performance evaluation of selected NoSQL stores.

3. Data store categories

There have been a number of approaches to classifying NoSQL databases according to various criteria. In the context of our work, we adopt the classification based on the supported data model.

● Key-value stores. These are probably the simplest form of database management system. They can only store pairs of keys and values, and retrieve values when a key is known. These simple systems are normally not adequate for complex applications. On the other hand, it is exactly this simplicity that makes such systems attractive in certain circumstances. For example, resource-efficient key-value stores are often used in embedded systems or as high-performance in-process databases.

● Document stores. Document stores, also called document-oriented database systems, are characterized by their schema-free organization of data. Records (i.e., documents) do not need to have a uniform structure, i.e., different records may have different columns. The types of the values of individual columns can be different for each record, and columns can have more than one value (arrays). Records can have a nested structure, and document stores often use internal notations that can be processed directly in applications, mostly JSON.

● Graph databases. Graph DBMS, also called graph-oriented DBMS or graph databases, represent data in graph structures as nodes and edges, which represent relationships between nodes. They allow easy processing of data in that form, and simple calculation of specific properties of the graph, such as the number of steps needed to get from one node to another node.

● Wide Column Stores. Wide column stores, also called extensible record stores, store data in records with an ability to hold very large numbers of dynamic columns. Since the column names as well as the record keys are not fixed, and since a record can have billions of columns, wide column stores can be seen as two-dimensional key-value stores. Wide column stores share the characteristic of being schema-free with document stores, however the implementation is significantly different.

3.1. Key-value data stores

Key-value stores are the most common and simplest NoSQL data stores: they store pairs of keys and values, and values are retrieved when keys are known. They are mainly used when there is a need for higher performance than SQL databases offer, without the accompanying rigid relational data model and the complex query functionality of SQL and other NoSQL databases. Figure 3 shows a typical object in such data stores; this object comes from loop detector sensors embedded in Ontario highways that periodically measure the speed, volume and length of cars [42].

Figure 3. A key-value data model for traffic data.


Key-value stores can be categorized into three types by storage option: temporary, permanent, and hybrid stores. In temporary stores, all data are kept in memory, so access is fast; however, the data are lost if the system goes down. Permanent key-value stores ensure high availability of data by storing it on the hard disk, at the cost of slower disk I/O. Hybrid approaches combine the best of both types by keeping the data in memory and writing it to the hard disk when a set of specified conditions is met.
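To make the hybrid option concrete, here is a minimal Python sketch assuming a single process and a hypothetical flush rule ("persist after every N unsaved writes") standing in for the "set of specified conditions". The class name `HybridKVStore`, the JSON file format and the threshold are illustrative choices, not taken from any particular product.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class HybridKVStore:
    """Reads and writes go to memory; the whole map is persisted once enough writes accumulate."""

    def __init__(self, path: str, flush_threshold: int = 3) -> None:
        self.path = path
        self.flush_threshold = flush_threshold  # hypothetical flush condition
        self.data: Dict[str, str] = {}
        self.dirty = 0  # writes not yet persisted
        if os.path.exists(path):  # recover whatever an earlier run flushed
            with open(path, encoding="utf-8") as f:
                self.data = json.load(f)

    def put(self, key: str, value: str) -> None:
        self.data[key] = value
        self.dirty += 1
        if self.dirty >= self.flush_threshold:
            self.flush()

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def flush(self) -> None:
        # Write the in-memory map to disk and reset the counter of unsaved writes.
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        self.dirty = 0


path = os.path.join(tempfile.mkdtemp(), "kv.json")
store = HybridKVStore(path, flush_threshold=2)
store.put("a", "1")  # memory only
store.put("b", "2")  # threshold reached: "a" and "b" are flushed to disk
store.put("c", "3")  # memory only; lost if the process dies now
recovered = HybridKVStore(path)  # simulates a restart
print(recovered.get("b"), recovered.get("c"))  # 2 None
```

Writes made between flushes live only in memory, which is exactly the trade-off the hybrid type accepts: a crash loses at most the unflushed writes.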

Berkeley DB [29] is one of the simplest examples of a NoSQL key-value store. It provides the fundamental functionality of a key-value store and has served as a base for developing more advanced key-value stores such as Project Voldemort [14], used by LinkedIn, and Amazon DynamoDB [36]. Berkeley DB offers the ACID guarantees of SQL databases while remaining a simple database with good performance and a lightweight footprint.

Key-value stores are generally good solutions if you have a simple application with only one kind of object and you only need to look objects up based on one attribute. Table 1 compares the system properties of the key-value stores covered in this section. It has been compiled from [20,44] and each candidate's website.

Table 1. System properties comparison among Memcached, Redis and Project Voldemort.

Name	Memcached	Redis	Voldemort
Description	key-value cache	a key-value data structure server	A key-value Database
Data storage	volatile memory	volatile memory and persistent	Database
Data type	string	data structures	JSON and Java objects
Query language	Memcached-protocol	RESTful API calls, memcached-protocol and lua	API calls
Initial Release	2003	2009	?
License	BSD License	BSD License	Apache License
Implementation language	C, Java	C	Java
Secondary index	no	no	no
Composite key	no	no	no
MapReduce	no	no	no
Replication mode	none	master-slave replication	symmetric replication
Sharding	yes	yes	yes
Consistency	yes	yes	yes
Atomicity	yes	no	yes
Full text search	no	no	no
Transaction concept	yes (ACID)	yes	conditional
Concurrency	yes	yes	yes
Durability	yes	yes	yes
Value size max	1 MB	512 MB	?



Unlike the classical classification, which only considers the data model when grouping NoSQL data stores, we incorporate other criteria to obtain better-defined classes. We categorize a NoSQL solution as a key-value store if it has all of the following features:

● support only for simple data types (no notion of documents);

● search only based on keys (no composite keys);

● no support for full text search;

● no support for secondary index.

Due to these criteria, some data stores that used to be considered key-value stores, such as Riak or Oracle NoSQL, no longer fall into our key-value class; for example, the latest version of Riak is almost a document store, providing secondary indexes, composite keys, full text search and complex values [7].

3.1.1. Memcached

Memcached [15,22] is a high-performance, distributed memory object caching system, generic in nature, but originally intended for use in speeding up dynamic web applications by alleviating database load. Memcached has been improved to include features analogous to the other key-value stores: persistence, replication, high availability, dynamic growth, backup, and so on. Memcached clients can store and retrieve items from servers using keys. These keys can be any character strings. Typically, keys are MD5 sums or hashes of the objects being stored/fetched. The identification of the destination server is done at the client side using a hash function on the key. Therefore, the architecture is inherently scalable as there is no central server to consult while trying to locate values from keys [22]. Basically, Memcached consists of the following components:

● client software, which is given a list of available memcached servers,

● a client-based hashing algorithm, which chooses a server based on the "key" input,

● server software, which stores the values with their keys into an internal hash table, and

● server algorithms, which determine when to throw out old data (if out of memory), or reuse memory.
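The client-side server selection described above can be sketched as follows. This is an illustrative Python model, not the memcached client API: `ToyCacheServer` and `ToyCacheClient` are hypothetical names, servers are plain dictionaries, eviction is omitted, and the server is chosen by simple modulo hashing of an MD5 digest (many real client libraries also offer consistent hashing, so that adding a server remaps fewer keys).

```python
import hashlib
from typing import Dict, List, Optional


class ToyCacheServer:
    """Stands in for a memcached server: an internal hash table of key -> value."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.table: Dict[str, bytes] = {}


class ToyCacheClient:
    """Holds the list of available servers and picks one per key by hashing the key."""

    def __init__(self, servers: List[ToyCacheServer]) -> None:
        self.servers = servers

    def _pick(self, key: str) -> ToyCacheServer:
        # No central directory: every client computes the same server for the same key.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]

    def set(self, key: str, value: bytes) -> None:
        self._pick(key).table[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._pick(key).table.get(key)


servers = [ToyCacheServer(f"cache{i}") for i in range(3)]
client = ToyCacheClient(servers)
client.set("user:42", b"Alice")
print(client.get("user:42"))                # b'Alice'
print(sum(len(s.table) for s in servers))   # 1: the item lives on exactly one server
```

Because the mapping is a pure function of the key and the server list, two clients configured with the same list agree on where each key lives without consulting any coordinator.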

3.1.2. Redis

REmote DIctionary Server (Redis) is an in-memory database where data are stored in memory for faster performance. In Redis, complex objects such as lists and sets can be associated with a key. Keys can be given time-to-live (TTL) values, after which they are removed from memory. Redis uses locking for atomic updates and performs asynchronous replication.
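A minimal sketch of TTL semantics, assuming lazy expiration: an expired key is deleted the next time it is read. Redis itself also removes expired keys proactively in the background, which this sketch does not model. `TTLStore` is a hypothetical name, and an injectable clock keeps the example deterministic.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLStore:
    """Key-value map in which each key may carry an expiry time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.data: Dict[str, Tuple[str, Optional[float]]] = {}  # key -> (value, expires_at)

    def set(self, key: str, value: str, ttl: Optional[float] = None) -> None:
        expires_at = self.clock() + ttl if ttl is not None else None
        self.data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self.clock() >= expires_at:
            del self.data[key]  # expired: remove it on access
            return None
        return value


fake_now = [0.0]
store = TTLStore(clock=lambda: fake_now[0])
store.set("session:1", "token", ttl=10)
store.set("config", "v1")  # no TTL: never expires
fake_now[0] = 5.0
print(store.get("session:1"))  # token
fake_now[0] = 10.0
print(store.get("session:1"), store.get("config"))  # None v1
```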

Persistence in Redis is achieved in two ways. The first, called snapshotting, is a semi-persistent durability mode in which the dataset is asynchronously transferred from memory to disk from time to time, written in the RDB dump format. The second, safer alternative, available since version 1.1, is an append-only file (AOF) that is written as operations modifying the in-memory dataset are processed. Redis can rewrite the AOF in the background to avoid indefinite growth of the file.
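The append-only file idea can be sketched in a few lines: every modifying command is applied in memory and appended to a log, a restart replays the log, and a rewrite compacts it to one record per live key. This is a simplified Python model under stated assumptions: the tab-separated record format and the `AOFStore` class are hypothetical (Redis writes commands in its own protocol format), the rewrite runs synchronously rather than in the background, and `flush()` only hands data to the operating system without forcing an fsync.

```python
import os
import tempfile
from typing import Dict, List


class AOFStore:
    """In-memory map whose modifying commands are logged and replayed on startup.

    Keys and values must not contain tabs or newlines (hypothetical record format).
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.data: Dict[str, str] = {}
        if os.path.exists(path):  # rebuild the dataset by replaying the log
            with open(path, encoding="utf-8") as log:
                for line in log:
                    self._apply(line.rstrip("\n").split("\t"))
        self.log = open(path, "a", encoding="utf-8")

    def _apply(self, cmd: List[str]) -> None:
        if cmd[0] == "SET":
            self.data[cmd[1]] = cmd[2]
        elif cmd[0] == "DEL":
            self.data.pop(cmd[1], None)

    def _execute(self, cmd: List[str]) -> None:
        self._apply(cmd)                       # update the in-memory dataset
        self.log.write("\t".join(cmd) + "\n")  # then append the command to the log
        self.log.flush()                       # to the OS; os.fsync() would force it to disk

    def set(self, key: str, value: str) -> None:
        self._execute(["SET", key, value])

    def delete(self, key: str) -> None:
        self._execute(["DEL", key])

    def rewrite(self) -> None:
        """Compact the log to one SET per live key (the idea behind AOF rewriting)."""
        self.log.close()
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as out:
            for key, value in self.data.items():
                out.write(f"SET\t{key}\t{value}\n")
        os.replace(tmp_path, self.path)  # swap in the compacted log
        self.log = open(self.path, "a", encoding="utf-8")

    def close(self) -> None:
        self.log.close()


path = os.path.join(tempfile.mkdtemp(), "appendonly.log")
db = AOFStore(path)
db.set("x", "1")
db.set("x", "2")
db.set("y", "3")
db.delete("y")
db.rewrite()  # four log records shrink to one
db.close()
restarted = AOFStore(path)  # replays the log to rebuild memory
print(restarted.data)  # {'x': '2'}
restarted.close()
```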

By default, Redis syncs data to disk at least every 2 seconds, with more or less robust options available if needed. In the case of a complete system failure on default settings, only a few seconds of data would be lost. For applications that do not need durable data, Redis performs very well compared with writing every change to disk. Since version 2.8, lexicographic range queries are possible, provided all elements in a sorted set are inserted with the same score. As the primary storage of Redis is memory, Redis might not be the right option for data-intensive applications dominated by read operations, because a Redis data set cannot be bigger than memory [34].
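Lexicographic range queries work because, when every member of a sorted set has the same score, members are ordered by byte-wise comparison, so a range query returns a contiguous slice. The sketch below models this with a sorted Python list and `bisect`; `LexSortedSet` and `range_by_lex` are hypothetical names, and the inclusive bounds mirror the `ZRANGEBYLEX key [min [max` form of the Redis command.

```python
import bisect
from typing import List


class LexSortedSet:
    """Members that all share one score, so ordering falls back to byte-wise comparison."""

    def __init__(self) -> None:
        self.members: List[bytes] = []  # kept sorted at all times

    def add(self, member: bytes) -> None:
        i = bisect.bisect_left(self.members, member)
        if i == len(self.members) or self.members[i] != member:  # sets hold no duplicates
            self.members.insert(i, member)

    def range_by_lex(self, min_member: bytes, max_member: bytes) -> List[bytes]:
        """Members m with min_member <= m <= max_member."""
        lo = bisect.bisect_left(self.members, min_member)
        hi = bisect.bisect_right(self.members, max_member)
        return self.members[lo:hi]


zset = LexSortedSet()
for name in [b"banana", b"apple", b"cherry", b"apricot", b"blueberry"]:
    zset.add(name)
print(zset.range_by_lex(b"ap", b"b"))  # [b'apple', b'apricot']
```

A prefix query, such as all members starting with `b`, becomes the range `[b"b", b"b\xff"]`, which here returns `[b'banana', b'blueberry']`.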

Redis implements the Publish/Subscribe messaging paradigm, where senders (publishers) are not programmed to send their messages to specific receivers (subscribers). Rather, published messages are categorized into channels, without knowledge of what (if any) subscribers there may be. Subscribers express interest in one or more channels and only receive messages that are of interest, without knowledge of what (if any) publishers there are. This decoupling of publishers and subscribers can allow for greater scalability and a more dynamic network topology [34].
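The decoupling can be illustrated with a small in-process broker in which publishers and subscribers share nothing but a channel name. `Broker` is a hypothetical class, not a Redis client. Like the Redis `PUBLISH` command, `publish` here returns the number of subscribers that received the message, and messages sent to a channel with no subscribers are simply dropped.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str, str], None]  # receives (channel, message)


class Broker:
    """Channel-based publish/subscribe: neither side knows about the other."""

    def __init__(self) -> None:
        self.channels: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callback) -> None:
        self.channels[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        # Deliver to current subscribers only; nothing is stored for later.
        subscribers = self.channels.get(channel, [])
        for callback in subscribers:
            callback(channel, message)
        return len(subscribers)


broker = Broker()
inbox: List[str] = []
broker.subscribe("traffic", lambda ch, msg: inbox.append(f"{ch}:{msg}"))
print(broker.publish("traffic", "speed=80"))  # 1
print(broker.publish("weather", "rain"))      # 0 (no subscribers, message dropped)
print(inbox)                                  # ['traffic:speed=80']
```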

Redis supports consistent hashing; however, only Redis Cluster, available since April 2015, can fully leverage such partitioning. The consistent hashing implementation in Redis provides the ability to switch to another Redis instance if the preferred instance for a given key is not available. Similarly, if a new instance is added, part of the new keys will start to be stored on it [34].

3.1.3. Project Voldemort

Voldemort is a distributed database where tasks are achieved only through traditional CRUD requests such as GET, PUT and DELETE. Voldemort provides multi-version concurrency control (MVCC) for updates. It updates replicas asynchronously and, as a result, does not guarantee that data are consistent across node stores at a given read time. However, a read is guaranteed an up-to-date view if the set of replicas read overlaps the set of replicas written, for example when both reads and writes involve a majority of replicas. To detect and reconcile divergent versions, Voldemort applies versioning with vector clocks and read repair. Keys and values can be complex objects such as JSON objects, maps and lists in addition to simple scalar values [25]. Voldemort is used by LinkedIn and provides a simple key-value store focused on industry-level performance and efficiency [10].
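Vector-clock versioning can be sketched with one dictionary per version, mapping node names to counters. Comparing two clocks tells whether one version supersedes the other (so a stale replica can simply be overwritten, as in read repair) or whether the versions are concurrent and must be reconciled, for example by the application. This is a generic Python illustration of the technique with assumed function names (`increment`, `compare`, `merge`), not Voldemort's own implementation.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node name -> number of writes coordinated by that node


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter advanced (a new write via `node`)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify version a relative to b: 'equal', 'before', 'after' or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"   # b supersedes a
    if b_le_a:
        return "after"    # a supersedes b
    return "concurrent"   # neither saw the other's write: conflict


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum, used for the clock of a reconciled version."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


v1 = increment({}, "A")   # {'A': 1}
v2 = increment(v1, "A")   # {'A': 2}: a later write via node A
v3 = increment(v1, "B")   # {'A': 1, 'B': 1}: written via B without seeing v2
print(compare(v1, v2))                 # before
print(compare(v2, v3))                 # concurrent
print(sorted(merge(v2, v3).items()))   # [('A', 2), ('B', 1)]
```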

Voldemort's partitioning scheme relies on consistent hashing to distribute the load across multiple nodes. In consistent hashing, the output range of a hash function is treated as a fixed circular space or "ring" (i.e., the largest hash value wraps around to the smallest hash value). Instead of mapping a node to a single point in the ring, each node is assigned to multiple points in the ring. To this end, Voldemort uses the concept of "virtual nodes". A virtual node looks like a single node in the system, but each node can be responsible for more than one virtual node. Project Voldemort allows namespaces for key-value pairs called "stores", in which keys are unique. Keys are associated with exactly one value, and values are allowed to contain complex structures. Basic operations in Voldemort are atomic to exactly one key-value pair [16]. Serialization in Voldemort is pluggable, so any format can be supported by implementing a Serializer class that handles the translation between bytes and objects.
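The ring with virtual nodes can be sketched as below. The choice of hash (the first 4 bytes of MD5), the number of virtual nodes per physical node, and the clockwise walk that collects distinct physical nodes into a replica list are illustrative assumptions rather than Voldemort's exact scheme.

```python
import bisect
import hashlib
from typing import List, Tuple


def ring_hash(value: str) -> int:
    """Map a string to a position on the ring [0, 2**32)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


class ConsistentHashRing:
    def __init__(self, nodes: List[str], vnodes_per_node: int = 8) -> None:
        self.vnodes_per_node = vnodes_per_node
        self.ring: List[Tuple[int, str]] = []  # sorted (position, physical node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns several points ("virtual nodes") on the ring.
        for i in range(self.vnodes_per_node):
            bisect.insort(self.ring, (ring_hash(f"{node}#{i}"), node))

    def preference_list(self, key: str, replicas: int = 1) -> List[str]:
        """Walk clockwise from the key's position, collecting distinct physical nodes."""
        start = bisect.bisect_right(self.ring, (ring_hash(key), ""))
        owners: List[str] = []
        for step in range(len(self.ring)):
            _, node = self.ring[(start + step) % len(self.ring)]  # wrap past the top
            if node not in owners:
                owners.append(node)
                if len(owners) == replicas:
                    break
        return owners


keys = [f"key{i}" for i in range(1000)]
ring = ConsistentHashRing(["n1", "n2", "n3"])
before = {k: ring.preference_list(k)[0] for k in keys}
ring.add_node("n4")
after = {k: ring.preference_list(k)[0] for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), all(after[k] == "n4" for k in moved))
print(ring.preference_list("user:42", replicas=2))
```

The usage lines check the property that motivates the design: when `n4` joins, only keys whose clockwise successor is now one of `n4`'s virtual nodes change owner, and all of them move to `n4`. With plain modulo hashing over three and then four servers, most keys would move.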

3.1.4. Summary and future trend

Key-value stores support basic operations over relatively simple keys and values. They can only search based on key and do not support advanced search capabilities such as compound keys, secondary indexes, or full text search. The main use case for key-value stores is providing a distributed caching system with backup capabilities. They all provide scalability by distributing keys across different nodes using various hashing techniques. Voldemort uses MVCC, while the others use locks for concurrency control.

Key-value stores are trending toward supporting complex data as values and providing more search capabilities. Riak and Oracle NoSQL have paved the way in this direction: they now support composite keys, secondary indexes, full text search and complex values, similar to document stores. More specifically, three distinct paths can be imagined for the near future of key-value stores: first, becoming document stores that handle not only simple key-values but also documents; second, adding a graph data model to become multi-model NoSQL data stores (OrientDB is an example in this regard); and third, focusing more on in-memory capabilities to serve as fully distributed, pure key-value caches (e.g., Redis and Memcached).

3.2. Document stores

Inspired by Lotus Notes, document databases were, as their name implies, designed to manage and store documents. These documents are encoded in a standard data exchange format such as XML, JSON (JavaScript Object Notation), YAML (YAML Ain't Markup Language), or BSON (Binary JSON). Unlike the simple key-value stores described above, the value column in document databases contains semi-structured data, specifically attribute name/value pairs. A single column can house hundreds of such attributes, and the number and type of attributes recorded can vary from row to row (schema-free). Also, unlike key-value stores, both keys and values are fully searchable in document databases. They also usually support complex keys and secondary indexes.

Also referred to as document-oriented database, a document store allows the inserting, retrieving, and manipulating of semi-structured data. The term "document store" may be confusing; while these systems could store documents in the traditional sense (articles, Microsoft Word files, etc.), a document in these systems can be any kind of "pointerless object". They store documents which allow values to be nested documents or lists as well as scalar values, and the attribute names are dynamically defined for each document at runtime. Unlike the key-value stores, these systems generally support secondary indexes and multiple types of documents (objects) per database. Most of the databases available under this category provide data access typically over HTTP protocol using RESTful API or over Apache Thrift protocol for cross-language interoperability. Like other NoSQL systems, the document stores do not provide ACID transactional properties [10,16,43].
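The combination of schema-free documents and secondary indexes can be sketched as an in-memory collection. The class name, the dotted-path syntax for nested fields, and the traffic-sensor documents are hypothetical. Indexed values are assumed to be hashable scalars, and a query on an unindexed field falls back to scanning every document.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Optional, Set

Document = Dict[str, Any]


class DocumentCollection:
    """Schema-free documents plus optional secondary indexes on (possibly nested) fields."""

    def __init__(self) -> None:
        self.docs: Dict[str, Document] = {}
        self.indexes: Dict[str, DefaultDict[Any, Set[str]]] = {}  # path -> value -> ids

    @staticmethod
    def _lookup(doc: Document, path: str) -> Optional[Any]:
        # Follow a dotted path such as "location.highway"; a missing field yields None.
        value: Any = doc
        for part in path.split("."):
            if not isinstance(value, dict) or part not in value:
                return None
            value = value[part]
        return value

    def _index_doc(self, doc_id: str, doc: Document, path: str) -> None:
        value = self._lookup(doc, path)
        if value is not None:
            self.indexes[path][value].add(doc_id)

    def create_index(self, path: str) -> None:
        self.indexes[path] = defaultdict(set)
        for doc_id, doc in self.docs.items():  # index documents already stored
            self._index_doc(doc_id, doc, path)

    def insert(self, doc_id: str, doc: Document) -> None:
        if doc_id in self.docs:
            raise KeyError(f"duplicate id {doc_id!r}")
        self.docs[doc_id] = doc
        for path in self.indexes:  # keep every existing index up to date
            self._index_doc(doc_id, doc, path)

    def find(self, path: str, value: Any) -> List[str]:
        if path in self.indexes:  # indexed field: direct lookup
            ids = self.indexes[path].get(value, set())
        else:                     # unindexed field: full scan
            ids = {i for i, d in self.docs.items() if self._lookup(d, path) == value}
        return sorted(ids)


col = DocumentCollection()
col.insert("d1", {"sensor": "S1", "location": {"highway": "401"}, "speeds": [92, 88]})
col.insert("d2", {"sensor": "S2", "location": {"highway": "QEW"}})
col.create_index("location.highway")
col.insert("d3", {"sensor": "S3", "location": {"highway": "401"}, "lanes": 4})  # new field
print(col.find("location.highway", "401"))  # ['d1', 'd3']
print(col.find("lanes", 4))                 # ['d3']
```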

Table 2 compares the system properties of the document stores covered in this section. It has been compiled from their official documentation and [20,44].

Table 2. System properties comparison among CouchDB, MongoDB and RavenDB.

Name	CouchDB	MongoDB	RavenDB
Description	A document store inspired by Lotus Notes	One of the most popular document stores	.NET-based Document Store
Data storage	file system	volatile memory, file system	Esent
Query language	JavaScript, REST, Erlang	API calls, JavaScript, REST	API calls, Direct object access, HTTP, JSON, JavaScript, REST
Initial Release	2005	2009	2010
License	Apache V2	AGPL V3	AGPL V3
Database as service	no	no	no
Language	Erlang	C++	C#
Server OS	Android, BSD, Linux, OS X, Solaris, Windows	Linux, OS X, Solaris, Windows	Windows
Data type	JSON	JSON	JSON, BSON, BLOB
Secondary index	yes (via views)	yes	yes
APIs	RESTful HTTP/JSON API	proprietary protocol using JSON	.NET Client API, RESTful HTTP API
Partitioning method	sharding	sharding	sharding
Replication methods	master-master and master-slave replication	master-slave replication	master-master, master-slave, async and multi-source replication
MapReduce	yes	yes	yes
Integrity model	MVCC, BASE, ACID	BASE	ACID, BASE, eventually consistent
Consistency concept	eventual consistency	eventual and immediate consistency	eventual consistency
Transaction concept	no (atomic operations within a single doc)	no (atomic operations within a single doc)	yes
Concurrency	yes (optimistic locking)	yes	yes
Durability	yes	yes (optional)	yes
Value size max	4 GB	16 MB	2 GB



3.2.1. CouchDB

Apache CouchDB is a flexible, fault-tolerant database that supports data formats such as JSON and AtomPub. CouchDB stores documents in "collections" that form a richer data model than its counterparts'. Collections comprise the only schema in CouchDB, and secondary indexes must be explicitly created on fields in collections. A document has field values that can be scalar (text, numeric, or boolean) or compound (a document or list). Queries are done with what CouchDB calls "views", which are defined in JavaScript to specify field constraints. Views are a method of aggregating and reporting on the documents in a database, and are built on demand to aggregate, join and report on database documents. The indexes are B-trees, so query results can be ordered or restricted to value ranges. Queries can be distributed in parallel over multiple nodes using a map-reduce mechanism. However, CouchDB's view mechanism puts more burden on programmers than a declarative query language does [4].

The map function takes a document as a parameter, performs some calculation, and produces data based on the view's criteria. The data structure produced by the map function is a triplet consisting of the document id, a key and a value. After the map function has been executed, the results are passed to an optional reduce function to be aggregated on the view. As all documents in the database are processed by a view's functions, this can be time-consuming and resource-intensive for large databases. Therefore, a view is not created and indexed when write operations occur, but on demand, and it is updated incrementally when it is requested again [16,4].
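The view mechanism can be modelled in a few lines of Python: a map function emits (key, value) pairs per document, the rows are kept as (document id, key, value) triplets sorted by key, and an optional reduce aggregates the values per key. This is an illustration under assumptions, not CouchDB's API: real views are written in JavaScript (where `emit(key, value)` plays the role of `yield` here), use CouchDB's own collation order, and are updated incrementally rather than rebuilt from scratch. The traffic documents are made up.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterator, List, Tuple

Doc = Dict[str, Any]
Row = Tuple[str, Any, Any]  # (document id, key, value)
MapFn = Callable[[Doc], Iterator[Tuple[Any, Any]]]


def build_view(docs: List[Doc], map_fn: MapFn) -> List[Row]:
    """Run the map function over every document; keep rows ordered by key, then id."""
    rows = [(doc["_id"], key, value) for doc in docs for key, value in map_fn(doc)]
    return sorted(rows, key=lambda row: (row[1], row[0]))


def reduce_view(rows: List[Row], reduce_fn: Callable[[List[Any]], Any]) -> Dict[Any, Any]:
    """Group the emitted values by key and aggregate each group."""
    grouped: Dict[Any, List[Any]] = defaultdict(list)
    for _, key, value in rows:
        grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


def by_highway(doc: Doc) -> Iterator[Tuple[Any, Any]]:
    # Counterpart of a JavaScript map function calling emit(doc.highway, doc.volume).
    if doc.get("type") == "reading":
        yield doc["highway"], doc["volume"]


docs = [
    {"_id": "r1", "type": "reading", "highway": "401", "volume": 120},
    {"_id": "r2", "type": "reading", "highway": "QEW", "volume": 80},
    {"_id": "r3", "type": "reading", "highway": "401", "volume": 95},
    {"_id": "s1", "type": "sensor", "highway": "401"},  # not a reading: emits nothing
]
rows = build_view(docs, by_highway)
print(rows)                     # [('r1', '401', 120), ('r3', '401', 95), ('r2', 'QEW', 80)]
print(reduce_view(rows, sum))   # {'401': 215, 'QEW': 80}
```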

CouchDB provides asynchronous replication to achieve scalability and does not use sharding. Replication operates incrementally: only data modified since the last replication are transmitted. To provide durability, all updates on documents and indexes are flushed to disk on commit by writing to the end of a file. Together with the MVCC mechanism, this is the basis for the claim that CouchDB provides ACID semantics at the document level: single update operations either run to completion or fail and roll back, so the database never contains partly saved or updated documents. CouchDB does not guarantee consistency across replicas, although each client sees a self-consistent view of the database. All replicas are always writable, and they do not replicate with each other by themselves [10,43].

3.2.2. MongoDB

MongoDB positions itself between relational and non-relational databases. It is a GPL open-source document store written in C++. Like CouchDB, it provides indexes on collections, it is lockless, and it provides a document query mechanism. However, there are key differences between the two. CouchDB provides MVCC on documents, while MongoDB provides atomic operations on fields. MongoDB supports dynamic queries with automatic use of indexes, like RDBMSs. MongoDB also supports automatic sharding, distributing load and data across "thousands of nodes" with automatic failover and load balancing, a feature inspired by Google's BigTable. Thus, while CouchDB achieves scalability through asynchronous replication, MongoDB achieves it through sharding (although an extension of CouchDB called CouchDB Lounge supports sharding). To shard a collection, MongoDB uses a shard key, which is either an indexed field or an indexed compound field that exists in every document in the collection. MongoDB divides the shard key values into chunks and distributes the chunks evenly across the shards, using either range-based or hash-based partitioning [27].
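The two partitioning schemes for shard key values can be contrasted with a small Python sketch. It is a simplification under stated assumptions: chunk boundaries are given as a fixed list of split points, and the hash-based variant maps an MD5-derived number onto a chunk with a modulo. MongoDB's actual hashed-index function and its chunk-splitting and balancing logic are different.

```python
import bisect
import hashlib


def range_chunk(shard_key, split_points):
    """Range-based: chunk i covers [split_points[i-1], split_points[i]), with open ends."""
    return bisect.bisect_right(split_points, shard_key)


def hashed_chunk(shard_key, num_chunks):
    """Hash-based: hash the key first, so neighboring keys usually land in different chunks."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_chunks


split_points = [100, 200]  # hypothetical boundaries, giving 3 chunks
by_range = [range_chunk(k, split_points) for k in (42, 100, 150, 250)]
by_hash = [hashed_chunk(k, 3) for k in (42, 100, 150, 250)]
```

Range partitioning keeps nearby keys together, which favors range queries but can concentrate monotonically increasing keys in one chunk; hashing spreads such keys out at the cost of efficient range scans.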

MongoDB stores data in a binary JSON-like format called BSON. BSON supports boolean, integer, float, date, string and binary types. Client drivers encode the local language's document data structure (usually a dictionary or associative array) into BSON and send it over a socket connection to the MongoDB server (in contrast to CouchDB, which sends JSON as text over an HTTP REST interface). MongoDB also supports the GridFS specification for large binary objects, e.g. images and videos. These are stored in chunks that can be streamed back to the client for efficient delivery. MongoDB supports master-slave replication with automatic failover and recovery.

A replica set is a group of mongod¹ instances that host the same data set. One mongod, the primary, receives all write operations from clients; a replica set can have only one primary. All other mongod instances, the secondaries, apply the operations from the primary so that they hold the same data set. Because each shard in a sharded cluster is deployed as a replica set, replication (and recovery) is done at the shard level. Replication is asynchronous for higher performance, so some updates may be lost on a crash [27].
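Why asynchronous replication can lose acknowledged writes is easy to see in a toy Python model. It assumes a single primary holding an ordered log of writes, secondaries that copy the log only when `sync()` is called, and a failover that promotes the most caught-up secondary. The class is hypothetical and ignores elections, rollback and write concerns (MongoDB's write concern setting controls how many members must acknowledge a write before it is confirmed).

```python
class ReplicaSet:
    """Toy model: one primary takes every write; secondaries copy its log asynchronously."""

    def __init__(self, num_secondaries):
        self.primary = {}
        self.oplog = []                                    # ordered (key, value) writes
        self.secondaries = [{} for _ in range(num_secondaries)]
        self.applied = [0] * num_secondaries               # log entries applied per secondary

    def write(self, key, value):
        # Clients write to the primary only; the write is acknowledged immediately.
        self.primary[key] = value
        self.oplog.append((key, value))

    def sync(self):
        # One asynchronous replication round: every secondary replays the entries it lacks.
        for i, data in enumerate(self.secondaries):
            for key, value in self.oplog[self.applied[i]:]:
                data[key] = value
            self.applied[i] = len(self.oplog)

    def fail_over(self):
        # The primary crashes: promote the most caught-up secondary.
        # Writes that never reached it are lost.
        best = max(range(len(self.secondaries)), key=lambda i: self.applied[i])
        self.primary = dict(self.secondaries[best])
        self.oplog = self.oplog[: self.applied[best]]
        del self.secondaries[best]
        del self.applied[best]


rs = ReplicaSet(num_secondaries=2)
rs.write("x", 1)
rs.sync()          # x reaches both secondaries
rs.write("y", 2)   # acknowledged, but not yet replicated
rs.fail_over()     # the new primary only knows about x
```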

¹ mongod is the primary daemon process for the MongoDB system. It handles data requests, manages data access, and performs background management operations.

The key decision in designing data models for MongoDB applications revolves around the structure of documents and how the application represents relationships between data. There are two tools that allow applications to represent these relationships: references and embedded documents. References store the relationships between data by including links or references from one document to another. Broadly speaking, these are normalized data models (Figure 4).

Figure 4. Normalized data model design in MongoDB [27].


Embedded documents capture relationships between data by storing related data in a single document structure. MongoDB documents make it possible to embed document structures in a field or array within a document. These denormalized data models allow applications to retrieve and manipulate related data in a single database operation (Figure 5).
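The difference between the two data models can be shown with plain Python dictionaries standing in for collections. The field names and the `cities_*` helper functions are hypothetical; only `_id` follows MongoDB's convention for the primary key. The normalized version needs a second lookup to follow the reference, whereas the embedded version reads everything from one document.

```python
# Normalized model: addresses live in their own collection and reference the user by _id.
users = {"u1": {"_id": "u1", "name": "Ann"}}
addresses = {
    "a1": {"_id": "a1", "user_id": "u1", "city": "Berlin"},
    "a2": {"_id": "a2", "user_id": "u1", "city": "Paris"},
}

# Denormalized model: the same addresses are embedded in the user document.
users_embedded = {
    "u1": {"_id": "u1", "name": "Ann", "addresses": [{"city": "Berlin"}, {"city": "Paris"}]},
}


def cities_normalized(user_id):
    # Two steps: read the user, then find every address document that references it.
    user = users[user_id]
    return [a["city"] for a in addresses.values() if a["user_id"] == user["_id"]]


def cities_embedded(user_id):
    # One step: the related data is already inside the user document.
    return [a["city"] for a in users_embedded[user_id]["addresses"]]
```

The embedded form suits data that is read together and owned by one parent; references avoid duplication when the related data is shared or grows without bound.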

Figure 5. Denormalized data model design in MongoDB [27].


3.2.3. RavenDB

RavenDB is a transactional, open-source document database written in .NET. Data in RavenDB is stored schemalessly as JSON documents and can be queried efficiently using LINQ queries from .NET code, or through REST requests from other tools. Internally, RavenDB uses indexes, which are either created automatically based on usage or created explicitly by the user. RavenDB offers replication and sharding support. It is based on a client-server architecture: data is stored on a server instance, and one or more clients send data requests to that instance.

In RavenDB, every document has metadata attached to it. By default, this metadata only contains data that RavenDB uses internally. Each document has its own globally unique ID, in the sense that if one attempts to store two different entities under the same ID, the second write overwrites the first without any warning. The RavenDB convention is a document ID that combines the collection name with the entity's unique ID within the collection. By default, RavenDB applies this convention by pluralizing the name of the class of the object being saved and appending an auto-incremented number to it. However, the convention is not mandatory: document IDs are independent of the entity type and therefore do not have to contain the name of the collection they are assigned to.
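The default ID convention can be sketched as follows. This is a hypothetical Python model, not RavenDB's client API: collection names are derived by naive pluralization (lower-cased class name plus "s"), numbers come from a per-collection counter, and RavenDB's real ID generation (for example its HiLo algorithm) is not reproduced.

```python
import itertools
from collections import defaultdict


def collection_name(entity):
    # Naive pluralization assumption: "User" -> "users"; irregular plurals are not handled.
    return type(entity).__name__.lower() + "s"


class DocumentStore:
    """Hypothetical in-memory store illustrating '<collection>/<number>' document IDs."""

    def __init__(self):
        self.docs = {}
        self._counters = defaultdict(lambda: itertools.count(1))

    def store(self, entity, doc_id=None):
        if doc_id is None:
            name = collection_name(entity)
            doc_id = f"{name}/{next(self._counters[name])}"
        # Storing under an existing ID silently overwrites the previous document.
        self.docs[doc_id] = dict(vars(entity))
        return doc_id


class User:
    def __init__(self, name):
        self.name = name


store = DocumentStore()
first = store.store(User("Ann"))             # "users/1"
second = store.store(User("Bob"))            # "users/2"
store.store(User("Cid"), "users/1")          # overwrites Ann without warning
custom = store.store(User("Dee"), "vip/7")   # IDs need not contain the collection name
```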

One of RavenDB's design principles is that documents are independent, meaning that all data required to process a document is stored within the document itself. This does not mean, however, that RavenDB does not support related documents. RavenDB offers three approaches to the relation problem: "denormalization", "includes" and a hybrid approach. Each scenario will need one or more of them, and when applied correctly they can drastically improve performance, reduce network bandwidth and speed up development. "Denormalization" and "includes" correspond to the embedded (denormalized) and reference (normalized) approaches in MongoDB. The hybrid approach combines the two by keeping a short version of the related document inside the master document and providing a reference to the full version of the related document. There are no strict rules for when to use which approach; the choice should be made carefully, weighing the implications of each approach [18].
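The hybrid approach and the idea behind includes can be sketched with a hypothetical session object that counts server round trips. The document layout, the `Session.load` method and its `include` parameter are illustrative assumptions, not RavenDB's client API: the order keeps a short customer summary (denormalization) plus the customer's ID (reference), and an include returns the referenced document in the same round trip.

```python
class Session:
    """Hypothetical client session; each load() call counts as one server round trip."""

    def __init__(self, docs):
        self.docs = docs
        self.round_trips = 0

    def load(self, doc_id, include=None):
        self.round_trips += 1
        result = {doc_id: self.docs[doc_id]}
        if include is not None:
            # Resolve the reference on the server side and return it in the same response.
            ref_id = include(self.docs[doc_id])
            result[ref_id] = self.docs[ref_id]
        return result


docs = {
    "customers/1": {"name": "Ann", "email": "ann@example.com", "address": "1 Main St"},
    # Hybrid: a short customer summary is embedded, the full document is referenced by ID.
    "orders/1": {"customer": {"id": "customers/1", "name": "Ann"}, "total": 42},
}

session = Session(docs)
summary_name = session.load("orders/1")["orders/1"]["customer"]["name"]  # no extra lookup
both = session.load("orders/1", include=lambda order: order["customer"]["id"])
```

Displaying the order needs only the embedded summary; when the full customer is needed, the include avoids a second request.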

RavenDB supports replication; to enable it, the built-in replication bundle must be activated when creating a new database. On every transaction commit, RavenDB looks up the list of replication destinations. For each destination, the replication bundle queries the remote instance for the last document that was replicated to it and then sends batches of the updates that occurred since that replication. Replication happens in the background and in parallel. RavenDB also contains built-in support for sharding. It handles all aspects of sharding for users, who only need to define the shard function, i.e., how to split the documents among multiple shards [18].
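A shard function only has to map each document to a shard. The Python sketch below is a hypothetical example of such a function, not RavenDB's sharding API: it routes documents by an assumed `region` field when present and otherwise falls back to a stable CRC32 hash of the document ID; a query without a routing hint must then fan out to every shard.

```python
import zlib

SHARDS = ("asia", "europe", "americas")  # hypothetical shard names


def shard_for(doc):
    """Pick a shard: by the document's region if known, else by a stable hash of its ID."""
    if doc.get("region") in SHARDS:
        return doc["region"]
    return SHARDS[zlib.crc32(doc["id"].encode("utf-8")) % len(SHARDS)]


class ShardedStore:
    def __init__(self):
        self.shards = {name: {} for name in SHARDS}

    def store(self, doc):
        self.shards[shard_for(doc)][doc["id"]] = doc

    def query_all(self, predicate):
        # Without a routing hint, every shard has to be asked.
        return sorted(d["id"] for shard in self.shards.values()
                      for d in shard.values() if predicate(d))


store = ShardedStore()
store.store({"id": "orders/1", "region": "europe", "total": 10})
store.store({"id": "orders/2", "region": "asia", "total": 99})
store.store({"id": "orders/3", "total": 50})  # no region: placed by hash
big_orders = store.query_all(lambda d: d["total"] > 20)
```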

3.2.4. Summary and future trend

To enhance the performance of common queries and updates, document stores usually have full support for secondary indexes. These indexes allow applications to store a view of a portion of the collection in an efficient data structure. Most indexes store an ordered representation of all values of a field or a group of fields. Indexes may also enforce uniqueness, store objects in a geospatial representation, and facilitate text search. Full-text search is another capability that most document stores support, and they adopt various techniques to make searching efficient and low-cost.
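An ordered secondary index can be modeled as a sorted list of (value, document ID) pairs, which gives range queries through binary search and a cheap uniqueness check. This Python sketch is an illustration under that assumption; production document stores typically use on-disk B-tree variants, and the `SecondaryIndex` class is hypothetical.

```python
import bisect


class SecondaryIndex:
    """Sorted (value, doc_id) pairs over one field, with range queries and optional uniqueness."""

    def __init__(self, field, unique=False):
        self.field = field
        self.unique = unique
        self.entries = []  # kept sorted by (value, doc_id)

    def insert(self, doc_id, doc):
        value = doc[self.field]
        if self.unique:
            # (value,) sorts just before any (value, doc_id), so this finds an existing value.
            i = bisect.bisect_left(self.entries, (value,))
            if i < len(self.entries) and self.entries[i][0] == value:
                raise ValueError(f"duplicate {self.field} value {value!r}")
        bisect.insort(self.entries, (value, doc_id))

    def range(self, low, high):
        """Return the IDs of documents whose field value lies in [low, high], in value order."""
        i = bisect.bisect_left(self.entries, (low,))
        out = []
        while i < len(self.entries) and self.entries[i][0] <= high:
            out.append(self.entries[i][1])
            i += 1
        return out


users = {
    "u1": {"age": 30, "email": "a@example.org"},
    "u2": {"age": 25, "email": "b@example.org"},
    "u3": {"age": 41, "email": "c@example.org"},
    "u4": {"age": 30, "email": "d@example.org"},
}
by_age = SecondaryIndex("age")
by_email = SecondaryIndex("email", unique=True)
for doc_id, doc in users.items():
    by_age.insert(doc_id, doc)
    by_email.insert(doc_id, doc)
thirtyish = by_age.range(26, 35)
```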

Document stores already implement some notion of reference documents in their data model. Recently, some document stores such as OrientDB [30] and ArangoDB [5] have matured this feature to support the graph data model as well, and they now present themselves as multi-model stores. This trend is not specific to document stores and can also be seen in other classes. It aims to mitigate a problem known as "polyglot persistence", in which different types of NoSQL solutions must be combined to satisfy all business requirements in the data layer [24,9].

3.3. Graph databases

Graph databases are designed to fill the gap between graph data modeling requirements and the tabular abstraction of traditional databases. Graph databases employ graph theory concepts such as nodes and edges. Nodes are entities in the data domain, and edges are the relationships between two entities. Nodes can have properties or attributes that describe them. Graph databases let us express graph processing at the same query-language level that we use to fetch graph data, without an extra abstraction layer for graph nodes and edges. This means less overhead for graph-related processing, as well as more flexibility and performance. Performance and flexibility in such databases are becoming more important, especially with the emergence of widely used applications such as social media and e-commerce software; for example, large companies can store and model users' interests and recommend more relevant advertisements to them. Additionally, graph databases make it easier to implement complex traversals over graph datasets. Figure 6 depicts a data example in a graph database.
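Traversals such as "people within two hops" are where the graph model pays off. The Python sketch below stores nodes with properties and directed edges as adjacency lists and runs a breadth-first search; the data, the FOLLOWS relationship and the `within_hops` function are hypothetical, and a graph database would express the same query in its own language (for example as a variable-length pattern in Cypher).

```python
from collections import deque

# Hypothetical property graph: nodes carry attributes, edges are directed FOLLOWS links.
nodes = {"ann": {"age": 31}, "bob": {"age": 28}, "cat": {"age": 40}, "dan": {"age": 35}}
follows = {"ann": ["bob"], "bob": ["cat", "ann"], "cat": ["dan"], "dan": []}


def within_hops(start, max_hops):
    """Breadth-first search: nodes reachable from start in 1..max_hops hops, nearest first."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in follows[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, depth + 1))
    return found


# Recommendation-style query: people within two hops of ann who are over 30.
suggestions = [n for n in within_hops("ann", 2) if nodes[n]["age"] > 30]
```

In a relational schema the same query would need one self-join per hop, whereas the traversal simply follows stored edges.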

Figure 6. A Graph data model.


Graph databases are growing fast in industry. Examples of relevant use cases include social network analysis, authentication and authorization systems, e-commerce recommender systems and fraud detection in financial services. Such demanding use cases are easier to implement with graph databases. According to online reports [2], the popularity of graph databases has increased sixfold since 2013, and the same report predicted that by 2017, 25% of enterprise solutions would use graph databases.

McColl et al. [26] conducted a performance evaluation of multiple database systems, but they focused on graph datasets with different characteristics, or added a graph-related experiment (i.e., PageRank) to each evaluation. Also, Jouili et al. [23] introduced their own benchmarking system (instead of using YCSB) and implemented popular workloads for graph datasets, such as finding the shortest path between nodes and neighborhood exploration. In both works, we observed that too few general scenarios were implemented to allow a comparison of graph data systems.

Table 3 compares the features of the graph databases covered in this section [21,44].

Table 3. System properties comparison among Neo4j, OrientDB and Titan.

Name	Neo4j	OrientDB	Titan
Description	Open-source graph database	Multi-model DBMS (document, graph, key/value)	Graph DBMS optimized for distributed clusters
Data storage	Disk persistence, volatile memory	Disk persistence, volatile memory, remote	Disk persistence, volatile memory, remote
Data model	Graph DBMS	Document store, graph DBMS and key-value store	Graph DBMS
Query language	Cypher query language, Java API, RESTful HTTP API	Java API, RESTful HTTP/JSON API, TinkerPop	Java API, TinkerPop
Initial release	2007	2010	2012
License	AGPL v3 (Commercial) / BSD (Community)	Apache v2	Apache v2
Language	Java	Java	Java
Secondary index	Yes (via Apache Lucene)	Yes	Yes
Composite key	No	Yes	Yes
MapReduce	No	No (but achievable with distributed queries)	Yes (via Faunus)
Replication methods	Master-slave replication (commercial)	Master-slave replication	Symmetric replication
Sharding	No	Yes	Yes
Data schema	Schema-free	Schema-free	Yes
SQL	No	Yes (SQL-like)	No
Consistency	Eventual consistency, immediate consistency	Unknown	Eventual consistency, immediate consistency
Atomicity	Yes	Yes	Yes
Transaction concept	ACID	ACID	ACID
Concurrency	Yes	Yes	Yes
Max value size	1 TB	2 GB	2 GB


3.3.1. Neo4j

Neo4j is a well-known graph storage project, first released in 2007. It offers native APIs for many programming languages, such as Java, Go and PHP. Neo4j uses its own database file structure for storing data, and it is flexible enough to be embedded inside Java applications. Neo4j is fully ACID-compliant and schema-free.

Neo4j uses its own query language, Cypher [38], which is inspired by SQL but adds syntax for graph nodes and edges. Cypher is designed as a declarative language that focuses on the data to be retrieved rather than on the commands needed to retrieve it. This makes it convenient to express patterns of nodes and relationships in Cypher queries.

Neo4j supports master-slave replication, which copies the entire dataset to other cluster nodes and ensures fault tolerance against server failures. However, it does not partition data across multiple servers, so the data size must stay within the capacity of a single server.

3.3.2. Titan

Titan is an open-source project focused on serializing and compacting graph data structures, rich graph data modeling, and efficient execution of graph queries [6]. It implements modular interfaces for data persistence, data indexing, and client access. Titan can store its data in various NoSQL databases such as HBase, Cassandra and Oracle Berkeley DB, whereas Neo4j stores its data in its own proprietary file structures. Titan's main advantages over Neo4j are its scalability and real-time processing of data. Titan supports various secondary indexing systems, including Lucene², Solr³ and Elasticsearch⁴.

² https://lucene.apache.org/core

³ http://lucene.apache.org/solr

⁴ https://www.elastic.co

Titan provides an abstraction layer on top of commonly used NoSQL database systems, so it inherits the capabilities of the back-end storage systems it integrates with. A good example is data partitioning, which is not implemented in Neo4j but is readily available in Titan simply by using Cassandra or HBase as the back end. In this way, Titan provides abstractions for working easily with large numbers of edges and vertices in big datasets.

Titan can be used in two ways: embedded or remote. In the embedded mode, Titan runs inside the Java application as a library and communicates with the storage back end from within the application. In the remote mode, Titan is accessed through the TinkerPop stack API [3].

3.3.3. OrientDB

OrientDB is another open-source solution that stores various data models, one of which is the graph model. OrientDB supports relationships, which is the key feature that places it among graph database systems. Relationships are implemented by document pointers: values inside JSON documents that point to other documents. In this way, pointers operate like edges in a graph data structure. Instead of providing JOIN functionality, OrientDB uses document pointers to connect documents with a linear cost for big data [30].

OrientDB has two notable features: "dataset profiling", which is implemented using roles for database users, and an indexing algorithm inspired by the B+ tree and the red-black tree.

Like Neo4j, OrientDB supports RESTful access to data. In addition to REST, OrientDB supports SQL in the same manner as a relational database, while Neo4j uses the Cypher query language. OrientDB supports the TinkerPop Blueprints API [3] for accessing graph data, a feature that Titan also implements. Like Neo4j, OrientDB can also be embedded inside a Java application.

OrientDB uses horizontal partitioning (i.e., sharding) for larger databases stored on multiple servers; each shard holds a subset of the data (i.e., some of the rows). Besides sharding, OrientDB supports a multi-master configuration. Other solutions, such as HBase, have one master server and several secondary nodes, which can create throughput bottlenecks on the master side. In a multi-master configuration, the client can connect to any server in the database cluster, and OrientDB handles synchronization. Thus, the total throughput of the cluster is the combined throughput of all cluster servers rather than that of a single master server. Among the graph solutions reviewed here, the multi-master feature is unique to OrientDB.

3.3.4. Summary and future trend

In summary, graph databases are trending upwards and are becoming more popular in industry. One reason for their popularity is that they can model some industrial problems more conveniently. We investigated three graph storage and processing solutions. Titan only provides an abstraction layer for graph storage on top of other stores such as Cassandra, whereas OrientDB and Neo4j have their own storage back ends. While OrientDB supports SQL, Neo4j has its own declarative query language, Cypher. OrientDB is the only solution that supports a multi-master topology for maximizing storage throughput. All of the solutions support durability, ACID transactions and secondary indexes. Neo4j does not support data partitioning, whereas Titan and OrientDB support sharding.

One future trend for graph datastores is the provision of more graph-related functionality in their query languages, either in dedicated languages such as Cypher in Neo4j or by adding more functions to SQL. Another concern to be addressed in future graph databases is elasticity: graph databases support more complex and resource-intensive queries than other types of databases, so scaling them can cause more problems for query processing.

3.4. Wide column stores

Although column-oriented data stores are defined as non-relational, it can be argued that they are the Big Data equivalent of relational databases, since the two are similar on a conceptual level. In this sense, they have retained the notions of tables, rows and columns. This creates the notion of a schema that is explicit from the client's perspective but rather flexible from the data store's point of view, which enables the execution of queries on the data. Nevertheless, the underlying philosophy, architecture and implementation are quite different from those of a traditional RDBMS. The basic difference is that the notion of tables is primarily for interacting with clients, while the storage, indexing and distribution of data are taken care of by a file system and a management system.

Almost all of the popular wide-column data stores have inherited the data model proposed by Google for the BigTable system [11]. In BigTable, a table is defined as a "sparse, distributed, persistent multidimensional sorted map". A cell, the unit of data, is uniquely identified by its row key, its column key and its timestamp. Row keys are sorted lexicographically, and consecutive keys are grouped into tablets, which dictates how data is stored in the distributed underlying file system; this exploits the locality of the data to improve the performance of reads and writes. Given this, the design of the row key is essential to enable efficient reads of short ranges of row keys (i.e., scans) without having to "wake up" a large number of machines. In this way, the row key determines how the data is load balanced. Transactional consistency in BigTable is guaranteed at the row level: every read or write of data under a single row key is atomic and timestamped, which makes concurrent updates to the same row easy to reason about.
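BigTable's sorted map can be mimicked with a sorted list of (row, column, timestamp) keys. In this Python sketch, an illustration under stated assumptions rather than a real BigTable client, timestamps are negated so that the newest version of a cell sorts first, and a scan over a row range touches only consecutive keys, which is the locality that good row-key design exploits. The reversed-domain row keys follow the example used in the BigTable paper; tablets, column families and row-level atomicity are not modeled.

```python
import bisect


class SortedMap:
    """(row, column, timestamp) -> value, kept in lexicographic key order."""

    def __init__(self):
        self.keys = []    # sorted (row, column, -timestamp): newest version first per cell
        self.values = {}

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def get(self, row, column):
        """Return the most recent version of a cell, or None if the cell does not exist."""
        i = bisect.bisect_left(self.keys, (row, column))
        if i < len(self.keys) and self.keys[i][:2] == (row, column):
            return self.values[self.keys[i]]
        return None

    def scan(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row, in key order."""
        i = bisect.bisect_left(self.keys, (start_row,))
        while i < len(self.keys) and self.keys[i][0] < end_row:
            row, column, neg_ts = self.keys[i]
            yield row, column, -neg_ts, self.values[self.keys[i]]
            i += 1


table = SortedMap()
table.put("com.cnn.www", "contents:", 3, "<html>v3")
table.put("com.cnn.www", "contents:", 5, "<html>v5")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
table.put("com.example", "contents:", 2, "<html>ex")
table.put("org.apache", "contents:", 1, "<html>apache")
latest = table.get("com.cnn.www", "contents:")
# '~' sorts after '.', so this range covers every row key that starts with "com.".
com_rows = [row for row, _, _, _ in table.scan("com", "com~")]
```

Reversing domain names makes all pages of one site adjacent in key order, so a per-site scan reads one contiguous range.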

The data in each row is organized in column families. The data in a column family is usually of the same type, which allows it to be compressed together, and a family may have an arbitrary number of columns. The number of column families is usually kept small to minimize the amount of shared metadata. Column families are also the unit of access control and resource accounting. Columns may be added to or deleted from column families without affecting the data, whereas removing a column family changes the schema of the data. Since the model does not guarantee transactions across multiple rows, if a column family is deleted while multiple rows still hold data in it, this data cannot be deleted in a single transaction. Figure 7 represents a sample table in a wide-column datastore.

Figure 7. A Wide-Column data model for drivers/ticket information.


Under the hood, most wide-column datastores are built on a distributed file system; BigTable, for instance, is built on top of GFS (Google File System). The data is stored persistently (and immutably) in files called SSTables. Updates are first recorded in an in-memory, cache-like structure called the memtable, which is periodically flushed to a new SSTable. When a read request is received, both the memtable and the sequence of SSTables are checked.
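The write and read paths above follow a log-structured pattern, which the Python sketch below models under simplifying assumptions: the memtable is a dict that is flushed to a new immutable sorted run once it holds `flush_threshold` entries, reads check the memtable and then the SSTables from newest to oldest, and the commit log, deletions (tombstones) and compaction are omitted. The class name and threshold are hypothetical.

```python
import bisect


class LSMStore:
    """Writes go to an in-memory memtable; flushes create immutable sorted SSTables."""

    def __init__(self, flush_threshold=3):
        self.memtable = {}
        self.sstables = []  # oldest first; each is an immutable sorted tuple of (key, value)
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out as a new sorted, immutable run and start an empty memtable.
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # The memtable holds the newest data, then SSTables are searched newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None


store = LSMStore(flush_threshold=2)
store.put("a", 1)
store.put("b", 2)    # memtable full: flushed to the first SSTable
store.put("a", 10)   # newer version of "a" lives in the memtable
value_before_flush = store.get("a")
store.put("c", 3)    # second flush: a newer SSTable now shadows the older "a"
```

Because SSTables are never modified in place, a newer run shadows older versions of a key until compaction merges the runs.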

The infrastructure of a BigTable cluster consists of a master server, tablet servers acting as data nodes, and the Chubby [8] distributed lock service. The master assigns tablets to tablet servers and balances their load, while Chubby is responsible, among other things, for storing access control lists and ensuring that there is at most one active master. Tablet servers maintain a collection of tablets, and each tablet maintains a sequence of SSTables. All servers (master and data) also hold metadata to address locality issues and to update the topology (as servers are added or removed), as well as logs of the operations on the data. The user tables, which clients use to access the data, are a schema representation of the underlying tablets.

Wide-column stores are designed for data that spans billions of rows and millions of columns, and they support cells of arbitrarily large size. They are more efficient than traditional relational databases for applications that scan a few columns of many rows, because such scans load significantly less data than reading whole rows. Moreover, rows can be large and their keys complex, while each row is still stored on, and retrieved from, a single machine. Finally, wide-column stores allow different access levels for different columns.

In this paper, we review four independent implementations of BigTable, namely Apache HBase [41], Apache Cassandra [40], Apache Accumulo [39] and Hypertable [19], which are ranked as the most popular ones by DB-Engines [20]. Table 4 compares these implementations on some of their basic properties; it was compiled from information in DB-Engines [20] and vsChart [44].

Table 4. System properties comparison among HBase, Cassandra, Accumulo and Hypertable.

Name	HBase	Cassandra	Accumulo	Hypertable
Description	Based on Apache Hadoop and on concepts of BigTable	Based on ideas of BigTable and Dynamo	Features cell-level security and server-side computation	Open-source BigTable implementation based on Hadoop
Data model	Column-oriented key-value	Column-oriented key-value	Column-oriented key-value, schema-less	Column-oriented
Initial release	2008	2008	2011	2009
License	Apache v2	Apache v2	Apache v2	BSD
Language	Java	Java	Java	C++
Server OS	Linux, Unix, Windows (through an emulator)	BSD, Linux, OS X, Windows	Unix	Linux, OS X, Windows (unofficially)
Data schema	Schema-free	Schema-free	Schema-free	Schema-free
SQL	No	No	No	No
Secondary index	No	Yes (for some queries)	Yes	Yes (for some queries)
APIs and other access methods	Java, REST, Thrift, XML	CQL (proprietary), Thrift	Java, REST, Thrift	C++, REST, Thrift
Partitioning method	Sharding	Sharding	Sharding	Sharding
Replication methods	Master-slave	Peer-to-peer, asynchronous, multi-source	Multi-master	Master-slave
MapReduce	Yes	Yes	Yes	Yes
Integrity model	Log replication	BASE	MVCC	MVCC
Consistency concept	Immediate consistency	Eventual/immediate consistency	Immediate consistency	Immediate consistency
Transaction concept	No (atomic for single row)	No (atomic for single row)	No (atomic for single row)	No (atomic for single row)
Concurrency	Yes	Yes	Yes	Yes
Durability	Yes	Yes	Yes	Yes
Max value size	2 TB	2 GB	1 EB	Unknown


3.4.1. Apache HBase

Apache HBase is the NoSQL wide-column store for Hadoop, the open-source implementation of MapReduce for Big Data analytics. The purpose of HBase is to support random, real-time read and write access to very large tables with billions of rows and millions of columns. Just as BigTable runs on GFS, HBase runs on top of HDFS (Hadoop Distributed File System). It is deployed on commodity hardware, with a Master server responsible for cataloging and load balancing and RegionServers acting as data nodes. HBase provides the same level of transactional support as BigTable: consistency is guaranteed at the row level. Finally, just as BigTable is tailored to support MapReduce, HBase is built to conveniently support Hadoop jobs.

3.4.2. Apache Cassandra

Cassandra is designed to be a fault-tolerant, highly reliable and highly available data store. Its premise is that failures in both software and hardware are practically inevitable. For this reason, Cassandra's architecture allows for fast recovery to ensure high data availability. Two entities are central to achieving this: data centers and clusters. A data center is a collection of homogeneous nodes that hold a number of redundant replicas of the data for high availability. Nodes in a data center are always local, i.e., they do not span multiple physical locations. Clusters, on the other hand, are collections of data centers and can span multiple locations. The nodes in a cluster are connected in a peer-to-peer network and exchange state information with each other every second using a gossip protocol, which allows failures to be detected. Cassandra also employs a commit log and SSTables, in a similar manner to BigTable, to ensure durable writes.
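Gossip-based failure detection can be sketched with heartbeat counters. In this hypothetical Python model, each node increments its own counter, two nodes that gossip keep the highest counter either has seen for every node, and a node whose counter stops advancing becomes a suspect. Cassandra's actual gossiper exchanges richer versioned state and uses an accrual failure detector rather than this simple rule.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.heartbeats = {name: 0}  # node name -> highest heartbeat counter seen

    def tick(self):
        # A live node periodically bumps its own heartbeat.
        self.heartbeats[self.name] += 1

    def gossip_with(self, peer):
        # Both sides end up with the highest counter either of them knew for every node.
        merged = {n: max(self.heartbeats.get(n, -1), peer.heartbeats.get(n, -1))
                  for n in self.heartbeats.keys() | peer.heartbeats.keys()}
        self.heartbeats = dict(merged)
        peer.heartbeats = dict(merged)


def suspected(before, after):
    """Nodes whose heartbeat did not advance between two snapshots of one node's view."""
    return sorted(n for n in after if after[n] <= before.get(n, -1))


a, b, c = Node("a"), Node("b"), Node("c")
a.tick()
a.tick()
b.tick()
a.gossip_with(b)   # a and b now both know a:2, b:1
b.gossip_with(c)   # c learns about a indirectly, via b
before = dict(c.heartbeats)
a.tick()
c.tick()           # b stops ticking, as if it had failed
a.gossip_with(c)
stale = suspected(before, c.heartbeats)
```

No node talks to every other node directly, yet state spreads transitively, which keeps the protocol cheap in large clusters.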

3.4.3. Apache Accumulo

Accumulo shares the notions of the Master server, TabletServers and tablets with BigTable, and, like HBase, it is built on top of HDFS. Accumulo offers a number of features that improve on the original BigTable design. It offers server-side programming capabilities to define functions that can run at any stage of the data management process, including when reading from or writing to disk. Furthermore, unlike BigTable, Accumulo allows access control at the level of individual cells, and it sheds the requirement that a single row fit in memory, as it allows very large cell values of practically unlimited size. Additionally, using ZooKeeper, Accumulo allows multiple Master servers to be defined, so that it tolerates Master failures.
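Cell-level access control in Accumulo is expressed with visibility labels that combine tokens with `&` and `|`. The Python evaluator below is a simplified, hypothetical sketch of that idea: it treats `&` as binding more tightly than `|` and does not validate malformed expressions, whereas Accumulo's own parser requires parentheses when the two operators are mixed. A scan returns only the cells whose label evaluates to true for the user's authorizations.

```python
import re


def visible(expression, auths):
    """Evaluate a label such as 'hr|(audit&finance)' against a set of authorizations."""
    if not expression.strip():
        return True  # an empty label is visible to everyone
    tokens = re.findall(r"[A-Za-z0-9_\-:./]+|[&|()]", expression)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse the operand before combining
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_atom()
            result = result and rhs
        return result

    def parse_atom():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = parse_or()
            pos += 1  # skip the closing ")"
            return value
        return token in auths

    return parse_or()


# Hypothetical cells: (row, column, visibility label, value).
cells = [
    ("emp1", "name", "", "Ann"),
    ("emp1", "salary", "hr|(audit&finance)", 5000),
    ("emp1", "review", "hr", "ok"),
]


def scan(cells, auths):
    """Return only the cells that the given authorizations may see."""
    return [(row, col, value) for row, col, label, value in cells if visible(label, auths)]


auditor_view = scan(cells, {"audit", "finance"})
```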

3.4.4. Hypertable

Hypertable inherits from BigTable the notions of Master and Range servers (equivalent to the RegionServers of HBase and the tablet servers of BigTable), as well as Hyperspace, which plays the role of BigTable's Chubby server. However, unlike the other three implementations, Hypertable has a unique feature: instead of being tied to a specific underlying file system, it abstracts its interface by directing all file-system-level requests through a broker process. By developing and contributing different brokers, Hypertable can be connected to any file system. Another unique feature is the ability of the RangeServers to allocate their memory depending on whether the workload is write-heavy or read-heavy.

3.4.5. Summary and future trend

Wide-column stores are tightly coupled with analytical frameworks such as MapReduce and Hadoop. Their design aims for high availability and fast data retrieval to enable fast analytics. In addition, the flexible data schema, with column families of variable length (i.e., different records may have a different number of columns per column family), enables targeted retrieval of data for more efficient queries and analysis.

In their latest versions, the presented stores have worked towards implementing or improving the following key aspects of their features:

1. fast reads and high availability,

2. faster writes, and

3. improved security.

All three Apache-supported solutions (HBase⁵, Accumulo⁶ and Cassandra) have implemented methods to improve data retrieval. More specifically, HBase offers increased availability by replicating regions across different RegionServers⁷. In practice, a RegionServer holds the data of a region for reading and writing, along with read-only replicas of other regions. This way, if a RegionServer is busy, down or unusually slow, a client request may be served by another server holding a replica. Reads in this paradigm are timeline-consistent: requests for newly added data are served by the primary RegionServer, and replicas are updated periodically. Similarly, Accumulo offers data center replication with eventual consistency⁸. Cassandra offers rapid read protection⁹ through data replication across nodes and eager retries. In Cassandra, nodes exchange state within the cluster through the gossip protocol, so the cluster knows when a node is down. When a node is slow, the system resends the user request to replica nodes before the query times out.
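Rapid read protection boils down to a speculative retry: if the first replica is slow, the request is also sent to another replica, and whichever answers first wins. The asyncio sketch below illustrates that idea with simulated nodes; the function names and delays are hypothetical, and in Cassandra the behavior is configured per table (through its `speculative_retry` option) rather than written by the application.

```python
import asyncio


async def read_with_speculative_retry(primary, replica, key, retry_after):
    """Query primary; if it has not answered within retry_after seconds, also query
    replica and return whichever response arrives first."""
    first = asyncio.create_task(primary(key))
    done, _ = await asyncio.wait({first}, timeout=retry_after)
    if done:
        return first.result()
    second = asyncio.create_task(replica(key))
    done, pending = await asyncio.wait({first, second}, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the slower request is no longer needed
    return done.pop().result()


def simulated_node(name, delay):
    async def handle(key):
        await asyncio.sleep(delay)  # stands in for network and processing latency
        return f"{key}@{name}"
    return handle


slow_primary_result = asyncio.run(read_with_speculative_retry(
    simulated_node("primary", 0.5), simulated_node("replica", 0.01), "k1", retry_after=0.05))
fast_primary_result = asyncio.run(read_with_speculative_retry(
    simulated_node("primary", 0.01), simulated_node("replica", 0.01), "k1", retry_after=0.05))
```

The retry threshold trades extra load for lower tail latency: a short threshold sends more duplicate requests, while a long one protects fewer slow reads.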

⁵ https://blogs.apache.org/hbase/entry/start_of_a_new_era

⁶ http://hortonworks.com/blog/apache-accumulo-1-7-0-released/

⁷ https://issues.apache.org/jira/browse/HBASE-10070

⁸ https://issues.apache.org/jira/browse/ACCUMULO-378

⁹ http://digbigdata.com/apache-cassandra-2-features/

With respect to faster writes, HBase has implemented a multithreaded method for writing multiple WALs (write-ahead logs) on a given RegionServer to increase write throughput¹⁰. In addition, HBase has adopted Accumulo's cell-level access control to increase security¹¹. Finally, HBase¹² and Cassandra¹³ have proposed novel, hybrid compaction methods to reduce compaction time and improve read efficiency.

¹⁰ https://issues.apache.org/jira/browse/HBASE-8755

¹¹ https://issues.apache.org/jira/browse/HBASE-6222

¹² https://issues.apache.org/jira/browse/HBASE-7667

¹³ http://www.datastax.com/wp-content/uploads/2013/09/WP-DataStax-WhatsNewC2.0.pdf

4. How to select the right one

Of the 3Vs of Big Data (i.e., Volume, Velocity and Variety), velocity and variety, rather than volume, are the ones most often sought in NoSQL solutions. It has been claimed that 90% of web-based companies will likely never reach the data volumes that NoSQL is designed for (see Figure 8). When it comes to selecting a NoSQL solution, the good news is that there are several types of Big Data databases to choose from. The bad news is that no single type does everything best.

Figure 8. NoSQL solutions; complexity vs size.


Stefan Edlich [13] compiled six groups of aspects/questions for those who are about to adopt or migrate to a NoSQL solution. Here, we group the main elements of choosing the proper NoSQL datastore into three classes:

● Data model and access pattern. First, the data model must be identified. This narrows the target domain down to one of the classes described earlier in this paper. If the data is strongly relational and has some characteristics of Big Data, then NewSQL solutions¹⁴ with horizontal scalability are the right choice; these are out of the scope of this paper. Column-oriented, document-like, graph, object, multi-valued, JSON and BLOB data are among the data types that can be handled by one of the four groups of NoSQL solutions investigated in this paper. Then, data-related features such as volume, complexity, schema flexibility, durability and data access pattern are taken into account to further narrow down the candidates. The access pattern, including the read/write distribution and random versus sequential access, needs careful consideration. Figure 8 compares NoSQL solutions in terms of data complexity and size.

¹⁴ MySQL, Postgres and Percona provide distributed solutions for relational data.

● Query requirements. Next, the required query capabilities should be identified. Is SQL preferred, or is LINQ¹⁵ a better fit? Is MapReduce appropriate, or are higher-level query or scripting languages needed? How significant are ad-hoc queries, secondary indexes, full-text search, aggregations, views or column-oriented functions to the business? For instance, get, put and delete operations are best supported by key-value systems. Aggregation is much easier with column-oriented systems than with conventional row-oriented databases; the former use tables but do not have joins. Mapping data from object-oriented software is easy with a document-oriented NoSQL database, since such databases use structured document formats such as XML or JSON.

¹⁵ LINQ (Language Integrated Query) is a uniform query syntax, integrated into C# and VB.NET, used to save and retrieve data from different sources.
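The contrast between key-value access and column-oriented aggregation can be sketched in a few lines of Python. The `KeyValueStore` class and the sample traffic records below are hypothetical and purely in-memory; the point is only that a key-value interface exposes little beyond get, put and delete, and that an aggregate over one attribute touches a single column in a column layout but every full record in a row layout.

```python
from typing import Dict, List, Optional


class KeyValueStore:
    """Minimal in-memory key-value store exposing only get, put and delete."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


# Row-oriented layout: all attributes of a record are stored together.
rows: List[dict] = [
    {"id": 1, "city": "Toronto", "speed": 42.0},
    {"id": 2, "city": "Ottawa", "speed": 55.0},
    {"id": 3, "city": "Toronto", "speed": 38.0},
]

# Column-oriented layout: each attribute is stored contiguously.
columns: Dict[str, list] = {
    "id": [1, 2, 3],
    "city": ["Toronto", "Ottawa", "Toronto"],
    "speed": [42.0, 55.0, 38.0],
}


def avg_speed_rows(records: List[dict]) -> float:
    # Visits every full record even though only one field is needed.
    return sum(r["speed"] for r in records) / len(records)


def avg_speed_columns(cols: Dict[str, list]) -> float:
    # Reads only the "speed" column.
    speeds = cols["speed"]
    return sum(speeds) / len(speeds)
```

Both functions return the same average (45.0 for these made-up values); the difference that matters at scale is how much data each layout must read to compute it.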

● Non-functional properties. A large number of concerns fall into this category, chief among them performance. Performance usually translates into latency (of reads, writes, modifications or inserts) and throughput. It depends on other non-functional properties, including partitioning, replication (synchronous vs asynchronous), horizontal scalability, load balancing, auto-scalability, consistency technique (e.g., ACID, BASE or adjustable consistency), CAP trade-off (i.e., the ability to tune between CA, CP and AP) and concurrency mechanisms (e.g., locks, MVCC, ACID or none). The way all these properties are designed and implemented in a solution has a direct impact on its overall performance. Other important non-functional factors include elasticity, DB simplicity (i.e., installation, configuration, upgrade, maintenance and development), security (e.g., authentication, authorization, validity), license model, vendor reliability, community support and total cost of ownership (e.g., license, scaling, sysadmin and operational costs, safety, backup and restore, disaster management and monitoring costs). The implementation language also influences how fast a database processes requests: NoSQL databases written in C/C++ or Erlang are typically the fastest, whereas those written in higher-level languages such as Java are easier to customize. We have compiled such non-functional capabilities for the solutions selected in this paper in Tables 1, 2, 3 and 4.
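Adjustable consistency is commonly realized with Dynamo-style quorum replication (Cassandra's per-request consistency levels are one example). As a sketch under that assumption, and not a description of any specific product: with N replicas per key, a read that consults R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. Lowering R or W trades consistency for latency and availability, which is one concrete form of the CAP trade-off listed above. The function names are hypothetical.

```python
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """True if every read quorum must overlap every write quorum.

    n: replicas per key, r: replicas consulted on a read,
    w: replicas that must acknowledge a write.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("require 1 <= r <= n and 1 <= w <= n")
    return r + w > n


def describe(n: int, r: int, w: int) -> dict:
    """Consistency and fault tolerance implied by a quorum setting."""
    return {
        "strong": is_strongly_consistent(n, r, w),
        # Reads (writes) still succeed while this many replicas are down.
        "read_failures_tolerated": n - r,
        "write_failures_tolerated": n - w,
    }
```

With N = 3, R = W = 2 gives overlapping quorums while tolerating one failed replica on each path; R = W = 1 is faster but only eventually consistent; R = 1, W = 3 keeps reads cheap and consistent at the cost of writes that fail if any replica is down.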

After this systematic comparison, a real and independent performance evaluation is required to make sure the final candidate(s) fit our needs. Database vendors usually measure the performance of their solutions with custom hardware and software settings designed to demonstrate their advantages. Experiments done by others might not fulfill one's particular requirements either, since every user has a specific workload and access pattern. Therefore, any potential user is advised to conduct an independent and unbiased experiment to determine whether the final candidate is a real fit. We suggest that users leverage YCSB [12], the de facto standard for performance evaluation of NoSQL solutions. YCSB has built-in drivers for a large number of datastores and can generate various types of workloads. In Section 5, we demonstrate how to implement new interfaces for new or unsupported solutions and how to modify existing ones to work with a new version of a solution. Moreover, we show how to customize workloads to imitate the real workload of the user's business.

5. Performance evaluation

Benchmarking is widely used for evaluating computer systems, and benchmarks exist for a variety of levels of abstraction, from the CPU, to the database software, to complete enterprise systems. A few big data benchmarks have emerged, such as YCSB [12], PigMix [45], CALDA [31] and GraySort [17]. Some focus on one component or a subset of the components and tasks typical of big data systems, while others are based on specific MapReduce-style systems. Rabl et al. [33] presented "BigBench", an end-to-end big data benchmark that includes a data model, a synthetic data generator and a workload description. Of these, we chose YCSB to evaluate the performance of the selected NoSQL datastores, because it has a plugin-based architecture and can be extended or customized easily.
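YCSB's plugin architecture rests on a small abstract database interface that each binding implements; in the Java code base this is the `DB` class with `init`, `cleanup`, `read`, `scan`, `update`, `insert` and `delete`. The Python sketch below is only an analogue written to show the shape of such a binding, not YCSB's actual API: the `Binding` and `InMemoryBinding` names, the return types and the sorted-key scan are assumptions. A real driver would translate each call into the target datastore's client library.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Set

Record = Dict[str, bytes]


class Binding(ABC):
    """Hypothetical datastore binding: the operations a YCSB-style client calls."""

    def init(self) -> None:
        """Open connections; called once per client thread."""

    def cleanup(self) -> None:
        """Release resources when the client thread finishes."""

    @abstractmethod
    def read(self, table: str, key: str, fields: Optional[Set[str]]) -> Optional[Record]: ...

    @abstractmethod
    def scan(self, table: str, start_key: str, count: int,
             fields: Optional[Set[str]]) -> List[Record]: ...

    @abstractmethod
    def update(self, table: str, key: str, values: Record) -> bool: ...

    @abstractmethod
    def insert(self, table: str, key: str, values: Record) -> bool: ...

    @abstractmethod
    def delete(self, table: str, key: str) -> bool: ...


class InMemoryBinding(Binding):
    """Toy binding backed by dictionaries, e.g. for dry-running a workload."""

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[str, Record]] = {}

    def _table(self, table: str) -> Dict[str, Record]:
        return self._tables.setdefault(table, {})

    @staticmethod
    def _project(record: Record, fields: Optional[Set[str]]) -> Record:
        # fields=None means "return all fields", as in YCSB's readallfields.
        if fields is None:
            return dict(record)
        return {f: v for f, v in record.items() if f in fields}

    def read(self, table, key, fields):
        record = self._table(table).get(key)
        return None if record is None else self._project(record, fields)

    def scan(self, table, start_key, count, fields):
        # Return up to `count` records whose keys sort at or after start_key.
        keys = sorted(k for k in self._table(table) if k >= start_key)[:count]
        return [self._project(self._table(table)[k], fields) for k in keys]

    def update(self, table, key, values):
        record = self._table(table).get(key)
        if record is None:
            return False
        record.update(values)
        return True

    def insert(self, table, key, values):
        self._table(table)[key] = dict(values)
        return True

    def delete(self, table, key):
        return self._table(table).pop(key, None) is not None
```

In use, a driver is exercised against YCSB's default table name, `usertable`; for instance, inserting `user1` and `user2` and then scanning two records from `user1` returns both records in key order.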

As we mentioned in Section 4, the approach in this paper is not to compare a subset of solutions under a pre-defined set of conditions; rather, we describe how to leverage, customize and extend YCSB to analyze performance in your own environment and under your specific workload. Figure 9 shows the high-level architecture of YCSB.

Figure 9. YCSB Client Architecture [12].

Our own experience with selecting a NoSQL solution was in the context of our Connected Vehicles and Smart Transportation¹⁶ project [42]. For this project, we needed a NoSQL solution to act as a warehouse hosting various types of traffic data, with very efficient write operations under heavy load. Using a systematic approach similar to the one presented in this paper, we selected HBase as the first candidate. We then needed to examine the performance of HBase in our environment, so we set up a cluster on the SAVI cloud [35] and tuned it according to our needs. Next, we tried to examine the datastore with YCSB, but its HBase driver did not work properly. We modified the HBase driver to work with the specific version we were interested in, and then introduced a heavy-write workload¹⁷ based on the YCSB core workload to imitate our data ingestion and access pattern. Because HBase performed acceptably for our project, we selected it as the final solution for the warehouse.

¹⁶ http://cvst.ca

¹⁷ We refer to this workload as workloadg.

This experience motivated us to conduct the research presented in this work and to make it available to other users. Our edition of YCSB (i.e., ASRL-YCSB) contains all of our development, is fully commented, and is publicly available¹⁸.

¹⁸ ASRL-YCSB can be obtained from: https://github.com/ceraslabs/ASRL-YCSB

We have examined the other NoSQL datastores discussed in this paper, including Redis, Voldemort, Memcached, MongoDB, CouchDB, RavenDB, OrientDB, Titan, Cassandra and Accumulo, to verify YCSB compatibility with their latest versions. These experiments were run with minimal configuration under the default deployment, which makes the results unsuitable for any comparison. Because Neo4J is popular but not supported by YCSB, we developed a new driver for it. The ASRL-YCSB repository¹⁹ contains the tested drivers for the latest versions²⁰ of the above-mentioned datastores.

¹⁹ https://github.com/ceraslabs/ASRL-YCSB

²⁰ The versions are the latest as of July 2015.

5.1. Benchmark configuration

We leverage the SAVI cloud [35], an OpenStack-based academic and experimental testbed in Canada, to conduct the performance evaluation. Table 5 shows the specifications of the machines that we used for the experiment.

Table 5. Virtual Machines (VM) specifications.

Name	Extra Large (Xlarge)	Large	Medium	Small
vCPU	8	4	2	1
Disk (GB)	160	80	40	20
RAM (GB)	16	8	4	2

Table 6 shows the specifications of our environment.

Table 6. Configuration parameters in performance evaluation.

Description	YCSB Client	HBase	Neo4J
VM flavor	xlarge	xlarge	All flavors
No. of VMs	1	4	1
Version	Yahoo YCSB 2010	1.0.0	2.2.3
OS	Ubuntu 14.04 64-bit	Ubuntu 14.04 64-bit	Ubuntu 14.04 64-bit

5.2. Results

In this section, we present the results that we obtained under our conditions for HBase. We also present results for Neo4J to test the newly developed driver in a highly stressed environment. Table 7 describes the workloads that we used for the performance evaluation. We added workload G (i.e., "workloadg") to examine the HBase cluster under insert-heavy load with infrequent reads. We used synthetic data provided by YCSB. Each record is about 1 KB, comprising 10 fields of 100 bytes each plus a 24-byte key. Each "read" operation retrieves one record using a random key. Insertion keys are hashed rather than ordered. For a "scan", the YCSB client picks a start key and then requests a number of records; this works even with hashed insertion. By default, the scan length of each call is a uniformly random number between 1 and 100 records. Requests (i.e., operations) are distributed over keys according to a Zipfian distribution. By default, the YCSB client inserts 1000 records into the datastore during the "load" phase and then performs 1000 operations against the datastore during the "run" phase. All these values can be adjusted to the target environment; we set the load and run sizes to one million records and one million operations, respectively. The full specifications of our experiment can be found as shell scripts in the "scripts" directory²¹.
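In YCSB, a custom workload such as workload G is essentially a properties file read by the core workload class. The sketch below renders the settings described above as such a file. The property names (`recordcount`, `operationcount`, `fieldcount`, `fieldlength`, `insertorder`, `requestdistribution` and the `...proportion` keys) are, to our understanding, those read by YCSB's `CoreWorkload`; the package `com.yahoo.ycsb.workloads` matches the original Yahoo release, whereas later releases use `site.ycsb`. The helper `workload_properties` is hypothetical, and the actual `workloadg` file in ASRL-YCSB may differ.

```python
from typing import Dict

# Operation names whose "<name>proportion" keys CoreWorkload reads.
OPERATIONS = ("read", "update", "scan", "insert", "readmodifywrite")


def workload_properties(mix: Dict[str, float], records: int = 1_000_000,
                        operations: int = 1_000_000) -> str:
    """Render an operation mix as YCSB-style workload properties text."""
    unknown = set(mix) - set(OPERATIONS)
    if unknown:
        raise ValueError(f"unknown operations: {sorted(unknown)}")
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"proportions sum to {total}, expected 1.0")
    props = {
        "workload": "com.yahoo.ycsb.workloads.CoreWorkload",
        "recordcount": records,             # records inserted in the load phase
        "operationcount": operations,       # operations issued in the run phase
        "fieldcount": 10,                   # 10 fields per record ...
        "fieldlength": 100,                 # ... of 100 bytes each
        "insertorder": "hashed",            # hashed rather than ordered keys
        "requestdistribution": "zipfian",
    }
    for op in OPERATIONS:
        props[f"{op}proportion"] = mix.get(op, 0)
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"
```

Calling `workload_properties({"read": 0.05, "insert": 0.95})` yields a file with `readproportion=0.05`, `insertproportion=0.95` and the remaining proportions set to 0, matching the workload G column of Table 7. Validating that the proportions sum to 1 catches a common mistake when editing such files by hand.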

Table 7. Workload specifications.

Operation	Workload A	Workload B	Workload E	Workload F	Workload G
Read	0.5	0.95	0	0.5	0.05
Update	0.5	0.05	0	0	0
Scan	0	0	0.95	0	0
Insert	0	0	0.05	0	0.95
Read-Modify-Write	0	0	0	0.5	0

²¹ https://github.com/ceraslabs/ASRL-YCSB/tree/master/scripts
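To see how Table 7 and the Zipfian request distribution combine into a request stream, the sketch below draws operations according to a workload's proportions and keys according to a Zipfian law, P(i) ∝ 1/(i+1)^θ. The default θ = 0.99 mirrors, to our knowledge, the Zipfian constant used by YCSB's generator; YCSB additionally scrambles (hashes) the ranks so that popular keys are spread across the key space, which this simplified sketch does not do. The names `TABLE_7`, `ZipfianKeys` and `operation_stream` are hypothetical.

```python
import random
from itertools import accumulate
from typing import Dict, List, Optional

# Operation proportions per workload, transcribed from Table 7 (zeros omitted).
TABLE_7: Dict[str, Dict[str, float]] = {
    "A": {"read": 0.5, "update": 0.5},
    "B": {"read": 0.95, "update": 0.05},
    "E": {"scan": 0.95, "insert": 0.05},
    "F": {"read": 0.5, "readmodifywrite": 0.5},
    "G": {"read": 0.05, "insert": 0.95},
}


class ZipfianKeys:
    """Draws key indices 0..n-1 with P(i) proportional to 1 / (i + 1) ** theta."""

    def __init__(self, n: int, theta: float = 0.99,
                 rng: Optional[random.Random] = None) -> None:
        if n < 1:
            raise ValueError("n must be positive")
        self._keys = range(n)
        # Cumulative weights let random.choices sample by binary search.
        self._cum = list(accumulate(1.0 / (i + 1) ** theta for i in range(n)))
        self._rng = rng if rng is not None else random.Random()

    def next(self) -> int:
        return self._rng.choices(self._keys, cum_weights=self._cum, k=1)[0]


def operation_stream(workload: str, count: int, rng: random.Random) -> List[str]:
    """Draws `count` operation names following the workload's proportions."""
    mix = TABLE_7[workload]
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=count)
```

For workload G, roughly 95% of a long stream from `operation_stream("G", ...)` will be inserts, and `ZipfianKeys` returns low-numbered keys far more often than others, which is the skew that makes caching and hot-spot behaviour visible in the results.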

Figures 10 and 11 show the read and update latencies (i.e., response times) of Neo4J for workloads A, B and F under different target throughputs.

Figure 10. Read latency vs target throughput for Neo4J.

Figure 11. Update latency vs target throughput for Neo4J.

Figure 12 compares the read-modify-write delay on small, medium, large and extra large machines. As can be seen, each upgrade to the next larger machine reduces the delay almost linearly, by a factor of about two, which is also the factor by which each individual VM resource (CPU, memory and disk) is multiplied. For other operations, such as read, update and insert, the improvement was almost the same.
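The scaling observation can be checked against Table 5 with simple arithmetic: every resource doubles between consecutive flavors, so if delay were inversely proportional to resources, the delay on the k-th flavor above small would be d_small / 2^k. The sketch below encodes Table 5 and this idealized model; the model is an assumption for illustration, and the 80 ms starting delay in the usage note is hypothetical, not a measured result.

```python
from typing import Dict, List

# vCPU, RAM (GB) and disk (GB) per VM flavor, transcribed from Table 5.
FLAVORS: Dict[str, Dict[str, int]] = {
    "small":  {"vcpu": 1, "ram_gb": 2,  "disk_gb": 20},
    "medium": {"vcpu": 2, "ram_gb": 4,  "disk_gb": 40},
    "large":  {"vcpu": 4, "ram_gb": 8,  "disk_gb": 80},
    "xlarge": {"vcpu": 8, "ram_gb": 16, "disk_gb": 160},
}
ORDER = ["small", "medium", "large", "xlarge"]


def step_factors() -> List[Dict[str, float]]:
    """Resource multiplication factor between each pair of consecutive flavors."""
    return [
        {res: FLAVORS[b][res] / FLAVORS[a][res] for res in FLAVORS[a]}
        for a, b in zip(ORDER, ORDER[1:])
    ]


def predicted_delay(small_delay_ms: float, flavor: str) -> float:
    """Idealized delay if it scales inversely with the resource multiplier."""
    scale = FLAVORS[flavor]["vcpu"] / FLAVORS["small"]["vcpu"]
    return small_delay_ms / scale
```

For a hypothetical 80 ms delay on the small flavor, the model predicts 40, 20 and 10 ms on medium, large and xlarge. Real systems deviate from this line once a resource other than the one being doubled (e.g. network) becomes the bottleneck.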

Figure 12. Read-Modify-Write delays for four configurations: Neo4J running on small, medium, large and extra large machines.

Figures 13, 14, 15 and 16 show the results for the HBase cluster. We ran the experiment using one YCSB client with 40 active threads and client-side buffering disabled. Figure 16 shows actual versus target throughput; as can be seen, the maximum throughput was obtained for workload A.
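One way to reason about curves of actual versus target throughput is to treat each YCSB client thread as a closed loop: it issues its next operation only after the previous one returns. By Little's law, N threads with mean latency R can then complete at most N / R operations per second, and a throttled client also never exceeds its target. The sketch below encodes this simplified model; it ignores client-side overheads, the function names are hypothetical, and the latency values in the usage note are illustrative rather than measurements from our experiment.

```python
def max_closed_loop_throughput(threads: int, mean_latency_ms: float) -> float:
    """Little's law bound X = N / R for a closed-loop client, in ops/s."""
    if threads < 1 or mean_latency_ms <= 0:
        raise ValueError("need at least one thread and a positive latency")
    return threads * 1000.0 / mean_latency_ms


def expected_actual_throughput(target_ops: float, threads: int,
                               mean_latency_ms: float) -> float:
    """A throttled client exceeds neither its target nor the closed-loop bound."""
    return min(target_ops, max_closed_loop_throughput(threads, mean_latency_ms))


def per_thread_interval_ms(target_ops: float, threads: int) -> float:
    """Spacing between operations each thread needs to reach the target rate."""
    return 1000.0 * threads / target_ops
```

With 40 threads and a hypothetical mean latency of 10 ms, the bound is 4000 ops/s, so any target above that would appear as actual throughput flattening out; to reach a 2000 ops/s target, each thread would issue one operation every 20 ms. Because latency itself tends to rise as the datastore saturates, the real curve usually flattens earlier than this bound suggests.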

Figure 13. Read delay for HBase.

Figure 14. Update and Insert delays for HBase.

Figure 15. Scan and Read-Modify-Write delays for HBase.

Figure 16. Target throughput versus actual throughput for HBase.

6. Conclusion

Big data has been forcing businesses to adopt new types of datastores that are more performant, economical, reliable and scalable than traditional RDBMS solutions. Selecting the right datastore is not a trivial task, due to the diversity of solutions and the lack of standard benchmarks in this domain. Existing research works and experiments compare and contrast various solutions, but none of them is truly generalizable and applicable for other interested parties; most were done for pre-selected solutions under controlled conditions.

In this paper, however, we provided a comprehensive comparison of the major classes in the NoSQL world in a systematic manner. For each class, we elaborated on the main functionalities and characteristics and then introduced 3 to 4 dominant solutions in that class. We introduced new criteria to redefine traditional classifications of NoSQLs in a more distinctive manner. Finally, we demonstrated a methodology for customizing and configuring YCSB so that the evaluation can be repeated in a different target environment; new drivers and one new workload have been implemented to materialize this tailoring process. This will help businesses choose the right solution(s) for their specific data and environment with high confidence.

Acknowledgments

This research was supported by Fuseforward Solutions Group Ltd., the Natural Sciences and Engineering Research Council of Canada (NSERC), and the Ontario Research Fund for Research Excellence under the Connected Vehicles and Smart Transportation (CVST) project.

References

[1]	Blasco J, Aleixos N, Gomez-Sanchi J, et al. (2009) Recognition and classification of external skin damage in citrus fruits using multispectral data and morphological features. Biosyst Eng 103: 137-145. doi: 10.1016/j.biosystemseng.2009.03.009
[2]	Holmes GJ, Eckert JW (1999) Sensitivity of Penicillium digitatum and P. italicum to postharvest citrus fungicides in California. Phytopathology 89: 716-721.
[3]	Altieri G, Di Renzo GC, Genovese F, et al. (2013) A new method for the postharvest application of imazalil fungicide to citrus fruit. Biosyst Eng 115: 434-443. doi: 10.1016/j.biosystemseng.2013.04.008
[4]	Youssef K, Sanzani SM, Ligorio A, et al. (2014) Sodium carbonate and bicarbonate treatments induce resistance to postharvest green mould on citrus fruit. Postharvest Biol Tec 87: 61-69. doi: 10.1016/j.postharvbio.2013.08.006
[5]	Ben-Yehoshua S, Rodov V, D'hallewin G, et al. (2005) Elicitation of resistance against pathogens in citrus fruits by combined UV illumination and heat treatments. V International Postharvest Symposium, Acta Hortic 682: 2013-2020.
[6]	Zhang J (2007) The potential of a new fungicide fludioxonil for stem-end rot and green mold control on Florida citrus fruit. Postharvest Biol Tec 46: 262-270. doi: 10.1016/j.postharvbio.2007.05.016
[7]	Sullivan GH, Davenport LR, Julian JW (1996) Precooling: Key factor for assuring quality in new fresh market vegetable crops. In: Janick, J, Progress in New Crops, ASHS Press, Virginia, USA, 521-524.
[8]	Brosnan T, Sun DW (2001) Precooling techniques and applications for horticultural products-a review. Int J Refrig 24: 154-170. doi: 10.1016/S0140-7007(00)00017-7
[9]	Porat R, Daus A, Weiss B, et al. (2000) Reduction of postharvest decay in organic citrus fruit by a short hot water brushing treatment. Postharvest Biol Tec 18: 151-157. doi: 10.1016/S0925-5214(99)00065-4
[10]	Njombolwana NS, Erasmus A, van Zyl JG, et al. (2013) Effects of citrus wax coating and brush type on imazalil residue loading, green mould control and fruit quality retention of sweet oranges. Postharvest Biol Tec 86: 362-371. doi: 10.1016/j.postharvbio.2013.07.017
[11]	Johnston JW, Banks NH (1998) Selection of a surface coating and optimization of its concentration for use on 'Hass' avocado (Persea americana Mill.) fruit. New Zeal J Crop Hort 26: 143-151. doi: 10.1080/01140671.1998.9514051
[12]	Workneh TS, Osthoff G, Pretorius JC, et al. (2003) Comparison of anolyte and chlorinated water as a disinfecting dipping treatment for stored carrots. J Food Quality 26: 463-474. doi: 10.1111/j.1745-4557.2003.tb00261.x
[13]	Beghin S (2014a) Personal communication. Premier Fruit Exports (Pty) Ltd, Durban, Republic of South Africa, 1 April 2014.
[14]	Droby S, Wisniewski M, Macarisin D, et al. (2009) Twenty years of postharvest biocontrol research: Is it time for a new paradigm? Postharvest Biol Tec 52: 137-145. doi: 10.1016/j.postharvbio.2008.11.009
[15]	Abraham AO, Laing MD, Bower JP (2010) Isolation and in vivo screening of yeast and Bacillus antagonists for the control of Penicillium digitatum of citrus fruit. Biol Control 53: 32-38. doi: 10.1016/j.biocontrol.2009.12.009
[16]	Whangchai K, Saengnil K, Singkamanee C, et al. (2010) Effect of electrolyzed oxidizing water and continuous ozone exposure on the control of Penicillium digitatum on tangerine cv. 'Sai Nam Pung' during storage. Crop Prot 29: 386-389.
[17]	Korf HJG, Schutte GC, Kotze JM (2001) Effect of packhouse procedures on the viability of Phyllosticta citricarpa, anamorph of the citrus black spot pathogen. African Plant Protection 7: 103-109.
[18]	Obagwu J, Korsten L (2003) Integrated control of citrus green and blue molds using Bacillus subtilis in combination with sodium bicarbonate or hot water. Postharvest Biol Tec 28: 187-194. doi: 10.1016/S0925-5214(02)00145-X
[19]	Hong P, Hao W, Luo J, et al. (2014) Combination of hot water, Bacillus amyloliquefaciens HF-01 and sodium bicarbonate treatments to control postharvest decay of mandarin fruit. Postharvest Biol Tec 88: 96-102. doi: 10.1016/j.postharvbio.2013.10.004
[20]	Moscoso-Ramirez PA, Palou L (2014) Preventive and curative activity of postharvest potassium silicate treatments to control green and blue molds on orange fruit. Eur J Plant Pathol 138: 721-732. doi: 10.1007/s10658-013-0345-x
[21]	Kassim A, Workneh TS, Laing MD, et al. (2016) The effects of different pre-packaging treatments on the quality of kumquat fruit. CyTA-J Food 14: 639-648. doi: 10.1080/19476337.2016.1190407
[22]	Porat R, Weiss B, Cohen L, et al. (2004) Reduction of postharvest rind disorders in citrus fruit by modified atmosphere packaging. Postharvest Biol Tec 33: 35-43. doi: 10.1016/j.postharvbio.2004.01.010
[23]	Ladaniya MS (2008c) Citrus Fruit: Biology, Technology and Evaluation. Elsevier, London, United Kingdom.
[24]	Li Z, Zhong H, Peng X, et al. (2008) Effect of chitosan and CaCl₂ on senescence and membrane lipid peroxidation of postharvest kumquat fruits. Acta Hortic 769: 259-264.
[25]	Grierson W, Ben-Yehoshua S (1986) Storage of citrus fruits. In: Wardowski WF, Nagy S, Grierson W, Fresh Citrus Fruit, AVI Publishing Co., Connecticut, USA, Ch. 20, 479-507.
[26]	Kader AA (1999) Fruit maturity, ripening and quality relationships. In: Michalczuk L, Proceedings of the International Symposium Effect of Pre- and Postharvest factors in Fruit Storage, Acta Hortic, 203-208.
[27]	D'hallewin G, Schirra M, Manueddu E, et al. (1999) Scoparone and scopoletin accumulation and ultraviolet-c induced resistance to postharvest decay in oranges as influenced by harvest date. J Am Soc Hortic Sci 124: 702-707. doi: 10.21273/JASHS.124.6.702
[28]	McGuire RG, Reeder WF (1992) Predicting market quality of grapefruit after hot-air quarantine treatment. J Am Soc Hortic Sci 117: 90-95. doi: 10.21273/JASHS.117.1.90
[29]	Houck LG, Jenner JF, Mackey BE (1990) Seasonal variability of the response of desert lemons to rind injury and decay caused by quarantine cold treatments. J Hortic Sci 65: 611-617. doi: 10.1080/00221589.1990.11516100
[30]	Schirra M, Agabbio M, D'hallewin G, et al. (1997) Response of tarocco oranges to picking date, postharvest hot water dips, and chilling storage temperature. J Agric Food Chem 45: 3216-3220. doi: 10.1021/jf970273m
[31]	Schueller JK, Whitney JD, Wheaton TA, et al. (1999) Low-cost automatic yield mapping in hand-harvested citrus. Comput Electron Agr 23: 145-153. doi: 10.1016/S0168-1699(99)00028-9
[32]	Sanders KF (2005) Orange harvesting systems review. Biosyst Eng 90: 115-125. doi: 10.1016/j.biosystemseng.2004.10.006
[33]	Jimenez AR, Ceres R, Pons JL (2000) A survey of computer vision methods for locating fruit on trees. Trans ASAE 43: 1911-1920. doi: 10.13031/2013.3096
[34]	Beghin S (2014b) Personal communication. Premier Fruit Exports (Pty) Ltd, Durban, Republic of South Africa, 8 May 2014.
[35]	Berger CN, Sodha SV, Shaw RK, et al. (2010) Fresh fruit and vegetables as vehicles for the transmission of human pathogens. Environ Microbiol 12: 2385-2397. doi: 10.1111/j.1462-2920.2010.02297.x
[36]	Fallik E (2004) Prestorage hot water treatments (immersion, rinsing and brushing). Postharvest Biol Tec 32: 125-134. doi: 10.1016/j.postharvbio.2003.10.005
[37]	Boyette MD, Ritchie DF, Carballo SJ, et al. (1993) Chlorination and postharvest disease control. HortTechnology 3: 395-400. doi: 10.21273/HORTTECH.3.4.395
[38]	Droby S, Cohen L, Daus A, et al. (1998) Commercial testing of Aspire: a yeast preparation for the biological control of postharvest decay of citrus. Biol Control 12: 97-101. doi: 10.1006/bcon.1998.0615
[39]	Schirra M, Angioni A, Cabras P, et al. (2011) Effects of postharvest hot water and hot air treatments on storage decay and quality traits of kumquat (Fortunella japonica Lour. Swingle, cv. Ovale) fruit. J Agric Sci Tech 13: 89-94.
[40]	Gomez-Sanchis J, Martin-Guerrero JD, Soria-Olivas E, et al. (2012) Detecting rottenness caused by Penicillium genus fungi in citrus fruits using machine learning techniques. Expert Syst Appl 39: 780-785. doi: 10.1016/j.eswa.2011.07.073
[41]	Chalutz E, Lomenic E, Waks J (1989) Physiological and pathological observations on the postharvest behavior of kumquat fruit. Trop Sci 29: 199-206.
[42]	Mokomele P (2013) Reports of a Ban of Exports of Fresh Citrus Fruit to the European Union due to Citrus Black Spot, Department of Agriculture, Forestry and Fisheries, Media Release, Republic of South Africa. Available from: http://www.gov.za/speeches/view.php?sid=42268.
[43]	Yonowa T, Hattingh V, de Villiers M (2013) CLIMEX modelling of the potential global distribution of the citrus black spot disease caused by Guignardia citricarpa and the risk posed to Europe. Crop Prot 44: 18-28.
[44]	Cooke T, Persley D, House S (2009) Diseases of Fruit Crops in Australia. CSIRO Publishing, Collingwood, Australia.
[45]	Mercier J, Smilanick JL (2005) Control of green mold and sour rot of stored lemon by biofumigation with Muscodor albus. Biol Control 32: 401-407. doi: 10.1016/j.biocontrol.2004.12.002
[46]	Talibi I, Askarne L, Boubaker H, et al. (2012) Antifungal activity of some Moroccan plants against Geotrichum candidum, the causal agent of postharvest citrus sour rot. Crop Prot 35: 41-46. doi: 10.1016/j.cropro.2011.12.016
[47]	Grierson W (1986) Physiological disorders. In: Wardowski WF, Nagy S, Grierson W, Fresh Citrus Fruit, AVI Publishing Co., Connecticut, USA, Ch. 14, 361-378.
[48]	Wardowski WF (1988b) Inherited abnormalities and weaknesses. In: Whiteside JO, Garnsey SM, Timmer LW, Compendium of Citrus Diseases, The American Phytopathological Society, Minnesota, USA, Part I, 64.
[49]	Ritenour MA, Dou H, Bowman KM, et al. (2004) Effect of rootstock on stem-end rind breakdown and decay of fresh citrus. HortTechnology 14: 315-319. doi: 10.21273/HORTTECH.14.3.0315
[50]	Stall RE (1988) Infectious (biotic) diseases. In: Whiteside JO, Garnsey SM, Timmer LW, Compendium of Citrus Diseases, The American Phytopathological Society, Minnesota, USA, Part I, 6.
[51]	Khalaf A, Moore GA, Jones JB, et al. (2007) New insights into the resistance of Nagami kumquat to canker disease. Physiol Mol Plant P 71: 240-250. doi: 10.1016/j.pmpp.2008.03.001
[52]	Kotze JM (1988) Fungal diseases in nurseries and orchards. In: Whiteside JO, Garnsey SM, Timmer LW, Compendium of Citrus Diseases, The American Phytopathological Society, Minnesota, USA, Part I, 6.
[53]	Bonants PJM, Carroll GC, de Weerdt M, et al. (2003) Development and validation of a fast PCR-based detection method for pathogenic isolates of the citrus black spot fungus, Guignardia citricarpa. Eur J Plant Pathol 109: 503-513. doi: 10.1023/A:1024219629669
[54]	Brown GE, Eckert JW (1988a) Postharvest fungal diseases. In: Whiteside JO, Garnsey SM, Timmer LW, Compendium of Citrus Diseases, The American Phytopathological Society, Minnesota, USA, Part I, 32.
[55]	Palou L, Usall J, Munoz JA, et al. (2002) Hot water, sodium carbonate, and sodium bicarbonate for the control of postharvest green and blue molds of clementine mandarins. Postharvest Biol Tec 24: 93-96. doi: 10.1016/S0925-5214(01)00178-8
[56]	Venditti T, Molinu MG, Dore A, et al. (2005) Sodium carbonate treatment induces scoparone accumulation, structural changes, and alkalinization in the albedo of wounded citrus fruits. J Agric Food Chem 53: 3510-3518. doi: 10.1021/jf0482008
[57]	Brown GE, Eckert JW (1988b) Postharvest fungal diseases. In: Whiteside JO, Garnsey SM, Timmer LW, Compendium of Citrus Diseases, The American Phytopathological Society, Minnesota, USA, Part I, 35-36.
[58]	Smilanick JL, Mackey BE, Reese R, et al. (1997) Influence of concentration of soda ash, temperature, and immersion period on the control of postharvest green mold of oranges. Plant Dis 81: 379-382. doi: 10.1094/PDIS.1997.81.4.379
[59]	Smilanick JL, Margosan DA, Mlikota F, et al. (1999) Control of citrus green mold by carbonate and bicarbonate salts and the influence of commercial postharvest practices on their efficacy. Plant Dis 83: 139-145. doi: 10.1094/PDIS.1999.83.2.139
[60]	Pavoncello D, Lurie S, Droby S, et al. (2001) A hot water treatment induces resistance to Penicillium digitatum and promotes the accumulation of heat shock and pathogenesis-related proteins in grapefruit flavedo. Physiol Plantarum 111: 17-22. doi: 10.1034/j.1399-3054.2001.1110103.x
[61]	Smilanick JL, Mansour MF, Margosan DA, et al. (2005) Influence of pH and NaHCO₃ on effectiveness of imazalil to inhibit germination of Penicillium digitatum and to control postharvest green mold on citrus fruit. Plant Dis 89: 640-648. doi: 10.1094/PD-89-0640
[62]	Brown GE (1986) Diplodia stem-end rot, a decay of citrus fruit increased by ethylene degreening treatment and its control. P Fl St Hortic Soc 99: 105-108.
[63]	Brown GE, Eckert JW (1988c) Postharvest fungal diseases. In: Whiteside JO, Garnsey SM, Timmer LW, Compendium of Citrus Diseases, The American Phytopathological Society, Minnesota, USA, Part I, 33-34.
[64]	Brown GE, Lee HS (1993) Interactions of ethylene with citrus stem-end rot caused by Diplodia natalensis. Phytopathology 83: 1204-1208. doi: 10.1094/Phyto-83-1204
[65]	Zhang J, Swingle PP (2005) Effects of curing on green mold and stem-end rot of citrus fruit and its potential application under Florida packing system. Plant Dis 89: 834-840. doi: 10.1094/PD-89-0834
[66]	Wardowski WF (1988a) Inherited abnormalities and weaknesses. In: Whiteside JO, Garnsey SM, Timmer LW, Compendium of Citrus Diseases, The American Phytopathological Society, Minnesota, USA, Part I, 63-64.
[67]	Sapitnitskaya M, Maul P, McCollum GT, et al. (2006) Postharvest heat and conditioning treatments activate different molecular responses and reduce chilling injuries in grapefruit. J Exp Bot 57: 2943-2953. doi: 10.1093/jxb/erl055
[68]	Olmo M, Nadas A, García JM (2000) Nondestructive methods to evaluate maturity level of oranges. J Food Sci 65: 365-369. doi: 10.1111/j.1365-2621.2000.tb16008.x
[69]	Singh KK, Reddy BS (2006) Post-harvest physico-mechanical properties of orange peel and fruit. J Food Eng 73: 112-120. doi: 10.1016/j.jfoodeng.2005.01.010
[70]	Pathare PB, Opara UL, Al-Said FA (2013) Colour measurement and analysis in fresh and processed foods: a review. Food Bioprocess Tech 6: 36-60. doi: 10.1007/s11947-012-0867-9
[71]	Ortiz JM (c2002) Botany: Taxonomy, morphology and physiology of fruits, leaves and flowers. In: Dugo G, Di Giacomo A, Citrus: The Genus Citrus, Taylor and Francis, London, Ch. 2, 16-35.
[72]	Iglesias DJ, Cercos M, Colmenero-Flores JM, et al. (2007) Physiology of citrus fruiting. Braz J Plant Physiol 19: 333-362. doi: 10.1590/S1677-04202007000400006
[73]	Stewart I, Wheaton TA (1971) Effects of ethylene and temperature on carotenoid pigmentation of citrus peel. P Fl St Hortic Soc 84: 264-266.
[74]	Rodrigo MJ, Zacarias L (2007) Effect of postharvest ethylene treatment on carotenoid accumulation and the expression of carotenoid biosynthetic genes in the flavedo of orange (Citrus sinensis L. Osbeck) fruit. Postharvest Biol Tec 43: 14-22. doi: 10.1016/j.postharvbio.2006.07.008
[75]	Rodov V, Agar T, Peretz J, et al. (2000) Effect of combined application of heat treatments and plastic packaging on keeping quality of 'Oroblanco' fruit (Citrus grandis L.C. paradisi Macf.). Postharvest Biol Tec 20: 287-294. doi: 10.1016/S0925-5214(00)00129-0
[76]	Smilanick JL, Mansour MF, Sorenson D (2006) Pre-and postharvest treatments to control green mold of citrus fruit during ethylene degreening. Plant Dis 90: 89-96. doi: 10.1094/PD-90-0089
[77]	Porat R, Weiss B, Cohen L, et al. (1999) Effects of ethylene and 1-methylcyclopropene on the postharvest qualities of 'Shamouti' oranges. Postharvest Biol Tec 15: 155-163. doi: 10.1016/S0925-5214(98)00079-9
[78]	Chien P, Sheu F, Lin H (2007) Coating citrus (Murcott tangor) fruit with low molecular weight chitosan increases postharvest quality and shelf life. Food Chem 100: 1160-1164. doi: 10.1016/j.foodchem.2005.10.068
[79]	Ghanem N, Mihoubi D, Kechaou N, et al. (2012) Microwave dehydration of three citrus peel cultivars: effect on water and oil retention capacities, color, shrinkage and total phenols content. Ind Crop Prod 40: 167-177. doi: 10.1016/j.indcrop.2012.03.009
[80]	Purvis AC (1983) Moisture loss and juice quality from waxed and individually seal-packaged citrus fruits. P Fl St Hortic Soc 96: 327-329.
[81]	D'hallewin G, Arras G, Castia T, et al. (1994) Reducing decay of Avana mandarin fruit by the use of UV, heat and thiabendazole treatments. International Symposium on Postharvest Treatment of Horticultural Crops, Acta Hortic 368: 387-394.
[82]	Hall DJ (1981) Innovations in citrus waxing-an overview. P Fl St Hortic Soc 94: 258-263.
[83]	Hagenmaier RD, Baker RA (1994) Wax microemulsions and emulsions as citrus coatings. J Agric Food Chem 42: 899-902. doi: 10.1021/jf00040a012
[84]	Cohen E, Shalom Y, Rosenberger I (1990) Postharvest ethanol buildup and off-flavor in 'Murcott' tangerine fruits. J Am Soc Hortic Sci 115: 775-778. doi: 10.21273/JASHS.115.5.775
[85]	Ben-Yehoshua S, Burg SP, Young R (1985) Resistance of citrus fruit to mass transport of water vapor and other gases. Plant Physiol 79: 1048-1053. doi: 10.1104/pp.79.4.1048
[86]	Rodov V, Ben-Yehoshua S, Albagli R, et al. (1995) Reducing chilling injury and decay of stored citrus fruit by hot water dips. Postharvest Biol Tec 5: 119-127. doi: 10.1016/0925-5214(94)00011-G
[87]	Citrus Growers' Association (2013) Citrus Growers' Association of South Africa Annual Report 2013. Citrus Growers' Association of South Africa, Durban, Republic of South Africa.
[88]	Abbott JA (1999) Quality measurement of fruits and vegetables. Postharvest Biol Tec 15: 207-225. doi: 10.1016/S0925-5214(98)00086-6
[89]	Petrisor C, Lucian-Radu G, Balan V, et al. (2010) Rapid and non-destructive analytical techniques for measurement of apricot quality. Rom Biotech Lett 15: 5213-5216.
[90]	Sadka A, Artzi B, Cohen L, et al. (2000) Arsenite reduces acid content in citrus fruit, inhibits activity of citrate synthase but induces its gene expression. J Am Soc Hortic Sci 153: 288-293.
[91]	Albertini M, Carcouet E, Pailly O, et al. (2006) Changes in organic acids and sugars during early stages of development of acidic and acidless citrus fruit. J Agric Food Chem 54: 8335-8339. doi: 10.1021/jf061648j
[92]	Lobit P, Soing P, Genard M, et al. (2002) Theoretical analysis of relationships between composition, pH, and titratable acidity of peach fruit. J Plant Nutr 25: 2775-2792. doi: 10.1081/PLN-120015538
[93]	Hong SI, Lee HH, Kim D (2007) Effects of hot water treatment on the storage stability of Satsuma mandarin as a postharvest decay control. Postharvest Biol Tec 43: 271-279. doi: 10.1016/j.postharvbio.2006.09.008
[94]	Baldwin EA, Nisperos-Carriedo M, Shaw PE, et al. (1995) Effect of coatings and prolonged storage conditions on fresh orange flavor volatiles, degrees brix, and ascorbic acid levels. J Agric Food Chem 43: 1321-1331. doi: 10.1021/jf00053a037
[95]	Rocha AMCN, Brochado CM, Kirby R, et al. (1995) Shelf-life of chilled cut orange determined by sensory quality. Food Control 6: 317-322. doi: 10.1016/0956-7135(95)00019-4
[96]	Lado J, Rodrigo MJ, Zacarías L (2014) Maturity indicators and citrus fruit quality. Stewart Postharvest Review 10: 1-6.
[97]	Palou L, Smilanick JL, Droby S (2008) Alternatives to conventional fungicides for the control of citrus postharvest green and blue moulds. Stewart Postharvest Rev 4: 1-16.
[98]	Schutte GC, Beeton KV, Kotze JM (1997) Rind stippling on Valencia oranges by copper fungicides used for control of citrus black spot in South Africa. Plant Dis 81: 851-854. doi: 10.1094/PDIS.1997.81.8.851
[99]	Agostini J, Peres NA, Mackenzie SJ, et al. (2006) Effect of fungicides and storage conditions on postharvest development of citrus black spot and survival of Guignardia citricarpa in fruit tissues. Plant Dis 90: 1419-1424. doi: 10.1094/PD-90-1419
[100]	Seberry JA, Leggom D, Kiely TB (1967) Effect of skin coatings on the development of black spot in stored Valencia oranges. Anim Prod Sci 7: 593-600. doi: 10.1071/EA9670593
[101]	Wu CT, Roan SF, Hsiung TC, et al. (2011) Effect of harvest maturity and heat pretreatment on the quality of low temperature storage avocados in Taiwan. J Fac Agr Kyushu U 56: 255-262.
[102]	Kassim A, Workneh TS, Bezuidenhout CN (2013) A review on postharvest handling of avocado fruit. Afr J Agr Res 8: 2385-2402.
[103]	Fallik E, Grinberg S, Alkalai S, et al. (1999) A unique rapid hot water treatment to improve storage quality of sweet pepper. Postharvest Biol Tec 15: 25-32. doi: 10.1016/S0925-5214(98)00066-0
[104]	Gonzalez-Aguilar GA, Gayosso L, Cruz R, et al. (2000) Polyamines induced by hot water treatments reduce chilling injury and decay in pepper fruit. Postharvest Biol Tec 18: 19-26. doi: 10.1016/S0925-5214(99)00054-X
[105]	Ben-Yehoshua S, Peretz J, Rodov V, et al. (2000) Postharvest application of hot water treatment in citrus fruits: The road from the laboratory to the packing-house; XXV International Horticultural Congress; Part 8: Quality of Horticultural Products. Acta Hortic, 19-28.
[106]	Schirra M, Palma A, Aquino S, et al. (2008) Influence of postharvest hot water treatment on nutritional and functional properties of kumquat (Fortunella japonica Lour. Swingle Cv. Ovale) fruit. J Agric Food Chem 56: 455-460. doi: 10.1021/jf0714160
[107]	Schirra M, D'hallewin G, Ben-Yehoshua S, et al. (2000) Host-pathogen interactions modulated by heat treatment. Postharvest Biol Tec 21: 71-85. doi: 10.1016/S0925-5214(00)00166-6
[108]	Irtwange S (2006) Hot water treatment: A non-chemical alternative in keeping quality during postharvest handling of citrus fruits. Agric Eng Int 8: 1-10.
[109]	Kim JJ, Ben-Yehoshua S, Shapiro B, et al. (1991) Accumulation of scoparone in heat-treated lemon fruit inoculated with Penicillium digitatum Sacc. Plant Physiol 97: 880-885. doi: 10.1104/pp.97.3.880
[110]	Ben-Yehoshua S, Rodov V, Kim JJ, et al. (1992) Preformed and induced antifungal materials of citrus fruits in relation to the enhancement of decay resistance by heat and ultraviolet treatments. J Agric Food Chem 40: 1217-1221. doi: 10.1021/jf00019a029
[111]	Schirra M, D'Aquino S, Continella G, et al. (1995) Extension of kumquat fruit storage life by postharvest hot dip treatments in water and freshening agent. Adv Hortic Sci 9: 1000-1004.
[112]	Strano MC, Calandra M, Aloisi V, et al. (2014) Hot water dipping treatments on Tarocco orange fruit and their effects on peel essential oil. Postharvest Biol Tec 94: 26-34. doi: 10.1016/j.postharvbio.2014.01.026
[113]	Mannheim CH, Soffer T (1996) Permeability of different wax coatings and their effect on citrus fruit quality. J Agric Food Chem 44: 919-923. doi: 10.1021/jf950230a
[114]	Nisperos-Carriedo MO, Shaw PE, Baldwin EA (1990) Changes in volatile flavor components of pineapple orange juice as influenced by the application of lipid and composite films. J Agric Food Chem 38: 1382-1387. doi: 10.1021/jf00096a018
[115]	Maftoonazad N, Ramaswamy HS (2008) Effect of pectin-based coating on the kinetics of quality change associated with stored avocados. J Food Process Pres 32: 621-643. doi: 10.1111/j.1745-4549.2008.00203.x
[116]	Arnon H, Granit R, Porat R, et al. (2015) Development of polysaccharides-based edible coatings for citrus fruits: A layer-by-layer approach. Food Chem 166: 465-472. doi: 10.1016/j.foodchem.2014.06.061
[117]	Tesfay SZ, Magwaza LS (2017) Evaluating the efficacy of moringa leaf extract, chitosan and carboxymethyl cellulose as edible coatings for enhancing quality and extending postharvest life of avocado (Persea americana Mill.) fruit. Food Packag Shelf Life 11: 40-48. doi: 10.1016/j.fpsl.2016.12.001
[118]	Plácido GR, da Silva RM, Cagnin C, et al. (2016) Effect of chitosan-based coating on postharvest quality of tangerines (Citrus deliciosa Tenore): Identification of physical, chemical, and kinetic parameters during storage. Afr J Agr Res 11: 2185-2192. doi: 10.5897/AJAR2014.9355
[119]	Nisperos-Carriedo MO, Baldwin EA, Shaw PE (1991) Development of an edible coating for extending postharvest life of selected fruits and vegetables. P Fl St Hortic Soc 104: 122-125.
[120]	Dodd M, Cronje P, Taylor M, et al. (2008) A review of the postharvest handling of fruits in South Africa over the past twenty five years. S Afr J Plant Soil 27: 97-116.
[121]	Hagenmaier RD, Shaw PE (1992) Gas permeability of fruit coating waxes. J Am Soc Hortic Sci 117: 105-109. doi: 10.21273/JASHS.117.1.105
[122]	Hagenmaier RD, Baker RA (1993) Reduction in gas exchange of citrus fruit by wax coatings. J Agric Food Chem 41: 283-287.
[123]	Stapleton AE (1992) Ultraviolet radiation and plants: burning questions. Plant Cell 4: 1353-1358. doi: 10.2307/3869507
[124]	Rodov V, Ben-Yehoshua S, Kim JJ, et al. (1992) Ultraviolet illumination induces scoparone production in kumquat and orange fruit and improves decay resistance. J Am Soc Hortic Sci 117: 788-792. doi: 10.21273/JASHS.117.5.788
[125]	Rodov V, Ben-Yehoshua S, Fang D, et al. (1994) Accumulation of phytoalexins scoparone and scopoletin in citrus fruits subjected to various postharvest treatments. International Symposium on Natural Phenols in Plant Resistance. Acta Hortic, 517-525.
[126]	D'hallewin G, Schirra M, Pala M, et al. (2000) Ultraviolet C irradiation at 0.5 kJ·m⁻² reduces decay without causing damage or affecting postharvest quality of Star Ruby grapefruit (C. paradisi Macf.). J Agric Food Chem 48: 4571-4575.
[127]	Terry LA, Joyce DC (2004) Elicitors of induced disease resistance in postharvest horticultural crops: a brief review. Postharvest Biol Tec 32: 1-13. doi: 10.1016/j.postharvbio.2003.09.016
[128]	Stevens C, Wilson CL, Lu JY, et al. (1996) Plant hormesis induced by ultraviolet light-C for controlling postharvest diseases of tree fruits. Crop Prot 15: 129-134. doi: 10.1016/0261-2194(95)00082-8
[129]	Canale MC, Benato EA, Cia P, et al. (2011) In vitro effect of UV-C irradiation on Guignardia citricarpa and on postharvest control of citrus black spot. Trop Plant Pathol 36: 356-361. doi: 10.1590/S1982-56762011000600003
[130]	Lers A, Burd S, Lomaniec E, et al. (1998) The expression of a grapefruit gene encoding an isoflavone reductase-like protein is induced in response to UV irradiation. Plant Mol Biol 36: 847-856. doi: 10.1023/A:1005996515602
[131]	Delaquis PJ, Stewart S, Toivonen PMA, et al. (1999) Effect of warm, chlorinated water on the microbial flora of shredded iceberg lettuce. Food Res Int 32: 7-14. doi: 10.1016/S0963-9969(99)00058-7
[132]	Prusky D, Eshel D, Kobiler I, et al. (2001) Postharvest chlorine treatments for the control of the persimmon black spot disease caused by Alternaria alternata. Postharvest Biol Tec 22: 271-277. doi: 10.1016/S0925-5214(01)00084-9
[133]	Kitinoja L, Kader AA (1994) Small-Scale Postharvest Handling Practices: A Manual for Horticultural Crops. Report No. 8E. Department of Pomology, University of California, California, USA.
[134]	Tefera A, Seyoum T, Woldetsadik K (2007) Effect of disinfection, packaging, and storage environment on the shelf life of mango. Biosyst Eng 96: 201-212. doi: 10.1016/j.biosystemseng.2006.10.006
[135]	Simons LK, Sanguansri P (1997) Advances in the washing of minimally processed vegetables. Food Aust 49: 75-80.
[136]	Suslow T (1997) Chlorination in the production and postharvest handling of fresh fruits and vegetables. University of California, Davis. Available from: http://extension.psu.edu.
[137]	Jowkar MM (2006) Water relations and microbial proliferation in vase solutions of Narcissus tazetta L. cv. 'Shahla-e-Shiraz' as affected by biocide compounds. J Hortic Sci Biotech 81: 656-660.
[138]	Premuzic Z, Palmucci HE, Tamborenea J, et al. (2007) Chlorination: Phytotoxicity and effects on the production and quality of Lactuca sativa var. Mantecosa grown in a closed, soil-less system. Phyton-Int J Exp Bot 76: 103-117.
[139]	Gil MI, Selma MV, Lopez-Galvez F, et al. (2009) Fresh-cut product sanitation and wash water disinfection: Problems and solutions. Int J Food Microbiol 134: 37-45. doi: 10.1016/j.ijfoodmicro.2009.05.021
[140]	Smilanick JL, Sorenson D (2001) Control of postharvest decay of citrus fruit with calcium polysulfide. Postharvest Biol Tec 21: 157-168. doi: 10.1016/S0925-5214(00)00142-3
[141]	Beghin S (2014c) Personal communication. Premier Fruit Exports (Pty) Ltd, Durban, Republic of South Africa, 3 June 2014.
[142]	Gil MI, Gómez-López VM, Hung YC, et al. (2015) Potential of electrolyzed water as an alternative disinfectant agent in the fresh-cut industry. Food Bioprocess Tech 8: 1336-1348. doi: 10.1007/s11947-014-1444-1
[143]	Coroneo V, Carraro V, Marras B, et al. (2017) Presence of Trihalomethanes in ready-to-eat vegetables disinfected with chlorine. Food Addit Contam A 34: 2111-2117. doi: 10.1080/19440049.2017.1382723
[144]	Hall DJ (1986) Use of postharvest treatments for reducing shipping decay in kumquats. P Fl St Hortic Soc 99: 108-112.
[145]	Stange RR, Eckert JW (1994) Influence of postharvest handling and surfactants on control of green mold of lemons by curing. Phytopathology 84: 612-616. doi: 10.1094/Phyto-84-612
[146]	Sen F, Knay P, Karacal I, et al. (2007) Effects of the chlorine and heat applications after harvest on the quality and resistance capacity of Satsuma mandarins. Proceedings of the International Congress, CRIOF, University of Bologna, Bologna, Italy, 231-239.
[147]	Chiou CT, Freed VH, Schmedding DW, et al. (1977) Partition coefficient and bioaccumulation of selected organic chemicals. Environ Sci Technol 11: 475-478. doi: 10.1021/es60128a001
[148]	Bakhir VM (1997) Electrochemical activation of water: Past, present and future. Proceedings of the 1st International Symposium on Electrochemical Activation, Moscow, 38-45.
[149]	Leonov BI (1997) Electrochemical activation of water: Theory and practice. Proceedings of the First International Symposium on Electrochemical Activation, Moscow, 11-27.
[150]	Buck JW, Van Iersel MW, Oetting RD, et al. (2002) In vitro fungicidal activity of acidic electrolyzed oxidizing water. Plant Dis 86: 278-281. doi: 10.1094/PDIS.2002.86.3.278
[151]	Workneh TS, Osthoff G (2010) A review on integrated agro-technology of vegetables. Afr J Biotechnol 9: 9307-9327.
[152]	Guentzel JL, Lam KL, Callan MA, et al. (2010) Postharvest management of gray mold and brown rot on surfaces of peaches and grapes using electrolyzed oxidizing water. Int J Food Microbiol 143: 54-60. doi: 10.1016/j.ijfoodmicro.2010.07.028
[153]	Lesar K (2002) The screening of neutral anolyte against post-harvest fungal spores causing disease of citrus fruit. Unpublished report, Citrus Research International, Nelspruit, South Africa.
[154]	Porat R, Daus A, Weiss B, et al. (2002) Effects of combining hot water, sodium bicarbonate and biocontrol on postharvest decay of citrus fruit. J Hortic Sci Biotech 77: 441-445. doi: 10.1080/14620316.2002.11511519
[155]	Huang Y, Deverall BJ, Morris SC (1995) Postharvest control of green mould on oranges by a strain of Pseudomonas glathei and enhancement of its biocontrol by heat treatment. Postharvest Biol Tec 5: 129-137. doi: 10.1016/0925-5214(94)00016-L
[156]	El-Ghaouth A, Smilanick JL, Wilson CL (2000) Enhancement of the performance of Candida saitoana by the addition of glycolchitosan for the control of postharvest decay of apple and citrus fruit. Postharvest Biol Tec 19: 103-110. doi: 10.1016/S0925-5214(00)00076-4
[157]	Ippolito A, El Ghaouth A, Wilson CL, et al. (2000) Control of postharvest decay of apple fruit by Aureobasidium pullulans and induction of defense responses. Postharvest Biol Tec 19: 265-272. doi: 10.1016/S0925-5214(00)00104-6
[158]	Wisniewski ME, Wilson CL (1992) Biological control of postharvest diseases of fruits and vegetables: Recent advances. HortScience 27: 94-98. doi: 10.21273/HORTSCI.27.2.94
[159]	Sharma RR, Singh D, Singh R (2009) Biological control of postharvest diseases of fruits and vegetables by microbial antagonists: A review. Biol Control 50: 205-221. doi: 10.1016/j.biocontrol.2009.05.001
[160]	Arras G (1996) Mode of action of an isolate of Candida famata in biological control of Penicillium digitatum in orange fruits. Postharvest Biol Tec 8: 191-198. doi: 10.1016/0925-5214(95)00071-2
[161]	Bar-Shimon M, Yehuda H, Cohen L, et al. (2004) Characterization of extracellular lytic enzymes produced by the yeast biocontrol agent Candida oleophila. Curr Genet 45: 140-148. doi: 10.1007/s00294-003-0471-7
[162]	Lahlali R, Hamadi Y, Jijakli MH (2011) Efficacy assessment of Pichia guilliermondii strain Z1, a new biocontrol agent, against citrus blue mould in Morocco under the influence of temperature and relative humidity. Biol Control 56: 217-224. doi: 10.1016/j.biocontrol.2010.12.001
[163]	Bi Y, Li Y, Ge Y (2007) Induced resistance in postharvest fruits and vegetables by chemicals and its mechanism. Stewart Postharvest Rev 3: 1-7. doi: 10.2212/spr.2007.6.16

© 2020 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
