With the advent of the Internet of Things (IoT) and cloud computing, the need for data stores that can store and process big data efficiently and cost-effectively has increased dramatically. Traditional data stores seem to have numerous limitations in addressing such requirements. NoSQL data stores have been designed and implemented to address the shortcomings of relational databases by compromising on ACID and transactional properties to achieve high scalability and availability. In contrast to traditional RDBMSs and data warehouses, these systems are designed to scale to thousands or millions of users performing updates as well as reads. Although there is a plethora of NoSQL implementations, there is no one-size-fits-all solution that satisfies even the main requirements. In this paper, we explore popular and commonly used NoSQL technologies and elaborate on their documentation, existing literature and performance evaluation. More specifically, we describe the background, characteristics, classification, data model and evaluation of NoSQL solutions that aim to provide the capabilities for big data analytics. This work is intended to help users, whether individuals or organizations, obtain a clear view of the strengths and weaknesses of well-known NoSQL data stores and select the right technology for their applications and use cases. To do so, we first present a systematic approach to narrow down the proper NoSQL candidates and then adopt an experimental methodology that anyone can repeat to find the best among the shortlisted candidates given their specific requirements.
Citation: Hamzeh Khazaei, Marios Fokaefs, Saeed Zareian, Nasim Beigi-Mohammadi, Brian Ramprasad, Mark Shtern, Purwa Gaikwad, Marin Litoiu. How do I choose the right NoSQL solution? A comprehensive theoretical and experimental survey[J]. Big Data and Information Analytics, 2016, 1(2&3): 185-216. doi: 10.3934/bdia.2016004