Category Archives: No SQL

Types of Databases in existing world

The capture and analyzing of data is typically performed by database management systems, otherwise known as DBMS’s. These types of database software systems are programmed in SQL. SQL (pronounced “ess-que-el”) stands for Structured Query Language. SQL is used to communicate with a database. According to ANSI (American National Standards Institute), it is the standard language for relational database management systems. The most common of all the different types of databases is Relational Databases.

Let’s learn now the different types of databases that exist in today’s world and how to use them in our work.

Types of Databases

  • Relational Databases: A relational database is a collection of data items organized as a set of formally-described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables. The relational database was invented by E. F. Codd at IBM in 1970. Example are: PostgreSQL, SQLite, MySQL ,Oracle, Sysbase.
  • No SQLDatabases/Non-relational Databases : A NoSQL (originally referring to “non SQL”, “non relational” or “not only SQL”) database provides a mechanism for storage and retrieval of data which is modeled in means other than the tabular relations used in relational databases. NoSQL databases are increasingly used in big data and real-time web applications.NoSQL systems are also sometimes called “Not only SQL” to emphasize that they may support SQL-like query languages. Motivations for this approach include: simplicity of design, simpler “horizontal” scaling to clusters of machines (which is a problem for relational databases), and finer control over availability. The data structures used by NoSQL databases (e.g. key-value, columnar, graph, or document) are different from those used by default in relational databases, making some operations faster in NoSQL.

Many NoSQL stores compromise consistency (in the sense of the CAP theorem) in favor of availability and partition tolerance.  Some reasons that block adoption of NoSQL stores include the use of low-level query languages, the lack of standardized interfaces, and huge investments in existing SQL.  Also, most NoSQL stores lack true ACID transactions or only support transactions in certain circumstances and at certain levels (e.g., document level).  Finally, RDBMS’s are usually much simpler to use as they have GUI’s where many NoSQL solution use a command-line interface.

 

  • New SQL Databases: NewSQL is a term to describe a new group of databases that share much of the functionality of traditional SQL relational databases, while offering some of the benefits of NoSQL technologies.Like it provides, ACID transactional consistency of traditional operational databases; the familiarity and interactivity of SQL; and the scalability and speed of NoSQL.

 

Example to understand both above Databases:

 

If we use a bank example, each aspect of a customer’s relationship with a bank is stored as separate row items in separate tables.  So the customer’s master details are in one table, the account details are in another table, the loan details in yet another, investments in a different table, and so on.  All these tables are linked to each other through the use of relations such as primary keys and foreign keys.

Non-relational databases, specifically a database’s key-value stores or key-value pairs, are radically different from this model.  Key-value pairs allow you to store several related items in one “row” of data in the same table.  We place the word “row” in quotes because a row here is not really the same thing as the row of a relational table.  For instance, in a non-relational table for the same bank, each row would contain the customer’s details as well as their account, loan and investment details.  All data relating to one customer would be conveniently stored together as one record.

In the relational model, there is an built-in and foolproof method of ensuring and enforcing business logic and rules at the database layer, for instance that a withdrawal is charged to the correct bank account, through primary keys and foreign keys.  In key-value stores, this responsibility falls squarely on the application logic and many people are very uncomfortable leaving this crucial responsibility just to the application.  This is one reason why relational databases will continued to be used.

However, when it comes to web-based applications that use databases, the aspect of rigorously enforcing business logic is often not a top priorities.  The highest priority is the ability to service large numbers of user requests, which are typically read-only queries.  For example, on a site like eBay, the majority of users simply browse and look through posted items (read-only operations).  Only a fraction of these users actually place bids or reserve the items (read-write operations).  And remember, we are talking about millions, sometimes billions, of page views per day.  The eBay site administrators are more interested in quick response time to ensure faster page loading for the site’s users, rather than the traditional priorities of enforcing business rules or ensuring a balance between reads and writes.

 

Types and examples of NoSQL databases

There have been various approaches to classify NoSQL databases, each with different categories and subcategories, some of which overlap. What follows is a basic classification by data model, with examples:

 

  1. Key-Value Pair (KVP) Databases: Key-value (KV) stores use the associative array (also known as a map or dictionary) as their fundamental data model. In this model, data is represented as a collection of key-value pairs, such that each possible key appears at most once in the collection. The key-value model can be extended to a discretely ordered model that maintains keys in lexicographic order. This extension is computationally powerful, in that it can efficiently retrieve selective key e.g., InfinityDB, Oracle NoSQL Database and dbm.
  2. Document Databases: each document-oriented database implementation differs on the details of this definition, in general, they all assume that documents encapsulate and encode data (or information) in some standard formats or encodings. Encodings in use include XML and JSON. Documents are addressed in the database via a unique key that represents that document. One of the other defining characteristics of a document-oriented database is that in addition to the key lookup performed by a key-value store, the database offers an API or query language that retrieves documents based on their contents.

Different implementations offer different ways of organizing and/or grouping documents:

  • Collections
  • Tags
  • Non-visible metadata
  • Directory hierarchies

In short, Store documents or web pages, e.g.,MongoDB, Apache CouchDB

  1. Columnar Databases: Store data in columns, e.g., Hbase, SAP Hana
  2. Graph Databases: This kind of database is designed for data whose relations are well represented as a graph consisting of elements interconnected with a finite number of relations between them. The type of data could be social relations, public transport links, road maps or network topologies. Stores nodes and relationship, e.g., Neo4J, FlockDB
  3. Spatial Databases: For map and nevigational data, e.g.,OpenGEO, PortGIS, ArcSDE
  4. In-Memory Database (IMDB): All data in memory. For real time applications
  5. Cloud Databases: Any data that is run in a cloud using IAAS,VM Image, DAAS
            dbimages                                                                      Image courtesy: theWindowsclub.com
Advantages of NoSQL database:
  • Process data faster
  • Have simple data models to understand and execute
  • manage unstructured text

 

 

 

Advertisements

What is Big Data ? Is it Only Hadoop ? (Tutorial day 1)

Big Data, the new buzz word in the today’s technology is gaining more importance due to its high rewards. A systematic and focused approach toward the adoption of Big Data allows one to derive maximum value and utilize the power of Big Data.

 Its nothing but a new framework or system to get insight of existing different data forms and increasing the researchers/analyst power to get more out of existing system.

As BG Univ says, “Big data is about the application of new tools to do MORE analytic on MORE data for More people.”

Lifecycle of data can be defined as :

 

People get confuse with Big Data & Hadoop as 2 similar things. But no, Big data is not only Hadoop

Big Data is not a tool or single technique. Its actually a platform or a framework having various components like Data Warehouses (providing OLAP data/History), Real time Data systems and Hadoop (provides insight to structured/semi or unstructured Data).

Examples of Big Data are like Traffic data, Flights Data/ Search engine data etc.

Thus Big Data includes huge volume, high velocity, and extensible variety of data. The data in it will be of three types :

a) Structured data: Relational data.
b) Semi Structured data: XML data.
c) Unstructured data: Word, PDF, Text, Media Logs.

 Big Data can be characterized by 3 V’s :

1) Velocity -> Batch processing data, real time
2) Variety-> Structured, semi-structured, unstructured and polymorphic data
3) Volume-> Terabytes to Petabytes

Big Data puts existing traditional systems into trouble due to many reasons because when data increases the complexity, Security, maintenance, processing time of it also increases. Big Data gets Distributed processing system into picture. Its using multiple system/disk for parallel processing.

There are various tools & technologies in the market from different vendors including IBM, Microsoft, etc., to handle big data. Few of them are:

1) No SQL Big Data systems are designed to provide operational capabilities for real-time, interactive workloads where data is primarily captured and stored. It allows massive computations to be run inexpensively and efficiently. This makes operational big data workloads much easier to manage, cheaper, and faster to implement. For example MongoDB
2)MPP & MapReduce provide analytical capabilities for complex analysis including lot of data. Based on them we have Hadoop, Hive, Pig, Impala
3) Storage (HDFS ie Hadoop Distributed File System)
4) Servers (Google App Engine)
There are major challenges with Big Data.

Read  Day 2 tutorial to understand further and bookmark this page for future reference.