Tuesday, March 19, 2013

A Tool for the Time


For the last 35 years, the world has used relational databases to store all kinds of data. From financial and statistical data to web sites, databases such as Oracle, MS SQL Server and MySQL have stored petabytes of information around the world. Developers have been conditioned to normalize their data to minimize physical space usage, enforce consistency and maximize performance.
Now there’s a new kid on the block gaining popularity: No-SQL (non-relational) databases. Industry leaders such as Google, Amazon and Twitter are using No-SQL databases such as BigTable, CouchDB and MongoDB in their production environments. But how do they differ from relational databases, and how did we get here?
To know where we are now, we have to start at the beginning. One of the first commercial RDBMSs (Relational Database Management Systems), the Multics Relational Data Store, appeared in the 1970s. At that time hardware was very expensive and performance expectations were relatively low; even a slow computer was faster than a human with a calculator. In 1978 Apple Computer introduced the Disk II, a 5.25-inch floppy disk drive that connected to the Apple II by a cable, sold for $495 and stored roughly 113 KB per disk. Programming languages were largely procedural, with FORTRAN, COBOL, Pascal and C in wide use. The demands on the industry were limited by the available technology.
Jump forward to the 2000s, when hardware prices had dropped drastically. Mainstream programming languages had matured through many generations and were built around object-oriented models. The Internet had matured and gained widespread use among the general public. It was this movement that fueled the development of No-SQL databases. No longer was the physical size of our data the highest priority; uptime, redundancy and performance were the new goals. Database structures began to mimic objects instead of tables. Even the rules of normalization, which architects hold so dear, were cast aside, as they no longer applied to No-SQL databases.
Relational Databases
  • Minimize disk space through foreign key relations.
  • Enforce consistency through table definitions and constraints.
  • Data is usually stored in a single database. Replication is available to maintain a hot standby in case of a crash.
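To make the relational side concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption chosen because it ships with Python; the post discusses Oracle, SQL Server and MySQL, but the foreign key and CHECK constraints shown are standard SQL. The customer's name is stored once and each order points to it through a foreign key, so the database itself rejects an order for a customer that does not exist or with a negative total. The `customers` and `orders` tables and the `try_insert_order` helper are hypothetical examples.

```python
import sqlite3


def build_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys once this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    # Normalized design: each customer's name is stored exactly once.
    conn.execute(
        """CREATE TABLE customers (
               id   INTEGER PRIMARY KEY,
               name TEXT NOT NULL
           )"""
    )
    # Orders refer to customers by id; the database checks both rules below.
    conn.execute(
        """CREATE TABLE orders (
               id          INTEGER PRIMARY KEY,
               customer_id INTEGER NOT NULL REFERENCES customers(id),
               total       REAL NOT NULL CHECK (total >= 0)
           )"""
    )
    return conn


def try_insert_order(conn: sqlite3.Connection, order_id: int,
                     customer_id: int, total: float) -> bool:
    """Return True if the database accepted the row, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
                (order_id, customer_id, total),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_schema()
with conn:
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")

print(try_insert_order(conn, 1, 1, 25.0))   # True: customer 1 exists
print(try_insert_order(conn, 2, 99, 10.0))  # False: foreign key, no customer 99
print(try_insert_order(conn, 3, 1, -5.0))   # False: CHECK (total >= 0)

# Reading an order with its customer's name requires a join.
rows = conn.execute(
    "SELECT c.name, o.total FROM orders o JOIN customers c ON c.id = o.customer_id"
).fetchall()
print(rows)  # [('Alice', 25.0)]
```

The consistency rules live in the table definitions, so every application that writes to these tables gets the same checks for free.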

No-SQL Databases
  • Do not enforce foreign keys; instead, the same data is often stored in multiple places.
  • Consistency is enforced at the application level, not by the database. There are no table definitions or constraints; each document's structure is defined when that document is created.
  • Database clustering is available, allowing queries to be distributed across several servers for increased performance. Clustering also improves uptime and reduces the impact of maintenance.
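For contrast, the sketch below models the same orders the No-SQL way, with plain Python dictionaries standing in for documents. This is an assumption for illustration: no MongoDB driver is used, and `orders`, `validate_order`, `insert_order` and `rename_customer` are hypothetical names. The customer's name is copied into every order, the application (not the database) checks each document before saving it, and renaming a customer means updating every copy.

```python
from typing import Any

# Hypothetical in-memory stand-in for a document collection. A real
# No-SQL store would hold documents shaped like these, with no schema.
orders: list[dict[str, Any]] = []


def validate_order(doc: dict[str, Any]) -> None:
    """Application-level checks that the database itself would not perform."""
    customer = doc.get("customer")
    if not isinstance(customer, dict) or "id" not in customer or not customer.get("name"):
        raise ValueError("order must embed a customer with an id and a name")
    total = doc.get("total")
    if not isinstance(total, (int, float)) or total < 0:
        raise ValueError("total must be a non-negative number")


def insert_order(doc: dict[str, Any]) -> None:
    validate_order(doc)  # nothing stops a careless caller from skipping this
    orders.append(doc)


def rename_customer(customer_id: int, new_name: str) -> int:
    """Update every duplicated copy of a customer's name; return how many changed."""
    changed = 0
    for doc in orders:
        if doc["customer"]["id"] == customer_id:
            doc["customer"]["name"] = new_name
            changed += 1
    return changed


# Denormalized: the customer's name is embedded in each order document.
insert_order({"_id": 1, "customer": {"id": 1, "name": "Alice"}, "total": 25.0})
insert_order({"_id": 2, "customer": {"id": 1, "name": "Alice"}, "total": 40.0})

try:
    insert_order({"_id": 3, "customer": {"id": 2}, "total": 10.0})
except ValueError as err:
    print("rejected:", err)

print(rename_customer(1, "Alice Smith"))  # 2: both embedded copies updated
```

The trade-off shows up in `rename_customer`: reading an order needs no join, but keeping every duplicated copy consistent becomes the application's job.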

As with any technology, acceptance of change is slow, and relational databases still have many years ahead of them. But is the writing on the wall? Is the extinction of relational databases on the horizon? Only time will tell.

MongoDB for Developers Certification

As of Tuesday, March 12, I have passed my certification. For anyone interested in No-SQL databases or MongoDB, I highly recommend this 7-week course. It covers CRUD, Python integration, performance tuning, sharding, replication and many other valuable topics. There's also a course for DBAs. For more information, check out http://education.10gen.com.