Most applications utilize some form of database to persist data in.
Knowledge about databases is therefore essential for a software developer.
This article targets beginners and attempt to provide them with a basic understanding of database types, products and terminology with the intent to provide the necessary background information to work with databases.
If you want to skip the background information and go straigth to the "HowTo", then got to this article.
Databases are often categorized based on their storage model:
|Hierarchical database||Data are organized in a tree like structure (think org chart)||The standard database type from mid 1960's to mid 1980's and still used in the mainframe world|
|Relational database (RDBMS)||Data are organized as sets (see below for details)||The standard database type from mid 1980's|
|Object oriented database (OODBMS)||Data are stored as an object model||Never really caugth on even though OOP became the standard for programming|
|Key value store||Data are stored as keys and values||Widely used on mainframe and mini computers in 60's, 70's and 80's - returned in the late 00's as part of the NoSQL movement|
|Document store||Data are stored as large documents (usually in XML or JSON format)||Became popular as part of the NoSQL movement|
|Column store||Data are stored as keys, values and timestamps||Became popular as part of the NoSQL movement|
|NoSQL databases||Different definitions exist:
||Became popular in internet startups and similar in the late 00's|
|NewSQL databases||Relational databases build on NoSQL technology||Concept showed up in early 10's but has not gained wide popularity yet|
Databases can also be categorized based on how the application communicate with it:
Some of the more popular database products are:
|Database type||Popular products|
|Relational database (RDBMS)||
|Object oriented database (OODBMS)||
|Key value store||
|NoSQL databases||See above|
Relational databases are still the most widely used databases and furthermore there is a tendency to measure non-relational databases against relational database, so it is worth looking a little bit into relational databases.
The relational model was invented by Edgar Codd in 1969/1970.
It was later advocated by Chris Date, whose book "An Introduction to Database Systems" can be considered the bible of relational databases (and note that the term "introduction" in the title should not be take literally - it is a book at a relative high level).
Relational database became popular during the 1980's when products like IBM DB2, Oracle DB, DEC RDB, Sybase and Informix arrived in the market.
There are many confusing aspects of relational database theory including the terms used for the most basic concepts:
|Relational database theory term||Standard database term||Application term|
The different key types are:
The different relation types are:
Joining is combing two tables from a relation. The different join types are:
Normalization is the process of optimizing database tables to renmove any redundancy in the data.
There are different levels of normalization:
|1NF||1st Normal Form||All values are atomic - not a list of multiple values||Move the list to a separate table with a FK to the original table|
|2NF||2nd Normal Form||1NF + all fields not part of any CK depends on all fields in any CK - not on some of the fields||Move those fields to separate tables|
|3NF||3rd Normal Form||2NF + no field depends on any field not part of a CK||Move those fields to separate tables|
|BCNF||Boyce Codd Normal Form|
|4NF||4th Normal Form|
|5NF||5th Normal Form|
It should be obvious by now that a relational database model and an object model used in an application is not a 1:1 match.
This is known as the object-relational impedance mismatch.
Some consider this a major problem for the mix of object oriented programming and relational databases. But I don't see it as a major problem. Due to both OOP and RDBMS being very popular then methodologies and tools has been developed that mitigate most of the problems.
All relational databases today supports SQL (Structured Query Language).
Its history goes back to 1974.
It has been standardized by ANSI and ISO in 1986, 1992, 1999, 2003, 2006, 2008 and 2011.
Note that unlike traditional programming languages then the new features are rarely used. Most SQL usage just require SQL 1992.
These API's are typical using SQL.
Common API's/Libraries are:
|MySQL||MySQL Connector/C aka libmysql||C|
|Oracle DB||LIBOCI||C and C++|
|MySQL||mysql extension (NOTE: OBSOLETE)||PHP|
I don't see much need to use these API's today. I would use a standardized API instead.
For examples of DBLIB see here.
For examples of MySQL Connector/C see here.
For examples of MySQL Connector/C++ see here.
For example of LIBOCI see here.
For examples of mysql extension see here.
For examples of mysqli extension see here.
Embedded SQL was the primary way to access relational databases for many years and are still used in IBM mainframe, commercial Unix and OpenVMS environments.
The concept is that one write a program with:
and then a preproceser convert the embedded SQL statements to call to database specific API and then it is compiled.
Embedded SQL is not something I would consider if there are any alternatives.
For examples of embedded SQL (using C) see here.
Java has a special flavor of embedded SQL called SQLJ. For example see here.
These API's are typical using SQL.
Common API's are:
|JDBC||Java||JDBC Driver||There are four types of drivers - most important are type 2 that uses native libraries and type 4 that is pure Java|
|ODBC||C (common on Windows, available on Unix/Linux, standard for all IBM DB2)||ODBC Driver||Note that when using ODBC on Windows it is often using OLE DB with the bridge to ODBC|
|OLE DB / ADO||COM (C++, VB6 and VBS on Windows)||OLE DB Provider / ADO Provider|
|ADO.NET||.NET (C#, VB.NET)||ADO.NET Provider|
|Qt SQL||C++||Qt SQL driver|
For relational databases these should be the preferred API for all usage that does not fit well with ORM. This include operations on record sets instead of individual records - often in a DWH context.
For examples of JDBC see here.
For examples of ODBC see here.
For examples of ADO see here.
For examples of ADO.NET see here.
For examples of PDO see here.
For examples of Qt SQL see here.
These are very different between different NoSQL databases.
The API typical consists of:
In my opinion these API's are often rather cumbersome to use. But if one need to use a NoSQL database, then one does not have any choice.
The concept is to write:
and then the ORM handles the database access without the programmer needing to write any SQL.
ORM's are typical using an ORM specific query lanaguage for advanced queries.
Common ORM's are:
|EJB Entity Bean CMP||Java EE||EJBQL||XML||Obsolete. Replaced by JPA.|
|MyBatis||Java||SQL||Convention||Fork of iBatis|
|JPA||Java||JPQL||Class annotations||JPA is really an API. Multiple implementations exist: Hibernate, EclipseLink and OpenJPA.|
|LINQ to SQL||.NET (C# and VB.NET)||LINQ||Class attributes||From .NET 3.5. Obsolete. Replaced by EF.|
|EF||.NET (C# and VB.NET)||LINQ||Class attributes||From .NET 4.0.|
|NHibernate||.NET (C# and VB.NET)||HQL or LINQ||XML||Port of Hibernate.|
|Doctrine||PHP||DQL or SQL||XML or class annotations (doc)||Hibernate inspired.|
For relational databases these should be the preferred API for CRUD operating on single records - usually in an OLTP context.
In theory ORM can be used for NoSQL databases as well, but it is very rare.
For examples of using Hibernate see here.
For examples of using LINQ to SQL see here.
For examples of using EF see here.
For examples of using NHibernate see here.
For examples of using Doctrine see here.
|1.0||October 17th 2016||Initial version based on multiple old articles on Eksperten.dk|
|1.1||December 22nd 2017||Add bunch of links to examples|
See list of all articles here
Please send comments to Arne Vajhøj