Abstract Your Data(base): Introduction

We at APP(ideas) have worked on some massive projects for some of the world’s largest corporations. We have created some equally impressive products for smaller companies. We have created our own products in TimeLogger, journalPad, bible.APP and dailyVerseJournal. We made the very first PHP-based MySQL interface and the first PHP shopping cart. We made an Open Source Content Management System (CMS) that was way ahead of its time, and what was once the best PHP calendar system available. We have developed for the web, for the desktop and for mobile. One thing is true about every product we have ever created: Data is King.

We understand that to most people, talking about data is good bedtime reading. But to those who are in the business of creating custom software, the importance of managing data cannot be overestimated. Truth be told, data is hugely important to “regular” people too — pay attention to your blood pressure if any of yours ever disappears if you need evidence of that fact. We have a lot of experience with storing, retrieving, organizing and assigning value to data. A large portion of our accumulated experience has gone into creating solutions for this very purpose.

This is the beginning of a six part series in which we will share our experience and some code — the code that currently serves as the foundation of every app that we create. In this introduction, we’ll provide a little history and make a case for the use of a bare-bones database abstraction layer as opposed to writing your code for a specific database implementation or settling for one of the popular database abstraction libraries that exists. The parts of this series are as follows:

Database Abstraction

Database Abstraction Layer for PHP and MySQL, PostgreSQL and Oracle (and beyond)
Database Abstraction Layer for iOS and SQLite
Database Abstraction Layer for Android and SQLite

Schema Adapter (an implementation)

Schema Adapter for PHP
Schema Adapter for iOS
Schema Adapter for Android

The sections following this introduction will skip much of the monologue and get straight into the code, but we felt that it was important to use some extra words to introduce the concept, convince you that it’s a good one, and show you that we have the experience to prove our expertise.

Call me Ishmael…

Some History

When I began working in the tech industry back in 1995, dynamic web sites were almost unheard of. At that time, PHP was still a personal project of Rasmus Lerdorf. My employer, an Internet Service Provider, had started making dynamic web sites for its real estate clients to showcase their listings. There was no such thing as a database system that connected to a web server, so we wrote our own file-based database system in perl.

By the end of 1997, it was clear that this business model wasn’t going to scale well because we could only put about 10 web sites on each one of our Pentium 166 machines (each with 32 whopping MB of RAM!) before they would get bogged down by normal traffic, so we started looking into better solutions, and PHP 3 with MySQL support, although still in beta at that time, became the obvious choice.

Before long, we began building a reputation as a company that could make impressive web-based applications, and this lead to the need to have a pre-built software system that we could easily deploy for multi-purpose use. By the end of 1999, we began work on our Content Management system that would later become known as dorcMan (Dynamic Object-Related Content MANagement system). In early 2001, that CMS was still undergoing a lot of regular changes, and I took it with me in the founding of APP(ideas). An important change had happened in PHP just a few months before that event, and that was the release of PHP 4, which introduced language constructs that allowed it to be utilized like a “real” Object-Oriented programming language. With all the pieces in place, we were able to stabilize dorcMan as the foundation for any custom application that needed a database back-end.

I referred to dorcMan as a “Content Management System” earlier, but it was (and is) not a CMS in the same way as WordPress and the like. WordPress (and similar systems) is (are) designed as a start-to-finish web site tool for people who don’t want to make a web site from scratch. There are many advantages to that (and they are why this site is built on WordPress), but, for many reasons, a traditional CMS is not a good foundation to start from if your end goal is to have a custom application that runs on the web. The reason that dorcMan was created was to give custom application builders a solid foundation from which to build. That foundation is where we begin this discussion.

A quick wrap-up of our history… Around 2003, it became apparent to us that mobile applications were going to catch on. Our client-base at that time, primarily massive corporations, were already asking us to create mobile apps for them, even though consumer demand was nowhere near what it is today. As was the case with our custom web-based software, it quickly became clear that we needed a good foundation for information storage and retrieval, and over the years, we have adapted dorcMan’s database abstraction to PalmOS, Windows Mobile (although we had to create our own database system for those two), Blackberry, Android and iOS.

Why Are You Doing This (and why should I care)?

Those of you who have experience with database applications are inevitably going to ask this question. It is undeniable that there are many popular database abstraction libraries written for nearly every programming language out there. Furthermore, most programming languages have some form of native database hooks, or lacking that, ODBC support at some level. If you desire to create the best possible products for your clients and for yourself, you should strongly question every important decision that you make — especially one as important to the foundation of your app as, “How should I access the data?”

In this industry, experience is the best teacher, and our research, study, and sometimes brutal experience has given us the answers to these questions. More than anything, having large projects that have had to be maintained over a period of many years has taught us that it is vastly important to make good decisions about the foundational code, it is very difficult to undo poor decisions, and you MUST have complete control of every aspect of your software at all times. Put plainly, the reasons that you should consider a light-weight database abstraction layer are many. Here are a few.

Why should I choose database abstraction as opposed to using my programming language’s native database connector?

You don’t always know how your software is going to change in the future, and you should ALWAYS be prepared for the worst-case scenario. Case in point: We once wrote an application to be used with a MySQL database, and several years into the project, our client’s IT department mandated that all database applications must use Oracle. Since we had abstracted the database layer of the application, the necessary work took about a week. Had we written in database-specific language, it would have been a complete rewrite. We looked like superstars among the other third-party developers employed by that client who had to do just that — and we got to paid at nearly triple our normal profit margin while our client was ecstatic.
If you are writing software for public use or that MAY (possibly) need to be ported to another platform at some point in the future, you have up to double the work cut out for you if you go native.
Sometimes it’s a pain (or just not obvious) how to get a database schema installed or query that database on an unfamiliar system. My first foray into Android development is such an example. This database abstraction takes care of most of the details for you so that you can get started writing queries.
If used properly, your software can be made more secure. For example, this abstraction can make calls through a routine that protects against SQL injection attacks. Having to implement that code yourself (especially across multiple systems) can be cumbersome and inefficient. In most cases, it is left out altogether.

Why should I use something like this instead of a popular one like Zend framework, MDB2, Doctrine, DiBi or <fill in the blank> or ODBC?

Frameworks are great. We won’t knock them. We have used and do use many of them for many things. Database abstraction is not one of those things.
Bloat. Frameworks are (usually) stable systems built by large teams of people. They seem to do everything except make you toast in the morning. That’s the problem. Most of these systems have to do everything for everybody, and in order to do so, much of the entire library will be things that you will never use. Easily, 80% of the code is related to features or systems that you won’t ever use. Beginning with a light-weight system leaves you with code that you understand and that you can quickly and easily change to suit your specific needs. Particularly if you are using someone else’s code, if the code base is large, there is a lot of bloat that you don’t need, and even worse, there’s a lot of code that you won’t be able to learn well enough to change (without knowing everything you might affect) to do precisely what you need if your needs differ in any way from the assumptions of the developer. This implementation takes a minimalist, roll-your-own type approach to give you the tools you need to get the job done with a level of resource efficiency that rivals native database connectors, while being simple enough to modify if you need to make it behave differently. Furthermore, it includes a full regression test suite so that you can easily guarantee that custom modifications have no affect on any of the functionality.
This is especially true of ODBC connectors, but also holds true for many of the popular libraries out there… In order to connect to many database systems, they have to be dumbed down to the least common denominator, meaning that you will be stuck with the features and (in)efficiency of the worst system supported by your chosen library.
You aren’t in control — You don’t know how it actually works. This is an important topic in software development, and one that reveals inexperience and incompetence in developers more than any other I can think of. As soon as you start thinking of any piece of your system as “magic” and don’t understand how it actually works, you are in trouble. You are coding by coincidence and praying that your code will continue to work without knowing why you should believe that it will. Why take such a chance with something as important as your data?
These systems are somewhere between difficult and impossible to change to suit your specific needs. Do you ever need to do something just a little differently than the default behavior? If you’re smart enough to change your system to fit your specific requirements, will your changes survive a security update by the framework vendor?
Sometimes the systems change themselves. There was an infamous case a few years back in which a PEAR library for one database system suddenly changed the calling order of its parameters of an important function, and everyone who was using that library and updated his or her PEAR libraries suddenly had broken web sites to fix. That sucked for everyone involved. Don’t you want to be in control of your own database management software?
If you must write code in multiple programming languages. A version of this database abstraction layer is available for PHP, Android and iOS, so you don’t have to learn a new way to access your data when you program in these different languages. You’ll be hard-pressed to find another decent database abstraction layer that can make that claim. There is more about this in the section below, but we felt that the fact that most (all?) of the existing libraries only work for a single programming language warranted a mention in this set of reasons.
Some of the existing database abstraction layers are pretty awkward (or downright difficult) to use. We aren’t going to site any examples because we don’t want to hurt anyone’s feelings. But one of the major points of using such a system should be to make your job easier, not to make your code more complicated.
In many cases, using a library still requires you to write code that is for a specific database system. See the previous section for why that is bad.

Why else should I consider such a system?

Consistency. Consistency. Consistency. If there is any possibility that you will end up porting your code to a new database system, abstracting the database layer is the right choice. Also, in this mobile world, it is becoming increasingly rare to have any piece of software that lives only in one place, on one device or server. The chance that you are going to have to spend some time in a different programming language or environment than that to which you are accustomed is very high. Keeping programming languages straight is hard enough as it is. Wouldn’t it be nice to not have to learn a new way to query your database when that time comes?
I made this point in the first list of answers, but it bears repeating as a general reason for using database abstraction: If your database access is abstracted, it makes the task of switching to a new database system or platform quick and simple. This not only fills the need to have code that is portable, it fulfills the number one tenet of Object-Oriented programming: reusable code. Imagine a world in which you can use your code and knowledge on a completely new platform immediately following an FTP transfer. That world exists. I’ve seen it.
Resource utilization can be greatly reduced, both system-wide and on a personal level. Using a light-weight database abstraction layer allows you to trim your code to your exact needs, making it more efficient and reducing a lot of the overhead (disk access, database connections, etc.). It also greatly reduces the number of hours you have to spend troubleshooting problems by placing core functionality in a single place that you can control, know, stabilize and learn to trust as it matures.
It’s a good learning experience. There’s no better way of learning how to do something than by doing it yourself. Do you aspire to be a good programmer (or a better one if you are already good)? Then dig in. Get your hands dirty (metaphorically speaking).
A good database abstraction layer (like this one!) can manage schema updates for you. This is especially important for projects that have ongoing development.

Is a system like this right for everyone in every case?

No. If you are working on a project that has pre-built software or libraries that meet your specific needs AND you are 100% confident that there will be no maintenance of that project over time, using database abstraction is probably not the right choice. You’ll sink time into something that won’t provide a long-term benefit. In all other cases, we would say, “Yes.”

We hope that you’re convinced by now that this is the right direction for you, or at least that it is worth researching more in-depth.

Our Goal

Our purpose for creating this system is to have a consistent, repeatable and stable database abstraction layer that we can count on using across the platforms for which we develop regularly. At the moment, that means PHP with MySQL, PostgreSQL and Oracle as well as Android and iOS with SQLite. Our purpose for writing about it now is to share our experience with the development community in hopes of seeing better use of good programming practices. It would be nice to see some community support in making this a more generally suitable set of tools (as opposed to being specifically for the needs of APP(ideas) and its customers) and in promoting our great apps.

A Few Final Technical Details

The source code for the APP(ideas) database abstraction layer is free for any use under the LGPL v3 license (click here) and full source code is available on GitHub (click here). The code comes with no warranties and is provided as-is.

Included with the source code is API-like documentation that was generated with PHPDocumentor. We are also including the results of unit testing the code. An explanation of the layout of the files is included within each section of these blog posts.

Use of this code requires a little bit of knowledge and a system that can run (and/or build) applications for the target platform (PHP 5 or newer with MySQL, PostgreSQL or Oracle, Android 4.0 or newer or iOS 6 or newer with SQLite). Take a look at the installation instructions in each of the following sections for complete details.

For those who like to nitpick the Object-Oriented practice of this code… There is quite a bit of code repeated in each of the database implementations, and we would normally write extended classes when such is the case. We chose to implement an interface rather than extend a base class in this particlar case for two reasons: 1. By design, this system will only be accessing one DBMS per installation — possibly ever. In a typical installation, unused files may be removed from the system, and 2. We wanted to enforce a contract related to what a database must implement.

Another thing that I would like to point out is that this database abstraction lacks something that is present in most of the others that exist in PHP, and that is a “save” wrapper to the query method. It is easier to create an array of key/value pairs and send them into a method for saving in the database than it is to provide a query in SQL, but in the context of a complete application, the database abstraction layer is not the right place to provide that functionality. When you consider your application in its entirety, the objects that represent and implement instances of your data are the correct place to include “save” functionality because it is there that you can include logic for such things as input and sanity checking. Unlike other PHP-based database abstraction layers, our approach to database abstraction is not that your entire application is the data, it is rather that having an easy and stable means of accessing your data is the most important piece of the foundation of a complete application. Your data is your data, and your representation and assignment of meaning to that data should be separate (and separable from) the database abstraction. This is one of the several sources of unnecessary bloat in PHP’s existing database abstraction layers, and our strong opinion is that including such functionality at the data layer is a mistake. It is opposed to the very foundation of the Model-View-Controller method of Object Oriented Programming. You will see a much better approach to “save” functionality when we get to our follow-up series about Schema Adapters.

We cannot provide technical support, but leave comments if you have questions and hopefully we and other users of this web site will be able to offer answers. Our general discussion forum is also a good place to go with questions. Click here to go to the forums.

Let’s Get Started

Without further ado, here is part one: PHP and a few database systems.

Links to posts in this series:

Introduction (You’re there now)
Part 1: PHP
Part 2: iOS
Part 3: Android (Coming soon)
Follow-up series: Schema Adapter (Coming soon-ish)