Why your database version control strategy sucks and what to do about it, Part I

 

At the very least, it’s probably not nearly as good as it should be.  As a pathological job-hopper who sees one company after another wasting hidden hours and days on insufficient database change control strategies, I’m sick of it.

So how does my company handle database changes?  I’ve asked this of potential employers many times, and I usually just get blank stares, or a vague answer along the line of “uh… with scripts?”  Sometimes, if they are honest, they just reply “poorly.”  There’s not even an explicit Joel Test for it.  Considering how bad the situation is at most companies, the real test is not whether their database change management is any good, but just whether they are willing to recognize how problematic it really is.

Given how much thought and effort goes into source code control and change management at many of these same companies, it is confusing and a little unsettling that so much less progress has been made on the database change management front.  Many developers can give you a 15 minute explanation of their source code strategy, why they are doing certain things and referencing books and blog posts to support their approach, but when it comes to database changes it is usually just an ad-hoc system that has evolved over time and everyone is a little bit ashamed of it. 

Hopefully we can get to the bottom of why this is and come up with a few ways to make it better.  It’s not like the database is important, is it?

 
I like you, but not in that way

So why don’t developers think about databases the same way that they think about source code? 

The most obvious reason is that database changes are fundamentally different in many ways.  While source code is usually just a collection of files that recompiled, versioned, and released at any given time, databases are much more temperamental.  They have existing data, and they have history.  Sure, source code has history, which you can review for reference purposes, but in databases the historical lineage of a table is actually very important.  In C#, if you added a field to a class, and then a week later someone else changed the type, and then a month later someone else changed the name, it usually doesn’t really matter too much when or in what order those changes happened, all that matters is the current state of the code and that everything builds together and works in its latest state.  However, if you were to do the same thing to a new field in a table, it definitely makes a difference, because there are data implications at every step of the way.  This alone scares a lot of developers away from maintaining the database.

To many developers, there is something fundamentally uncontrollable about databases.  They don’t fit into the safe model that we’re use to.  Managing those changes is definitely introduces new challenge, and many developers just don’t want to be bothered.

 

Doing Business As

Another major reason for the difference is just cultural.  Developers want the database to work well, and they may even like tinkering around with some stored procedures from time to time, but at the end of the day they like to be able to wash their hands of anything that slightly resembles being a “DBA”.

When you compare the DBAs (database administrators) and SCMs (stupid code monkeys), often times there is just a different way of looking at the world.

Developers often see themselves as the general of a utilitarian army of code that they build and train and then order off to carry out their mission, and they usually won’t hesitate to trash it all and replace it if it proves itself unable to complete the mission.  DBAs on the other hand are used to dealing with gentle lion that could easily lose it temper and kill everyone in the room if it’s not treated with kindness and respect.  Developers often have the option to wipe the slate clean and start over, and usually want to, especially when they are dealing with someone else’s code.  DBAs however are stuck with the original version and they need to keep it alive and make it work until the point that the data can be migrated to a better system, and we all secretly know that is never going to happen.

 

Code source control is an over-ripe banana

As developers, we are awash with tools and techniques for managing our source code changes.  We have a seemingly endless number of source control platforms to choose from, each with more features than we could every possibly need or even understand, and some people build entire careers around defining complicated branching and merging strategies for massive codebases.  I figured for a while that the ideas of practical source control where pretty well nailed down, but then you read Eric’s Sink’s posts on Distributed Version Control  Systems, and get an idea of how quickly the field is continuing to evolve.

At some internal IT department, a group of developers is toiling away on a set of dull yet fabulously enterprise-y software that five people will use.  Jimmy checks his code changes into his company’s version control system of choice, where it is automatically held at the gates until it is code-reviewed by a senior developer, and then it is checked into the team’s current release branch.  Meanwhile the continuous integration build server will download, compile, package, and unit test the code to make sure that Jimmy hasn’t broken anything, and that nobody else’s changes broke Jimmy.  The package code is then migrated through an integration environment, QA environment, UAT environment, and staging environment on its way to production.  All the while, as the changes are validated from one environment to another, version numbers are automatically assigned to allow anyone to trace back the exact revision for a given build, and the corresponding code changes slowly work their way towards the code-release promised land, the Main Trunk.  Those branches can get pretty damned complicated, even when everything is going smoothly, and it never goes smoothly.

Code branchs can get  

Hopefully this is a process that Jimmy’s company evolved out of mixture of necessity and forethought over the years.  The other, less attractive, and far more common scenario is that the company hired an astronaut trapped in a developer’s body, bored with his work and not bound by any sense of urgency, who assured the company that he was going to implement a lot of big complicated processes because that’s just want professional companies do. 

In the end, a whole lot of people are paying a whole lot of attention to managing the source code.  Hopefully at your company, you are paying attention to this to.

 

Databases, the redheaded stepchild of source control

Now ask yourself, how does Jimmy handle database changes? 

For example, say that all database changes need to be checked into an isolated directory in source control; after all we’re not savages.  However, since they don’t really “build” a database, that a directory which is ignored by their continuous integration server.  This in turn breaks the unit tests that are pointing to the integration database server, so Jimmy then needs to run those scripts manually in the integration environment. 

In this process, Jimmy sees other database scripts that were checked in recently in the same source control directory, but he has no way to know which scripts have already been applied to the integration server.  For the briefest moment, Jimmy considers applying those scripts as well, just to make sure that the integration server is fully up-to-date, but then he realizes that he can’t be sure which scripts have already been run without manually comparing the schema and scripts to see which have been applied, and this would make Jimmy the defacto owner for any issue that arise because of it.  With his own tasks and deadlines to worry about, Jimmy doesn’t have the time or patience for this silliness, so he just deploys his scripts, forgets about the others, and hopes for the best. 

It’s worth noting here that this is the kind of things that can silently kill software quality.  A motivated developer just tried to make things a little better, but the process was so frustratingly inadequate that it was impractical for him to do so.  Software companies depend on their developers taking the initiative to improve things, and when they are discouraged from doing so, either by person or by process, the company will slowly slide backwards into mediocrity, and it will drag every developer’s morale with them.

 

Now once Jimmy makes the changes to the integration server database, that also breaks some other developers that have been using that server for their development.  Those developers now need to stop and download the latest code changes to get back in sync, cursing Jimmy’s name the whole way.

Anyhow, during the next deployment to QA, someone needs to remember that these specific changes need to get deployed.  Since there is no defined strategy for tying database changes to code changes, every time code is deployed there is a little bit of confusion around exactly which database changes need to be released, which were already released, and what order the scripts need to be run in.  Jimmy is getting upset.

Another darker possibility is that instead Jimmy needs to submit his changes to the database review board, a collection of detached idealists, college professors without the college, who will criticize every aspect of the scripts in order to justify their existence, but will not really offer any true value because they don’t understand the business problem that needs to be solved, nor do they appreciate the application considerations beyond the database that need to be satisfied.

One of the long term impacts of this is that Jimmy will look for any possible way to accomplish what he is trying to do without making database changes, because, in his colorful youthful vernacular, “making changes to the database is a sodding bitch.”  And if he does indeed need to change the database, he’ll try to accomplish it just by changing stored procedures, because changing table schemas are even worse.  In the end, he’s definitely not trying to find the appropriate solution to the problem; instead he backed into a situation of being forced to find a “good-enough” solution that will minimize his hassle, regardless of the downstream impacts. 

From now on, he’ll look for any way he can accomplish it by only changing stored procedures and not changing the underlying schema.  If he’s lucky (sort of), he’ll find a way that he can just kludge the stored procedures to work around the problem for now, and let it be someone else’s problem to solve later.  He has long since given up trying to find the “right” solution, because it is so exhausting the current state of things is so busted up that it’s not even worth trying anymore.

Further complicating the issue, some developers and DBAs make one-off changes in the QA and production environments without going through source control.  Either they need to make an emergency fix in one of the environments and forget to go back and add the scripts to source control, or worse they just don’t believe that databases should be under source control (I’ve seen this attitude far too often from some DBAs, because they can’t stand the idea of being forced to use a source control system managed by some other developers, just so that they can make changes to their own database).  Pretty soon, every environment is a perverted branch of the one true database, and trying to identify why a test works in one environment and fails in another quickly becomes a nightmare.

 

Some day, things will be different!

So what’s the solution?  Well, in my humble opinion, we need something like this:

  • The database must be versioned, so that it is easy to tell which changes have applied and when they were applied.
  • All database changes must be checked into source control.
  • All database changes can tied to the code changes that they affect, ideally checked into source control as a part of the same changeset transaction.
  • The database changes are built along with the code changes.
  • The database changes are deployed along with the code changes.
  • The continuous integration server must be able to build and update its own copy of the database, so that it can run automated tests of code and scripts that are checked in at the same time.

I’ve seen some companies that have had home-grown utilities that come close to accomplishing this, but in the end they all fell just a little bit short, which is not too bad.  However, the vast majority of companies I’ve seen were not even in the ball park.

Some of you are probably asking, “doesn’t Visual Studio Team System” do this?  Yeah, I think so.  Probably, but who knows.  Honestly I tried working with it a few times, and it caused me nothing but problems.  Sure, I could spend a lot of time mastering all the quirks, but I’m looking for something a little bit more accessible here.  The underlying concepts are hard enough; we need an approach that simplifies it, and I just don’t think that VSTS accomplishes that.  More importantly, and also along the lines of accessibility, VSTS costs a fortune, and so most developers will never have access to it, so I’d like to fine something that can work for the other 95% of developers out there that are stuck using reasonably-priced tools.

What about Red Gate’s SQL Compare and SQL Data Compare products?  Absolutely, they tools are indeed awesome products that can prove invaluable in many cases.  However, they are often leveraged once an unknown number of database changes have already been made by an unknown number of people.  However, I think the better solution, whenever possible, is to step back and track the changes as they come in, proactively considering the changes, just like we do for regular source code, which allows for far more robust change management and accountability for the database.

So that’s the idea.  In the next few weeks I’ll have some more posts on this topic, getting into some more specifics about how to solve this.  Hopefully we can find a simple, easy to manage solution, that developers can adapt to quickly, and which promotes database changes to the first-class citizen of the change management world that it deserves to be. 

29 thoughts on “Why your database version control strategy sucks and what to do about it, Part I

  1. Marius CC

    No. VSTS don’t help you in this aspect. It’s a huge beast with a lot of bugs, so better stay away from it.
    The price is 0 if you are a MS shop (ie. you have MSDN).

    Reply
  2. Taylor

    I’ve been successfully using this model for years:

    1) Store some kind of database version inside the database
    2) Don’t use shitty databases that don’t support transactions
    3) Create a directory for DB updates that works something like this:

    db_updates/
    milestone_1/
    01-drop-headaches.sql
    02-die-another-day.sql
    03-convert-id-to-float.pl
    milestone_2/

    trunk/
    01-dgaf.sql
    01-order-dont-matter-here.sql
    qa/

    4) Log which scripts were executed against the database, inside of the database

    When we release, we move scripts from trunk to a milestone directory and commit. Then we execute the scripts in order, within a single transactions. Note that developer conflicts are usually not a problem unless they give their files the exact same name.

    Reply
  3. Domen Kožar

    Great post! I help at development of SQLAlchemy-migrate, which is Python ORM migration library. A quite few users came to our mailing lists saying that linear versioning doesn’t suite SCM tools versioning.

    Example:

    When two developers commit database changes in distributed SCM, conflicts do happen.

    Date and/or dependency oriented revisions do not help much at this area.

    I still don’t see the best solution, maybe linear versioning is the best I can think of since versions are explicit and no bugs can be produced without prior knowledge.

    Reply
  4. Pingback: Tweets that mention The Mooney Project » Blog Archive » Why your database version control strategy sucks and what to do about it, Part I -- Topsy.com

  5. Meff

    VSTS 2010 (And possibly 2008) has the database projects, I’m using it currently to version control the DBs.

    Check the Data Dude blog:

    http://blogs.msdn.com/gertd/

    VSTS holds all database objects as create scripts, and you can import existing databases via the schema comparison (Database -> Project). The create scripts are then versioned as if they were source code.

    It’s very good, and I’m pleased.

    Reply
  6. shawn

    Good article, good topic, but could be better written. I would cut out the parts about your personal beliefs on DBA and developers (SCM’s). I’m not a big fan of the mysterious Jimmy guy who appears seemingly out of no where, either. Why not just keep the article informative and straight to the point? I think all the side stories just distract from the real content.

    I agree with you about how poorly database changes are handled. One of my personal strategies is to not use any stored procedures / functions. I keep it all in the application code and use an ORM. This way when I regen the DAL classes from the schema, I can see some new compile errors. This of course won’t save you from non nullable columns that were added, and other situations. Schema changes and the data itself are still a huge pain.

    Reply
  7. Mike Mooney Post author

    Thanks for the feedback everyone, I appreciate it. Some of the utilities you all suggested are definitely interesting. A lot of them we’ve looked into and they didn’t really capture what we were trying to accomplish. I review each and address them in a future post.

    Reply
  8. Stephanie

    If you like Red Gate’s SQL Compare and SQL Data Compare, please check out SQL Source Control, http://www.red-gate.com/products/SQL_Source_Control. We are trying to solve this problem exactly by enabling access to source control directly from SQL Server Management Studio while db developers continue to work directly on their db instead of offline script files. This will track the day to day changes and then you can still use SQL Compare and SQL Data Compare when it comes time to deploy these changes to your Test/QA/Prod environments. There is a free early access release available on the website to try. We’d love to know your thoughts.

    Thank you!

    Reply
  9. Mark

    Its always funny to watch a dev team spend hours unit testing, a QA team spend hours on functional tests.. and it all goes down the drain when it rolls to production.. because someone forgot some manual DB change.

    It really does need a formalized process.. and one that is simple..not some automated wizard voodo that only the MS Visual Studio core developers understand…

    Reply
  10. Emily

    Its always funny to watch a dev team spend hours unit testing, a QA team spend hours on functional tests.. and it all goes down the drain when it rolls to production.. because someone forgot some manual DB change.

    It really does need a formalized process.. and one that is simple..not some automated wizard voodo that only the MS Visual Studio core developers understand…

    Reply
  11. someone

    What a great post. You took the words right out of my mouth…

    I am very disappointed with the current offerings. Most tools simply manage incremental scripts; a very simple-minded approach, in my opinion. Let me explain…

    How would you like to manage your source code by sharing diff patches with your co-workers? I’ll give you my patch, (I added 2 lines here, and removed 10 lines there) and you can get me your patch, and so on, and by sequencing them properly, we should be able to reconstitute a workable set of text files. My brain hurts just thinking of branching and merging using this approach..

    Crazy, right? So, does sequencing together a list of scripts (patches) using some script-runner/tracker to reconstitute a workable database make any sense? I propose that we version database artifacts the same way we version our source code files (it’s a little more complicated than that, i know).

    Reply
  12. Christophe

    Excellent article ! I wish that people start to be honest about how big this problem is, so that we will all start to solve it… as a problem we don’t want to see is a problem that doesn’t exist and that will never get solved 😉

    We started to build a database development platform to tackle this exact problematic. It is open source (GPL), cross-vendors (Oracle, MySql, PostgreSql, DB2, and MSSQL support is on the way), cross-platforms (Windows, linux, mac). Its goal is to provide a database development environment based on the version-control paradigm : you develop on a version-control repository, you synchronize it with any target database, you package your changes into deliveries built from the differences between any 2 versions of your model.

    As a development environment, everything has been designed to reduce development time so you ‘ll get data model diagrams, dependencies management, powerful sql clients, auto-completion, hyperlinks, etc. The whole environement is based on Eclipse RCP.

    We are currently working on the continuous integration aspects so that it will soon be possible to assemble any database version automatically (from scratch or by upgrading any already-existing database).

    I suggest you join the adventure and send us your feedbacks !
    Discover the project here :
    http://www.nextep-softwares.com
    Documentation can be found on the wiki here :
    http://www.nextep-softwares.com/wiki

    Tell me what you think,
    Christophe.

    Reply
  13. Pingback: The Mooney Project » Generating Azure-Friendly SQL Scripts

  14. Hans

    If you are using Oracle then you might find the tool Lure (http://earthly-software.com) of interest.

    Lure extracts all database source code to a set of source files representing the objects in the database. The file names match the object names in the database so you can import these files into a version control repository (e.g. SVN) and subsequently use the version control tool to tag, branch and merge your source code like any other set of source files.

    Lure can also automatically deploy all the extracted source code to another Oracle database. In doing so it takes into account objects already present in the target database and installs just the differences.

    For table changes Lure automatically creates a series of patch files for each table, containing the ALTER statements for the table. Developers can customize these patch files if necessary to ensure data is migrated in the desired way. During deployment Lure automatically determines (based on the current state of the table in the target database) which patch files to install and in what sequence. As a result Lure deployment is completely automate and re-runnable.

    As part of the deployment process Lure verifies the final state of the database and will clearly indicate if any differences remain. Lure was designed from the start to make continuous integration a real possibility.

    We believe that Lure is the best answer to the challenges of database change management as elaborated in this article.

    Reply
  15. onemojofilter

    If you are an Oracle developer than you’d recognize that much of what “Lure” offers can already be done with DBMS_METADATA which is included with the RDBMS already.

    If you work in a shop that makes a distinction between a “developer” and the database then you’re ignoring a bigger problem. Namely that the database is considered to be a big black hole that is a mysterious nuisance at best and a scary unknown quantity at worse. Indeed, this sentence is quite telling: “Developers want the database to work well, and they may even like tinkering around with some stored procedures from time to time”.

    It sounds as if people that code in a database are not considered developers in this article.

    With that said, it is very true that most shops tend to roll their own when it comes to deployment of database changes to production. Some places come close to a repeatable experience while others spend literally entire nights getting their product on-line. It all really depends on who you employ to code in the database. Database professionals or “developers” that dabble in procedures but don’t really understand what they’re doing.

    Reply
  16. Pete

    The database is a critical part of your application. As well as holding code in the form of stored procedures, functions, views and triggers it also holds data. Many applications are highly configurable using metadata tables to manage application functionality. In such applications this data is as important as the code, however all too often this data is not under any form of source control.

    The result is that changes often get overwritten, or the wrong data is deployed especially as organisation have little choice other than copying databases to manage environments. This often creates an error prone, inflexible and slow development lifecycle, with little or no audit capability and limited options for backing out changes.

    DATASTAR (www,data-star.co.uk) is a commercial development tool for managing database components through the development lifecycle. DATASTAR aims to componentize data such that it encapsulates a business function. DATASTAR’s unique selling point is its ability to allow organizations to apply their standard source control methodology to data. With DATASTAR you descibe how you want to componentize your data and then use the software to generate scripts which encapulates this data as components. The scipts are used to deploy to other environments so these scripts would typically be placed under version control. However this is just one aspect of the development lifecycle, DATASTAR will integrate with your version control system making it easy to package multiple changes into a release and enable you to deploy to multiple enviornments quickly and simply.

    Reply
  17. Lawrence Bottorff

    I can follow all your logic here, but still not really believe you’ve got a handle on SQL relational database “version control”. Ultimately, I don’t think it’s possible. Sometimes the morphing of the data *is* the app, hence, version control of the code or database scripts is quite secondary. For example, imagine a jigsaw puzzle where the pieces are constantly changing shape, as well as merging with neighbors. How do you keep a record of these changes? How would you “run it backwards” to a previous puzzle state? This is basically what happens at every county assessor’s office with land parcels and land ownership. You often want to know what the puzzle (or your county’s land parcel/ownership) looked like at some point in the past.

    Sure, with some company’s product inventory, it’s not so critical or even interesting to know at some given time in the past, what was on what shelf. Database snapshots as backups can give you a decent picture, then just (weighted) average between snapshots. But these days, lots of things are like my jigsaw puzzle example. If they’re not directional in time, then cyclic, i.e., the jigsaw puzzle goes through a cycle of changes, then returns to some starting point.

    Reply
  18. jamesrod214

    Honestly, I am not an expert when it comes to database information. I know that we use this software for work, but that’s about it. What is the best way to compare databases? How can you tell if you need an upgrade?

    Reply
  19. Kyvos

    Fab Article with great insights! and nice to know a quick reference guide to introduce database version control strategy and data concepts and key terms. Many of the definitions in this article are found in which will be helpful for me. Thank you!

    Reply
  20. gnarlo

    Developers make terrible databases. Hire a database guy that knows how properly organise and segment the data and build the app off that. Have change control.
    Developers making it up as they go along (aka “agile”) and pressing a button to generate an RDBMS produces sucky schemata with just fucking terrible optimisation.
    Keeping liquibase XML (I mean, wtf?) around under version control is an utter abomination. Have the DBA make controlled uniform dump cuts and go get that whenever you need a refresh.
    I cannot believe big organisations bearing a ton of risk have fallen for this idea some kid on his first or second gig should have them and their data catalog by the bollocks. Its tantamount to malfeasance.

    Reply

Leave a Reply to Mark Cancel reply

Your email address will not be published.