A while ago I had some posts about how to set up simple backups of SQL Azure to make up for a few holes in the tooling (here and here).  I recently ran into the same issue with RavenDB, and it required stringing a few pieces together, so I figured I’d write up the steps.

TL;DR

Yet again I started out to make a quick how-to, and ended up going into a lot of detail.  Anyhow, here’s the short version:

  1. Download s3.exe from s3.codeplex.com
  2. Run this:
Raven.Smuggler.exe out http://[ServerName]:[Port] [DatabaseName].dump --database=[DatabaseName]
s3 auth /nogui [AccessKeyID] [SecretAccessKey]
s3 put /nogui [S3BucketName]/[TargetS3Folder]/ [DatabaseName].dump

 

What is RavenDB?

RavenDB is a flat out awesome document database written in .NET.  It’s sort of like MongoDB or CouchDB, but very Windows- and Microsoft-friendly, and has much better ACID and LINQ query support than your average document database.

Whether a document database is right for your project is a complicated question, well beyond the scope of this post, but if you decide you need one, and you’re doing .NET on Windows, RavenDB should be the first one you check out.

While document databases are not ideal for every situation, I’ve found them to be very good for message based applications, which pump data messages from one queue to another.  I’ve used RavenDB as the default primary data store for SrirachaDeploy and MMDB.DataService, and besides a few bumps in the road, it’s worked great.

Types of backups in RavenDB

RavenDB offers two flavors of backups.  One is the official “Backup and Restore” feature, which is very similar to a SQL Server backup/restore, including advanced options like incremental backups.  This does a low-level backup of the ESENT files, including index data.  Restores are all-or-nothing, so you can’t import a backup file if doing so would overwrite existing data in the process.

The other type is the “Smuggler” feature, which is more of a data import/export utility.  This generates an extract file that contains all of the documents, index definitions, and attachments (controllable by command line parameters).  It’s worth noting though, while Smuggler will back up the index definitions, it does not back up the index data, so after you import via Smuggler you may have to wait a few minutes for your indexes to rebuild, depending on your data size.  Since it’s just a data import, you can import it into an existing database without deleting your existing records; it will just append the new records, and overwrite existing records if there is a matching ID.

The simplest way to get started with either option is to try them out in the RavenDB user interface.  The RavenDB UI is continually evolving, but as of this writing, under the Tasks section there are Import/Export Database options that use Smuggler, and a Backup Database option as well.

[screenshot]

Personally, I prefer Smuggler.  It’s very easy to use, the defaults do what I want them to do most of the time, and it can do a low-impact data import to an existing database without blowing away existing test data.  Also, because the backup/restore feature uses some OS-level ESENT logic, it has some OS version portability limitations.  In the end, I usually don’t want anything too fancy or even incremental; the first and foremost backup I want to get in place is “export ALL of my data in the most portable format possible on a regular schedule so I can always get another server running if this one fails, and I can restore it on my machine to recreate a production issue”, and Smuggler has fit that bill nicely.

RavenDB Periodic Backup

RavenDB does have a very cool feature called “Periodic Backup”.  This option actually uses the Smuggler data export functionality, and runs incremental backups and uploads them to your hosting provider of choice (File system, Amazon Glacier, Amazon S3, or Azure storage).

[screenshot]

The cool thing with this feature is that it’s easy to set up without many confusing options.  My problem with it is that it doesn’t quite have enough options for me, or rather the defaults are not what I really want.  Rather than doing incremental backups on a schedule, I want to be able to do a full backup any time I want.  Unfortunately it doesn’t (yet?) offer the option to force a full backup, nor to force a backup on demand or at a specific time of day.  I’m guessing that these features will continue to improve over time, but in the meantime this is not really what I’m looking for.

Smuggler Export 101

So how do you get started with Smuggler?  Of course, visit the documentation here, but here’s the short version of everything I usually need to do.

First, open a command line. 

Yes, a command line.  What, you don’t like using the command line?  Oh well, deal with it.  I know, I hated the command line through much of my career, and I fought against it and complained about it.  Then I gave up and embraced it.  And guess what, it’s not that bad.  There are plenty of things that are just plain easier to do in a command line and don’t always need a pointy-clicky interface.  So please, just stop complaining and get over it; it’s all a part of being a developer these days.  If you refuse to use a command line, you are tying a hand behind your back and refusing to use some of the most powerful tools at your disposal.  Plus we are going to be scripting this to run every night, so guess what, that works a lot better with a command line.  I’ll throw in some basic command line tips as we go.

Anyhow, in your command line, go to the Smuggler folder under your RavenDB installation (usually C:\RavenDB\Smuggler on my machines).

Tip: You don’t have to type the whole line.  Type part of a folder name and hit TAB, and it will autocomplete with the first match.  Hit TAB a few times and it will cycle through all of the matches.  You can even use a wildcard (like *.exe) with TAB and it will autocomplete the file name.

Type Raven.Smuggler.exe (or Raven + TAB a few times, or *.exe + TAB) to run Smuggler without any parameters, and you’ll get some detailed instructions.

[screenshot]

The most common thing you want to do here is back up a whole database to a file.  You do this with the command “Raven.Smuggler.exe out [ServerUrl] [OutputFileName]”.

Note: the instructions here will dump the System database (if you use http://localhost:8080/ or something similar as your URL), which is almost certainly not what you want.  It’s not entirely clear in the documentation, but the way to export a specific database instance is to use a URL like “http://[servername]:[port]/databases/[databasename]”, or use the --database parameter at the end.  For example, to back up my local demo database, I would use the command:

Raven.Smuggler.exe out http://localhost:8080/databases/demodb demodb.dump

or

Raven.Smuggler.exe out http://localhost:8080 demodb.dump --database=demodb

 

And off it goes:

[screenshot]

Depending on the size of your database, this may take a few seconds to a few minutes.  Generally it’s pretty fast, but if you have a lot of attachments, that seems to slow it down quite a bit.  Once it’s done, you can see your output file in the same directory:

[screenshot]

Tip: Are you in a command line directory and really wish you had an Explorer window open in that location?  Run “explorer %cd%” to launch a new instance of Explorer defaulted to your current directory.  Note: this doesn’t always work, like if you’re running the command line window in administrator mode.

Yes, that’s not a very big file, but it’s a pretty small database to start with.  Obviously they can get much bigger, and I usually see backups getting up to a few hundred MB or a few GB.  You could try to compress it with your favorite command line compression tool (I really like 7-Zip), but it’s not going to get you much.  RavenDB already does a good job of compressing the content while it’s extracting it via Smuggler.

Amazon S3

Next, you have to put it somewhere, preferably as far away from this server as possible.  A different machine is a must; a different data center or even a different hosting company is even better.  For me, one of the cheapest/easiest/fastest places to put it is Amazon S3.

There are a few ways to get the file up to S3.  The first option is to upload it straight from Amazon’s S3 website, although that can require installing Java, and you may not be into that kind of thing.  Also, that’s not really scriptable.

Or you could use S3 Browser, which is an awesome tool.  For end user integration with S3, it’s great, and I’ve gladly paid the nominal charge for a professional license, and recommended that all of my S3-enabled clients do the same.  However, while it’s a great UI tool for S3, it is not very scripting friendly.  It stores your S3 connection information in your Windows user profile, which means if you want to script it you need to log in as that user first, set up the S3 profile in S3 Browser, and then make sure you run the tool under that same user account.  That’s a lot of headache I don’t really want to worry about setting up, much less remembering in 6 months when I need to change something.

One great S3 backup tool is CloudBerry.  It’s not free, but it’s relatively inexpensive, and it’s really good for scheduling backups of specific files or folders to S3.  Depending on your wallet tolerance, this may be the best option for you.

But you may want a free option, and you’re probably asking, “why is it so hard to just push a file to S3?  Isn’t that just a few lines of AWSSDK code?”.  Well yeah, it is.  Actually it can be quite a few lines, but yeah, it’s not rocket science.  Luckily there is a great tool on CodePlex that lets you do this: http://s3.codeplex.com/.  It’s a simple command line tool with one-line commands like “put” and “auth” to do the most simple tasks.  To push your new file to S3, it would just be:

s3 auth /nogui [AccessKeyID] [SecretAccessKey]
s3 put /nogui [S3BucketName]/[TargetS3Folder]/ [FileToUpload]

 

So if we wanted to upload our new demodb.dump file to S3, it would look something like this:

[screenshot]

And hey, there’s our backup on S3. 

[screenshot]

 

So we now have a 3-line database backup script:

Raven.Smuggler.exe out http://[ServerName]:[Port] [DatabaseName].dump --database=[DatabaseName]
s3 auth /nogui [AccessKeyID] [SecretAccessKey]
s3 put /nogui [S3BucketName]/[TargetS3Folder]/ [DatabaseName].dump

 

Just put that in a batch file and set up a Windows Scheduled Task to run it whenever you want.  Simple enough, eh?
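
For example, a nightly backup batch file might look something like this (the script name, paths, database name, and S3 details are placeholders you’d swap in for your own environment):

REM backup-ravendb.bat (hypothetical name and paths)
cd /d C:\RavenDB\Smuggler
Raven.Smuggler.exe out http://localhost:8080 DemoDB.dump --database=DemoDB
s3 auth /nogui [AccessKeyID] [SecretAccessKey]
s3 put /nogui [S3BucketName]/[TargetS3Folder]/ DemoDB.dump

And then something like this should schedule it to run every night at 2 AM:

schtasks /Create /TN "RavenDbBackup" /TR "C:\Scripts\backup-ravendb.bat" /SC DAILY /ST 02:00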

Restoring via Smuggler

So now that you have your RavenDB database backed up, what do you do with it?  You’re going to test your restore process regularly, right?  Right?

First you need to get a copy of the file to your machine.  You could easily write a script using the S3 tool to download the file, and I’ll leave that as an exercise for the reader.  I usually just pull it down with S3 Browser whenever I need it.

So once you have it downloaded, you just need to call Smuggler again to import it.  It’s the same call as the export, just change “out” to “in”.  For example, to import our demodb back into our local server, into a new DemoDBRestore database instance, we would say:

Raven.Smuggler.exe in http://localhost:8080 demodb.dump --database=DemoDBRestore

And we would see:

[screenshot]

And then we have our restored database up and running in RavenDB:

[screenshot]

Conclusion

Now I’m not a backup wizard.  I’m sure there are better ways to do this, with incremental backups and 3-way offsite backups and regular automated restores to a disaster recovery site and all sorts of fancy stuff like that.  The bigger and more critical your application becomes, the more important it becomes to have those solutions in place.  But day 1, starting out on your project, you need to have something in place, and hopefully this helps you get started.

If you’re following along, I’ve recently named this effort “Parallax” for long and windy reasons.  You can read all about it here.

I’ve copied the project up to CodePlex, at http://parallax.codeplex.com.  Feel free to take a look.

There’s not much there yet, just a few database tables, but it can get us started talking about the approach we’re taking to the database.

 

Here’s what we have so far:

[ParallaxDB_1: database schema diagram]

 

There are a few things going on here:

Singular Table Names

I’ve used singular table names for all tables.  This is a long and well-worn debate, whether to use singular or pluralized table names.

For years, I was a strong proponent of the pluralized table names.  At the time, it was probably mostly because that’s what I had used before, and like most people I tend to come up with rational arguments to support my subjective preferences, because cognitive dissonance is yucky.

Plural Table Names Are Good

One of the main reasons for plural table names, at least for me, is that from one perspective it matches what the table actually represents.  It is a container that has a list of work items, or a list of users, or a list of whatever.  If we’re going to store a list of work items in a container, by golly we should call that thing “WorkItems”.  It’s the same logic that is applied to object models; if your Person object contains a list of addresses, you’re going to call the property something like “Addresses” or “AddressList”, not “Address”.  The idea of describing a whole list of customers as just “Customer” just rubbed me the wrong way.

Singular Table Names Are More Gooder

However, there are also several benefits to using singular table names.  First, it avoids a lot of confusion about the proper way to pluralize a name. 

Should the table that contains a list of person records be called “People” or “Persons”?  “People” is a more common name, but “Persons” has the benefit of simplicity, because it is just appending an “s” to the end.  I’ve heard this argument go back and forth, and it doesn’t matter who’s right or wrong, because the fact that you’re discussing it at all is just a damn waste of time.  You are introducing artificial and unnecessary complexity to your project, which you and every other developer will have to worry about for the life of the project.  Hidden productivity costs like this add up significantly over time, and they suck the life out of your developers, often times without them even realizing why they hate their job.

 

Also, assuming you follow some sort of standardized naming convention for primary and foreign keys, this can improve the readability and writability of your code, because you will not be mixing singular and plural names.  To me, this:


SELECT * FROM WorkItem
INNER JOIN WorkItemType
ON WorkItem.WorkItemTypeID = WorkItemType.ID

 

Flows a little better than this:


SELECT * FROM WorkItems
INNER JOIN WorkItemTypes
ON WorkItems.WorkItemTypeID = WorkItemTypes.ID

 

Lastly, if you ever want to have any scripts for performing automated maintenance or generating code, it’s a lot easier to only have to deal with a single form of each name and not have to worry about pluralized names.  We’re going to be doing at least a little of that in Parallax.
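
For example, here’s a rough sketch (assuming SQL Server) of the kind of metadata-driven script that gets simpler with singular names, since each table name can double as the entity name without any pluralization logic:

-- Generate a simple lookup query for every table in the database.
-- With singular table names, TABLE_NAME maps directly to an entity name.
SELECT TABLE_NAME AS EntityName,
       'SELECT * FROM [' + TABLE_NAME + '] WHERE ID = @ID;' AS GeneratedQuery
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE';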

Keep It Consistent

So like in most things, there are valid reasons for either approach, and in the end it probably doesn’t matter too much which one you choose.  However, what does matter is that you do choose one and stick to it.  If you start switching between plural and singular names, it is one of the first steps towards chaos.  Your developers will have to continuously waste an unacceptable amount of time stopping to think “was that table called Person, or Persons, or People, or Peoples, or Peeps, or…?”

 

Primary and Foreign Keys

When I create a table, the first thing I do is create a clustered integer primary key named “ID”.  99% of the time, I do it every time.

Naming Conventions

Could I include the table name in the primary key column, e.g. “WorkItemID” instead of “ID”?  Sure, but I chose not to, again for subjective reasons.  I feel that the primary key is a really important field on the table, and should jump out in a query as the single most important and identifying field in the table.  When you look at a query and are trying to visualize exactly what it is doing, I feel that it makes it just a little bit clearer to see the primary keys first and work out from there.  Of course, one shortcoming is that it can get a little unclear when you start using vague subquery aliases, like “subq.ID”, but I think that is just another good reason to not use vague aliases in the first place.

For foreign keys, I just add “ID” to the referenced table name, such as “WorkItemID”. This way, when you’re writing a query, you never really have to stop and think what the field name is, or worse yet, go look it up in the schema.  Now there are cases where this doesn’t work, like when you have multiple independent foreign keys in a table referencing the same table.  For example, the WorkItem table references the AlmUser table for two different fields, the reporting user and the assigned-to user, and we’ll be adding several more shortly.  In this case, we can’t call them both “AlmUserID”, so we’ll add a prefix to them to identify their purpose, “ReportingAlmUserID” and “AssignedToAlmUserID”.  This can also be used in cases where the reason for the key is not immediately evident, but this starts down a slippery slope of naming things certain ways just because you feel like it on a certain day, which should be avoided as much as possible.
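
Put together, the convention looks roughly like this (a sketch only; the real WorkItem table obviously has more columns than this):

CREATE TABLE WorkItem
(
    ID INT IDENTITY(1,1) NOT NULL,        -- clustered integer primary key, always named ID
    WorkItemTypeID INT NOT NULL,          -- foreign key: referenced table name + ID
    ReportingAlmUserID INT NOT NULL,      -- two keys to AlmUser, so each gets a purpose prefix
    AssignedToAlmUserID INT NULL,
    CONSTRAINT PK_WorkItem PRIMARY KEY CLUSTERED (ID),
    CONSTRAINT FK_WorkItem_WorkItemType FOREIGN KEY (WorkItemTypeID) REFERENCES WorkItemType (ID),
    CONSTRAINT FK_WorkItem_ReportingAlmUser FOREIGN KEY (ReportingAlmUserID) REFERENCES AlmUser (ID),
    CONSTRAINT FK_WorkItem_AssignedToAlmUser FOREIGN KEY (AssignedToAlmUserID) REFERENCES AlmUser (ID)
);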

Clustered Index

What does the clustered part mean?  A table can have a single clustered index, which means that every row in the table will actually be physically stored in that order, so retrieving records by that index is faster than with a standard index.  The tradeoff is that inserting new records into the middle of the index can be slow, because it affects the physical layout of the table.  However, this works especially well for auto-incrementing integer fields, because the new records always go at the end of the table and the end of the index.

Natural/Intelligent Keys

This extra, auto-generated ID column approach is called a surrogate key, which means that it is not derived from any application data.  So why reserve a whole extra field for this?  Another approach is to use a “natural” or “intelligent” key, which is a naturally occurring piece of data that is sure to be unique across all the rows.  One of the more common examples you see is using Social Security Number as the key for an Employee table.

There are several reasons not to use natural keys, but the first and most important to me is: Get Your Damn Business Rules Out Of My Schema.  I’m designing a database here, and my primary concern is that the data can be reliably and efficiently stored and retrieved.  You may say today that every employee has a Social Security Number and that it’s guaranteed to be unique, and then in 6 months we need to change the application because we need to enter employee records before we have their SSN, or somehow a dead employee’s SSN got reissued to a new employee, or a new government regulation restricts you from storing SSNs in the database, or about a million other things that you won’t think of because you’re really not nearly as smart as you think you are.  If you’ve built your entire system around the assumption that a piece of data will be unique, you’ve really painted yourself into a corner when that assumption is inevitably proven to be wrong.  Wouldn’t you rather just start with a single ID field that you can always rely on to be unique?  Its only purpose in life is to make sure that your data can be reliably identified, and that makes me feel warm and fuzzy.

And for a very real case of this, just this past fall I had to change SportsCommander to allow users to change their username.  Why would you ever allow users to do this?  Because they needed to.  The original implementation auto-generated cryptic usernames, and this was only done to match a legacy system it had replaced, but the users could never remember their usernames.  So despite how much it drove me nuts to change this after the fact, we accepted that it was critically important to our users’ satisfaction with the site to allow them to select a new username.  In the end, the technology gets less important, and you either have fantastically happy users or you’re selling flowers by the highway.  It would have been easy to say up front that “users can never change their username, why would we ever allow that?”, and if we had used the username as a primary key, we’d really be screwed.  Luckily, we already knew that we were simpleminded fools who could not predict the future, and did our best to protect ourselves against that.

Of course, in the original version of SportsCommander, we could have also used the USA Volleyball ID number to identify users, because every user in the system is a USAV member with their own number.  However, it turned out that some users had their USAV number change from one season to the next.  Also, we subsequently expanded SportsCommander to support organizations beyond USA Volleyball, so the identifier further ceased to be the grand unique identifier it originally seemed to be.

 

Also, it’s slow.  Sure, if it’s indexed, it can be really fast, but a lookup on a string field will never be as fast as an integer field.  It also undermines any gain achieved by using clustered indexes, because your data is no longer sequential.  The difference may be negligible a lot of the time, and I definitely don’t want to start doing premature optimization or complicating the code to squeeze out a tiny bit of performance.  However, this is a small thing that can scale significantly depending on the amount of data and tables you have, because you are constantly doing these lookups against almost every row you touch.

Of course, natural keys aren’t all bad.  One positive benefit of natural keys is that they can make bulk importing or synchronizing of data easier.  For example, imagine that you are importing a list of employee records and also a list of their timesheet hours.  You’ll make two passes through the data, once for the employee records themselves, and then a second pass for the timesheet hours.  During the second pass, you’ll need to correlate the hours with the employee records that you’ve already inserted, and it’s a lot easier and quicker if you can use a value you already have (like the SSN) than to have to use that value to look up another arbitrary value (like an ID field).  However, in the end, I really don’t care about this all that much.  It’s a small subset of the usage of the database, and I feel it pales in comparison to the other factors.  Sure, it might make a DBA’s life a little more difficult for some data import they have to run in the background from time to time, but honestly I really don’t give a damn.  Life is full of compromises, and the total system is almost always better off, in terms of stability and total development time, with a more consistent and reliable surrogate key.

GUID Keys

So why not use Globally Unique Identifiers (GUIDs) for keys instead of numeric IDs?  GUIDs definitely have their place, but it really depends on what you’re trying to do.

GUIDs have a lot of benefits due to the fact that they are always globally unique, while your cute little identity (i.e. auto-incrementing) ID is only unique within your table.  The identity isn’t even unique throughout the database, because each table maintains its own counter (this of course assumes SQL Server, not Oracle; I don’t know how MySQL handles this because I really don’t care).

Having globally unique keys can be useful if you are doing any replication from one database to another, or if you need to store the data in several different databases for other reasons, because you can assign a GUID ID to a record and be assured that it won’t conflict with an ID from another database.  For example, if you had a database that stored a master list of users, and those users are represented in several satellite databases, maybe for several separate web applications at the same company, GUID keys would make sense because the data by its very nature should be unique beyond the database.

Another benefit is that because they are non-sequential, you don’t have to worry about people guessing adjacent values.  If you are going to include any IDs in a query string, it’s usually important to use values that will not allow the user to guess additional values.  If a user is on a page called “EditRecord.aspx?ID=12”, you know for sure that some user is going to try typing in “EditRecord.aspx?ID=13” just to see what would happen.  Ideally in these cases your application will restrict the user from getting access to records that he/she/it/heshe should not have access to, but it’s a good idea to not encourage your users to go looking for these types of vulnerabilities.  If the user sees “EditRecord.aspx?ID=ceb143ce-b4e9-4d59-bedb-bb07c5f0eb41”, good luck trying to guess another value from that table.

However, GUIDs are slow.  Again, if it’s indexed properly, it should perform sufficiently, but much like string fields GUIDs will never be as fast as integers.

Also, GUIDs make debugging a lot harder.  You can’t yell over the cube wall “hey Barack, can you take a look at record 49bebb82-a96c-4bfc-b222-0b9df372ec2a, it looks like you screwed something up again!”.  Even when you are just running a few queries to track down a problem in the middle of the night, it goes a lot quicker and burns fewer brain cycles if you can reference an easily-remembered number, like 8675309, instead of a 36-character identifier.

So in the end, I usually just use integer keys for everything.  That way I get the best performance and the easiest debugging, and it fits my needs 95% of the time.  There are still always cases where you could use a GUID identifier, so I just put an extra GUID column on those tables.  That gives us the fast and easy integer key lookups for most queries, and the GUID uniqueness added on when we need it.  We’ve done that here for the AlmUser table, because there are a lot of cases where you don’t want to have users guessing the user ID number.
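
Something like this (a sketch with illustrative column names, not the exact Parallax schema):

CREATE TABLE AlmUser
(
    ID INT IDENTITY(1,1) NOT NULL,                        -- fast integer key for joins and internal lookups
    UserGuid UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),   -- non-guessable ID for anything exposed to the browser
    CONSTRAINT PK_AlmUser PRIMARY KEY CLUSTERED (ID),
    CONSTRAINT UQ_AlmUser_UserGuid UNIQUE (UserGuid)
);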

 

Reserved Words

Often times when you are creating a schema, you use table or field names that are reserved by SQL.  You may name a table “User” or name a field “Description”.  You may not think anything of it at the time, and SQL will let you do it without complaint, but then when you try to write queries against it you’ll get syntax errors or weird behavior.  A lot of developers know the frustration of typing that last character of the field name and watching the font color turn blue.

Of course, SQL provides a way around this.  By wrapping your identifiers in brackets (like [User].[Description]), you can help SQL avoid this confusion.  Any database layer that automatically generates SQL code should ALWAYS do this, because you never know what you might encounter.
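
For example, a quick illustration with hypothetical [User] and [Order] tables:

-- Brackets let you reference objects whose names collide with reserved words:
SELECT [User].[ID], [User].[Description]
FROM [User]
INNER JOIN [Order] ON [Order].[UserID] = [User].[ID];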

However, typing in the brackets is a pain, and it hampers readability, and people often forget to do it, and it’s one of the last things I want to have to worry about while I’m writing code.

Most other programming platforms, like C++ or C#, prevent you from even using reserved words for your own purposes, because really, who needs the headache of keeping them straight?  They are called “reserved” for a very good reason, and I hate that SQL doesn’t actually “reserve” these words.

Anyhow, just because SQL doesn’t enforce the rule doesn’t mean that we can’t enforce it ourselves.  That being the case, stay away from this list of SQL Server 2008 reserved words.  You can use them as any part of a name, such as “AlmUser” or “ItemDescription”, but just don’t use them by themselves.

 

AlmUser Table

Just about every database I’ve designed that drives a web application includes some sort of a User table.  It may store authentication information, or personal information, or may just be an intersection point for everything else.

In this case we will store some basic identifying information, like FirstName and LastName.  We’ll expand it over time to include additional fields.  There is also a GUID column; if we need to pass a user ID down to the browser, that is the version we’d like to use, so that people can’t guess other users’ IDs.

However, you may notice that it does not include the necessary fields for authentication, like Password.  We’re going to outsource that for now using the ASP.NET Membership Provider interface.  The default ASP.NET Membership Provider just stores the information in some database tables, but you can also have providers to connect to Active Directory, OpenID, or any number of other third-party authentication systems.  For now, we’ll use the default provider, and we may store the tables in the Parallax database just to keep everything together, but there will be NO connection or references between those membership tables and the rest of the Parallax schema.  At any point in time, we will want to be able to pull that out and replace it with another provider.

Granted, this approach may be overkill, and it is usually not necessary for most applications to keep their membership information completely abstracted.  However, as I’ve mentioned, I’m not pretending that we’re building a practical application here with realistic design constraints; this is a sandbox where we will explore several technologies, and membership providers are one of those topics I’d like to explore.

 

Audit Columns

Every table ends in 4 fields, indicating who created the record and when, and who last updated the record and when.  Granted, this only displays two dimensions of the history, rather than a full history of who changed what when, but I’ve found it comes in handy when trying to figure out when the data went awry.

You’ll notice that we use the user GUIDs to store the user.  I’ve seen implementations where they used actual foreign keys to the user table, and it caused enormous problems.  The checking of the key constraint can slow the performance of inserts and updates, albeit slightly, but that’s still a big no-no.  Your auditing code should have no detrimental impact on the database.  Also, what happens if for some reason the ID value is wrong?  You really don’t want to reject the change for this reason, because these are just audit fields, not real transactional information.  It’s important that the value be right, but it’s a lot less important than the rest of the record.  Lastly, and the most crippling, is that by the time the database had grown to hundreds of tables, it was virtually impossible to delete a user record.  Anytime we tried, it would run through every key referencing the table (2 per table) to determine if there were any other records referencing the deleted record.  Due to the number of relationships and the volume of data, this would time out after a few minutes.

So clearly, foreign keys are out.  We could still use the ID from the table without a key, but I see a few issues with that.  First, every other place in the database that stores an ID reference field has a foreign key.  That’s a core rule of the database, and I don’t want to give anyone, especially myself, the opening to start getting creative with whether to enforce foreign keys.  Also, the value is not very intuitive.  Lastly, I don’t want anyone to see that field, think that someone else forgot a key constraint, and go ahead and start adding them.  If you don’t make it clear in the code how your code is supposed to be used, it’s your own fault when it is inevitably misused.

So in the past I’ve leaned towards putting the username in there, and it worked great.  It was straightforward, unique, succinct, and human-readable.  However, as we mentioned above when discussing natural keys, even usernames can change.

So we’re trying the user GUID this time around.  It’s not nearly as friendly as the username, but it certainly is unique.  And it avoids the usage confusion that you get with ID fields.  I’m still not crazy about this one, but we’ll see how it goes.
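
So the audit columns on each table will look something like this (names are illustrative; note that the user GUID columns deliberately have no foreign keys back to AlmUser):

CREATE TABLE WorkItemStatus
(
    ID INT IDENTITY(1,1) NOT NULL,
    Name VARCHAR(100) NOT NULL,
    CreatedByUserGuid UNIQUEIDENTIFIER NOT NULL,   -- AlmUser GUID, intentionally not a foreign key
    CreatedDateTime DATETIME NOT NULL,
    UpdatedByUserGuid UNIQUEIDENTIFIER NOT NULL,
    UpdatedDateTime DATETIME NOT NULL,
    CONSTRAINT PK_WorkItemStatus PRIMARY KEY CLUSTERED (ID)
);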

 

Data Types

NVARCHAR vs. VARCHAR

For a long time I always used VARCHAR, because I didn’t have a reason to use NVARCHAR.  Then, I ran into a few apps that needed to have NVARCHARs here and there because we had some international customers, and it was a pain keeping track of which fields were which.  So I just started using NVARCHAR for everything, but in hindsight I think that was a mistake.  Doug Blystone (co-founder of MMDB Solutions and a guy who knows more about SQL databases than I could ever hope to) has finally convinced me to stop using them unless absolutely necessary.  I usually shrug off performance issues for the sake of simplicity until they are a demonstrable problem, but this is one of the cases where it can lead to a hidden and pervasive performance drag that won’t jump out at you, but will slow you down just a little bit every step of the way.  These are the types of performance issues that add a second or two to the page load here and there, which makes your users subconsciously hate your guts, even if they don’t really know why.  It takes up twice as much space, it makes comparisons of string values take longer because there are twice as many bytes to compare, and it’s just not necessary unless you’re dealing with Asian languages (or any language that may have thousands of characters).  We are years removed from the day that we can afford to hire someone to translate our applications to Japanese, and I’m guessing we’ll never do business in China because I hate Commies.
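
If you want to see the storage difference for yourself, a quick sanity check (SQL Server syntax) shows NVARCHAR taking twice the bytes for the same plain-English value:

DECLARE @v VARCHAR(100) = 'WorkItem';
DECLARE @n NVARCHAR(100) = N'WorkItem';
SELECT DATALENGTH(@v) AS VarcharBytes,   -- 8 bytes, one per character
       DATALENGTH(@n) AS NvarcharBytes;  -- 16 bytes, two per character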

Simple Types

That being said, we’re using VARCHAR.  We’ll also be using UNIQUEIDENTIFIER, DATETIME, DATE, TIME, BIT, INT, FLOAT, TEXT, and maybe VARBINARY.  And that’s it.  No smallint or real or money or any of that crap.  We are going to use the minimum number of logically distinct data types that meet our needs.  I have no interest in getting into a data-type arms war, trying to save a byte here or there to create some sort of idealistically pure representation of the data’s inner beauty.  Neither I, nor any other developer, will waste a minute of time worrying about whether we can cast this bigint into an integer or that real into a float without data loss.  We’re going to keep this simple, so help me God, so you and your sql_variant friends better get the hell off my lawn before I get my shotgun.

I actually haven’t used the new DATE and TIME fields yet, but I’ve long felt the need for them.  A DATETIME field is really two different things, which usually works out OK, but when you need to store just a date or just a time, it really gets clunky.  Then when you start having to worry about time zones, it can get just plain weird.  We have a few cases in SportsCommander now where we have to do a little code gymnastics to store just a date or just a time in a DATETIME field, and have the inactive part properly ignored without interfering with the active part.  This will be my first foray into using these data types, so we’ll see how well they work.
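
The idea is to end up with columns more like this (a hypothetical table, just to show the intent):

CREATE TABLE EventSchedule
(
    ID INT IDENTITY(1,1) NOT NULL,
    EventDate DATE NOT NULL,    -- just the date, no time-of-day portion to ignore
    StartTime TIME NULL,        -- just the time, no placeholder date to work around
    CONSTRAINT PK_EventSchedule PRIMARY KEY CLUSTERED (ID)
);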

 

So that’s a start for the database stuff.  It ran a little long for 3 tables, but you need to state your ground rules up front, even if it’s just for your own benefit, lest you start getting lazy and breaking them.  So here we go.

I created a CodePlex project for hosting the ALM system.  I even gave it one of those fancy codenames the kids are always talking about: Parallax ALM

Parallax is a scientific term to describe how the relationships of items in space look different from different angles.  According to Wikipedia:

Parallax is an apparent displacement or difference in the apparent position of an object viewed along two different lines of sight, and is measured by the angle or semi-angle of inclination between those two lines.

 

A simple example is the speedometer in a car.  When you are driving you can read your speedometer accurately, because you are staring straight at the speedometer, the way it was designed to be looked at.  However, when your passenger looks at the speedometer, he/she/it will see something different; because the indicator is slightly raised off the surface, the relationship of the indicator to the surface will be distorted, and so it will appear that you are going slower than you actually are (assuming, of course, that you are driving on the correct side of the road).  I won’t even get into why objects in the mirror are closer than they actually appear, but for now you can just assume that it’s a government conspiracy.

In a car, this is a simple case of right and wrong.  The driver is right, and anyone who disagrees can get the hell out and walk.  However, on a software project, that’s not always the case.  As developers, we usually just assume that we are right and the business/executive/PM/QA people are idiots, but if anyone has the authority or persuasiveness to tell everyone else to get out and walk, it sure isn’t a bunch of sociologically-crippled developers.

Really, in a software project, everyone is usually a little bit right and wrong.  Sure, there are often gen-u-ine jackasses out there, but they are usually a lot less common than we think, and we are usually right far less than we want to believe.

So we need to get everyone on the same page.  Or reading the same book.  But everyone speaks a different language, especially from one job description to another, so everyone needs their own version.  We need to present the same information to everyone in a way that makes sense to them as individuals, and maybe even makes their life easier.  Just watch an executive’s eyes glaze over as you explain iterative development to him, but put yourself in his shoes as he’s trying to explain why he needs some sort of metric, anything really, to know what the status of a project is and whether it’s falling behind schedule.  All of that information is really the same stuff, just looking at it from a different angle to fill a different job need.  You cannot reliably count on people to solve these problems themselves, because most people have an enormous aversion to putting themselves in someone else’s position, mostly because it throws a giant spotlight on everything they themselves may be doing wrong.  So let’s solve the problem with some software.  You’ll still need to have people want to work together, but hopefully it will actually be easy for them this time around.

So there you go.

Technologies

So what technologies are we going to use for this thing?  Hopefully a little bit of everything, as one of my goals is to use this to learn some new things.

But to start, it will look like this:

  • SQL Server 2008
  • IIS/ASP.NET 3.5
  • LINQ-to-SQL

 

Moving forward, I’d like to introduce the following technologies at some point, just for fun:

  • Entity Framework (EF)
  • NHibernate
  • Windows Communication Foundation (WCF)
  • Windows Workflow Foundation (WWF)
  • Silverlight
  • ASP.NET MVC
  • Microsoft Reporting Services
  • IronPython
  • F# (eh, why not?)

 

Some of these may be used to complement what already exists, such as exposing a WCF layer or sprinkling some Silverlight here or there.

Other things may require replacing a whole layer of code, like changing the data layer from LINQ-to-SQL to the Entity Framework.  Hopefully I’ll be able to create side-by-side instances of the system running each type of data layer, which would be interesting for comparing the performance vs. development effort of each.

Some things I just want to learn, like ASP.NET MVC and IronPython, so I’m going to try to find a way to get that into the product somehow.

Obviously, very little of this experimentation would make any business sense in the real world, but that’s part of the fun of it.  I’ll build out the initial data layer in LINQ-to-SQL because that’s something I already know and have experience with, but then replacing it with NHibernate and/or the Entity Framework will be a great learning exercise, and hopefully teach me a thing or two about the true pros and cons of each, beyond marketing fluff and fanboy crap.  And along the way, I’ll try to post everything I learn on this blog.

Development Tools

As far as tools, I’m using Microsoft Visual Studio 2008 Team Suite, mostly because we already have it through the Microsoft BizSpark program.  However, I realize that it’s pretty impractical for most people to purchase Team Suite, so I’ll try to keep everything limited to the basic tools available in the Standard or Professional editions.  Since one of my goals is to provide something that other people can learn from (whether it be from my successes or failures), I think it’s important that I stick to tools that are readily available to most developers.

As such, I’ll steer clear of things like the Visual Studio testing suite, and I’ll make use of more open source tools like NUnit and WatiN instead.

For databases, I’ll avoid the original Visual Studio database projects (because they are mostly useless), and I’ll avoid the fancy new Visual Studio SQL Server database projects (because they aren’t readily available to most developers and they are too complicated in my opinion).  However, this is still a sticky problem, as I have yet to see a good and free/cheap tool for managing database development.  Doug Blystone (co-founder of MMDB and resident SQL Server whiz) has been working on a utility that consolidates a lot of the do’s and don’ts of concurrent SQL development that we’ve learned over the years, and we plan to release it very soon.  If you find it useful, great; if not, you can at least follow along with everything I’m doing.

 

Anyway, I’m going to get started on the data model first, so stay tuned.


We’re going to dig into the next major component of the ALM, the work item tracking component.

Work Item Tracking Component

 

Not A Bug

 

What is a work item tracking system?  Essentially, it is a glorified bug tracker.  And if the world needs anything, it is yet another bug tracker.

The pet project of countless budding developers, the industry is drenched in bug tracking systems of all varieties, from the beautiful to the ghastly, and everything in between.

Over the last few years, there seems to have been a shift away from “bug tracking” towards “work item tracking.”  If you have ever watched or participated in the arguments between developers and QA about whether something is actually a “bug,” you may appreciate how “work item tracking” is a more accurate (albeit clunky) term.

 

Reinventing The Wheel

 

So what are these systems missing?  Not a whole lot, at least when you look at the whole ecosystem.  However, I’ve had a hard time finding a simple and affordable solution that does exactly what I need.

Again, at MMDB, we use BugNET, and it works great.  However, there are a lot of features I’d like to see added, like better workflow control and the ability to make my custom fields first-class citizens in the system.  Obviously, as an open-source project, I could contribute these additional features, but that would not be nearly as much fun.

Many years ago at GHR Systems, we used Mercury’s Test Director (which has since been acquired by HP), and it was wonderful.  It included many of the features that I’ve been looking for ever since, and have yet to find.  It allowed us to create some simple workflows for tracking our work, so that a bug needed a “fixed in version” before it was submitted to QA, and needed a “tested in version” before passing QA.  These rules can all be followed manually, but without the system enforcing them, they never will be.

 

The Silver Bullet

 

This leads to the one real feature I want to have.  It’s the catalyst for this whole thing, the one feature that got me thinking about doing this system in the first place.  Requirements association.

Again back at GHR, after a round of arguments about whether something was a bug or an enhancement, we decided yet again that we needed a better way to manage whether something was a bug because it didn’t match the requirements, or whether it was an enhancement because it was never covered in the requirements.

The development manager, having just finished reading Karl Wiegers’ Software Requirements book, insisted that every new bug entered into the system must also reference the requirements item that it was conflicting with.

Obviously this was insane.  I don’t think the manager actually expected it to happen, but he was trying to prove a point.  It was very important that the requirements be closely followed while testing in order to truly determine whether an item was a bug, not because of the stigma of the bug itself, but because bugs were handled differently.  Bugs were prioritized in order to get them fixed quickly, while enhancements were scheduled to be reviewed and possibly implemented at a later date.  Allowing new features to be classified as bugs would cause the scope to creep out of control.

However, there was not really a good way to do this.  We could add a new field to the bug tracker, but what would go in that field?  Many of the requirements documents used Word’s outline format, which would create unique numbers for each line item, but those numbers were not consistent; if someone added something to the top of the document, the rest of the numbers would change.  Also, as we’ve described at length already, these documents were not immutable.

This got me thinking that we needed to track requirements and work items in the same system, so that there could be an inherent link between them. 

 

How It Works

 

This is the key feature.  Once the requirements are approved to some degree, the work items can be created for each feature.  Then, bugs can be logged against those features.  If there is ever a question about the requirements, the developer or QA resource can quickly view the associated requirements from inside the work item. 

If there was any conversation about this item during the requirements phase, it will all be there already.  If the user has additional questions, they can ask them right there, and the question and answer will be recorded as part of the history.

By providing an easy way for people to get answers about the requirements, we trick them into building up a knowledge base about the system.  Over time, the number of repeated questions and recurring misunderstandings will decrease dramatically.

 

Here We Go

 

So that’s the idea.  These are the two major components of the system, and it’s probably enough to get started with some code.  I plan on dog-fooding this system the whole way, so at the very least I want a work item tracking system ASAP just to get myself organized.  Then I’ll build out the requirements piece and use that to define the requirements for the rest of the system.

Besides, I’m a developer.  I want to write some code.


To get a little more detailed, here’s what I’m thinking of for each component.  We’ll start with the Requirements component.

Requirements Component

 

The Best of Intentions

 

Right now, in a company much like yours, Larry, a business analyst, is spending hours writing up a requirements document.  Because Larry has been yelled at so many times by developers about not providing sufficient requirements, he is being particularly anal, documenting every feature, expected inputs and outputs, validation conditions, etc.  Once he is done, the document is beautiful.  It is 87 pages of clearly defined requirement bliss.  The problem is that the document is not done, and very soon it is not going to be alone.

Larry sends the document to his supervisor (1 copy), who makes a few rough edits and sends it back (another copy).  Larry then cleans up these edits to produce a revised final version (yet another copy) and sends it back to the supervisor, who says to run with it.  In this initial review, we have now seen 3 copies of the document, each of which was a little different, and each of which will live forever in email histories, hard drives, dark corners of SharePoint, or printed out on someone’s desk.

Larry sends the latest copy out to the development team, and schedules a meeting to discuss it.  By the time the meeting comes around, he realizes he needs to make a few more changes, and brings a new copy to the meeting.  Meanwhile some developers have absentmindedly filed a copy off to their hard drives, and will forever reference that copy anytime a requirements issue comes up.  The meeting generates a lot of discussion, which causes several more changes.  The result is several more drafts of the document.

 
The Hydra Emerges

 

All the while, the supervisor has taken his “final” copy and shown it to the CEO “just to give him a rough idea of where we are at”.  The CEO, despite his best intentions, will remember this conversation as “here’s exactly what we’re committing to”.  He especially likes the Universal Data Widget feature, which the development team has just notified Larry is not possible.

The next time Larry meets with the supervisor, they both have different copies of the requirements, and so the supervisor doesn’t notice that the Universal Data Widget feature, which the CEO loved, is now missing from the latest document.  A few weeks later, when someone else points it out, nobody can remember why it was removed, so the supervisor plays the “what the CEO wants” card, and it is added back in.  Meanwhile, the development team has already started coding, and has no knowledge of the feature being added back into the requirements.

 

Reality Strikes

 

3 months later, after the testing phase, the CEO asks “where’s the Universal Data Widget?” and everyone panics.  The development team has no recollection of the feature, and Larry can’t remember when it was added back in.  In their own defense, everyone produces a different copy of the requirements, each of which supports their own case.

After much finger pointing, the supervisor says, “whatever, just add the damn Widget right now!”  The developer takes a stand, pointing to his copy of the requirements in hand, and insists that he’s not allowed to make any changes without a documented change request.  So the supervisor grabs a yellow sticky note, writes “Add the damn Widget” on it, sticks it on the developer’s copy, and storms away.

Everyone started out with the best of intentions, and now they are completely frustrated with each other.  Everyone wants the product to work well, and yet they now have a list of things to blame on everyone else.  So much useful communication was lost in verbal meetings and hallway conversations, and the product suffered as a result.  In the end, the developers still say that the analysts don’t produce sufficient requirements, the analysts still say that the developers will never be happy with whatever requirements are produced, and the management still sees them all as a bunch of finger-pointing whiners who can’t get a product finished.  And honestly, I can’t really blame any of them.

 

Are Requirements Counter-Productive?

 

I’ve seen one variation of this or another so many times over the years.  The sticky note part is actually true: on one project where the requirements were spinning out of control, the development lead became fanatical about documenting change requests, and the PM actually resorted to placing sticky notes on his copy of the requirements.  (The development lead was the same as this guy; the original less-embellished story is here.)  The end result is that people stop writing requirements, because it just makes things worse.

There has been a push over the years to move away from requirements, usually as some sort of eXtreme Programming (XP) method.  Requirements get replaced with looser “user stories.”  In some cases, requirements are abandoned altogether, in favor of having the customer “always available” to answer questions and fill in missing details.

Given how hard it is to get specifications right when you’re really trying to be thorough, it’s hard to imagine how non-technical customers can be expected to make rational, thoughtful, and cohesive requirement decisions by the seat of their pants.  It is the customer’s job to want the best and coolest thing, without consideration to the overall design, because that’s what normal people do.  Conversely, it’s the job of the analyst/PM/developer/whoever to adjust those requests to reality and ensure that the customer’s seat-of-the-pants vision remains consistent with their business objective.  Otherwise, if the design is based on what the customer wants at a given time, Frankenstein’s monster arises.  All of the parts are individually beautiful, but when assembled, the sum result is grotesque.

A lot of great information has been provided about the importance of requirements.  Joel Spolsky has written a series of posts about why they are necessary, and Karl Wiegers has written the definitive book on how to do it well.  Over the years, I’ve realized that a lot of people are not actually arguing that requirements are unhelpful.  Instead, they are arguing that they are not practical, because everyone is always changing their minds.

Again, this is not surprising, because when you have complex documents written in Word being emailed around, they are not practical to maintain.  The problem here is not the content of the documents, but the approach to maintaining them.

 
Is There A Solution?

This always seemed like a difficult problem to solve, but not an overly complicated one.  I’ve written more systems than I can count with far more complicated workflows than this.  In my opinion, the idea that anyone, much less software developers, is still using Word documents for this in this day and age is inexcusable.

I have been waiting for years for a solution to arise to solve this problem.  Many systems have come along, either bug trackers or planning tools, with a requirements component bolted on to the side, usually as a wiki or a document management tool.  There are also tools that focus primarily on requirements, and then have a project planning or bug tracking component bolted on as an afterthought. 

However, none of the systems I’ve seen really captured the evolving nature of requirements and tied them to work items and bugs as a core system function.

 
So What’s The Solution?

 

The solution is that we do what we’re always paid to do.  We build a system that accurately reflects the business model, in a way that makes it easy for people to do their jobs well.  It will be a centralized, intuitive system that easily facilitates and records the natural conversations that take place in the development of requirements.  It will be like Word’s track changes feature, but it will actually be useful.

Here’s the new story:

Larry enters his requirements into the new system in an outline and line-item format.  Because it is naturally geared towards his specific needs of entering requirements, he is able to leverage features like reusing requirement sections and building mockups, and is done in half the time that it would take him to create the same requirements in Word.

Larry then clicks a button to submit the requirements to his supervisor, which sends a notification to the supervisor.  Instead of clicking on a document, the supervisor clicks on the link to the system, and reviews the requirements.  He asks a few questions and enters a few suggestions, each of which is attached to a specific line item in the system.  Once he’s done, he clicks a button to send the requirements back to Larry.

Larry reviews the feedback from his boss, makes the changes he suggested, enters a few clarifications of his own, and then forwards it back to the supervisor for final review.  The supervisor clicks the Approve button, and the system has tracked every step of the way.

The process proceeds the same way once the developers get involved.  Everyone is looking at the same copy of the “document.”  If anyone sees something they don’t understand, they can click a button to request clarification, and the analyst can respond right there in the system, so anyone else with that same question can see the same answers.  6 months later, when a developer and a QA person are arguing about how a feature should work, they can look back at the single latest version of the requirements, see all the discussion that took place, and quickly get an answer; if the answer is not there, they can ask it through the system.

 

How Do You Get People To Use It?

 

People will need to want to use it, and we’re going to try to build it to be so usable that they actually want to use it.  We’ll do everything we can to keep the learning curve of the system as small as possible, so it’s easy for people to look at it and just start using it.  It must be an intuitive, pleasant, frictionless interface, or it will be a failure.

You usually lose people when they look at the system and don’t know where to start, or where to go next.  They’ll complain about the system, and all they get back is “well, this is a very powerful system, so you need to spend a lot of time to really learn how to use it.”

This answer has always infuriated me, whether it was about Oracle or Linux or TFS.  You can always build a system that is powerful and easy to use at the same time.  It is not easy, but that is why it is so rare.  It is the last mile of the design, and one of the hardest parts, and it’s what separates powerful products that most people hate from great products that most people love.

Take a look at stackoverflow.com.  When you get down to its guts, it is an extremely complicated system, but it is still an inherently easy system to use, because so much effort was put into making it so.

 

Get On Board

 

Just one more note about getting buy-in.  I pitched this idea to a friend of mine, with whom I worked for years in the trenches, and he said that the business analysts and project managers would never go for it, because it puts too much accountability on them.  I was taken aback by this, as it had a lot of the pessimistic “us-versus-them” partisanship that I want this tool to solve.  The idea here is that people usually have at least somewhat good intentions in this process, and so far it has been the tools that have failed them.

However, he was right in some cases; often you find yourself working with people who are actively dishonest and try to cover their tracks.  In that case, fire them if you have the authority, because they are a detriment to the company.  Or, if you’re lower on the totem pole, quit if you can, because that is an awful place to work and there are always better options.  Of course, this may not be practical in all cases, so I often fall back on calling them out and trying to shine the light of truth whenever possible, but that has usually just gotten me in trouble.  This too can be problematic, because you can come across as a complainer (after one such incident, one of my recruiters suggested I read Dale Carnegie’s “How To Win Friends & Influence People”, which was actually far more useful than I expected).

 

Here We Go

 

So that’s hopefully what we’re going to build.  At least part of it. 

Am I qualified to define how a requirements system should work?  Good gravy, no.  But I’m going to do it anyway.  I have a rudimentary understanding of some basic principles, but I am by no means an expert in requirements analysis and collection.  I just see a broken system and want to improve it.  It may not have all the bells and whistles that some other requirements systems have, but that’s OK. 

And of course, this is an exercise in developing software, not building a marketable product, and so that grants me the latitude to pursue things that I’m completely unqualified to pursue.  Plus sometimes it’s just fun to let go and try to do something you suck at.

Posted in ALM.

For now, the working name for this project will be MMDB ALM, until I can think of a better name, which may be never.  ALM stands for “Application Lifecycle Management”, which is what we’re building.  MMDB refers to MMDB Solutions, who will own the source code to this application, under some sort of open source license to be determined later.

So what are we going to build?  I touched on this a little bit in the last post, but here’s the general overview:

 

This application will be made of a series of integrated yet independent components:

    1. Requirements
    2. Project Planning
    3. Work Item Tracking
    4. Testing
    5. Customer Response

 

I like bullet lists.  Here are some high-level requirements for this application:

  • Integration: These components will all be integrated into a single aggregated data source.  This does not necessarily mean a single physical database, but all of the data must be able to be tied together and traced through easily.
  • Independence: These components can behave as silos, so that any one component can stand on its own.
  • Portability: These components will all have integration points to existing systems, so that any part could be replaced with another third-party system.
  • Performance: The performance must be sufficient.  Notice I didn’t say “as fast as possible”, just sufficient to meet the expected load of the system, as demonstrated by load testing.  If there is to be a tradeoff between code elegance and performance, the integrity and readability of the code will always win over performance unless there is documented evidence that the performance change will resolve an existing issue.
  • Usability: The system must be easy to use.  No end-user reference documents will be provided, for several reasons.  First, I hate writing them.  Second, nobody reads the damn things.  Third, they are perpetually out of date.  Lastly, and most importantly, they are a crutch to avoid creating usable software.  The mark of a truly usable system is that any person can start using it with no training and it never occurs to them to ask any questions.  It sounds like a lofty goal, and it’s really hard to do well, but it is definitely doable, and it’s a lot better than writing documentation.
  • Quality: All code will be run through automated tests, but not necessarily the type of unit testing you may be used to.  A common approach is to create a series of unit tests for every little method in every little component, which I find to be a poor investment of time, and one that gives a false sense of security because it doesn’t account for the variations you get when you string components together.  Instead, we will focus on scripting out full end-to-end tests from the front-end UI (using WatiN or something similar), basically automating the QA regression process (see the first sketch just after this list).  I feel that this approach gives you a better return on your time by providing more useful tests in less time.
  • Configurability: Administrators will be able to define their own rules for workflows, authorization, user-defined fields, etc.  I don’t want to create a system so configurable that it’s useless, but every company has different needs, so many of the most common rules will be configurable.  For example, a company may want to say that anyone can enter a bug; but then it gets assigned to the development manager; who then can assign it to a developer, who must enter a fix description, number of hours, and version number to mark it ready for testing; etc. (a second sketch after this list shows what such rules might look like as data).  However, it should be noted that usability will always win over configurability.
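
To show what I mean by automating the QA regression process from the front end, here is a rough sketch of the kind of end-to-end test I have in mind, using WatiN with NUnit.  The URL, field names, and button labels are hypothetical placeholders; this illustrates the approach, not actual code from the product.

using NUnit.Framework;
using WatiN.Core;

[TestFixture]
public class BugEntryEndToEndTests
{
    // WatiN's IE automation needs to run on an STA thread
    // (e.g. via NUnit's [RequiresSTA] attribute or an app.config setting).
    [Test, RequiresSTA]
    public void CanEnterAndSaveABug()
    {
        // Hypothetical local URL and element names -- placeholders only.
        using (var browser = new IE("http://localhost/mmdb-alm/bugs/new"))
        {
            browser.TextField(Find.ByName("Title")).TypeText("Save button does nothing on the edit page");
            browser.SelectList(Find.ByName("AssignedTo")).Select("Development Manager");
            browser.Button(Find.ByValue("Save")).Click();

            // One assertion exercises the whole stack: UI, application code, and database.
            Assert.IsTrue(browser.ContainsText("Bug saved"));
        }
    }
}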

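And as a rough illustration of the Configurability point, here is a minimal sketch of the example bug workflow expressed as data rather than hard-coded logic, so an administrator could change it without a new build.  Again, every name here is a hypothetical placeholder, not the actual design.

using System.Collections.Generic;

// Hypothetical sketch: the example bug workflow expressed as configurable rules.
public class WorkflowStepRule
{
    public string FromStatus { get; set; }
    public string ToStatus { get; set; }
    public string AllowedRole { get; set; }           // who is allowed to make this transition
    public List<string> RequiredFields { get; set; }  // fields that must be filled in first
}

public static class SampleBugWorkflow
{
    public static readonly List<WorkflowStepRule> Rules = new List<WorkflowStepRule>
    {
        new WorkflowStepRule { FromStatus = "New", ToStatus = "Assigned",
            AllowedRole = "Anyone", RequiredFields = new List<string> { "Title", "Description" } },
        new WorkflowStepRule { FromStatus = "Assigned", ToStatus = "InDevelopment",
            AllowedRole = "DevelopmentManager", RequiredFields = new List<string> { "AssignedDeveloper" } },
        new WorkflowStepRule { FromStatus = "InDevelopment", ToStatus = "ReadyForTesting",
            AllowedRole = "Developer",
            RequiredFields = new List<string> { "FixDescription", "HoursSpent", "FixVersion" } }
    };
}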
 

This is a first pass, and we’ll add more as we go.  You’ll notice that this approach will age this document, so that it will eventually be very out of date.  That is intentional, as this is one of the problems we’re trying to solve.  Yes, I know right now many of you are screaming “that’s what a wiki is for!”, but that’s just too free-form for what we’re trying to do.  I’m reminded of those label-maker commercials that imply they will make your life more organized, but you still get stuck doing the same damn organization; it just looks a little prettier.  I want an application that gently guides the users towards an organized solution, doing just enough to make it easy for them to do their job well, but not enough to shackle them into a subjectively awkward process.

Anyhow, we’ll get into these bullet points a little more in the next few posts.

Posted in ALM.

So I need something to write about.  I’ve considered starting a blog at several points over the years, but I never got around to it.  Not because I didn’t have things to say, but because I didn’t have anything specific to talk about.  I didn’t want to throw together a bunch of random thoughts, or worse, a rant about whatever was annoying me that day.

Then I figured if I was going to write about developing software, I should just go ahead and write some software; the only problem was deciding what to write.  I have a few ideas for viable commercial products that could make a profit, but those aren’t really things that I want to publicize until they’re finished.  I could talk about one of our existing products, like SportsCommander, but again that is a closed-source product, and frankly I don’t think talking about a product that has already been built would be all that interesting.

Then it occurred to me that I should build something that I’ve always wanted to build but never had a good reason to.  Something that includes features that I’ve always wanted in similar products but could never quite find, despite a ridiculously saturated market.  Something that I wish I could build and sell, but which probably would not be profitable, again because of the market saturation.  Something that would inherently address many of the development management concepts that I’d like to discuss in this blog.  How about a bug tracking system?  Or better yet, how about an application lifecycle management (ALM) system?

Years ago, like most developers, I wanted to build my own bug tracking system, but obviously the last thing the world needs is yet another bug tracker.  At MMDB, we’ve been using BugNET, and it works pretty well for a free open-source system.  Every company has its own approach: some use long-standing staples like Bugzilla or JIRA, others use expensive enterprise systems like HP Quality Center or Microsoft’s Team Foundation Server (TFS), and many companies just build their own.

Many of these systems start out as a bug tracker, and then people want to use them for tracking new development work and enhancements too, which is fair enough.  They want to track enhancements.  They want to attach requirements.  They want to plan projects.  Sometimes the system handles these things easily, sometimes not.  Bug tracking evolves from tracking just bugs, to tracking issues (bugs and other things that need to be changed), to tracking work items (any generic type of work that needs to be performed).  For years I had seen these systems evolve into something bigger and more integrated, and I wondered when the developer tool market would catch up.  I began to hear about Microsoft’s Team System, which sounded cool, but it was supposed to be extremely expensive, and no place I was working would shell out for it any time soon.

Then, in 2006, as I was reading Eric Sink’s blog (the only blog I really read at the time, unless you count www.thedailywtf.com), I saw his pitch for SourceGear’s new ALM product, Fortress.  Go read it right now.  It’s worth it.  I’ll wait.

I was fascinated.  Tying together requirements and work items and bugs into one integrated system.  My mind spun a vision of some fantasy system where, when you open a bug, there is the original feature attached to it, and a link to the relevant requirements, including who wrote them, who approved them, any clarifications that have been made, and who to talk to if you have questions.  A system where requirements were a living, breathing, evolving thing, rather than a stale Word document that has been tweaked a million times.  A system that solved all the problems I’d seen over the years, and actually made everyone’s job easier.  A system that covered requirements, project planning, QA testing, work item and bug tracking, build management, and change control.  Heck, maybe even a call center module that allows a company to build up a knowledge base of common issues and their resolutions, and to feed those statistics back into the requirements and planning phases so they can be prioritized, so that you can be confident that you are actually solving the pain points that your customers are facing.

So I downloaded Fortress and tried it out.  It was a very cool system, and certainly worth the money for many organizations, but it couldn’t really live up to the hype that I built up in my head.  Nothing could.  My expectations were too unrealistic.  I was disappointed.

So I looked at other systems.  Most were too complicated.  Most were too expensive.  Most were too difficult to maintain.  They didn’t capture the end-to-end integration the way that I had pictured.  No product met my vision.

So I should build my own, right?  Not so fast.  Building and distributing a commercial software product is really hard.  We always look at the code and think it’s easy, just code it and the money will come, but the reality is that the code is a very small part of the process.  You have QA, documentation, marketing, distribution, payment processing, customer support, and more legal issues than you could shake a stick at.  If you have a great focused product idea that is going to be easy to build and distribute, and will make a lot of money, and will fill a specific need in an underserved niche market, by all means plow ahead.  But if you want to make money by building yet another bug tracker, glorified as it may be, you’ll make yourself miserable.

So that’s where I found myself, with this possibly-great idea floating around in the back of my mind, knowing that it would probably never see the light of day because it was just not practical.  I moved from job to job, working with different systems, and remained moderately irritated that none of them, in my humble opinion, got it right.

I thought maybe I could build it just for MMDB’s internal use, and maybe release portions of it as open source, but it really wouldn’t be worth the time.  Between working full-time consulting contracts during the day, working on MMDB projects at night and on weekends, and a wonderful wife and two boys that I’d rather be spending my time with, I have a pretty clear idea of what my time is worth to me, and it’s quite a lot.  If I’m going to spend my time on a system that will never produce a profit, it will need to make my life a whole lot easier, or it will need to fill some other need.

So I’m multi-purposing here.  I’m writing the blog I always wanted to write about writing code and designing applications.  I’m writing about the process of software development, and I’m building the system that I always wanted to build.  It will probably never get done, but that’s OK.  I may very well get bored and wander off, but that’s OK too.  The product may be a mess, but even that will teach me something.  The process itself is now the primary benefit.  I’ll get to experiment with some ideas and share what I find, and maybe get some feedback, and I’ll do it on whatever schedule I feel like.  I get to do something grossly impractical just for the sake of doing it, without the commitment of having to get it done.

I’ll go through the whole process here on this blog.  I’ll discuss what  I’m trying to accomplish, why I’m doing things certain ways, lessons I’ve learned over the years, and lessons I’ve learned just recently.  I’ll deal with specific issues, and I’ll go off on a few tangents.  Hopefully people find it interesting and useful.  If you get something out of it, great.  If not, too bad. 

I’ll try to use this as a testing ground to experiment with new technologies, because the only way I can really learn about something is to build something with it.  I’ll put all the code up on CodePlex or SourceForge so you can see it all for yourself. 

Here we go.