This is an ongoing series on Windows Azure.  The series starts here, and all code is located on GitHub: https://github.com/mmooney/MMDB.AzureSample.  GitHub commit checkpoints are referenced throughout the posts.

In the last post we covered setting up a basic Azure-enabled web site.  In this post, we'll show you how to set up your Azure account.  Next, we'll cover deploying the project to Azure.

So head on over to http://www.windowsazure.com/ and click the Free Trial link:

image

Sign in with your Microsoft/Live/Passport/Metro/Whatever user ID. If you don't have one, create one:

image

Why yes, I would like the 90-Day Free Trial:

image

Once you do a little validation of your phone and credit card, they will go ahead and set up your account.

image

After a minute or two, click the Portal link:

image

This takes you to https://manage.windowsazure.com/, the main place you want to be for managing your Azure application.

image

And there we are.  Poke around a little bit; there is a lot of interesting stuff in here.  In the next post, we'll create a Cloud Service and deploy our sample application there.

One of the hardest problems to solve when setting up a deployment strategy is how to handle the web.configs and exe.configs.  Each environment will have different settings, so every time you deploy something somewhere you need to make that web.config look different.

The quick and dirty answer is to have a separate web.config for each environment.  Then during a deployment we drop the prod/web.config or staging/web.config into the web directory, and you're good to go.  However, like a lot of problematic development strategies, this is really fast and easy to get going with, but it doesn't age very well.  What happens when your DEV->STAGING->PRODUCTION environments evolve into LOCAL->DEV->QA->INTEGRATION->STAGING->PRODUCTION?  Or when you have machine-specific or farm-specific settings that change from one part of the production environment to another?

Most importantly, what happens when that web.config changes for a reason other than configuration?  Then you have a whole bunch of web.configs to fix, and you’re going to put a typo in at least 2 of them, it’s guaranteed.

Let's take a look at a VERY simple web.config, created from just a basic MVC 4 project:

<?xml version="1.0" encoding="utf-8"?>
<!--
  For more information on how to configure your ASP.NET application, please visit

http://go.microsoft.com/fwlink/?LinkId=152368

  -->
<configuration>
  <configSections>
    <!-- For more information on Entity Framework configuration, visit http://go.microsoft.com/fwlink/?LinkID=237468 -->
    <section name="entityFramework" type="System.Data.Entity.Internal.ConfigFile.EntityFrameworkSection, EntityFramework, Version=4.4.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" requirePermission="false" />
  </configSections>
  <connectionStrings>
    <add name="DefaultConnection"
         providerName="System.Data.SqlClient"
         connectionString="Data Source=(LocalDb)\v11.0;Initial Catalog=aspnet-MMDB.AzureSample.Web-20130117123218;Integrated Security=SSPI;AttachDBFilename=|DataDirectory|\aspnet-MMDB.AzureSample.Web-20130117123218.mdf" />
  </connectionStrings>
  <appSettings>
    <add key="webpages:Version" value="2.0.0.0" />
    <add key="webpages:Enabled" value="false" />
    <add key="PreserveLoginUrl" value="true" />
    <add key="ClientValidationEnabled" value="true" />
    <add key="UnobtrusiveJavaScriptEnabled" value="true" />
  </appSettings>
  <system.web>
    <compilation debug="true" targetFramework="4.0" />
    <authentication mode="Forms">
      <forms loginUrl="~/Account/Login" timeout="2880" />
    </authentication>
    <pages>
      <namespaces>
        <add namespace="System.Web.Helpers" />
        <add namespace="System.Web.Mvc" />
        <add namespace="System.Web.Mvc.Ajax" />
        <add namespace="System.Web.Mvc.Html" />
        <add namespace="System.Web.Optimization" />
        <add namespace="System.Web.Routing" />
        <add namespace="System.Web.WebPages" />
      </namespaces>
    </pages>
    <profile defaultProvider="DefaultProfileProvider">
      <providers>
        <add name="DefaultProfileProvider" type="System.Web.Providers.DefaultProfileProvider, System.Web.Providers, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" connectionStringName="DefaultConnection" applicationName="/" />
      </providers>
    </profile>
    <membership defaultProvider="DefaultMembershipProvider">
      <providers>
        <add name="DefaultMembershipProvider" type="System.Web.Providers.DefaultMembershipProvider, System.Web.Providers, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" connectionStringName="DefaultConnection" enablePasswordRetrieval="false" enablePasswordReset="true" requiresQuestionAndAnswer="false" requiresUniqueEmail="false" maxInvalidPasswordAttempts="5" minRequiredPasswordLength="6" minRequiredNonalphanumericCharacters="0" passwordAttemptWindow="10" applicationName="/" />
      </providers>
    </membership>
    <roleManager defaultProvider="DefaultRoleProvider">
      <providers>
        <add name="DefaultRoleProvider" type="System.Web.Providers.DefaultRoleProvider, System.Web.Providers, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" connectionStringName="DefaultConnection" applicationName="/" />
      </providers>
    </roleManager>
    <!--
            If you are deploying to a cloud environment that has multiple web server instances,
            you should change session state mode from "InProc" to "Custom". In addition,
            change the connection string named "DefaultConnection" to connect to an instance
            of SQL Server (including SQL Azure and SQL  Compact) instead of to SQL Server Express.
      -->
    <sessionState mode="InProc" customProvider="DefaultSessionProvider">
      <providers>
        <add name="DefaultSessionProvider" type="System.Web.Providers.DefaultSessionStateProvider, System.Web.Providers, Version=1.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" connectionStringName="DefaultConnection" />
      </providers>
    </sessionState>
  </system.web>
  <system.webServer>
    <validation validateIntegratedModeConfiguration="false" />
    <modules runAllManagedModulesForAllRequests="true" />
    <handlers>
      <remove name="ExtensionlessUrlHandler-ISAPI-4.0_32bit" />
      <remove name="ExtensionlessUrlHandler-ISAPI-4.0_64bit" />
      <remove name="ExtensionlessUrlHandler-Integrated-4.0" />
      <add name="ExtensionlessUrlHandler-ISAPI-4.0_32bit" path="*." verb="GET,HEAD,POST,DEBUG,PUT,DELETE,PATCH,OPTIONS" modules="IsapiModule" scriptProcessor="%windir%\Microsoft.NET\Framework\v4.0.30319\aspnet_isapi.dll" preCondition="classicMode,runtimeVersionv4.0,bitness32" responseBufferLimit="0" />
      <add name="ExtensionlessUrlHandler-ISAPI-4.0_64bit" path="*." verb="GET,HEAD,POST,DEBUG,PUT,DELETE,PATCH,OPTIONS" modules="IsapiModule" scriptProcessor="%windir%\Microsoft.NET\Framework64\v4.0.30319\aspnet_isapi.dll" preCondition="classicMode,runtimeVersionv4.0,bitness64" responseBufferLimit="0" />
      <add name="ExtensionlessUrlHandler-Integrated-4.0" path="*." verb="GET,HEAD,POST,DEBUG,PUT,DELETE,PATCH,OPTIONS" type="System.Web.Handlers.TransferRequestHandler" preCondition="integratedMode,runtimeVersionv4.0" />
    </handlers>
  </system.webServer>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="System.Web.Helpers" publicKeyToken="31bf3856ad364e35" />
        <bindingRedirect oldVersion="1.0.0.0-2.0.0.0" newVersion="2.0.0.0" />
      </dependentAssembly>
      <dependentAssembly>
        <assemblyIdentity name="System.Web.Mvc" publicKeyToken="31bf3856ad364e35" />
        <bindingRedirect oldVersion="1.0.0.0-4.0.0.0" newVersion="4.0.0.0" />
      </dependentAssembly>
      <dependentAssembly>
        <assemblyIdentity name="System.Web.WebPages" publicKeyToken="31bf3856ad364e35" />
        <bindingRedirect oldVersion="1.0.0.0-2.0.0.0" newVersion="2.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
  <entityFramework>
    <defaultConnectionFactory type="System.Data.Entity.Infrastructure.LocalDbConnectionFactory, EntityFramework">
      <parameters>
        <parameter value="v11.0" />
      </parameters>
    </defaultConnectionFactory>
  </entityFramework>
</configuration>

 

Zoinks, that is a lot of configuration. 

Well, actually, exactly how much “configuration” is there?

Well, actually, the answer is one line:

<add name="DefaultConnection"
     providerName="System.Data.SqlClient" 
     connectionString="Data Source=(LocalDb)\v11.0;Initial Catalog=aspnet-MMDB.AzureSample.Web-20130117123218;Integrated Security=SSPI;AttachDBFilename=|DataDirectory|\aspnet-MMDB.AzureSample.Web-20130117123218.mdf" />

That is the ONLY line that is going to change from environment to environment.   Everything else there is not configuration, it’s code.

OK sure, it’s in a “configuration” file.  Who cares.  It is tied to your code, and it should change about as often as your code changes, not more.  It certainly should not change between environments.  My general rule is, “if any change to it should be checked into source control, it’s code.”  The sooner you stop pretending that they are different in a way that gives you an enhanced level of configuration flexibility, the sooner you’ll be a happier person.  Trust me.

The problem here is that the web.config is a confused little person.  It does try to configure stuff, but it's actually configuring two different types of stuff.  As far as most people are concerned, it is for configuring their application, but the other 90% of it, which they never touch and usually don't understand, is for configuring the underlying .NET and IIS guts, not your application directly.  And once you've coded your application, that 90% should never change from environment to environment, unless your code is changing as well.

And of course that code does change over time.  If you add a WCF web service proxy client to your application, it’s going to fill your web.config up with all sorts of jibber-jabber that you better not touch unless you know what you are doing.  But deep inside there is the endpoint URL that DOES need to change from environment to environment.

Again, this is where the "have a web.config for every environment" approach really breaks down, because now you have to go through and update every one of those web.configs to add in all that crazy WCF stuff.  And try not to screw it up.

So What?

So what can we do about it?  We have a few options:

One option is to put all of the configuration in the database.  This can introduce a lot of issues: when you configure your application to point to a database that configures your application, you run into all sorts of codependency issues that make your environments really fragile.  The only time I've seen this be a good idea is when you have really specific change control rules about not being able to touch configuration files on the server outside of an official deployment, but configuring settings in the database through an administration page would be allowed.

I think these two options are preferable:

  • Drop a brand new web.config on every deployment and have your deployment utility reconfigure it, either using web.config transformations, XSLT, or a basic XML parser.
  • Use the configSource attribute on your settings.  This lets you put all of your connectionStrings or appSettings values in separate files, which are NOT updated from source control on every deployment.  This way you can always drop the latest web.config without having to worry about reconfiguring it.  (If you're using Azure, this goes a step farther, with a completely separate file reserved for environment configuration, outside of your web application package.)

Both of these options work well.  The first option works better if you have only a few settings, or if you need to update something that does not support a configSource attribute, like a WCF endpoint.  The second option works better if you have a whole list of settings and can consolidate them into the connectionStrings, appSettings, etc.
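To make these concrete, here are two rough sketches.  For the first option, Visual Studio's web.config transform (XDT) syntax is one common way to rewrite a value per environment; the endpoint name and URL below are invented for illustration:

<?xml version="1.0"?>
<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
  <system.serviceModel>
    <client>
      <!-- Overwrite the address attribute of the matching endpoint for this environment -->
      <endpoint name="OrdersService"
                address="http://qa.example.com/OrdersService.svc"
                xdt:Transform="SetAttributes(address)"
                xdt:Locator="Match(name)" />
    </client>
  </system.serviceModel>
</configuration>

For the second option, the sections in web.config just point at small external files (the file names here are only a convention, not anything .NET requires):

<!-- web.config: dropped fresh on every deployment -->
<appSettings configSource="appSettings.config" />
<connectionStrings configSource="connectionStrings.config" />

<!-- appSettings.config: environment-specific, never overwritten by a deployment -->
<appSettings>
  <add key="SmtpServer" value="smtp.qa.example.com" />
</appSettings>

Either way, the environment-specific values end up isolated from the 90% of the web.config that is really just code.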

But either way, no matter what, ALWAYS drop a new web.config, and ALWAYS make sure you have a plan to treat YOUR configuration differently than the rest of the web.config.

Good Luck.

This is an ongoing series on Windows Azure.  The series starts here, and all code is located on GitHub: https://github.com/mmooney/MMDB.AzureSample.  GitHub commit checkpoints are referenced throughout the posts.

Prerequisites

I’m assuming that you have Visual Studio 2012.  Now go install the latest Windows Azure SDK.  Go go go.

Getting Started

So the first thing we are going to do is build a simple Windows Azure web application.  To do this, we’ll create a new VS2012 solution:

image

 https://github.com/mmooney/MMDB.AzureSample/tree/76b9bbcd11146bca026b815314df907406b99048

And we'll create a plain old MVC web application:

image

image

image 

Now we have an empty project, and once we add in a Home controller and a view, we have ourselves a simple but working MVC web application.
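For reference, the Home controller is about as simple as MVC controllers get; something along these lines (a sketch of the usual generated code, with the namespace assumed from the project name):

using System.Web.Mvc;

namespace MMDB.AzureSample.Web.Controllers
{
    public class HomeController : Controller
    {
        // GET: /Home/ - renders Views/Home/Index.cshtml
        public ActionResult Index()
        {
            return View();
        }
    }
}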

image

image

 

Now Let’s Azurify This Fella

Now if we want to deploy this to Azure, we need to create an Azure project for it. Right-click on your project and select “Add Windows Azure Cloud Service Project”

image

That will add a bunch of Azure references to your MVC app, and will create a new wrapper project:

image

Now you can still run the original web application the same as before, but if you run the new Azure project, you'll get this annoying, albeit informative, error message:

image

Ok so let’s shut it all down and restart VS2012 in Administrator mode.

(Tip: if you have a VS2012 icon on your Windows toolbar, SHIFT-click it to start in Admin mode)

When we come back in Admin mode and run the Azure project, it’s going to kick up an Azure emulator:

image
image

And we get our Azure app, which looks much the same as our existing app, but running on another port:

image

 

The idea here is to simulate what will actually happen when the application runs in Azure, which is a little different from running in a regular IIS web application.  There are different configuration approaches, and the Web/Worker Roles get fired up.  This is very cool, especially when you are getting started or migrating an existing site, because it gives you a nice test environment without having to upload to Azure all the time.

However, the simulator does have its downsides.  First, requiring Administrator mode is annoying.  I forget to do this EVERY TIME, and so right when I'm about to debug the first time, I have to shut everything down and restart Visual Studio and reopen my solution in Admin mode.  Not the end of the world, but an annoying bit of friction.  Second, it is SLOW to start up the site in the simulator; not unusably slow, but noticeably and annoyingly slow, so I guess it's almost unusably slow.

To combat this, I try to make sure that my web application runs fine all the time as a regular .NET web application, and then I just test from there.  Then before I release a new feature, I test it out in simulator mode as a sanity check, but being able to run as a vanilla web application makes everything a lot faster.

Also, and this is important, it forces you to keep your web application functioning independent of Azure.  Besides the obvious benefit of faster debuggability, it also ensures that your application has enough seams that if you had to move away from Azure, you could.  I've gone on and on about how great Azure is, but it might not be the right thing for everyone, or might stop being the right thing in the future, and you want to have the option to go somewhere else, so you really don't want Azure references burned in all over the place.  Even if you stay with Azure, you might want to replace some of their features (like replacing Azure Table Storage with RavenDB, or replacing Azure Caching with Redis).  We've used a few tricks for this in the past that I'll get into in some later blog posts.
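For example, here's a sketch of the kind of seam I mean (not code from the sample project): hide the Azure configuration API behind one helper so the rest of the app doesn't know or care where its settings come from.

using System.Configuration;
using Microsoft.WindowsAzure.ServiceRuntime;

public static class ConfigHelper
{
    // Reads from the Azure service configuration when running in a role
    // (or the emulator), otherwise falls back to plain old appSettings.
    public static string GetSetting(string key)
    {
        if (RoleEnvironment.IsAvailable)
        {
            return RoleEnvironment.GetConfigurationSettingValue(key);
        }
        return ConfigurationManager.AppSettings[key];
    }
}

If you ever walk away from Azure, or just want to run the site as a vanilla web application, this is the only class that has to change.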

https://github.com/mmooney/MMDB.AzureSample/tree/a56beb73e44b025d90570978541f83f3622e9eac

Next

Next we'll actually deploy this thing to Azure, but first we need to set up an account, which we'll cover in the next post.  Get your checkbook ready (just kidding).

If I had a nickel for every time our deployment strategy for a new or different environment was to edit a few config files and then run some batch files and then edit some more config files, and then it goes down in a steaming pile of failure, I would buy a LOT of Sriracha.

image

(Picture http://theoatmeal.com/)

Here's a config file.  Let's say we need to edit that connection string:

<Setting name="ConnectionString" 
value="Data Source=(local); Initial Catalog=SportsCommander; Integrated Security=true;" />

Now let’s say we are deploying to our QA server.  So after we deploy, we fire up our handy Notepad, and edit it:

<Setting name="ConnectionString" 
value="Data Source=SCQAServ; Initial Catalog=SportsCommander; Integrated Security=true;" />

OK good.  Actually, not good.  The server name is SCQASrv, not SCQAServ.

<Setting name="ConnectionString" 
value="Data Source=SCQASrv; Initial Catalog=SportsCommander; Integrated Security=true;" />

OK better.  But wait, integrated security works great in your local dev environment, but in QA we need to use a username and password.

<Setting name="ConnectionString" 
value="Data Source=SCQASrv; Initial Catalog=SportsCommander; UserID=qasite; Password=&SE&RW#$" />

OK cool.  Except you can't put a raw & in an XML file.  So we have to encode that.

<Setting name="ConnectionString" 
value="Data Source=SCQASrv; Initial Catalog=SportsCommander; UserID=qasite; Password=&amp;SE&amp;RW#$" />

And you know what?  It's User ID, not UserID.

<Setting name="ConnectionString" 
value="Data Source=SCQASrv; Initial Catalog=SportsCommander; User ID=qasite; Password=&amp;SE&amp;RW#$" />

OK, that's all there is to it!  Let's do it again tomorrow.  Make sure you don't burn your fingers on this blistering-fast development productivity.

I know this sounds absurd, but the reality is that for a lot of people, this really is their deployment methodology.  They might have production deployments automated, but their lower environments (DEV/QA/etc.) are full of manual steps.  Or better yet, they have automated their lower environments because they deploy there every day, but their production deployment is manual because they only do it once per month.

And you know what I've learned, the hard and maddeningly painful way?  Manual process fails.  Consistently.  And more importantly, it can't be avoided.

Storytime

A common scenario you see: a developer or an operations person (but of course never both at the same time, that would ruin the blame game) is charged with deploying an application.  After many iterations, the deployment process has been clearly defined as 17 manual steps.  This has been done enough times that the whole process is fully documented, with a checklist, and the folks running the deployment have done it enough times that they could do it in their sleep.

The only problem is that in the last deployment, one of the files didn’t get copied.  The time before that, the staging file was copied instead of the production file.  And the time before that, they put a typo into the config.

Is the deployer an idiot?  No, as a matter of fact, the reason that he or she was entrusted with such an important role was that he or she was the most experienced and disciplined person on the team and was intimately familiar with the workings of the entire system.

Were the instructions wrong?  Nope, the instructions were followed to the letter.

Was the process new?  No again, the same people have been doing this for a year.

At this point, the managers are exasperated, because no matter how much effort we put into formalizing the process, no matter how much documentation and how many checklists, we're still getting failures.  It's hard for the managers not to assume that the deployers are morons, and the deployers are faced with the awful reality of going into every deployment knowing that it WILL be painful, and they WILL get blamed.

Note to management: Good people don’t stick around for this kind of abuse.  Some people will put up with it.  But trust me, you don’t want those people.

The lesson

The kick in the pants is, people are human.  They make mistakes.  A LOT of mistakes.  And when you jump down their throat on every mistake, they learn to stop making mistakes by not doing anything.

This leads us to Mooney’s Law Of Guaranteed Failure (TM):

In the software business, every manual process will suffer at least a 10% failure rate, no matter how smart the person executing the process.  No amount of documentation or formalization will truly fix this, the only resolution is automation.

 

So the next time Jimmy screws up the production deployment, don’t yell at him (or sneer behind his back) “how hard is it to follow the 52-step 28-page instructions!”  Just remember that it is virtually impossible.

Also, step back and look at your day-to-day development process.  Almost everything you do during the day besides writing code is a manual process full of failure (coding is too, but that's what you're actually getting paid for).  Like:

  • When you are partially checking in some changes to source control but trying to leave other changes checked out
  • When you need to edit a web.config connection string every time you get latest or check in
  • When you are interactively merging branches
  • When you are doing any deployment that involves editing a config or running certain batch files in order or entering values into an MSI interface, or is anything more than “click the big red button”
  • When you are setting up a new server and creating users or editing folder permissions or creating MSMQ queues or setting up IIS virtual directories
  • When you are copying your hours from Excel into the ridiculously fancy but still completely unusable timesheet website
  • When, instead of entering your hours into a timesheet website, you are emailing them to somebody
  • When you are trying to figure out which version of “FeatureRequirements_New_Latest_Latest.docx” is actually the “latest”
  • When you are deploying database changes by trying to remember which tables you added to your local database or which scripts have or have not been run against production yet

It's actually easier to find these things than you think.  The reason is, again, that it is just about everything you do all day besides coding.  It's all waste.  It's all manual.  And it's all guaranteed to fail.  Find a way to take that failure out of your hands and bathe it in the white purifying light of automation.  Sure it takes time, but with a little investment, you'll be amazed how much time you have when you are not wasting it on amazingly stupid busywork and guaranteed failure all day.

Overview

This is the first in a series of blog posts on getting started with building .NET applications in Windows Azure.  We’ve been a big fan of Azure for a lot of years, and we’ve used it for SportsCommander.com’s event registration site since the very beginning. 

I started off writing a blog post on automatically deploying web applications to Azure from TeamCity, but I ended up with too many "this blog assumes…" statements, so I figured I should take care of those assumptions first.

What the deuce is Windows Azure and why should I care?

According to Wikipedia, Windows Azure is:

Windows Azure is a Microsoft cloud computing platform used to build, deploy and manage applications through a global network of Microsoft-managed datacenters. Windows Azure allows for applications to be built using many different programming languages, tools or frameworks and makes it possible for developers to integrate their public cloud applications in their existing IT environment. Windows Azure provides both Platform as a Service (PaaS) and Infrastructure as a Service (IaaS) services and is classified as the “Public Cloud” in Microsoft’s cloud computing strategy, along with its Software as a Service (SaaS) offering, Microsoft Online Services.

 

According to me, Windows Azure is:

A hosting platform, sort of like Amazon EC2, because you can deploy to virtual machines that abstract away all of the physical configuration junk that I don't want to care about, but even better because it also abstracts away the server configuration stuff that I also don't want to care about, so I can just build code and ship it up there and watch it run without having to care about RAID drives or network switches or load balancers or whether someone is logging into these servers and running Windows Update on them.

 

Azure has grown into a lot of things, but as far as I'm concerned, Azure's primary product is a Platform-as-a-Service (PaaS) offering called Cloud Services.  Cloud Services lets you use a combination of Web Roles and Worker Roles to run web applications and background services.

Glossary

These types of terms get thrown around a lot these days, so let's define them.

Before the cloud came in to overshadow our whole lives, we had these options:

  • Nothing-as-a-Service: You went to Best Buy and bought a “server.”  You’re running it under your desk.  Your site goes down when your power goes out or someone kicks the plug out of the wall.  Or when your residential internet provider changes your IP because you won’t shell out the money for a business account with static IPs.  Then the hard drive fan dies and catches fire, your mom complains about the burning smell and tells you to get a real job.
  • Co-Location: This is a step up.  You still bought the server and own it, but you brought it down the street to a hosting company that takes care of it for you.  You are still responsible for the hardware and software, though, and when the hard drive dies you have to schlep down to the store to get a new one.
  • Dedicated Hosting: So you still have a single physical box, but you don't own it; you rent it from the data center.  This costs hundreds to thousands per month, depending on how fancy you want to get.  You are responsible for the software, but they take care of the hardware.  When a network card dies, they swap it out for a new one.
  • Shared Hosting: Instead of renting a whole server, you just rent a few folders.  This option is very popular for very small sites, and can cost as little as $5-$10/month.  You have very little control over the environment though, and you're fighting everyone else on that server for resources.
  • Virtual Hosting: A cross between Dedicated and Shared Hosting.  You get a whole machine, but it's a virtual machine (VM) running on a physical machine with a bunch of other virtual machines.  This is the groundwork for Infrastructure-as-a-Service.  You get a lot more control of the operating system, and supposedly you are not fighting with the other VMs for resources, but in reality there can always be some contention, especially for disk I/O.  The cost is usually significantly less than dedicated hosting.  You don't care at all about the physical machines, because if one piece of physical hardware fails, you can be transferred to another physical machine.

 

In today’s brave new cloudy buzzword world, you have:

  • Infrastructure-as-a-service: This is basically Virtual Hosting, where you get a virtual machine and all of the physical machine info is abstracted away from you.  You say you want a Windows 2008 Standard Server, and in a few minutes you have a VM running that OS.  Amazon EC2 is the classic example of this.
  • Platform-as-a-Service: This is one level higher in the abstraction.  It means that you write some code, and package it up in a certain way, give it some general hosting information like host name and number of instances, and then the hosting company takes it from there.  Windows Azure is an example of this, along with Google App Engine.
  • Software-as-a-Service (SaaS): This means that someone is running some software that you depend on.  Either you interact with it directly, or your software interacts with it.  You don’t own or write or host any code yourself.  The classic example of this is SalesForce.com.

 

So why are Azure and PaaS more awesome than the other options?

Because it lets me focus on the stuff that I really care about, which is building software.  As long as I follow the rules for building Azure web applications, I don't have to worry about any of the operations stuff that I'm really not an expert in, like whether I've applied the right Windows updates, whether my application domain identity is set up correctly, how to add machines to the load balancer, and a whole lot of other stuff I don't even know that I don't know.

Some IT folks balk at this and insist that you should control your whole stack, down to the physical servers.  That is a great goal once you get big enough to hire those folks, but when you are getting started in a business, your time is your most valuable asset, and you need a zero-entry ramp and you need to defer as much as possible to experts.  If you are spending time running Windows Updates on your servers when you are the only developer and you could be coding, you are robbing your company blind.

Shared hosting platforms were close to solving this problem.  As long as your website was just a website, and it was small, you could host it on a shared hosting service and not worry about anything, until somebody else pegged the CPU or memory.  Or until you needed to go outside the box a little and run a persistent scheduled background process.  Also, scaling across multiple servers is pretty much out of the question; you are stuck with "less than one server" of capacity, and you can never go higher.

But after you grow out of shared hosting and move up to dedicated hosting or virtual hosting, it costs a whole lot more per month (like 5x or 10x), and the increased maintenance effort is even worse.  It's a pretty steep cliff to jump off going from shared to dedicated/virtual hosting.

Azure fills that gap nicely.  You are still just focusing on the application instead of the server, but you get a lot more power with features like Worker Roles and Azure Storage, and you can even expand out into full-blown VMs if you really need it.

Ah ha, VMs!  What about them?  And Azure Websites?

By the time you've read this blog post, I'm sure the Azure team will have come out with 27 new features.  Ever since Scott Gu took over Azure, the rate at which Azure's been releasing new features has gotten a little ridiculous.  Two of the more interesting features are Azure VMs and Azure Websites.

Azure VMs were a late feature that it seems like the Azure team didn't even really want to add.  Every Azure web instance is actually a VM, so this lets you remote into the underlying machine like it was a regular Windows server, or even create new VMs by uploading an existing VM image.  This was introduced so that companies could have an easier migration path to Azure.  If their app still needed some refactoring to fit cleanly into an Azure web or worker role, or it had dependencies on other systems that would not fit into an Azure application, this gives them a bridge to get there, instead of having to rewrite the whole world in one day.  But to be clear, this was not introduced because it's a good idea to run a bunch of VMs in Azure, because that misses out on the core abstraction and functionality that Azure offers.  If you really just want VMs, just go to Amazon EC2; they are the experts.

Azure Websites are a more recent feature (still in beta) which mimics shared hosting in the Azure world.  While the feature set is more involved than your run-of-the-mill shared hosting platform, it doesn't give you nearly the power that Azure Cloud Services provides.  They work best with simple or canned websites, like DotNetNuke, Orchard CMS, or WordPress.  In fact, right now we're testing out moving this blog and the MMDB Solutions website to Azure Websites to consolidate and simplify our infrastructure.

The End…?

In the coming blog posts, I’ll cover some more stuff like creating an account, setting up an Azure web application, deploying it, dealing with SQL Azure, and lots more.  Stay tuned.

How to use SourceGear DiffMerge in SourceSafe, TFS, and SVN

What is DiffMerge

DiffMerge is yet-another-diff-and-merge-tool from the fine folks at SourceGear.  It’s awesome.  It’s head and shoulders above whatever junky diff tool they provided with your source control platform, unless of course you’re already using Vault.  Eric Sink, the founder of SourceGear, wrote about it here.  By the way, Eric’s blog is easily one of the most valuable I’ve read, and while it doesn’t get much love these days, there’s a lot of great stuff there, and it’s even worth going back and reading from the beginning if you haven’t seen it.

Are there better diff tools out there?  Sure, there probably are.  I’m sure you have your favorite.  If you’re using something already that works for you, great.  DiffMerge is just yet another great option to consider when you’re getting started.

You sound like a sleazy used car salesman

Yeah, I probably do, but I don't work for SourceGear and have no financial interest in their products.  I've just been a very happy user of Vault and DiffMerge for years.  And if it increases Vault adoption, both among development shops and development tool vendors, it will make my life easier.

But when I go to work on long-term contracts for large clients, they already have source control in place that they want me to use, which is OK, but when I need to do some merging, it starts getting painful.  I want it to tell me not just that a line changed, but exactly what in that line changed.  I want it to actually be able to tell me when the only change is whitespace.  I want it to offer me a clean and intuitive interface.  Crazy, I know.

Not a huge problem, because DiffMerge is free, and it can plug into just about any source control system, replacing the existing diff/merge settings.  However, those settings can be tricky to figure out, so I figured I'd put together a cheat sheet of how to set it up for various platforms.

Adding DiffMerge To SourceSafe

Let's start off with those in greatest need, ye old SourceSafe users.  First and foremost, I'm sorry.  We all feel bad that you are in this position.  SourceSafe was great for what it was, 15 years ago when file shares were considered a reliable data interchange format, but nobody should have to suffer through SourceSafe in this day and age.  But don't worry, adding in DiffMerge can add just enough pretty flowers to your dung heap of a source control system to make it bearable.  Just like getting 1 hour of yard time when you've been in the hole for a week, it gives you something to look forward to.

Anywho, let’s get started.  First, whip out your SourceSafe explorer:

DiffMerge_VSS_1

Here’s what we get for a standard VSS diff:

DiffMerge_VSS_2

Ugh.  So go to Tools->Options and go to the Custom Editors tab.  From there, add the following operations:

Operation: File Difference

File Extension: .*

Command: [DiffMergePath]\diffmerge.exe --title1="original version" --title2="modified version" %1 %2

Operation: File Merge

File Extension: .*

Command: [DiffMergePath]\diffmerge.exe --title1="source branch" --title2="base version" --title3="destination branch" --result=%4 %1 %3 %2

Now here’s our diff, much less painful:

DiffMerge_VSS_3

But merging is where it really shines:

DiffMerge_VSS_4

Thanks to Paul Roub from SourceGear for the details: http://blog.roub.net/2007/11/diffmerge_in_vss.html

Adding DiffMerge To Subversion

Obviously SVN is worlds better than VSS, but some of the standard tools distributed with TortoiseSVN are a little lacking.  You might say “you get what you paid for,” but you’d only say that if you wanted to tick off a lot of smart and helpful people.

So let’s take a look at a standard diff in SVN:

DiffMerge_SVN_1

Oof.  I’ve used SVN on and off for years, and I still don’t understand what is going on here.

So let's get this a little mo' better.  Right-click your folder, and select TortoiseSVN->Settings.  Go to the External Programs->Diff Viewer screen, and enter this external tool:

[DiffMergePath]\DiffMerge.exe /t1=Mine /t2=Original %mine %base

DiffMerge_SVN_2

Switch over to the Merge Tool screen, and enter this External Tool:

[DiffMergePath]\DiffMerge.exe /t1=Mine /t2=Base /t3=Theirs /r=%merged %mine %base %theirs

DiffMerge_SVN_3

And now our diffs look a little more familiar:

DiffMerge_SVN_4

Thanks to Mark Porter for the details: http://www.manik-software.co.uk/blog/post/TortoiseSVN-and-DiffMerge.aspx

Adding DiffMerge To Team Foundation Server

For years I dreamed of using TFS.  I hoped that someday I would work at a company successful and cool enough to invest the money in a TFS solution.  And then I actually got it, and, uh, it seems like a nice enough fella, but its tendencies towards megalomania have really had some negative consequences on the end-user experience.

Given that, after decades of technological advancement in source control, the TFS diff tool is pretty much just the same ugliness as SourceSafe:

DiffMerge_TFS_1

Get your spelunking helmet on, and we’ll go digging for the settings in TFS to change this.

  • Open up Visual Studio and select Tools->Options
  • Expand the Source Control group, and select Visual Studio Team Foundation Server
  • Click the Configure User Tools button

DiffMerge_TFS_2

Enter the following tool configurations:

Operation: Compare

Extension: .*

Command: [DiffMergePath]\DiffMerge.exe

Arguments: /title1=%6 /title2=%7 %1 %2

Operation: Merge

Extension: .*

Command: [DiffMergePath]\DiffMerge.exe

Arguments: /title1=%6 /title2=%7 /title3=%8 /result=%4 %1 %2 %3 (Corrected, thanks to Rune in the comments!)

Thanks to James Manning for the details: http://blogs.msdn.com/b/jmanning/archive/2006/02/20/diff-merge-configuration-in-team-foundation-common-command-and-argument-values.aspx

The End

So that’s all it takes to make your source control life a little bit easier.  Even if you don’t prefer DiffMerge, I’d suggest you find one you do like, because the built-in tools are usually pretty bad.  Diffing and merging is hard enough as it is, don’t waste precious brain cells on substandard tools.

Why?

Last year we launched a new version of SportsCommander.com, which offered volleyball organizations across the country the ability to promote their tournaments and accept registrations for a negligible fee.  Having grown out of our previous hosting company, we tried hosting the platform on Windows Azure, and for the most part it’s been great.  Also, the price was right.

We are also hosting our data in SQL Azure, which again for the most part has been great.  It has performed well enough for our needs, and it abstracts away a lot of the IT/DBA maintenance issues that we would really rather not worry about.

Of course, nothing is perfect.  We’ve had a few snags with Azure, all of which we were able to work around, but it was a headache. 

One of the biggest issues for us was the ability to run regular backups of our data, for both disaster recovery and testing purposes.  SQL Azure does a great job of abstracting away the maintenance details, but one of the things you lose is direct access to the SQL backup and restore functionality.  This was almost a deal-breaker for us.

Microsoft’s response to this issue is that they handle all of the backups and restores for you, so that if something went wrong with the data center, they would handle getting everything up and running again.  Obviously this only solves part of the problem, because many companies want to have their own archive copies of their databases, and personally I think doing a backup before a code deployment should be an absolute requirement.  Their answer has been “if you need your own backups, you need to build your own solution.”

Microsoft is aware of this need, and it has been the top-voted issue on their Azure UserVoice site for a while. 

In poking around the interwebs, I saw some general discussion of how to work around this, but very little concrete detail.  After hacking around for a while, I came up with a solution that has worked serviceably well for us, so I figured I’d share it with y’all.

 

What?

In order to address these concerns, Microsoft introduced the ability to copy a database in SQL Azure.  So, as a limited backup option, you can create a quick copy of your database before a deployment, and quickly restore it back if something fails.  However, this does not allow for archiving or exporting the data from SQL Azure, so all of the data is still trapped in the Azure universe.

Apparently another option is SSIS.  Since you can connect to Azure through a standard SQL connection, theoretically you could export the data this way.  Now I am no SSIS ninja, so I was just never able to get this working with Azure, and I was spending far too much time on something that I shouldn’t need to be spending much time on.

I’ve heard rumblings Microsoft’s Sync Framework could address the issue, but, uh, see the previous point.  Who’s got time for that?

So of course, Red Gate to the rescue.  Generally speaking, their SQL Compare and SQL Data Compare solve this type of problem beautifully; they are excellent at copying SQL content from one server to another to keep them in sync.  The latest formal release of their products (v8.5 as of this writing) does not support SQL Azure.  However, they do have beta versions of their new v9.0 products, which do support SQL Azure.  Right now you can get time-locked beta versions for free, so get yourself over to http://www.red-gate.com/Azure and see if they are still available.  If you're reading this after the beta program has expired, just pony up the cash and buy them; they are beyond worth it.

 

How?

OK, so how do we set this all up?  Basically, we create a scheduled task that creates a copy of the database on SQL Azure, downloads the copy to a local SQL Server database, and then creates a zipped backup of that database.

First, you need a SQL Server database server.  And go install the Azure-enabled versions of SQL Compare and SQL Data Compare.

Also, go get a copy of 7-Zip, if you have any interest in zipping the backups.

The scheduled task will execute a batch file.  Here’s that batch file:

SET SqlAzureServerName=[censored]
SET SqlAzureUserName=[censored]
SET SqlAzurePassword=[censored]

SET LocalSqlServerName=[censored]
SET LocalSqlUserName=[censored]
SET LocalSqlPassword=[censored]

echo Creating backup on Azure server
sqlcmd -U %SqlAzureUserName%@%SqlAzureServerName% -P %SqlAzurePassword% -S %SqlAzureServerName% -d master -i C:\SQLBackups\DropAndRecreateAzureDatabase.sql

echo Backup on Azure server complete

echo Create local database SportsCommander_NightlyBackup
sqlcmd -U %LocalSqlUserName% -P %LocalSqlPassword% -S %LocalSqlServerName% -d master -i C:\SQLBackups\DropAndRecreateLocalDatabase.sql

echo Synchronizing schema
"C:\Program Files (x86)\Red Gate\SQL Compare 9\SQLCompare.exe" /s1:%SqlAzureServerName% /db1:SportsCommanderBackup /u1:%SqlAzureUserName% /p1:%SqlAzurePassword% /s2:%LocalSqlServerName% /db2:SportsCommander_NightlyBackup /u2:%LocalSqlUserName% /p2:%LocalSqlPassword% /sync

echo Synchronizing data
"C:\Program Files (x86)\Red Gate\SQL Data Compare 9\SQLDataCompare.exe" /s1:%SqlAzureServerName% /db1:SportsCommanderBackup /u1:%SqlAzureUserName% /p1:%SqlAzurePassword% /s2:%LocalSqlServerName% /db2:SportsCommander_NightlyBackup /u2:%LocalSqlUserName% /p2:%LocalSqlPassword% /sync

echo Backup Local Database
for /f "tokens=1-4 delims=/- " %%a in (‘date /t’) do set XDate=%%d_%%b_%%c
for /f "tokens=1-2 delims=: " %%a in (‘time /t’) do set XTime=%%a_%%b
SET BackupName=SportsCommander_Backup_%XDate%_%XTime%
sqlcmd -U %LocalSqlUserName% -P %LocalSqlPassword% -S %LocalSqlServerName% -d master -Q "BACKUP DATABASE SportsCommander_NightlyBackup TO DISK = 'C:\SQLBackups\%BackupName%.bak'"

"C:\Program Files\7-Zip\7z.exe" a "C:\SQLBackups\%BackupName%.zip" "C:\SQLBackups\%BackupName%.bak"

del /F /Q  "C:\SQLBackups\%BackupName%.bak"

echo Anonymize Database For Test Usage
sqlcmd -U %LocalSqlUserName% -P %LocalSqlPassword% -S %LocalSqlServerName% -d SportsCommander_NightlyBackup -i "C:\SQLBackups\AnonymizeDatabase.sql"

 

The first thing this does is run a SQL script against the SQL Azure server (DropAndRecreateAzureDatabase.sql).  This script will create a backup copy of the database on Azure, using their new copy-database functionality.  Here’s that script:

DROP DATABASE SportsCommanderBackup
GO
CREATE DATABASE SportsCommanderBackup AS COPY OF SportsCommander
GO
DECLARE @intSanityCheck INT
SET @intSanityCheck = 0
WHILE(@intSanityCheck < 100 AND (SELECT state_desc FROM sys.databases WHERE name='SportsCommanderBackup') = 'COPYING')
BEGIN
-- wait for 10 seconds
WAITFOR DELAY '00:00:10'
SET @intSanityCheck = @intSanityCheck+1
END
GO
DECLARE @vchState VARCHAR(200)
SET @vchState = (SELECT state_desc FROM sys.databases WHERE name='SportsCommanderBackup')
IF(@vchState != 'ONLINE')
BEGIN
DECLARE @vchError VARCHAR(200)
SET @vchError = 'Failed to copy database, state = ''' + @vchState + ''''
RAISERROR (@vchError, 16, 1)
END
GO

 

A few notes here:

  • We are always overwriting the last copy of the backup.  This is not an archive; that will be on the local server.  Instead, this is always the latest copy.  Besides, extra Azure databases are expensive.
  • For some reason SQL Azure won't let you run a DROP DATABASE command in a batch with other commands, even though SQL 2008 allows it.  As a result, we can't wrap the DROP DATABASE in an "IF(EXISTS(" clause.  So, we need to always just drop the database, which means you'll have to create an initial copy of the database by hand before the first time you run the script.
  • The CREATE DATABASE … AS COPY OF statement will return almost immediately, and the database will be created, but it is not done copying.  The copy is actually still running in the background, and it could take a minute or two to complete depending on the size of the database.  Because of that, we sit in a loop and wait for the copy to finish before continuing.  We put a sanity check in there to throw an exception just in case it runs forever.

 

Once that is complete, we create a local database and copy the Azure database down into that.  There are several ways to do this, but we chose to keep a single most-recent version on the server, and then zipped backups as an archive.  This gives a good balance of being able to look at and test against the most recent data, and having access to archived history if we really need it, while using up as little disk space as possible.

In order to create the local database, we run a very similar script (DropAndRecreateLocalDatabase.sql):

IF(EXISTS(SELECT * FROM sys.databases WHERE Name='SportsCommander_NightlyBackup'))
BEGIN
DROP DATABASE SportsCommander_NightlyBackup
END
CREATE DATABASE SportsCommander_NightlyBackup

 

In this case, we actually can wrap the DROP DATABASE command in an "IF(EXISTS(", which makes me feel all warm and fuzzy.

After that, it’s a matter of calling the SQL Compare command line to copy the schema down to the new database, and then calling SQL Data Compare to copy the data down into the schema.  At this point we have a complete copy of the database exported from SQL Azure.

As some general maintenance, we then call sqlcmd to back up the database out to a time-stamped file on the drive, and then call 7-Zip to compress it.  You might want to consider dumping this out to a Dropbox folder, and boom-goes-the-dynamite, you've got some seriously backed-up databii.

Lastly, we run an AnonymizeDatabase.sql script to clear out and reset all of the email addresses, so that we can use the database in a test environment without fear of accidentally sending bogus test emails out to our users, which I’ve done before and it never reflected well on us.
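The anonymize script itself isn't shown here because yours will depend on your schema, but it doesn't need to be anything fancy; something in this spirit (the table and column names below are made up for illustration, not from the actual SportsCommander schema):

-- Replace every user's email address with a safe, unique test address
UPDATE Users
SET EmailAddress = 'testuser' + CAST(UserID AS VARCHAR(20)) + '@example.com'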

Run that batch file anytime you want to get a backup, or create a scheduled task in Windows to run it every night.
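If you go the scheduled task route, a one-liner from an elevated command prompt will do it (assuming you saved the batch file above as C:\SQLBackups\BackupSqlAzure.bat; adjust the path and start time to taste):

schtasks /Create /TN "SqlAzureNightlyBackup" /TR "C:\SQLBackups\BackupSqlAzure.bat" /SC DAILY /ST 02:00 /RU SYSTEM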

Anyhoo, that’s about it.  It’s quick, it’s dirty, but it worked for us in a pinch.  Microsoft is just getting rolling on Azure and adding more stuff every month, so I’m sure they will provide a more elegant solution sooner or later, but this will get us by for now.

Have you had a similar experience?  How are you handling SQL Azure backups?

Prologue

So I was working for a company a few years ago. The company had been around for a while, and had a bunch of genuinely intelligent senior developers working for them.

Over the course of several years, inevitably a set of common problems arose. Writing data access code was repetitive and error-prone. Many teams were sharing the same database but using different code, so there were inconsistencies in how similar business data was treated. Each application's log files were all over the place, and each had its own approach for error handling. Some teams would go off on their own and unnecessarily reinvent the wheel. Moving a developer from one team to another required as much ramp-up time as a new hire. Each team used a different build and versioning strategy, with the most common strategy being "none." Setting up a test environment with multiple applications took days. Recreating the production environment was virtually impossible. Chaos. Dogs and cats living together. Mass hysteria.

To address these issues, some of the more senior architects took it upon themselves to build a framework that would greatly simplify everyone's life. By putting in a little design up front, they could build a framework layer that would solve many of the problems that the developers had been muddling through over the years, while at the same time homogenizing the code base.

Of course, the company had attempted this before. In fact, there were several previous frameworks over the years. But those previous frameworks were not as good as they could be, either because of design flaws, or changes in the way that the company’s applications work, or because they used outdated technology, or because the previous no-longer-with-the-company designers were now generally considered to be idiots. Anyhow, the new architects had learned from these mistakes, and were designing a new framework that would do a much better job of solving the problems. Due to the scope of such a project, it had been a work-in-progress for about a year. Sure, they were still working out some kinks, and it was not completely finalized yet, but this is a technology investment, and some growing pains were to be expected.

Déjà vu

Well, OK, I lied. This wasn’t one company I worked for. It was several different companies, all with the same story. In fact, it’s a little eerie how much you see this exact scenario play out at companies all over the industry.

Some senior developers have identified some recurring pain points for the developers, and they want to do something about it. As the company has grown, more and more developers have come on board, each with less and less experience, and things need to be brought back under control. By providing a framework, you can lay out the boundaries for developers to operate in, which will encourage consistency, will encourage code reuse, and in the end will allow the company to produce higher-quality software in less time with fewer developers, which will require less maintenance cost over time. In other words, it will pursue the single ultimate goal that should be at the center of every design decision, namely that which will advance the overall long-term profitability of the company more than any other option.

It sounds like a brilliant idea. And if it were to be accomplished, it would be great. But the unfortunate truth is that it doesn't work. Without exception, I have only seen this result in more work for the developers, longer development cycles, more bugs, poorly compromised designs, and (worst of all) excessive, unhealthy conflict between the development teams.

Admit it, you’ve seen it too. Maybe you were even involved.

So What’s The Problem?

So why does this go wrong?

The problem is usually not intent. I don't want to make it sound like the people involved are morons or that they have any desire to do harm to their company. In fact, usually they are excellent developers who are trying to do the best they can to solve a problem. I can't fault them for that. I just think the approach is a little misguided.

And the problem is not the people down in the trenches, pushing back on every change the framework team wants to introduce. No, these people are trying to get a job done. Their marching orders are not to solve the whole company’s crosscutting problems, but to ship their product on time and in budget, and many of them believe, perhaps rightfully so, that the framework keeps them from doing that as efficiently as they could.

Again, the problem is the approach.

The Challenge Of Frameworks

So what is a framework? Generally, people think of a framework as something that helps you get your job done by providing access to new functionality that you didn’t have before. This is usually the selling point, used when convincing a manager or developer that this is all such a great idea, but the reality is that the true nature of a framework lies not in what it helps you to do, but rather in how it limits you.

For example, the Slightly-Almighty, Moderately-Informative, Usually-Reliable Wikipedia says:

A software framework, in computer programming, is an abstraction in which common code providing generic functionality can be selectively overridden or specialized by user code providing specific functionality…

Software frameworks have these distinguishing features that separate them from libraries or normal user applications:

1. inversion of control – In a framework, unlike in libraries or normal user applications, the overall program’s flow of control is not dictated by the caller, but by the framework.[1]

2. default behavior – A framework has a default behavior. This default behavior must actually be some useful behavior and not a series of no-ops.

3. extensibility – A framework can be extended by the user usually by selective overriding or specialized by user code providing specific functionality.

4. non-modifiable framework code – The framework code, in general, is not allowed to be modified. Users can extend the framework, but not modify its code.

 

One of the key points here is that the framework is dictating the application flow, rather than the developer who is using it. This is what Martin Fowler (who literally wrote the book on refactoring) would describe as a Foundation Framework:

A Foundation Framework is … built prior to any applications that are built on top of it. The idea is that you analyze the needs of the various applications that need the framework, then you build the framework. Once the framework is complete you then build applications on top of it. The point is that the framework really needs to have a stable API before you start work on the applications, otherwise changes to the framework will be hard to manage due to their knock-on effects with the applications.

While this sounds reasonable in theory, I've always seen this work badly in practice. The problem is that it's very hard to understand the real needs of the framework. As a result the framework ends up with far more capabilities than are really needed. Often its capabilities don't really match what the applications really need.

 

He recommends instead a Harvested Framework:

To build a framework by harvesting, you start by not trying to build a framework, but by building an application. While you build the application you don’t try to develop generic code, but you do work hard to build a well-factored and well designed application.

With one application built you then build another application which has at least some similar needs to the first one. While you do this you pay attention to any duplication between the second and first application. As you find duplication you factor out into a common area, this common area is the proto-framework.

As you develop further applications each one further refines the framework area of the code. During the first couple of applications you’d keep everything in a single code base. After a few rounds of this the framework should begin to stabilize and you can separate out the code bases.

While this sounds harder and less efficient than FoundationFramework it seems to work better in practice.

 

I’m not sure I would even call this a framework, because all of the things that make it work best are the parts that take it further and further from being a conventional “framework”.

So Are All Frameworks Bad?

Sweet suffering succotash, no. In my humble opinion, the .NET Framework is a thing of beauty. Back in my Win32 C++ days, MFC was not perfect, but worked serviceably well for what it was intended for, namely abstracting away the Win32 API. CMS frameworks like DotNetNuke and Drupal and Joomla have become very popular. Apparently there is a subset of people who don’t hate Flash with a passion, and apparently those people love it. MVC frameworks like Rails and Django have caught on like wildfire, with ASP.NET MVC picking up a lot of momentum as well. Microsoft Azure and Google AppEngine are in the process of changing how we will build scalable cloud-based applications into the next decade.

Have you noticed a pattern here? None of them were built by you or anyone you know. They were not built to solve a business need, they were built to reinvent a platform. They were not built to get everyone using a “Customer” object the same way, they were built to make it easier for you to do whatever you want with whatever data you need. They were not built by 3 architects for 20 developers, they were built by 30 or 300 architects for 20,000 or 200,000 developers or more. They were not designed and built and delivered and completed in a few months, they were designed and dog-fooded and tested and tweaked and redesigned over years by some of the smartest computer science experts in the business. And yet, despite all that, most of them still sucked, and the ones we use today are the select few that survived.

The thing is this: you and your internal development team of architects are not going to build the next great framework. You’re not going to build a good one. You’re not even going to build an acceptable one.

And the other thing is this: If a framework is not great, it is awful, counterproductive, and destructive.

Get In Line

By definition, most frameworks try to define a new way for your developers to develop software. They keep you from doing certain things that have been seen as problematic, and require you to do things the “right way”, assuming of course that the architects have actually thought through the right way to do things.

The problem is that there are plenty of good ways already, ways that those developers are already trained in and have spent years mastering, and you are not really as clever as you think you are. You can’t think of everything. Even if you could, you can’t design for everything. And even if you could, you shouldn’t. Trying to shoehorn them into an incomplete, shoddy, and unnecessarily restrictive framework will only breed resentment, at which point you are bleeding money that you’ll never see on an expense report. The productivity difference between a happy developer and a disgruntled developer is enormous, and constantly underestimated. You will also alienate and drive away all of your good developers, leaving you only with the not-so-great developers that really don’t have any better options.

Atlas Shrugged

In order to create a framework, you are taking on a massive responsibility. It’s not a case of adding a few features. You are building an entirely new layer of abstraction. In doing so, it is your responsibility to ensure that your framework provides the developer with every possible thing he will need, otherwise he will be stuck. If you create a data access framework, but never quite could figure out how to get BLOBs working, you’re really leaving the developer up a creek when he needs to store BLOBs. Sure, it’s a growth process, and there are always bugs to be fixed and features to be added, but when you are forcing a development team to use this framework, and in the 11th hour they realize it doesn’t have a critical feature that they need, you are introducing more obstacles than you are removing.

But We Have To Create Reusable Code!

No you don’t. Reusable code is bad.

So What?

So do we give up? Maybe. It depends.

So how do you boil the ocean? There are two answers:

1. You don’t. It’s too big of a problem.

2. One pot at a time.

It all depends on your goal.  Is it critical that you actually boil the entire ocean, or do you benefit from every little bit?

Ask yourself, do you REALLY need a framework? Do you REALLY have a problem here that needs to be solved? Do you REALLY think you will save your company time and money by pursuing this? Be honest. Try to be objective. If you find yourself getting the slightest bit excited about the idea of building a framework, recuse yourself from the decision because you are not thinking clearly.

Sure, a well-designed framework may save time and money once it is complete, but it may never be complete, and it may never be any good, and the small improvement may not save your company enough to justify the huge expense. As awful as it may seem, the honest answer may be that it is in your company’s best interest to plow ahead with the same busted up junk you’ve had all along. It may not be the most rewarding thing in the world, but you are probably not getting paid to fulfill your dreams, you are getting paid to write the damn code.

So what do we do? Are we doomed to mediocrity? Not necessarily. The other option is to get your head out of the clouds and solve a few small problems at a time. Keep your eye out for redundancies and duplicated code, make note of them, but don’t do anything right away. When you are building a new component, don’t pull your hair out if it slightly resembles some other code, you don’t have to reuse everything. Once you’ve identified a few low-hanging redundancies, go back and build some small libraries to consolidate that code. Don’t think big picture. Keep it simple. Keep it low-impact. Keep it clean. Put the big guns away. Keep constantly refactoring to make it a little better every day, and before you know it you’ll have a system that doesn’t completely suck.

Too much legacy code never gets cleaned up because everyone thinks it is too hard to throw it all out and rewrite it, and no project manager will allow the developers to waste time refactoring code that already works. They are all probably right. A huge refactoring project is probably a waste of money, and it will almost certainly fail. But small, steady, incremental improvements will almost certainly make your world a better place.

OK, maybe it’s not always inherently bad, but it is definitely not what it’s cracked up to be. And it’s definitely not the end-all-be-all that most of us were taught.

In my early years, I was dogmatically taught never to write the same code twice. Anytime you see duplicated code, or even similar code, refactor it into a function, right? What if the code was slightly different? Create some parameters. What if you need one extra field in some cases? Add a little logic around that to decide what to return. What if you run into a case where you can’t fix one caller without breaking another caller? Add another layer of abstraction and logical branching to your function. At this point, you started out trying to improve your code-hygiene a little bit, and all you’ve done is spend the whole day creating a monstrosity that is almost impossible to understand. Well sure, you don’t think so, but hey, nobody likes to admit their kids are ugly.

Always, always, always, stop and ask yourself, “Am I creating more problems than I am solving?”

Keep It Simple, [Obscenity]!

The fact of the matter is, when you try to create reusable code, you are inevitably making things a lot more complicated. It’s rarely a case of just taking what you have already written and exposing it to another component or team.

If you are using the code for yourself, you just have to make sure it works for you based on how you are going to use it, and you should be able to clearly visualize all of the possible scenarios. However, in order to make the code reusable, you are creating a black box to be called by another developer; as such you need to make it work no matter what, with any sort of crazy input, and you will need to make sure that it is clearly reporting back to the caller when something goes wrong. You need to make sure that the interface makes sense to everyone, not just you. You may even need to write up some documentation (ugh). And when you are all done and have covered every base, someone will call your component in a way you never expected, and of course it won’t work because all of the code we write is rife with bugs, and everyone will argue about how this component “should” be used, and someone will demand to see the unit test coverage stats, and someone else will schedule a meeting to define the scope, and someone else will ramble on about how we need a framework, and the manager will roll his eyes and swear never to let these damn developers try to build something fancy again, and nobody will actually be getting any real work done.

Worser Is More Gooder

Of course, some will argue that you should always try to make your components as reusable as possible, because it requires you to work through all of those quality issues no matter what, and will produce better software. This would be great if your sole goal was to create the best software in the world, but nobody is paying you to do that. No, you are being paid to write software of sufficient quality within the time and budget allotted. If you spend unnecessary time gold-plating your code, it may make you feel cool, but it is outright wasteful. I’m not saying to cut corners when you should not, but what I AM saying is that there definitely are times when it actually does make sense to cut corners. You need to draw a line in the sand of exactly how good this product really needs to be and stick to that, otherwise you’ll never finish. You can build software that is really cool, or you can build software that ships. Only one of those options is actually useful to society.

But, But, But….

OK, so don’t make something reusable if you don’t need to. However, what if you think you are going to need it to be reusable? I have a secret to tell you: YAGNI.

I can’t tell you how many times I’ve been working on a very specific implementation to solve a very specific problem, and somebody (who usually has initials in his job title) chimes in saying “well, make this and that and the other thing configurable, so when we sell some other product to some other company, we can include it just by flipping a few switches.” This person is figuring that “it’s just an ‘if’ statement, what’s the big deal?”, but that person is not taking into account the increased code complexity and the exponential increase in testing requirements. Often times, this is even a former developer, who actually knows how code works, but has been out of the trenches for long enough that it has impaired their judgment about code complexity and its day-to-day impacts.

But then again, math solves everything. Say your QA department originally had 143 test scripts to run through before release, and now you’ve added some new configurable switch into the system. If you really want to be thorough, you should now have 286 test scripts, because you now have to retest everything with the switch turned on, and then again with the switch turned off. Say you had actually added two different switches? 572 test scripts, because each independent switch doubles the number of combinations you need to cover. That’s a lot of work for a feature that will never get used. And come on, between you and me and the desk, we all know that Mr. Executive SVP COO is never going to close those magical deals with XYZ Bank and ABC Manufacturing, partly because he has no idea what he’s selling, but also partly because you guys can’t ship anything because you now have 1144 test scripts to run through to get a small release out the door.

So How Do I Know What To Reuse?

If you aren’t going to stand around and guess what to reuse, how will you know? Easy, stop trying to prematurely reuse anything. Instead, refactor your existing code to reuse that which is already duplicated, where you can reasonably say it will actually benefit from reuse and reduce overall maintenance and complexity.

How do you know when that is? Use that fancy experience and judgment of yours, it’s the only thing that separates you from the robots that will be doing your job some day.

Well what if I don’t have the necessary experience and judgment? Easy, pay me to write your code for you. mmooney@mmdbsolutions.com. </plug>

Anyhow, my rule of thumb is: Never build something to be reused until you actually have two systems that actually need to reuse it right now. If you only have one thing that will use it right now, and that other mythical feature will also use it some day, deal with it when that feature comes along (hint: never). Even if that feature comes along, the chances that you actually predicted and designed for it correctly are approximately zero, so you’re still going to have to make design changes anyway. However, if you kept it lean and simple, it’s simply a matter of adding the new code you need to a small lightweight base. But if you tried to build something big and fancy and configurable, you’ll now have to spend twice as much time disassembling all the unnecessarily complicated junk you built around it to support the thing that you were unsuccessfully trying to support in the first place. The easiest code to refactor is the code that doesn’t yet exist.

So yeah, YAGNI.

 

This post is part 2 of a 382-part series on how to manage database changes, primarily for SQL Server, starting here.

I figured this would be a good week to discuss ways that you can make your world a better place by making small changes to things you do in your everyday work.  No, this post is not about inconvenient truths or sustainability or hybrids or manbearpig.  This post is about the importance of local development databases. 

image

The situation you see all too often is that a development team has a single database server in their development environment that they share, and everyone is developing application code locally while simultaneously making changes to a shared database instance.  Bad, bad, bad.

Captain Obvious

These days, most everyone develops their code locally.  That’s just what you do.  Many developers have learned the hard way that this is important, and won’t tolerate any divergence.  And for the less experienced developers who are doing it just because they are told to, eventually they will make the same mistakes and learn from them too.  This is such an easy lesson to learn that you don’t see too many people violate it intentionally.

Even if you HAVE to develop on a server environment, you’ll probably at least find a way to isolate yourself.  For example, SharePoint developers don’t tend to install the SharePoint platform on their local machines, mostly because it requires a server OS, but also because SharePoint is pure, unadulterated evil that will steal the very soul of any machine it comes into contact with.  Nonetheless, in those cases where a local machine is not practical, the developer will install the SharePoint software onto a virtual machine so that they can still work in isolation.

This local development approach is critically important to any form of version control or change management.  For all practical purposes, developers must have a workable environment that they can fully control and work in peace.  From there, developers check their code into source control, and hopefully it gets built from source control before being deployed to another server.  This gives each developer a degree of control over how much the other developers can screw them up, and more importantly it ensures that every change is traceable back to a date and time and person responsible.

This approach is so ingrained in so many developers that we often take it for granted.  Just try to remind yourself regularly how awful it was that time that everyone was working directly on the same developer server, and nobody could keep track of who changed what when.  Or better yet, how fired up everyone got the last time somebody sneaked directly into the production server and started mucking around.

The database is like the village bicycle

I am consistently surprised how often developers go to so much trouble to isolate their local application development environment, and then point their local application code to a shared development database server that the whole team is working on.

If you never need to make database changes, and nobody on your team needs to make database changes, this can certainly work.  In that case, the database behaves like a third party service, rather than an actively developed part of the system.

However, if you are ever making database changes, you need to isolate your database for the exact same reasons that you need to isolate your application code.

Imagine that you are working on a project that involves several layers of DLLs communicating with each other.  Because you are in active development, you and your team are constantly making changes that affect the interfaces between those DLLs.  The result is that you continually need to check in your changes in whole batches; you can’t just check in a few files here and there because you will be breaking the interfaces for anyone else working in that code.

The same rules must apply to the databases as well, for all of the same reasons.  At any given point in time, anyone should be able to pull the code that is in source control, build it, and run it.  However, if I’m making a series of changes to my local code and the shared development database, my crazy C# changes are isolated on my local machine, but my coworkers are getting my database changes as they happen, so their systems will stop working all of a sudden, and they won’t even know why, or worse yet they will know exactly why and I’ll be the guy “who busted everything up.”

Better yet, after a few days of wasting time on a bad design, I give up on it, and with one or two clicks I can undo all of my code changes and roll back to the main development code stream.  However, there is no one-click rollback to the database schema, and so now those changes need to be manually backed out.  Hopefully I kept a good list of the changes so I can do this without missing anything, but we all know that a few things will get missed, and now the development database becomes even more of a mutant branch of the true database schema, full of changes that nobody remembers or owns, and it is all going to blow up and make us all look like fools when we are rushing to deploy it into QA next month.
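
For what it’s worth, one low-tech habit that makes this less painful: whenever you write a change script, write the rollback script in the same sitting, while you still remember exactly what you touched.  A minimal sketch, with made-up file and object names just for illustration:

-- 0042_add_customer_middlename.sql (the change)
ALTER TABLE dbo.Customer ADD MiddleName NVARCHAR(50) NULL;
GO

-- 0042_add_customer_middlename_rollback.sql (the undo, written at the same time)
ALTER TABLE dbo.Customer DROP COLUMN MiddleName;
GO

It’s still not one click, but at least the undo exists somewhere other than your memory.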

DVCS isn’t all bad

Distributed Version Control Systems like Git and Mercurial are the fun new fad in version control, and everyone seems to think that they are so much cooler than more traditional and linear systems like Vault.  To me, it seems to grossly overcomplicate an already difficult issue by exacerbating the most problematic concepts, namely branching and merging.  But I’m a crusty old conservative who scoffs at anything new, so maybe (and even hopefully) I’m wrong.  I was quick to dismiss it as a new toy of bored astronauts, but some people a lot smarter than me have done the same and seem to be coming around to it, if not embracing it completely, so I will continue to believe that I am right for now, even though I know I’m probably wrong and will change my mind eventually.

But there is one idea in DVCS systems that I can get on board with, and that’s the idea that everyone is working in their own branch.  As we’ve discussed, you simply cannot be working in the same sandbox as everyone else, or you will have intractable chaos.  You should stay plugged into what everyone else is doing on a regular basis, usually through version control, but you must also isolate yourself, and you must do so thoroughly.

And here’s the thing (and this may very well be the idea that eventually opens my path to DVCS enlightenment): your local machine is a branch.  Granted, it is not a very robust branch, because it only has two states (your current state and the latest thing in source control), but you are still essentially branched until you check in, at which point you will have to merge.  It might be a really small merge, because the changes were small or backwards compatible, or because you were not branched off locally for that long, or you are the only one working on a feature, or because you’ve been communicating with the rest of your team, or because you are the only person who actually does any work, but you are merging nonetheless.

What does this have to do with databases?  Branching is all about isolation.  You must isolate your development environment, and you must do so thoroughly.  If you think of your machine as simply a branch of the source code, it crystallizes the idea that everything you are doing locally is a full stream of code, and it must contain everything needed to run that code, and must represent all of the changes in that code, including the database.  In a broader view, if you were to branch your code to represent a release or a patch or feature, you obviously should be branching your database code at the same time (assuming of course that your database is under version control).  If that is the case, and if the code on your local machine is nothing more than a primitive branch of what is in source control, then your local machine should also have its own copy of the database.

Database scripting is hard, let’s go shopping!

I know.  This makes pushing your code to the development server more difficult, because you have to script everything out, and honestly the last thing I really want to do is complicate the lives of developers.  In fact, I think the primary purpose of most development processes should be to reduce friction, and to make a developer’s life as worry-free as possible, so that they can focus on the real complicated business problems they are paid to solve, not the silly process crap that people invent to make themselves appear smart and organized.

That being the case, it may make your life a little harder to develop locally, and then write up all of the scripts necessary to push those changes to the dev server, but it is definitely worth it.  This is not a theoretical improvement that will hopefully save you time in the distant future, when design patterns rule and everybody’s tasks are on index cards and you’ve achieved 100% code coverage in your unit tests.  No, this is a real, tangible, and immediate benefit, because you will save yourself effort when you deploy it to the next stage, namely QA or production.  At that point, you’ll already have everything organized and listed out, and you did so when you were still working on the code and everything was still fresh in your mind.  In my humble opinion, this is a much more maintainable process than everyone just thrashing around in a wild-west development database, and then after it all spending days trying to figure out which schema differences need to be included to release which features, because the chances of getting that right consistently are almost non-existent.
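
To make those scripts less painful, I try to write them so they can be run more than once without blowing up, which means the same script can be run against dev, QA, and eventually production without anyone having to remember what state each environment is in.  A rough sketch of what I mean (the table and column names are made up for illustration):

-- Adds the new column only if it isn't already there,
-- so re-running the script against any environment is harmless
IF NOT EXISTS (SELECT * FROM sys.columns
               WHERE object_id = OBJECT_ID('dbo.Customer')
                 AND name = 'PreferredName')
BEGIN
    ALTER TABLE dbo.Customer ADD PreferredName NVARCHAR(100) NULL;
END
GO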

And if this is really too much work for you to do well, maybe we can find you a ball to bounce with instead.  Or maybe some UML diagrams and crayons to draw with.  Either way, get the hell out of my code.

Beating a dead horse

Hopefully I’ve convinced you that you should have your own copy of the database for your local development.  I could go on forever giving reasons for this.  Or, I could give specific examples, like a recent client that did not follow this pattern, where our team was constantly breaking each other because the code and database were out of sync, even though we were just a small team of 3 experienced and practical developers sitting in a single office right next to each other, working together well and communicating all day, but the lack of database isolation made the issues unavoidable.

So yeah, I could go on and on.  But I won’t, because it’s getting boring.  If you’re agreeing with me by now, feel free to go read something else.

I can’t run SQL Server locally, my machine sucks!

I’ve heard a lot of people say that they can’t run SQL Server locally, and sometimes they are right, but I think a lot of the time it is an excuse.

Maybe you don’t have a license to SQL Server.  That’s usually fine, SQL Server Express Edition is free.  Sure, it has some limitations, like the lack of a SQL profiler, but there are great free tools out there like the one from AnjLab.  And if you still need a full-featured copy, the developer edition costs less than $50.  Can you or your company not spare $50 for something like that?  Really?

Or maybe your machine doesn’t have enough memory.  It’s true, SQL will eat up memory like nobody’s business and if you have Visual Studio 2008 and Outlook 2007 running, it can be pretty heavy.  But I’ve found that as long as you have 3 or 4 GB of RAM, it works pretty well, and doesn’t everyone have that these days?  Sure, a lot of you are stuck with crappy old machines that your employer gave you because he considers you to be a high-priced janitor, and he can’t justify in his mind spending a few hundred extra to help you be more productive, but in that case you have bigger problems than anything we’re going to solve here.  I would say, if possible, you should even shell out a few hundred and get some more memory for your machine, even if it’s a work machine and they won’t reimburse you for it.  I know plenty of people who would be opposed to this just out of principle, but those people and their principles can go have their own little pity party and see who comes; in the meantime I’d rather solve the problem and move on.

Too Beaucoup?

However there is certainly one potential problem that can be difficult to overcome.  What if your existing database is just too damn big to run locally?

One recent client had a production database which was used for a million unrelated purposes, and it was 30GB.  Another recent client had the vast majority of their business data spread across two databases that were each about 300 GB.  Sometimes, the database is just too big to copy down to your local machine.  There are a few ways to deal with the problem.

Sometimes the best option is to separate the schema and the data.  Strip down the data, get rid of the 300 GB, and get the minimum amount of sample data necessary to run your applications.  Maybe clear it out entirely, and have some scripts or automated tests that generate a batch of sample data.  Often times this will require a lot of analysis to determine what is necessary, and what all of the data is being used for, but that’s not an entirely bad thing.  If you get a good and portable development database out of it, while also getting a better understanding of how the data is being used, then that has a lot of benefits.  Granted, this is not easy by any stretch, but it may be doable.  It all depends on your situation.
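
As a rough sketch of the kind of sample-data script I mean (all of the table and column names here are invented), something as dumb as this is usually enough to let the application run locally:

-- Generate a small, predictable batch of sample customers for local development
SET NOCOUNT ON;
DECLARE @i INT = 1;
WHILE @i <= 100
BEGIN
    INSERT INTO dbo.Customer (FirstName, LastName, CreatedDate)
    VALUES ('Test' + CAST(@i AS NVARCHAR(10)), 'Customer', GETDATE());
    SET @i = @i + 1;
END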

Another option is to setup a single high-powered development database server, and give each developer their own instance of the database on the server.  This approach can have its own problems as well, such as people getting confused about which database instance belongs to who, and having enough disk space to store the inevitable terabytes of data.
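
If you do go that route, at least adopt a naming convention so there is no argument about whose copy is whose.  Something along these lines (the database names are invented for illustration), with each developer pointing their local connection string at their own copy:

-- One copy of the database per developer on the shared dev server
CREATE DATABASE MyApp_Dev_JSmith;
GO
CREATE DATABASE MyApp_Dev_MMooney;
GO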

So if you have any of these problems, my sympathies to you, and I hope that you can find a workable solution. 

The End

So hopefully you are now inspired to change your process a little bit, or just entertained for a little while. 

Or, if you’ve been trying to do the same thing but aren’t getting the buy-in you need, you have a little more reference material for saying “I read an article about this on the interwebs, and some guy said…”.  It’s one reason I make these things so long, as a favor to you, because the person you are arguing with isn’t actually going to read all of this crap, so maybe they will just give up and let you do whatever you want.  The person that wins the argument usually isn’t the person who is right, it’s usually the person who is willing to waste the most time arguing about it.

Good luck!