A while ago I had some posts about how to set up simple backups of SQL Azure to make up for a few holes in the tooling (here and here). I recently ran into the same issue with RavenDB, and it required stringing a few pieces together, so I figured I’d write up the steps.
TL;DR
Yet again I started out to make a quick how-to, and ended up going into a lot of detail. Anyhow, here’s the short version:
- Download s3.exe from s3.codeplex.com
- Run this:
Raven.Smuggler.exe out http://[ServerName]:[Port] [DatabaseName].dump --database=[DatabaseName]
s3 auth /nogui [AccessKeyID] [SecretAccessKey]
s3 put /nogui [S3BucketName]/[TargetS3Folder]/ [DatabaseName].dump
What is RavenDB?
RavenDB is a flat-out awesome document database written in .NET. It’s sort of like MongoDB or CouchDB, but very Windows- and Microsoft-friendly, and has much better ACID and LINQ query support than your average document database.
Whether a document database is right for your project is a complicated question, well beyond the scope of this post, but if you decide you need one, and you’re doing .NET on Windows, RavenDB should be the first one you check out.
While document databases are not ideal for every situation, I’ve found them to be very good for message-based applications, which pump data messages from one queue to another. I’ve used RavenDB as the default primary data store for SrirachaDeploy and MMDB.DataService, and besides a few bumps in the road, it’s worked great.
Types of backups in RavenDB
RavenDB offers two flavors of backups. One is the official “Backup and Restore” feature, which is very similar to a SQL Server backup/restore, including advanced options like incremental backups. This does a low-level backup of the ESENT files, including index data. Restores are all-or-nothing, so you can’t import a file if you’re going to overwrite data in the process.
The other type is the “Smuggler” feature, which is more of a data import/export utility. This generates an extract file that contains all of the documents, index definitions, and attachments (controllable by command line parameters). It’s worth noting, though, that while Smuggler will back up the index definitions, it does not back up the index data, so after you import via Smuggler you may have to wait a few minutes for your indexes to rebuild, depending on your data size. Since it’s just a data import, you can import it into an existing database without deleting your existing records. It will just append the new records, and overwrite existing records if there is a matching ID.
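To make the append-or-overwrite behavior concrete, here is a tiny sketch that models it with plain Python dictionaries keyed by document ID. This is only an illustration of the behavior described above, not how RavenDB implements it, and the function name and sample documents are made up for the example.

```python
def smuggler_style_import(existing, imported):
    """Model of the import behavior described above.

    Documents with new IDs are added; documents whose ID already
    exists are replaced by the imported version. Nothing is deleted.
    """
    merged = dict(existing)   # copy, so the "database" we started with is untouched
    merged.update(imported)   # add new IDs, overwrite matching IDs
    return merged


# Hypothetical documents, keyed by document ID.
existing = {"users/1": {"name": "Alice"}, "users/2": {"name": "Bob"}}
imported = {"users/2": {"name": "Robert"}, "users/3": {"name": "Carol"}}

merged = smuggler_style_import(existing, imported)
print(sorted(merged))             # all three IDs are present after the import
print(merged["users/2"]["name"])  # the imported version of users/2 wins
```

Note that users/1 survives even though it is not in the import file, which is the key difference from an all-or-nothing restore.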
The simplest way to get started with either option is to try them out in the RavenDB user interface. The RavenDB UI is continually evolving, but as of this writing, under the Tasks section there are Import/Export Database options that use Smuggler, and a Backup Database option as well.
Personally, I prefer Smuggler. It’s very easy to use, the defaults do what I want them to do most of the time, and it can do a low-impact data import to an existing database without blowing away existing test data. Also, because the backup/restore feature uses some OS ESENT logic, it has some OS version portability limitations. In the end, I usually don’t want anything too fancy or even incremental. The first and foremost backup I want to get in place is “export ALL of my data in the most portable format possible on a regular schedule so I can always get another server running if this one fails, and I can restore it on my machine to recreate a production issue”, and Smuggler has fit that bill nicely.
RavenDB Periodic Backup
RavenDB does have a very cool feature called “Periodic Backup”. This option actually uses the Smuggler data export functionality, and runs incremental backups and uploads them to your hosting provider of choice (File system, Amazon Glacier, Amazon S3, or Azure storage).
The cool thing with this feature is that it’s easy to set up without many confusing options. My problem with this is that it doesn’t quite have enough options for me, or rather the defaults are not what I really want. Rather than doing incremental backups on a schedule, I want to be able to do a full backup any time I want. Unfortunately it doesn’t (yet?) offer the options to force a full backup, nor to force a backup on demand or at a specific time of day. I’m guessing that these features will continue to improve over time, but in the meantime this is not really what I’m looking for.
Smuggler Export 101
So how to get started with Smuggler? Of course, visit the documentation here, but here’s the short version for everything I usually need to do.
First, open a command line.
Yes, a command line. What, you don’t like using the command line? Oh well, deal with it. I know, I have hated the command line through much of my career, and I fought against it, and complained about it. Then I gave up and embraced it. And guess what, it’s not that bad. There are plenty of things that are just plain easier to do in a command line and don’t always need a pointy-clicky interface. So please, just stop complaining and get over it, it’s all a part of being a developer these days. If you refuse to use a command line, you are tying a hand behind your back and refusing to use some of the most powerful tools at your disposal. Plus we are going to be scripting this to run every night, so guess what, that works a lot better with a command line. I’ll throw in some basic command line tips as we go.
Anyhow, in your command line, go to the Smuggler folder under your RavenDB installation (usually C:\RavenDB\Smuggler on my machines).
Tip: You don’t have to type the whole line. Type part of a folder name and hit TAB, and it will autocomplete with the first match. Hit TAB a few times and it will cycle through all of the matches. You can even use a wildcard (like *.exe) with TAB and it will autocomplete the file name.
Type Raven.Smuggler.exe (or Raven + TAB a few times, or *.exe + TAB) to run Smuggler without any parameters, and you’ll get some detailed instructions.
The most common thing you want to do here is back up a whole database to a file. You do this with the command “Raven.Smuggler.exe out [ServerUrl] [OutputFileName]”.
Note: the instructions here will dump the System database (if you use http://localhost:8080/ or something similar as your URL), which is almost certainly not what you want. It’s not entirely clear from the documentation, but the way to export a specific database instance is to use a URL like “http://[servername]:[port]/databases/[databasename]”, or use the --database parameter at the end. For example, to back up my local demo database, I would use the command:
Raven.Smuggler.exe out http://localhost:8080/databases/demodb demodb.dump
or
Raven.Smuggler.exe out http://localhost:8080 demodb.dump --database=demodb
And off it goes:
Depending on the size of your database, this may take a few seconds to a few minutes. Generally it’s pretty fast, but if you have a lot of attachments, that seems to slow it down quite a bit. Once it’s done, you can see your output file in the same directory:
Tip: Are you in a command line directory and really wish you had an Explorer window open in that location? Run “explorer %cd%” to launch a new instance of Explorer defaulted to your current directory. Note: this doesn’t always work, like if you’re running the command line window in administrator mode.
Yes, that’s not a very big file, but it’s a pretty small database to start with. Obviously they can get much bigger, and I usually see backups getting up to a few hundred MB or a few GB. You could try to compress it with your favorite command line compression tool (I really like 7-Zip), but it’s not going to get you much. RavenDB already does a good job of compressing the content while it’s extracting it via Smuggler.
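If you end up generating these commands from a script, it helps to see the two export forms from earlier side by side. Here is a minimal sketch in Python that builds the argument list for either one. The function names (database_url, export_args) are just names I picked for this example, and the only Smuggler syntax it relies on is what is shown in the commands above.

```python
def database_url(server_url, database):
    """Build the per-database URL form: http://server:port/databases/name."""
    return f"{server_url.rstrip('/')}/databases/{database}"


def export_args(server_url, database, dump_file, use_database_flag=True):
    """Return the Smuggler export command as a list of arguments.

    With use_database_flag=True the database is named with --database=...;
    otherwise it is embedded in the URL. Either way the System database
    is not the one being exported.
    """
    if use_database_flag:
        return ["Raven.Smuggler.exe", "out", server_url, dump_file,
                f"--database={database}"]
    return ["Raven.Smuggler.exe", "out",
            database_url(server_url, database), dump_file]


# The two forms of the demodb export:
print(" ".join(export_args("http://localhost:8080", "demodb", "demodb.dump")))
print(" ".join(export_args("http://localhost:8080", "demodb", "demodb.dump",
                           use_database_flag=False)))
```

Building the command as a list rather than one long string also means you don’t have to worry about quoting if a file name ever has a space in it.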
Amazon S3
Next, you have to put it somewhere, preferably as far away from this server as possible. A different machine is a must, a different data center or even different hosting company is even better. For me, one of the cheapest/easiest/fastest places to put it is Amazon S3.
There are a few ways to get the file up to S3. The first option is to upload it straight from Amazon’s S3 website, although that can require installing Java, and you may not be into that kind of thing. Also, that’s not really scriptable.
Or you could use S3 Browser, which is an awesome tool. For end user integration with S3, it’s great, and I’ve gladly paid the nominal charge for a professional license, and recommended that all of my S3-enabled clients do the same. However, while it’s a great UI tool for S3, it is not very scripting friendly. It stores your S3 connection information in your Windows user profile, which means if you want to script it you need to log in as that user first, set up the S3 profile in S3 Browser, and then make sure you run the tool under that same user account. That’s a lot of headache I don’t really want to worry about setting up, much less remembering in 6 months when I need to change something.
One great S3 backup tool is CloudBerry. It’s not free, but it’s relatively inexpensive, and it’s really good for scheduling backups of specific files or folders to S3. Depending on your wallet tolerance, this may be the best option for you.
But you may want a free version, and you’re probably asking, “why is it so hard to just push a file to S3? Isn’t that just a few lines of AWSSDK code?”. Well yeah, it is. Actually it can be quite a few lines, but yeah, it’s not rocket science. Luckily there is a great tool on CodePlex that lets you do this: http://s3.codeplex.com/. It’s a simple command-line tool with one-line commands like “put” and “auth” to do the simplest tasks. To push your new file to S3, it would just be:
s3 auth /nogui [AccessKeyID] [SecretAccessKey]
s3 put /nogui [S3BucketName]/[TargetS3Folder]/ [FileToUpload]
So if we wanted to upload our new demodb.dump file to S3, it would look something like this:
And hey, there’s our backup on S3.
So we now have a 3-line database backup script:
Raven.Smuggler.exe out http://[ServerName]:[Port] [DatabaseName].dump --database=[DatabaseName]
s3 auth /nogui [AccessKeyID] [SecretAccessKey]
s3 put /nogui [S3BucketName]/[TargetS3Folder]/ [DatabaseName].dump
Just put that in a batch file and set a Windows Scheduled Task to run whenever you want. Simple enough, eh?
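One thing a plain batch file won’t do by default is stop when a step fails, so a failed export could still be followed by an upload of a stale file. If you would rather drive the same three commands from a script, here is a rough Python sketch that runs them in order and stops at the first failure. It assumes both tools are on the PATH (or in the working directory) and that they return a non-zero exit code when something goes wrong, which I have not verified for every failure case. The function names and the sample values in the usage comment are made up for illustration.

```python
import subprocess


def backup_commands(server_url, database, access_key, secret_key, s3_target):
    """Build the same three commands as the script above, as argument lists."""
    dump_file = f"{database}.dump"
    return [
        ["Raven.Smuggler.exe", "out", server_url, dump_file,
         f"--database={database}"],
        ["s3", "auth", "/nogui", access_key, secret_key],
        ["s3", "put", "/nogui", s3_target, dump_file],
    ]


def run_backup(commands, runner=subprocess.run):
    """Run each command in order, stopping at the first failure.

    With check=True, subprocess.run raises CalledProcessError when a
    command exits with a non-zero code, so later steps are skipped.
    The runner parameter exists only so the function can be tried out
    without the real tools installed.
    """
    for args in commands:
        runner(args, check=True)


# Example usage (placeholder values, not real credentials):
# run_backup(backup_commands("http://localhost:8080", "demodb",
#                            "MY_ACCESS_KEY", "MY_SECRET_KEY",
#                            "mybucket/backups/"))
```

The Scheduled Task would then run this script instead of the batch file, and a failed run shows up as a non-zero exit code for the task.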
Restoring via Smuggler
So now that you have your RavenDB database backed up, what do you do with it? You’re going to test your restore process regularly, right? Right?
First you need to get a copy of the file to your machine. You could easily write a script using the S3 tool to download the file, and I’ll leave that as an exercise for the reader. I usually just pull it down with S3 Browser whenever I need it.
So once you have it downloaded, you just need to call Smuggler again to import it. It’s the same call as the export, just change “out” to “in”. For example, to import our demodb back into our local server, into a new DemoDBRestore database instance, we would say:
Raven.Smuggler.exe in http://localhost:8080 demodb.dump --database=DemoDBRestore
And we would see:
And then we have our restored database up and running in RavenDB:
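Since the export and import calls differ only in the “out”/“in” keyword, a script can build both from one helper. Here is a small sketch, where smuggler_args is a hypothetical name of my own and the argument order simply mirrors the commands shown in this post.

```python
def smuggler_args(direction, server_url, dump_file, database):
    """Build a Smuggler command line for an export ("out") or import ("in")."""
    if direction not in ("in", "out"):
        raise ValueError(f"direction must be 'in' or 'out', got {direction!r}")
    return ["Raven.Smuggler.exe", direction, server_url, dump_file,
            f"--database={database}"]


# Export demodb, then import the same file into a separate restore database.
export_cmd = smuggler_args("out", "http://localhost:8080", "demodb.dump", "demodb")
import_cmd = smuggler_args("in", "http://localhost:8080", "demodb.dump", "DemoDBRestore")
print(" ".join(export_cmd))
print(" ".join(import_cmd))
```

Importing into a differently named database, as in the example above, is a cheap way to test a restore without touching the original data.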
Conclusion
Now I’m not a backup wizard. I’m sure there are better ways to do this, with incremental backups and 3-way offsite backups and regular automated restores to a disaster recovery site and all sorts of fancy stuff like that. The bigger and more critical your application becomes, the more important it becomes to have those solutions in place. But day 1, starting out on your project, you need to have something in place, and hopefully this helps you get started.