Oh The Huge Manatee

A blog about technology, open source, and the web... from someone who works with all three.

Every Feature a Branch, Every Branch a Dev Environment... Without Breaking the Bank! The Perfect Git-flow Dev Environment for Small Shops

What and Why?

Everyone loves the git flow method for managing branches in development projects. If you’ve been living under a rock and don’t know what git flow is, here’s the original blog post that brought order to branching chaos. Honestly you can get a really good idea just by looking at the diagram:


Git-flow is a great branching methodology. It leverages git’s merge strength to the max, and lets you focus on the important things. The only problem is: it requires a lot of development environments.

This isn’t actually a problem for many projects. Most coding projects keep their important work - you guessed it - in code, and a new development environment is just a “git checkout -b branchname” away. But for Drupal projects it can be a real pain. Setting up a new environment means copying the codebase, files, and database… not particularly hard to do, but enough effort that most people take shortcuts. For this reason, most firms don’t really implement git flow in a complete way. I know of a few shops that, feeling guilty about not implementing git flow in their shared resources, tell their developers to do it on their localhosts. I don’t know any developers who listen, though. It always seems easier to just take the shortcut.

In the last few months, we’ve seen the release of a couple of Drupal-oriented hosting offerings specifically tailored to fill this gap. Commerce Guys launched the Commerce Platform and Pantheon launched Multidev within a few weeks of each other. If you haven’t heard of these products, go check them out now. They both offer development environments that clearly have git-flow in mind, where you get a perfect production-copy AWS instance for each new branch you create. This is really cool stuff: each branch gets its own virtual server, in seconds! An environment totally identical to live!

These are great products (as far as I’ve heard), but they’re not tailored to the needs of a Drupal shop. They only support single repositories, and just a handful of branches. The typical small or medium sized Drupal shop will have multiple features on the go for several active clients at once. We’re not talking about managing 4 or 5 branches here, like Multidev supports. Go check your ticketing system now. Across all your projects, you’re probably managing at least 50 features or bugfixes. In real git-flow, each of those is supposed to get a unique branch and development environment. I have no idea what kind of a deal Pantheon would cut me for enough multidev environments to manage 15 repos and 200 branches at a time, but I don’t think that’s going to be a standard package anytime soon.

So what’s the small or medium sized Drupal shop to do? We want a way to realize the benefits of git flow, without breaking the bank.

The truth is that you don’t really NEED a separate copy of the live server for each branch. You need a separate environment, but git flow already accounts for a staging step before go-live. There’s nothing wrong with building all your development branches on one shared environment, and maintaining a separate staging environment per client. That’s actually the Normal Way To Do It. And if all our dev environments are on the same server, suddenly spinning up a new environment doesn’t look so complicated after all.

How

We’re going to use git hooks to automatically provision a new Drupal environment every time a new branch is pushed. This means keeping current checkouts of each branch in a logical structure, while coordinating a database and a files directory for each. Of course you have to clean up when a branch is destroyed, too.

These scripts are based on my environment. If you’re a subdomain person, or if your environment is otherwise different from mine, these scripts shouldn’t be hard to adapt to your needs… it’s just bash, after all. In this structure, each repo gets a subdirectory, and each branch gets a subdirectory underneath. We’ll use the master branch as the main development branch, since that’s where developers are most likely to make accidental pushes, and I’d rather not have those get to live! This means your main development branch will be at http://development.example.com/reponame/master . Your dev copy of the live site will be at http://development.example.com/reponame/live , and so on.
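To make the layout concrete, here is a minimal sketch of how a repo and branch name map to a checkout directory and a URL. The helper names worktree_for and url_for are hypothetical, chosen only for illustration, and the two root values are placeholders standing in for the worktree_root and http_root settings used by the hook further down.

```bash
#!/bin/bash
# Illustrative only: worktree_for and url_for are hypothetical helper names.
worktree_root="/var/www/public_html"          # placeholder value
http_root="http://development.example.com"    # placeholder value

# Print the checkout directory for a repo/branch pair.
worktree_for() {
  printf '%s/%s/%s\n' "$worktree_root" "$1" "$2"
}

# Print the URL a developer would visit for a repo/branch pair.
url_for() {
  printf '%s/%s/%s\n' "$http_root" "$1" "$2"
}

worktree_for reponame master   # /var/www/public_html/reponame/master
url_for reponame live          # http://development.example.com/reponame/live
```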

While you’re testing these scripts, set up a special repo just for the testing. You can put the hook in that repo’s .git/hooks/ directory. Later on you can install it for all your repos, however your git host handles that.

There’s one caveat on this system: never let anyone develop directly in any of these directories. Any changes anyone needs to make to a site’s codebase should be made through the repository anyway, but this system actually relies on that rule. So make sure your devs don’t have access to the dev server’s live checkouts, give them stern talkings-to, whatever you have to do. Save yourself the trouble of tracking down the weird errors and loss of work that can result, and force people to use the repo properly.

To allow a working tree for your server’s git repo, you need to set the config for the repos this will apply to. In most git hosting environments you can add these configuration variables to your git user’s ~/.gitconfig file. While you’re testing, you can add them to .git/config in your testing repo.

[core]
        bare = false
[receive]
        denycurrentbranch = ignore

bare = false: This tells git that this repo DOES have a working tree associated with it.

denycurrentbranch = ignore: This option normally stops a repository with a working tree from accepting pushes to its checked-out branch. If a non-bare repository does receive such a push, its HEAD ends up out of sync with the working tree. That’s a problem waiting to happen. We set it to “ignore” because we know that each push is going to be immediately followed by a checkout to bring that working tree back into sync. We also know that the only person touching these files is going to be the git script, right?
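If you would rather not edit the config file by hand while testing, the same two options can be set with git config. This is only a sketch: allow_checkout_pushes is a hypothetical helper name, and the repository path in the usage line is a placeholder.

```bash
#!/bin/bash
# Illustrative only: allow_checkout_pushes is a hypothetical helper name.
# Sets the two options described above in one repository's own config file.
allow_checkout_pushes() {
  local repo_dir=$1
  git -C "$repo_dir" config core.bare false
  git -C "$repo_dir" config receive.denyCurrentBranch ignore
}

# Usage (the path is a placeholder):
#   allow_checkout_pushes /path/to/testing-repo
```

This writes to that repo’s own .git/config, so it only affects the testing repo and leaves the git user’s ~/.gitconfig alone.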

Now we add the actual git post-receive hook. This is where the magic happens. Note that I rely on the gitolite variable $GL_REPO here. If you don’t use gitolite, you’ll have to find another way to get the repository name. You’ll have to set a few variables at the top of the file before this is useful:

mysql_user / mysql_pw: The mysql user doesn’t have to be root; it can be anyone with access to create and drop databases. Note that the cloning script further down also runs SET GLOBAL, which needs the SUPER privilege.

worktree_root: The root directory where all your repo checkouts should go. On my server, this is the webroot.

http_root: The root URL for your development environment. Your repo and branch names will be appended to this to give devs their easy-to-click URL in email and commit messages.

support_files: Where you will keep the support files, like the clone-db-files.sh script and your template settings.php .

#!/bin/bash
# sets up a new working tree for every branch.
# cleans up after itself when branches are deleted
# sets up provisional DBs, too

mysql_user='root'
mysql_pw='your-mysql-password'
worktree_root="/var/www/public_html"
http_root="http://example.com"
support_files="/home/git"

while read -r oldrev newrev ref
do
  # Third path component of the ref, e.g. refs/heads/mybranch -> mybranch.
  # Branch names containing slashes (feature/foo) are not supported.
  branch=$(echo "$ref" | cut -d/ -f3)
  worktree="$worktree_root/$GL_REPO/$branch"
  # Strip -, . and _ so the names are safe to use in a database name.
  repo_sanitized=${GL_REPO//[-._]/}
  branch_sanitized=${branch//[-._]/}
  db_name="${repo_sanitized}_${branch_sanitized}"

  # Exit nicely if this is the gitolite-admin repo.
  if [ "$GL_REPO" = 'gitolite-admin' ]; then
    exit 0
  fi

  # Did we delete a branch? Delete the DB and checkout
  if [ "$newrev" = '0000000000000000000000000000000000000000' ]; then
    if [ "$branch" != 'master' ]; then
      echo "Deleting working tree and DB for branch $branch."
      rm -rf "$worktree"
      mysqladmin -f -u "$mysql_user" --password="$mysql_pw" drop "$db_name"
    else
      echo "Refusing to delete working tree and db for master branch."
      exit 1
    fi
  else
    # Make the worktree if it doesn't exist
    if [ ! -d "$worktree" ]; then
      mkdir -p "$worktree"
    fi
    # Check out the latest version in place.
    git --work-tree="$worktree" checkout -f -q "$branch"
    echo "Checked out updated code to $http_root/$GL_REPO/$branch"
    # Are we missing settings.php ? this could be a new branch or a fresh repo.
    if [ ! -e "$worktree/sites/default/settings.php" ]; then
      # Create settings.php from a template
      cp "$support_files"/template.settings.php "$worktree"/sites/default/settings.php
      sed -i -e "s/_placeholder_/$db_name/g" "$worktree"/sites/default/settings.php
    fi
    # Is it a new branch?
    if [ "$oldrev" = '0000000000000000000000000000000000000000' ]; then
      # Create the DB
      mysqladmin -u "$mysql_user" --password="$mysql_pw" create "$db_name"
      # Copy the DB from master branch. This is asynchronous because some DBs take a loooong time.
      if [ "$branch" != 'master' ]; then
        email=$(git log --format="%ae" "$newrev" -1)
        # Output is discarded so the push does not wait on the background job.
        "$support_files"/clone-db-files.sh "$GL_REPO" "$branch" "$email" > /dev/null 2>&1 &
        echo "New branch detected, setting up db and files directories. This can take some time, depending on the size of the site. We'll email your git address ($email) when it's ready to use."
      else
        # New repo! Create the files directory and give them the install URL.
        mkdir "$worktree/sites/default/files"
        chmod ug+rwx "$worktree/sites/default/files"
        echo "New repo detected, creating empty DB and files directory. Your site is prepared; go and install Drupal at  $http_root/$GL_REPO/$branch/install.php"
      fi
    fi
    if [ "$branch" != 'master' ]; then
      # so that HEAD always sits at master
      git --work-tree="$worktree_root/$GL_REPO/master" checkout -q -f master &
    fi
  fi
done

We detect a new branch by the fact that the old revision is all zeros, and a deleted branch by the fact that the new revision is all zeros. We use mysqladmin to create or delete the branch database, depending on the operation. We create settings.php from a template that has a placeholder for the DB name. Since these are just development environments, you should have all sensitive information scrubbed anyway… so this script assumes you use one mysql username and password for all your development sites.
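The two decisions described above can be tried out in isolation. The following is a minimal sketch, not a replacement for the hook: push_action and db_name_for are hypothetical helper names, and the revisions and names in the sample calls are made up.

```bash
#!/bin/bash
# Illustrative only: push_action and db_name_for are hypothetical helper names.
zero_rev='0000000000000000000000000000000000000000'

# Classify one "oldrev newrev" pair the way the hook does.
push_action() {
  local oldrev=$1 newrev=$2
  if [ "$newrev" = "$zero_rev" ]; then
    echo delete
  elif [ "$oldrev" = "$zero_rev" ]; then
    echo create
  else
    echo update
  fi
}

# Build the database name: strip '-', '.' and '_' from each part, then join with '_'.
db_name_for() {
  local repo=$1 branch=$2
  echo "${repo//[-._]/}_${branch//[-._]/}"
}

push_action "$zero_rev" 1111111111111111111111111111111111111111   # create
push_action 1111111111111111111111111111111111111111 "$zero_rev"   # delete
db_name_for my-site bug_1234                                       # mysite_bug1234
```

One consequence of stripping characters is that two different names can collapse to the same database name (for example bug_1 and bug-1), so it is worth keeping branch names distinct in more than punctuation.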

I separated out the script which actually duplicates the db and files directories, because that tends to take a while. With a small Drupal site it’s a matter of seconds, but the largest site I tested with took almost 15 minutes to duplicate the 750MB DB. That’s a real problem if your developer is sitting there waiting for their push to finish, but not so bad if it happens in the background and sends a notification when it’s done. So here is the duplication script. You’ll have to set the same set of variables as in the git hook above, plus a logfile location. This script is run as the git user, so I find it’s easiest to keep the logfile in the git user’s homedir.

#!/bin/bash
# Copy DB from master to a new branch, and email the results. The destination DB should already be created.
# Arguments: project name, branch, email address

mysql_user='root'
mysql_pw='mysql-password'
worktree_root="/var/www/public_html"
http_root="http://example.com"
logfile="/home/git/new-branches.log"

project=$1
branch=$2
email=$3
repo_sanitized=${project//[-._]/}
branch_sanitized=${branch//[-._]/}
db_name="${repo_sanitized}_${branch_sanitized}"
# The master DB is named with the same sanitized repo name that the hook uses.
master_db="${repo_sanitized}_master"
mysql_start_vars="SET AUTOCOMMIT=0; SET UNIQUE_CHECKS=0; SET FOREIGN_KEY_CHECKS=0; SET GLOBAL INNODB_FLUSH_LOG_AT_TRX_COMMIT=2;"
mysql_stop_vars="SET AUTOCOMMIT=1; SET UNIQUE_CHECKS=1; SET FOREIGN_KEY_CHECKS=1; SET GLOBAL INNODB_FLUSH_LOG_AT_TRX_COMMIT=1;"

# Copy the DB from master branch (runs in the background while files are copied)
cat <(echo "$mysql_start_vars") <(mysqldump --opt --quick -u "$mysql_user" --password="$mysql_pw" "$master_db") <(echo "$mysql_stop_vars") | mysql -u "$mysql_user" --password="$mysql_pw" "$db_name" >> "$logfile" 2>&1 &

# Copy the files directories - all sites except "all"
cd "$worktree_root/$project/master" || exit 1
umask 002
if [ -n "$(find ./sites -type d -not -path "*/sites/all/*" -name 'files' -print -quit)" ]; then
  find ./sites -type d -not -path "*/sites/all/*" -name 'files' -print0 | xargs -0 -I{} cp -R --no-preserve=mode,ownership --parents "{}" "$worktree_root/$project/$branch/" >> "$logfile" 2>&1
  # Certain systems don't respect no-preserve in copy, so this is a just-in-case to make sure file modes are OK
  cd "$worktree_root/$project/$branch" || exit 1
  find ./sites -type d -not -path "*/sites/all/*" -name 'files' -print0 | xargs -0 -I{} chmod -R ug+rw "{}"
else
  echo "WARN: No files directories found in master branch checkout." >> "$logfile"
fi

# Wait for the background DB copy to finish
wait


# Send confirmation email
message="Your new branch environment for $branch is ready to use. The database and files have been copied from the master branch, and code is checked out in place. You can access the new environment at $http_root/$project/$branch."

echo "$message" | /usr/bin/mail -s "Environment is ready for $project branch: $branch" "$email"
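A note on starting this script in the background from the hook: in many setups the pushing client keeps waiting for as long as any process still holds the hook’s output streams open, so a trailing & on its own may not be enough to return control to the developer. A minimal sketch of one way to detach a slow job follows. run_detached is a hypothetical helper name, and the logfile path in the usage comment is a placeholder.

```bash
#!/bin/bash
# Illustrative only: run_detached is a hypothetical helper name.
# Start a command in the background with stdin, stdout and stderr detached
# from the caller, appending its output to a logfile instead.
run_detached() {
  local logfile=$1
  shift
  "$@" < /dev/null >> "$logfile" 2>&1 &
}

# Usage (arguments as in the hook; the logfile path is a placeholder):
#   run_detached /home/git/new-branches.log \
#     "$support_files"/clone-db-files.sh "$GL_REPO" "$branch" "$email"
```

Sending the output to a logfile rather than discarding it keeps any mysql or cp errors around for later debugging.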

I opted not to use drush for the cloning, and I expect I’ll get some flak for it. Basically I feel that this isn’t really Drush’s use case. We don’t need the flexibility Drush provides, and we can trade it off for speed optimizations that are only acceptable in our very specific use case.

The only other file you need is the template settings.php file, which is not exactly rocket science. If you’re following best practices, settings.php is never committed into your repo. You can see our solution for keeping settings.php customizations for dev in the repo at the bottom of our template file here. Note that since this is a development environment, it’s assumed that you’ve already pruned sensitive information out of the database. That means it’s safe to use a single username and password for all those dev databases (don’t you DARE do this on production sites, though!). The upshot of this is that you can have a nice template settings.php that applies to all of the sites in your dev environment. If you’re uncomfortable with it, there’s nothing wrong with adding random password generation to the script above… just make sure to post your solution for that in the comments. :)

<?php

// Git flow settings.php template

$databases = array (
  'default' =>
  array (
    'default' =>
    array (
      'database' => '_placeholder_',
      'username' => 'your_mysql_username_here',
      'password' => 'your_mysql_password_here',
      'host' => 'localhost',
      'port' => '',
      'driver' => 'mysql',
      'prefix' => '',
    ),
  ),
);

# 5rings settings include. If you got custom settings, put em in here.
if (file_exists('sites/default/5rings.settings.php')) {
  include('sites/default/5rings.settings.php');
}

Note the last few lines there - that’s how we include any settings that we need for dev environments. We just commit a file called 5rings.settings.php with customizations like disabling securepages, setting the stage_file_proxy target, etc. That won’t do any damage on live (though it’s available there for easy reference), and it ensures that any special configuration needs are still tracked in our repo. Not to mention, it makes it possible for us to easily clone repos like this!
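On the random password idea mentioned earlier, one possible starting point is sketched below. It is not part of the setup described in this post: random_password, set_db_password and the _dbpass_ placeholder are all hypothetical, and you would still need to create a matching MySQL user and grant, which is not shown.

```bash
#!/bin/bash
# Illustrative only: random_password, set_db_password and the _dbpass_
# placeholder are hypothetical and do not appear in the scripts above.

# Print a random alphanumeric password of the requested length (default 20).
random_password() {
  local length=${1:-20}
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$length"
  echo
}

# Fill a _dbpass_ placeholder in an already-copied settings.php.
# Alphanumeric passwords contain nothing that needs escaping for sed.
set_db_password() {
  local settings_file=$1 password=$2
  sed -i -e "s/_dbpass_/$password/g" "$settings_file"
}

# Usage (the path is a placeholder):
#   set_db_password "$worktree/sites/default/settings.php" "$(random_password)"
```

Restricting the password to letters and digits is a deliberate simplification here, since it avoids quoting problems in both sed and the PHP string.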

Enjoy your new git flow development environment. I welcome any questions or suggestions in the comments.
