ZooKeepers, Puppet Masters and Clouds…

Like many companies today, at Nextdoor we make heavy use of cloud computing to develop and host our service. Leveraging the “cloud” has given us the ability to be flexible and develop rapidly as our user base has grown in the last 12 months. This same flexibility has also forced us to consider how we handle configuration management in a fluid environment where our servers sometimes run for months, and other times run for only hours.

Having come from Netflix, where the use of cloud resources (in conjunction with its own data centers) is legendary, I have seen firsthand the value of rapid and predictable configuration management. In the cloud, configuration management becomes extremely important because servers fail quite frequently. It’s critical to have a way for our environment to dynamically reconfigure itself as servers come up or go down.

At Nextdoor, one of our fundamental operational beliefs is that we provide services to our engineers, rather than providing servers. That means that rather than providing our developers a hard-coded list of database servers, we provide a location that can be queried to get the current list of servers offering a given service … dynamically, and quickly. To manage the ever-changing list of servers and services, we use Apache ZooKeeper.

The ZooKeeper project provides a stable, fast and replicated database geared towards live configuration state. Today, ZooKeeper is pretty much the gold standard in this area … but a lack of tools aimed at non-developers has largely limited its adoption to custom, home-grown applications.

Apache ZooKeeper is great for app developers… it’s easy to add support to your application both for retrieving lists of servers (or data) and for registering as a service provider. If every service you run is home-grown, this is great! The problem is … most environments are a mix of home-grown and off-the-shelf software.
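What makes ZooKeeper a good fit for a fleet that churns is its ephemeral node: a node that exists only as long as the client session that created it, so a provider that dies drops out of the list on its own. The sketch below is a toy, in-memory model of that behavior, not ZooKeeper's API or any client library's; the class and method names (`ToyRegistry`, `register`, `expire`) and the hostnames are hypothetical, and we are assuming providers register ephemeral nodes, which the post itself does not state.

```python
from collections import defaultdict


class ToyRegistry:
    """Hypothetical in-memory stand-in for ZooKeeper ephemeral-node semantics.

    Not a real client: it only models "a node lives as long as its session".
    """

    def __init__(self):
        # path -> {child_name: data}
        self._children = defaultdict(dict)
        # session id -> list of (path, child_name) created by that session
        self._owned = defaultdict(list)

    def register(self, session, path, child, data):
        """Advertise `child` under `path`, owned by `session` (ephemeral)."""
        self._children[path][child] = dict(data)
        self._owned[session].append((path, child))

    def expire(self, session):
        """Simulate a session ending (e.g. the server died): drop its nodes."""
        for path, child in self._owned.pop(session, []):
            self._children[path].pop(child, None)

    def get_children(self, path):
        """Return the current providers under `path`, sorted for stability."""
        return sorted(self._children.get(path, {}))


reg = ToyRegistry()
path = "/services/staging/uswest2/memcached"
reg.register("s1", path, "mc1.mydomain.com:11211", {"zone": "us-west-2b"})
reg.register("s2", path, "mc2.mydomain.com:11211", {"zone": "us-west-2a"})
print(reg.get_children(path))  # ['mc1.mydomain.com:11211', 'mc2.mydomain.com:11211']
reg.expire("s1")               # mc1's session ends
print(reg.get_children(path))  # ['mc2.mydomain.com:11211']
```

Clients never have to be told that mc1 went away; the next lookup simply no longer returns it.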

We found ourselves with two problems to solve:

  • registering off-the-shelf services with ZooKeeper
  • leveraging that dynamic configuration data in our Puppet manifests

Introducing: zk_watcher

Once we had chosen to use ZooKeeper, we had to solve the problem of registering off-the-shelf services such as Memcached, Postgres and PGPool, just to name a few. After scouring the tubes of the Internet, we couldn’t find a simple tool for this, so we wrote our own.

zk_watcher is a simple Python daemon that we run on almost all of our servers. This daemon has a configuration file that lists various services that a host is offering, how to check whether that service is operational or not, and potentially any unique configuration data that a client of that service might need. Here’s a simple configuration:

[memcached]
cmd: pgrep memcached
refresh: 60
service_port: 11211
zookeeper_path: /services/staging/uswest2/memcached
zookeeper_data: zone=us-west-2b

This registers the following ‘node’ in ZooKeeper:

/services/staging/uswest2/memcached
      /staging-mc1-uswest2-i-23t5fdas.mydomain.com:11211
        pid = 17013
        zone = u'us-west-2b'
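To make the mapping from configuration to node explicit, here is a small sketch that parses the section above with Python's `configparser` and derives the node name (`hostname:service_port` under `zookeeper_path`) plus its data. It is not zk_watcher's code: the helper names `parse_data` and `node_for` are hypothetical, we assume multiple `zookeeper_data` pairs would be comma-separated, and the `pid` field the daemon records is not modeled.

```python
import configparser

CONFIG = """\
[memcached]
cmd: pgrep memcached
refresh: 60
service_port: 11211
zookeeper_path: /services/staging/uswest2/memcached
zookeeper_data: zone=us-west-2b
"""


def parse_data(raw):
    """Turn 'k1=v1,k2=v2' into a dict (the comma separator is an assumption)."""
    data = {}
    for pair in filter(None, (p.strip() for p in raw.split(","))):
        key, _, value = pair.partition("=")
        data[key.strip()] = value.strip()
    return data


def node_for(section, hostname):
    """Return (full node path, data dict) that this host would register."""
    port = section.getint("service_port")
    node = f"{section['zookeeper_path'].rstrip('/')}/{hostname}:{port}"
    return node, parse_data(section.get("zookeeper_data", ""))


parser = configparser.ConfigParser()
parser.read_string(CONFIG)
node, data = node_for(parser["memcached"], "staging-mc1-uswest2-i-23t5fdas.mydomain.com")
print(node)
# /services/staging/uswest2/memcached/staging-mc1-uswest2-i-23t5fdas.mydomain.com:11211
print(data)  # {'zone': 'us-west-2b'}
```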

This simple daemon is available from PyPI as well as directly from GitHub.
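The daemon's core job, as described earlier (run the configured `cmd` every `refresh` seconds and advertise the host only while the check passes), could be sketched roughly as follows. This is a sketch under assumptions, not zk_watcher's implementation: we assume exit status 0 means "healthy", that `cmd` is split with `shlex` and run without a shell, and `register`/`unregister` are hypothetical callbacks standing in for the actual ZooKeeper writes.

```python
import shlex
import subprocess
import time


def service_is_up(cmd, timeout=10):
    """Run the health-check command; exit status 0 is treated as 'up'."""
    try:
        result = subprocess.run(
            shlex.split(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # A missing binary or a hung check counts as "down".
        return False
    return result.returncode == 0


def watch(cmd, refresh, register, unregister, iterations=None):
    """Re-run `cmd` every `refresh` seconds, calling the hypothetical
    register/unregister callbacks only when the state changes.
    iterations=None loops forever; returns the final registered state."""
    registered = False
    count = 0
    while iterations is None or count < iterations:
        up = service_is_up(cmd)
        if up and not registered:
            register()
            registered = True
        elif not up and registered:
            unregister()
            registered = False
        count += 1
        if iterations is None or count < iterations:
            time.sleep(refresh)
    return registered
```

Calling the callbacks only on state transitions keeps a healthy host from rewriting its node on every poll.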

Puppet ZooKeeper Integration

At Nextdoor, our entire infrastructure is configured with Puppet. Everything from our dev environments to our live production servers is configured with Puppet manifests that define the rules our servers live by. We believe so strongly in strict configuration management that every one of our dev, staging and production servers is built entirely from scratch with Puppet.

Registering our services with ZooKeeper doesn’t do us a whole lot of good unless we have a method for accessing that data. In our actual home-grown application code, we can do this easily with the help of a few Python libraries. It turns out that we can do the same thing in Puppet, allowing Puppet to dynamically configure our systems with up-to-date server lists for different services.

A colleague once reminded me that configuration state and configuration management are not the same thing. The list of servers providing a particular service is configuration state, while the process of building a configuration file containing that list is configuration management. Pulling hard-coded lists of servers out of Puppet helps us stay agile as our environment changes, without needing to make code changes all day long.
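The distinction can be shown in a few lines: the state is just a list, and management is a function that renders it into a file. In our setup that rendering happens in Puppet; the Python function `render_memcached_servers` and its `server host:port` output format are purely hypothetical illustrations. Sorting matters because ZooKeeper makes no promise about the order in which it returns children, and a deterministic file only changes when the set of servers actually changes.

```python
def render_memcached_servers(servers):
    """Configuration management: render configuration state (a list of
    host:port strings) into the text of a client config file.
    The file format here is made up for illustration."""
    lines = ["# Generated file; do not edit by hand."]
    for server in sorted(servers):  # stable output regardless of input order
        lines.append(f"server {server}")
    return "\n".join(lines) + "\n"


# Configuration state: whatever the registry reports right now (hypothetical hosts).
state = ["mc2.mydomain.com:11211", "mc1.mydomain.com:11211"]
print(render_memcached_servers(state), end="")
# # Generated file; do not edit by hand.
# server mc1.mydomain.com:11211
# server mc2.mydomain.com:11211
```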

We’ve written a small Puppet plugin that allows our Puppet Master servers to ask our ZooKeeper service for the list of servers under a particular path, and then dynamically generate server configuration from that information. For example, to get a list of the servers running Memcached, guaranteeing at least 1 server and no more than 5:

$servers = zkget('/services/production/uswest1/memcached', 1, 5)

This returns an array in Puppet that we can use in our manifests or templates.
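The contract of `zkget(path, min, max)` can be modeled in a few lines. This Python sketch only mimics the described behavior given an already-fetched list of children; it is not the plugin's Ruby code. We assume that "guaranteeing at least 1 server" means the call raises an error (so Puppet would refuse to build a config with too few servers rather than silently writing an empty list), and that the "no more than 5" cap is applied by sorting and keeping the first entries. Both choices, and the names `zkget_like` and `TooFewServers`, are hypothetical.

```python
class TooFewServers(Exception):
    """Raised when fewer than `minimum` servers are registered."""


def zkget_like(children, minimum, maximum):
    """Mimic zkget(path, min, max) on an already-fetched list of children:
    error out if fewer than `minimum` exist, return at most `maximum`.
    Sorting before truncating is an assumption, chosen for stable output."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    servers = sorted(children)
    if len(servers) < minimum:
        raise TooFewServers(f"expected at least {minimum}, found {len(servers)}")
    return servers[:maximum]


children = [f"mc{i}.mydomain.com:11211" for i in range(1, 8)]
print(zkget_like(children, 1, 5))  # the first five entries, mc1 through mc5
```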

To round out our integration, we’ve also built a simple Puppet class for installing the zk_watcher daemon and configuring it. Configuring a host to register with ZooKeeper as a memcached provider is as simple as this:

zk_watcher::add { 'memcached':
  port    => 11211,
  cmd     => 'service memcached status',
  path    => '/services/production/uswest1/memcached',
  refresh => '30',
  data    => 'zone=us-west-2b';
}
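Comparing this resource with the zk_watcher configuration file shown earlier suggests how the parameters line up: `port` becomes `service_port`, `path` becomes `zookeeper_path`, `data` becomes `zookeeper_data`, and `cmd` and `refresh` carry over unchanged. That mapping is our reading of the two examples, not something the class is documented here to do, and we don't know how the class actually renders the file. The hypothetical Python helper below only illustrates the resulting section.

```python
import configparser
import io


def zk_watcher_section(name, port, cmd, path, refresh, data):
    """Render one zk_watcher-style config section from zk_watcher::add-style
    parameters. The parameter-to-key mapping is an assumption."""
    # interpolation=None so values containing '%' are written verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config[name] = {
        "cmd": cmd,
        "refresh": str(refresh),
        "service_port": str(port),
        "zookeeper_path": path,
        "zookeeper_data": data,
    }
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


print(zk_watcher_section(
    name="memcached",
    port=11211,
    cmd="service memcached status",
    path="/services/production/uswest1/memcached",
    refresh=30,
    data="zone=us-west-2b",
), end="")
```

`configparser` writes `key = value` rather than the `key: value` style used earlier; it reads both forms.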

The Puppet plugin is available on GitHub.

Matt Wise
Sr. Systems Architect