Patch all the things!

Summary

I recently did a customer call/demo around using Chef to patch Windows systems and I thought this would make a great post. However, I’m going to change one thing, we’re going review patching as a fundamental process and cover more than Windows.

Objectives

What are our objectives in patching, e.g. why do we do this seemingly futile task? What business value do we provide? There’s really only a couple things IT can do for a business; things that allow the business to move faster and things that reduce risk to the business. In the case of patching, we’re really trying to reduce the risk to the business by minimizing the possible attack surfaces. According to most security firms, roughly 80% of all data breaches occurring the past few years would have been prevented if those businesses had an effective patching process.

Ok, cool, so our goal is to reduce risk; however, making low level code changes on basically everything in a relatively short time sounds pretty risky too, right? In the post DevOps world, we wouldn’t patch. When new updates are available, they are shipped through the pipeline like every other change. Ephemeral nodes are provisioned, integration tests run, new OS images prepped, new infrastructure provisioned, etc.

I can read your mind at this point; “We aren’t all start-ups, some of us still have a lot of legacy systems to manage.” And you’d be totally correct. Almost all enterprises (and a lot of smaller businesses) have legacy systems that are critical to the business and will be around for the forseeable future. Having said that, this doesn’t mean that you can’t have a well-understood, mostly automated patching process.

Basic tools

Repositories

Below are some notes on each major platform. The main thing to remember is the repository tool is the source for our patches. It’s also the main control point for what’s available in our environment. By default, we want to automatically synchronize with the vendor periodically. We then black list any patch or update that we can’t use (or busted patches that break stuff).

Essentially, we want to create one or more local repositories for all of the available packages for each platforms/distribution we want to support. The total number required will vary depending on your network topology, number of supported platforms/distributions, etc. Patching can be extremely network intensive. If you have a large number of systems or multiple systems across one ore more WAN links, plan accordingly and don’t DDoS yourself. I can’t emphasize this enough, if you don’t take load (network, OS, hypervisor, etc.) into account, you will cause severe, possibly even catastrophic problems for your business.

Now that the gloom and doom warnings are out of the way, let’s look at some OS specific info.

Linux

Each Linux distribution has it’s own repository system. For RHEL based systems, we use Yum, and for Debian based systems, we use Apt.

RHEL has some licensing implications for running local repositories. Talk to your Red Hat rep for more info.

MacOS

Until recently, macOS Server had a Software Update Services component. Some folks are using the new caching service but it’s not quite the same.

Windows

Windows Server has an Update Services role. WSUS can provide local source for the majority of Microsoft’s main products. WSUS also has a replica mode for supporting multiple physical locations or very large environments.

Windows users that are running newer versions of Server and Client can also take advantage of the Branch Cache features. This allows clients on a single subnet to share content and drastically reduce the WAN utilization without needing downstream servers.

Configuration

I’m going to use Chef for my examples (full disclosure, I work at Chef), but these principles will work with any configuration management platform. The main thing to keep in mind is the CM tool doesn’t do the actual patching. This is a pretty common misconception. Yes, it’s entirely possible to use a CM tool to deliver the actual patch payload and oversee the execution, but why would you want to do all that extra work? This is about making our lives better so let’s use existing tools and functionality and use CM to control the operation.

Orchestration

Orchestration means different things to different people. To me, orchestration is the process of managing activities in a controlled behavior. This is different from CM in that CM is about making a system or object act a certain way whereas orchestration is about doing multiple things in a certain order or taking action based on one or more inputs.

You will need an orchestration tool if you have any of the following needs (or similar needs):

I need to reboot this system first, then that system for my app to work correctly.
I want no more than 25% of my web servers to reboot at any given time.
I want to make this DB server primary, then patch the secondary DB server.

If any of these sound familiar, you aren’t alone. These are common problems in the enterprise; however, there’s no common solution. With the magnitude of various applications, systems, and platforms out there, there’s no way to provide prescriptive guidance for this topic. A lot of folks use a outside system as a clearing house for nodes to log their state then check the state with a cookbook on all the nodes to make decisions.

In regard to patching, if I have a group of nodes that has a specific boot order, I tend to stagger their patch windows so the first nodes patch and reboot, then the second group, and so on. I may also patch them in one window and leave them pending reboot, then reach out to them separately and reboot them in the required order.

Process

As you can see, the names of the tools within a group may be different; however, they all function using similar patterns. These similarities are what allow us to build a standardized process regardless of platform.

Repositories provide the content and content control
Config. management tools control the configuration of the client (I know… Mr. Obvious…). This is how we assign maintenance windows, reboot behaviors, etc.
Orchestration tools handle procedural tasks

We want this to be as automated as possible so everything will be set to run on a scheduled basis. The only time it requires manual intervention is when there’s a patch we need to exclude. In the examples below, we’ll configure the repositories to sync nightly and we’ll configure the clients to check on Sunday mornings at 2AM. I can hear what you are thinking again, you’re saying “We have a bunch of systems, we can’t reboot all of them at once!” And you’d be right. Even if you don’t have a large fleet, you still don’t want to patch all the things at once.

In reality, you want at least one test system for each application, service, or role in your environment. Preferably, you want complete replicas of production (although smaller scale) for testing. Patches should be shipped and tested just like any other code change. Most enterprises have some process similar to Dev -> QA -> Stage -> Prod for shipping code changes so patching should follow that same pattern. Remember, the earlier we discover a problem, the easier and cheaper it is to resolve.

Technical Setup

Below are sample instructions for building the required components. These are not 100% production ready and only serve as examples on where to start. Each company has it’s own flavor of doing things so it’s not possible to account for all the variations.

Server Side

First thing we need is to set up our repositories. We’ll set up a basic mirror for CentOS 7 and Ubuntu 16.04, then set up a Windows WSUS Server.

APT Repository

# metadata.rb

# attributes/default.rb

# recipes/default.rb

Yum Repository

# metadata.rb
depends 'yum'

# attributes/default.rb
default['yum']['repos']['centos-base'] = 'http://mirror.centos.org/centos/7/os/x86_64'
default['yum']['repos']['centos-updates'] = 'http://mirror.centos.org/centos/7/updates/x86_64'
default['yum']['combined'] = false

# recipes/default.rb
package 'createrepo'
package 'python-setuptools'
package 'httpd'

# https://github.com/ryanuber/pakrat
execute 'easy_install pakrat'

repos = ''
node['yum']['repos'].each do |name, baseurl|
  repos += "--name #{name} --baseurl #{baseurl} "
end

repos += '--combined ' if node['yum']['combined']

directory '/var/www/html/' do
  recursive true
end

#
#
# Convert to a cron resource to schedule nightly sync's
#
#########################################################
execute 'background pakrat repository sync' do
  cwd '/var/www/html/'
  command "pakrat #{repos} --repoversion $(date +%Y-%m-%d)"
  live_stream true
end
#########################################################
#
#
#
#

service 'httpd' do
  action [:start, :enable]
end

WSUS Server

# metadata.rb
depends 'wsus-server'

# attributes/default.rb
default['wsus_server']['synchronize']['timeout'] = 0
default['wsus_server']['subscription']['categories']      = ['Windows Server 2016',]

default['wsus_server']['subscription']['classifications'] = ['Critical Updates',
                                                             'Security Updates']
default['wsus_server']['freeze']['name']                  = 'My Server Group'

# recipes/default.rb
include_recipe 'wsus-server::default'
include_recipe 'wsus-server::freeze'

Client Side

Now that we have repositories, let’s configure our clients to talk to them.

CentOS Client

# metadata.rb

# attributes/default.rb

# recipes/default.rb
cron 'Weekly patching maintenance window' do
  minute '0'
  hour '2'
  weekday '7'
  command 'yum upgrade -y'
  action :create
end

Ubuntu Client

# metadata.rb

# attributes/default.rb

# recipes/default.rb

Windows 2016 Client

# metadata.rb
depends 'wsus-client'

# attributes/default.rb

default['wsus_client']['wsus_server']                              = 'http://wsus-server:8530/'
default['wsus_client']['update_group']                             = 'My Server Group'
default['wsus_client']['automatic_update_behavior']                = :detect
default['wsus_client']['schedule_install_day']                     = :sunday
default['wsus_client']['schedule_install_time']                    = 2
default['wsus_client']['update']['handle_reboot']                  = true

# recipes/default.rb
include_recipe 'wsus-client::configure'

Validation

Now the real question: “Did everything get patched?” How do we answer this question? Apt and Yum have no concept of reporting and the WSUS reports can get unwieldy in a hurry. Enter InSpec. InSpec is an open source auditing framework that allows you to test for various compliance and configuration items including patches. Patching baselines exist for both Linux and Windows. InSpec can run remotely and collect data on target nodes or you can have it report compliance data to Chef Automate for improved visibility and dashboards.

Closing

Congratulations! If you are reading this, then you made it through a ton of information. Hopefully you found at least a couple nuggets that will help you.

Have questions about patching strategies? Find me on GitHub or LinkedIn.

Thanks to @ncerny and @trevorghess for their input and code samples.

Helpful tips

Here are some tidbits that may help you.

WSUS memory limit event log error. You will almost 100% hit this at some point. This is a scaling config so the more clients you have, the more memory WSUS will need.
Use attributes to feed required data (maintenance window, patch time, reboot behavior, etc.) to the node. This allows you to have one main patching cookbook that get’s applied to everything. You can deliver attributes different ways:
- Environments
- Roles
- Role/wrapper cookbooks
Remember we are using our CM system to manage the configs, not do the actual patching.