James Massardo - Geek of all things Technology

Rego Unit Testing


Summary

This post is going to outline some basics, interesting tidbits, and caveats on unit testing rego policies.

Unit tests

Let’s start with an obvious question: what is a unit test? Compared to some of the other types of testing in tech, this one is pretty self-explanatory. A unit test is a way to test individual components (or units) of a system. For example, if we had a rego policy that is supposed to deny requests that allow port 80, we’d write a unit test that sends an input containing that port and expects a deny message back. This gives us a way to validate our policies without having to deploy them to a real system. It also gives us a way to validate that policy changes don’t introduce any security regressions into our environment.

Generally speaking, unit tests are only needed for custom logic and shouldn’t be used for a tool’s prebuilt rules/functions. Also, while we’re speaking in generalities, don’t strive for 100% test coverage. If you are using a provided library (e.g. Styra DAS), the provider already tests it, so there’s no need to repeat that work. Unit tests can also add considerable delay to CI/CD when testing new policy changes.

Example

Let’s take a look at an example policy. This policy allows requests to POST to the users path.

package authz

allow {
    input.path == ["users"]
    input.method == "POST"
}

Now let’s take a look at the accompanying unit test.

test_post_allowed {
    allow with input as {"path": ["users"], "method": "POST"}
}

Let’s break this down.

  • We have a test definition named test_post_allowed
  • The test calls the specific policy definition and passes input to it.
  • allow is the name of the policy definition
  • with is a rego keyword that lets the test supply a mock value for the input document while the query runs.
  • {"path": ["users"], "method": "POST"} is the test data being used as the input.

You can also store more complex data in a variable in the test definition.

test_post_allowed {
    in := {"path": ["users"], "method": "POST"}
    allow with input as in
}
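
It’s just as useful to test the deny path. A minimal sketch of a negative test (the GET input here is illustrative):

test_get_denied {
    not allow with input as {"path": ["users"], "method": "GET"}
}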

Now that we have a test, let’s actually run it. Let’s look at two ways we can accomplish this.

  • Here’s our example in Rego Playground. It’s easy enough, load the page and click Evaluate.
  • Now let’s try it with OPA.
    • Let’s put our policy definition in a file example.rego and our test definition in a file example_test.rego
    • Now let’s execute the tests by running:
      % opa test . -v
      data.authz.test_post_allowed: PASS (3.697875ms)
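
    • If you end up with many tests, opa test can run a subset by name (the --run flag takes a regular expression):
      % opa test . -v --run test_post_allowed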
      

Testing conventions

There are a few conventions for writing rego tests.

  • Tests should be named <policyname>_test.rego. E.g. if your policy is ingress.rego, then your test should be named ingress_test.rego
  • All definitions in the test file should start with test_ and have a descriptive name. E.g. if your policy definition is allow {...}, then your test might be named test_post_allowed {...}
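
One more convention worth noting: the test file typically declares the same package as the policy under test so rules like allow can be referenced directly. A minimal example_test.rego skeleton:

package authz

test_post_allowed {
    allow with input as {"path": ["users"], "method": "POST"}
}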

Unit testing in Styra DAS

If you are using Styra DAS, there are a couple of things to consider. The main one is that all of the policy definitions are combined into a single policy. Let’s take a look at an example.

Here we have 2 definitions, but notice that both are named enforce. Recall that with our unit tests, we call the definition by name to execute a test.

enforce[decision] {
  not excludedNamespaces[input.request.namespace]
  data.library.v1.kubernetes.admission.workload.v1.block_privileged_mode[message]

  decision := {
    "allowed": false,
    "message": message
  }
}

enforce[decision] {
  data.library.v1.kubernetes.admission.audit.v1.require_auditsink[message]

  decision := {
    "allowed": false,
    "message": message
  }
}

So how do we test this? Well, we have options:

  1. Don’t write any tests at all. Since we’re only consuming pre-built content, there’s really no value in writing tests.
  2. Write your tests so they test the policy as a whole. Provide “known good” input data in the test so all the definitions pass. This way, if a definition is changed, the test will fail.
  3. If we really need to test individual definitions, we can give them specific names so we can call them separately. We lose some of the GUI functionality in DAS by doing this, as the definitions become completely custom and no longer DAS managed. We also need to add additional definitions to include the results of our now-custom rules into the main DAS policy.

Let’s look at an example of the last option. We’ll use the same enforce definitions above but rename them so we can test them individually.

block_priv_mode[decision] {
  not excludedNamespaces[input.request.namespace]
  data.library.v1.kubernetes.admission.workload.v1.block_privileged_mode[message]

  decision := {
    "allowed": false,
    "message": message
  }
}

require_audit[decision] {
  data.library.v1.kubernetes.admission.audit.v1.require_auditsink[message]

  decision := {
    "allowed": false,
    "message": message
  }
}

enforce[decision] {
  block_priv_mode[decision]
}

enforce[decision] {
  require_audit[decision]
}
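
With the rules renamed, a unit test can target one of them directly. Here’s a minimal sketch, assuming your OPA/DAS version supports overriding the library rule with the with keyword and that excludedNamespaces doesn’t include the default namespace; the input and message are illustrative:

test_block_priv_mode {
    decisions := block_priv_mode with input as {"request": {"namespace": "default"}} with data.library.v1.kubernetes.admission.workload.v1.block_privileged_mode as {"privileged mode is not allowed"}
    count(decisions) == 1
}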

Closing

Hopefully this post has been helpful in getting started. The Open Policy Agent documentation has a lot more info on policy testing.

If you have any questions or feedback, please feel free to contact me: @jamesmassardo

HTTPS for Multiple domains using GitHub pages


Summary

In today’s modern Internet, nearly all browsers show a big warning when viewing an insecure/non-ssl site. If you own a single domain name and use GitHub Pages to host your site, this is a simple matter to resolve. One can easily navigate to the project settings (https://github.com/<USERNAME>/<USERNAME>.github.io/) and tick the Enforce HTTPS setting.

So what happens if you own multiple domains that point to the same site? Now we have a bit of a mess to untangle.

If this was a high volume or mission-critical site, I would use fit-for-purpose tooling with load balancers, CDNs, geo-redundant hosts, etc. But what about a simple page that doesn’t get a lot of traffic?

How can we do it on the cheap?

Problem

Let’s take a look at my needs:

  • My primary site, dxrf.com, is Jekyll based and is hosted as a GitHub Pages site.
  • I own 4 different domains that have served in various capacities over the years. As it stands right now, I want all 4 to redirect to my primary site.
  • I want SSL certificates in place so HTTPS requests are redirected without warning/errors
  • Low cost. I don’t mind paying a few bucks for my needs but I don’t want any expensive load balancers, app gateways, or CDN subscriptions.
  • I want to learn something through the process.

Solution

I settled on running a micro VM in a cloud subscription with Nginx serving redirects. The VM is super cheap, it can handle multiple domains, and Nginx is compatible with Certbot so I can automate the SSL enrollment/renewal process.

Let’s look at the high level steps needed. This assumes you have GitHub pages working with your primary custom domain.

  1. Provision VM with a public IP address and ports 80 and 443 open
  2. Install Nginx and provision virtual hosts for each domain name (Virtual Server How-to); see the sketch after this list
  3. In the virtual host root directory /var/www/domain.tld/html, create an index.html page with a meta refresh (see example below)
  4. Update apex and www DNS records to point to VM’s public IP.
  5. Run Certbot to generate SSL certificates
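
For step 2, here’s a hedged sketch of what one redirect-only virtual host might look like (otherdomain.com and the paths are illustrative; repeat per domain):

server {
    listen 80;
    listen [::]:80;

    server_name otherdomain.com www.otherdomain.com;

    root /var/www/otherdomain.com/html;
    index index.html;
}

Once the DNS records point at the VM, Certbot’s Nginx plugin (e.g. certbot --nginx -d otherdomain.com -d www.otherdomain.com) can request the certificates and add the HTTPS configuration for you.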

Example index.html

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Redirecting to https://dxrf.com</title>
    <meta http-equiv="refresh" content="0; URL=https://dxrf.com">
    <link rel="canonical" href="https://dxrf.com">
  </head>
</html>

Notes/To-do’s

There are a few caveats to this pattern and a few things I still need to iron out.

  • If the VM goes down, all but the primary domain will stop working as there’s nothing to redirect the request. If downtime isn’t something your site can tolerate, consider other hosting methods/patterns.
  • Let’s Encrypt certificates are only good for 90 days, so we’ll need something to auto-renew them.
  • Since this is a single VM exposed to the Internet, make sure it stays up to date to reduce the risk of someone hijacking your server and sending your traffic elsewhere (or worse, making your site serve malware).

To-do List

  • Automate VM and cloud infra provisioning (e.g. terraform)
  • Automate Nginx config (e.g. chef/ansible)
  • Automate certbot auto-renewal
  • Add Nginx config to redirect 404’s to primary site (e.g. http://www.otherdomain.com/FileThatDoesntExist.html -> https://dxrf.com)
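
For the renewal and 404 items, a couple of hedged sketches (the schedule, paths, and domains are illustrative; newer Certbot packages may already install a renewal timer for you):

# Cron entry: attempt renewal daily; certbot only renews certificates that are close to expiry
0 3 * * * certbot renew --quiet

# Nginx: redirect unknown paths on a secondary domain to the primary site
error_page 404 =301 https://dxrf.com;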

Closing

While these aren’t step-by-step directions, hopefully it’s enough for someone else to solve the same problem I had. If you have any questions or feedback, please feel free to contact me: @jamesmassardo

Habitat pipelines in Azure DevOps


Summary

I’ve been doing some work related to building and deploying Habitat artifacts using Azure DevOps so I thought I’d blog the basic steps so future me can set it up again…

Notes

For this post, I’ll be storing my code in GitHub then using the pipeline components in ADO. Also, we’ll be using ADO’s new Multi-stage pipelines for these examples as they let us store all of our steps in a single yml file.

While the pipelines are stored in a single yml file, there are really two main parts: Build and Deployment. The build tasks are handled in a single stage while the deployment tasks are broken into three separate stages. This gives us the ability to follow a standard dev, stage, prod deployment model.

[Image: pipeline stages]

The build stage is very simple. It uses the Habitat extension for almost all its steps. The three deployment stages only differ in one spot. The DeployToDev stage performs a hab pkg upload... while the DeployToStg and DeployToProd stages execute a hab pkg promote ... for their respective channels.

The pipelines are generic as they source the last_build.ps1/last_build.env files from the build. This allows us to pull the artifact name and the package identifier from the build.
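
For reference, here’s a heavily trimmed sketch of the multi-stage layout. It uses plain script steps instead of the Habitat extension tasks, and the stage names, PKG_IDENT variable, and channel names are illustrative:

stages:
  - stage: Build
    jobs:
      - job: build
        steps:
          # Assumes hab is installed and the origin key/auth token are already configured
          - script: hab pkg build habitat/
            displayName: Build Habitat artifact

  - stage: DeployToDev
    dependsOn: Build
    jobs:
      - deployment: upload
        environment: dev
        strategy:
          runOnce:
            deploy:
              steps:
                # In the real pipelines the .hart comes from a published build artifact
                - script: hab pkg upload results/*.hart
                  displayName: Upload package to Builder

  - stage: DeployToStg
    dependsOn: DeployToDev
    jobs:
      - deployment: promote
        environment: stg
        strategy:
          runOnce:
            deploy:
              steps:
                - script: hab pkg promote $(PKG_IDENT) stg
                  displayName: Promote package to the stg channel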

Usage

To consume these pipelines, copy the appropriate yml file from this repository into the root of your project. See my Wildfly repo for an example. The Habitat extension expects the hab specific components to be in a habitat/ subdirectory within your project.

Example

Let’s walk through an example. Before we get to the actual ADO setup, let’s do a couple prep steps.

  • First, we need to make sure our repo is set up correctly.
    • The generally accepted best practice is to put your Habitat related files in a habitat/ subdirectory.
    • We also need to put our azure-pipelines.yml file in the root of our repo.
    ├── README.md
    ├── azure-pipelines.yml
    └── habitat
        ├── config
        │   ├── standalone.conf
        │   └── wildfly.xml
        ├── default.toml
        ├── hooks
        │   ├── init
        │   └── run
        └── plan.sh
    

    NOTE: Here are the links to the example repo and the example pipeline files.

Cool. Now that we have our source repo tuned up, let’s move onto setting up the ADO bits.

  • If this is your first foray into ADO, you’ll need to create an organization first.
  • Now that we have an org, let’s create a project. Click on the New Project button.
  • Name your project, select the appropriate visibility, and click Create.
  • Since BLDR requires signing keys and a token to build and upload packages, we need to give that info to ADO.
  • Click on the Project Settings in the lower left corner, then scroll down and select Service Connections.
  • Most of these attributes should be straightforward. The only one that isn’t obvious is the Revision. It’s the timestamp bit in your public key.

    NOTE: You may be wondering about the connection name. You aren’t required to use bldr for the name, however, I would recommend that you make it something very obvious. If you do elect to use something different, you’ll need to edit your azure-pipelines.yml file and update the tasks that reference the bldr Habitat Origin.

  • The final prep step is to create the pipeline environments. Since we’re doing a three stage deployment, we’ll need to create three environments. The reason being, environments have the option for a manual approval check. We want the pipeline to build and automatically promote the new package to our dev channel, but we don’t want it to immediately promote to stg or prod without approval.
    • On the left-hand sidebar, hover over Pipelines, then click Environments.
    • Click New environment, type in dev, then click Create. Repeat for stg and prod.
    • Click on the stg environment, then click the vertical ellipsis icon in the top right, and select Checks.
    • Click the Create button, select the appropriate user, then click the Create button.
    • Repeat for the Prod environment.

We’re almost there, the final steps set up the actual pipeline.

  • Now click on Pipelines from the left-hand sidebar, then click Create Pipeline
  • As you can see, we have several options:
  • In our case, we’ll select GitHub (YAML). Then select our Wildfly repo.
  • Since we already added the azure-pipelines.yml file to our repo earlier, there’s actually very little to do on the Review screen. I normally skim the code to make sure there aren’t any errors from ADO’s linter. If everything looks good, go ahead and click Run.
  • From here, we should see the pipeline kick off and start a build of our package.

Closing

Hopefully, you’ve found a couple things that will help you be successful. If you have any questions or feedback, please feel free to contact me: @jamesmassardo

Effectively Managing Audits and Compliance


Summary

Greetings! Today, I’ll be sharing some of my experiences working with audit and compliance. In a previous life, I was responsible for providing compliance data and responses for a portion of multiple internal audits and external examinations.

Discussion

In my experience, audits are really about proving that you did what you said you were going to do. There are sets of standards, but they are really only guidance; it’s up to each individual company to define its specific policies. For example, the standard may say that all critical patches must be applied within 30 days of release. However, a company can reclassify a critical browser patch to a lower criticality if they have a mitigating control, like no internet access in the datacenter.

So how do we define what we’re going to do? What I’ve found to be most successful is having the following:

  • Policies - These define what you will and won’t do. I.e. All critical patches are installed within 30 days of release, or Administrative user passwords are rotated every 90 days.
  • Processes - These are normally a visual representation of your procedures. This is also a good place to show any dependencies on other teams and/or processes.
  • Procedures - These define how you do things. They should be step-by-step and provide enough detail to complete the task.
  • Mitigating controls - This is where you provide additional detail about anything you may have that would lessen the risk and/or attack surface.
  • Reporting/Validation - Have reports or accessible data that demonstrates your compliance. Also, have documentation for how the report is produced and validated. I.e. Here’s the source code repository for the report code and the profiles that collect the data.

Story time. A few years ago, we had a new external examiner come in to review some of our processes. We gave him the reports that we had automated and they weren’t in the format he wanted. He told us that we’d have to run his data collection script or we wouldn’t pass because our reports weren’t acceptable. He handed me a USB stick with a set of batch files and VB scripts. I told him we’d need to peer review the script and do some testing with it before we could run it.

He got mad and demanded that we run it THAT day or we would fail. Needless to say, this didn’t sit well with us. I asked him “So you want us to take an untrusted USB drive and plug it into a server in our PCI datacenter; then you want us to run an untested, unverified script without following our peer review/ code testing process; and you want us to violate our change management processes by deploying something straight to production without following our deployment process?”

It was quite funny to watch him look for a hole to crawl in as he realized that there were multiple directors and sr. directors in the room and he had lost all of his credibility.

I’ve had other auditors and examiners push to see if folks will hold to their processes, but some will try to push you around just because they think they can.

Implementation

For the policies and procedures, I normally keep these in documents that support change tracking (Word Docs stored in SharePoint or Google Docs). This allows you to show changes over time in your docs. It also allows you to show the auditor/examiner that you regularly update documentation.

The same things apply to the process docs/diagrams as well. I’ve used Visio, Draw.io, and LucidChart but any tool should work. Standard flow charts work for some processes but in general, I’ve found that cross-functional flow charts (sometimes called swimlanes) work the best as they allow you to accurately represent how requests flow between the various teams. These have a tremendous value outside of the audit process as well. It helps all the teams involved understand their parts. It also helps when you onboard new employees. It’s easy for them to understand how work gets done.

In larger enterprises, mitigating controls are normally tracked along with the risk acceptance documentation in a GRC tool. For smaller organizations, you can track these in documents and store them with your other documentation. Really, the main thing is to explicitly call out the specific requirement (i.e. PCI 6.1) and identify what you’re changing and why you’re changing it.

For the reporting piece, a code-driven process is definitely a plus. Store the profiles you use to test the audit items in version control. Follow a standardized SDLC process. Use this information to demonstrate your processes.

Example:

  • Profile changes are stored in version control
  • Changes are peer reviewed before merging
  • Changes follow an approval process to merge
  • Merged changes flow through a standardized pipeline that requires testing/validation
  • After testing, changes have an approval process for release to production.

With this, you can show that every change was tracked, reviewed, and had the appropriate approvals.

One other recommendation is to track the requirement information in the metadata of the control/test. This allows you to quickly find information about a specific control when you are working with the auditor/examiner.

Metadata example:

control 'sshd-8' do
  impact 0.6
  title 'Server: Configure the service port'
  desc 'Always specify which port the SSH server should listen.'
  desc 'rationale', 'This ensures that there are no unexpected settings' # Requires Chef InSpec >=2.3.4
  tag 'ssh','sshd','openssh-server'
  tag cce: 'CCE-27072-8'
  ref 'NSA-RH6-STIG - Section 3.5.2.1', url: 'https://www.nsa.gov/ia/_files/os/redhat/rhel5-guide-i731.pdf'

  describe sshd_config do
    its('Port') { should cmp 22 }
  end
end
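
To tie this back to the reporting bullet above, the same profile can be run with a machine-readable reporter and the output archived as evidence (the profile path and target are illustrative):

# Run the profile over SSH and keep both console output and a JSON report
inspec exec ./ssh-baseline -t ssh://admin@server01 --reporter cli json:reports/sshd-report.json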

Closing

In closing, be prepared, be confident, and be thorough and you’ll do fine. If you have any questions or feedback, please feel free to contact me: @jamesmassardo

Habitat for Windows in Production


Summary

Greetings! Today, I’ll be sharing some notes about running Habitat for Windows in Production.

Notes

Great! So we’ve done some awesome work to package all our apps in Habitat and we’ve done a ton of testing. Now what?

First we need to answer some questions:

  • Where are we going to store our artifacts?
    • BLDR or On-Prem Depot?
  • How are we going to get Habitat on our nodes?
    • Newly provisioned systems? existing fleet?
  • How are we going to run Habitat on those nodes?
  • How do we want to share data amongst the supervisors?
    • Everything in one big ring? Multiple rings? Bastion rings?
  • How do we secure the rings?
  • How will we update artifacts?
    • How do we decide on a channel structure?
  • How do we update configurations when there’s a change?
  • How do we monitor the supervisors?

Artifact Storage

With Habitat, we have two options for storing artifacts: BLDR and the Depot.

BLDR is Chef’s cloud service for storing artifacts. BLDR makes it super easy and fast to get started; however, it lacks some of the RBAC and security controls that some enterprises want for production usage.

The Habitat Depot is an on-prem version of BLDR. Since it’s running in your environments, you have more control over the authentication and access. If you have proprietary code in your packages, this also allows you to keep your code inside your environments.

My personal guidance would be: if you are producing open source code, use BLDR; if you are running proprietary or protected code, use the Depot.

Provisioning

The first step is to get the Habitat binaries onto our nodes. So how do we do that? Well… If you use Terraform, you’re in luck. We have a Habitat provisioner for Terraform to simplify this task.

Terraform

Windows takes an extra step because the provisioner isn’t directly in Terraform yet.

Install plugin:

wget https://github.com/chef-partners/terraform-provisioner-habitat/releases/download/0.1/terraform-provisioner-habitat_dev_v0.1_darwin_amd64.tar.gz

tar -xvf terraform-provisioner-habitat_dev*.tar.gz

mkdir -p ~/.terraform.d/plugins/

mv terraform-provisioner-habitat_dev*/terraform-provisioner-habitat_dev* ~/.terraform.d/plugins/

NOTE: This example fetches the macOS binary. If you are on another platform, you’ll need to fetch the appropriate release.

Now that we have the plugin installed, we can use something like:

provisioner "habitat_dev" {
    peer = ""
    service {
      name = "core/sqlserver"
      topology = "standalone"
    }
    connection {
      type = "winrm"
      timeout = "10m"
      user = "${var.win_username}"
      password = "${var.win_password}"
    }
}

Other provisioning

Ok, so what about all the folks that can’t use Terraform? In that case, we need to install Habitat directly on the nodes. There are two ways: if you use Chef, you can use the Habitat cookbook on your existing nodes; if not, you can use your existing provisioning system to add steps that install the binaries. If you don’t have a provisioning or configuration management tool, check out Chef Workstation. It has the ability to run cookbooks in an ad-hoc fashion, either manually or scripted.

Note: You’ll notice the stark absence of sample code for this section. Unfortunately, there are so many permutations of infrastructure management tools that it’s impossible to cover them. Fortunately, Habitat is quite easy to install. If you build a way to solve this problem, contact me and I’ll add a link to your solution.

YAY!!! We’re done, right? Well… yes and no. We now have Habitat installed but we still need to do a few things…

Running the Supervisor

If you are using a Terraform provisioner, most of these items are taken care of for you. If you are managing your own installation process, you’ll need to decide how you want to run the Habitat services.

As with most things Habitat, we have options. Let’s compare our options:

  • Windows Service
    • Pros: built-in log management
    • Cons: not viable for apps with a UI (see the note in the Windows Service section)
  • Scheduled Task
    • Pros: good for running the Supervisor as a specific user
    • Cons: requires additional setup; same logging requirements as a startup script
  • Startup script
    • Pros: useful with apps that require user interaction
    • Cons: no built-in logging; the script must redirect stdout to a file and handle log rotation independently

Tasks and Scripts

For the scheduled task and startup script, you’ll need to use either your provisioning system or configuration management tool to create it.

If you use Chef, here’s a simple resource to get you started:

windows_task 'Start Habitat Supervisor' do
  command 'hab sup run'
  cwd 'c:/habitat/'
  run_level :highest
  frequency :onstart
end

NOTE: As I mentioned earlier, you’ll need to add additional bits to this resource to log the supervisor output. You’ll also need to rotate those logs.
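
If you’re using the Chef resource above, one hedged way to capture that output is to wrap the command in PowerShell and redirect all streams to a file (the log path is illustrative, and rotation still needs to be handled separately):

windows_task 'Start Habitat Supervisor' do
  # *>> appends every PowerShell output stream to the log file
  command 'powershell.exe -NoProfile -Command "hab sup run *>> C:/hab/sup.log"'
  cwd 'c:/habitat/'
  run_level :highest
  frequency :onstart
end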

Windows Service

The Windows Service is the simplest option and the recommended one for most applications. If you have Chef, you can use the Habitat cookbook to manage the service. Otherwise, use your provisioning/config tools to run the following commands:

hab pkg install core/windows-service
hab pkg exec core/windows-service install
Start-Service -ServiceName Habitat

NOTE: Although Windows services are the preferred method, Windows services are not able to interact with a user session so if you need to launch any type of UI, consider using a scheduled task or other method.

Rings

If you aren’t familiar with the concept of supervisor rings, essentially, a ring is a way for supervisors to share data amongst each other. This information is normally configuration data that’s exported by supervised services or data that’s injected into the ring via something like hab config apply ....

So why would we want a ring? Well… thanks to Josh Hudson and Andrew Defour, I can give you several reasons you want a ring:

  1. Supervisors in the same service group that need to peer for leader election.
  2. Supervisors across dependent services that rely on service bindings for config/service coordination.
  3. Service discovery (I spun up in this environment – give me a database).
  4. Perceived ease of delivering global config changes.

This last one sounds a little condescending; however, as we talk about building a singular shape for shipping change, configuration changes should follow that same shape just like any other change.

Now that we know some of the reasons why we might want a ring, how do we get one? Technically, we could use one big ring assuming we aren’t gossiping a ton of data. In reality, we’ll most likely use multiple rings.

Most Habitat deployments naturally partition by:

  • A combo of Application or service boundaries
  • Logical environments (dev, stage, prod)
  • Network failure domains (regional data centers)
  • Per-tenant environments (managed services teams, SaaS platforms)
  • Micro-segmentation / East-West partitioning (netsec)

So what about Bastion rings? This is definitely something you should do in large deployments. These should really be called Permanent Peers because they are still part of a given ring; however, they are there to manage the gossiping of data and don’t serve actual content. Let’s say we have a web farm; we may designate a trio of servers to host the ring. These systems would be joined with the --permanent-peer option so they always attempt to stay connected. Every other supervisor would be peered to these 3 servers using the --peer option.

Let’s look at an example. We have 3 supervisor systems (192.168.0.1-3). We would run the following commands to start the supervisor and peer them to each other.

# Command for 192.168.0.1
hab sup run --peer=192.168.0.2 --peer=192.168.0.3

# Command for 192.168.0.2
hab sup run --peer=192.168.0.1 --peer=192.168.0.3

# Command for 192.168.0.3
hab sup run --peer=192.168.0.1 --peer=192.168.0.2
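
To make those three supervisors the bastion/permanent peers described above, and to join the rest of the fleet to them, a hedged variation of the same commands:

# On each bastion supervisor (adjust the peer list per host)
hab sup run --permanent-peer --peer=192.168.0.2 --peer=192.168.0.3

# On every other supervisor in the ring
hab sup run --peer=192.168.0.1 --peer=192.168.0.2 --peer=192.168.0.3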

Additional reading on peering here.

Security

Now that we have a basic ring, how do we secure it? There are several pieces, so let’s dive right in.

What about rotating keys? How do we deliver new keys and restart supervisors without downtime?

  • We need encryption keys to protect our data on the wire so let’s generate one:

    hab ring key generate yourringname
    

    NOTE: You’ll need to distribute this key to each of your supervisors in the ring. If you have multiple rings, you’ll need to do this step for each ring. Having separate keys for each ring limits your exposure if one of your keys is exposed or compromised.

  • We also need a service group key for protecting service group configuration:

    hab svc key generate servicegroupname.example yourorg
    

    NOTE: We haven’t discussed Service Groups yet although we’ll briefly touch on them in the next section. You can read more about them here.

    Start the Supervisor, specifying both the service group and organization that it belongs to:

    hab start --org yourorg --group servicegroupname.example yourorigin/yourapp
    

    NOTE: Only users whose public keys the Supervisor already has in its cache will be allowed to reconfigure this service group.

  • To protect your hab http gateway (it holds sensitive config and ring data), set the HAB_SUP_GATEWAY_AUTH_TOKEN environment variable to a value on all of your supervisors. Once this is set, the Supervisor requires HTTP authentication for requests to the gateway endpoints.

    export HAB_SUP_GATEWAY_AUTH_TOKEN=<mygatewaypassword>
    

Additional reading on encryption here.

Channels/ Package Updates

How will we update artifacts? How do we decide on a channel structure?

Configuration Changes

How do we update configurations when there’s a change?
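
I don’t have a full answer here yet, but the hab config apply command mentioned earlier is the basic mechanism. A minimal sketch (the service group, version number, and file name are illustrative):

# Push version 2 of the configuration to every member of myapp.default
hab config apply myapp.default 2 updated_config.toml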

Monitoring

How do we monitor the supervisors?

Closing

As you can see, there are a number of things to consider; however, almost all of them are small, easy-to-complete chunks of work. It feels like a ton of work to start with, but it normally only takes a few days to work through all of it.

Special thanks to Josh Hudson and Andrew Defour for letting me use info from their ChefConf 2019 session. I highly recommend checking out their presentation.

As always, if you have any questions or feedback, please feel free to contact me: @jamesmassardo