Overview
In the introductory post about the new support for source => "http://..." parameters in Puppet’s file resource type, I promised a follow-up that would explain the technical details of the implementation. This is not that post. Instead, seeing as Puppet 4.4 will hopefully be released soon and make this feature available to you, I will explain the caveats in greater detail first and add some perspective on how to best circumvent them.
Apparent incoherence
So you switched your first files to common HTTP sources. You will likely notice that Puppet keeps presenting a message like the following to you:
ffrank@fflaptop:~/git/puppet$ bundle exec puppet apply -e '
file { "/tmp/http-test":
  source => "http://ffrank.github.io/articles/2015/12/21/building-http-support/"
}'
Notice: Compiled catalog [...]
Notice: /Stage[main]/Main/File[/tmp/http-test]/content: \
content changed '{md5}0e5a75aecc19936a3a80c9e05040269e' \
to '{mtime}2015-12-27 23:51:55 +0100'
Notice: Applied catalog in 0.97 seconds
This does not appear to be sensible and might actually look like a bug. Why is an MD5 sum compared to a time stamp? Well, it’s not a bug - it is indeed a feature.
The fundamental problem (see the previous post) is that the HTTP server will not supply a full set of metadata to the agent. On the contrary, most web servers will not even include the MD5 sum of the server-side file in the response headers. Fortunately, the Last-Modified header is quite popular, so Puppet can fall back to that.
Still, without explicit instructions, Puppet will use md5 for checksumming the local copy regardless. From there, it cannot help but compare an MD5 sum with a time stamp, and that comparison can never succeed. The undesirable result is that Puppet will download such a file during each transaction.
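The mismatch can be modelled in a few lines. The following Python sketch is not Puppet's implementation; the function names and the plain string comparison are assumptions that only mimic the {md5}... and {mtime}... values from the log above. It illustrates that a tagged MD5 sum can never equal a tagged time stamp, whereas two values of the same kind can.

```python
import hashlib


def tagged_md5(content: bytes) -> str:
    """Checksum of the local copy, in the '{md5}<hex>' style seen in the log."""
    return "{md5}" + hashlib.md5(content).hexdigest()


def tagged_mtime(timestamp: str) -> str:
    """Value derived from a Last-Modified style time stamp."""
    return "{mtime}" + timestamp


def in_sync(local: str, remote: str) -> bool:
    """Hypothetical comparison: the two tagged values must match exactly."""
    return local == remote


remote = tagged_mtime("2015-12-27 23:51:55 +0100")

# MD5 on the local side, mtime on the remote side: never equal,
# no matter what the local content is.
print(in_sync(tagged_md5(b"page content"), remote))

# Same kind of value on both sides: an unchanged file compares equal.
print(in_sync(tagged_mtime("2015-12-27 23:51:55 +0100"), remote))
```

The first call prints False and the second prints True, which is the difference between a resource that is re-downloaded on every run and one that converges.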
The workaround
For a clean run with the same resource, add the checksum => "mtime" parameter to select this “checksumming” strategy for the file in question.
ffrank@fflaptop:~/git/puppet$ bundle exec puppet apply -e '
file { "/tmp/http-test":
  source   => "http://ffrank.github.io/articles/2015/12/21/building-http-support/",
  checksum => "mtime",
}'
Notice: Compiled catalog for [...]
Notice: Applied catalog in 0.76 seconds
This is not even too bad. Comparing time stamps is a quick way to detect upstream changes. It assumes that clocks are sufficiently synchronous and that there is no tampering with file contents on the agent side. If either assumption is broken, Puppet will not be able to recognize diverged state.
This is why you probably don’t want to synchronize your files from random remote HTTP servers.
The Right Way™
Getting Puppet to synchronize files with plain old HTTP servers with the usual convenience is possible, once we bring MD5 checksums back into the game. Of the more popular HTTP servers, Apache is the only one that will readily provide the Content-MD5 header. Puppet will consume this header to verify local file content.
Still, even with Apache’s popularity, most if not all public sites don’t choose to enable this header. How is this helpful then? Well, the best way to take advantage of the HTTP support feature is to run some local Apache servers to distribute those large tarballs and binary blobs to your agents.
Setting up an appropriate server is easy. Here is a minimalistic manifest that will do the footwork:
include apache

apache::vhost { 'puppet':
  docroot         => '/var/lib/puppet-files',
  custom_fragment => "ContentDigest on\n",
}
Apache will now serve files from /var/lib/puppet-files and include their MD5 sums in the Content-MD5 header. Using appropriate URLs as Puppet file sources will Just Work.
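To check by hand that a server sends a usable header, its value can be compared with the MD5 sum of a local copy. Per RFC 1864, Content-MD5 carries the base64 encoding of the binary MD5 digest, while the sums in Puppet's log output are hexadecimal. The Python sketch below converts between the two forms. The helper names are made up for illustration, the header is simulated rather than fetched from a real server, and none of this claims to mirror Puppet's internals.

```python
import base64
import hashlib


def content_md5_to_hex(header_value: str) -> str:
    """Convert a Content-MD5 header value (base64) to a hex MD5 sum."""
    return base64.b64decode(header_value).hex()


def matches_local(header_value: str, local_content: bytes) -> bool:
    """True if the header describes exactly the given local content."""
    return content_md5_to_hex(header_value) == hashlib.md5(local_content).hexdigest()


# Simulate the header a server with ContentDigest enabled would send
# for this response body (assumption: no real HTTP request is made here).
body = b"example tarball bytes"
header = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")

print(matches_local(header, body))         # local copy is identical
print(matches_local(header, body + b"!"))  # local copy has diverged
```

The first call prints True and the second prints False. Unlike the time stamp approach, this kind of comparison also catches changes made to the file on the agent side.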
But why?
You gained the ability to serve files to your Puppet agents without using masters as file servers. But now you need to provision Apache servers. How is this beneficial?
In most setups, this adds a convenient place to store those pesky large archives on which some manifests rely for bootstrapping your software. Doing that outside your masters has at least two benefits.
- Most users have all local Puppet modules in version control. This is not where you want large tarballs.
- Serving large files to agents ties up valuable resources that your masters cannot use for meaningful work until the transfers finish.
This small list is not exhaustive. There are a number of further operational advantages.
Summary
Use http:// URLs for Puppet files responsibly. Only point them at upstream servers if the URL is unambiguous, e.g. pointing at a versioned file (avoid puppet-latest.tar.gz in favor of puppet-4.4.0.tar.gz). If you do this, make sure to specify checksum => 'mtime'.
In most cases, you will want to use this support to download large files from your own local Apache servers. The vhosts that are responsible for this task will likely be dedicated to it. Make sure to set the ContentDigest setting for the appropriate vhost or directory. Do not set Puppet to use mtime in this scenario.