Infrastructure As Code: Automated Deployments With Ansible
Automate, automate, automate.
In this chapter we’re going to spin up an actual server, make it accessible on the Internet with a real domain name, and then we’re going to install our app on it, using our container.
We could do all these things manually, but a key insight of modern infrastructure management is that automation really pays off in reducing maintenance burdens.
It’s also key to making sure our tests give us true confidence over our deployments. If we go to the trouble of building a staging server,[1] we want to make sure that it’s as similar as possible to the production environment. By automating the way we deploy, and using the same automation for staging and prod, we give ourselves much more confidence.
The buzzword for automating your deployments these days is "Infrastructure as Code".
Why not ping me a note once your site is live on the web, and send me the URL? It always gives me a warm and fuzzy feeling… [email protected].
Getting a Domain Name
We’re going to need a couple of domain names at this point in the book—they can both be subdomains of a single domain. I’m going to use superlists.ottg.co.uk and staging.ottg.co.uk. If you don’t already own a domain, this is the time to register one! Again, this is something I really want you to actually do. If you’ve never registered a domain before, just pick any old registrar and buy a cheap one—it should only cost you $5 or so, and you can even find free ones. I promise seeing your site on a "real" website will be a thrill.
Manually Provisioning a Server to Host Our Site
We can separate out "deployment" into two tasks:
- Provisioning a new server to be able to host the code
- Deploying a new version of the code to an existing server
Ultimately, infrastructure-as-code tools can let you automate both of these, but for the purposes of this book, we can live with manual provisioning.
I should probably stress once more that deployment is something that varies a lot, and that as a result there are few universal best practices for how to do it. So, rather than trying to remember the specifics of what I’m doing here, you should be trying to understand the rationale, so that you can apply the same kind of thinking in the specific future circumstances you encounter.
Choosing Where to Host Our Site
There are loads of different solutions out there these days, but they broadly fall into two camps:
- Running your own (probably virtual) server
- Using a Platform-As-A-Service (PaaS) offering like Heroku or my old employers, PythonAnywhere.
Particularly for small sites, a PaaS offers a lot of advantages, and I would definitely recommend looking into them. We’re not going to use a PaaS in this book however, for several reasons. The main reason is that I want to avoid endorsing specific commercial providers. Secondly, all the PaaS offerings are quite different, and the procedures to deploy to each vary a lot—learning about one doesn’t necessarily tell you about the others. Any one of them might radically change their process or business model by the time you get to read this book.
Instead, we’ll learn just a tiny bit of good old-fashioned server admin, including SSH and manual server config. They’re unlikely to ever go away, and knowing a bit about them will get you some respect from all the grizzled dinosaurs out there.
Spinning Up a Server
I’m not going to dictate how you do this—whether you choose Amazon AWS, Rackspace, Digital Ocean, your own server in a data centre, or a Raspberry Pi in a cupboard under the stairs, any solution should be fine, as long as:
- Your server is running Ubuntu 22.04 (aka "Jammy/LTS").
- You have root access to it.
- It’s on the public internet.
- You can SSH into it.
I’m recommending Ubuntu as a distro because it’s popular and I’m used to it. If you know what you’re doing, you can probably get away with using something else, but you’re on your own.
If you’ve never started a Linux server before and you have absolutely no idea where to start, I wrote a very brief guide on GitHub.
Some people get to this chapter, and are tempted to skip the domain bit, and the "getting a real server" bit, and just use a VM on their own PC. Don’t do this. It’s not the same, and you’ll have more difficulty following the instructions, which are complicated enough as it is. If you’re worried about cost, have a look at the link above for free options.
User Accounts, SSH, and Privileges
In these instructions, I’m assuming that you have a nonroot user account set up, and that it has "sudo" privileges. Whenever we need to do something that requires root access, we use sudo (or "become" in ansible terminology), and I’m explicit about that in the various instructions that follow.
My user is called "elspeth", but you can call yours whatever you like! Just remember to substitute it in all the places I’ve hardcoded it below. See the guide linked above if you need tips on creating a sudo user.
Configuring Domains for Staging and Live
We don’t want to be messing about with IP addresses all the time, so we should point our staging and live domains to the server. At my registrar, the control screens looked a bit like Domain setup.
In the DNS system, pointing a domain at a specific IP address is called an "A-Record". All registrars are slightly different, but a bit of clicking around should get you to the right screen in yours. You’ll need two A-records: one for the staging address and one for the live one. No need to worry about any other type of record.
DNS records take some time to "propagate" around the world (it’s controlled by a setting called "TTL", Time To Live), so once you’ve set up your A-record, you can check its progress on a "propagation checking" service like this one: https://www.whatsmydns.net/#A/staging.ottg.co.uk.
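If you'd rather check from your own machine, here is a minimal sketch using only Python's standard library. The function name resolve_a_records is my own invention, and bear in mind that it only tells you what your local resolver currently sees, which may lag behind (or run ahead of) the rest of the world.

```python
import socket


def resolve_a_records(hostname: str) -> list[str]:
    """Return the IPv4 addresses that `hostname` currently resolves to.

    Returns an empty list if the name does not resolve (yet).
    """
    try:
        # gethostbyname_ex returns (canonical_name, aliases, ip_addresses)
        _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        # Covers socket.gaierror and socket.herror: the lookup failed
        return []
    return sorted(addresses)
```

Calling resolve_a_records("staging.ottg.co.uk") (substituting your own domain) should return a list containing your server's IP address once the A-record is visible to your resolver, and an empty list before that.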
I’m planning to host my staging server at staging.ottg.co.uk.
Installing Ansible
TODO:
suggests pipx. could also install it in the local virtualenv? may need to add docker-sdk
A First Cut of an Ansible Script
Infrastructure-as-code tools, also called "configuration management" tools, come in lots of shapes and sizes. Chef and Puppet were two of the original ones, and you’ll probably come across Terraform, which is particularly strong on managing cloud services like AWS.
We’re going to use Ansible, because it’s relatively popular, because it can do everything we need it to, because it happens to be written in Python (and I’m biased), and because it’s probably the one I’m personally most familiar with.
Another tool could probably have worked just as well! The main thing to remember is the concept, which is that, as much as possible we want to manage our server configuration declaratively, by expressing the desired state of the server in a particular config syntax, rather than specifying a procedural series of steps to be followed one by one.
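To make the declarative idea a bit more concrete, here is a toy sketch in Python. It is emphatically not how Ansible is implemented; the function names (plan_changes, converge) and the package names are made up for illustration. The point is that we describe the state we want, the tool works out what (if anything) needs doing, and running it a second time does nothing.

```python
def plan_changes(installed: set[str], desired: set[str]) -> list[str]:
    """Compare current state with desired state and list the actions needed."""
    return [f"install {pkg}" for pkg in sorted(desired - installed)]


def converge(installed: set[str], desired: set[str]) -> tuple[set[str], list[str]]:
    """Apply whatever is needed to reach the desired state.

    Returns the new state and the list of actions that were taken.
    """
    actions = plan_changes(installed, desired)
    return installed | desired, actions


# Hypothetical starting point: a fresh server with only curl installed.
server = {"curl"}
desired = {"curl", "docker"}

server, first_run = converge(server, desired)
server, second_run = converge(server, desired)

print(first_run)   # ['install docker']
print(second_run)  # []
```

That empty second run is the property to look out for in the Ansible output later on: a task reports "changed" when it had to do something, and "ok" when the server was already in the desired state.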
Let’s dip our toes into ansible, and see if we can get it to run a simple "hello world" container on our server.
Here’s what’s called a "playbook" in ansible terminology. It’s in a format called YAML (originally "Yet Another Markup Language"), which, if you’ve never come across it before, you will soon develop a love-hate[2] relationship with.
---
- hosts: all

  tasks:

    - name: Install docker  # (1)
      ansible.builtin.apt:  # (2)
        name: docker  # (3)
        state: latest
        update_cache: true
      become: true

    - name: Run test container
      community.docker.docker_container:
        name: testcontainer
        state: started
        image: busybox
        command: echo hello world
      become: true
(1) An ansible playbook is a series of "tasks" (so in that sense it’s still quite sequential and procedural), but the individual tasks themselves are quite declarative. Each one usually has a human-readable name attribute.
(2) Each task uses an ansible "module" to do its work. This one uses the builtin.apt module, which provides a wrapper around the apt Debian & Ubuntu package management tool.
(3) Each module then provides a bunch of parameters which control how it works. Here we specify the name of the package we want to install ("docker") and tell it to update its cache first, which is required on a fresh server.
Most ansible modules have pretty good documentation; check out the builtin.apt one, for example. I often skip to the Examples section.
$ ansible-playbook --user=elspeth -i staging.ottg.co.uk, infra/ansible-provision.yaml -vv
ansible-playbook [core 2.16.3]
  config file = None
  [...]
No config file found; using defaults
Skipping callback default, as we already have a stdout callback.
Skipping callback minimal, as we already have a stdout callback.
Skipping callback oneline, as we already have a stdout callback.

PLAYBOOK: ansible-provision.yaml
1 plays in infra/ansible-provision.yaml

PLAY [all]

TASK [Gathering Facts] *
task path: ...goat-book/superlists/infra/ansible-provision.yaml:2
ok: [staging.ottg.co.uk]
PLAYBOOK: ansible-provision.yaml *
1 plays in infra/ansible-provision.yaml

TASK [Install docker]
task path: ...goat-book/superlists/infra/ansible-provision.yaml:6
ok: [staging.ottg.co.uk] => {"cache_update_time": 1708981325, "cache_updated": true, "changed": false}

TASK [Install docker] *
task path: ...goat-book/superlists/infra/ansible-provision.yaml:6
changed: [staging.ottg.co.uk] => {"cache_update_time": [...] "cache_updated": true, "changed": true, "stderr": "", "stderr_lines": [], "stdout": "Reading package lists...\nBuilding dependency tree...\nReading [...] information...\nThe following additional packages will be installed:\n wmdocker\nThe following NEW packages will be installed:\n docker wmdocker\n0

TASK [Run test container]
task path: ...goat-book/superlists/infra/ansible-provision.yaml:13
changed: [staging.ottg.co.uk] => {"changed": true, "container": {"AppArmorProfile": "docker-default", "Args": ["hello", "world"], "Config": [...]

PLAY RECAP **
staging.ottg.co.uk : ok=3 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
I don’t know about you, but whenever I make a terminal spew out a stream of output, I like to make little brrp brrp brrp noises, a bit like the computer Mother, in Alien. Ansible scripts are particularly satisfying in this regard.
SSHing Into the Server and Viewing Container Logs
Time to get into some good old-fashioned sysadmin! Let’s SSH in to our server and see if we can see any evidence that our container has run.
We use docker ps -a to view all containers, including old/stopped ones, and we can use docker logs to view the output from one of them:
$ ssh [email protected]
Welcome to Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-67-generic x86_64)
[...]

elspeth@server$ docker ps -a
CONTAINER ID   IMAGE     COMMAND              CREATED      STATUS                      PORTS     NAMES
3a2e600fbe77   busybox   "echo hello world"   2 days ago   Exited (0) 10 minutes ago             testcontainer

elspeth@server$ docker logs testcontainer
hello world
Look out for that elspeth@server in the command-line listings in this chapter. It indicates commands that must be run on the server, as opposed to commands you run on your own PC.
SSHing in to check things worked is a key server debugging skill! It’s something we want to practice on our staging server, because ideally we’ll want to avoid doing it on production machines.
Getting Our Image onto the Server
Typically, you can "push" and "pull" container images to a "container registry" — Docker offers a public one called DockerHub, and organisations will often run private ones, hosted by cloud providers like AWS.
So your process of getting an image onto a server is usually:
- Push the image from your machine to the registry
- Pull the image from the registry onto the server.
Usually the second step is implicit, in that you just specify the image name in the format registry-url/image-name:tag, and then docker run takes care of pulling down the image for you.
But I don’t want to ask you to create a DockerHub account, or implicitly endorse any particular provider, so we’re going to "simulate" this process by doing it manually.
It turns out you can "export" a container image to an archive format, manually copy that to the server, and then re-import it. In ansible config, it looks like this:
---
- hosts: all

  tasks:
    - name: Install docker
      ansible.builtin.apt:
        name: docker
        state: latest
      become: true

    - name: Export container image locally  # (1)
      community.docker.docker_image:
        name: superlists
        archive_path: /tmp/superlists-img.tar
        source: local
      delegate_to: 127.0.0.1

    - name: Upload image to server  # (2)
      ansible.builtin.copy:
        src: /tmp/superlists-img.tar
        dest: /tmp/superlists-img.tar

    - name: Import container image on server  # (3)
      community.docker.docker_image:
        name: superlists
        load_path: /tmp/superlists-img.tar
        source: load
        state: present
      become: true

    - name: Run container
      community.docker.docker_container:
        name: superlists
        image: superlists
        state: started
        recreate: true
(1) We export the docker image to a .tar file by using the docker_image module with the archive_path set to a temp file, and setting the delegate_to attribute to say we’re running that command on our local machine rather than the server.
(2) We then use the copy module to upload the tarfile to the server.
(3) And we use docker_image again, but this time with load_path and source: load, to import the image back on the server.
$ ansible-playbook --user=elspeth -i staging.ottg.co.uk, infra/ansible-provision.yaml -vv
[...]

PLAYBOOK: ansible-provision.yaml
1 plays in infra/ansible-provision.yaml

PLAY [all]

TASK [Gathering Facts] *
task path: ...goat-book/superlists/infra/ansible-provision.yaml:2
ok: [staging.ottg.co.uk]

TASK [Install docker]
task path: ...goat-book/superlists/infra/ansible-provision.yaml:5
ok: [staging.ottg.co.uk] => {"cache_update_time": 1708982855, "cache_updated": false, "changed": false}

TASK [Export container image locally] *
task path: ...goat-book/superlists/infra/ansible-provision.yaml:11
changed: [staging.ottg.co.uk -> 127.0.0.1] => {"actions": ["Archived image superlists:latest to /tmp/superlists-img.tar, overwriting archive with image 11ff3b83873f0fea93f8ed01bb4bf8b3a02afa15637ce45d71eca1fe98beab34 named superlists:latest"], "changed": true, "image": {"Architecture": "amd64", [...]

TASK [Upload image to server] *
task path: ...goat-book/superlists/infra/ansible-provision.yaml:18
changed: [staging.ottg.co.uk] => {"changed": true, "checksum": "313602fc0c056c9255eec52e38283522745b612c", "dest": "/tmp/superlists-img.tar", [...]

TASK [Import container image on server]
task path: ...goat-book/superlists/infra/ansible-provision.yaml:23
changed: [staging.ottg.co.uk] => {"actions": ["Loaded image superlists:latest from /tmp/superlists-img.tar"], "changed": true, "image": {"Architecture": "amd64", "Author": "", "Comment": "buildkit.dockerfile.v0", "Config": [...]

TASK [Run container] *
task path: ...goat-book/superlists/infra/ansible-provision.yaml:32
changed: [staging.ottg.co.uk] => {"changed": true, "container": {"AppArmorProfile": "docker-default", "Args": ["--bind", ":8888", "superlists.wsgi:application"], "Config": {"AttachStderr": true, "AttachStdin": false, "AttachStdout": true, "Cmd": ["gunicorn", "--bind", ":8888", "superlists.wsgi:application"], "Domainname": "", "Entrypoint": null, "Env": [...]
For completeness, let’s also add a step to explicitly build the image locally. This means the playbook no longer depends on us having run docker build by hand beforehand.
    - name: Install docker
      [...]

    - name: Build container image locally
      community.docker.docker_image:
        name: superlists
        source: build
        state: present
        build:
          path: ..
          platform: linux/amd64  # (1)
        force_source: true
      delegate_to: 127.0.0.1

    - name: Export container image locally
(1) Having this step also allows us to work around a compatibility issue between Apple’s new ARM-based chips and our server’s x86/amd64 architecture. You could also use this platform: setting to cross-build docker images for a Raspberry Pi from a regular PC, or vice versa.
In any case, let’s see if it works!
$ ssh [email protected]
Welcome to Ubuntu 22.04.4 LTS (GNU/Linux 5.15.0-67-generic x86_64)
[...]

elspeth@server$ docker ps -a
CONTAINER ID   IMAGE     COMMAND              CREATED      STATUS                      PORTS     NAMES
3a2e600fbe77   busybox   "echo hello world"   2 days ago   Exited (0) 10 minutes ago             testcontainer

elspeth@server$ docker logs testcontainer
[2024-02-26 22:19:15 +0000] [1] [INFO] Starting gunicorn 21.2.0
[2024-02-26 22:19:15 +0000] [1] [INFO] Listening at: http://0.0.0.0:8888 (1)
[2024-02-26 22:19:15 +0000] [1] [INFO] Using worker: sync
[...]
  File "/src/superlists/settings.py", line 22, in <module>
    SECRET_KEY = os.environ["DJANGO_SECRET_KEY"]
                 ~~~~^^^^^^^
  File "<frozen os>", line 685, in __getitem__
KeyError: DJANGO_SECRET_KEY
[2024-02-26 22:19:15 +0000] [7] [INFO] Worker exiting (pid: 7)
[2024-02-26 22:19:15 +0000] [1] [ERROR] Worker (pid:7) exited with code 3
[2024-02-26 22:19:15 +0000] [1] [ERROR] Shutting down: Master
[2024-02-26 22:19:15 +0000] [1] [ERROR] Reason: Worker failed to boot.
Ah, whoops, we need to set those environment variables on the server too.
Using an env File to Store Our Environment Variables
When we run our container manually locally, we can pass in environment variables with the -e flag. But we don’t want to hard-code secrets like SECRET_KEY into our ansible files and commit them to our repo!
Instead, we can use ansible to automate the creation of a secret key, and then save it to a file on the server, where it will be relatively secure (better than saving it to version control and pushing it to GitHub, in any case!).
We can use a so-called "env file" to store environment variables. It’s essentially a list of key-value pairs using shell syntax, a bit like you’d use with export.
One extra subtlety is that we want to vary the actual contents of the env file, depending on where we’re deploying to. Each server should get its own unique secret key, and we want different config for staging and prod, for example.
So, just as we inject variables into our HTML templates in Django, we can use a templating language called "jinja2" to have variables in our env file. It’s a common tool in ansible scripts, and the syntax is very similar to Django’s.
Here’s what our template for the env file will look like:
DJANGO_DEBUG_FALSE=1
DJANGO_SECRET_KEY="{{ secret_key }}"
DJANGO_ALLOWED_HOST="{{ host }}"
And here’s how we use it in the provisioning script:
    - name: Import container image on server
      [...]

    - name: Ensure .env file exists
      ansible.builtin.template:  # (1)
        src: env.j2
        dest: ~/superlists.env
        force: false  # do not recreate file if it already exists. (2)
      vars:  # (3)
        host: "{{ inventory_hostname }}"  # (4)
        secret_key: "{{ lookup('password', '/dev/null length=32 chars=ascii_letters,digits') }}"

    - name: Run container
      community.docker.docker_container:
        name: superlists
        image: superlists
        state: started
        recreate: true
        env_file: ~/superlists.env
(1) We use ansible.builtin.template to specify the local template file to use (src), and the destination (dest) on the server.
(2) force: false means we will only write the file once. So after the first time we generate our secret key, it won’t change.
(3) The vars section defines the variables we’ll inject into our template.
(4) We actually use a built-in ansible variable called inventory_hostname. This variable would actually be available in the template already, but I’m renaming it for clarity.
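If the password lookup feels a bit magic, here is a rough Python sketch of the same idea: 32 random characters drawn from ASCII letters and digits. This is my own illustration using the standard library's secrets module, not Ansible's actual implementation, and the name generate_secret_key is hypothetical.

```python
import secrets
import string

# The same character classes the playbook asks for: ascii_letters and digits
ALPHABET = string.ascii_letters + string.digits


def generate_secret_key(length: int = 32) -> str:
    """Build a random key using a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


key = generate_secret_key()
print(len(key))  # 32
```

Each call produces a different key, which is exactly why the force: false setting matters: without it, every playbook run would overwrite the env file with a brand new secret.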
Using an env file to store secrets is definitely better than committing them to version control, but it’s maybe not the state of the art either. TODO: mention other secret management tools. vault
Let’s run the latest version of our playbook and see how our tests get on:
$ ansible-playbook --user=elspeth -i staging.ottg.co.uk, infra/ansible-provision.yaml -v
[...]

PLAYBOOK: ansible-provision.yaml
1 plays in infra/ansible-provision.yaml

PLAY [all]

TASK [Gathering Facts] *
ok: [staging.ottg.co.uk]

TASK [Install docker]
ok: [staging.ottg.co.uk] => {"cache_update_time": 1709136057, "cache_updated": false, "changed": false}

TASK [Build container image locally]
changed: [staging.ottg.co.uk -> 127.0.0.1] => {"actions": ["Built image [...]

TASK [Export container image locally] *
changed: [staging.ottg.co.uk -> 127.0.0.1] => {"actions": ["Archived image [...]

TASK [Upload image to server]
changed: [staging.ottg.co.uk] => {"changed": true, [...]

TASK [Import container image on server]
changed: [staging.ottg.co.uk] => {"actions": ["Loaded image [...]

TASK [Ensure .env file exists]
changed: [staging.ottg.co.uk] => {"changed": true, [...]

TASK [Run container]
changed: [staging.ottg.co.uk] => {"changed": true, "container": [...]

PLAY RECAP
staging.ottg.co.uk : ok=8 changed=6 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Looks good! What do our tests think?
More Debugging
We run our tests as usual and run into a new problem:
$ TEST_SERVER=staging.ottg.co.uk python manage.py test functional_tests
[...]
selenium.common.exceptions.WebDriverException: Message: Reached error page: about:neterror?e=connectionFailure&u=http%3A//staging.ottg.co.uk/[...]
That neterror makes me think it’s another networking problem. Let’s try curl locally:
$ curl -iv staging.ottg.co.uk
[...]
curl: (7) Failed to connect to staging.ottg.co.uk port 80 after 25 ms: Couldn't connect to server
Now let’s ssh in, check the container logs, and then try curl from the server itself:
elspeth@server$ docker logs superlists
[2024-02-28 22:14:43 +0000] [7] [INFO] Starting gunicorn 21.2.0
[2024-02-28 22:14:43 +0000] [7] [INFO] Listening at: http://0.0.0.0:8888 (7)
[2024-02-28 22:14:43 +0000] [7] [INFO] Using worker: sync
[2024-02-28 22:14:43 +0000] [8] [INFO] Booting worker with pid: 8
No errors in the logs…
elspeth@server$ curl -iv localhost
*   Trying 127.0.0.1:80...
* connect to 127.0.0.1 port 80 failed: Connection refused
*   Trying ::1:80...
* connect to ::1 port 80 failed: Connection refused
* Failed to connect to localhost port 80 after 0 ms: Connection refused
* Closing connection 0
curl: (7) Failed to connect to localhost port 80 after 0 ms: Connection refused
Hmm, curl fails on the server too. But all this talk of port 80, both locally and on the server, might be giving us a clue. Let’s check docker ps:
$ docker ps
CONTAINER ID   IMAGE        COMMAND                  CREATED         STATUS         PORTS     NAMES
1dd87cbfa874   superlists   "/bin/sh -c 'gunicor…"   9 minutes ago   Up 9 minutes             superlists
This might be ringing a bell now—we forgot the ports.
We want to expose port 8888 inside the container as port 80 (the default web/http port) on the server:
    - name: Run container
      community.docker.docker_container:
        name: superlists
        image: superlists
        state: started
        recreate: true
        env_file: ~/superlists.env
        ports: 80:8888
That gets us to
selenium.common.exceptions.NoSuchElementException: Message: Unable to locate element: [id="id_list_table"]; [...]
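As an aside, "connection refused" errors like the ones curl reported earlier can also be checked from Python, which is handy if you want to script the check. Here is a minimal sketch using only the standard library; port_is_open is a name I've made up, and all it tells you is whether something accepts TCP connections on that port, not whether the app behind it is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_open("staging.ottg.co.uk", 80) (with your own domain substituted) would be expected to return False before the ports setting is added to the playbook, and True afterwards.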
Mounting the Database on the Server and Running Migrations
Taking a look at the logs from the server, we can see that the database is not initialised.
$ ssh elspeth@server docker logs superlists
[...]
django.db.utils.OperationalError: no such table: lists_list
To fix that, we need a database file on the server that the container can use, and we need to run the migrations. Once that’s added to the playbook, the output looks like this:
$ ansible-playbook --user=elspeth -i staging.ottg.co.uk, infra/ansible-provision.yaml -v
[...]

TASK [Run migration inside container] *
changed: [staging.ottg.co.uk] => {"changed": true, "rc": 0, "stderr": "", "stderr_lines": [], "stdout": "Operations to perform:\n Apply all migrations: auth, contenttypes, lists, sessions\nRunning migrations:\n Applying contenttypes.0001_initial... OK\n Applying contenttypes.0002_remove_content_type_name... OK\n Applying auth.0001_initial... OK\n Applying auth.0002_alter_permission_name_max_length... OK\n Applying [...]

PLAY RECAP **
staging.ottg.co.uk : ok=9 changed=2 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Here’s the playbook change that gets us there:
    - name: Ensure db.sqlite3 file exists outside container
      ansible.builtin.file:
        path: /home/elspeth/db.sqlite3
        state: touch  # (1)

    - name: Run container
      community.docker.docker_container:
        name: superlists
        image: superlists
        state: started
        recreate: true
        env_file: ~/superlists.env
        mounts:  # (2)
          - type: bind
            source: /home/elspeth/db.sqlite3
            target: /src/db.sqlite3
        ports: 80:8888

    - name: Run migration inside container
      community.docker.docker_container_exec:  # (3)
        container: superlists
        command: ./manage.py migrate
(1) We use file with state=touch to make sure a placeholder file exists before we try and mount it in.
(2) Here is the mounts config, which works a lot like the --mount flag to docker run.
(3) And we use the API for docker exec to run the migration command inside the container.
It workssss
Hooray
$ TEST_SERVER=staging.ottg.co.uk python manage.py test functional_tests
Found 3 test(s).
[...]
...
---------------------------------------------------------------------
Ran 3 tests in 13.537s

OK
Deploying to Live
TODO update this
So, let’s try using it for our live site!
$ fab deploy:[email protected]
Done.
Disconnecting from [email protected]... done.
Brrp brrp brpp. You can see the script follows a slightly different path, doing a git clone to bring down a brand new repo instead of a git pull. It also needs to set up a new virtualenv from scratch, including a fresh install of pip and Django. The collectstatic actually creates new files this time, and the migrate seems to have worked too.
Git Tag the Release
One final bit of admin. In order to preserve a historical marker, we’ll use Git tags to mark the state of the codebase that reflects what’s currently live on the server:
$ git tag LIVE
$ export TAG=$(date +DEPLOYED-%F/%H%M)  # this generates a timestamp
$ echo $TAG  # should show "DEPLOYED-" and then the timestamp
$ git tag $TAG
$ git push origin LIVE $TAG  # pushes the tags up
Now it’s easy, at any time, to check what the difference is between our current codebase and what’s live on the servers. This will come in useful in a few chapters, when we look at database migrations. Have a look at the tag in the history:
$ git log --graph --oneline --decorate
[...]
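If the date format string in that TAG command looks cryptic, here is the same thing sketched in Python. The function name deploy_tag is hypothetical; the only real fact it relies on is that %F in date is shorthand for %Y-%m-%d.

```python
from datetime import datetime


def deploy_tag(now: datetime) -> str:
    """Build a tag name like the one `date +DEPLOYED-%F/%H%M` produces."""
    # %F in the shell's `date` command expands to %Y-%m-%d
    return now.strftime("DEPLOYED-%Y-%m-%d/%H%M")


print(deploy_tag(datetime(2024, 2, 28, 22, 14)))  # DEPLOYED-2024-02-28/2214
```

Because the tag name sorts chronologically, a listing of DEPLOYED- tags doubles as a rough deployment history.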
Anyway, you now have a live website! Tell all your friends! Tell your mum, if no one else is interested! And, in the next chapter, it’s back to coding again.
Further Reading
There’s no such thing as the One True Way in deployment, and I’m no grizzled expert in any case. I’ve tried to set you off on a reasonably sane path, but there are plenty of things you could do differently, and lots, lots more to learn besides. Here are some resources I used for inspiration:
- The 12-factor App by the Heroku team
- Solid Python Deployments for Everybody by Hynek Schlawack
- The deployment chapter of Two Scoops of Django by Dan Greenfeld and Audrey Roy