VPS Security Foundations — create.borecraft.com

Every self-hosted app guide on this site starts with “harden the box first.” Most of them then either gloss the details or copy-paste the same paragraph about UFW and PasswordAuthentication no. This is the long-form version. The output is a base box: a fresh Debian 13 VPS taken to a state where the only thing the public internet sees is a closed door, SSH is reachable only over Tailscale, Docker is installed in a way that doesn’t quietly bypass the firewall, and an Ansible playbook can rebuild it from scratch when (not if) you need to.

Future app guides on this site (DocuSeal, Documenso, Immich, Vaultwarden, Paperless-ngx, etc.) start by pointing here. Once you have the base box, dropping a docker-compose.yml plus a Cloudflare Tunnel sidecar onto it is straightforward. The hard work is the foundation, and the foundation is the same regardless of what app you stack on top.

Step 1

Threat Model and End State

The threat model fits in one paragraph: the box has zero public inbound ports after setup. The provider firewall (Hetzner Cloud Firewall in the example, but equivalents on any provider) is default-deny inbound. The host firewall (UFW) is default-deny inbound, default-allow outbound, with one explicit allow for SSH on the tailscale0 interface. SSH is reachable only over Tailscale; password auth is off, root login is off. Apps that need to be reachable from the internet do so by initiating outbound connections (Cloudflare Tunnel runs cloudflared outbound on TCP/UDP port 7844), not by exposing a port. The remaining attack surface is the running daemons themselves (sshd over Tailscale, Docker, any cloudflared sidecar an app installs), the application code, and any credentials that leave the host.

End state checklist, which the rest of the guide produces:

Provider firewall default-deny inbound, no steady-state allow rules.
UFW default-deny inbound, default-allow outbound, allow tailscale0:22. No public-interface allow rules.
OpenSSH listening on all interfaces but reachable only over Tailscale (firewalls do the access control).
Password auth and root login disabled.
Unattended security upgrades enabled.
Tailscale joined to your tailnet, tagged appropriately.
Docker CE from the official repo. Operator account in the docker group.
rclone, restic, age, lynis installed for backups and audit.
Curated sysctl hardening drop-in surviving boot.
Kernel module blacklist for protocols you don’t use.

What you explicitly do not have at the end: a deployed app, a configured Cloudflare Tunnel, an SMTP provider, a database, or any backup destinations. Those are downstream concerns. This guide is the chassis; the apps go on top.

A note on Tailscale connectivity: zero-public-inbound means Tailscale may use DERP relays instead of direct peer-to-peer in some NAT scenarios. That’s fine for SSH and admin work since latency stays low and the throughput requirements are tiny. Readers who want better direct connectivity can optionally allow inbound UDP 41641 in the provider firewall (Tailscale’s direct connection port), but that breaks the zero-public-inbound baseline and is therefore not the default.

Step 2

Choose and Buy the VPS

Default recommendation: Hetzner CPX11 in a US region (Ashburn or Hillsboro), $6.99/month, 2 vCPU, 2 GB RAM, 40 GB NVMe, 20 TB included egress. Hetzner sells out of specific regions periodically, so have a fallback in mind (Falkenstein, Helsinki, Singapore are commonly available). For European workloads, the EU-region CX22 lands around $5/month for similar specs.

Equivalents at the same tier: DigitalOcean Basic 2 GB ($12/month), Vultr Cloud Compute 2 GB ($10/month), Linode/Akamai Nanode 2 GB ($12/month), OVH VPS Starter (~$7/month). Anything 2 GB RAM with 40 GB NVMe is fine for a base box plus a small compose stack.

Buying checklist before you click Provision:

Provider account 2FA enabled.
SSH key uploaded to the project. The matching private key on the operator machine you’ll connect from.
Debian 13 (trixie) selected deliberately. This guide targets Debian 13; Ubuntu 24.04 LTS works with minor tweaks but the playbook later assumes Debian.
IPv6 stance decided: configured and firewalled (default), or explicitly disabled. Hetzner shows IPv6 as a /64 prefix; the actual host address is normally ::1 within that prefix.
Snapshots-vs-backups understood. Snapshots are point-in-time disk images you create on demand and pay storage for. Backups are an automated paid feature. The base box doesn’t need either; the apps you put on top will.
Rescue path verified before you need it. Hetzner’s Rescue System and Web Console are the two recovery channels covered later in Step 12.

Step 3

Provider Firewall and Bootstrap SSH

Before the box exists, create a Hetzner Cloud Firewall with one inbound rule: TCP/22 from your current public IP only. This is the temporary admin path until Tailscale is up.

Three things to watch:

Hetzner ships default rules. New firewalls may come with a default-allow SSH from 0.0.0.0/0 and ICMP from anywhere. Don’t assume “default-deny inbound + your one allow rule.” Delete every default rule before saving, then add only your bootstrap SSH-from-WAN-IP rule.

Verify the firewall is actually attached after provision. “I selected the firewall in the create form” is not the same as “the firewall is attached and enforcing.” After the server reaches Running, check the Firewalls list view: the firewall’s “Applied to” count must include the new server. If it doesn’t, attach manually. The cause of any mismatch is less important than catching it.

Run an external-IP enforcement test on both stacks. From a network other than the bootstrap-allowed IP (phone hotspot, a friend’s network, a cloud shell), run:

ssh -o ConnectTimeout=10 root@<public-ipv4>
ssh -6 -o ConnectTimeout=10 root@<public-ipv6>

Both should return Connection timed out. If you can’t reach the box from an IPv6-capable outside network, either find one that is (a cloud shell or a known-IPv6-capable VM) or disable IPv6 on the box until you can validate it. This is the only test that proves the firewall is enforcing what the UI claims, on the stack you actually have. Skipping it means a “protected” box may be globally reachable without you knowing, and “globally reachable on IPv6 only” is the easiest version of that to miss.

Step 4

First-Boot Baseline

SSH in as root via the bootstrap path. The first pass writes the things every later step depends on.

Timezone: timedatectl set-timezone UTC. Server best practice. Log timestamps are unambiguous, no DST glitches, every distributed system expects UTC.

Swap file: 2 GB at /swapfile for a CPX11 (2 GB RAM). Enough headroom for a memory spike without giving the kernel rope to thrash. Mode 600, persisted in /etc/fstab.

Base packages: curl gnupg ca-certificates git sqlite3 jq needrestart debsums. needrestart reports services that need restarting after apt upgrade (complements unattended-upgrades). debsums does on-demand package-integrity checks. Set NEEDRESTART_MODE=l NEEDRESTART_SUSPEND=1 in your shell before running apt upgrade if you want a fully non-interactive flow; DEBIAN_FRONTEND=noninteractive alone does not suppress needrestart’s whiptail prompts.

Verify a trixie-security suite is configured before enabling unattended-upgrades. Debian 13 ships sources in deb822 format (.sources files in /etc/apt/sources.list.d/); a fresh Hetzner image has no legacy /etc/apt/sources.list at all. Run grep -rh 'trixie-security' /etc/apt/sources.list /etc/apt/sources.list.d/ 2>/dev/null; you should get at least one match. A misconfigured image produces “enabled” auto-updates that silently pull from nothing.

Blacklist unused network protocol kernel modules: dccp, sctp, rds, tipc. None are touched by Docker, Tailscale, Cloudflare Tunnel, or any planned app stack. Drop a file at /etc/modprobe.d/hardening-blacklist.conf using the stricter install <mod> /bin/true form (blocks both auto-load and manual modprobe) rather than the weaker blacklist <mod> (only blocks auto-load).

Curated sysctl hardening drop-in at /etc/sysctl.d/60-hardening.conf, loaded with sysctl --system. Docker-and-Tailscale-safe values:

fs.protected_fifos = 2
dev.tty.ldisc_autoload = 0
kernel.kptr_restrict = 2
kernel.sysrq = 4
kernel.yama.ptrace_scope = 1
net.core.bpf_jit_harden = 2
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.log_martians = 1
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0

rp_filter = 1 (strict) is fine for a plain Tailscale node receiving SSH and originating outbound. If the box is later promoted to a Tailscale subnet router or exit node, drop both all and default to loose mode (=2): subnet routers handle asymmetric paths that strict reverse-path filtering will silently drop. bpf_jit_harden = 2 is fine for default Docker bridge networking; hosts running heavy eBPF dataplanes (Cilium, Calico in eBPF mode) take a measurable performance hit and would want =1 instead. Notably not set: net.ipv4.ip_forward is left at its default because Docker flips it to =1 for container networking and forcing it to =0 breaks containers; host-level forwarding policy belongs in the provider firewall and UFW, not sysctl.

Set UMASK 027 in /etc/login.defs. pam_umask reads this value, so newly created files get 640 and directories 750 by default. Single source of truth; no need to also edit /etc/profile. The Hetzner Debian 13 image ships with the UMASK directive entirely absent (no commented placeholder), so use append-or-replace logic rather than a simple sed-replace.

Legal banner files at /etc/issue (local console) and /etc/issue.net (pre-auth network banner). Default text along the lines of “Authorized access only. All activity is monitored and logged.” sshd’s later drop-in (Step 7) sets Banner /etc/issue.net so the banner actually displays on connection; without that directive the file is decoration.

Enable unattended-upgrades last. On a stock Hetzner Debian 13 image, unattended-upgrades is already installed and /etc/apt/apt.conf.d/20auto-upgrades is preconfigured with both Update-Package-Lists "1" and Unattended-Upgrade "1". Verify rather than configure on Hetzner. The default 50unattended-upgrades Origins-Pattern enables only Debian-Security (the ${distro_codename}-updates line is commented out), so general stable bug-fix updates are NOT auto-applied; only security is. Worth knowing; flip the line if you want broader auto-updates.

How AI can help

This step is mostly typing and following a checklist, which is the highest-leverage place for an AI assistant. Hand it the bullet list above and ask for a single shell script that does everything in order with a verification command at the end. Cross-check the script against a second model before running it on a real box. Have it explain the rp_filter caveat, the IPT_SYSCTL trap that comes up in Step 8, and the difference between blacklist and install /bin/true. The script writes itself; the value is in the second model catching the one-line bug that would otherwise lock you out.

Step 5

Create the Deploy User

Single sudo-capable operator account named deploy. Authorized keys for whichever machines you’ll SSH from (your laptop, your WSL controller for the playbook, etc.). Drop a sudoers fragment at /etc/sudoers.d/deploy with deploy ALL=(ALL) NOPASSWD:ALL (mode 0440, validated with visudo -cf). Confirm ssh deploy@<public-ip> works before anything else changes, especially before you start rotating sshd config.

For kept boxes (not throwaway test VMs): also set a login password on the deploy account (passwd deploy) and store it in a password manager. The login password unlocks the Hetzner Web Console as a recovery fallback if both Tailscale and SSH fail; without it, the console can’t be used and you’re down to the rescue-system path. NOPASSWD sudo stays unchanged; the login password is for console-tty/getty auth, not sudo.

Step 6

Install Tailscale

Install via the official apt repo. (Tailscale is the one place on Debian 13 still shipping a one-line legacy .list source rather than deb822; leave it as Tailscale ships.)

curl -fsSL https://pkgs.tailscale.com/stable/debian/trixie.noarmor.gpg | sudo tee /usr/share/keyrings/tailscale-archive-keyring.gpg >/dev/null
curl -fsSL https://pkgs.tailscale.com/stable/debian/trixie.tailscale-keyring.list | sudo tee /etc/apt/sources.list.d/tailscale.list
sudo apt update && sudo apt install -y tailscale

Generate an auth key in the admin console with the right shape:

Throwaway test VMs: ephemeral auth key (auto-removes the node from the tailnet when offline). Reusable off, shortest expiry, tagged.
Kept boxes: non-ephemeral auth key (node persists across reboots and downtime). Reusable off, shortest expiry, tagged. After the node authenticates, immediately disable key expiry in the admin console (Machines → device → ⋯ → Disable key expiry). Default 180-day Tailscale node-key expiry is designed for user devices that re-auth via SSO; on a server with no human re-auth path, expiry silently drops the box off the tailnet, locking the operator out. Disable expiry; replace “rotate on a timer” with “revoke explicitly if compromised” plus tag-based ACLs as the compensating control.

Bring the box up:

sudo tailscale up --authkey=tskey-auth-... --advertise-tags=tag:server --ssh=false

--ssh=false is deliberate. This guide uses standard OpenSSH reachable only over Tailscale, not Tailscale SSH (a separate mechanism that replaces parts of the OpenSSH key workflow and requires Tailscale access-policy entries). Tailscale SSH is mentioned at the end of this guide as a “want to go further” option, not the documented default.

Important: The tag you advertise must exist in your tailnet ACL tagOwners block before this command will accept it. tag:server here is an example name; pick whatever fits your tailnet conventions (tag:vps, tag:prod-host, tag:homelab, etc.) and adjust both the tailscale up command and the ACL JSON to match. Whatever name you pick, define it in the ACL editor like "tagOwners": { "tag:<your-name>": ["autogroup:admin"] }, plus an ACL rule allowing autogroup:member to reach tag:<your-name>:* (tagged devices are not members and don’t inherit member-to-member access). Save before generating the auth key.

Step 7

Move SSH to Tailscale-Only

Verify SSH-over-Tailscale works from a separate device. Then drop the public-IP allow rule in the provider firewall. UFW (next step) handles the second layer.

Re-test the public-IP path immediately after removing the bootstrap rule. Do this before UFW comes up, so a failure points at the provider firewall (the only thing supposed to be blocking) rather than at UFW after the fact:

ssh -o ConnectTimeout=10 deploy@<public-ipv4>
ssh -6 -o ConnectTimeout=10 deploy@<public-ipv6>

Both should time out. If either connects, the provider firewall isn’t actually enforcing what the UI claims (re-check the Applied-to count and the rule list before continuing). Catching that here, with UFW still off, makes the failure mode obvious; if you wait until after UFW is up to test, a successful connection might be UFW failing and the provider firewall failing, and you can’t tell which.

Disable password auth and root login. The cleaner path is a drop-in at /etc/ssh/sshd_config.d/60-hardening.conf rather than editing the main sshd_config directly. Drop-ins survive Debian package upgrades cleanly and keep diffs reviewable:

PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3

AllowTcpForwarding no
AllowAgentForwarding no
X11Forwarding no

MaxSessions 2
ClientAliveInterval 300
ClientAliveCountMax 2
TCPKeepAlive no

LogLevel VERBOSE

Banner /etc/issue.net

A few notes. ClientAliveInterval 300 + ClientAliveCountMax 2 produces a ~10-minute idle disconnect using SSH-protocol-level keepalives. MaxAuthTries 3 reduces brute-force log noise. AllowTcpForwarding no and AllowAgentForwarding no close lateral-movement paths if a key is ever compromised; ad-hoc port forwarding for debugging can be re-enabled per-key via authorized_keys options when actually needed. LogLevel VERBOSE records the key fingerprint used on each successful auth, which is the most useful field for post-incident forensics.

sshd -t validates the config without restarting; run it before systemctl reload ssh. Existing connections survive a reload (forked child processes keep going); new connections during the reload window fail until the daemon is back up. Worth knowing: this guide leaves SSH listening on all interfaces and lets the firewalls do the access control. Binding sshd directly to the Tailscale IP is optional advanced hardening covered in the closing notes. The Tailscale interface may not exist when sshd starts on boot, and a bad ListenAddress produces a host that’s reachable only via the rescue console until you fix it. Default is “let the firewalls deny.”

Step 8

UFW

Default-deny inbound, default-allow outbound, one explicit allow for SSH on the Tailscale interface:

sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw default deny routed
sudo ufw allow in on tailscale0 to any port 22 proto tcp

The hard step before ufw enable: blank UFW’s sysctl management. UFW ships with IPT_SYSCTL=/etc/ufw/sysctl.conf in /etc/default/ufw, which causes UFW to apply its own sysctl values when the service starts, after systemd-sysctl has already applied the curated drop-in from Step 4. The bad part is the silent override and the loss of a single source of truth, not any one specific value. In the manual test run that produced this guide, the most obvious symptom was log_martians reverting to 0 every boot; UFW’s file also imposes its own redirect-related defaults. Either way, the curated drop-in is no longer authoritative once UFW takes over. Fix:

sudo sed -i 's|^IPT_SYSCTL=.*|IPT_SYSCTL=|' /etc/default/ufw

After this, /etc/sysctl.d/60-hardening.conf is the single source of truth for kernel tunables on this box. Then enable:

sudo ufw enable

Spot-check the values you care about after boot completes, not just after sysctl --system returns successfully:

sudo sysctl net.ipv4.conf.all.log_martians net.ipv4.conf.all.send_redirects net.ipv4.conf.all.rp_filter

Should return 1 / 0 / 1.

Important: ufw enable may drop your existing public-IP SSH session despite UFW’s RELATED,ESTABLISHED allow rule. The iptables-flush window during ufw enable can drop in-flight packets and break the conntrack state of an idle session. Have the Tailscale path verified working in another window before running ufw enable. After UFW activation, public-IP SSH is unreachable: that is the desired final state, not a malfunction.

Step 9

Install Docker Safely

Docker CE from the official apt repo. Add the deploy user to the docker group. Hard rule: docker group membership is root-equivalent, per Docker’s own post-install docs. The deploy user is the operator; nobody else gets docker group membership.

The bigger rule that bites every self-hosted Docker box at some point: Docker bypasses UFW for published ports. Container ports published as 0.0.0.0:PORT:PORT are reachable on the public interface even with UFW set to default-deny, because Docker’s iptables rules sit upstream of UFW. The defense is the binding pattern, not the firewall:

# Wrong: reachable on public interface, bypasses UFW
ports:
  - "8080:80"

# Right: reachable only on loopback. Public surfaces go through a tunnel.
ports:
  - "127.0.0.1:8080:80"

The provider firewall is the safety net for accidental public publishes; the binding pattern is the primary defense. App guides on this site uniformly bind to loopback and front public surfaces with a Cloudflare Tunnel sidecar in the same compose file. The tunnel container reaches the app over the compose network; nothing crosses the host’s network namespace boundary.

Validation commands worth running after Docker is installed and before any app stack lands (check both stacks; people forget IPv6):

sudo ss -tulpn -4
sudo ss -tulpn -6
sudo docker ps --format 'table {{.Names}}\t{{.Ports}}'
sudo ufw status verbose

Nothing on 0.0.0.0:* or [::]:* for ports you don’t intend public. UFW status shows your single tailscale0:22 allow.

Step 10

Backup and Audit Tooling

Install rclone, restic, age, and lynis from apt. No remotes or repositories configured here; those are per-app concerns and live in the app guides that actually need them.

sudo apt install -y rclone restic age lynis

lynis and sysctl install to /usr/sbin/, which is not in a regular user’s PATH on Debian. Always invoke as root: sudo lynis audit system. Capture a baseline once and keep the file for drift comparison on future audits:

sudo lynis audit system --quiet --no-colors > /var/log/lynis-baseline.dat

The two commands you’ll run periodically as drift checks: sudo lynis audit system (compare its hardening index to the baseline) and sudo debsums -c (flags packages whose installed files don’t match the package’s expected checksums). Neither is automated by default; debsums has a high false-positive rate on conffiles, and lynis runs as a one-shot, not a daemon.

Step 11

Automating with Ansible

Doing all of the above once teaches you what each piece does and why. Doing it twice is when you write the playbook. Doing it three times manually is the moment you’ll regret not having written the playbook the second time.

The playbook’s shape, in build order (each role gated on the prior working):

baseline: timezone, swap, base packages, security-repo verification, kernel module blacklist, sysctl drop-in, UMASK, banners, unattended-upgrades.
deploy_user: operator account, sudoers fragment, authorized_keys. Optional login password gated behind a vault variable for kept boxes (Hetzner Web Console fallback). The password value sits in group_vars/all/vault.yml (encrypted at rest by Ansible Vault) and the role hashes it via password_hash('sha512') before applying. Plaintext passwords never touch disk in the repo, and a vault leak still requires the vault password to read. NOPASSWD sudo is unaffected.
tailscale: install via apt, idempotent tailscale up (skip if already authed), advertise the configured tag.
firewall (UFW): install, blank IPT_SYSCTL, set defaults, allow tailscale0:22, enable, then verify the baseline sysctl values survived.
ssh_hardening: wraps geerlingguy.security for the role-managed bits, lays down the sshd 60-hardening.conf drop-in, restarts sshd via handler.
docker: geerlingguy.docker for Docker CE from the official repo, deploy user added to the docker group.
backup_tools: install rclone, restic, age. No remotes.
audit: install lynis, capture the baseline using creates: so the task is a no-op on idempotency runs.
provider_firewall (optional, last): drive the Hetzner firewall via the hetzner.hcloud collection: ensure steady-state empty inbound rules, attach to the server, run external SSH probes against both the public IPv4 and IPv6 addresses and fail if either succeeds. Last on purpose; it’s the easiest place to lock yourself out.

Build order matters. The order above is also the run order during initial build, with two manual moments deliberately separating the lockout-prone steps:

Bootstrap firewall created manually with one rule (SSH from operator IP). Not in the playbook.
Run baseline and deploy_user.
Run tailscale, then confirm SSH-over-Tailscale works from a separate device. This is the gate.
Manually remove the bootstrap allow rule from the provider firewall.
Run firewall (UFW). Public-IP SSH path is now closed.
Run ssh_hardening, docker, backup_tools, audit.
Run provider_firewall last with an external SSH-from-controller probe to prove enforcement.

Variables and secrets layout: non-secret defaults in group_vars/all/main.yml, encrypted secrets (Tailscale auth key, hcloud API token, optional deploy login password) in group_vars/all/vault.yml via Ansible Vault, per-host overrides (provider firewall server name and public IP) in host_vars/<inventory_hostname>.yml. Pin external roles and collections in requirements.yml to specific versions, not floating tags; re-pin during quarterly review.

A few practical notes from building this:

update_cache: true in apt is silently a no-op under --check. Subsequent install tasks then fail to find packages on a fresh image. Drop --check for the first apt run on a new box; idempotency runs against an already-converged box are fine.
Multi-step swap creation breaks --check. The fallocate “would-create” doesn’t actually create the file in check mode, and any later task that tries to chmod or mkswap the still-absent file fails. Collapse allocate/chmod/mkswap/swapon into one shell task with creates: /swapfile.
group_vars/vault.yml (top-level) only loads for a literal group named vault. If you want the variables available to every host, put them in group_vars/all/vault.yml (directory-style group_vars).
pre_tasks need tags: always to survive role-tagged runs. Otherwise --tags <role> skips your safety asserts.

How AI can help

AI gets you to a working playbook shape quickly: hand it your manual setup notes, this guide, and the role outline above, and it'll pick the right modules over hand-rolled command: calls (ansible.posix.mount for fstab, community.general.ufw for firewall rules, ansible.builtin.apt for packages, hetzner.hcloud.firewall for the provider firewall) and structure variables sensibly. What makes the result safe, though, is the validation gates, not the model: ansible-lint on the generated code, --check --diff against a throwaway test box, real convergence, then a second full run expecting changed=0 as the idempotency proof. Each gate catches a different class of bug. The manual run that produced this guide caught a removed callback plugin, an apt-cache check-mode no-op, a multi-step swap creation that broke under --check, group_vars scoping subtleties, and a few Galaxy role version pins that didn't actually exist on first try. The model had a confident answer for each before the gate caught it. Trust the gates, not the confidence.

Step 12

Validation and Lockout Recovery

Three checkpoints prove the box is actually ready:

End-to-end validation against a fresh box. Provision a CPX11, run the playbook end to end, run the validation commands from Step 9 (ss -tulpn -4/-6, docker ps, ufw status verbose), run lynis audit system for a baseline score. Total real-money cost: well under a dollar for the entire dev loop assuming you destroy the box afterward.

Idempotency check. Re-run the playbook against the same converged box. Expect changed=0 (modulo the audit role, scoped per the rules above so its baseline-capture task uses creates: and doesn’t re-run). Anything that reports changed on a second run is a non-idempotent task to fix.

Lockout-recovery rehearsal. This is the one most people skip and live to regret. The recovery path on a Tailscale-only box if both Tailscale and the web console fail is the Hetzner Rescue System (or your provider’s equivalent: DigitalOcean’s Recovery Console, Vultr’s View Console, Linode’s Lish). Rehearse it once before treating the box as production-ready:

Take a Hetzner snapshot first as insurance (~€0.01/GB/month, deletable after the rehearsal).
Add a temporary edge SSH allow rule for your current WAN IP (Tailscale doesn’t run in rescue mode; the public-IP path needs to be temporarily open). Or use the web console for the rescue session: no firewall change but text-mode VNC.
In Hetzner Cloud Console, activate rescue and reboot in one action (“Enable Rescue & Power Cycle”). Activating without reboot does nothing; the box only enters rescue on the next boot, and Hetzner makes this easy to miss.
SSH into rescue with host-key-bypass flags (the host key changes since rescue boots a different OS): ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@<public-ipv4>.
Mount the disk and edit something verifiable: mount /dev/sda1 /mnt, append a benign comment to /mnt/etc/motd, umount /mnt.
Power-cycle from Cloud Console, which boots back to normal Debian (rescue is one-shot per activation).
Reconnect via Tailscale, verify the rescue-side edit persisted, revert it, remove the temporary edge SSH rule, verify edge is closed again with external SSH attempts that time out on both stacks (ssh -o ConnectTimeout=10 root@<public-ipv4> and ssh -6 -o ConnectTimeout=10 root@<public-ipv6>).

The point of rehearsing is not to memorize the steps. It’s to know, before you need it, that you can mount the disk, persist a real edit, and recover normal access. That’s the actual recovery path on a Tailscale-only box if both Tailscale and the web console are unreachable simultaneously.

A common but incorrect recovery story: “if Tailscale dies, just re-add a public-IP SSH allow rule to the Hetzner firewall and reconnect over the public IP.” This does not work in this guide’s final state, because UFW also denies public-interface SSH. Re-adding the Hetzner edge rule lets packets reach the host but UFW still drops them, and there’s no way to relax UFW without already being able to reach the box. The rescue system is the canonical answer.

For kept boxes specifically: the Hetzner Web Console is a faster intermediate fallback if you’ve set the deploy login password (Step 5). The console gives you a browser-based VNC into the running OS, no SSH involved. Without a deploy login password, it’s unusable since PermitRootLogin no blocks the only other login.

Step 13

The Downstream App Contract

Every future app guide that lands on this base box inherits a contract. The contract exists because the box’s security posture is fragile to careless app deployments. Docker bypasses UFW, “fixing” connectivity by relaxing the firewall undoes the whole guide, and pinned-and-forgotten images become frozen vulnerable images.

Apps must:

Bind Docker ports to 127.0.0.1 unless the app is intentionally public via a tunnel.
Install their own cloudflared as a sidecar if they need a public surface.
Encrypt offsite backups before they leave the box.
Restore-test before relying on backups.
Pin all images to specific versions (no :latest).
Document the update path for those pinned images: how to bump versions, how to back up first, how to roll back. A pinned-and-forgotten stack becomes a frozen vulnerable stack.

Apps must not:

Re-open public SSH on the provider firewall or UFW.
Publish container ports on 0.0.0.0 (or [::]).
Disable UFW or relax provider firewall rules to “fix” connectivity.
Store secrets unencrypted in the repo.
Skip restore testing.
Ship a guide that pins images without an update procedure.

The provider firewall and UFW combined are the safety net. The compose-file binding pattern is the primary defense. App guides on this site enforce both.

What’s Next

Optional follow-on work, deliberately not in this guide:

CrowdSec as a fail2ban replacement with shared threat intel. Reasonable next layer for a box that gets noisy enough to want behavioral defense. Likely a separate guide.
AIDE for file-integrity monitoring beyond debsums. Needs careful exclusion rules on a Docker host (/var/lib/docker churns constantly and will drown a default AIDE config in noise).
auditd for kernel-level audit events. Useful but needs scoped rules, not the stock RHEL-style ruleset.
Log shipping: Loki + Vector or equivalent. Worth it once you’re running more than one box.
Binding sshd to the Tailscale IP as advanced hardening. Requires a systemd drop-in (After=tailscaled.service), RestartSec tuning, and a recovery rehearsal on top of Step 12’s. The Tailscale interface may not exist when sshd starts on boot; a bad ListenAddress produces a host that’s reachable only via the rescue console.

Intentionally not included: rkhunter and chkrootkit (signature-based with high false-positive rates and ongoing database curation that doesn’t match the “set up and walk away” posture), fail2ban (the threat it defends against, public-internet SSH brute-force, is gone in this guide’s final state since public scanners can’t reach sshd at all), GRUB password (an attacker who reaches the provider’s hypervisor console already has paths around it; the control matters on bare metal, not on a cloud VPS).

Once the base box is up, every other guide on this site starts at “follow the VPS Hardening guide first; come back here when your box is at the end-state checklist.” The hardening is the load-bearing prerequisite. Everything downstream assumes it.

Toolkit Reference

Tools and services that show up across this guide, plus the spots where AI saves real time.

Tools and Services

Hetzner Cloud: $6.99/month CPX11 VPS recommendation. Equivalents at DigitalOcean, Vultr, Linode, OVH.
Tailscale: Private mesh network for SSH and admin access. Free tier up to 100 devices.
UFW: Host-side firewall. Default-deny inbound plus one tailscale0 allow.
Docker CE: Container runtime. Installed from the official apt repo, deploy user in the docker group.
rclone + restic + age: Backup tooling. Installed but not configured here; per-app concern.
lynis: Security audit tool. Captures a baseline once for drift comparison.
Ansible: Optional infrastructure-as-code layer. Captures the full hardening flow as a replayable playbook with vault-encrypted secrets.
geerlingguy.security + geerlingguy.docker: Battle-tested community roles for SSH config and Docker install. Pin to specific versions in requirements.yml.
hetzner.hcloud: Ansible collection for driving Hetzner Cloud Firewall via API. Optional, last role.

Where AI Earns Its Keep

Initial hardening script: Turning the Step 4 checklist into a single shell script with verification at the end. Cross-check against a second model before running.
Ansible playbook authoring: The biggest leverage point. Hand the AI this guide and the role outline; it produces idempotent, lint-clean playbook code in one or two passes.
UFW + Docker + Tailscale interaction: The IPT_SYSCTL trap, the Docker-bypasses-UFW behavior, and the rp_filter / subnet-router caveat are exactly the kind of subtle interactions an AI catches in review. Ask it to audit your config before you enable UFW.
Lockout recovery rehearsal scripting: Walking through the rescue-system flow, generating the host-key-bypass SSH command, writing a verification script that confirms post-recovery state matches expected.