If you run netboot.xyz as the PXE target on an Asuswrt-Merlin router, the official “netboot.xyz on Merlin” instructions are a trap, and they are a trap in the worst possible way: they work the day you set them up and detonate later. That page still tells you to curl a release binary into /jffs/tftproot with no autoexec.ipxe and no version discipline. The day you do a routine “just grab the latest”, three independent faults fire at once, and each one’s symptom looks like one of the others. Single-shot troubleshooting, human or AI, keeps confidently fixing the wrong layer, because every fix you try makes the visible symptom change instead of go away.
This guide is the build that routes around all three. The install was never the hard part; the diagnosis is. So the spine of this guide is the diagnosis, and the build steps hang off it. The end state is a self-built iPXE that boots straight into the live rolling netboot.xyz menu with signature verification deliberately turned on and the live signing CA baked into trust, so imgverify of that menu passes silently with no private key and no error bounce. That setup survives a router reboot, and it goes out through an atomic deploy you can roll back if a build boots wrong.
The audience is narrow and badly served: Asuswrt-Merlin users doing local-network PXE, where the useful search results are close to zero and the problem is provably not one-shot solvable by a chat model (it was tried). One honest caveat up front, the same one the self-hosted MCP servers guide carries: this is reconstructed from a real working build and a real multi-hour debugging session, not yet re-run end to end from a clean stock-firmware fresh clone. Every command and fact below is from the working build; the gap is a from-scratch reproduction pass, not the result.
Step 1: The Three Faults
Read this step before you touch anything. It is the map. The reason this problem eats afternoons is that the three faults alias: each one’s symptom resembles a different one’s cause, so fixing fault A surfaces fault B in a way that looks like you made things worse, and you revert the correct fix.
Fault 1: the rotated signing CA (verified boot becomes unreachable). netboot.xyz 3.0.0 rotated its code-signing certificate authority. The live menus at boot.netboot.xyz are now signed by the new CA. Signature verification in netboot.xyz is a build-time opt-in (sigs_enabled, off by default), but on a network boot path it is exactly the thing you should turn on: it is what stops a tampered menu from booting your machines. Turn it on with a stock 2.x bootloader and its imgverify of the live menu fails, because the bootloader trusts the old CA and the live menu is signed by the new one. iPXE bounces to an error and back to the menu, and the only way to boot anything is to flip verification back off. The naive escape, pinning to old 2.x release binaries that match old menus so verification passes, trades that for a frozen, stale menu that slowly rots as the live one moves on. So a stock bootloader forces a choice between an unverified boot and a frozen menu. Neither is a fix.
Fault 2: Secure Boot and autoexec.ipxe (the menu you never reach). netboot.xyz 3.0.1 added Secure Boot support and a separate autoexec.ipxe flow: the Secure Boot path generates an autoexec.ipxe, uses it for the signed ISO/USB images, and also ships it as a standalone release asset. This is a completely different failure surface from fault 1. It does not produce a signature error; it produces a boot that never reaches a menu at all. If you “fixed” fault 1 by grabbing the newest release and landed on the Secure Boot artifacts, you can conclude your fault 1 fix broke everything, when in fact you’ve just moved to an unrelated fault. (netboot.xyz’s own boot.cfg even force-disables signature checks under Secure Boot, because imgverify is unavailable on the upstream Secure Boot image. That is a strong hint that the clean, verifiable path is the non-Secure-Boot scheme, which is the one this guide builds.)
Fault 3: the iPXE EFI autoexec NIC hang (the red herring). Building your own current iPXE to escape faults 1 and 2 trips a regression in current iPXE master: on UEFI, iPXE runs an EFI autoexec network probe at startup, before any embedded script runs. With built-in NIC drivers that probe forces a NIC init that hangs on some UEFI machines. The freeze appears right after a harmless line that reads autoexec.ipxe not found, so every instinct says “it can’t find autoexec, that’s the bug.” It is not. That message is non-fatal by iPXE’s own design (efi_autoexec.c returns -ENOENT and continues). The hang is the NIC probe, not the missing file. Chasing the message is chasing the alias.
Hold all three in your head at once. The rest of the guide fixes them in an order where each fix is verifiable before the next, so the aliasing can’t bite you.
AI in this step
This is the canonical case where a second model earns its keep and also the canonical case where single-shot prompting fails, so use it deliberately. If you hand a model only the current symptom ("netboot.xyz errors back to the menu when I turn signature checks on"), it will fix fault 1 in isolation, you'll apply it, hit fault 2 or 3, and reasonably conclude the advice was wrong. The prompt that works gives the model all three faults from this step as the frame and asks it to keep the diagnosis attached to which fault each symptom belongs to. The value is not the individual fix; it's refusing to let the conversation collapse the three faults into one. This guide is itself an artifact of that failure mode: an earlier draft asserted that a silent boot proved the signature had been verified, when a silent boot of the default build actually proves only that verification was switched off. That is fault 1 aliasing as success, and it survived until a second model checked the claim against the build instead of the notes.
Step 2: Prerequisites
This guide deliberately does not assume the hardened VPS base box the rest of this site’s guides build on. This is local-network work. What it assumes instead:
- An Asuswrt-Merlin router already doing PXE. You have JFFS enabled, a TFTP root (the conventional path is /jffs/tftproot), and a dnsmasq PXE config (conventionally /jffs/configs/dnsmasq.conf.add) that already maps client architectures to a boot file and sets next-server to the router. The standard mapping is DHCP arch 0 (legacy BIOS) to a .kpxe and the EFI arches (2, 6, 7, 8, 9) to a .efi, with next-server pointed at the router’s LAN IP. Throughout, the router is shown as the Asuswrt default 192.168.1.1 and the admin account as the default admin; substitute your own. This guide changes only the binary, never the dnsmasq config, which is the property that makes the deploy in Step 8 safe and reversible.
- A build host with Docker. The bootloaders are built reproducibly in a container. Linux, macOS, or Windows all work; the Windows path mount has one wrinkle, called out in Step 6.
- A PXE client to test against, ideally both a BIOS and a UEFI one. Fault 3 is UEFI-only, so a UEFI client is the one that proves the hard part. Note that fault 3 is firmware and NIC dependent: the EFI autoexec probe hangs built-in-driver NIC init on some UEFI machines, not all. If yours doesn’t hang, you still want the Step 7 build, because you cannot tell in advance and the fix has no downside.
- The netboot.xyz source repo, cloned (git clone https://github.com/netbootxyz/netboot.xyz). You build from this; you do not download release artifacts. That choice is the fix for faults 1 and 2.
Step 3: Why the Official Update Path Breaks
Spend a step here because if you don’t internalize why, you will eventually “simplify” back to the broken path.
The official Merlin instructions predate the 3.x changes. They tell you to fetch a release .kpxe/.efi and drop it in the TFTP root. That worked under 2.x because a 2.x binary and the 2.x menus shared a CA. Two upstream changes broke that contract:
The 3.0.0 release notes say, in plain text, “BREAKING: Updated embedded certificates used for image signature verification.” That is fault 1. The live boot.netboot.xyz menu is now signed by a CA your old binary doesn’t trust. iPXE’s imgverify is working correctly when it refuses it; the trust anchor genuinely doesn’t match. Be precise about the symptom, because imprecision here is exactly what makes this problem alias: netboot.xyz ships with verification off by default, so a stock bootloader appears to “just work” precisely because it is checking nothing. The fault only surfaces when you do the responsible thing and turn verification on for a network boot path. Then a stale-CA bootloader cannot verify the live menu, bounces to an error, and your only way through is the in-menu toggle that switches verification back off. “Just turn the check off so it boots” trains you to wave through the one control that exists to stop a tampered menu, on the boot path for every machine on the LAN.
The instinctive escape, pinning to a matching old release so verification passes, is fault 1’s trap closing. The 2.x release binaries are version-pinned to stale menus. Verification succeeds because the binary and its frozen menu share the old CA, but you have quietly opted out of every operating system, kernel, and fix netboot.xyz has shipped since. The menu is supposed to roll; a pinned binary stops it rolling.
Reaching for the newest release instead lands you in fault 2: 3.0.1+ introduced the Secure Boot scheme with its generated autoexec.ipxe flow, a different boot path with its own failure modes that does not get you to a menu on a plain Merlin TFTP setup. So the official path has no good exit: old binary verifies nothing or freezes the menu; new binary changes the failure instead of removing it. The only path that removes the fault rather than relocating it is building your own, with verification turned on and the live CA trusted so that turning it on actually works.
Step 4: The Build Approach
Here is the leverage, and it is the part the official instructions never mention: the netboot.xyz repo’s default build is already almost exactly what you want. Built from source with the default settings, it:
- Chains the live rolling menu (boot_domain: boot.netboot.xyz, the 3.x menu version), so you get the menu that keeps moving instead of a frozen snapshot. That is fault 1’s stale-menu trap avoided structurally.
- Ships no autoexec.ipxe and no Secure Boot artifacts when generate_disks_secureboot is left off, which is the default. That is fault 2 avoided structurally: you are building the old, simple .efi/.kpxe scheme on purpose, against the current menu.
So the build approach is not “heavily patch netboot.xyz.” It is “build the default, make two deliberate changes, and serve the right EFI artifact.” The two changes: turn signature verification on (it is off by default, which is the whole reason fault 1 hides), and bake the live menu’s new CA into the build’s trust set so that turning verification on actually passes instead of erroring out (a from-source build otherwise trusts only iPXE’s own CA, not netboot.xyz’s). Both, together, without a private key and without signing anything yourself, are Step 5. Everything else is the stock build.
A word on the option you are not taking, so the choice is honest. You could leave verification off. That is the role’s literal default (sigs_enabled: false), and a self-built default-config binary then chains the live menu with no signature step to fail. It is the lowest-maintenance posture that exists. It is deliberately not this guide’s path, for two reasons. First, on a network boot path that boots every machine on the LAN, waving off the one control that detects a tampered menu is a poor trade for saving a rare rebuild. Second, “off” is not actually what the official route gives you anyway: netboot.xyz’s own release pipeline overrides the default (script/netbootxyz-overrides.yml sets sigs_enabled: true), so the curl-and-go release binaries verify. A release binary built before the rotation carries the old CA; once netboot.xyz re-signed the live menu under the new one, that binary’s verification began failing, which is the manual-disable-every-boot experience. It is not that every latest release misbehaves; it is that a verification-on binary whose baked CA has gone stale relative to the live menu cannot pass, and the official path gives you exactly such a binary. So the realistic choice is not “verify versus the easy unverified life.” It is “verify correctly with the right CA, or fight your own bootloader’s verification forever.” This guide takes the first. The unverified path remains available to anyone who wants it and accepts what it gives up; it is mentioned, not recommended.
Step 5: Trusting the Live Menu Without a Private Key
This is the conceptual core. The goal: a self-built bootloader that has signature verification turned on and whose imgverify of the live boot.netboot.xyz menu then passes silently, with no signing infrastructure, no private key, and no error bounce. Fault 1, removed rather than switched off.
The key realization: signature verification only needs the public CA certificate. You are not signing anything; you are telling your bootloader which public CA to trust, the same trust the official 3.x binaries ship with. And that CA is public: it is embedded, in full chain, in the CMS signature bundle netboot.xyz serves for its own menu. You extract it from there. No private repository, no secret.
Be exact about what this is and is not, because the words invite confusion. “Signing with their cert” is not a thing you can do: a certificate is a public key, and signing the menu requires netboot.xyz’s private key, which is theirs alone and never published. What you do here is the verification half: trust their public CA so the menus they already signed verify on your build. There is a separate, real capability the guide deliberately ignores, generating your own signatures with your own key (generate_signatures: true, which the role default leaves false). That solves a different problem, signing a menu you host and control, and it reintroduces private-key management for no benefit here, because the menu you want is netboot.xyz’s live one, not yours. So: trust their CA, yes; sign anything, no; self-sign with your own key, deliberately out of scope.
Pull the live signature bundle and print the certificate chain out of it:
curl -fsS https://boot.netboot.xyz/sigs/menu.ipxe.sig -o menu.ipxe.sig
openssl pkcs7 -inform DER -in menu.ipxe.sig -print_certs -out chain.pem
chain.pem now holds the full chain. The trust anchor you want is the self-signed root: the entry whose Subject equals its Issuer and whose basicConstraints is CA:TRUE. Save that certificate as roles/netbootxyz/files/certs/ca-netboot-xyz.crt inside your clone. Verify you extracted the right object before baking it into anything:
openssl x509 -in roles/netbootxyz/files/certs/ca-netboot-xyz.crt -noout -subject -issuer -fingerprint -sha256 -dates
You are checking that Subject and Issuer are both the netboot.xyz CA (self-signed), and recording the SHA-256 fingerprint and the validity window. Record the exact notBefore value: Step 9 is a clock failure that hangs entirely off it, and you want the precise date written down, not a vague memory. As a final sanity check, prove that this root actually signs the live menu. Do not run openssl verify against the whole chain.pem: it contains multiple certificates and the result depends on order and on which entry verify happens to treat as the target. Split it instead, isolate the leaf (the entry whose Subject is not its Issuer, the code-signing cert), and verify that leaf explicitly against the root you extracted, passing any intermediates as untrusted:
openssl crl2pkcs7 -nocrl -certfile chain.pem | openssl pkcs7 -print_certs -out certs.pem
Identify the code-signing leaf and any intermediate by inspecting certs.pem (openssl storeutl -noout -text -certs certs.pem lists them), save the leaf as leaf.pem and intermediates as intermediates.pem, then:
openssl verify -CAfile roles/netbootxyz/files/certs/ca-netboot-xyz.crt -untrusted intermediates.pem leaf.pem
leaf.pem: OK means the root you bundled is genuinely the CA that signs the live menu (this is the same check the bundled cert’s provenance README records). If netboot.xyz ever rotates the CA again, which is the only menu-verification event that makes this build stale, the symptom returns as the verification error bounce on a freshly built binary, and the fix is re-running exactly this extraction and rebuilding. There is no private material to lose and nothing to coordinate; that is the durable property of doing it this way instead of pasting a fixed certificate.
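The root-selection rule above (the entry whose Subject equals its Issuer) can be sketched as a pair of shell helpers. This is a minimal sketch under stated assumptions, not something the netboot.xyz repo ships: the function names split_chain and find_self_signed_root and the cert-N.pem naming are hypothetical, and it assumes chain.pem contains ordinary PEM blocks as written by openssl pkcs7 -print_certs. Matching Subject to Issuer is a heuristic for "self-signed", and the sketch does not inspect basicConstraints, so still confirm CA:TRUE with openssl x509 -text and run the openssl verify check before trusting what it returns.

```shell
#!/bin/sh
# Hypothetical helpers: split a PEM bundle, then pick the self-signed entry.

# split_chain BUNDLE OUTDIR
# Writes each certificate in BUNDLE to OUTDIR/cert-1.pem, cert-2.pem, ...
# Text between PEM blocks (the subject=/issuer= lines) is dropped.
split_chain() {
    awk -v dir="$2" '
        /-----BEGIN CERTIFICATE-----/ { n++; out = dir "/cert-" n ".pem" }
        out != ""                     { print > out }
        /-----END CERTIFICATE-----/   { close(out); out = "" }
    ' "$1"
}

# find_self_signed_root DIR
# Prints the path of the first cert-*.pem whose Subject equals its Issuer.
# Returns non-zero if no such certificate is found.
find_self_signed_root() {
    for f in "$1"/cert-*.pem; do
        [ -f "$f" ] || continue
        subj=$(openssl x509 -in "$f" -noout -subject | sed 's/^subject= *//')
        issr=$(openssl x509 -in "$f" -noout -issuer | sed 's/^issuer= *//')
        if [ "$subj" = "$issr" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}

# Usage (not run here):
#   mkdir split && split_chain chain.pem split
#   cp "$(find_self_signed_root split)" roles/netbootxyz/files/certs/ca-netboot-xyz.crt
```

Splitting into one file per certificate also gives you the leaf and any intermediates as separate files, which is what the explicit openssl verify invocation needs.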
Now wire that CA into the build’s trust set. Three small edits to your clone, on the non-signing path (you are explicitly not enabling generate_signatures; that is a different feature requiring a private key, and you don’t want it):
In roles/netbootxyz/defaults/main.yml, add an opt-in flag near the other trust settings:
# Bake the public netboot.xyz CA into TRUST so imgverify against the live
# boot.netboot.xyz signed menus succeeds silently (no per-boot toggle).
# Independent of generate_signatures: no private key, no signing.
embed_netbootxyz_ca: true
In roles/netbootxyz/tasks/generate_disks_base.yml, install the bundled CA into the build’s cert dir on the non-signature path:
- name: Install bundled netboot.xyz CA trust anchor
  ansible.builtin.copy:
    src: "certs/{{ cert_file_filename }}"
    dest: "{{ cert_dir }}/{{ cert_file_filename }}"
    mode: "0644"
  when:
    - embed_netbootxyz_ca | default(false) | bool
    - not (generate_signatures | default(false) | bool)
In both roles/netbootxyz/tasks/generate_disks_efi.yml and roles/netbootxyz/tasks/generate_disks_legacy.yml, append the netboot.xyz CA to trust_files on the not generate_signatures path. The stock task sets the trust file to the iPXE CA only; the patched version appends the netboot.xyz CA when embed_netbootxyz_ca is set:
- name: Set trust file to ipxe ca
  ansible.builtin.set_fact:
    trust_files: "{{ ([cert_dir + '/' + ipxe_ca_filename] + ([cert_dir + '/' + cert_file_filename] if (embed_netbootxyz_ca | default(false) | bool) else [])) | join(',') }}"
  when: not generate_signatures
That is the entire patch: one opt-in default, one copy task, and the trust list extended in the two disk tasks. It keeps iPXE’s own CA and adds netboot.xyz’s, so a from-source build trusts the live menu exactly the way an official 3.x binary does, with no signing path involved.
The patch makes verification able to pass. It does not turn verification on; that is a separate, deliberate choice, and skipping it is how you would build a binary that trusts the right CA and then never checks anything. netboot.xyz gates every menu’s imgverify behind a single build-time variable, sigs_enabled, which defaults to false. The boot.cfg template renders it literally: set sigs_enabled {{ sigs_enabled | default(false) | bool | lower }}. Turn it on in user_overrides.yml (the same file the build already uses for overrides):
# Turn signature verification on in the built bootloader. Combined with the
# embedded netboot.xyz CA above, imgverify of the live menu now passes
# silently instead of erroring back to the menu. Still no private key.
sigs_enabled: true
With sigs_enabled: true and the CA patch, the built boot.cfg will contain set sigs_enabled true, every menu runs imgverify, and that verification passes against the live CA. With the CA patch but sigs_enabled left false, you have built a bootloader that trusts the right CA and then verifies nothing, which is the silent non-fix that an earlier version of this very guide shipped. Build both changes together or you have not closed fault 1, only hidden it differently.
AI in this step
The OpenSSL extraction is a good delegated task with one trap to guard against. "Pull the CA out of this CMS bundle" is mechanical and a model will write the pkcs7 -print_certs pipeline correctly. The trap is the model handing back the leaf or an intermediate instead of the self-signed root, because "the certificate in the signature" is ambiguous. Pin the ask: "from this chain, return only the entry where Subject == Issuer and basicConstraints is CA:TRUE, and show me the openssl command that proves the leaf verifies against it." That turns it from a fuzzy extraction into a checkable one, and the openssl verify ... OK is the check.
Step 6: Building the Binaries
Build in the container so the toolchain is pinned and the result is reproducible. From the repo root:
docker build -t nbxyz-custom --platform=linux/amd64 -f Dockerfile .
docker run --rm -i --platform=linux/amd64 -v "$(pwd -W):/buildout" nbxyz-custom
That second command is written for Git Bash on Windows: $(pwd -W) is the Git Bash idiom that prints a Windows-style path Docker Desktop will accept. On Linux or macOS use -v "$(pwd):/buildout" instead; from PowerShell use -v "${PWD}:/buildout". Get the mount wrong and the build runs but you find no artifacts on the host, because they were written inside a container that’s already gone.
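If you run the build from more than one kind of shell, the mount spellings above can be folded into one helper. This is only a sketch: buildout_mount_source is a hypothetical name, and it assumes Git Bash reports a kernel name beginning with MINGW or MSYS from uname -s. It covers POSIX-style shells only; PowerShell still needs the ${PWD} form.

```shell
#!/bin/sh
# Hypothetical helper: print the host path to bind-mount at /buildout.

# buildout_mount_source [KERNEL_NAME]
# KERNEL_NAME defaults to the output of `uname -s`; passing it explicitly
# makes the branch selection easy to exercise on any machine.
buildout_mount_source() {
    case "${1:-$(uname -s)}" in
        MINGW*|MSYS*) pwd -W ;;  # Git Bash on Windows: Windows-style path
        *)            pwd ;;     # Linux, macOS
    esac
}

# Usage (not run here):
#   docker run --rm -i --platform=linux/amd64 \
#       -v "$(buildout_mount_source):/buildout" nbxyz-custom
```

After the run, an empty buildout/ on the host is still the tell that the mount was wrong, whichever spelling produced it.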
Build speed is the only thing the build’s job count touches. The repo default is a single make job (make_num_jobs: 1 in user_overrides.yml); raising it (for example to 8) only makes the iPXE compile finish sooner and changes nothing about the output. It is a convenience, not part of the fix.
When it finishes, the artifacts land in buildout/ipxe/. The two that matter for a Merlin PXE setup:
| Artifact | Client firmware | Notes |
|---|---|---|
| netboot.xyz.kpxe | Legacy BIOS (DHCP arch 0) | Unaffected by fault 3; the EFI autoexec path doesn’t exist on BIOS. |
| netboot.xyz-snponly.efi | UEFI (EFI arches) | This is the one to serve for UEFI, not netboot.xyz.efi. Step 7 is why. |
You will also see netboot.xyz.efi, netboot.xyz-snp.efi, and a pile of .dsk/.lkrn/.iso-shaped files. For a TFTP PXE setup on Merlin you care about exactly the two in the table. Resist serving netboot.xyz.efi just because its name matches what the dnsmasq config points at; you rename in Step 8, you don’t pick by filename here.
Before you deploy anything, confirm Step 5 actually took. The build emits the rendered config as buildout/boot.cfg; it must contain set sigs_enabled true, not set sigs_enabled false. If it still says false, the sigs_enabled: true override did not apply and you are about to deploy the silent non-fix: a binary that trusts the right CA and verifies nothing. That one-line check is the difference between “fault 1 fixed” and “fault 1 hidden,” so do not skip it.
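That one-line check is easy to turn into a gate that refuses to continue. The following is a minimal sketch, assuming the rendered line appears in buildout/boot.cfg exactly as set sigs_enabled true, possibly with surrounding whitespace; the function name sigs_enabled_in is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-deploy gate: is verification switched on in this build?

# sigs_enabled_in FILE
# Succeeds only if FILE has a line reading "set sigs_enabled true".
# A "false" line, or no such line at all, is treated as failure.
sigs_enabled_in() {
    grep -Eq '^[[:space:]]*set[[:space:]]+sigs_enabled[[:space:]]+true[[:space:]]*$' "$1"
}

# Usage (not run here):
#   sigs_enabled_in buildout/boot.cfg || {
#       echo "sigs_enabled is not true in this build; do not deploy" >&2
#       exit 1
#   }
```

Matching the whole line instead of grepping for the word true keeps an unrelated setting from satisfying the check.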
Step 7: The snponly-for-UEFI Requirement
This is fault 3, and it is the single least-documented fact in this whole problem. It is also the one most likely to make you revert a correct build, because the symptom frames itself as a different bug.
iPXE master runs an EFI autoexec network probe at startup, before any embedded script, before the menu, before anything you control. The build that uses iPXE’s built-in NIC drivers (netboot.xyz.efi) forces that probe to initialize the NIC with iPXE’s own driver. On some UEFI machines that NIC init hangs hard: the screen sits forever right after the line autoexec.ipxe not found, with no menu and no banner.
Everything about that presentation lies to you. The visible text says it can’t find autoexec.ipxe, so the entire internet’s worth of instinct says “the problem is the missing autoexec file.” It is not. autoexec.ipxe not found is non-fatal by iPXE’s own design: efi_autoexec.c returns -ENOENT and execution continues past it deliberately. The freeze is the NIC probe that runs in the same neighborhood, not the message. Spend your time making an autoexec.ipxe appear and you have fixed nothing and confirmed the wrong theory, because the message stays either way.
The fix is to serve the snponly.efi build for UEFI, never ipxe.efi/netboot.xyz.efi. snponly.efi uses the firmware’s own Simple Network Protocol (SNP) instead of iPXE’s built-in driver, so the startup NIC init that hangs simply does not run. This is not a hack; it is what netboot.xyz itself does. Its own boot.cfg picks the artifact by platform: iseq ${platform} efi && set ipxe_disk netboot.xyz-snponly.efi || set ipxe_disk netboot.xyz-undionly.kpxe, that is, snponly for EFI and the undionly build for BIOS. You are matching upstream’s own choice, not inventing one.
BIOS is unaffected: there is no EFI autoexec path on legacy boot, so netboot.xyz.kpxe is fine as-is. Fault 3 is strictly the UEFI artifact selection, and the entire fix is “serve the snponly EFI.”
AI in this step
This is the textbook "fixing the wrong layer" trap, and it's worth handing to a model precisely as a test of whether your framing survives. Describe only the symptom ("UEFI PXE freezes right after autoexec.ipxe not found, no menu") and a weak model run will send you down the autoexec-file path, because the message is a magnet. The prompt that gets the right answer supplies the constraint that the message is non-fatal in iPXE by design and asks what else runs around EFI startup that would hang a built-in-driver NIC. Use it as a check on your own understanding: if the model's answer doesn't end at SNP versus built-in drivers, the framing was too thin, not the model.
Step 8: Deploying to the Router
Never overwrite the working binary in place. The deploy ritual exists so that a bad build cannot take your PXE boot down with no way back. It changes only the binary file; it does not touch dnsmasq, so there is no service to restart and no config that can be left half-edited.
The pattern, for each binary (the .kpxe for BIOS, the renamed snponly.efi for UEFI), from your build host. Upload to a temporary name first so the live file is never mid-write:
scp buildout/ipxe/netboot.xyz.kpxe admin@192.168.1.1:/jffs/tftproot/netboot.xyz.kpxe.new
scp buildout/ipxe/netboot.xyz-snponly.efi admin@192.168.1.1:/jffs/tftproot/netboot.xyz.efi.new
On Windows, PowerShell’s native scp works as-is on one line; WinSCP is the GUI equivalent if you prefer. Note the EFI upload is renamed to netboot.xyz.efi.new: the dnsmasq config points the EFI arches at netboot.xyz.efi, so the snponly build takes that name on the router. That rename is the entire mechanism by which Step 7’s fix reaches UEFI clients; nothing in dnsmasq changes.
SSH to the router and verify the upload arrived intact before you let it become live. Compare the SHA-256 the build host computed against the router’s:
ssh admin@192.168.1.1 "sha256sum /jffs/tftproot/netboot.xyz.kpxe.new /jffs/tftproot/netboot.xyz.efi.new"
Match those against buildout/ipxe/netboot.xyz-sha256-checksums.txt from the build, and mind the rename when you do. The router’s netboot.xyz.kpxe.new must match the netboot.xyz.kpxe line in that file. The router’s netboot.xyz.efi.new must match the netboot.xyz-snponly.efi line, not the netboot.xyz.efi line, because the EFI file you uploaded is the snponly build wearing the .efi name (Step 7). Checking it against the wrong line is how you would “verify” a correct upload as corrupt, or worse, accept the wrong artifact. Only once both match their correct source line, back up the currently-working binaries and swap atomically with mv (a rename on the same filesystem is atomic; a client TFTP-fetching mid-swap gets either the whole old file or the whole new one, never a torn read):
ssh admin@192.168.1.1 "cd /jffs/tftproot && cp -a netboot.xyz.kpxe netboot.xyz.kpxe.bak && cp -a netboot.xyz.efi netboot.xyz.efi.bak && mv netboot.xyz.kpxe.new netboot.xyz.kpxe && mv netboot.xyz.efi.new netboot.xyz.efi"
No dnsmasq edit. No service restart_dnsmasq. The next PXE client simply fetches the new file. Rollback is the same move in reverse and is the reason the .bak exists:
ssh admin@192.168.1.1 "cd /jffs/tftproot && mv netboot.xyz.kpxe.bak netboot.xyz.kpxe && mv netboot.xyz.efi.bak netboot.xyz.efi"
If a build ever boots wrong, you are one command from the last known-good binary, and you never had a window where the file was absent or partial.
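The rename-aware checksum comparison described above is the step most likely to be done by eye and done wrong, so here is a sketch of it as two small functions. The names expected_sha256 and upload_matches are hypothetical, and the sketch assumes the checksums file uses the usual sha256sum layout (hash, whitespace, bare filename); if your build writes paths instead of bare names, the lookup finds nothing and the comparison fails closed.

```shell
#!/bin/sh
# Hypothetical helpers: compare a router-side hash against the build's
# checksum file, looking the hash up under the SOURCE artifact name.

# expected_sha256 CHECKSUM_FILE ARTIFACT_NAME
# Prints the hash recorded for ARTIFACT_NAME. The match is on the exact
# filename, so netboot.xyz.efi never matches netboot.xyz-snponly.efi.
expected_sha256() {
    awk -v name="$2" '
        { f = $2; sub(/^\*/, "", f) }      # tolerate sha256sum binary-mode "*"
        f == name { print $1; exit }
    ' "$1"
}

# upload_matches CHECKSUM_FILE SOURCE_NAME ACTUAL_HASH
# Succeeds only if SOURCE_NAME is listed and its hash equals ACTUAL_HASH.
upload_matches() {
    want=$(expected_sha256 "$1" "$2")
    [ -n "$want" ] && [ "$want" = "$3" ]
}

# Usage (not run here): the router file is netboot.xyz.efi.new, but the
# line to check it against is the snponly one.
#   got=$(ssh admin@192.168.1.1 "sha256sum /jffs/tftproot/netboot.xyz.efi.new" | awk '{print $1}')
#   upload_matches buildout/ipxe/netboot.xyz-sha256-checksums.txt netboot.xyz-snponly.efi "$got"
```

Passing the source artifact name explicitly is the point of the design: the rename lives in one argument instead of in your memory.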
Step 9: The Clock Caveat
One more failure mode, and it is here near the end on purpose, because it aliases onto fault 1 and will send you back to re-extracting a CA that was already correct.
iPXE checks certificate validity windows. The netboot.xyz CA you bundled has a notBefore date: for the CA current as of this writing it is Nov 17 05:32:48 2025 GMT (this is the value you recorded in Step 5; it moves whenever netboot.xyz rotates the CA, so trust your recorded value over this sentence if they differ). If a PXE client boots with its firmware clock set earlier than that date, iPXE rejects the CA as not-yet-valid and imgverify fails, even though the build is completely correct. The symptom is the same error bounce as fault 1, so the instinct is “my CA is wrong again.” It isn’t; your clock is.
The chain of which clock matters: at boot, iPXE uses the machine’s firmware RTC, then netboot.xyz’s menu attempts an ntp correction (ntp 0.pool.ntp.org or similar). If that NTP step can’t reach a time server (no route yet, or UDP 123 blocked on the network), the firmware RTC stands. So the clock that decides this is iPXE’s at boot time, not the booted operating system’s clock. This is why, in the original debugging, “fixing the OS clock” did nothing for the earlier failure: the OS clock is downstream of the moment that matters.
Keep two distinct failures separate, because they present almost identically and only one is a clock problem:
- Trust-anchor mismatch (old CA in the binary versus new-CA-signed live menu). This is fault 1. Fixing the clock does nothing. The fix is the rebuilt binary from Step 5.
- Validity-window rejection (correct CA, client clock before the CA’s notBefore). The build is fine. The fix is the client’s clock, or letting NTP reach a server.
There is a third tell that points at the clock specifically: a wrong clock also breaks the HTTPS handshake to boot.netboot.xyz (a TLS certificate is invalid outside its own validity window too), which surfaces as iPXE printing something like HTTPS failed, attempting HTTP and falling back, not as a verification error. If you see the HTTP-fallback line, suspect the clock before you suspect the CA. A verification error with HTTPS working is more likely a genuine trust-anchor problem; a verification error alongside HTTPS failing is the clock wearing fault 1’s mask.
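The validity-window test itself is plain date arithmetic, and a quick sketch makes it concrete. This runs on your build host, not inside iPXE, and only checks the notBefore edge: you give it the notBefore value you recorded and the time the client’s firmware clock shows (for example, read off the firmware setup screen). The function name clock_ok_for_cert is hypothetical, and the sketch assumes GNU date, whose -d option parses both OpenSSL-style and ISO-style timestamps; BSD and BusyBox date differ.

```shell
#!/bin/sh
# Hypothetical helper: would a client clock pass the CA's notBefore check?

# clock_ok_for_cert NOT_BEFORE CLIENT_TIME
# Both arguments are date strings GNU `date -d` can parse.
# Returns 0 if CLIENT_TIME is at or after NOT_BEFORE, 1 if it is earlier,
# and 2 if either string cannot be parsed.
clock_ok_for_cert() {
    nb=$(date -u -d "$1" +%s) || return 2
    ct=$(date -u -d "$2" +%s) || return 2
    [ "$ct" -ge "$nb" ]
}

# Usage (not run here), with the notBefore quoted in this step and a
# firmware clock that was reset to a factory default:
#   clock_ok_for_cert "Nov 17 05:32:48 2025 GMT" "2020-01-01 00:00:00 UTC" \
#       || echo "client clock predates the CA: fix the RTC, not the build"
```

A client that fails this comparison will show the fault 1 error bounce no matter how many times you re-extract the CA.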
Step 10: Verifying
Verification here is not “did it boot.” It is “did each fault stay fixed, and if not, which one came back.” The single most important point, the one this guide got wrong in an earlier draft: a silent boot does not prove the signature was verified. A default netboot.xyz build boots just as silently because it verifies nothing. Silence is only meaningful once you have first proved verification is switched on. So the ladder starts there.
- Verification is actually on (do this first, it makes the rest meaningful). You already confirmed buildout/boot.cfg shows set sigs_enabled true in Step 6. The runtime tell is in the menu itself: netboot.xyz only renders the signature-check toggle item when the build has sigs enabled, and it shows the live state, reading roughly netboot.xyz [ enabled: true ]. If that item is present and reads true, verification is on and every menu chain is going through imgverify. If the item is absent, you are running a sigs-off build and nothing below proves anything; go back to Step 5.
- BIOS client (netboot.xyz.kpxe). With verification confirmed on, it boots straight to the live netboot.xyz menu with no error bounce. BIOS never had fault 3, so a clean verified BIOS boot isolates fault 1 (the CA trust) from the UEFI complications. If it errors back to the menu instead: fault 1 (stale CA) or the clock (Step 9), not fault 3.
- UEFI client (netboot.xyz-snponly.efi, served as netboot.xyz.efi). Same verified live menu, no hang. The autoexec.ipxe not found line may flash by; that is the harmless red herring from Step 7 and is expected, not a failure. What matters is that it continues past it to the menu instead of freezing on it.
- imgverify actually ran and passed. With the toggle item reading enabled: true (rung 1) and the menu chaining cleanly with no error return (rungs 2 and 3), imgverify ran against the live menu, found the bundled netboot.xyz CA, and accepted it. That is the real success condition, and it is only a success condition because rung 1 established verification was on. A build that “works” by shipping sigs off, or by having you toggle them off to get past an error, has not fixed fault 1; it has hidden it, which is precisely the trap this guide itself fell into before review.
The failure-to-cause map, so a regression names its own fault instead of restarting the whole investigation:
| What you see | Which fault | Where to look |
|---|---|---|
| Menu chains fine but the sig-check toggle item is absent | Not fixed, hidden: sigs off | buildout/boot.cfg says false; set sigs_enabled: true (Step 5/6) |
| Sigs on, but boot errors back to the menu, HTTPS fine | Fault 1: CA trust | Rebuild with Step 5; confirm the leaf verifies against the bundled CA |
| Same error, but HTTPS failed, attempting HTTP first | Clock, masking as fault 1 | Step 9: client firmware RTC versus the CA notBefore |
| Frozen right after autoexec.ipxe not found, UEFI only | Fault 3: wrong EFI artifact | You served netboot.xyz.efi; serve snponly.efi (Step 7/8) |
| Stale menu, missing recent OSes, no errors | Fault 1’s trap: pinned binary | You are running an old release, not a from-source rolling build (Step 4) |
| Boot never reaches a menu, no verification error at all | Fault 2: Secure Boot / autoexec scheme | You grabbed a 3.0.1+ Secure Boot artifact; build the non-SB scheme from source |
If every row of that table is something you can now diagnose in one read instead of one afternoon, the guide did its job. The build is a means; the map is the deliverable.
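As a rough sketch, the same map can be written as a lookup that takes what you observed and names the fault. The function name and the symptom keywords are hypothetical labels invented for this example, not anything netboot.xyz or iPXE emits; the only thing it encodes is the table, including the rule that a quiet boot counts for nothing until the toggle item has been seen.

```shell
#!/bin/sh
# triage SYMPTOM TOGGLE HTTPS CLIENT
#   SYMPTOM: menu_ok | error_bounce | freeze_after_autoexec | no_menu | stale_menu
#   TOGGLE:  present | absent   (the sig-check toggle item in the menu)
#   HTTPS:   ok | failed        (the client reported HTTPS failed, trying HTTP)
#   CLIENT:  bios | uefi
# All keywords are hypothetical labels for the rows of the table above.
triage() {
    symptom="$1"; toggle="$2"; https="$3"; client="$4"
    case "$symptom" in
        menu_ok)
            # A quiet boot proves nothing unless the toggle item is there.
            if [ "$toggle" = present ]; then echo "ok: verified boot"
            else echo "hidden: sigs off (Step 5/6)"; fi ;;
        error_bounce)
            # Same visible error, two causes; the HTTPS line separates them.
            if [ "$https" = failed ]; then echo "clock (Step 9)"
            else echo "fault 1: CA trust (Step 5)"; fi ;;
        freeze_after_autoexec)
            # Only the UEFI path has this failure mode.
            if [ "$client" = uefi ]; then echo "fault 3: wrong EFI artifact (Step 7/8)"
            else echo "unmapped"; fi ;;
        stale_menu) echo "fault 1's trap: pinned binary (Step 4)" ;;
        no_menu)    echo "fault 2: Secure Boot / autoexec scheme" ;;
        *)          echo "unmapped" ;;
    esac
}
```

Calling `triage error_bounce present failed uefi` prints `clock (Step 9)`, which is the case most often misread as fault 1. Anything the table does not cover comes back as `unmapped` instead of being forced onto the nearest fault.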
Keeping It Working
Lead with the part that matters most, because it is the answer to the obvious worry and it is unambiguous: new menus never require an update. Nothing about the menu is baked into your binary. The build chains boot.netboot.xyz live at every boot. New operating systems, new menu entries, upstream menu fixes: all of it arrives on its own, with zero rebuilds for menu and OS updates, indefinitely, on the verified-with-their-CA path you built (the rare CA rotation covered below is the single exception, and it is not a menu update). “Keeping up with the menu” is not a maintenance task here; it is the structural payoff of building against the rolling menu instead of pinned release binaries. If you do nothing at all for years, the menu still stays current. That is the design working as intended, not luck.
There is exactly one event that ever forces a rebuild, and it is rare: netboot.xyz rotating its signing CA. The bundled CA is long-lived (the current one is valid into 2035) and rotations are project decisions announced in the CHANGELOG, not a schedule, not an expiry you have to track. The trigger needs no calendar: a previously working sigs-on binary you did not change starts erroring back to the menu on verify. Rule out the clock first (Step 9): a clock-driven validity failure looks similar but is fixed at the client, not by rebuilding. With the clock excluded, the tell that it is a true rotation is that a rebuild from your current cert does not fix it, because the cert itself is now stale against the re-signed live menu. That is the rotation, and only that sends you to the runbook below; most readers will go years without seeing one.
The refresh, when it does happen, is deliberately small, and the single most important property is that it does not involve upstream at all. You do not git pull. You do not rebase. The Ansible patch logic never changes across a rotation; only the contents of one certificate file change. Pulling upstream has no menu benefit (the menu is already live) and a real cost (a newer iPXE can re-introduce fault 3), so the correct posture is to stay on your known-good pinned build and touch nothing but the cert. The runbook:
- Regenerate the CA into the same path. Re-run the Step 5 extraction, writing over the existing roles/netbootxyz/files/certs/ca-netboot-xyz.crt in your existing clone. Re-verify it with the Step 5 openssl leaf check so you know the new cert genuinely signs the live menu before you trust it. Record the new fingerprint and notBefore next to the old, the same provenance discipline as the first time.
- Rebuild the existing branch. Run the Step 6 Docker build unchanged. No upstream fetch, no rebase, no flag changes; sigs_enabled: true and the patch are already in your tree. Confirm buildout/boot.cfg still says set sigs_enabled true, exactly as in Step 6.
- Deploy to the router. Run the Step 8 ritual verbatim: upload to .new, sha256-verify on the router against the new build’s checksums (minding the snponly-to-.efi name mapping), back up the working binary, atomic mv. Rollback stays the restored .bak. Because nothing but the binary changes, the router side is identical to the first deploy and just as reversible.
That is the entire lifetime maintenance of this build: nothing, ever, for menus; and on the rare CA rotation, regenerate one file and rerun two procedures you already have, with upstream untouched.
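The two gates in that runbook, the sigs check on the build host and the hash-then-swap on the router, can be sketched as small shell functions. This is a minimal illustration under assumptions, not the Step 6 or Step 8 commands themselves: the function names are hypothetical, the expected digest is passed in by hand so nothing here guesses the name or layout of the build's checksum file, and it assumes sha256sum is available on the router.

```shell
#!/bin/sh
# Hypothetical helper names; paths are supplied by the caller.

# Build host: succeed only if the generated config turns signature checks on.
confirm_sigs_on() {
    grep -Eq '^[[:space:]]*set[[:space:]]+sigs_enabled[[:space:]]+true[[:space:]]*$' "$1"
}

# Router: promote "$1.new" to "$1" only if its SHA-256 matches "$2".
#   $1 = served path (for UEFI, the file served under the .efi name)
#   $2 = expected hex digest, copied from the build's checksum entry for the
#        artifact you actually uploaded (the snponly one for UEFI)
swap_verified() {
    target="$1"
    expected="$2"
    [ -n "$expected" ] && [ -f "$target.new" ] || return 1
    actual=$(sha256sum "$target.new" | cut -d' ' -f1)
    if [ "$actual" != "$expected" ]; then
        echo "sha256 mismatch for $target.new, leaving $target untouched" >&2
        return 1
    fi
    # Keep the known-good binary for rollback before replacing it.
    if [ -f "$target" ]; then
        cp -p "$target" "$target.bak" || return 1
    fi
    # Same directory, so mv is a rename: clients never see a partial file.
    mv -f "$target.new" "$target"
}

# Restore the binary that was serving before the last swap.
# Note: this overwrites any pending "$1.new".
rollback() {
    [ -f "$1.bak" ] || return 1
    cp -p "$1.bak" "$1.new" && mv -f "$1.new" "$1"
}
```

The design point is that a failed hash check returns before anything is copied or moved, so a bad upload can never replace the working binary, and the backup is taken only once the new file has already been proven good.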
One optional lever reduces even that rare event to a non-event, if you want it. Instead of treating ca-netboot-xyz.crt as a hand-maintained bundled file, regenerate it from the public menu.ipxe.sig bundle as the first action of every build (a small pre-build step or an added Ansible task that runs the Step 5 curl plus openssl extraction before the trust files are assembled). Then a rotation stops being a distinct task at all: any rebuild you ever do already trusts the then-current CA, and step 1 of the runbook above disappears. The tradeoff, and the reason this guide’s default keeps the static file, is real: you take on a build-time network dependency and you trade a static, reviewed, provenance-pinned certificate for one fetched live at build time and implicitly trusted. For an automated fleet that rebuilds often, the lever is worth it. For a single router you rebuild rarely and want to be able to audit, the static file is the better default. Pick deliberately; both are correct, they optimize for different things.
What’s Next
Deliberately out of scope, in rough order of how likely you are to want it:
- Secure Boot on purpose. This guide builds the simple non-Secure-Boot scheme deliberately, because it is the path that removes faults 1 and 2 with the least surface. If you specifically need Secure Boot on the PXE client, that is a different build (generate_disks_secureboot) with its own artifact scheme and its own failure modes, and it is its own guide, not a flag to flip casually here.
- A second router or a non-Merlin firmware. The trust-and-snponly logic is firmware-agnostic; the deploy ritual in Step 8 is Merlin-specific (JFFS paths, dnsmasq.conf.add). On OpenWrt or a dedicated firewall the what is identical and only the where changes.
- Fully scripting the refresh. Step 11 gives the refresh as a short manual runbook. Wrapping it into one idempotent script (CA regenerate, rebuild, sha256 gate, atomic swap) is reasonable once you have run it by hand enough times to trust each step; automate the verified ritual, never an unverified one.
The throughline, if you keep one thing: the install was never the problem. Three faults that alias onto each other were, and the only durable fix is the one that removes a fault instead of relocating its symptom or switching the symptom off. Build from source against the live menu (kills fault 2 and fault 1’s stale trap), turn verification on and bake the public CA into trust with no private key so it actually passes instead of erroring out (kills fault 1 without hiding it), serve snponly on UEFI (kills fault 3), and keep the clock in mind so it can’t wear fault 1’s mask. The sharpest lesson here is the one the guide learned about itself under review: “it boots quietly” is not “it is verified,” and only checking the build instead of the story tells them apart. Get the map from Step 1 right and every later step is mechanical. Get it wrong and no single correct fix will look like it worked.
Toolkit Reference
What shows up across this guide, and the spots where a second model genuinely saves an afternoon rather than a minute.
Tools and Sources
- netboot.xyz
- The boot menu itself and the source repository you build from. Building from source against the live menu is what removes fault 2 and fault 1's stale-menu trap structurally.
- netboot.xyz repo
- The Ansible/Jinja project that generates the bootloaders. The default build is almost the whole solution; the only addition is the CA trust patch in Step 5.
- iPXE
- The bootloader under netboot.xyz. Its EFI autoexec network probe at startup is fault 3; snponly.efi sidesteps it by using firmware SNP instead of built-in NIC drivers.
- Asuswrt-Merlin
- The router firmware serving the binary over TFTP/dnsmasq. JFFS for /jffs/tftproot and dnsmasq.conf.add; the deploy never edits dnsmasq, only the binary.
- Docker
- Pinned, reproducible build environment for the bootloaders. The Windows path-mount idiom ($(pwd -W)) is the one cross-platform wrinkle.
- OpenSSL
- Extracts the live netboot.xyz code-signing CA from the public menu.ipxe.sig CMS bundle, and proves it with openssl verify. No private key is ever involved.
Where AI Earns Its Keep
- Holding all three faults at once
- The whole problem is that single-shot prompting fixes one fault and surfaces another. The win is framing the model with all three faults from Step 1 so it keeps each symptom attached to its own cause instead of collapsing them.
- The CA extraction, pinned
- Mechanical for a model to write, with one trap: returning the leaf or intermediate instead of the self-signed root. Pin the ask to "Subject == Issuer, CA:TRUE" and demand the openssl verify ... OK as the check.
- The snponly red herring as a framing test
- Describe only the symptom and a weak run chases the missing autoexec file. Supply the "non-fatal by design" constraint and a good answer ends at SNP versus built-in drivers. Use it to check your own framing, not just the model's.