From d3a067886e21a6bf2d4cef1ac74a9d6be98c6ab4 Mon Sep 17 00:00:00 2001
From: Lumpiasty
Date: Sat, 13 Jun 2026 02:19:16 +0200
Subject: [PATCH] coredns: fix ENOTFOUND for own zone, enable dns64 for IPv4 clients

Two Corefile changes:

- Add lumpiasty.xyz server block without dns64. Replaces the manual
  RouterOS static FWD entry ("bypass nat64") which returned NOERROR with
  empty answer instead of relaying NXDOMAIN. Combined with ndots:5 and
  pod search domains this made getaddrinfo stop at the search-suffixed
  candidate and fail with ENOTFOUND for valid names (kaneo -> authentik
  OAuth fetch failures). CoreDNS relays rcodes faithfully; internal zone
  keeps real AAAA for native IPv6.

- Add allow_ipv4 to dns64 (previously uncommitted): without it only
  queries arriving over IPv6 are synthesized, but all clients reach
  CoreDNS via RouterOS over IPv4, so translate_all never applied.

The RouterOS static FWD entry must be removed after deploying the new
image - ansible already declares only the ts.net entry, so a playbook
run handles it.
---
 ansible/roles/routeros/tasks/base.yml       | 11 ++++----
 ansible/roles/routeros/tasks/containers.yml |  2 +-
 ansible/roles/routeros/tasks/firewall.yml   | 14 ++++++++++
 docs/coredns-nat64.md                       | 31 +++++++++++++++++----
 docs/network.md                             | 13 +++++++--
 mikrotik/coredns/Corefile                   | 19 +++++++++++++
 6 files changed, 76 insertions(+), 14 deletions(-)

diff --git a/ansible/roles/routeros/tasks/base.yml b/ansible/roles/routeros/tasks/base.yml index aa4917b..08438f4 100644 --- a/ansible/roles/routeros/tasks/base.yml +++ b/ansible/roles/routeros/tasks/base.yml @@ -192,11 +192,12 @@ forward-to: 100.100.100.100 match-subdomain: true comment: Tailscale MagicDNS - - name: lumpiasty.xyz - type: FWD - forward-to: 1.1.1.1 - match-subdomain: true - comment: lumpiasty.xyz bypass nat64 + # Do NOT add a lumpiasty.xyz FWD entry here. 
RouterOS FWD entries return + # NOERROR with an empty answer instead of relaying NXDOMAIN, which breaks + # getaddrinfo search-domain processing (ENOTFOUND for valid names in k8s + # pods). The DNS64 bypass for our own zone lives in the CoreDNS Corefile + # (mikrotik/coredns/Corefile, lumpiasty.xyz server block) which relays + # rcodes correctly. See docs/coredns-nat64.md pitfall #4. handle_absent_entries: remove handle_entries_content: remove_as_much_as_possible diff --git a/ansible/roles/routeros/tasks/containers.yml b/ansible/roles/routeros/tasks/containers.yml index c182e6d..6c60fc3 100644 --- a/ansible/roles/routeros/tasks/containers.yml +++ b/ansible/roles/routeros/tasks/containers.yml @@ -20,7 +20,7 @@ data: - dst: /var/lib/tailscale list: tailscale_state - src: tailscale/state + src: /tailscale/state handle_absent_entries: remove handle_entries_content: remove_as_much_as_possible diff --git a/ansible/roles/routeros/tasks/firewall.yml b/ansible/roles/routeros/tasks/firewall.yml index 29ebcc2..a0481d5 100644 --- a/ansible/roles/routeros/tasks/firewall.yml +++ b/ansible/roles/routeros/tasks/firewall.yml @@ -72,6 +72,15 @@ comment: Allow Tayga NAT64 pool to internet out-interface: pppoe-gpon src-address: 192.168.240.0/20 + # IPv6-only clients reaching internal services published on the public IP + # (e.g. authentik.lumpiasty.xyz -> 139.28.40.212 -> dst-nat -> 10.44.0.0/16) + # arrive from the Tayga pool after NAT64 translation. Without this rule + # they fall through to the final reject (hairpin via NAT64). 
+ - action: accept + chain: forward + comment: Allow Tayga NAT64 pool to LoadBalancer (hairpin port forwards) + dst-address: 10.44.0.0/16 + src-address: 192.168.240.0/20 - action: jump chain: forward comment: Allow port forwards @@ -446,6 +455,11 @@ comment: Allow from IOT to internet only in-interface: vlan5 out-interface-list: wan + - action: accept + chain: forward + comment: Allow from SRV to internet via NAT64 + in-interface: vlan4 + out-interface: nat64 - action: accept chain: forward comment: Allow from IOT to internet via NAT64 diff --git a/docs/coredns-nat64.md b/docs/coredns-nat64.md index 593361f..615c130 100644 --- a/docs/coredns-nat64.md +++ b/docs/coredns-nat64.md @@ -67,19 +67,38 @@ Provides: - Static IPv6 route `64:ff9b::/96 → Tayga` - Masquerade of Tayga's IPv4 pool to WAN - PREF64 option in Router Advertisements (`/ipv6/nd pref64`) -- DHCP option 108 to signal IPv6-only preference to capable clients +- PREF64 + RDNSS options in Router Advertisements (per-interface `ipv6 nd` entries) +- DHCP option 108 to signal IPv6-only preference to capable clients (sent only when requested) ## Client behaviour with DHCPv4 option 108 +Option 108 and PREF64 work as a pair — deploying one without the other breaks clients: + +- **Option 108** (RFC 8925): tells capable clients to drop IPv4. RouterOS only sends it to clients that request code 108 in their Parameter Request List (that is what the `force` flag on the option controls — we leave it unset). Legacy clients never see it. +- **PREF64 in RA** (RFC 8781): tells the now IPv6-only client the NAT64 prefix so it can activate CLAT. Without PREF64, a client that honoured option 108 has no working translation and appears stuck "obtaining IP address". +- **RDNSS in RA** (RFC 8106): IPv6-only clients ignore DHCPv4 entirely, including its `dns-server`. They need an IPv6 DNS address from RA. We advertise the router's per-VLAN IPv6 address; RouterOS DNS forwards to CoreDNS. 
+ | Client OS | Behaviour | |---|---| -| iOS 16+, macOS 13+ | Activates CLAT, drops IPv4, uses NAT64 for IPv4 literals | -| Android 10+ | Activates CLAT via PREF64, drops IPv4 | +| iOS 16+, macOS 13+ | Requests 108, drops IPv4, activates CLAT via PREF64 | +| Android 10+ | Requests 108, drops IPv4, activates CLAT via PREF64 | | Windows 11 (preview) | Partial — CLAT support in preview as of 2026 | -| Linux (NetworkManager) | DHCP option 108 honoured; CLAT via NM requires PREF64 | -| Legacy/unaware devices | Ignore option 108, receive IPv4 lease normally, continue dual-stack | +| Linux (NetworkManager) | Honours option 108; CLAT requires PREF64 | +| Legacy/unaware devices | Never request 108, receive IPv4 lease normally, dual-stack | -Option 108 value is a 32-bit seconds timer. Set to 28 (0x1c) for testing, 86400 for production. +Option 108 value is a 32-bit seconds timer (V6ONLY_WAIT, minimum 300 per RFC), refreshed on each DHCP renewal. We use 86400 (1 day) so a failed DNS64/NAT64 stack self-heals within a day by clients falling back to IPv4. + +### Deployment pitfalls (learned the hard way) + +Option 108 must never be deployed before the whole IPv6-only path works end to end. A client that honours it drops IPv4 immediately and depends on RA-provided PREF64 + RDNSS and a working NAT64. Each of these failure modes was hit in sequence, and every one presented identically on the phone ("stuck obtaining IP address" / "failed to connect"): + +1. **ND entries silently not created.** RouterOS ships only the `interface=all` default in `/ipv6/nd`. An `api_find_and_modify` task searching for `interface=vlan2` matches zero entries and silently succeeds (`require_matches_min` defaults to 0) — PREF64 was never advertised. Use `api_modify`, which creates missing entries. +2. 
**RDNSS pointing at a nonexistent address.** VLAN IPv6 addresses came `from-pool`, so the actual prefix was dynamic (`:0::/64`), while the ND `dns=` advertised the documented-but-wrong `:9::/64` router address. Fixed by switching VLANs to static addressing — the HE prefix is static, the pool indirection served no purpose. +3. **`advertise-dns=no` on new ND entries.** RouterOS creates per-interface ND entries with `advertise-dns=no`, which suppresses the RDNSS option entirely — even when a static `dns=` list is configured on the entry. Must be set to `yes` explicitly. + +4. **RouterOS static FWD entries corrupt NXDOMAIN.** A manually added `type=FWD match-subdomain=yes` entry for `lumpiasty.xyz` (intended to bypass DNS64 for our own zone) returned `NOERROR` with an empty answer for nonexistent subdomains instead of relaying NXDOMAIN. Combined with `ndots:5` and the `homelab-infra.lumpiasty.xyz` search domain in kubernetes pods, `getaddrinfo` received NODATA for the search-suffixed candidate (`authentik.lumpiasty.xyz.homelab-infra.lumpiasty.xyz`), concluded the name exists, stopped the search loop, and never tried the absolute name — apps failed with `ENOTFOUND` for perfectly valid hostnames while `nslookup` (absolute query) worked. The zone bypass now lives in the CoreDNS Corefile as a dedicated `lumpiasty.xyz:53` server block without `dns64`, which relays rcodes faithfully. RouterOS DNS does plain forwarding only; no FWD entries except Tailscale MagicDNS. + +Verification tooling: `rdisc6` (NixOS package `ndisc6`) shows the exact RA contents — RDNSS and PREF64 must both be present. When capturing DHCP in Wireshark, do not filter by client MAC: OFFER/ACK are sent to the broadcast MAC and disappear from the capture, hiding the server side of the exchange. 
When diagnosing DNS, the CoreDNS `log` plugin output is visible via `/log print` on the router (container `logging=yes`) and includes the rcode CoreDNS returned — comparing it with what the client received isolates which hop corrupts responses. Beware misleading test names: `*.example.com` legitimately returns NODATA upstream, making it useless for NXDOMAIN testing. ## CI/CD diff --git a/docs/network.md b/docs/network.md index 0be1607..d9fa81f 100644 --- a/docs/network.md +++ b/docs/network.md @@ -93,8 +93,12 @@ There are also networks, which are not VLANs, but are routed: Static assignment on CRS, access to factory IP of ONU - Containers on CRS
Access to every other network
- IP: 172.17.0.1/16, 2001:470:61a3:500::/64
- Static IP management + IP: 172.20.0.1/24, 2001:470:61a3:500::/64
+ Static IP management, hosts Tailscale and CoreDNS (DNS64) containers +- NAT64 link on CRS
+ Dedicated bridge for the Tayga NAT64 container
+ IP: 192.168.239.0/30, fc64::/126 (link), 192.168.240.0/20 (Tayga dynamic pool)
+ IPv6 traffic to 64:ff9b::/96 is routed here for translation to IPv4 Whole network is designed to eliminate VLANs, overlays where unnecessary to keep things simple. Only NAT rules are: @@ -103,8 +107,13 @@ Whole network is designed to eliminate VLANs, overlays where unnecessary to keep It doesn't have a gateway configured, we want to access it from other networks so we need to talk to it as if we were in the same subnet - src-nat tailscale IPv6 to internet
Tailscale assigns IPv6 from private subnet with no way to configure it, so the assigned IPs are not routable +- Masquerade Tayga NAT64 dynamic pool (192.168.240.0/20) via GPON PPPoE - IPv4 port forwards from GPON PPPoE to respective services +## IPv6-mostly (NAT64/DNS64) + +LAN (vlan2) and IoT (vlan5) are IPv6-mostly networks (RFC 8925): clients capable of IPv6-only operation receive DHCP option 108, drop their IPv4 address, and activate CLAT using the NAT64 prefix advertised via PREF64 in router advertisements. Legacy clients keep dual-stack. DNS64 (CoreDNS container, with `translate_all`) synthesizes 64:ff9b::/96 AAAA answers so all named traffic exits via NAT64 (Tayga container) on our IPv4 WAN — bypassing the HE tunnel for egress and avoiding datacenter-IP captcha flagging. See [CoreDNS DNS64 + NAT64 design](./coredns-nat64.md) for details and deployment pitfalls. + There is also an UPnP and NAT-PMP enabled to automatically configure port forwards from LAN. ## Uplink diff --git a/mikrotik/coredns/Corefile b/mikrotik/coredns/Corefile index 5d65c3a..f074abc 100644 --- a/mikrotik/coredns/Corefile +++ b/mikrotik/coredns/Corefile @@ -1,3 +1,22 @@ +# Our own zone bypasses DNS64: internal services have native IPv6 (LB pool +# routed via HE prefix), so clients should get real AAAA records and connect +# directly instead of hairpinning through NAT64. +# +# This MUST live here, not as a RouterOS static FWD entry: RouterOS FWD +# entries return NOERROR with an empty answer instead of relaying NXDOMAIN, +# which breaks getaddrinfo search-domain processing (resolver stops at the +# first NODATA search candidate and never tries the absolute name -> apps +# fail with ENOTFOUND for names that exist). +lumpiasty.xyz:53 { + forward . 1.1.1.1 8.8.8.8 { + prefer_udp + } + + cache 300 + errors + log +} + .:53 { # Synthesize AAAA from A records for all destinations. # translate_all: override real AAAA records too, so all traffic exits