Get rid of NAT64 setup
@@ -1,124 +0,0 @@
# CoreDNS DNS64 + NAT64 — design and implementation

## Goal

Replace the RouterOS built-in DNS forwarder with CoreDNS and implement IPv6-mostly networking (RFC 8925) using DNS64 + NAT64, allowing clients to phase out IPv4 while keeping full connectivity to IPv4-only destinations.

## Background

The network uses Hurricane Electric as an IPv6 tunnel broker (`2001:470:61a3::/48`). HE assigns addresses from datacenter IP ranges, so some websites serve endless CAPTCHAs or flag connections as bot traffic. IPv6-mostly addresses this differently: capable clients prefer IPv6, and destinations reached through NAT64 see our own IPv4 WAN address rather than an HE address, which avoids the datacenter flagging for those destinations.

## How it works

```
IPv6-only client                 CoreDNS (DNS64)              NAT64 (Tayga)
│                                │                            │
│── AAAA? example.com ──────────▶│                            │
│                                │── A? example.com ───────────────▶ upstream
│                                │◀── 93.184.216.34 ──────────────── upstream
│◀── 64:ff9b::5db8:d822 ─────────│ (synthesized AAAA)         │
│                                │                            │
│── TCP SYN to 64:ff9b::5db8:d822 ───────────────────────────▶│
│                                │ (RouterOS routes           │
│                                │  64:ff9b::/96              │
│                                │  to Tayga)                 │
│                                │                            │── TCP SYN to 93.184.216.34
│                                │                            │◀── TCP SYN-ACK
│◀── TCP SYN-ACK (translated) ────────────────────────────────│
```

With `translate_all` (see Components), DNS64 replaces the answer for every destination, including sites that publish real AAAA records, with a synthesized `64:ff9b::/96` address. All traffic therefore routes through Tayga and exits on our own IPv4 WAN address, bypassing the HE tunnel broker. This eliminates the datacenter IP flagging and CAPTCHA loops that HE addresses trigger on some sites.
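
The address arithmetic in the diagram can be reproduced with Python's standard `ipaddress` module. This is a sketch of RFC 6052 embedding for the /96 case only (other prefix lengths split the IPv4 address around the reserved bits 64-71 and are not handled); `synthesize_aaaa` is an illustrative name, not part of CoreDNS.

```python
import ipaddress

# Well-known NAT64 prefix (RFC 6052) used throughout this setup.
WELL_KNOWN_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def synthesize_aaaa(ipv4: str,
                    prefix: ipaddress.IPv6Network = WELL_KNOWN_PREFIX) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in a /96 NAT64 prefix, as DNS64 does for an A record."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    v4 = ipaddress.IPv4Address(ipv4)
    # With a /96 prefix the IPv4 address simply fills the last 32 bits.
    return ipaddress.IPv6Address(int(prefix.network_address) | int(v4))


print(synthesize_aaaa("93.184.216.34"))  # 64:ff9b::5db8:d822
```

Each IPv4 octet becomes two hex digits: 93 = 0x5d, 184 = 0xb8, 216 = 0xd8, 34 = 0x22.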

## Components

### CoreDNS (custom build)

Built from source with 7 plugins instead of the default ~40, reducing the compressed image from ~20 MB to ~6-8 MB. This matters for fitting on the CRS internal flash.

Plugin set: `errors`, `log`, `health`, `cache`, `dns64`, `forward`, `reload`.

Plugin order in `plugin.cfg` determines execution order. `dns64` must come before `forward` so it can intercept AAAA responses from upstream rather than letting `forward` return them directly to the client.

Source: [`mikrotik/coredns/`](../mikrotik/coredns/)

The `dns64` plugin is built into CoreDNS; no external plugin is needed. It performs the A→AAAA synthesis using the well-known prefix `64:ff9b::/96` (RFC 6052).

`translate_all` and `allow_ipv4` are both set. Without `allow_ipv4`, the plugin only intercepts queries arriving over IPv6: dual-stack clients querying CoreDNS over IPv4 (the common case, since the router forwards DNS via IPv4) would receive real AAAA records and use the HE tunnel instead of NAT64.

| Client type | AAAA query handling | A query handling |
|---|---|---|
| IPv6-only (CLAT) | synthesized `64:ff9b::` → NAT64 path | not asked; client has no IPv4 stack |
| Dual-stack (no CLAT) | synthesized `64:ff9b::` → NAT64 path | `forward` returns real A → client uses IPv4 directly |
| IPv4-only (no IPv6) | synthesized `64:ff9b::` → client ignores it (no IPv6 stack), uses A record | `forward` returns real A → client uses IPv4 directly |

IPv4-only clients receive synthesized AAAA records but their stack cannot use them, so they fall back to A records normally. No breakage.
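
To make the interaction of `translate_all` and `allow_ipv4` explicit, here is a simplified Python model of when an AAAA answer gets replaced. It is written from the behaviour described in this document, not from the plugin's Go source; `Dns64Options` and `replaces_aaaa` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Dns64Options:
    translate_all: bool
    allow_ipv4: bool


def replaces_aaaa(opts: Dns64Options, qtype: str, query_over_ipv6: bool,
                  upstream_has_aaaa: bool, rcode: str = "NOERROR") -> bool:
    """Simplified model: does dns64 replace the upstream answer with synthesized AAAA?"""
    if qtype != "AAAA":
        return False  # A and other types go straight through to forward
    if not query_over_ipv6 and not opts.allow_ipv4:
        return False  # queries arriving over IPv4 are ignored without allow_ipv4
    if rcode == "NXDOMAIN":
        return False  # a nonexistent name is passed through as-is
    if opts.translate_all:
        return True  # real AAAA records are discarded
    return not upstream_has_aaaa  # classic DNS64: synthesize only when no AAAA exists


# RouterOS forwards over IPv4, so this is the case that matters here.
ours = Dns64Options(translate_all=True, allow_ipv4=True)
print(replaces_aaaa(ours, "AAAA", query_over_ipv6=False, upstream_has_aaaa=True))  # True
```

Flipping `allow_ipv4` to `False` in the last call returns `False`, which is the "real AAAA leaks through to dual-stack clients" case described above.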

### Tayga (NAT64)

Stateless IP/ICMP translation (SIIT, RFC 7915). Receives IPv6 packets for `64:ff9b::/96`, strips the prefix to get the IPv4 destination, rewrites the packet headers, and routes it out as IPv4. Return traffic gets the inverse translation.
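
The address part of that step can be sketched in Python: the IPv4 destination is just the last 32 bits of the `64:ff9b::/96` address. This covers only destination-address recovery; header rewriting (RFC 7915) and how Tayga maps IPv6 sources onto its IPv4 pool are not modelled, and `ipv4_destination` is a hypothetical helper.

```python
import ipaddress

NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def ipv4_destination(ipv6_dst: str) -> ipaddress.IPv4Address:
    """Recover the IPv4 destination embedded in a /96 NAT64 address."""
    addr = ipaddress.IPv6Address(ipv6_dst)
    if addr not in NAT64_PREFIX:
        raise ValueError(f"{addr} is not inside {NAT64_PREFIX}; not for the translator")
    # The last 4 of the 16 address bytes are the IPv4 address.
    return ipaddress.IPv4Address(addr.packed[12:])


print(ipv4_destination("64:ff9b::5db8:d822"))  # 93.184.216.34
```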

RouterOS does not implement NAT64 natively (confirmed in the official docs). The approach some blog posts describe, writing per-destination `/ipv6 firewall nat dst-nat` rules, is not real NAT64: it is static port forwarding and requires manually enumerating every destination.

Official image: `ghcr.io/apalrd/tayga` (no custom build needed).

### RouterOS

Provides:
- Static IPv6 route `64:ff9b::/96 → Tayga`
- Masquerade of Tayga's IPv4 pool to WAN
- PREF64 and RDNSS options in Router Advertisements (`pref64` and `dns=` on per-interface `/ipv6/nd` entries)
- DHCP option 108 to signal IPv6-only preference to capable clients (sent only when requested)
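
RouterOS builds the PREF64 option itself, but knowing its layout helps when reading raw RA captures. Below is a Python sketch of the 16-byte option from RFC 8781: type 38, length 2, a 13-bit lifetime in units of 8 seconds, a 3-bit prefix length code, then the top 96 bits of the prefix. `pref64_option` is a hypothetical helper, the 1800 s lifetime is only an example value, and rounding the lifetime up is a choice made here.

```python
import ipaddress
import struct

# Prefix Length Code (PLC) values defined by RFC 8781.
PLC_BY_PREFIX_LEN = {96: 0, 64: 1, 56: 2, 48: 3, 40: 4, 32: 5}


def pref64_option(prefix: str, lifetime_s: int) -> bytes:
    """Encode a PREF64 Neighbor Discovery option (type 38)."""
    net = ipaddress.IPv6Network(prefix)
    plc = PLC_BY_PREFIX_LEN[net.prefixlen]
    # Lifetime is carried in units of 8 seconds and capped at the 13-bit maximum.
    scaled = min((lifetime_s + 7) // 8, 0x1FFF)
    # Length 2 means 2 x 8 octets = 16 bytes for the whole option.
    header = struct.pack("!BBH", 38, 2, (scaled << 3) | plc)
    return header + net.network_address.packed[:12]


print(pref64_option("64:ff9b::/96", 1800).hex())  # 260207080064ff9b0000000000000000
```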

## Client behaviour with DHCPv4 option 108

Option 108 only works together with PREF64 and RDNSS; deploying it without them breaks clients:

- **Option 108** (RFC 8925): tells capable clients to drop IPv4. RouterOS only sends it to clients that request code 108 in their Parameter Request List (that is what the `force` flag on the option controls; we leave it unset). Legacy clients never see it.
- **PREF64 in RA** (RFC 8781): tells the now IPv6-only client the NAT64 prefix so it can activate CLAT. Without PREF64, a client that honoured option 108 has no working translation and appears stuck "obtaining IP address".
- **RDNSS in RA** (RFC 8106): IPv6-only clients ignore DHCPv4 entirely, including its `dns-server`. They need an IPv6 DNS address from RA. We advertise the router's per-VLAN IPv6 address; RouterOS DNS forwards to CoreDNS.
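
For comparison with `rdisc6` output, here is a Python sketch of the RDNSS option layout from RFC 8106: type 25, a length of 1 + 2 per address (in 8-octet units), two reserved bytes, a 32-bit lifetime, then the IPv6 resolver addresses. `rdnss_option` is a hypothetical helper, and the address is a documentation placeholder, not the router's real per-VLAN address.

```python
import ipaddress
import struct


def rdnss_option(servers: list[str], lifetime_s: int) -> bytes:
    """Encode an RDNSS Neighbor Discovery option (type 25)."""
    addrs = [ipaddress.IPv6Address(s).packed for s in servers]
    if not addrs:
        raise ValueError("RDNSS needs at least one resolver address")
    length = 1 + 2 * len(addrs)  # 8-byte header + 16 bytes per address, in 8-octet units
    header = struct.pack("!BBHI", 25, length, 0, lifetime_s)  # the 0 is the reserved field
    return header + b"".join(addrs)


opt = rdnss_option(["2001:db8::1"], 1800)  # placeholder resolver address
print(len(opt), opt[:8].hex())  # 24 1903000000000708
```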

| Client OS | Behaviour |
|---|---|
| iOS 16+, macOS 13+ | Requests 108, drops IPv4, activates CLAT via PREF64 |
| Android 10+ | Requests 108, drops IPv4, activates CLAT via PREF64 |
| Windows 11 (preview) | Partial: CLAT support in preview as of 2026 |
| Linux (NetworkManager) | Honours option 108; CLAT requires PREF64 |
| Legacy/unaware devices | Never request 108, receive IPv4 lease normally, dual-stack |

The option 108 value, V6ONLY_WAIT, is a 32-bit number of seconds (minimum 300 per RFC 8925) for which a client that honours the option stops trying to obtain an IPv4 lease before asking again. We use 86400 (1 day), so if the DNS64/NAT64 path fails and option 108 is withdrawn, affected clients are back on IPv4 within a day.
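
A Python sketch of option 108 on the wire, and of the client-side rule from RFC 8925 that values below MIN_V6ONLY_WAIT (300 s) are raised to it. The function names are hypothetical; RouterOS and the client OS implement this themselves.

```python
import struct

OPTION_V6ONLY_PREFERRED = 108
MIN_V6ONLY_WAIT = 300  # seconds, from RFC 8925


def encode_option_108(v6only_wait: int) -> bytes:
    """Option code, length (always 4), then V6ONLY_WAIT as a 32-bit integer."""
    return struct.pack("!BBI", OPTION_V6ONLY_PREFERRED, 4, v6only_wait)


def client_wait_seconds(option: bytes) -> int:
    """How long a client that honours the option skips DHCPv4."""
    code, length, value = struct.unpack("!BBI", option)
    if code != OPTION_V6ONLY_PREFERRED or length != 4:
        raise ValueError("not a well-formed option 108")
    return max(value, MIN_V6ONLY_WAIT)


print(client_wait_seconds(encode_option_108(86400)))  # 86400
print(client_wait_seconds(encode_option_108(60)))     # 300
```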

### Deployment pitfalls (learned the hard way)

Option 108 must never be deployed before the whole IPv6-only path works end to end. A client that honours it drops IPv4 immediately and depends on RA-provided PREF64 + RDNSS and a working NAT64. Each of these failure modes was hit in sequence, and every one presented identically on the phone ("stuck obtaining IP address" / "failed to connect"):

1. **ND entries silently not created.** RouterOS ships only the `interface=all` default in `/ipv6/nd`. An `api_find_and_modify` task searching for `interface=vlan2` matches zero entries and silently succeeds (`require_matches_min` defaults to 0), so PREF64 was never advertised. Use `api_modify`, which creates missing entries.
2. **RDNSS pointing at a nonexistent address.** VLAN IPv6 addresses were assigned `from-pool`, so the actual prefix was dynamic (`:0::/64`), while the ND `dns=` advertised the documented-but-wrong `:9::/64` router address. Fixed by switching VLANs to static addressing: the HE prefix is static, so the pool indirection served no purpose.
3. **`advertise-dns=no` on new ND entries.** RouterOS creates per-interface ND entries with `advertise-dns=no`, which suppresses the RDNSS option entirely, even when a static `dns=` list is configured on the entry. It must be set to `yes` explicitly.
4. **RouterOS static FWD entries corrupt NXDOMAIN.** A manually added `type=FWD match-subdomain=yes` entry for `lumpiasty.xyz` (intended to bypass DNS64 for our own zone) returned `NOERROR` with an empty answer for nonexistent subdomains instead of relaying NXDOMAIN. Combined with `ndots:5` and the `homelab-infra.lumpiasty.xyz` search domain in Kubernetes pods, `getaddrinfo` received NODATA for the search-suffixed candidate (`authentik.lumpiasty.xyz.homelab-infra.lumpiasty.xyz`), concluded the name exists, stopped the search loop, and never tried the absolute name. Apps failed with `ENOTFOUND` for perfectly valid hostnames while `nslookup` (absolute query) worked. The zone bypass now lives in the CoreDNS Corefile as a dedicated `lumpiasty.xyz:53` server block without `dns64`, which relays rcodes faithfully. RouterOS DNS does plain forwarding only; no FWD entries except Tailscale MagicDNS.
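
The search-loop failure in item 4 is easier to see in a small Python model of a stub resolver whose search loop continues on NXDOMAIN but stops on NODATA, as described above. This is a simplification written for illustration: real libc resolvers differ in details (not all of them stop on NODATA), and the single-entry search list stands in for the pod's full list.

```python
from typing import Callable, Optional

ANSWER, NODATA, NXDOMAIN = "answer", "nodata", "nxdomain"


def candidates(name: str, search: list[str], ndots: int) -> list[str]:
    """Order in which the modelled resolver tries names."""
    if name.endswith("."):
        return [name]  # absolute name: the search list is skipped
    suffixed = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + suffixed
    return suffixed + [name]  # fewer than ndots dots: search list first


def resolve(name: str, search: list[str], ndots: int,
            lookup: Callable[[str], str]) -> Optional[str]:
    """Return the name that answered, or None (which apps saw as ENOTFOUND)."""
    for candidate in candidates(name, search, ndots):
        outcome = lookup(candidate)
        if outcome == ANSWER:
            return candidate
        if outcome == NODATA:
            return None  # treated as "name exists, no address": the loop stops here
    return None  # every candidate was NXDOMAIN


search = ["homelab-infra.lumpiasty.xyz"]  # simplified: real pods have more entries
target = "authentik.lumpiasty.xyz"


def via_fwd_entry(qname: str) -> str:
    # Old RouterOS FWD entry: nonexistent names came back as NOERROR/empty.
    return ANSWER if qname == target else NODATA


def via_corefile(qname: str) -> str:
    # Corefile server block: nonexistent names get a real NXDOMAIN.
    return ANSWER if qname == target else NXDOMAIN


print(resolve(target, search, 5, via_fwd_entry))  # None
print(resolve(target, search, 5, via_corefile))   # authentik.lumpiasty.xyz
```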

Verification tooling: `rdisc6` (NixOS package `ndisc6`) shows the exact RA contents; RDNSS and PREF64 must both be present. When capturing DHCP in Wireshark, do not filter by client MAC: OFFER/ACK are sent to the broadcast MAC and disappear from the capture, hiding the server side of the exchange. When diagnosing DNS, the CoreDNS `log` plugin output is visible via `/log print` on the router (container `logging=yes`) and includes the rcode CoreDNS returned; comparing it with what the client received isolates which hop corrupts responses. Beware misleading test names: `*.example.com` legitimately returns NODATA upstream, making it useless for NXDOMAIN testing.

## CI/CD

The Woodpecker pipeline at [`.woodpecker/coredns-build.yaml`](../.woodpecker/coredns-build.yaml) triggers on any push that touches `mikrotik/coredns/**`. It:

1. Authenticates to OpenBao using the shared Renovate AppRole (`renovate_role_id` / `renovate_secret_id` Woodpecker secrets)
2. Fetches registry credentials from the `container-registry` KV secret (`REGISTRY_USERNAME` / `REGISTRY_PASSWORD`)
3. Builds the `linux/arm64` image using `docker buildx`
4. Pushes `latest` and a short-SHA tag to `gitea.lumpiasty.xyz/<owner>/coredns-mikrotik`
5. Revokes the OpenBao token

To update the CoreDNS version, change the `--branch` argument in the Dockerfile's `git clone` line.

## RouterOS deployment

See [`mikrotik/README.md`](../mikrotik/README.md) for the full set of RouterOS commands.

## Known limitations

- **DNSSEC**: The `dns64` plugin does not validate DNSSEC on synthesized responses (upstream bug noted in the plugin docs). If DNSSEC is required, run a validating resolver upstream and disable synthesis for signed zones.
- **IPv4 literals**: Applications using hardcoded IPv4 addresses (e.g. `connect("1.2.3.4")`) cannot use DNS64. CLAT on the client handles this for capable OSes; legacy apps on non-CLAT clients will fail on IPv6-only VLANs.
- **Native IPv6 bypassed**: `translate_all` means no traffic uses native IPv6 directly; everything goes through Tayga. This is intentional: it trades native IPv6 performance for a consistent exit IP. If native IPv6 is ever desired for specific destinations, remove `translate_all` and handle the HE CAPTCHA problem differently (e.g. per-domain exceptions).
- **IPv6-only destinations (no A record)**: With `translate_all`, the plugin still attempts an A lookup for every AAAA query. If no A record exists, `Synthesize` produces a NOERROR with an empty answer and the real AAAA is discarded. Confirmed by reading the source: `responseShouldDNS64` returns `true` unconditionally when `TranslateAll` is set (except NXDOMAIN), and `Synthesize` only converts A records, so anything without an A record yields an empty answer. In practice this only affects genuinely IPv6-only destinations with no A record, which are rare on the public internet today.
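
The last limitation can be reproduced with a simplified Python model of the answer a client gets for an AAAA query. It mirrors the behaviour described above (only A records are converted; with `translate_all` real AAAA records are never copied through) and is not the plugin's `Synthesize` code; the function name and sample addresses are illustrative.

```python
import ipaddress

NAT64_BASE = int(ipaddress.IPv6Address("64:ff9b::"))


def aaaa_answer(a_records: list[str], aaaa_records: list[str],
                translate_all: bool) -> list[str]:
    """Simplified model of the AAAA answer section returned by dns64."""
    if aaaa_records and not translate_all:
        return aaaa_records  # classic DNS64 leaves real AAAA records alone
    # Only A records are converted; real AAAA records are dropped.
    return [str(ipaddress.IPv6Address(NAT64_BASE | int(ipaddress.IPv4Address(a))))
            for a in a_records]


# An IPv6-only destination: real AAAA, no A record.
print(aaaa_answer([], ["2001:db8::1"], translate_all=True))   # []
print(aaaa_answer([], ["2001:db8::1"], translate_all=False))  # ['2001:db8::1']
```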

@@ -0,0 +1,110 @@
# CoreDNS resolver

## Goal

Replace the RouterOS built-in DNS forwarder with a CoreDNS container for
configurability, and suppress IPv6 (AAAA) resolution by default to keep traffic
on IPv4.

## Background

The ISP provides no native IPv6, only a Hurricane Electric (HE) tunnel
(`2001:470:61a3::/48`). HE addresses fall in ranges some sites flag as
datacenter/bot traffic, producing endless CAPTCHAs. The goal is to prefer IPv4
egress while keeping IPv6 available for our own services and any domain
explicitly trusted over IPv6.

## What this is NOT (and why)

An earlier iteration used **DNS64 + NAT64 (Tayga)** to force traffic through
IPv4. It was removed:

- **Performance**: Tayga is a userspace translator with no hardware offload.
Every translated packet crossed RouterOS twice (v6 in, v4 out) plus a
userspace hop, capping throughput at ~250 Mbps against a 1 Gbps line.
- **SPOF**: two containers (CoreDNS + Tayga) in the datapath of nearly all
traffic on a router whose native forwarder had been rock-solid.
- **Architectural inversion**: NAT64 exists to let IPv6-only clients reach IPv4.
We don't want IPv6 egress at all; using NAT64 to avoid IPv6 was solving the
problem backwards.

Plain AAAA suppression in CoreDNS achieves the same IPv4-preferred outcome with
zero datapath overhead: DNS is the only thing touched, and packet forwarding
stays on the RouterOS fastpath at line rate.

The full account of the NAT64/IPv6-mostly attempt and why it was abandoned is in
[nat64-dns64-postmortem.md](./nat64-dns64-postmortem.md).

## How it works

CoreDNS runs as a single container (`172.20.0.3`), reachable from RouterOS DNS,
which forwards client queries to it. The [Corefile](../mikrotik/coredns/Corefile)
has the following server blocks:

1. **`lumpiasty.xyz`**: our own zone. Forwards normally and keeps AAAA, so
internal services reachable over the HE prefix resolve to their real IPv6
addresses.
2. **`.` (default)**: forwards everything else, but a `template IN AAAA` block
returns empty NOERROR for all AAAA queries, so clients fall back to IPv4 and
avoid the HE tunnel's flagged egress. A records and all other types pass
through untouched.

The whitelist is implemented as a reusable `(aaaa_allowed)` snippet imported by
zones that should keep AAAA. To trust another domain over IPv6, add a server
block for it that imports `aaaa_allowed`.
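
The routing decision can be modelled in a few lines of Python, assuming CoreDNS's usual rule that a query is handled by the server block with the most specific matching zone. The zone set, function names and return strings are illustrative; the real policy lives in the Corefile.

```python
# Zones whose server blocks import the (aaaa_allowed) snippet.
AAAA_ALLOWED_ZONES = {"lumpiasty.xyz"}


def matching_zone(qname: str, zones: set[str]) -> str:
    """Most specific listed zone containing qname, else the root block '.'."""
    labels = qname.rstrip(".").lower().split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])  # qname itself first, then its parents
        if suffix in zones:
            return suffix
    return "."


def handle(qname: str, qtype: str) -> str:
    zone = matching_zone(qname, AAAA_ALLOWED_ZONES)
    if qtype == "AAAA" and zone == ".":
        return "NOERROR, empty answer"  # what the template block returns
    return "forwarded upstream"


print(handle("www.example.com", "AAAA"))      # NOERROR, empty answer
print(handle("www.example.com", "A"))         # forwarded upstream
print(handle("auth.lumpiasty.xyz.", "AAAA"))  # forwarded upstream
```

Matching on whole labels matters: `notlumpiasty.xyz` is not inside `lumpiasty.xyz` and still gets its AAAA suppressed.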

### Why suppression, not NXDOMAIN

The AAAA template returns NOERROR with an empty answer (NODATA), not NXDOMAIN.
This is correct: the name exists, it just has no (advertised) AAAA. Clients
treat it as "no IPv6 address" and use the A record. Returning NXDOMAIN would
wrongly imply the name doesn't exist and break the A lookup.
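
One concrete reason NXDOMAIN would break the A lookup is negative caching (RFC 2308): an NXDOMAIN is cached against the name and covers every record type, while NODATA is cached only for the queried type. A toy Python cache showing the difference; it is not any real resolver's code.

```python
class NegativeCache:
    """Toy negative cache keyed the way RFC 2308 describes."""

    def __init__(self) -> None:
        self.nxdomain: set[str] = set()            # name does not exist at all
        self.nodata: set[tuple[str, str]] = set()  # name exists, this type does not

    def store(self, name: str, qtype: str, rcode: str, answers: list[str]) -> None:
        if rcode == "NXDOMAIN":
            self.nxdomain.add(name)
        elif rcode == "NOERROR" and not answers:
            self.nodata.add((name, qtype))

    def known_missing(self, name: str, qtype: str) -> bool:
        return name in self.nxdomain or (name, qtype) in self.nodata


cache = NegativeCache()
cache.store("example.com", "AAAA", "NOERROR", [])  # what the template returns
print(cache.known_missing("example.com", "A"))     # False: A lookup still goes out

cache.store("example.org", "AAAA", "NXDOMAIN", [])  # hypothetical NXDOMAIN instead
print(cache.known_missing("example.org", "A"))      # True: A lookup answered from cache
```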

## Future improvement

The current global-suppress-plus-whitelist is coarse: a domain that is genuinely
IPv6-only (no A record) and not whitelisted becomes unreachable. The intended
end state is a plugin that suppresses AAAA only when the domain also has an A
record, so IPv6-only destinations keep working without manual whitelisting. No
in-tree CoreDNS plugin does this today.
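
The decision such a plugin would make is small. A Python sketch of it, with lookups injected so the logic can be exercised without a DNS server; an actual CoreDNS plugin would be written in Go against CoreDNS's plugin API, which is not shown, and `filtered_aaaa` and the sample records are hypothetical.

```python
from typing import Callable

# lookup(name, qtype) returns the record values for that name and type.
Lookup = Callable[[str, str], list[str]]


def filtered_aaaa(name: str, lookup: Lookup, whitelisted: bool = False) -> list[str]:
    """Hide AAAA records only when the name also has an A record."""
    aaaa = lookup(name, "AAAA")
    if whitelisted or not aaaa:
        return aaaa
    # Dual-stack name: return NODATA so the client uses IPv4.
    # IPv6-only name: keep the AAAA so it stays reachable.
    return [] if lookup(name, "A") else aaaa


records = {
    ("dual.example", "A"): ["192.0.2.10"],
    ("dual.example", "AAAA"): ["2001:db8::10"],
    ("v6only.example", "AAAA"): ["2001:db8::20"],
}


def fake_lookup(name: str, qtype: str) -> list[str]:
    return records.get((name, qtype), [])


print(filtered_aaaa("dual.example", fake_lookup))    # []
print(filtered_aaaa("v6only.example", fake_lookup))  # ['2001:db8::20']
```

The cost is one extra A query per AAAA query for names that have AAAA records; caching would likely absorb most of that for repeated lookups.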

## Custom image

Built from source with a minimal plugin set (`errors`, `log`, `health`,
`template`, `cache`, `forward`, `reload`) instead of the default ~40, producing
a ~6-8 MB image. The `dns64` plugin is no longer compiled in.

Source: [`mikrotik/coredns/`](../mikrotik/coredns/). Built by Woodpecker
([`.woodpecker/coredns-build.yaml`](../.woodpecker/coredns-build.yaml)) on pushes
touching `mikrotik/coredns/**`, pushed to `gitea.lumpiasty.xyz/lumpiasty/coredns-mikrotik`.

## RouterOS integration

- `/ip/dns servers=172.20.0.3`: RouterOS forwards client queries to CoreDNS
- RDNSS in RA (`/ipv6/nd dns=...` on vlan2/vlan5) advertises an IPv6 resolver
(the router's per-VLAN address) to dual-stack clients; RouterOS DNS relays to
CoreDNS
- No DHCP option 108 and no PREF64; those belonged to the removed IPv6-mostly setup

## Pitfalls learned (kept for reference)

These were hit during the NAT64 era and the migration; some still apply:

1. **RouterOS static FWD entries corrupt NXDOMAIN.** A `type=FWD match-subdomain=yes`
entry returns NOERROR/empty instead of relaying NXDOMAIN. Combined with
`ndots:5` and Kubernetes pod search domains, `getaddrinfo` stops at the first
search-suffixed NODATA candidate and never tries the absolute name; apps fail
with `ENOTFOUND` for valid hostnames while `nslookup` (absolute query) works.
Our own zone is therefore handled in the Corefile, not via a RouterOS FWD
entry. RouterOS DNS does plain forwarding only (plus the Tailscale `ts.net`
FWD, which is acceptable as its subdomains genuinely don't exist publicly).
2. **`advertise-dns=no` on new ND entries.** RouterOS creates per-interface
`ipv6 nd` entries with `advertise-dns=no`, suppressing the RDNSS option even
when a static `dns=` list is set. It must be enabled explicitly.
3. **Per-interface ND entries must be created, not modified.** Only the
`interface=all` default ships out of the box; `api_find_and_modify` matching a
specific interface silently matches nothing. Use `api_modify`.

Verification: `rdisc6` (NixOS package `ndisc6`) dumps RA contents. The CoreDNS
`log` plugin output is visible via `/log print` on the router (container
`logging=yes`) and shows the rcode CoreDNS returned; comparing it to what the
client received isolates which hop corrupts a response.

@@ -0,0 +1,136 @@
# Postmortem: NAT64 / IPv6-mostly attempt

A record of an architecture that was built, run for ~2 days, and removed. Kept
so the reasoning isn't re-discovered the hard way. For the current DNS setup see
[coredns.md](./coredns.md); for the network overview see [network.md](./network.md).

## The original problem

The ISP provides no native IPv6, only a Hurricane Electric (HE) 6in4 tunnel
(`2001:470:61a3::/48`). HE address ranges are widely classified as
datacenter/hosting space, so some sites (Google, Cloudflare-fronted services,
various login flows) treat IPv6 traffic from them as bot/VPN traffic: endless
CAPTCHAs, "unusual traffic" interstitials, or outright blocks. IPv4 egress
(the ISP's residential PPPoE address) is unaffected.

The goal: keep using the network normally without IPv6 triggering these flags,
while still keeping some IPv6 (e.g. inbound to self-hosted services).

## What was built

An **IPv6-mostly** network (RFC 8925) with **DNS64 + NAT64**, intended to push
egress onto IPv4 while presenting IPv6 to clients:

- **CoreDNS container** with the `dns64` plugin (`translate_all`): synthesized
`64:ff9b::/96` AAAA records from A records for *all* names, so even dual-stack
destinations resolved to a NAT64 address.
- **Tayga container** (`ghcr.io/apalrd/tayga-nat64`): stateless NAT64 translator.
IPv6 traffic to `64:ff9b::/96` was routed to it, translated to IPv4, and
masqueraded out the GPON PPPoE interface. So all "IPv6" egress actually left
as IPv4 on the residential address, bypassing the HE tunnel and its flagging.
- **RouterOS RA + DHCP**: DHCP option 108 (IPv6-only preferred) to make capable
clients drop IPv4, PREF64 (RFC 8781) to advertise the NAT64 prefix for CLAT,
and RDNSS (RFC 8106) to hand IPv6-only clients a resolver.
- Dedicated `nat64` bridge, `fc64::/126` link, `192.168.240.0/20` Tayga pool,
static routes, and firewall rules (including NAT64-mapped RFC1918 blocks to
prevent the translator being used as a policy bypass).

## Why it was removed

### 1. Performance — the dealbreaker

Throughput collapsed from line rate (~1 Gbps) to **~200-300 Mbps**, saturating
the router CPU. Causes, all structural:

- Tayga is a **userspace** translator. Every translated packet leaves the kernel
fastpath, is copied to userspace, translated, and re-injected.
- Translated traffic crosses RouterOS **twice**, once as IPv6 (LAN → Tayga) and
once as IPv4 (Tayga → WAN, with masquerade), doubling firewall/conntrack work.
- No hardware offload or fasttrack applies to either leg.

With `translate_all`, *nearly all* internet traffic went through this path, so
the penalty hit everything, not just IPv4-only destinations.

### 2. Single point of failure

DNS (CoreDNS) and most of the datapath (Tayga) became two containers in the
critical path on a router whose built-in forwarder had been completely reliable.
Container restarts, image pulls, or a crash now took down connectivity.

### 3. Architectural inversion

NAT64 exists to let **IPv6-only** clients reach the **IPv4** internet. The actual
goal here was the opposite: *avoid* IPv6 egress entirely. Building an IPv6-only
client environment (option 108, CLAT, PREF64) and then translating all of it back
to IPv4 was solving the problem backwards. The complexity existed only to route
around a property of the HE tunnel.

### 4. Firewall complexity and a translation bypass hole

NAT64 punched a hole in the firewall model. RouterOS filters IPv4 and IPv6
independently, but NAT64 traffic enters as IPv6 and *leaves* as IPv4 after
translation, so the carefully built IPv4 forward policy (inter-VLAN isolation,
RFC1918-to-WAN blocks) was simply bypassed for anything arriving via the
translator. A client could reach a private IPv4 range by encoding it in the
NAT64 prefix (`64:ff9b::c0a8:xxyy` = `192.168.x.y`, with x and y written in
hex), and the IPv4 rules would never see it because the packet was IPv6 until
Tayga rewrote it.

Plugging this required mirroring the IPv4 policy in the IPv6 chain: explicit
`reject` rules for every NAT64-mapped RFC1918 block (`64:ff9b::a00:0/104`,
`64:ff9b::ac10:0/108`, `64:ff9b::c0a8:0/112`), per-VLAN accept rules toward the
`nat64` interface, plus a separate masquerade and LB hairpin-accept for the
Tayga pool. That is a parallel, easy-to-get-wrong copy of the existing ruleset,
whose correctness depended on getting CIDR-to-prefix arithmetic right. Removing
NAT64 deleted all of it.
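
The CIDR-to-prefix arithmetic is mechanical: a /96 prefix plus an IPv4 /n gives an IPv6 /(96+n) whose low bits are the IPv4 network address. A Python sketch that derives the three reject ranges listed above (`nat64_mapped` is a hypothetical helper):

```python
import ipaddress

NAT64 = ipaddress.IPv6Network("64:ff9b::/96")
RFC1918 = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]


def nat64_mapped(v4_cidr: str,
                 prefix: ipaddress.IPv6Network = NAT64) -> ipaddress.IPv6Network:
    """IPv6 block that an IPv4 CIDR occupies inside a /96 NAT64 prefix."""
    v4 = ipaddress.IPv4Network(v4_cidr)
    base = int(prefix.network_address) | int(v4.network_address)
    # The 96 prefix bits are followed directly by the IPv4 prefix bits.
    return ipaddress.IPv6Network((base, prefix.prefixlen + v4.prefixlen))


for cidr in RFC1918:
    print(cidr, "->", nat64_mapped(cidr))
# 10.0.0.0/8 -> 64:ff9b::a00:0/104
# 172.16.0.0/12 -> 64:ff9b::ac10:0/108
# 192.168.0.0/16 -> 64:ff9b::c0a8:0/112
```

Generating the mapped ranges this way rather than by hand removes one source of error, but it does not remove the need to keep the two rulesets in sync.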

### 5. Operational fragility (see coredns.md for detail)

The setup had a long tail of subtle failure modes, each presenting identically
as "client can't connect":

- RouterOS static `FWD` entries return `NOERROR`/empty instead of relaying
`NXDOMAIN`, which broke `getaddrinfo` search-domain handling in Kubernetes
pods (`ENOTFOUND` for valid names).
- `translate_all` discarded real AAAA for IPv6-only internal services, and
returned empty answers for names with no A record.
- Per-interface RouterOS `ipv6 nd` entries default to `advertise-dns=no` and must
be *created* (not modified), so RDNSS/PREF64 silently never advertised.
- Dynamic `from-pool` VLAN addressing made advertised RDNSS addresses point at
nonexistent router addresses.
- Option 108 honoured by clients before the NAT64 path was verified working left
them stuck "obtaining IP address".

Each was individually fixable, but the aggregate was a brittle system whose
benefit didn't justify the surface area.

## What replaced it

Plain CoreDNS forwarder with **AAAA suppression by default** plus a whitelist for
domains that should keep IPv6 (our own zone over the HE prefix, and any explicitly
trusted domain). Clients prefer IPv4 because they simply don't receive AAAA for
most names: no translation, no extra datapath hop, and packet forwarding stays on
the RouterOS fastpath at line rate. DNS is the only thing in the path. See
[coredns.md](./coredns.md).

Tradeoff accepted: a non-whitelisted IPv6-only destination (no A record) is
unreachable. In practice essentially everything on the public internet still has
an A record. The intended future refinement is a CoreDNS plugin that suppresses
AAAA only when an A record also exists, removing the need for the whitelist; no
in-tree plugin does this today.

## Lessons

- **Measure throughput before committing to an in-path translator on SOHO-class
hardware.** Userspace NAT64 (e.g. Tayga in a container) on a MikroTik CPU is
fine for a few hundred Mbps, not for saturating a gigabit line.
- **Match the mechanism to the actual goal.** The goal was "prefer IPv4 egress",
which is a one-line DNS policy, not a transition technology.
- **Prefer solutions that stay on the fastpath.** Anything that pulls bulk
traffic into userspace or doubles the forwarding work will dominate the CPU.
- **Fewer moving parts in the critical path.** Two containers carrying all DNS
and most traffic is a worse availability story than the stock forwarder, for a
cosmetic benefit (avoiding CAPTCHAs on some sites).
- **Protocol translation breaks the firewall model.** When traffic changes L3
protocol mid-path, the two firewall policies must be kept in sync by hand, and
any gap is a silent bypass. A solution that doesn't translate keeps a single
coherent policy.

@@ -94,11 +94,7 @@ There are also networks, which are not VLANs, but are routed:
 - Containers on CRS<br>
 Access to every other network<br>
 IP: 172.20.0.1/24, 2001:470:61a3:500::/64<br>
-Static IP management, hosts Tailscale and CoreDNS (DNS64) containers
-- NAT64 link on CRS<br>
-Dedicated bridge for the Tayga NAT64 container<br>
-IP: 192.168.239.0/30, fc64::/126 (link), 192.168.240.0/20 (Tayga dynamic pool)<br>
-IPv6 traffic to 64:ff9b::/96 is routed here for translation to IPv4
+Static IP management, hosts Tailscale and CoreDNS containers
 
 Whole network is designed to eliminate VLANs, overlays where unnecessary to keep things simple. Only NAT rules are:
 
@@ -107,12 +103,11 @@ Whole network is designed to eliminate VLANs, overlays where unnecessary to keep
 It doesn't have a gateway configured, we want to access it from other networks so we need to talk to it as if we were in the same subnet
 - src-nat tailscale IPv6 to internet<br>
 Tailscale assigns IPv6 from private subnet with no way to configure it, so the assigned IPs are not routable
-- Masquerade Tayga NAT64 dynamic pool (192.168.240.0/20) via GPON PPPoE
 - IPv4 port forwards from GPON PPPoE to respective services
 
-## IPv6-mostly (NAT64/DNS64)
+## DNS and IPv6 preference
 
-LAN (vlan2) and IoT (vlan5) are IPv6-mostly networks (RFC 8925): clients capable of IPv6-only operation receive DHCP option 108, drop their IPv4 address, and activate CLAT using the NAT64 prefix advertised via PREF64 in router advertisements. Legacy clients keep dual-stack. DNS64 (CoreDNS container, with `translate_all`) synthesizes 64:ff9b::/96 AAAA answers so all named traffic exits via NAT64 (Tayga container) on our IPv4 WAN — bypassing the HE tunnel for egress and avoiding datacenter-IP captcha flagging. See [CoreDNS DNS64 + NAT64 design](./coredns-nat64.md) for details and deployment pitfalls.
+DNS is served by a CoreDNS container (`172.20.0.3`); RouterOS forwards client queries to it. CoreDNS suppresses AAAA records by default so clients prefer IPv4, avoiding the HE tunnel's datacenter-flagged egress (which triggers CAPTCHAs on some sites). Our own zone (`lumpiasty.xyz`) and any explicitly whitelisted domains keep AAAA for native IPv6. See [CoreDNS resolver](./coredns.md). An earlier NAT64/IPv6-mostly approach to the same problem was built and abandoned; see the [postmortem](./nat64-dns64-postmortem.md).
 
 There is also an UPnP and NAT-PMP enabled to automatically configure port forwards from LAN.
 