Measured flattened rootfs for the arm64 image:

| Component | On-disk size |
|---|---|
| `tailscale.combined` (UPX-compressed) | ~3.47 MB |
| custom static busybox (UPX, ~100 applets) | ~218 kB |
| CA certificates | ~213 kB |
| **Total extracted rootfs** | **~3.9 MB** |

The `tailscale.combined` figure includes `netstack` (gVisor), which adds
~0.5 MB on disk over a netstack-omitted build. This inclusion is deliberate; see
[Why netstack is required (even with a kernel TUN)](#why-netstack-is-required-even-with-a-kernel-tun).

(The compressed image / transfer tarball is ~3.8–4.3 MB depending on arch.)

| Arch | Image (compressed) |
|---|---|
| amd64 | ~4.3 MB |
| arm64 | ~4.0 MB |
| arm/v7 | ~4.0 MB |

On a deployed RouterOS device the container consumes **~4.2 MiB of flash**
(measured by the `free-hdd-space` delta). Note that `du` *inside* the container
reports roughly double that (~8 MB). That is RouterOS block-allocation
rounding, **not** real usage or duplication; see
[Avoiding overlayfs layer duplication](#avoiding-overlayfs-layer-duplication)
for how to measure correctly.
```
/system/resource/print # note free-hdd-space before and after adding the container
```
The container should consume **~4.2 MiB** of flash (e.g. 94.6 → 90.4 MiB free).

Do **not** trust `du` inside the container for this. Busybox `du` reports
*allocated blocks*, and RouterOS's container store rounds the ~3.5 MB binary up
to ~7 MB of blocks, so `du -sx /` reports ~8 MB even though real flash use is
~4.2 MiB. `ls -la /usr/local/bin` confirms the binary's true content size
(~3.5 MB) and that it is a single file with two symlinks (no duplication).
The image itself carries the binary in exactly one layer (verified at the blob
level); the inflation is purely the filesystem's block accounting.
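The `ls` versus `du` discrepancy comes down to two fields of the same `stat` result: the content length and the number of allocated blocks. The sketch below is a generic illustration, not a RouterOS tool; it assumes a Linux host with Python 3 (the image itself ships no Python), and the helper name `apparent_and_allocated` is hypothetical.

```python
import os
import sys


def apparent_and_allocated(path: str) -> tuple[int, int]:
    """Return (apparent_bytes, allocated_bytes) for one file.

    apparent_bytes  -- st_size, the true content length that `ls -la` prints.
    allocated_bytes -- st_blocks * 512, the block allocation that `du` counts
                       (Linux reports st_blocks in 512-byte units).
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    for path in sys.argv[1:]:
        apparent, allocated = apparent_and_allocated(path)
        ratio = allocated / apparent if apparent else float("nan")
        print(f"{path}: apparent={apparent} B, allocated={allocated} B, "
              f"allocated/apparent={ratio:.2f}")
```

On an ordinary Linux filesystem the ratio for a multi-megabyte file is typically close to 1; the roughly 2x inflation described above is specific to how RouterOS's container store accounts blocks, which is why the `free-hdd-space` delta is the number to trust.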
| `advertise-routes` | Expose LAN subnets to the tailnet |
| `use-exit-node` | Route the router's own traffic via a remote exit node |
| `accept-routes` | Receive subnet routes from other tailnet nodes |
| DNS / MagicDNS | Resolve `*.ts.net` names (resolver + resolv.conf manager). **Note:** serving `100.100.100.100` also requires `netstack`; see [Why netstack is required (even with a kernel TUN)](#why-netstack-is-required-even-with-a-kernel-tun) |
| `netstack` + `gro` | gVisor userspace stack. Counter-intuitively **required** to serve MagicDNS on `100.100.100.100`, even though the router uses a real kernel TUN; see [Why netstack is required (even with a kernel TUN)](#why-netstack-is-required-even-with-a-kernel-tun) |
| `peerapiserver` | Serves the PeerAPI, including the `/dns-query` DoH endpoint that lets **exit-node clients resolve public DNS automatically**. A declared dependency of `advertise-exit-node` that the allowlist didn't pull in; see [Why peerapiserver is required for exit-node DNS](#why-peerapiserver-is-required-for-exit-node-dns) |
| portmapper (NAT-PMP/PCP/UPnP) | Punch through upstream NAT |
| listenrawdisco | Raw socket disco for better NAT traversal |
| health | Powers `tailscale status` output |
| `cachenetmap` | **Deliberately removed**; see [Why netmap disk-caching is removed](#why-netmap-disk-caching-is-removed) |
| `logtail` | Would attempt persistent log writes that wear flash. Removing it also removes stderr verbosity filtering, which is restored by an injected filter; see [Log verbosity filtering](#log-verbosity-filtering) |
| `netlog` | Network flow logging; separate concern |
| `ssh` | Access via MikroTik SSH + `tailscale` CLI instead |
| `linuxdnsfight` | inotify on `/etc/resolv.conf`; no systemd in container |
| `networkmanager` / `resolved` / `dbus` / `sdnotify` | No systemd stack in container |

the in-memory resilience (the common case) while eliminating per-netmap flash
writes. Only `tailscaled.state` (written on auth / key rotation) ever touches
flash.
### Why netstack is required (even with a kernel TUN)
This is the least obvious inclusion in the build, so it is documented in full.
`netstack` is Tailscale's embedded **gVisor userspace TCP/IP stack**. The
natural assumption (and what earlier versions of this build acted on) is that
a router which owns a **real kernel TUN device** (it is *not* run with
`--tun=userspace-networking`) has no use for a userspace stack, so `netstack`
(and its dependent `gro`) can be omitted to save space. That assumption is
**wrong for one specific, important path: MagicDNS.**
**MagicDNS on `100.100.100.100` is served only by netstack.** In Tailscale
v1.98.5 the in-process listener for the Tailscale service IP
(`100.100.100.100:53`, UDP) is installed exclusively by netstack's
`handleLocalPackets`, wired into the TUN wrapper as
`PreFilterPacketOutboundToWireGuardNetstackIntercept`
(`wgengine/netstack/netstack.go`). When a packet leaves the host toward
`100.100.100.100`, this hook absorbs it into the gVisor stack, whose UDP-53
acceptor runs the MagicDNS resolver.
**The "engine fallback" does not actually exist.** The TUN wrapper consults a
second hook, `PreFilterPacketOutboundToWireGuardEngineIntercept`, and a comment
in `net/tstun/wrap.go` claims it "primarily handles quad-100 if netstack is not
installed." In v1.98.5 that comment is **false on Linux**: the engine's
`handleLocalPackets` (`wgengine/userspace.go`) only reflects loopback on
darwin/ios/plan9 and otherwise returns `Accept`; it never touches
`100.100.100.100`. So with `ts_omit_netstack` there is **no** code that absorbs
quad-100 packets at all.
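To make the failure mode concrete, here is a toy model of the two outbound pre-filter hooks as this section describes them. It is a simplified sketch, not Tailscale's code: the function names, the `Verdict` enum and the netstack-then-engine hook order are assumptions made for illustration, and loopback reflection on darwin/ios/plan9 is left out.

```python
from enum import Enum

QUAD_100 = "100.100.100.100"


class Verdict(Enum):
    ACCEPT = "accept"      # continue to the next hook / toward WireGuard
    ABSORBED = "absorbed"  # consumed locally by the userspace stack


def netstack_intercept(dst: str) -> Verdict:
    # Netstack hook: pulls quad-100 traffic into gVisor, where the UDP-53
    # acceptor hands it to the MagicDNS resolver.
    return Verdict.ABSORBED if dst == QUAD_100 else Verdict.ACCEPT


def engine_intercept(dst: str) -> Verdict:
    # Engine hook on Linux, as described above: it never matches quad-100
    # and simply accepts every packet.
    return Verdict.ACCEPT


def outbound(dst: str, has_netstack: bool) -> Verdict:
    # With ts_omit_netstack the netstack hook is simply not installed.
    hooks = ([netstack_intercept] if has_netstack else []) + [engine_intercept]
    for hook in hooks:
        verdict = hook(dst)
        if verdict is not Verdict.ACCEPT:
            return verdict
    return Verdict.ACCEPT
```

`outbound(QUAD_100, has_netstack=True)` yields `Verdict.ABSORBED`, while `outbound(QUAD_100, has_netstack=False)` yields `Verdict.ACCEPT`: the DNS query is passed along like any other packet and nothing local ever answers it.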
**`dns` and `netstack` are independent tags.** The `dns` feature (which this
build opts in) links the resolver and the `/etc/resolv.conf` manager, but it has
no dependency on `netstack` and does **not** install any quad-100 transport.
The net result of `dns` on + `netstack` off is a resolver that is correctly
wired up but **never receives any packets**: the worst kind of silent
breakage. Symptoms observed on the device:

- `/etc/resolv.conf` correctly points at `100.100.100.100` (the manager works),
- but `dig anything @100.100.100.100` from inside the container **times out**
  ("no servers could be reached"),
- and even tailnet-internal names fail: `ping host.<tailnet>.ts.net` →
  `bad address` (a name that needs **no** upstream forwarding still can't
  resolve, proving the listener itself is dead, not an upstream-resolver issue),
- while `ping 1.1.1.1` (a raw IP needing no DNS) works fine over the kernel data
  path, confirming that forwarding/exit-node connectivity is unaffected and
  isolating the fault to DNS serving.
**Enabling `netstack` also fixed a crash.** Omitting `netstack` set
`buildfeatures.HasNetstack` to a compile-time `false`, which turned the guard in
`net/tstun.invertGSOChecksum` (`if !HasNetstack { panic("unreachable") }`) into
an always-panic. That function is called on the packet-injection path used when
enabling exit-node mode, producing `panic: unreachable` and a daemon restart
loop. Enabling `netstack` makes `HasNetstack` a const `true`, so the panic
branch becomes dead code and the crash disappears as a side effect. The crash is
thus fixed at its root cause rather than patched around.
**Cost.** Measured on arm64, a netstack-enabled build versus a netstack-omitted
one:

| Metric | netstack omitted | netstack enabled | Delta |
|---|---|---|---|
| Extracted rootfs (flash) | ~3.42 MB | ~3.91 MB | **+0.49 MB** |
| `tailscale.combined` on disk (UPX) | ~2.99 MB | ~3.47 MB | +0.48 MB |
| Resident RAM after UPX decompress | ~12.25 MB | ~14.56 MB | **+2.31 MB** |

The flash cost (~0.5 MB) is negligible on a 16 MB-class device. The RAM cost
(~2.3 MB resident) is the real consideration on low-memory models, but it is
acceptable given that without it MagicDNS is entirely non-functional. The
trade is: **half a megabyte of flash (and ~2.3 MB of RAM) to make MagicDNS work
at all.** `gro` (Generic Receive Offload) depends on `netstack` and is pulled in
alongside it; it is small and improves throughput on the netstack path.
**Caveat for future Tailscale bumps.** This coupling (quad-100 serving living
only in netstack) is an upstream implementation detail, not a stable contract.
If a future release adds a genuine non-netstack quad-100 path (or the daemon
itself is refactored), re-test whether `netstack` can be dropped again. The
canary is simple: from inside the container, `dig google.com @100.100.100.100`
must return answers and `ping <host>.<tailnet>.ts.net` must resolve.
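Where `dig` is not at hand, the same canary can be approximated with a raw UDP query. This is a minimal sketch assuming a host with Python 3 that can reach `100.100.100.100` (the container image does not ship Python); `build_query` and `probe` are hypothetical helpers that hand-build an RFC 1035 query with RD set, send it once, and report the response code and answer count.

```python
import random
import socket
import struct


def build_query(name: str, qtype: int = 1) -> bytes:
    """Encode a single-question DNS query (class IN) with the RD flag set."""
    header = struct.pack("!HHHHHH", random.randint(0, 0xFFFF), 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in name.rstrip(".").split("."):
        if label:  # skip empty labels, e.g. from a root-only name
            encoded = label.encode("ascii")
            qname += bytes([len(encoded)]) + encoded
    qname += b"\x00"  # root label terminates QNAME
    return header + qname + struct.pack("!HH", qtype, 1)


def probe(server: str, name: str, timeout: float = 2.0) -> tuple[int, int]:
    """Send one query over UDP and return (rcode, answer_count).

    Raises socket.timeout (an alias of TimeoutError since Python 3.10) if
    nothing answers, which is the netstack-omitted symptom described above.
    """
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server, 53))
        resp, _ = sock.recvfrom(4096)
    if len(resp) < 12 or resp[:2] != query[:2]:
        raise ValueError("short response or transaction ID mismatch")
    flags, _qdcount, ancount = struct.unpack("!HHH", resp[2:8])
    return flags & 0x000F, ancount
```

`probe("100.100.100.100", "google.com")` should return rcode `0` with a non-zero answer count; a timeout reproduces the dead-listener symptom. Repeat it with a `<host>.<tailnet>.ts.net` name to cover the tailnet-internal case.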
### Why peerapiserver is required for exit-node DNS
This is a second non-obvious DNS inclusion, and it exposes a limitation of the
allowlist build strategy.
**Symptom.** With `netstack` enabled, MagicDNS worked from the router and from
LAN hosts, including for public names. But a device using this router **as its exit
node** could not resolve public names: `dig google.com @100.100.100.100` on the
*client* returned an instant authoritative `SERVFAIL` (`flags: qr aa rd ad`,
`Query time: 0 msec`, "recursion not available"). Tailnet names and raw-IP
connectivity (e.g. `ping 1.1.1.1`) through the exit node worked.
**Root cause.** The `SERVFAIL` is generated **on the client**, locally, with no
network I/O, which is why it is instant and authoritative. The path
(traced through v1.98.5 source):

1. The client's query for `google.com` reaches its in-process resolver, which
   determines the name is not a tailnet name and marks it for forwarding
   (`net/dns/resolver/tsdns.go`).
2. The forwarder looks up which upstream resolver to use for the catch-all
   `"."` route (`net/dns/resolver/forwarder.go` → `resolvers()`).
3. That route set is **empty**, so `forwardWithDestChan` short-circuits and
   synthesises an authoritative `SERVFAIL` (`servfailResponse`, `aa=1`) without
   opening any socket. The query never reaches this router at all.
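Step 3 explains why the reply is instant and carries `aa`: it is built from the query itself, with no socket opened. The sketch below mimics that shape with a hypothetical `local_servfail` helper. It is a simplified illustration, not Tailscale's `servfailResponse`, and it does not reproduce every flag the observed reply carried (for example `ad`).

```python
import struct

QR, AA, RD = 0x8000, 0x0400, 0x0100
OPCODE_MASK = 0x7800
RCODE_SERVFAIL = 2


def _question_end(msg: bytes, qdcount: int) -> int:
    """Offset just past the question section (assumes uncompressed QNAMEs)."""
    pos = 12
    for _ in range(qdcount):
        while msg[pos] != 0:  # walk length-prefixed labels
            pos += 1 + msg[pos]
        pos += 1 + 4          # root label, then QTYPE and QCLASS
    return pos


def local_servfail(query: bytes) -> bytes:
    """Build an authoritative SERVFAIL for `query` without any network I/O.

    Echoes the ID, opcode, RD bit and question; sets QR and AA; leaves RA
    clear ("recursion not available"); drops any EDNS/additional records.
    """
    txid, flags, qdcount = struct.unpack("!HHH", query[:6])
    out_flags = QR | AA | (flags & (OPCODE_MASK | RD)) | RCODE_SERVFAIL
    header = struct.pack("!HHHHHH", txid, out_flags, qdcount, 0, 0, 0)
    return header + query[12:_question_end(query, qdcount)]
```

For a `google.com` query with RD set, the result has flags `0x8502` (`qr aa rd`, rcode 2), exactly one question and zero answers, which is the shape `dig` reports as an instant authoritative `SERVFAIL`.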
Why the route set is empty: when a client selects an exit node,
`dnsConfigForNetmap` (`ipn/ipnlocal/node_backend.go`) deliberately routes **all**
default DNS through the exit node and drops the client's own LAN/system
resolver; the whole premise of an exit node is "send everything, including
DNS, through me." It does this by setting the client's default resolver to the
exit node's **DoH proxy** URL (`http://<peer>/dns-query`). But that only happens
if `exitNodeCanProxyDNS(thisRouter)` returns true, i.e. if **this router
advertises a working PeerAPI DoH endpoint**. If it does not, and there is no
tailnet global nameserver to fall back to, the client ends up with an empty
default route and returns `SERVFAIL`.
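The decision described above can be modelled as a small function. This is a deliberately simplified sketch, not `dnsConfigForNetmap` itself: `ExitNode`, `default_route_via_exit_node` and `forward` are hypothetical names, and when both a DoH-capable exit node and tailnet global nameservers exist, the precedence shown here (DoH first) is an assumption rather than something taken from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExitNode:
    peer: str            # address used in the DoH URL http://<peer>/dns-query
    can_proxy_dns: bool  # True only if the node advertises PeerAPI DoH


def default_route_via_exit_node(exit_node: ExitNode,
                                global_nameservers: list[str]) -> list[str]:
    """Resolvers for the catch-all "." route on a client using `exit_node`.

    The client's own LAN/system resolvers are intentionally not considered.
    """
    if exit_node.can_proxy_dns:
        return [f"http://{exit_node.peer}/dns-query"]
    return list(global_nameservers)  # may be empty


def forward(routes: list[str]) -> str:
    """Mirror the forwarder's short-circuit: no route means a local SERVFAIL."""
    if not routes:
        return "SERVFAIL (synthesised locally, aa=1, no socket opened)"
    return f"forward to {routes[0]}"
```

With the old build (`can_proxy_dns=False`) and no global nameservers, `forward` returns the local SERVFAIL; either serving the DoH proxy or configuring a global nameserver gives the route set a member.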
**Why this router didn't advertise the DoH proxy.** The `/dns-query` DoH
endpoint is part of the **PeerAPI server**, gated by
`buildfeatures.HasPeerAPIServer` (`ipn/ipnlocal/peerapi.go`). With
`ts_omit_peerapiserver`, `initPeerAPIListenerLocked()` returns early: no PeerAPI
listener is created, the `PeerAPIDNS` service is never advertised, and
`peerCanProxyDNS()` is false for this node on every client.
**The allowlist gap that caused it.** In `feature/featuretags/featuretags.go`,
`advertiseexitnode` **declares a dependency on `peerapiserver`** ("to run the
ExitDNS server"). Upstream's own `--add` resolution would have pulled it in.
But this build's allowlist works differently: it runs `featuretags --min` to get
the full omit set, then strips the specific `ts_omit_<feature>` tags it wants.
It does **not** re-resolve transitive `Deps`. So opting in `advertiseexitnode`
did not pull in `peerapiserver`, and because `featuretags --min` had emitted
`ts_omit_peerapiserver`, the node became an exit node *without* its declared
ExitDNS dependency, a feature combination upstream's graph says shouldn't
occur. Including `peerapiserver` explicitly closes the gap.
> **Known limitation:** the allowlist (strip-individual-`ts_omit_`-tags) does
> not resolve feature dependencies. When opting a feature in, check its `Deps`
> in `featuretags.go` and add them explicitly. `peerapiserver` is the only such
> gap found and fixed so far; a full dependency audit has not been done.
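The gap is easy to see in a toy version of the build-tag computation. The sketch below is hypothetical: `DEPS` contains only the two edges stated in this README (`advertiseexitnode` → `peerapiserver`, `gro` → `netstack`), the feature list is abbreviated, and `omit_tags` merely models "start from the full omit set and strip opted-in tags"; it is not `featuretags` itself.

```python
# Only these edges are taken from this README; the real graph lives in
# feature/featuretags/featuretags.go and is much larger.
DEPS: dict[str, set[str]] = {
    "advertiseexitnode": {"peerapiserver"},
    "gro": {"netstack"},
    "peerapiserver": set(),
}


def closure(opt_in: set[str], deps: dict[str, set[str]]) -> set[str]:
    """Every feature reachable from `opt_in` through Deps (iterative DFS)."""
    seen: set[str] = set()
    stack = list(opt_in)
    while stack:
        feature = stack.pop()
        if feature not in seen:
            seen.add(feature)
            stack.extend(deps.get(feature, ()))
    return seen


def omit_tags(all_features: list[str], opt_in: set[str],
              deps: dict[str, set[str]], resolve_deps: bool = True) -> list[str]:
    """Start from "omit everything" and strip the tags for kept features.

    resolve_deps=False reproduces the allowlist behaviour described above.
    """
    keep = closure(opt_in, deps) if resolve_deps else set(opt_in)
    return sorted(f"ts_omit_{f}" for f in all_features if f not in keep)
```

With `advertiseexitnode` and `gro` opted in, the unresolved variant still emits `ts_omit_netstack` and `ts_omit_peerapiserver`; resolving `Deps` first leaves only unrelated omissions such as `ts_omit_ssh`.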
**Cost.** Negligible. `peerapiserver` has **no** `Deps` and pulls in no large
subsystems; it measured ~+10 kB on the UPX'd binary (arm64), with the rootfs
unchanged within measurement noise.
**Result.** The router now serves the exit-node DoH DNS proxy, so devices using
it as their exit node resolve public names automatically (the normal exit-node
behavior) with **no** tailnet DNS configuration required. (Setting a tailnet
global nameserver in the admin console is an alternative runtime fix that also
works, by populating the client's default resolver directly; it is not required
once the router serves the proxy.)
**Canary for future bumps:** from a client using this router as exit node,
`dig google.com @100.100.100.100` must return real answers with `flags: ... ra`
(recursion available) and a non-zero query time.
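The canary's pass/fail signals live in the reply header, so they can be checked mechanically. A minimal sketch, assuming you already have the raw reply bytes (for instance from a UDP query like the one `dig` sends); `classify_reply` is a hypothetical helper. Query time is not part of the packet, so it has to be measured separately.

```python
import struct

QR, AA, RA = 0x8000, 0x0400, 0x0080
RCODE_NOERROR, RCODE_SERVFAIL = 0, 2


def classify_reply(reply: bytes) -> str:
    """Label a DNS reply header for the exit-node DNS canary."""
    if len(reply) < 12:
        return "malformed: shorter than a DNS header"
    flags, _qdcount, ancount = struct.unpack("!HHH", reply[2:8])
    rcode = flags & 0x000F
    if not flags & QR:
        return "not a response"
    if rcode == RCODE_SERVFAIL and flags & AA and not flags & RA:
        # Signature of the client-side short-circuit described above.
        return "fail: local SERVFAIL (no default resolver on the client)"
    if rcode == RCODE_NOERROR and flags & RA and ancount > 0:
        return "pass: recursive answer"
    return f"inconclusive: rcode={rcode}, aa={bool(flags & AA)}, ra={bool(flags & RA)}"
```

A header matching the pre-fix observation (`qr aa rd ad` with `SERVFAIL`, i.e. flags `0x8522`) classifies as the local failure; after the fix, a reply with `ra` set and at least one answer passes.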
### Log verbosity filtering
Upstream `tailscaled` embeds verbosity tags (`[v1]`, `[v2]`, …) inside its log