From e0cbaee48b081e6f39fe40f4617a3accf4aa25a0 Mon Sep 17 00:00:00 2001 From: Lumpiasty Date: Fri, 29 May 2026 04:24:12 +0200 Subject: [PATCH] split docs into README + USAGE/DEVELOPMENT/DESIGN README shrinks to a repo intro with pointers. Separate the three audiences: - docs/USAGE.md deploy the prebuilt image on RouterOS + operate it - docs/DEVELOPMENT.md build, local test, version bump, cut releases - docs/DESIGN.md size optimizations, feature allowlist, why the updater and netmap disk-cache are removed, flash-wear protection, versioning/release architecture, the overlayfs layer-duplication gotcha, dependency pinning --- README.md | 399 ++++---------------------------------------- docs/DESIGN.md | 339 +++++++++++++++++++++++++++++++++++++ docs/DEVELOPMENT.md | 159 ++++++++++++++++++ docs/USAGE.md | 201 ++++++++++++++++++++++ 4 files changed, 735 insertions(+), 363 deletions(-) create mode 100644 docs/DESIGN.md create mode 100644 docs/DEVELOPMENT.md create mode 100644 docs/USAGE.md diff --git a/README.md b/README.md index 8177df9..dcef663 100644 --- a/README.md +++ b/README.md @@ -7,6 +7,26 @@ A minimal Tailscale Docker image built for MikroTik routers running 16 MB internal flash. Built from source with only router-relevant features included. +- **~4 MB** extracted rootfs (`FROM scratch` + UPX'd Tailscale binary + a custom + static busybox debug shell). +- **Multi-arch**: amd64, arm64, arm/v7 — one tag, RouterOS pulls the right one. +- **No built-in updater** (it would pull the full upstream binary and wear + flash); updates are delivered by CI and pulled only when the image actually + changed. +- **Flash-wear conscious**: minimal persistent state, no netmap disk-caching, + tmpfs for scratch and runtime. + +## Documentation + +- **[Usage](docs/USAGE.md)** — deploy the published image on a MikroTik router + and operate it (networking, auth, MagicDNS, automatic updates). Start here if + you just want it running. 
+- **[Development](docs/DEVELOPMENT.md)** — build the image, test it locally, bump + the Tailscale version, and cut releases. +- **[Design & rationale](docs/DESIGN.md)** — size optimizations, the feature + allowlist, why certain features are deliberately removed, flash-wear + protection, and the versioning / release / update architecture. + ## Supported architectures | Docker platform | RouterOS arch | Example devices | @@ -15,371 +35,24 @@ included. | `linux/arm64` | arm64 | RB5009, CCR2004/2116/2216, hAP ax³, L009, Chateau | | `linux/arm/v7` | arm (ARMv7) | hAP ac², RB3011, RB4011, RB1100AHx4 | -A single Dockerfile builds all three. The Go binary is **cross-compiled** (the -builder stage runs natively on the host for speed), while the busybox stage and -final image are built for the target platform (via `buildx` + QEMU/binfmt for -non-native targets). +ARMv5 (hEX Refresh / hAP ax S) is **not** supported — see +[DESIGN.md](docs/DESIGN.md#architecture-support). -**ARMv5 is not supported** (hEX Refresh / hAP ax S, EN7562CT CPU — RouterOS -calls these `arm32v5`). ARMv5 has no Alpine/musl base image, so it cannot use -this image's musl + `scratch` design; it would require a glibc (Debian) base -and produce a substantially larger image (~50 MB+ vs ~4 MB). If you need it, -that's a separate build, not just a `--platform` change. +## Quick start -## Image size +- **Run it on a router:** follow **[docs/USAGE.md](docs/USAGE.md)** — it deploys + the prebuilt image, no build needed. +- **Build it yourself:** `./build.sh` (needs docker buildx + QEMU for + cross-arch); details in **[docs/DEVELOPMENT.md](docs/DEVELOPMENT.md)**. 
-On-disk footprint once extracted (this is what matters — RouterOS stores the -**extracted** rootfs on disk via overlayfs, not the compressed layers): +## Repository layout -| Component | On-disk size | +| Path | Purpose | |---|---| -| tailscale.combined (UPX-compressed) | ~3.84 MB | -| custom static busybox (UPX, ~100 applets) | ~229 kB | -| CA certificates | ~218 kB | -| **Total extracted rootfs** | **~4.1 MB** | - -(The compressed image / transfer tarball is ~4.3 MB.) - -The binary is built with Tailscale's `--extra-small` feature tag set as the -baseline. Features are opted in explicitly — any new feature Tailscale adds -in a future release stays omitted until deliberately added to the Dockerfile. - -### Size optimizations applied - -- **Feature allowlist** (`--extra-small` baseline + ~10 opt-ins) keeps the - binary minimal and forward-safe against new Tailscale features. -- **`-gcflags=all=-l`** disables function inlining across all packages, - shrinking the compressed binary by ~600 kB. Inlining is a performance - optimization only; disabling it does not affect correctness. The CPU cost - is negligible for an I/O-bound router daemon. -- **`-ldflags="-s -w"`** strips the symbol table and DWARF debug info. -- **`-trimpath`** removes local filesystem paths from the binary. -- **UPX `--lzma --best`** compresses the Tailscale binary (~14 MB → ~3.8 MB). -- **Custom static busybox** — instead of the official `busybox:musl` image - (all ~404 applets, ~1.24 MB), a static busybox is built from source with - only ~100 curated applets (~420 kB), then UPX-compressed to ~229 kB on - disk. The applet set is defined in - [`busybox-applets.config`](busybox-applets.config). - - **busybox UPX requires care.** UPX normally breaks busybox's standalone - applet dispatch: the ash shell re-execs `/proc/self/exe` to run built-in - applets, and UPX breaks that path so typed commands fail - ([upx#248](https://github.com/upx/upx/issues/248), closed as "invalid"). 
- We work around it by building **without** the standalone/nofork features - and providing an explicit `/bin/` symlink farm. Commands then - resolve via the normal `PATH` → symlink → `argv[0]` dispatch, which works - under UPX. The cost is a `fork+exec` per command instead of a nofork - internal call — fine for an occasional debug shell. - - Because RouterOS stores the extracted rootfs on disk, UPX'ing busybox - saves a real ~195 kB of flash (424 kB → 229 kB), not just transfer size. - -The final image is built `FROM scratch` — there is no base distro layer. -It contains only the busybox binary + applet symlinks, the CA bundle, and -the Tailscale binary. - -## Features included - -| Feature | Why | -|---|---| -| `advertise-exit-node` | Run the router as a Tailscale exit node | -| `advertise-routes` | Expose LAN subnets to the tailnet | -| `use-exit-node` | Route the router's own traffic via a remote exit node | -| `accept-routes` | Receive subnet routes from other tailnet nodes | -| DNS / MagicDNS | Resolve `*.ts.net` names (see DNS section below) | -| portmapper (NAT-PMP/PCP/UPnP) | Punch through upstream NAT | -| listenrawdisco | Raw socket disco for better NAT traversal | -| health | Powers `tailscale status` output | -| cachenetmap | Cache network map for faster reconnect after reboot | -| iptables | Linux iptables support for routing rules | -| osrouter | Configure kernel network stack and routing tables | - -## Features intentionally omitted - -| Feature | Reason | -|---|---| -| `clientupdate` | Updates are managed by rebuilding the Docker image | -| `logtail` | Would attempt persistent log writes; wear flash | -| `netlog` | Network flow logging; separate concern | -| `netstack` + `gro` | Userspace/gVisor networking; router uses kernel TUN | -| `ssh` | Access via MikroTik SSH + `tailscale` CLI instead | -| `linuxdnsfight` | inotify on `/etc/resolv.conf`; no systemd in container | -| `networkmanager` / `resolved` / `dbus` / `sdnotify` | No systemd stack in 
container | -| `drive` / `taildrop` / `webclient` | Not useful on a headless router | -| All GUI / desktop / cloud / k8s features | Irrelevant | - -## Volume layout - -Three mount points, with different persistence requirements: - -``` -/var/lib/tailscale persistent — node identity, auth state - bind-mount to MikroTik disk storage - written rarely (only on auth / key rotation) - -/var/lib/tailscale/cache ephemeral — netmap cache - mount as tmpfs to avoid flash writes - recreated automatically on next connect - -/var/run/tailscale ephemeral — daemon Unix socket - mount as tmpfs - lost on reboot, recreated on start -``` - -Keeping the cache and socket directories on tmpfs prevents unnecessary -flash wear while still allowing fast reconnect after reboot (the cache -is repopulated from the Tailscale coordination server on first connect). - -## Building - -### All architectures at once - -Use the helper script (requires `docker buildx` + QEMU/binfmt for non-native -targets): - -```sh -# One-time: register emulators for cross-arch builds -docker run --privileged --rm tonistiigi/binfmt --install arm64,arm - -# Build all arches and load into local docker -./build.sh - -# Build all arches and also export per-arch tarballs into ./dist/ -./build.sh --tar - -# Build a single arch -./build.sh arm64 -./build.sh --tar armv7 -``` - -### Manual single-arch build - -The architecture is selected via `buildx --platform`; the Dockerfile maps it to -the correct `GOARCH`/`GOARM` automatically: - -```sh -docker buildx build --platform linux/arm64 --load -t mikrotik-tailscale:arm64 . -docker buildx build --platform linux/arm/v7 --load -t mikrotik-tailscale:armv7 . -docker buildx build --platform linux/amd64 --load -t mikrotik-tailscale:amd64 . -``` - -To build for a different Tailscale version, add: - -```sh ---build-arg TAILSCALE_VERSION=v1.98.3 -``` - -### Notes - -- The Go builder cross-compiles natively (fast); only the busybox stage runs - under emulation for non-native targets. 
-- The build prints the resolved target and Go build tags, e.g.: - - ``` - Cross-compiling: GOOS=linux GOARCH=arm64 GOARM= - Build tags: ts_include_cli,ts_omit_ace,ts_omit_acme,... - ``` - -### Per-architecture image sizes - -| Arch | Image | -|---|---| -| amd64 | ~4.2 MB | -| arm64 | ~3.5 MB | -| arm/v7 | ~3.5 MB | - -## Running (local test) - -```sh -# Create a volume for persistent state -docker volume create tailscale-state - -# Start the daemon -docker run -d \ - --name tailscale \ - --cap-add NET_ADMIN \ - --cap-add NET_RAW \ - --device /dev/net/tun \ - --tmpfs /var/lib/tailscale/cache \ - --tmpfs /var/run/tailscale \ - -v tailscale-state:/var/lib/tailscale \ - mikrotik-tailscale - -# Authenticate (opens browser / prints auth URL) -docker exec tailscale tailscale login - -# Check status -docker exec tailscale tailscale status - -# Advertise a subnet -docker exec tailscale tailscale set --advertise-routes=192.168.88.0/24 - -# Advertise as exit node -docker exec tailscale tailscale set --advertise-exit-node -``` - -Subnet routes and exit node advertisement must also be approved in the -[Tailscale admin console](https://login.tailscale.com/admin/machines). - -## Unattended authentication - -For automated / headless deployment, use an auth key: - -```sh -docker exec tailscale tailscale up \ - --authkey=tskey-auth- \ - --advertise-routes=192.168.88.0/24 \ - --advertise-exit-node -``` - -Auth keys can be created in the Tailscale admin console under -**Settings → Keys**. Use a reusable key tagged with a device tag for -infrastructure nodes. - -## MagicDNS - -The binary includes DNS support but the daemon is started with -`--no-logs-no-support`, which does not affect DNS. 
To use MagicDNS name -resolution, configure MikroTik's DNS to forward `.ts.net` queries to -Tailscale's magic DNS resolver: - -``` -/ip dns static -add name="ts.net" type=FWD forward-to=100.100.100.100 match-subdomain=yes -``` - -This avoids writing to `/etc/resolv.conf` inside the container (which would -happen if `--accept-dns` is passed to `tailscale up`). The container resolves -Tailscale node names; the rest of the router uses its own DNS. - -## Flash wear protection - -Several measures are in place to avoid wearing out internal flash: - -- `clientupdate` omitted from binary — no background update downloads -- `logtail` omitted from binary — no log upload attempts -- `--no-logs-no-support` passed to daemon — suppresses any remaining log - buffering -- `netmap` cache mounted on tmpfs — cache writes never reach flash -- `/var/run/tailscale` socket on tmpfs — runtime files never reach flash -- Only `/var/lib/tailscale/tailscaled.state` touches persistent storage, - and it is written only when the node authenticates or rotates its key - -## Upgrading - -Version bumps (Tailscale, busybox, base image digests) are normally proposed -automatically via Renovate — see -[Dependency pinning & automated updates](#dependency-pinning--automated-updates). -Merge the Renovate PR, then rebuild and redeploy. - -The feature allowlist in the Dockerfile carries forward automatically across -Tailscale versions — any new `ts_omit_*` tags introduced in a new release will -be omitted by default. - -To bump manually, edit `ARG TAILSCALE_VERSION` in the `Dockerfile` (so the pin -stays in version control) and rebuild: - -```sh -./build.sh --tar # rebuild all arches at the pinned version -# or, override at build time without editing the Dockerfile: -docker buildx build --platform linux/arm64 \ - --build-arg TAILSCALE_VERSION=v1.100.0 \ - --load -t mikrotik-tailscale:arm64 . -``` - -## Versioning & releases - -Released images are versioned as: - -``` -v-mt. -``` - -e.g. `v1.98.3-mt.1`. 
The two parts mean: - -- **`v`** — the bundled Tailscale version (the "what's - inside" identifier), taken from `ARG TAILSCALE_VERSION` in the Dockerfile. -- **`mt.`** — the local revision. It only changes on a *meaningful* release, - never on a build-system-only rebuild. - -### When a release happens - -| Trigger | Result | -|---|---| -| Renovate bumps `TAILSCALE_VERSION` (merged to `main`) | CI **auto-creates** git tag `v-mt.1` → image published | -| You make a meaningful fix/change on the current Tailscale version | **You** create the next tag manually (`v-mt.2`, `mt.3`, …) → image published | -| Dependency-only bump (Go / Alpine / busybox / Dockerfile syntax) | **No release.** Rides along with the next Tailscale bump or manual tag | - -So routers only ever see a new release for Tailscale bumps or your deliberate -fixes — build-system churn doesn't trigger updates. - -Each published image is stamped with `org.opencontainers.image.version` equal to -its full tag; this is the value the MikroTik update job compares against the -registry to decide whether to recreate the container. - -### How it's wired (Woodpecker) - -- **`.woodpecker/release-tag.yaml`** — on push to `main`, parses - `TAILSCALE_VERSION`; if no `v-mt.*` tag exists yet, creates and pushes - `v-mt.1` (using the Gitea token from OpenBao). It never creates `mt.2+`. -- **`.woodpecker/release.yaml`** — on a `v*-mt.*` tag push, builds the - multi-arch manifest (amd64 + arm64 + arm/v7) and pushes it to - `gitea.lumpiasty.xyz/lumpiasty/mikrotik-tailscale` as both `:` and - `:stable`. Registry creds come from OpenBao (`secret/container-registry`). - -### Cutting a manual release - -```sh -# fix something, commit to main, then: -git tag -a v1.98.3-mt.2 -m "Fix X" -git push origin v1.98.3-mt.2 -``` - -The tag push triggers the build+publish automatically. 
- -## Dependency pinning & automated updates - -All upstream dependencies are version-pinned for reproducible builds: - -All versions are fully qualified (no floating `major.minor` tags): - -| Dependency | Where | Pinned form | -|---|---|---| -| Go toolchain | `Dockerfile` `FROM golang:…` | full version tag + `@sha256` digest | -| Alpine (busybox build base) | `Dockerfile` `FROM alpine:…` | full version tag + `@sha256` digest | -| Tailscale | `Dockerfile` `ARG TAILSCALE_VERSION` | full git release tag | -| busybox | `Dockerfile` `ARG BUSYBOX_VERSION` | full release version | -| Renovate / OpenBao | `.woodpecker/renovate.yaml` `image:` | full version tag | - -Updates are proposed automatically by [Renovate](https://docs.renovatebot.com/), -run **self-hosted** from a Woodpecker cron pipeline (Woodpecker has no native -Renovate support): - -- `renovate.json` — repository rules. All dependencies follow the latest - upstream releases (including major versions); each bump arrives as its own PR - that the multi-arch build validates before you merge. Base image tags also - get their `@sha256` digests refreshed via `pinDigests`. The one special rule: - - `tailscale` only follows **stable** releases — Tailscale uses even minor - versions for stable (`v1.98.x`) and odd for unstable (`v1.99.x`), so the - rule filters to even minors. -- `.woodpecker/renovate.yaml` — the scheduled job that runs `renovate/renovate` - against this repo. 
- -```sh -# Renovate repo config -docker run --rm -e RENOVATE_CONFIG_TYPE=repo -v "$PWD":/work -w /work \ - --entrypoint renovate-config-validator renovate/renovate - -# Woodpecker pipeline -docker run --rm -v "$PWD":/work -w /work \ - woodpeckerci/woodpecker-cli:v3 lint .woodpecker/renovate.yaml -``` - -## References - -- [Tailscale: Smaller binaries for embedded devices](https://tailscale.com/docs/how-to/set-up-small-tailscale) -- [Renovate self-hosting](https://docs.renovatebot.com/getting-started/running/) -- [Woodpecker cron jobs](https://woodpecker-ci.org/docs/usage/cron) -- [MikroTik Container documentation](https://help.mikrotik.com/docs/display/ROS/Container) -- [Tailscale subnet routers](https://tailscale.com/kb/1019/subnets) -- [Tailscale exit nodes](https://tailscale.com/kb/1103/exit-nodes) +| `Dockerfile` | Multi-stage, multi-arch build (cross-compiled Go + custom busybox) | +| `busybox-applets.config` | Curated busybox applet set | +| `build.sh` | Build all/one arch, optionally export per-arch tarballs | +| `routeros/update-tailscale.rsc` | RouterOS auto-update script (digest compare + recreate) | +| `.woodpecker/` | CI: Renovate cron, release tagging, multi-arch publish | +| `renovate.json` | Dependency-update rules | +| `docs/` | Tutorial and design docs | diff --git a/docs/DESIGN.md b/docs/DESIGN.md new file mode 100644 index 0000000..afb5438 --- /dev/null +++ b/docs/DESIGN.md @@ -0,0 +1,339 @@ +# Design & rationale + +Why `mikrotik-tailscale` is built the way it is: size optimizations, the +feature allowlist, deliberate omissions, flash-wear protection, and the +versioning/release/update architecture. + +For deployment, see [USAGE.md](USAGE.md); for building and releasing, see +[DEVELOPMENT.md](DEVELOPMENT.md). + +## Image size + +On-disk footprint once extracted (this is what matters — RouterOS stores the +**extracted** rootfs on disk via overlayfs, not the compressed layers). 
+Measured flattened rootfs for the arm64 image: + +| Component | On-disk size | +|---|---| +| `tailscale.combined` (UPX-compressed) | ~2.98 MB | +| custom static busybox (UPX, ~100 applets) | ~218 kB | +| CA certificates | ~213 kB | +| **Total extracted rootfs** | **~3.4 MB** | + +(The compressed image / transfer tarball is ~3.3–4.3 MB depending on arch.) + +| Arch | Image (compressed) | +|---|---| +| amd64 | ~4.2 MB | +| arm64 | ~3.5 MB | +| arm/v7 | ~3.5 MB | + +> The extracted rootfs must contain the binary only **once**. If you measure +> ~7 MB on the device with `du -sx /`, the Dockerfile has reintroduced an +> overlayfs copy-up — see +> [Avoiding overlayfs layer duplication](#avoiding-overlayfs-layer-duplication). + +The binary is built with Tailscale's `--extra-small` feature tag set as the +baseline. Features are opted in explicitly — any new feature Tailscale adds +in a future release stays omitted until deliberately added to the Dockerfile. + +### Size optimizations applied + +- **Feature allowlist** (`--extra-small` baseline + ~10 opt-ins) keeps the + binary minimal and forward-safe against new Tailscale features. +- **`-gcflags=all=-l`** disables function inlining across all packages, + shrinking the compressed binary by ~600 kB. Inlining is a performance + optimization only; disabling it does not affect correctness. The CPU cost + is negligible for an I/O-bound router daemon. +- **`-ldflags="-s -w"`** strips the symbol table and DWARF debug info. +- **`-trimpath`** removes local filesystem paths from the binary. +- **UPX `--lzma --best`** compresses the Tailscale binary (~14 MB → ~3.8 MB). +- **Custom static busybox** — instead of the official `busybox:musl` image + (all ~404 applets, ~1.24 MB), a static busybox is built from source with + only ~100 curated applets (~420 kB), then UPX-compressed to ~229 kB on + disk. The applet set is defined in + [`busybox-applets.config`](../busybox-applets.config). 
+ + **busybox UPX requires care.** UPX normally breaks busybox's standalone + applet dispatch: the ash shell re-execs `/proc/self/exe` to run built-in + applets, and UPX breaks that path so typed commands fail + ([upx#248](https://github.com/upx/upx/issues/248), closed as "invalid"). + We work around it by building **without** the standalone/nofork features + and providing an explicit `/bin/` symlink farm. Commands then + resolve via the normal `PATH` → symlink → `argv[0]` dispatch, which works + under UPX. The cost is a `fork+exec` per command instead of a nofork + internal call — fine for an occasional debug shell. + + Because RouterOS stores the extracted rootfs on disk, UPX'ing busybox + saves a real ~195 kB of flash (424 kB → 229 kB), not just transfer size. + +The final image is built `FROM scratch` — there is no base distro layer. +It contains only the busybox binary + applet symlinks, the CA bundle, and +the Tailscale binary. + +### Avoiding overlayfs layer duplication + +A subtle but important detail: **the final image must not run a `RUN` that +mutates a directory already populated by an earlier layer**, or the extracted +on-disk size roughly doubles for that directory's contents. + +RouterOS Container uses overlayfs and stores the **extracted** layers on disk. +Each Dockerfile instruction is its own layer. If `/usr/local/bin/` is created by +a `COPY` (containing the ~3 MB `tailscale.combined`) and a later `RUN ln -s …` +adds a symlink *inside that same directory*, overlayfs performs a **copy-up**: +it copies the entire `/usr/local/bin/` directory — including the 3 MB binary — +into the new layer's upper dir. RouterOS then extracts both copies to flash, so +`du -sx /` reports ~7 MB instead of ~3.4 MB for a directory whose only real file +is 3 MB. (The compressed image hides this — compression dedupes identical blocks +— which is why it only shows up when you measure the *extracted* rootfs on the +device.) 
+ +The fix: assemble `/usr/local/bin/` completely in the **builder** stage (binary ++ both `argv[0]` symlinks) and bring it into the final image with a **single +`COPY` layer**, never mutating it afterwards. The Dockerfile does this; don't +reintroduce a post-`COPY` `RUN` against that path. + +To verify the extracted footprint on a deployed router: + +``` +/container/shell [find where name=tailscale] +du -sx / # expect ~3500 KiB (1 KiB blocks), not ~7000 +``` + +## Architecture support + +A single Dockerfile builds all three supported RouterOS architectures. The Go +binary is **cross-compiled** (the builder stage runs natively on the host for +speed), while the busybox stage and final image are built for the target +platform (via `buildx` + QEMU/binfmt for non-native targets). + +**ARMv5 is not supported** (hEX Refresh / hAP ax S, EN7562CT CPU — RouterOS +calls these `arm32v5`). ARMv5 has no Alpine/musl base image, so it cannot use +this image's musl + `scratch` design; it would require a glibc (Debian) base +and produce a substantially larger image (~50 MB+ vs ~4 MB). If you need it, +that's a separate build, not just a `--platform` change. 
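As a rough illustration of the cross-compilation step described above, the mapping from a buildx platform string to Go's target variables could be sketched as follows. This is a sketch under assumptions, not the Dockerfile's actual code: the function name `platform_to_goenv` is hypothetical, and only the three supported platforms are handled.

```sh
#!/bin/sh
# Hypothetical helper (not taken from the repo's Dockerfile):
# map a Docker platform string to the Go cross-compilation variables.
platform_to_goenv() {
    case "$1" in
        linux/amd64)  echo "GOOS=linux GOARCH=amd64 GOARM=" ;;
        linux/arm64)  echo "GOOS=linux GOARCH=arm64 GOARM=" ;;
        linux/arm/v7) echo "GOOS=linux GOARCH=arm GOARM=7" ;;
        *)
            # Anything else (e.g. ARMv5) is outside the supported set.
            echo "unsupported platform: $1" >&2
            return 1
            ;;
    esac
}

platform_to_goenv linux/arm/v7
```

Only `linux/arm/v7` needs a non-empty `GOARM`; an unsupported platform makes the function fail instead of guessing a target.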
+ +## Features included + +| Feature | Why | +|---|---| +| `advertise-exit-node` | Run the router as a Tailscale exit node | +| `advertise-routes` | Expose LAN subnets to the tailnet | +| `use-exit-node` | Route the router's own traffic via a remote exit node | +| `accept-routes` | Receive subnet routes from other tailnet nodes | +| DNS / MagicDNS | Resolve `*.ts.net` names | +| portmapper (NAT-PMP/PCP/UPnP) | Punch through upstream NAT | +| listenrawdisco | Raw socket disco for better NAT traversal | +| health | Powers `tailscale status` output | +| iptables | Linux iptables support for routing rules | +| osrouter | Configure kernel network stack and routing tables | + +## Features intentionally omitted + +| Feature | Reason | +|---|---| +| `clientupdate` | **Deliberately removed** — see [Why the built-in updater is removed](#why-the-built-in-updater-is-removed) | +| `cachenetmap` | **Deliberately removed** — see [Why netmap disk-caching is removed](#why-netmap-disk-caching-is-removed) | +| `logtail` | Would attempt persistent log writes; wear flash | +| `netlog` | Network flow logging; separate concern | +| `netstack` + `gro` | Userspace/gVisor networking; router uses kernel TUN | +| `ssh` | Access via MikroTik SSH + `tailscale` CLI instead | +| `linuxdnsfight` | inotify on `/etc/resolv.conf`; no systemd in container | +| `networkmanager` / `resolved` / `dbus` / `sdnotify` | No systemd stack in container | +| `drive` / `taildrop` / `webclient` | Not useful on a headless router | +| All GUI / desktop / cloud / k8s features | Irrelevant | + +### Why the built-in updater is removed + +Tailscale's `clientupdate` feature (and `tailscale update` / auto-update) is +**intentionally compiled out**, for several compounding reasons: + +- **It would defeat the entire purpose of this build.** `clientupdate` + downloads the *full official upstream binary* — built with every feature, tens + of megabytes — and writes it onto the device. 
This image exists precisely to + be a few MB with only router-relevant features; letting it pull the upstream + binary would undo all of that. +- **It would risk filling the flash.** On a 16 MB-class device, downloading and + unpacking a large upstream binary can simply run the device out of space, and + the download itself causes significant flash writes. +- **It can't work on a container image anyway.** The binary lives in a + read-only, content-addressed image layer. An in-place self-update has nowhere + valid to write and would not survive a container recreate — the next pull + would replace it regardless. +- **Updates should be controlled and reproducible.** Instead of the client + silently swapping its own binary, new versions are produced by rebuilding and + republishing *this* image through CI (pinned dependencies, known feature set, + multi-arch). The device then pulls a new image **only when it actually + changed** — see [Versioning & releases](#versioning--releases). + +Net effect: the update path is explicit, version-pinned, flash-safe, and keeps +the on-device footprint minimal — none of which the built-in updater could +provide here. + +### Why netmap disk-caching is removed + +The `cachenetmap` feature is **intentionally omitted**. It is worth being +precise about what it does and doesn't do: + +- The network map always lives in the daemon's **memory** — this is core + behavior, not gated by any feature flag. A daemon that has connected once and + then **loses its control-plane connection keeps that map** and can still + reach known peers. The data path is direct WireGuard / DERP between nodes; the + control plane is only for coordination, not for relaying your traffic. So + initiating a connection to a reachable peer during a control outage works + **without** this feature, as long as the daemon stays running. 
+- `cachenetmap` *only* adds writing that map to **disk**, so the node can come + online from the last-known config after a **cold start that coincides with a + control-plane outage** — a narrow case (it requires a reboot *and* control + being unreachable at that moment *and* needing connectivity before control + recovers). + +The cost of the feature is that it writes the netmap to flash, and the netmap +changes frequently on an active tailnet (every peer endpoint/DERP/online-status +change). For a flash-constrained router that is the wrong trade: frequent writes +to internal flash to buy resilience for a rare corner case. Omitting it keeps +the in-memory resilience (the common case) while eliminating per-netmap flash +writes. Only `tailscaled.state` (written on auth / key rotation) ever touches +flash. + +## Volume layout + +Two mount points, with different persistence requirements: + +``` +/var/lib/tailscale persistent — node identity, auth state + bind-mount to MikroTik disk storage + written rarely (only on auth / key rotation / + prefs change); netmap is not cached to disk + (cachenetmap omitted), so no per-netmap writes + +/var/run/tailscale ephemeral — daemon Unix socket + mount as tmpfs + lost on reboot, recreated on start +``` + +Only the small, rarely-written state file touches flash; the socket dir is +tmpfs. The netmap is held in memory only — see +[Why netmap disk-caching is removed](#why-netmap-disk-caching-is-removed). 
+ +## Flash wear protection + +Several measures are in place to avoid wearing out internal flash: + +- `clientupdate` omitted from binary — no background update downloads + ([why](#why-the-built-in-updater-is-removed)) +- `cachenetmap` omitted from binary — netmap is never written to disk, so the + frequent netmap updates cause no flash writes + ([why](#why-netmap-disk-caching-is-removed)) +- `logtail` omitted from binary — no log upload attempts +- `--no-logs-no-support` passed to daemon — suppresses any remaining log + buffering +- `/var/run/tailscale` socket on tmpfs — runtime files never reach flash +- Only `/var/lib/tailscale/tailscaled.state` touches persistent storage, + and it is written only when the node authenticates or rotates its key + +## Versioning & releases + +Released images are versioned as: + +``` +v-mt. +``` + +e.g. `v1.98.3-mt.1`. The two parts mean: + +- **`v`** — the bundled Tailscale version (the "what's + inside" identifier), taken from `ARG TAILSCALE_VERSION` in the Dockerfile. +- **`mt.`** — the local revision. It only changes on a *meaningful* release, + never on a build-system-only rebuild. + +### When a release happens + +| Trigger | Result | +|---|---| +| Renovate bumps `TAILSCALE_VERSION` (merged to `main`) | CI **auto-creates** git tag `v-mt.1` → image published | +| You make a meaningful fix/change on the current Tailscale version | **You** create the next tag manually (`v-mt.2`, `mt.3`, …) → image published | +| Dependency-only bump (Go / Alpine / busybox / Dockerfile syntax) | **No release.** Rides along with the next Tailscale bump or manual tag | + +So routers only ever see a new release for Tailscale bumps or your deliberate +fixes — build-system churn doesn't trigger updates. + +Each published image is stamped with `org.opencontainers.image.version` equal to +its full tag; this is the value the MikroTik update job compares against the +registry to decide whether to recreate the container. 
+ +### How it's wired (Woodpecker) + +- **`.woodpecker/release-tag.yaml`** — on push to `main`, parses + `TAILSCALE_VERSION`; if no `v-mt.*` tag exists yet, creates and pushes + `v-mt.1` (using the Gitea token from OpenBao). It never creates `mt.2+`. +- **`.woodpecker/release.yaml`** — on a `v*-mt.*` tag push, builds the + multi-arch manifest (amd64 + arm64 + arm/v7) and pushes it to + `gitea.lumpiasty.xyz/lumpiasty/mikrotik-tailscale` as both `:` and + `:stable`. Registry creds come from OpenBao (`secret/container-registry`). + +To cut a release manually, see +[DEVELOPMENT.md → Cutting a manual release](DEVELOPMENT.md#cutting-a-manual-release). + +### How the router consumes releases + +The RouterOS update script (`routeros/update-tailscale.rsc`) compares the +`:stable` **manifest digest** against the digest from the last deploy: + +- It fetches the digest using an anonymous bearer token (the Gitea package is + public) — no credentials stored on the router. +- **Unchanged → does nothing** (no pull, no recreate, no flash wear). +- **Changed → recreates the container** from the new image, then records the + new digest. + +Because `:stable` only moves on a meaningful release, dependency-only rebuilds +never trigger an update on the router. Setup is in +[USAGE.md → step 7](USAGE.md#7-enable-automatic-updates). 
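The compare-then-recreate decision described above can be sketched in plain shell. This is not the RouterOS script itself (that is written in RouterOS scripting, not sh); `decide_update` and the sample digests are hypothetical, and treating a failed registry lookup as "skip" is an assumption made here for safety rather than something the source states.

```sh
#!/bin/sh
# Hypothetical decision helper; the real logic lives in
# routeros/update-tailscale.rsc and may differ.
# $1 = digest recorded at the last deploy (may be empty on first run)
# $2 = digest currently published for :stable
decide_update() {
    last="$1"
    current="$2"
    if [ -z "$current" ]; then
        # Assumption: if the registry lookup returned nothing, do nothing.
        echo "skip"
    elif [ "$last" = "$current" ]; then
        # Unchanged: no pull, no recreate, no flash wear.
        echo "skip"
    else
        # Changed: recreate the container, then record the new digest.
        echo "recreate"
    fi
}

decide_update "sha256:aaa" "sha256:aaa"
decide_update "sha256:aaa" "sha256:bbb"
```

Keeping the decision separate from the fetch and the recreate makes the "unchanged means no writes" property easy to check on its own.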
+ +## Dependency pinning & automated updates + +All upstream dependencies are version-pinned for reproducible builds, fully +qualified (no floating `major.minor` tags): + +| Dependency | Where | Pinned form | +|---|---|---| +| Go toolchain | `Dockerfile` `FROM golang:…` | full version tag + `@sha256` digest | +| Alpine (busybox build base) | `Dockerfile` `FROM alpine:…` | full version tag + `@sha256` digest | +| Tailscale | `Dockerfile` `ARG TAILSCALE_VERSION` | full git release tag | +| busybox | `Dockerfile` `ARG BUSYBOX_VERSION` | full release version | +| Renovate / OpenBao | `.woodpecker/*.yaml` `image:` | full version tag | + +Updates are proposed automatically by [Renovate](https://docs.renovatebot.com/), +run **self-hosted** from a Woodpecker cron pipeline (Woodpecker has no native +Renovate support): + +- `renovate.json` — repository rules. All dependencies follow the latest + upstream releases (including major versions); each bump arrives as its own PR + that the multi-arch build validates before you merge. Base image tags also + get their `@sha256` digests refreshed via `pinDigests`. The one special rule: + - `tailscale` only follows **stable** releases — Tailscale uses even minor + versions for stable (`v1.98.x`) and odd for unstable (`v1.99.x`), so the + rule filters to even minors. +- `.woodpecker/renovate.yaml` — the scheduled job that runs `renovate/renovate` + against this repo. 
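The even/odd rule for Tailscale minors is easy to check by hand. The following sketch expresses it as a small shell predicate; the helper name `is_stable_tailscale` is hypothetical, and the actual filter lives in `renovate.json`, which is not reproduced here.

```sh
#!/bin/sh
# Hypothetical helper: succeed if a Tailscale version has an even minor
# (stable, e.g. v1.98.3) and fail if the minor is odd (unstable, e.g. v1.99.1).
is_stable_tailscale() {
    version=${1#v}        # drop the leading "v": 1.98.3
    rest=${version#*.}    # drop the major:       98.3
    minor=${rest%%.*}     # keep only the minor:  98
    case $minor in
        ''|*[!0-9]*) return 2 ;;   # minor is missing or not numeric
    esac
    [ $((minor % 2)) -eq 0 ]
}

for v in v1.98.3 v1.99.1 v1.100.0; do
    if is_stable_tailscale "$v"; then
        echo "$v stable"
    else
        echo "$v unstable"
    fi
done
```

With these inputs the loop reports `v1.98.3` and `v1.100.0` as stable and `v1.99.1` as unstable.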
+ +Validate the configs locally: + +```sh +# Renovate repo config +docker run --rm -e RENOVATE_CONFIG_TYPE=repo -v "$PWD":/work -w /work \ + --entrypoint renovate-config-validator renovate/renovate + +# Woodpecker pipeline +docker run --rm -v "$PWD":/work -w /work \ + woodpeckerci/woodpecker-cli:v3 lint .woodpecker/renovate.yaml +``` + +## References + +- [Tailscale: Smaller binaries for embedded devices](https://tailscale.com/docs/how-to/set-up-small-tailscale) +- [Renovate self-hosting](https://docs.renovatebot.com/getting-started/running/) +- [Woodpecker cron jobs](https://woodpecker-ci.org/docs/usage/cron) +- [MikroTik Container documentation](https://help.mikrotik.com/docs/display/ROS/Container) +- [Tailscale subnet routers](https://tailscale.com/kb/1019/subnets) +- [Tailscale exit nodes](https://tailscale.com/kb/1103/exit-nodes) diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md new file mode 100644 index 0000000..9dcd0e8 --- /dev/null +++ b/docs/DEVELOPMENT.md @@ -0,0 +1,159 @@ +# Development + +Building the image, testing it locally, bumping the Tailscale version, and +cutting releases. This is for working *on* this repo; if you just want to run +the published image on a router, see [USAGE.md](USAGE.md). + +For the reasoning behind the build choices, see [DESIGN.md](DESIGN.md). + +## Prerequisites + +- `docker` with `buildx`. +- For cross-arch builds, QEMU/binfmt emulators registered: + + ```sh + docker run --privileged --rm tonistiigi/binfmt --install arm64,arm + ``` + +The Go toolchain and busybox are built inside the image stages, so no local Go +install is needed. 
+ +## Building + +### All architectures at once + +Use the helper script: + +```sh +# Build all arches and load into local docker +./build.sh + +# Build all arches and also export per-arch tarballs into ./dist/ +./build.sh --tar + +# Build a single arch +./build.sh arm64 +./build.sh --tar armv7 +``` + +### Manual single-arch build + +The architecture is selected via `buildx --platform`; the Dockerfile maps it to +the correct `GOARCH`/`GOARM` automatically: + +```sh +docker buildx build --platform linux/arm64 --load -t mikrotik-tailscale:arm64 . +docker buildx build --platform linux/arm/v7 --load -t mikrotik-tailscale:armv7 . +docker buildx build --platform linux/amd64 --load -t mikrotik-tailscale:amd64 . +``` + +To build for a different Tailscale version, add: + +```sh +--build-arg TAILSCALE_VERSION=v1.98.3 +``` + +### Notes + +- The Go builder cross-compiles natively (fast); only the busybox stage runs + under emulation for non-native targets. +- The build prints the resolved target and Go build tags, e.g.: + + ``` + Cross-compiling: GOOS=linux GOARCH=arm64 GOARM= + Build tags: ts_include_cli,ts_omit_ace,ts_omit_acme,... 
+ ``` + +## Running (local test) + +Quick smoke test on a dev machine with Docker (this is *not* how it runs on a +router — see [USAGE.md](USAGE.md) for that): + +```sh +# Create a volume for persistent state +docker volume create tailscale-state + +# Start the daemon +docker run -d \ + --name tailscale \ + --cap-add NET_ADMIN \ + --cap-add NET_RAW \ + --device /dev/net/tun \ + --tmpfs /var/run/tailscale \ + -v tailscale-state:/var/lib/tailscale \ + mikrotik-tailscale + +# Authenticate (opens browser / prints auth URL) +docker exec tailscale tailscale login + +# Check status +docker exec tailscale tailscale status + +# Advertise a subnet +docker exec tailscale tailscale set --advertise-routes=192.168.88.0/24 + +# Advertise as exit node +docker exec tailscale tailscale set --advertise-exit-node +``` + +Subnet routes and exit node advertisement must also be approved in the +[Tailscale admin console](https://login.tailscale.com/admin/machines). + +For headless / unattended auth, use a reusable auth key from the admin console +(**Settings → Keys**): + +```sh +docker exec tailscale tailscale up \ + --authkey=tskey-auth-<your-key> \ + --advertise-routes=192.168.88.0/24 \ + --advertise-exit-node +``` + +## Bumping the Tailscale version + +Version bumps (Tailscale, busybox, base image digests) are normally proposed +automatically via Renovate (see +[DESIGN.md → Dependency pinning](DESIGN.md#dependency-pinning--automated-updates)). +Merge the Renovate PR; a Tailscale bump then auto-publishes a new release. + +The feature allowlist in the Dockerfile carries forward automatically across +Tailscale versions — any feature that a new release makes omittable (a new +`ts_omit_*` tag) is left out of the binary by default. 
+ +To bump manually, edit `ARG TAILSCALE_VERSION` in the `Dockerfile` (so the pin +stays in version control) and rebuild: + +```sh +./build.sh --tar # rebuild all arches at the pinned version +# or, override at build time without editing the Dockerfile: +docker buildx build --platform linux/arm64 \ + --build-arg TAILSCALE_VERSION=v1.100.0 \ + --load -t mikrotik-tailscale:arm64 . +``` + +## Cutting a manual release + +A Tailscale bump auto-creates `v<tailscale-version>-mt.1` and publishes it. For a meaningful +fix/change on the *current* Tailscale version, tag the next `mt.N` by hand: + +```sh +# fix something, commit to main, then: +git tag -a v1.98.3-mt.2 -m "Fix X" +git push origin v1.98.3-mt.2 +``` + +The tag push triggers the build + multi-arch publish automatically. See +[DESIGN.md → Versioning & releases](DESIGN.md#versioning--releases) for the full +scheme and CI wiring. + +## Validating CI configs locally + +```sh +# Renovate repo config +docker run --rm -e RENOVATE_CONFIG_TYPE=repo -v "$PWD":/work -w /work \ + --entrypoint renovate-config-validator renovate/renovate + +# Woodpecker pipelines +docker run --rm -v "$PWD":/work -w /work \ + woodpeckerci/woodpecker-cli:v3 lint .woodpecker/renovate.yaml +``` diff --git a/docs/USAGE.md b/docs/USAGE.md new file mode 100644 index 0000000..bceb320 --- /dev/null +++ b/docs/USAGE.md @@ -0,0 +1,201 @@ +# Usage + +Deploying the published image on a MikroTik router and operating it: networking, +authentication, MagicDNS, and automatic updates. This uses the prebuilt image +from the registry — you don't need to build anything. + +To build the image yourself, see [DEVELOPMENT.md](DEVELOPMENT.md). For the +reasoning behind these choices, see [DESIGN.md](DESIGN.md). + +## Deploy on MikroTik (RouterOS) + +Verified on RouterOS 7.21.2 (arm64, CRS418). Commands are grouped into +copy-paste blocks; **only the values marked `CHANGE ME` need editing**. 
+ +> Because the image has no built-in updater (the `clientupdate` feature is +> [intentionally compiled out](DESIGN.md#why-the-built-in-updater-is-removed)), +> updates are handled by a small script that only re-pulls when the published +> image actually changed — see [step 7](#7-enable-automatic-updates). + +### 0. Prerequisites + +- RouterOS 7.x with the **container** package installed. +- Container mode enabled (needs physical access — press reset / cold-boot when + prompted): + + ``` + /system/device-mode/update container=yes + ``` + +- A Tailscale **auth key** from the admin console + (**Settings → Keys**, reusable, optionally tagged). You'll use it in step 6. + +### 1. Networking (veth + bridge + NAT) + +Gives the container an internal IP and outbound internet via NAT. Pick a subnet +that doesn't clash with your LAN. + +``` +/interface/veth/add name=veth-tailscale address=172.20.0.2/24 gateway=172.20.0.1 +/interface/bridge/add name=containers +/ip/address/add address=172.20.0.1/24 interface=containers +/interface/bridge/port/add bridge=containers interface=veth-tailscale +/ip/firewall/nat/add chain=srcnat action=masquerade src-address=172.20.0.0/24 +``` + +### 2. Extraction scratch dir (tmpfs) + +Put the image extraction scratch dir on **tmpfs** (RAM) so the pull/extract +never writes to flash: + +``` +/disk/add type=tmpfs tmpfs-max-size=256M slot=tmp +/container/config/set tmpdir=tmp +``` + +> **No `registry-url` change needed.** This guide puts the full registry host in +> `remote-image` (step 5), and RouterOS pulls directly from that host — the +> global `registry-url` is ignored when the image reference includes a host. +> This is intentional: it leaves your existing `registry-url` untouched, so +> other containers (e.g. ones pulling from Docker Hub or ghcr.io) keep working, +> and multiple registries can be used side by side. + +### 3. 
Authentication note (no env needed) + +This image runs `tailscaled` directly and does **not** bundle Tailscale's +`containerboot` wrapper, so the `TS_AUTHKEY` environment variable is **not** +read automatically. You authenticate with `tailscale up --authkey=...` after the +container starts (step 6) — this keeps the image minimal and needs no env list. + +### 4. Persistent state mount (the only thing on flash) + +Only the tiny `tailscaled.state` (node identity / key) needs to persist. Mount +just that directory: + +``` +/container/mounts/add list=tailscale_state src=tailscale/state dst=/var/lib/tailscale +``` + +`src=tailscale/state` is on internal storage. This holds `tailscaled.state` +(and `derpmap.cached.json`), written only on auth / key rotation / prefs +change — **not** on every netmap update, because netmap disk-caching is omitted +([why](DESIGN.md#why-netmap-disk-caching-is-removed)). Flash wear is therefore +minimal. If you want *zero* persistent writes, point `src` at a tmpfs disk slot +instead and accept re-authentication after a reboot. + +### 5. Add and start the container + +``` +/container/add \ + remote-image=gitea.lumpiasty.xyz/lumpiasty/mikrotik-tailscale:stable \ + interface=veth-tailscale \ + root-dir=tailscale/root \ + mountlists=tailscale_state \ + logging=yes \ + start-on-boot=yes \ + name=tailscale +``` + +Wait for the pull/extract to finish (`status=stopped`), then start it: + +``` +/container/print ;# wait until status=stopped +/container/start [find where name=tailscale] +/log/print where message~"tailscale" +``` + +The daemon is now running but **not yet authenticated**. + +### 6. Authenticate + +Enter the container shell and bring Tailscale up with your auth key. 
You can set +subnet routes / exit-node advertisement in the same command: + +``` +/container/shell [find where name=tailscale] +# inside the container — CHANGE ME: your key (and adjust routes/subnet): +tailscale up --authkey=tskey-auth-CHANGEME \ + --advertise-routes=192.168.88.0/24 \ + --advertise-exit-node +exit +``` + +The node now appears in your Tailscale admin console. Approve the advertised +routes / exit node there. Because the auth state is written to the persisted +`tailscaled.state`, you only do this once — it survives reboots and updates. + +### 7. Enable automatic updates + +First, edit the `CONFIG` block at the top of `routeros/update-tailscale.rsc` if +you changed any names in the steps above. The defaults match this guide +(`name=tailscale`, `root-dir=tailscale/root`, `mountlists=tailscale_state`, +`interface=veth-tailscale`). + +Copy the file to the router (Winbox **Files** drag-and-drop, or SFTP), then +create a **named script** from it and schedule it: + +``` +# Create the named script from the uploaded file's contents. +# (Do NOT use `/import` — that just runs the file once and does not create a +# reusable script for the scheduler to call.) +/system/script/add name=update-tailscale source=[/file/get update-tailscale.rsc contents] + +# Run it daily. +/system/scheduler/add name=update-tailscale interval=1d \ + on-event="/system/script/run update-tailscale" \ + comment="Check for mikrotik-tailscale image updates" +``` + +If you later upload a changed version of the file, refresh the script: + +``` +/system/script/set update-tailscale source=[/file/get update-tailscale.rsc contents] +``` + +What it does on each run: + +1. Reads the current `:stable` manifest digest from the registry (anonymous — + the package is public). +2. Compares it to the digest stored from the last deploy. +3. **Unchanged → does nothing** (no pull, no flash writes). +4. **Changed → recreates the container** from the new image and records the new + digest. 
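The four steps above amount to a compare-and-record loop. The sketch below models that logic in POSIX shell for illustration only: the real implementation is the RouterOS script, and `STATE_FILE`, `check_for_update`, and the stub `recreate_container` are hypothetical stand-ins rather than names taken from it.

```sh
#!/bin/sh
# Illustration of the run logic; all names here are hypothetical.
STATE_FILE=${STATE_FILE:-./last-digest}

# Stand-in for "remove the old container and add it again from the new image".
recreate_container() {
    echo "recreating container from $1"
}

# $1 is the digest just read from the registry.
check_for_update() {
    current=$1
    last=
    [ -f "$STATE_FILE" ] && last=$(cat "$STATE_FILE")

    if [ -z "$current" ]; then
        echo "no digest available; skipping" >&2   # e.g. registry unreachable
        return 1
    fi
    if [ "$current" = "$last" ]; then
        echo "unchanged"                           # no pull, no writes
        return 0
    fi
    recreate_container "$current" || return 1
    printf '%s\n' "$current" > "$STATE_FILE"       # record only after success
}
```

Recording the digest only after a successful recreate means a failed update is retried on the next scheduled run instead of being silently skipped.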
+ +Since `:stable` only moves on a meaningful release, the router never re-pulls +for build-system-only changes — see +[DESIGN.md → Versioning & releases](DESIGN.md#versioning--releases). + +> The digest fetch/compare logic is verified against the registry; the RouterOS +> container/file API calls (marked in the script) should be smoke-tested once on +> your device, since those idioms vary slightly by RouterOS version. + +## MagicDNS + +To use MagicDNS name resolution, configure MikroTik's DNS to forward `.ts.net` +queries to Tailscale's magic DNS resolver: + +``` +/ip dns static +add name="ts.net" type=FWD forward-to=100.100.100.100 match-subdomain=yes +``` + +This avoids writing to `/etc/resolv.conf` inside the container (which would +happen if `--accept-dns` is passed to `tailscale up`). The container resolves +Tailscale node names; the rest of the router uses its own DNS. + +## Updating + +You don't normally do anything: when a new release is published, the +auto-update script ([step 7](#7-enable-automatic-updates)) detects the changed +`:stable` image on its next scheduled run and recreates the container. Your +node identity and settings persist across the update via the state mount. + +To force an immediate check instead of waiting for the schedule: + +``` +/system/script/run update-tailscale +``` + +To pin a specific version instead of tracking `:stable`, set `remote-image` (and +the script's `imageRef`) to an immutable tag like +`...mikrotik-tailscale:v1.98.3-mt.1`.