klaster/docs/wwan-bm806c-qmi-workaround.md
Lumpiasty 5b026593ce, "lte failover", 2026-05-27 23:40:33 +02:00

LTE failover (BroadMobi BM806C / D-Link DWR-921 C1) — QMI data-plane workaround

Last verified: 2026-05-27, OpenWrt 25.12.2 r32802-f505120278, netifd 2026.02.26~cbb83a18-r1.

TL;DR

The embedded BroadMobi BM806C modem in the D-Link DWR-921 has three independent bugs in its firmware (M1.2.0_E1.0.1_A1.1.8, the only build that has ever shipped), all of which must be worked around for a usable LTE uplink:

  1. Cold-boot UIM wedge. On every cold boot, the modem's UIM (SIM) QMI service comes up permanently broken: --uim-get-sim-state returns {}, --get-imsi returns "UIM uninitialized", and AT+CPIN? returns +CME ERROR: SIM busy. The modem never recovers on its own (verified at uptime 21 min). A single USB re-enumeration (echo 0 > /sys/.../1-1/authorized; sleep 3; echo 1 > ...) forces the modem to redo its internal QMI init from scratch, after which UIM comes up within ~1 s. The wwan-bringup service installed by this role does the re-enumeration unconditionally on boot, then calls ifup wwan. Full investigation: /root/wwan-diag/boot-wedge-investigation.md on the router.

  2. qmi.sh requests 802.3 framing from the modem. The BM806C's 802.3 firmware path is buggy on this generation of Qualcomm silicon; raw-ip framing works correctly. The same kernel maintainer who added raw-ip support to qmi_wwan documents "buggy 802.3 firmware implementation" as a known issue for the MDM9x25 family this modem is built on.

  3. qmi.sh calls uqmi --start-network --apn <foo> to bring up the bearer. On BM806C this triggers a known firmware bug (OpenWrt FS#1363) that establishes a phantom bearer: kernel and modem agree there is a session, IP addresses are assigned, --get-data-status returns "connected" — but the bearer is not bound to a real PDN at the GGSN, so packets are blackholed. Invoking --start-network --profile <N> against a pre-configured NVRAM profile with the same APN works perfectly.

Bug 1 is the boot-time wedge; without the workaround wwan simply never comes up after a reboot. Bugs 2 and 3 are about the data plane itself; without their workarounds, wwan comes up but no traffic flows. Our role addresses all three: it installs wwan-bringup (re-enumerates the USB device once on boot, then ifup wwan), patches qmi.sh in two places (raw-ip + a kernel -EBUSY fix), creates a second NVRAM profile in the modem for the IPv6 APN, and adds option profile/option v6profile to the UCI wwan interface so qmi.sh uses the working code path. After all three workarounds, cold boot to a working dual-stack IPv4+IPv6 LTE uplink completes in ~2:30 to 3:30 (min:s), verified end-to-end at the HTTPS layer to multiple upstreams.

Symptoms

Boot-wedge symptoms (bug 1)

When the modem boots into the UIM-wedged state, all of the following hold simultaneously:

  • /dev/cdc-wdm0 exists, wwan0 netdev exists, qmi_wwan driver is bound to 1-1:1.4 — kernel side looks fine
  • ifup wwan runs forever in the SIM-init loop: wwan: SIM in illegal state - Power-cycling SIM repeating every ~8 s
  • uqmi -d /dev/cdc-wdm0 --uim-get-sim-state returns {} (empty body — no card_application_state field at all)
  • uqmi -d /dev/cdc-wdm0 --get-imsi returns the QMI string "UIM uninitialized"
  • uqmi -d /dev/cdc-wdm0 --get-pin-status returns "Invalid arguments given" (uqmi cannot allocate a UIM client because the modem-side service has not registered)
  • AT side: AT+CFUN? returns +CFUN: 1 (modem firmware is alive), AT+CPIN? returns +CME ERROR: SIM busy, and AT+CREG? / AT+CEREG? / AT+COPS? all return bare ERROR
  • This persists indefinitely; we measured no recovery at uptime 21 min
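The QMI-side signature above can be checked mechanically. The following is a minimal sketch, not part of the role: the function name is_uim_wedged is hypothetical, and it only inspects text you pass in, so it makes no assumption about how the uqmi calls are scheduled.

```sh
# Classify the UIM wedge from the raw output of two uqmi calls.
# $1: output of `uqmi -d /dev/cdc-wdm0 --uim-get-sim-state`
# $2: output of `uqmi -d /dev/cdc-wdm0 --get-imsi`
# Returns 0 (wedged) only when both signatures from the list are present.
is_uim_wedged() {
    # Strip whitespace so "{ }" and "{}\n" compare equal to "{}".
    iuw_state=$(printf '%s' "$1" | tr -d ' \t\r\n')
    [ "$iuw_state" = "{}" ] || return 1
    case "$2" in
        *"UIM uninitialized"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage on the router:
#   sim=$(uqmi -d /dev/cdc-wdm0 --uim-get-sim-state 2>&1)
#   imsi=$(uqmi -d /dev/cdc-wdm0 --get-imsi 2>&1)
#   is_uim_wedged "$sim" "$imsi" && echo "UIM wedged: re-enumerate the USB device"
```

Requiring both signatures keeps a transient empty reply from being misread as the wedge.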

Data-plane symptoms (bugs 2 and 3)

When the modem comes up cleanly but the qmi.sh patches are missing or the wrong --start-network invocation is used, all of the following are true at the same time:

  • ifup wwan succeeds, ifstatus wwan reports "up": true
  • wwan0 has a valid CG-NAT IPv4 (10.x.x.x/30) and IPv6 (2a00:f41:.../128 for Orange Poland)
  • uqmi --get-data-status returns "connected"
  • ip route shows default routes via wwan0
  • tcpdump -i wwan0 shows outbound TCP SYNs leaving normally with the wwan source IP
  • No reply ever comes back: RX bytes on wwan0 stays near zero while TX bytes climbs with each connection attempt
  • ping -I wwan0 to any destination shows 100% loss
  • curl --interface wwan0 times out on every TCP connect
  • After a while, +CEER on an AT port shows Regular deactivation or EMM detached (the network gives up on the broken session and drops it)
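The "TX climbs, RX stays flat" symptom is easy to quantify from the standard Linux netdev counters in sysfs. A minimal sketch follows; the helper names and the byte thresholds are illustrative assumptions, not values measured during this investigation.

```sh
# Read an interface byte counter from sysfs.
# $1: interface name (e.g. wwan0), $2: rx or tx
read_bytes() {
    cat "/sys/class/net/$1/statistics/${2}_bytes"
}

# Decide whether a sample window looks like the TX-only blackhole.
# $1: RX delta in bytes, $2: TX delta in bytes
# Thresholds (>= 1024 sent, <= 128 received) are assumptions for illustration.
looks_blackholed() {
    [ "$2" -ge 1024 ] && [ "$1" -le 128 ]
}

# Possible usage while a few curl/ping attempts run over wwan0:
#   rx0=$(read_bytes wwan0 rx); tx0=$(read_bytes wwan0 tx)
#   sleep 30
#   rx1=$(read_bytes wwan0 rx); tx1=$(read_bytes wwan0 tx)
#   looks_blackholed $((rx1 - rx0)) $((tx1 - tx0)) && echo "TX-only: phantom bearer?"
```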

If your symptoms include Network registration failed, registration timeout reached instead of the silent "everything looks fine but no data flows", you are probably hitting the issue from the OpenWrt forum thread on the BM806U-E1 / DWR-921 C3 (see References), which is the same root cause manifesting on a slightly different firmware revision. The fix is the same.

What the issue is NOT

We ruled all of these out during diagnosis. If you're tempted by any of them, read the corresponding "how we confirmed" section before going down that path.

Not a signal/RF problem

We initially had RSRP around -113 dBm and SNR around 0 dB and suspected weak coverage. Adding external antennas brought RSRP to -94 dBm and SNR to +15..+17 dB — well into the "good LTE" range — and the data-plane bug remained unchanged. Both poor-signal and good-signal sessions showed identical TX-only behaviour.

Not a SIM / subscription / APN-name problem

The same SIM card was confirmed to work in a different LTE router on the same Orange Poland subscription. The modem also registered correctly (+CEREG: 0,1, +COPS: 0,0,"Orange",7), and AT+CGCONTRDP showed valid IPs being assigned by the GGSN. APN strings internet (IPv4v6) and internetipv6 (IPv6) are Orange Poland's documented APNs.

Not a SIM-PIN / SIM-power / EMM detach problem

PIN is disabled and verified disabled (+CPIN: READY). EMM detaches we observed in +CEER were consequences of the broken bearer, not the cause: a session with no return traffic eventually gets torn down by the network.

Not a firewall / NAT / masquerade problem

We tested with the wwan firewall zone in every combination (REJECT/ACCEPT, with and without masquerade, with and without explicit forwarding rules) and the symptom was identical. The nftables byte counters showed packets traversing the forward chain and leaving via wwan0; the problem is that no packets ever arrive in the other direction. The firewall could not be the cause, because nothing was inbound to be filtered.

Not an ARP/NDP / asymmetric-routing problem

Initial captures showed unanswered ARP requests for the cellular gateway on wwan0, which is a known issue with qmi_wwan in 802.3 mode (the kernel does ARP on what is really a point-to-point cellular link; the gateway never answers because there is no L2). We installed permanent neighbour entries to bypass ARP entirely — traffic still failed. Switching to raw-ip mode (where the kernel sets NOARP on the interface and ARP is never attempted) fixed the ARP weirdness but did NOT fix the data-plane problem. Both fixes are needed and they are independent.

Not an MBIM-vs-QMI problem

The BM806C does not expose an MBIM USB composition. Switching protocols isn't an option without re-flashing the modem firmware, which has no public images.

Not a modem-firmware-update problem

M1.2.0_E1.0.1_A1.1.8 is the only BM806C firmware that has ever shipped. BroadMobi (Shanghai Mobile) only releases firmware to OEM partners; D-Link's last DWR-921 router firmware (1.01.3.006 Generic, no date) bundles the same modem image. Extracting and re-flashing it would change nothing.

Not "QMI is fundamentally broken on this modem"

This was our working hypothesis for a long time. The decisive counter-test was running PPP over /dev/ttyUSB2 with ATD*99***1# while QMI was idle: data flowed instantly, HTTPS in 0.7 s, ping 25 ms, 0% loss. Same SIM, same cell, same antennas, same APN — just a different host-side dial-up mechanism. That proved the modem, the RAN, and the operator were all fine. Whatever was breaking QMI had to live in the QMI control path itself (uqmi / qmi.sh / qmi_wwan) and/or in how the modem handles specific QMI message shapes.

The forum thread and FS#1363 then nailed it down to --start-network --apn.

How we confirmed it IS the QMI control-path bug

The minimal repro is just two uqmi invocations:

DEV=/dev/cdc-wdm0

# Configure profile 1 in the modem's NVRAM with the v4 APN.
uqmi -d $DEV --modify-profile "3gpp,1" --apn internet --pdp-type ipv4v6

# Switch to raw-ip framing (the other fix).
uqmi -d $DEV --wda-set-data-format raw-ip
ip link set wwan0 down
echo Y > /sys/class/net/wwan0/qmi/raw_ip
ip link set wwan0 up

# Start the bearer.  --profile 1 instead of --apn internet.
cid=$(uqmi -d $DEV --get-client-id wds)
uqmi -d $DEV --set-client-id wds,$cid --set-ip-family ipv4 > /dev/null
uqmi -d $DEV --set-client-id wds,$cid --start-network --profile 1

Followed by manual addressing/routing of wwan0 from --get-current-settings, this just works: curl -4 --interface wwan0 https://1.1.1.1/ returns 301 in under a second, and RX bytes climb.
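For the manual addressing step, ip addr add wants a prefix length. The sketch below assumes that --get-current-settings reports the IPv4 subnet as a dotted netmask (that output shape is an assumption here, it is not shown in this document); mask2prefix is a hypothetical helper name.

```sh
# Convert a dotted IPv4 netmask to a prefix length.
# Prints the prefix and returns 0; returns 1 on a malformed or
# non-contiguous mask.
mask2prefix() {
    m2p_oldifs=$IFS
    IFS=.
    set -- $1            # split the mask into its four octets
    IFS=$m2p_oldifs
    [ "$#" -eq 4 ] || return 1
    m2p_total=0
    m2p_partial=0
    for m2p_octet in "$@"; do
        case "$m2p_octet" in
            255) m2p_n=8 ;; 254) m2p_n=7 ;; 252) m2p_n=6 ;; 248) m2p_n=5 ;;
            240) m2p_n=4 ;; 224) m2p_n=3 ;; 192) m2p_n=2 ;; 128) m2p_n=1 ;;
            0)   m2p_n=0 ;;
            *)   return 1 ;;
        esac
        # Once a partial or zero octet was seen, only zero octets may follow.
        if [ "$m2p_partial" -eq 1 ] && [ "$m2p_n" -ne 0 ]; then
            return 1
        fi
        [ "$m2p_n" -eq 8 ] || m2p_partial=1
        m2p_total=$((m2p_total + m2p_n))
    done
    echo "$m2p_total"
}

# Possible usage, with $ip and $subnet copied from the settings output:
#   ip addr add "$ip/$(mask2prefix "$subnet")" dev wwan0
```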

If you replace --start-network --profile 1 with --start-network --apn internet (everything else identical), the bearer comes up, addresses are assigned, --get-data-status says "connected", and no downlink traffic ever arrives. This is the smoking-gun isolation of the firmware bug.

Are you affected?

You are affected if all of these hold:

  1. Your modem reports Manufacturer: BroadMobi, Model: BM806C (or BM806U), Revision: M1.2.0_E1.0.1_A1.1.8. Check via any AT port: printf 'ATI\r' | picocom -qrx 3000 /dev/ttyUSB2.
  2. Your USB IDs are 2020:2033. Check /sys/bus/usb/devices/<port>/idVendor / idProduct. On the C1 hardware revision the modem cold-boots directly into 2020:2033 QMI composite mode with no usb-modeswitch involved (there is no 2020:2033 entry in /etc/usb-mode.json on our build). Other hardware revisions may go through an EDL 05c6:9008 → 2020:2033 modeswitch first.
  3. qmi.sh (/lib/netifd/proto/qmi.sh) is the unmodified upstream netifd handler. Grep for --wda-set-data-format 802.3 — if present, you have the unpatched script.

The quick functional test is the minimal repro above: if you can get data flowing with --start-network --profile 1 but not with --start-network --apn internet, you have this bug.
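The first two checks can be scripted. This is a rough sketch with hypothetical function names; it assumes the ATI response contains the manufacturer, model, and revision strings quoted in item 1, and it reads the USB IDs from whichever sysfs device directory you pass in.

```sh
# Reads ATI output on stdin; returns 0 if it names the affected
# modem and firmware build.
ati_matches_affected() {
    ati_text=$(tr -d '\r')
    printf '%s\n' "$ati_text" | grep -q 'BroadMobi' || return 1
    printf '%s\n' "$ati_text" | grep -Eq 'BM806[CU]' || return 1
    printf '%s\n' "$ati_text" | grep -Fq 'M1.2.0_E1.0.1_A1.1.8'
}

# $1: USB device directory under /sys/bus/usb/devices/
usb_id_is_2020_2033() {
    [ "$(cat "$1/idVendor")" = "2020" ] && [ "$(cat "$1/idProduct")" = "2033" ]
}

# Possible usage:
#   printf 'ATI\r' | picocom -qrx 3000 /dev/ttyUSB2 | ati_matches_affected \
#       && echo "affected modem firmware"
#   usb_id_is_2020_2033 /sys/bus/usb/devices/1-1 && echo "USB id matches"
```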

Involved components & versions

| Component | Version |
| --- | --- |
| Router | D-Link DWR-921 C1 (dlink,dwr-921-c1) |
| SoC | MediaTek MT7620N ver:2 eco:6 |
| OpenWrt | 25.12.2 (r32802-f505120278) |
| Kernel | Linux 6.12.74 |
| netifd | 2026.02.26~cbb83a18-r1 |
| uqmi | 2025.07.30~7914da43-r2 |
| libqmi / qmi-utils | 1.36.0-r1 |
| luci-proto-qmi | 26.133.20346~e9ebca7 |
| qmi_wwan kernel driver | backports from Linux v6.18.7 (per dmesg) |
| LTE modem | BroadMobi BM806C (Qualcomm MDM9225) |
| Modem firmware | M1.2.0_E1.0.1_A1.1.8 |
| Modem USB id (data mode) | 2020:2033 (cold-boots directly into this) |
| Modem USB id (EDL mode) | 05c6:9008 (not observed on C1; may apply to other revs) |
| Mobile network | Orange Poland (MCC 260 / MNC 03) |
| APN (IPv4 / dual-stack) | internet (auth: PAP, user/pass internet/internet) |
| APN (IPv6) | internetipv6 (same auth) |

References

  • OpenWrt forum thread (same model, same symptoms): https://forum.openwrt.org/t/problem-with-bm806u-e1-dwr-921-c3/130094
  • OpenWrt issue #6295 / FS#1363 — "QMI does not use correct APN": https://github.com/openwrt/openwrt/issues/6295
  • Kernel commit "net: qmi_wwan: support 'raw IP' mode" (Bjørn Mork): documents the 802.3-firmware-is-buggy reality across this generation. Search the mainline kernel for QMI_WWAN_FLAG_RAWIP.
  • Kernel commit "net: qmi_wwan: add BroadMobi BM806U 2020:2033" (Pawel Dembicki, 2018, 6cb2669cb97f): adds the qmi_wwan entry for our exact USB id 2020:2033 as QMI_FIXED_INTF(0x2020, 0x2033, 4) with no quirks. The BM806C and BM806U share the device id and qmi_wwan driver path. The entry has not been touched in mainline through v6.18.7 (what OpenWrt 25.12.2 ships via backports).
  • libqmi maintainer Aleksander Morgado on cdc-wdm port readiness timing (libqmi-devel, Sep 2021): https://lists.freedesktop.org/archives/libqmi-devel/2021-September/003695.html — explains that cdc-wdm appearing in /dev is not a guarantee that the modem-side QMI service is operational. ModemManager uses up to 45 s of warmup tolerance; we measured this modem firmware needs ~5 min before CTL is even responsive, and UIM never converges without a USB re-enumeration.
  • CastixGitHub/re_wwan (https://github.com/CastixGitHub/re_wwan): another BM806C user, identical firmware build, identical recovery pattern (rmmod qmi_wwan; insmod qmi_wwan to recover from a hung modem; AT-side AT+CFUN= resets reported as not working). Useful independent confirmation that the right primitive is module reload / USB re-enumeration, not a soft reset.
  • D-Link DWR-921 support page (firmware images, region-specific): hardware revision C3 on the Polish site lists firmware 1.01.3.006 Generic, 1.00B07 T-Mobile, 1.00B06 Plus/Cyfrowy Polsat Rev C3 — all of which bundle the same modem firmware build.

Limitations

Can this be configured via LuCI / UCI alone?

Partly. The UCI side of the workaround is fully achievable through LuCI or uci set:

uci set network.wwan.profile='1'
uci set network.wwan.v6profile='2'
uci commit network

luci-proto-qmi already exposes profile and v6profile as fields in the LTE wizard. The wwan interface config alone, however, is not sufficient: qmi.sh and the modem NVRAM both need attention before ifup wwan will work end-to-end. Specifically:

  • The qmi.sh patches (raw-ip + ip link down/up around the sysfs write) are filesystem edits that survive package upgrades only if re-applied. They cannot be expressed as UCI.
  • Creating modem-NVRAM profile 2 with internetipv6 is a one-shot QMI call (uqmi --create-profile 3gpp ...). It is not part of OpenWrt's configuration model; the profile lives in the modem itself.

So in practice: configurable via UCI/LuCI as far as the router is concerned, but the fixed router config will only do anything once the manual modem profile creation and qmi.sh patches are in place.

auto '0' on the wwan interface

We intentionally keep option auto '0': the wwan interface does not auto-start at boot. This is a deliberate failover-only setup — uplink (the wired VLAN to the MikroTik) is the primary path, and a human (or future failover script, e.g. mwan3) decides when to bring up wwan.

This also sidesteps a fragile boot ordering question: on cold boot the modem's UIM (SIM) QMI service comes up permanently broken and never recovers without an explicit USB re-enumeration (echo 0/1 > /sys/bus/usb/devices/1-1/authorized). Other QMI services (CTL, NAS, WDS) do come up after ~5 min of warmup, but UIM does not — verified at uptime 21 min with no intervention. The wwan-bringup service handles the re-enumeration on boot and then calls ifup wwan itself; netifd never has to deal with the wedge directly. See /root/wwan-diag/boot-wedge-investigation.md on the router for the full root-cause analysis (2026-05-27).
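The re-enumeration primitive itself is small. The sketch below is not the role's wwan-bringup script (which is not reproduced in this document); usb_reenumerate is a hypothetical name, and the 3-second default pause mirrors the sleep 3 quoted in the TL;DR.

```sh
# Re-enumerate a USB device by toggling its sysfs `authorized` attribute.
# $1: USB device sysfs directory (on this router: /sys/bus/usb/devices/1-1)
# $2: seconds to wait between de-authorize and re-authorize (default 3)
usb_reenumerate() {
    [ -w "$1/authorized" ] || return 1
    echo 0 > "$1/authorized"
    sleep "${2:-3}"
    echo 1 > "$1/authorized"
}

# Possible usage at boot, after the modem has enumerated:
#   usb_reenumerate /sys/bus/usb/devices/1-1 && ifup wwan
```

Taking the device directory as a parameter keeps the port path (1-1 here) out of the function, since it differs between boards.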

IPv6 is via a second NVRAM profile, not a single dual-stack PDP

Orange Poland uses two distinct APN strings (internet for v4, internetipv6 for v6). The BM806C firmware lets us configure profile 1 as IPV4V6 with internet, but the IPv6 leg of that profile cannot be made to use the dedicated internetipv6 APN. Our config uses two independent profiles (profile 1 = IPv4 from internet, profile 2 = IPv6 from internetipv6) and qmi.sh happily fires both --start-network --profile 1 and --start-network --profile 2 in sequence (one per address family).

qmi.sh patches survive package upgrades only if re-applied

/lib/netifd/proto/qmi.sh is owned by the netifd package. When netifd is upgraded, the file is replaced. Our patches are not listed in /etc/sysupgrade.conf and would not normally be preserved across a sysupgrade-style image flash either. The Ansible role re-applies them idempotently on every play; outside Ansible, you would need a wrapper (e.g. a postinst hook or a manual re-patch step in your upgrade runbook).

No automatic failover yet

Bringing wwan up requires explicit ifup wwan. There is no monitor that detects loss of uplink and switches over. mwan3 is the obvious candidate.

Implementation (manual, no Ansible)

Everything below assumes you have already SSH'd into the OpenWrt router as root, the modem is enumerated as /dev/cdc-wdm0 / wwan0, and uqmi / picocom are installed.

Step 1 — patch qmi.sh

Three single-line edits to /lib/netifd/proto/qmi.sh. Around line 233:

# Before
uqmi -s -d "$device" -t 1000 --set-data-format 802.3 > /dev/null 2>&1
uqmi -s -d "$device" -t 1000 --wda-set-data-format 802.3 > /dev/null 2>&1
...
echo "Y" > /sys/class/net/$ifname/qmi/raw_ip

# After
uqmi -s -d "$device" -t 1000 --set-data-format raw-ip > /dev/null 2>&1
uqmi -s -d "$device" -t 1000 --wda-set-data-format raw-ip > /dev/null 2>&1
...
ip link set $ifname down; echo "Y" > /sys/class/net/$ifname/qmi/raw_ip; ip link set $ifname up

The third edit is essential: writing Y to the raw_ip sysfs node fails with EBUSY ("Cannot change a running device") if wwan0 is up at the moment of the write. The kernel only lets you change the link-layer protocol while the interface is down. Without this bracket the patched script logs sh: write error: Resource busy, the kernel driver stays in Ethernet mode, and we are back to broken ARP/NDP.

In-place via sed:

sed -i 's|--set-data-format 802\.3|--set-data-format raw-ip|;
        s|--wda-set-data-format 802\.3|--wda-set-data-format raw-ip|;
        s|^\(\s*\)echo "Y" > /sys/class/net/$ifname/qmi/raw_ip$|\1ip link set $ifname down; echo "Y" > /sys/class/net/$ifname/qmi/raw_ip; ip link set $ifname up|' \
       /lib/netifd/proto/qmi.sh
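If you re-apply the patch from an upgrade runbook rather than Ansible, it helps to wrap the substitutions in a function that also verifies the result. A minimal sketch, assuming the unpatched lines look exactly like the "Before" block above; patch_qmi_sh is a hypothetical name and [[:space:]] is used in place of \s for portability.

```sh
# Apply the three substitutions to the given file, then verify the result.
# $1: path to qmi.sh (on the router: /lib/netifd/proto/qmi.sh)
# Safe to run repeatedly: already-patched lines no longer match.
patch_qmi_sh() {
    sed -i \
        -e 's|--set-data-format 802\.3|--set-data-format raw-ip|' \
        -e 's|--wda-set-data-format 802\.3|--wda-set-data-format raw-ip|' \
        -e 's|^\([[:space:]]*\)echo "Y" > /sys/class/net/\$ifname/qmi/raw_ip$|\1ip link set $ifname down; echo "Y" > /sys/class/net/$ifname/qmi/raw_ip; ip link set $ifname up|' \
        "$1" || return 1
    # Verify: no 802.3 request is left, and the down/up bracket is present.
    if grep -Fq -e 'set-data-format 802.3' "$1"; then
        return 1
    fi
    grep -Fq -e 'ip link set $ifname down; echo "Y" > /sys/class/net/$ifname/qmi/raw_ip; ip link set $ifname up' "$1"
}

# Possible usage:
#   patch_qmi_sh /lib/netifd/proto/qmi.sh || echo "qmi.sh patch did not apply cleanly"
```

A non-zero return after an upgrade is the signal that upstream restructured the data-format block and the patch needs revisiting.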

Step 2 — create modem-NVRAM profile 2 for the IPv6 APN

Profile 1 is managed by qmi.sh itself (it calls --modify-profile "3gpp,1" with the UCI apn value on every ifup). Profile 2 has to be bootstrapped once, then it persists in modem NVRAM:

uqmi -d /dev/cdc-wdm0 --create-profile 3gpp --apn internetipv6 --pdp-type ipv6
# returns {"created-profile": 2}

# Verify
uqmi -d /dev/cdc-wdm0 --get-profile-settings 3gpp,2
# {"apn":"internetipv6","pdp-type":"ipv6", ...}

If profile 2 already exists with wrong settings, use --modify-profile instead:

uqmi -d /dev/cdc-wdm0 --modify-profile 3gpp,2 --apn internetipv6 --pdp-type ipv6
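Because --create-profile allocates the next free slot, it is safer to read the returned index than to assume 2. A small sketch, assuming the reply has the {"created-profile": N} shape shown above; created_profile_index is a hypothetical helper.

```sh
# Extract N from uqmi's {"created-profile": N} reply.
# Prints N and returns 0; returns 1 if the reply has no such field.
created_profile_index() {
    cpi_n=$(printf '%s\n' "$1" |
        sed -n 's/.*"created-profile"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p')
    [ -n "$cpi_n" ] || return 1
    echo "$cpi_n"
}

# Possible usage:
#   reply=$(uqmi -d /dev/cdc-wdm0 --create-profile 3gpp --apn internetipv6 --pdp-type ipv6)
#   idx=$(created_profile_index "$reply") && uci set network.wwan.v6profile="$idx"
```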

Step 3 — UCI config for the wwan interface

uci batch <<'EOF'
set network.wwan=interface
set network.wwan.device='/dev/cdc-wdm0'
set network.wwan.proto='qmi'
set network.wwan.apn='internet'
set network.wwan.v6apn='internetipv6'
set network.wwan.profile='1'
set network.wwan.v6profile='2'
set network.wwan.auth='pap'
set network.wwan.username='internet'
set network.wwan.password='internet'
set network.wwan.pdptype='ipv4v6'
set network.wwan.dhcp='0'
set network.wwan.dhcpv6='0'
set network.wwan.metric='100'
set network.wwan.auto='0'
EOF
uci commit network

apn and v6apn are still set even though profile / v6profile take precedence on the --start-network call: qmi.sh uses apn when it runs --modify-profile 3gpp,1 --apn $apn --pdp-type $profile_pdptype near the top of proto_qmi_setup, before --start-network. Without it, qmi.sh would re-write profile 1 with an empty APN on every ifup. v6apn is not strictly used by qmi.sh in the current code path (the --start-network --profile 2 invocation ignores --apn $v6apn), but is kept for clarity and so an operator reading the config sees what APN profile 2 is supposed to point at.

dhcp '0' / dhcpv6 '0' tell qmi.sh to apply the IP addresses itself (via proto_add_ipv4_address / proto_add_ipv6_address from uqmi --get-current-settings) instead of spawning udhcpc / odhcp6c on wwan0. The modem hands out the addresses through QMI; running DHCP on a point-to-point cellular link would fail anyway.

metric '100' keeps uplink (metric 0) preferred as the default route when both are up.

Step 4 — test

ifup wwan
sleep 10
ifstatus wwan | head -20
uqmi -d /dev/cdc-wdm0 --get-data-status         # "connected"
cat /sys/class/net/wwan0/qmi/raw_ip             # Y
ip -d link show wwan0 | head -2                 # POINTOPOINT,NOARP, link/none
ip addr show wwan0

Then, with uplink taken down or the wwan route preferred, verify real traffic:

curl -4 --interface wwan0 -sS -o /dev/null -w "%{http_code}\n" https://1.1.1.1/
curl -6 -g --interface wwan0 -sS -o /dev/null -w "%{http_code}\n" 'https://[2606:4700:4700::1111]/'

Both should return 301 within ~1 second. ip -s link show wwan0 should show RX bytes climbing.

Step 5 — teardown / cleanup

ifdown wwan

That's it. The modem-NVRAM profiles persist across reboots and even across usb-modeswitch cycles, so step 2 only ever needs to be run once per physical SIM/modem.

The following related changes were made in the same series of work as the wwan fix. They aren't part of the wwan workaround per se, but are worth pointing at if you're trying to retrace this end-to-end.

  • VLAN 6 ("uplink") on the MikroTik CRS418 and on the OpenWrt AP: a tagged-only VLAN over ether3/WAN that carries the AP's wired uplink to the MikroTik. IPv4 192.168.6.0/24, IPv6 2001:470:61a3:600::/64 (point-to-point, no SLAAC, static ::1 and ::2). The AP's "uplink" netifd interface is dual-stack on eth0.6. wwan failover is to this uplink, not the LAN.
  • Management policy-routing on the AP. The management interface mgmt (192.168.255.11/24 on eth0.1) is reached through MikroTik from a non-directly-connected subnet, so replies from arbitrary src-subnets would have followed the default route out eth0.6 and been blackholed by the MikroTik. We have two policy-routing rules (priority 500 for same-subnet → main table, priority 1000 for any other → table 100) and a config route in table 100 sending 0.0.0.0/0 back via the MikroTik. None of this interacts with wwan directly but it's mentioned so anyone reading network.yml does not trip over the rules wondering whose problem they are.
  • community.openwrt.apk module migration. OpenWrt 25.12+ uses apk instead of opkg, and the upstream collection's apk module is only in community.openwrt git main at the time of writing. We pin to git+main in ansible/requirements.yml until a release ships it.
  • Manually-installed packages folded back into openwrt_packages: usb-modeswitch (needed only on hardware revisions that enumerate as EDL 05c6:9008 first and must be switched to QMI 2020:2033; the C1 cold-boots directly into 2020:2033) and luci-proto-qmi.

Future work

In rough priority order:

  1. Upstream a fix to qmi.sh that does the ip link down/up bracket around the raw_ip sysfs write. This is a strict bug in the upstream script: as written, the write fails with EBUSY whenever the modem actually wants raw-ip, which is precisely the case qmi.sh claims to handle. Likely a 3-line patch. This is the easiest, least controversial upstream contribution.
  2. Upstream a fix or knob for the BM806C-style firmware quirk. The cleanest path is probably an OpenWrt-level UCI option prefer_raw_ip (default off) on the qmi proto, similar to how mbim.sh is constructed. We don't want to change the default framing for all qmi devices — newer Qualcomm modems advertise 802.3 correctly and qmi.sh's readback logic does the right thing for them. A per-device opt-in keeps the existing autodetect intact.
  3. Document/upstream the --profile workaround for FS#1363. The bug is 7+ years old and still hits real users. The right cleanup is probably to make qmi.sh prefer --profile $N whenever profile modification has just succeeded, falling back to --apn $apn only if no profile was written. This is a behavioural change and would need a discussion thread / PR description that walks the reviewer through the modem-firmware history.
  4. Replace the qmi.sh patches in our Ansible role with a wrapper that does not edit qmi.sh directly. Options:
    • A custom proto qmi-bm806c that sources the original qmi.sh, overrides only proto_qmi_setup, and registers under a separate name. UCI would switch option proto 'qmi' → 'qmi-bm806c'. Clean but harder to debug because there is now an extra layer of indirection.
    • A hotplug script in /etc/hotplug.d/iface/ that intercepts pre-ifup events on wwan, sets WDA + sysfs raw-ip beforehand, and trusts the modem's 802.3 readback to fail naturally so qmi.sh never writes the sysfs node. Untested. Likely flaky.
    • The current "patch the file, reapply via Ansible" approach is the simplest and most direct. It is fine as long as the role is the source of truth.
  5. Periodic session keepalive / reconnect on detach. Now that boot bring-up is fast and reliable (~2:30 to 3:30 from cold boot to wwan up), the next likely failure mode is the modem getting deactivated by the network (+CEER: Regular deactivation) after long idle periods. A simple procd service that polls uqmi --get-data-status and triggers ifup wwan on transition connected → disconnected would close this gap. Don't pre-emptively add it; wait until you have evidence the problem occurs in practice with the workaround in place. If the disconnect comes with UIM going bad (same wedge signature as cold boot), the keepalive needs to call wwan-bringup (which re-authorizes the USB device) rather than ifup wwan directly.
  6. Implement actual failover. mwan3 is the conventional choice. Alternatively a tiny shell loop that pings a target via uplink and triggers ifup wwan / ifdown wwan on transitions. Either way the wwan side of the work is done; the failover orchestration is a separate problem.
  7. Investigate mbim mode. The BM806C does not currently expose MBIM, but the modem chipset (MDM9225) supports it at the silicon level. Whether there exists a magic AT command, vendor QMI message, or firmware composition switch to enable MBIM is unknown — the AT command set we explored (AT^USBMODE, AT^SETPORT, AT+QCFG, AT+BMSWITCH, AT$QCPDPP, etc.) all returned ERROR. If MBIM could be enabled, qmi.sh becomes irrelevant and the upstream mbim.sh proto might just work. Significant payoff if it pans out; research-heavy if it doesn't.
  8. Periodic re-test on OpenWrt upgrades. When OpenWrt's netifd gets a new release, re-check the qmi.sh patches still apply cleanly. Our role uses regex-based lineinfile, so it tolerates the surrounding code drifting somewhat, but if upstream restructures the data-format block significantly we'd need to revisit.
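If item 5 is ever needed, the core of it is just edge detection on the data status. A hedged sketch of that decision only (the polling loop is left in comments); keepalive_action is a hypothetical name, the 30-second interval is an arbitrary assumption, and the status strings are assumed to be the quoted "connected" / "disconnected" values uqmi prints.

```sh
# Decide what to do on a data-status transition.
# $1: previous status, $2: current status (as printed by uqmi, quotes included)
# Prints "reconnect" only on connected -> disconnected, "none" otherwise,
# so a link that was never up does not trigger reconnect attempts.
keepalive_action() {
    if [ "$1" = '"connected"' ] && [ "$2" = '"disconnected"' ]; then
        echo reconnect
    else
        echo none
    fi
}

# Possible polling loop (not enabled anywhere in the role):
#   prev=
#   while sleep 30; do
#       cur=$(uqmi -d /dev/cdc-wdm0 --get-data-status)
#       [ "$(keepalive_action "$prev" "$cur")" = reconnect ] && ifup wwan
#       prev=$cur
#   done
```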

Things worth noting if anyone picks this up again

  • qmi.sh's upstream "set IP format" block runs --set-data-format first (against the kernel/qmi_wwan) and --wda-set-data-format second (against the modem). Both must agree. We patch both.
  • The readback --wda-get-data-format call is what qmi.sh uses to decide whether to write Y to sysfs. Our patches make this return "raw-ip", which makes the existing branch fire — we don't add a branch, we just nudge the existing logic into the path that already exists for "device only supports raw-ip" modems.
  • The kernel qmi_wwan sysfs node /sys/class/net/wwan0/qmi/raw_ip toggles the kernel-side framing. The QMI WDA call toggles the modem-side framing. They are independent. Both must agree, or the kernel will parse bytes that came in as raw-IP as if they were Ethernet frames (or vice versa). The result, depending on which side is wrong, ranges from "all packets dropped silently in qmi_wwan_rx_fixup" to "kernel ARPs at a phantom MAC".
  • uqmi --modify-profile 3gpp,1 does work on this modem — both the JSON --get-profile-settings 3gpp,1 and the AT-side AT+CGDCONT? reflect the new value immediately. The bug is specifically with the --start-network --apn TLV, not with profile management.
  • uqmi --create-profile 3gpp returns the new profile index in {"created-profile": N}. It auto-allocates the next free slot, so in a fresh modem you'll get 2, but on an already-configured modem you might get 3 or higher. Always read the return value rather than assuming 2. (Our Ansible task hardcodes 2 but checks --get-profile-list first to skip creation if 2 already exists.)
  • +CEER: Regular deactivation and +CEER: EMM detached are last error codes; they persist until the modem clears them. Reading them tells you the last failure, not necessarily the current state. Always cross-reference with +CEREG? and +CGACT? to know if you are presently attached.
  • uqmi -t 5000 -d /dev/cdc-wdm0 --get-serving-system returns "Failed to connect to service" (or "Unknown error") for the first ~5 minutes after cold boot. CTL/NAS/WDS do eventually come up (we measured --get-versions first OK at uptime 320 s, serving-system at 376 s), but they flap in and out for several more minutes. UIM never comes up on cold boot without a USB re-enumeration: --uim-get-sim-state keeps returning {} and --get-imsi keeps returning "UIM uninitialized" even at uptime 21 minutes. This is why the wwan-bringup worker now does an unconditional authorized=0/1 re-enumeration immediately after the modem enumerates; it is not waiting for warmup, it is forcing the modem to redo its init from scratch.
  • A reliable cold-boot vs. wedged-modem discriminator from AT side: AT+CPIN? returning +CME ERROR: SIM busy while AT+CFUN? returns +CFUN: 1 means the modem firmware is alive but UIM is stuck. If this persists past uptime 5 minutes the modem will not recover on its own; re-authorize the USB port.
  • The diagnostic scripts we accumulated live on the router at /root/wwan-diag/ (created during debugging; not part of the Ansible role). The most useful ones are at.sh (run AT commands through picocom), ppp-test.sh (PPP-via-AT as a control test that bypasses QMI), qmi-dual-profile.sh (manual reproduction of the working --profile-based dual-stack flow), and boot-capture.sh (instrumented per-service probe that maps the cold-boot wedge timeline; every probe wrapped in /usr/bin/timeout so it cannot hang). The full root-cause writeup for the boot wedge is at /root/wwan-diag/boot-wedge-investigation.md. Feel free to delete the older scripts once this is stable; they are not load-bearing.
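The AT-side discriminator from the list above can be written down as a small decision function. This is a sketch under the assumptions stated in that bullet (SIM busy plus +CFUN: 1, with a 5-minute uptime cutoff); at_side_verdict and its verdict labels are hypothetical.

```sh
# Classify modem state from AT responses and uptime.
# $1: response to AT+CPIN?, $2: response to AT+CFUN?, $3: uptime in seconds
# Prints one of: ok, unknown, too-early, wedged
at_side_verdict() {
    case "$1" in
        *"SIM busy"*) ;;                  # candidate for the wedge, keep checking
        *) echo ok; return 0 ;;
    esac
    case "$2" in
        *"+CFUN: 1"*) ;;                  # firmware alive, so UIM is what is stuck
        *) echo unknown; return 0 ;;
    esac
    if [ "$3" -gt 300 ]; then
        echo wedged                       # past 5 min: will not recover on its own
    else
        echo too-early
    fi
}

# Possible usage, with $cpin and $cfun captured from an AT port:
#   at_side_verdict "$cpin" "$cfun" "$(cut -d. -f1 /proc/uptime)"
```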

Acknowledgements

gotgot04 on the OpenWrt forum did the original triage of FS#1363 against this exact device (DWR-921 C3 / BM806U-E1), and the comment trail on that thread saved us probably another day of guessing.