| description | argument-hint |
|---|---|
| Produce a device health report using the meshtastic MCP tools (device_info, list_nodes, get_config, short serial log capture) | |
# /diagnose: device health report

Call the meshtastic MCP tool bundle and format a structured health report for one or all detected devices. Zero guesswork for the operator.

## What to do

1. Enumerate hardware. Call `mcp__meshtastic__list_devices(include_unknown=True)`. For each entry where `likely_meshtastic=True`, capture `port`, `vid`, `pid`, `description`.

2. Filter by `$ARGUMENTS`:
   - No args or `all` → every likely-meshtastic device.
   - `nrf52` → only devices with `vid == 0x239a`.
   - `esp32s3` → only devices with `vid == 0x303a` or `vid == 0x10c4`.
   - A `/dev/cu.*` path → only that one port.
   - Anything else → treat as a substring match against the `port` string.
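
A minimal Python sketch of the enumeration and `$ARGUMENTS` filter rules above. It assumes `list_devices` entries are dicts keyed by the field names this skill mentions (`port`, `vid`, `pid`, `description`, `likely_meshtastic`) and that `vid` is an integer. The real tool's return shape may differ, and `select_devices` is a hypothetical helper, not part of the MCP server.

```python
from typing import Dict, List

# Hypothetical helper: assumes list_devices() entries are dicts carrying the
# fields named in this skill, with "vid" as an int (not a hex string).
NRF52_VIDS = {0x239A}
ESP32S3_VIDS = {0x303A, 0x10C4}
CAPTURED = ("port", "vid", "pid", "description")


def select_devices(entries: List[Dict], arg: str = "") -> List[Dict]:
    """Keep likely-Meshtastic entries, then apply the $ARGUMENTS filter."""
    # Enumeration step: keep only likely-Meshtastic entries and capture four fields.
    devices = [
        {key: entry.get(key) for key in CAPTURED}
        for entry in entries
        if entry.get("likely_meshtastic")
    ]
    arg = arg.strip()
    if arg in ("", "all"):
        return devices
    if arg == "nrf52":
        return [d for d in devices if d["vid"] in NRF52_VIDS]
    if arg == "esp32s3":
        return [d for d in devices if d["vid"] in ESP32S3_VIDS]
    if arg.startswith("/dev/cu."):
        return [d for d in devices if d["port"] == arg]
    # Anything else: substring match against the port string.
    return [d for d in devices if arg in (d["port"] or "")]
```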

3. For each selected device, in sequence (NOT in parallel: SerialInterface holds an exclusive port lock):
   - `mcp__meshtastic__device_info(port=<p>)`: captures `my_node_num`, `long_name`, `short_name`, `firmware_version`, `hw_model`, `region`, `num_nodes`, `primary_channel`.
   - `mcp__meshtastic__list_nodes(port=<p>)`: count of peers, which ones have `publicKey` set, SNR/RSSI distribution.
   - `mcp__meshtastic__get_config(section="lora", port=<p>)`: region, preset, channel_num, tx_power, hop_limit.
   - Optionally, if the device seems unhappy (fails to connect, `num_nodes==1` when ≥2 are plugged in, missing `firmware_version`), open a short firmware log window: `mcp__meshtastic__serial_open(port=<p>, env=<inferred-env>)`, wait 3s, `serial_read(session_id=<s>, max_lines=100)`, `serial_close(session_id=<s>)`. Infer the env from the VID map in `mcp-server/run-tests.sh` (nrf52 → rak4631, esp32s3 → heltec-v3) unless `MESHTASTIC_MCP_ENV_<ROLE>` is set.
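
The env inference for `serial_open` can be sketched as a lookup with an environment-variable override. This sketch makes two assumptions: that `<ROLE>` is the upper-cased role name (for example `MESHTASTIC_MCP_ENV_NRF52`), and that both ESP32-S3 VIDs map to `heltec-v3`. `mcp-server/run-tests.sh` remains the authoritative mapping, and `infer_env` is a hypothetical name.

```python
import os
from typing import Mapping, Optional

# VID -> (role, PlatformIO env), following the mapping this skill attributes
# to mcp-server/run-tests.sh. Assumption: both ESP32-S3 VIDs use heltec-v3.
VID_TO_ROLE_ENV = {
    0x239A: ("nrf52", "rak4631"),
    0x303A: ("esp32s3", "heltec-v3"),
    0x10C4: ("esp32s3", "heltec-v3"),
}


def infer_env(vid: int, environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the env for serial_open(), honouring MESHTASTIC_MCP_ENV_<ROLE>."""
    role_env = VID_TO_ROLE_ENV.get(vid)
    if role_env is None:
        return None  # unknown board: skip the log window rather than guess
    role, default_env = role_env
    # Assumption: <ROLE> is the upper-cased role, e.g. MESHTASTIC_MCP_ENV_NRF52.
    return environ.get(f"MESHTASTIC_MCP_ENV_{role.upper()}", default_env)
```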

4. Hub health (call once, not per-device):
   - `mcp__meshtastic__uhubctl_list()`: enumerates every USB hub the host can see. Note which hubs advertise `ppps=true` and which hub hosts each Meshtastic device (cross-reference by VID). Flag it in the report if:
     - No hub advertises PPPS → `tests/recovery/` can't run on this setup; hard recovery via `uhubctl_cycle` isn't available.
     - A Meshtastic device is on a non-PPPS hub → note it; the operator may want to move the device to a PPPS hub to unlock auto-recovery.
   - If `uhubctl_list` raises `ConfigError: uhubctl not found`, just say `uhubctl not installed` in the report; don't treat it as a fault.
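
One way to turn the hub listing into report lines, sketched under three assumptions: each hub is a dict with a `location` string and a `ppps` boolean, devices have already been matched to a hub location by VID, and `None` stands in for the `ConfigError` case. `hub_flags` is hypothetical. When no hub supports PPPS, the sketch emits only the global warning. Telling the operator to move a device to a PPPS hub that doesn't exist would only add noise.

```python
from typing import Dict, List, Optional


def hub_flags(hubs: Optional[List[Dict]], device_hub: Dict[str, str]) -> List[str]:
    """Turn a uhubctl_list() result into report lines.

    hubs: [{"location": str, "ppps": bool}, ...], or None when uhubctl is
          missing (the ConfigError case). Hypothetical shape.
    device_hub: device port -> hub location, already matched by VID.
    """
    if hubs is None:
        return ["uhubctl not installed"]  # informational, not a fault
    ppps_hubs = {h["location"] for h in hubs if h.get("ppps")}
    if not ppps_hubs:
        return ["⚠︎ no hub advertises PPPS: tests/recovery/ can't run, uhubctl_cycle unavailable"]
    # Only per-device notes remain: devices sitting on hubs without PPPS.
    return [
        f"⚠︎ {port} is on non-PPPS hub {loc}; move it to a PPPS hub for auto-recovery"
        for port, loc in sorted(device_hub.items())
        if loc not in ppps_hubs
    ]
```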

5. Render the per-device report as:

   ```
   [nrf52 @ /dev/cu.usbmodem1101]  fw=2.7.23.bce2825, hw=RAK4631
     owner       : Meshtastic 40eb / 40eb
     region/band : US, channel 88, LONG_FAST
     tx_power    : 30 dBm, hop_limit=3
     peers       : 1 (esp32s3 0x433c2428, pubkey ✓, SNR 6.0 / RSSI -24 dBm)
     primary ch  : McpTest
     hub         : 1-1.3 port 2 (PPPS, uhubctl-controllable)
     firmware    : no panics in last 3s; NodeInfoModule emitted 2 broadcasts
   ```

   Keep it scannable. If a field is missing or abnormal (no pubkey for a known peer, `region=UNSET`, `num_nodes` inconsistent with the devices on the hub, device on a non-PPPS hub), flag it inline with a short `⚠︎ <one-line reason>`.
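
The report layout above is a header line plus label/value rows with aligned colons and an optional inline warning. A small rendering sketch follows. The `(label, value, warning)` tuple shape and `render_report` are illustrative assumptions, not an existing API.

```python
from typing import List, Tuple


def render_report(header: str, fields: List[Tuple[str, str, str]]) -> str:
    """Render one device block; fields are (label, value, warning).

    An empty warning means the field looks healthy; otherwise the warning
    is appended inline as "⚠︎ <one-line reason>".
    """
    # Pad every label to the longest one so the colons line up.
    width = max((len(label) for label, _, _ in fields), default=0)
    lines = [header]
    for label, value, warning in fields:
        line = f"  {label.ljust(width)} : {value}"
        if warning:
            line += f"  ⚠︎ {warning}"
        lines.append(line)
    return "\n".join(lines)
```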

6. Cross-device correlation (only when more than one device is inspected):
   - Do both sides see each other in `nodesByNum`? If one does and the other doesn't, that's asymmetric NodeInfo; flag it.
   - Do the LoRa configs match? `region`, `channel_num`, and `modem_preset` should all agree; mismatch = no mesh.
   - Do the primary channel NAMES match? Mismatch = different PSK = no decode.
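
The cross-device checks reduce to set membership and field-by-field comparison. This sketch assumes each device has already been summarised into a dict holding four things: its own `node_num`, the set of node numbers it sees in `nodesByNum`, its LoRa fields, and its primary channel name. Those keys and `correlate` are hypothetical.

```python
from typing import Dict, List

LORA_KEYS = ("region", "channel_num", "modem_preset")


def correlate(a: Dict, b: Dict) -> List[str]:
    """Cross-check two device summaries (hypothetical dict shape).

    Each summary: {"node_num": int, "seen": set of node nums from nodesByNum,
                   "lora": {region, channel_num, modem_preset}, "channel": str}
    """
    issues = []
    # Visibility must be mutual; one-way visibility means asymmetric NodeInfo.
    a_sees_b = b["node_num"] in a["seen"]
    b_sees_a = a["node_num"] in b["seen"]
    if a_sees_b != b_sees_a:
        issues.append("asymmetric NodeInfo")
    for key in LORA_KEYS:
        va, vb = a["lora"].get(key), b["lora"].get(key)
        if va != vb:
            issues.append(f"LoRa {key} mismatch: {va} vs {vb} (no mesh)")
    if a["channel"] != b["channel"]:
        issues.append(f"primary channel mismatch: {a['channel']!r} vs {b['channel']!r} (no decode)")
    return issues
```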

7. Recorder slice (cheap, normally available). The mcp-server runs an autouse log recorder that collects from every connected device. Pull two short slices to surface anything odd that has already happened:
   - `mcp__meshtastic__logs_window(start="-2m", level="WARN|ERROR|CRIT", max_lines=20)`: recent firmware errors. If empty, say "no recent errors"; don't manufacture concern.
   - `mcp__meshtastic__telemetry_timeline(window="1h", field="free_heap", max_points=60)`: heap trend. If `slope_per_min < -50`, flag it and recommend `/leakhunt window=6h` for a deeper read; otherwise just note the current free heap.
   - If `recorder_status` shows `running:false` or `files.telemetry.last_ts` is null, note "recorder has no telemetry yet; enable `set_debug_log_api(True)` to populate" and skip this step gracefully.
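
The sketch below illustrates what the `slope_per_min < -50` threshold means by computing a least-squares slope over `(timestamp, free_heap)` points, converted to units per minute. `telemetry_timeline` already reports `slope_per_min` and its exact method isn't stated here, so treat this as an assumed equivalent for reading the number, not a reimplementation. Timestamps in seconds and heap in bytes are also assumptions; `slope_per_min` and `heap_note` here are hypothetical helpers.

```python
from typing import Sequence, Tuple


def slope_per_min(points: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of free_heap over time, in bytes per minute.

    points: (unix_seconds, free_heap_bytes); units are an assumption.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_t = sum(t for t, _ in points) / n
    mean_h = sum(h for _, h in points) / n
    cov = sum((t - mean_t) * (h - mean_h) for t, h in points)
    var = sum((t - mean_t) ** 2 for t, _ in points)
    if var == 0:
        raise ValueError("timestamps must not all be equal")
    return cov / var * 60.0  # bytes/second -> bytes/minute


def heap_note(slope: float, current_free: int) -> str:
    """Apply the skill's threshold: flag anything below -50 per minute."""
    if slope < -50:
        return f"⚠︎ free_heap trending {slope:.0f}/min; recommend /leakhunt window=6h"
    return f"free_heap {current_free}"
```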

8. Suggest next actions only for specific, recognisable failure modes:
   - Stale PKI pubkey one-way → "run `/test tests/mesh/test_direct_with_ack.py`; the retry + nodeinfo-ping heals this in the test path."
   - Region mismatch → "re-bake one side via `./mcp-server/run-tests.sh --force-bake`."
   - Device unreachable over serial but reachable via DFU → `touch_1200bps(port=...)` + `pio_flash`. If not even DFU responds AND the device is on a PPPS hub, escalate to `uhubctl_cycle(role=..., confirm=True)`.
   - Wedged CP2102 driver on macOS → see the note in `run-tests.sh`.
   - Heap slope strongly negative → "run `/leakhunt window=6h` for a full timeline + classification."
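
The unreachable-device suggestion above is an escalation ladder: try DFU first, and power-cycle the hub port only when DFU is dead and the device sits on a PPPS hub. This sketch assumes earlier steps have reduced the observations to three booleans. `recovery_suggestion` is hypothetical and only returns text for the report; it never runs the tools, since this skill stays read-only.

```python
def recovery_suggestion(api_ok: bool, dfu_ok: bool, on_ppps_hub: bool) -> str:
    """Escalation text for an unreachable device; never executes anything."""
    if api_ok:
        return ""  # device answered normally; nothing to escalate
    if dfu_ok:
        # Bootloader still responds: re-enter it and reflash.
        return "touch_1200bps(port=...) + pio_flash"
    if on_ppps_hub:
        # Not even DFU responds, but the hub can switch power on this port.
        return "uhubctl_cycle(role=..., confirm=True)"
    return "no hard recovery: device is not on a PPPS hub"
```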

## What NOT to do

- No writes. No `set_config`, no `reboot`, no `factory_reset`. This is a read-only diagnostic skill; if the operator wants to change state, they'll ask explicitly.
- No `flash` / `erase_and_flash`. Those are separate escalations.
- No holding SerialInterface across tool calls: open, query, close, then move to the next device. The port lock is exclusive.