Refactor to make some real design docs

2025-10-05 14:35:52 +00:00 · 2020-03-05 18:39:24 -08:00 · 2020-03-05 18:39:24 -08:00 · ffe8bf2be6
commit ffe8bf2be6
parent ab73033e51
8 changed files with 209 additions and 118 deletions
--- a/.vscode/settings.json
+++ b/.vscode/settings.json
@@ -46,6 +46,7 @@
    },
    "cSpell.words": [
        "Meshtastic",
-        "descs"
+        "descs",
+        "protobufs"
    ]
 }
--- a/docs/mesh-proto.md
+++ b/docs/mesh-proto.md
@@ -0,0 +1,20 @@
+TODO:
+* reread the radiohead mesh implementation
+* read about general mesh flooding solutions
+* reread the disaster radio protocol docs
+
+good description of batman protocol: https://www.open-mesh.org/projects/open-mesh/wiki/BATMANConcept
+
+interesting paper on lora mesh: https://portal.research.lu.se/portal/files/45735775/paper.pdf
+It seems like  DSR might be the algorithm used by RadioheadMesh.  DSR is described in https://tools.ietf.org/html/rfc4728
+https://en.wikipedia.org/wiki/Dynamic_Source_Routing
+
+broadcast solution:
+Use naive flooding at first (FIXME - do some math for a 20 node, 3 hop mesh.  A single flood will require a max of 20 messages sent)
+Then move to MPR later (http://www.olsr.org/docs/report_html/node28.html).  Use altitude and location as heuristics in selecting the MPR set
+
+compare to db sync algorithm?
+
+what about never flooding gps broadcasts.  instead only have them go one hop in the common case, but if any node X is looking at the position of Y on their gui, then send a unicast to Y asking for position update.  Y replies.
+
+If Y were to die, at least the neighbor nodes of Y would have their last known position of Y.
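
A rough check of the flooding estimate in mesh-proto.md above: the sketch below counts the transmissions caused by one flood, assuming every node rebroadcasts a packet at most once and only while it is fewer than `hopLimit` hops from the sender. The function name, the adjacency-list input and the hop-limit rule are illustrative assumptions, not code from this repository.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Count the transmissions caused by one naive flood (hypothetical helper).
// adj[i] lists the nodes that can hear node i.
// Each node rebroadcasts at most once (duplicate suppression), and only if it
// is fewer than hopLimit hops away from the source.
std::size_t countFloodTransmissions(const std::vector<std::vector<std::size_t>> &adj,
                                    std::size_t source, unsigned hopLimit)
{
    assert(source < adj.size());

    std::vector<int> dist(adj.size(), -1); // -1 means "not reached yet"
    std::queue<std::size_t> pending;
    dist[source] = 0;
    pending.push(source);

    std::size_t transmissions = 0;
    while (!pending.empty()) {
        std::size_t node = pending.front();
        pending.pop();
        if (static_cast<unsigned>(dist[node]) >= hopLimit)
            continue; // hop budget used up: receive only, do not rebroadcast
        ++transmissions;
        for (std::size_t neighbor : adj[node]) {
            if (dist[neighbor] < 0) {
                dist[neighbor] = dist[node] + 1;
                pending.push(neighbor);
            }
        }
    }
    return transmissions;
}
```

Under these assumptions a fully connected 20-node mesh with a hop limit of 3 yields 20 transmissions (the sender plus 19 rebroadcasts), which is consistent with the "max of 20 messages" figure in the note.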
--- a/docs/software/bluetooth-api.md
+++ b/docs/software/bluetooth-api.md
@@ -0,0 +1,111 @@
+# Bluetooth API
+
+The Bluetooth API is designed to have only a few characteristics; most polymorphism comes from the flexible set of Google Protocol Buffers which are sent over the wire.  We use protocol buffers extensively both for the bluetooth API and for packets inside the mesh or when providing packets to other applications on the phone.
+
+## A note on MTU sizes
+
+This device will work with any MTU size, but it is highly recommended that you call your phone's "setMTU" function to increase the MTU to 512 bytes as soon as you connect to a service.  This will dramatically improve performance when reading/writing packets.
+
+## MeshBluetoothService 
+
+This is the main bluetooth service for the device and provides the API your app should use to get information about the mesh, send packets or provision the radio.  
+
+For a reference implementation of a client that uses this service see [RadioInterfaceService](https://github.com/meshtastic/Meshtastic-Android/blob/master/app/src/main/java/com/geeksville/mesh/service/RadioInterfaceService.kt).  The typical flow when
+a phone connects to the device should be the following:
+
+* Set the MTU size to 512
+* Read a RadioConfig from "radio" - used to get the channel and radio settings
+* Read (and write if incorrect) a User from "owner" - to get the username for this node
+* Read a MyNodeInfo from "mynode" to get information about this local device
+* Write an empty record to "nodeinfo" to restart the nodeinfo reading state machine
+* Read from "nodeinfo" until it returns empty to build the phone's copy of the current NodeDB for the mesh
+* Read from "fromradio" until it returns empty to get any messages that arrived for this node while the phone was away
+* Subscribe to notify on "fromnum" to get notified whenever the device has a new received packet
+* Read that new packet from "fromradio"
+* Whenever the phone has a packet to send write to "toradio"
+
+For definitions (and documentation) of FromRadio, ToRadio, MyNodeInfo, NodeInfo and User  protocol buffers see [mesh.proto](https://github.com/meshtastic/Meshtastic-protobufs/blob/master/mesh.proto)
+
+UUID for the service: 6ba1b218-15a8-461f-9fa8-5dcae273eafd
+
+Each characteristic is listed as follows:
+
+UUID
+Properties
+Description (including human readable name)
+
+8ba2bcc2-ee02-4a55-a531-c525c5e454d5
+read
+fromradio - contains a newly received FromRadio packet destined towards the phone (up to MAXPACKET bytes per packet).
+After reading the esp32 will put the next packet in this mailbox.  If the FIFO is empty it will put an empty packet in this
+mailbox.
+
+f75c76d2-129e-4dad-a1dd-7866124401e7
+write
+toradio - write ToRadio protobufs to this characteristic to send them (up to MAXPACKET len)
+
+ed9da18c-a800-4f66-a670-aa7547e34453
+read,notify,write
+fromnum - the current packet # of the message waiting inside fromradio.  If the phone sees this notify, it should read messages
+until it catches up with this number.
+
+The phone can write to this register to go backwards up to FIXME packets, to handle the rare case where a fromradio packet was dropped after the esp32 callback was called, but before it arrived at the phone.  If the phone writes to this register the esp32 will discard older packets and put the next packet >= fromnum in fromradio.
+When the esp32 advances fromnum, it will delay doing the notify by 100ms, in the hope that the notify will never actually need to be sent if the phone is already pulling from fromradio.
+
+Note: if the phone ever sees this number decrease, it means the esp32 has rebooted.
+
+ea9f3f82-8dc4-4733-9452-1f6da28892a2
+read
+mynode - read this to access a MyNodeInfo protobuf
+
+d31e02e0-c8ab-4d3f-9cc9-0b8466bdabe8
+read, write
+nodeinfo - read this to get a series of NodeInfos (ending with an empty record), write to this to restart the read state machine that returns all the node infos
+
+b56786c8-839a-44a1-b98e-a1724c4a0262
+read,write
+radio - read/write this to access a RadioConfig protobuf
+
+6ff1d8b6-e2de-41e3-8c0b-8fa384f64eb6
+read,write
+owner - read/write this to access a User protobuf
+
+Re: queue management
+Not all messages are kept in the fromradio queue (filtered based on SubPacket):
+* only the most recent Position and User messages for a particular node are kept
+* all Data SubPackets are kept
+* No WantNodeNum / DenyNodeNum messages are kept
+A variable keepAllPackets, if set to true, will suppress this behavior and instead keep everything for forwarding to the phone (for debugging)
+
+
+## Other bluetooth services
+
+This document focuses on the core mesh service, but it is worth noting that the following other Bluetooth services are also
+provided by the device.
+
+### BluetoothSoftwareUpdate
+
+The software update service.  For a sample function that performs a software update using this API see [startUpdate](https://github.com/meshtastic/Meshtastic-Android/blob/master/app/src/main/java/com/geeksville/mesh/service/SoftwareUpdateService.kt).
+
+SoftwareUpdateService UUID cb0b9a0b-a84c-4c0d-bdbb-442e3144ee30
+
+Characteristics
+
+| UUID                                 | properties       | description|
+|--------------------------------------|------------------|------------|
+| e74dd9c0-a301-4a6f-95a1-f0e1dbea8e1e | write,read       | total image size, 32 bit, write this first, then read back to see if it was acceptable (0 means not accepted) |
+| e272ebac-d463-4b98-bc84-5cc1a39ee517 | write            | data, variable sized, recommended 512 bytes, write one for each block of file |
+| 4826129c-c22a-43a3-b066-ce8f0d5bacc6 | write            | crc32, write last - writing this will complete the OTA operation, now you can read result |
+| 5e134862-7411-4424-ac4a-210937432c77 | read,notify      | result code, readable but will notify when the OTA operation completes |
+| GATT_UUID_SW_VERSION_STR/0x2a28      | read             | standard GATT software revision string (these three standard GATT entries are also implemented because SW update probably needs them) |
+| GATT_UUID_MANU_NAME/0x2a29           | read             | standard GATT manufacturer name string |
+| GATT_UUID_HW_VERSION_STR/0x2a27      | read             | standard GATT hardware revision string |
+
+
+### DeviceInformationService
+
+Implements the standard BLE contract for this service (has software version, hardware model, serial number, etc...)
+
+### BatteryLevelService
+
+Implements the standard BLE contract service, provides battery level in a way that most client devices should automatically understand (i.e. it should show in the bluetooth devices screen automatically)
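
The "read from fromradio until it returns empty" step and the fromnum reboot rule described in bluetooth-api.md above could be sketched on the client side roughly as follows. This is a minimal sketch, not the reference client: `Packet`, `drainFromRadio`, `fromNumIndicatesReboot` and the `readFromRadio` callback are hypothetical names, and the callback stands in for whatever BLE read call the client platform provides.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using Packet = std::vector<std::uint8_t>; // raw bytes of one FromRadio protobuf

// Read "fromradio" repeatedly until the device returns an empty packet.
// readFromRadio is a hypothetical callback wrapping the platform's BLE read.
std::vector<Packet> drainFromRadio(const std::function<Packet()> &readFromRadio)
{
    assert(readFromRadio); // a read callback must be supplied

    std::vector<Packet> packets;
    for (;;) {
        Packet packet = readFromRadio();
        if (packet.empty())
            break; // empty mailbox: the device FIFO has been drained
        packets.push_back(std::move(packet));
    }
    return packets;
}

// Per the fromnum notes, a decreasing packet number means the device rebooted.
bool fromNumIndicatesReboot(std::uint32_t previous, std::uint32_t current)
{
    return current < previous;
}
```

A client would typically call `drainFromRadio` once after connecting and again on each fromnum notify; decoding the returned bytes with the generated protobuf classes is left out here.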
--- a/docs/software/mesh-alg.md
+++ b/docs/software/mesh-alg.md
@@ -0,0 +1,66 @@
+# Mesh broadcast algorithm
+
+FIXME - instead look for standard solutions.  this approach seems really suboptimal, because too many nodes will try to rebroadcast.  If
+all else fails could always use the stock Radiohead solution - though super inefficient.
+
+TODO:
+* reread the radiohead mesh implementation
+* read about general mesh flooding solutions
+* reread the disaster radio protocol docs
+
+good description of batman protocol: https://www.open-mesh.org/projects/open-mesh/wiki/BATMANConcept
+
+interesting paper on lora mesh: https://portal.research.lu.se/portal/files/45735775/paper.pdf
+It seems like  DSR might be the algorithm used by RadioheadMesh.  DSR is described in https://tools.ietf.org/html/rfc4728
+https://en.wikipedia.org/wiki/Dynamic_Source_Routing
+
+broadcast solution:
+Use naive flooding at first (FIXME - do some math for a 20 node, 3 hop mesh.  A single flood will require a max of 20 messages sent)
+Then move to MPR later (http://www.olsr.org/docs/report_html/node28.html).  Use altitude and location as heuristics in selecting the MPR set
+
+compare to db sync algorithm?
+
+what about never flooding gps broadcasts.  instead only have them go one hop in the common case, but if any node X is looking at the position of Y on their gui, then send a unicast to Y asking for position update.  Y replies.
+
+If Y were to die, at least the neighbor nodes of Y would have their last known position of Y.
+
+## approach 1
+
+* send all broadcasts with a TTL
+* periodically(?) do a survey to find the max TTL that is needed to fully cover the current network.
+* to do a study first send a broadcast (maybe our current initial user announcement?) with TTL set to one (so therefore no one will rebroadcast our request)
+* survey replies are sent unicast back to us (and intervening nodes will need to keep the route table that they have built up based on past packets)
+* count the number of replies to this TTL 1 attempt.  That is the number of nodes we can reach without any rebroadcasts
+* repeat the study with a TTL of 2 and then 3.  stop once the # of replies stops going up.
+* it is important for any node to do listen before talk to prevent stomping on other rebroadcasters...
+* For these little networks I bet a max TTL would never be higher than 3?
+
+## approach 2
+
+* send a TTL1 broadcast, the replies let us build a list of the nodes (stored as a bitvector?) that we can see (and their rssis)
+* we then broadcast out that bitvector (also TTL1) asking "can any of y'all (even indirectly) see anyone else?"
+* if a node can see someone I missed (and they are the best person to see that node), they reply (unidirectionally) with the missing nodes and their rssis (other nodes might sniff (and update their db) based on this reply but they don't have to)
+* given that the max number of nodes in this mesh will be like 20 (for normal cases), I bet globally updating this db of "nodenums and who has the best rssi for packets from that node" would be useful
+* once the global DB is shared, when a node wants to broadcast, it just sends out its broadcast.  The first level receivers then make a decision "am I the best to rebroadcast to someone who likely missed this packet?" if so, rebroadcast
+
+## approach 3
+
+* when a node X wants to know other nodes' positions, it broadcasts its position with want_replies=true.  Then each of the nodes that received that request broadcasts its reply (possibly by using special timeslots?)
+* all nodes constantly update their local db based on replies they witnessed.
+* after 10s (or whatever) if node Y notices that it didn't hear a reply from node Z (that Y has heard from recently) to that initial request, that means Z never heard the request from X.  Node Y will reply to X on Z's behalf.
+* could this work for more than one hop?  Is more than one hop needed?  Could it work for sending messages (i.e. for a msg sent to Z with want-reply set)?
+
+## approach 4
+
+look into the literature for this idea specifically.
+
+* don't view it as a mesh protocol as much as a "distributed db unification problem".  When nodes talk to nearby nodes they work together
+to update their nodedbs.  Each nodedb would have a last change date and any new changes that only one node has would get passed to the 
+other node.  This would nicely allow distant nodes to propagate their position to all other nodes (eventually).
+* handle group messages the same way, there would be a table of messages and time of creation.
+* when a node has a new position or message to send out, it does a broadcast.  All the adjacent nodes update their db instantly (this handles 90% of messages I'll bet).  
+* Occasionally a node might broadcast saying "anyone have anything newer than time X?"  If someone does, they send the diffs since that date.
+* essentially everything in this variant becomes broadcasts of "request db updates for >time X - for _all_ or for a particular nodenum" and nodes sending (either due to request or because they changed state) "here's a set of db updates".  Every node is constantly trying to
+build the most recent version of reality, and if some nodes are too far, then nodes closer in will eventually forward their changes to the distributed db.
+* construct unambiguous rules for who broadcasts to request db updates.  ideally the algorithm should nicely realize node X can see most other nodes, so they should just listen to all those nodes and minimize the # of broadcasts. the distributed picture of nodes rssi could be useful here?
+* possibly view the BLE protocol to the radio the same way - just a process of reconverging the node/msgdb database.
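
The "distributed db unification" idea in approach 4 of mesh-alg.md above can be illustrated with a small last-change-wins merge. This is only a sketch under stated assumptions: `NodeRecord`, `NodeDb`, `updatesSince` and `mergeUpdates` are hypothetical names, the record fields are placeholders, and it assumes change timestamps are comparable across nodes (for example GPS time), which the notes do not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// One entry of a node database (placeholder fields, not the real protobuf).
struct NodeRecord {
    std::uint32_t lastChanged; // time of last change, assumed comparable across nodes
    std::int32_t latitude;
    std::int32_t longitude;
};

using NodeDb = std::map<std::uint32_t, NodeRecord>; // keyed by nodenum

// Answer "anyone have anything newer than time X?": collect the newer records.
NodeDb updatesSince(const NodeDb &db, std::uint32_t since)
{
    NodeDb diff;
    for (const auto &entry : db)
        if (entry.second.lastChanged > since)
            diff.insert(entry);
    return diff;
}

// Apply received updates, keeping whichever record changed most recently.
// Returns how many local records were added or replaced.
std::size_t mergeUpdates(NodeDb &local, const NodeDb &updates)
{
    assert(&local != &updates); // merging a db into itself is a caller bug

    std::size_t applied = 0;
    for (const auto &entry : updates) {
        auto found = local.find(entry.first);
        if (found == local.end() || found->second.lastChanged < entry.second.lastChanged) {
            local[entry.first] = entry.second;
            ++applied;
        }
    }
    return applied;
}
```

Merging the same set of updates twice changes nothing the second time, so under this rule overheard or repeated "here's a set of db updates" broadcasts would be harmless.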
--- a/docs/software/power.md
+++ b/docs/software/power.md
@@ -1,6 +1,6 @@
-This is a mini design doc for various core behaviors...
+# Power Management State Machine

-# Rules for sleep
+i.e. sleep behavior

 ## States

@@ -79,49 +79,3 @@ General ideas to hit the power draws our spreadsheet predicts.  Do the easy ones
 * see section 7.3 of https://cdn.sparkfun.com/assets/learn_tutorials/8/0/4/RFM95_96_97_98W.pdf and have hope radio wake only when a valid packet is received.  Possibly even wake the ESP32 from deep sleep via GPIO.
 * never enter deep sleep while connected to USB power (but still go to other low power modes)
 * when main cpu is idle (in loop), turn cpu clock rate down and/or activate special sleep modes.  We want almost everything shutdown until it gets an interrupt.
-
-# Mesh broadcast algoritm
-
-FIXME - instead look for standard solutions.  this approach seems really suboptimal, because too many nodes will try to rebroast.  If
-all else fails could always use the stock radiohead solution - though super inefficent.
-
-## approach 1
-
-* send all broadcasts with a TTL
-* periodically(?) do a survey to find the max TTL that is needed to fully cover the current network.
-* to do a study first send a broadcast (maybe our current initial user announcement?) with TTL set to one (so therefore no one will rebroadcast our request)
-* survey replies are sent unicast back to us (and intervening nodes will need to keep the route table that they have built up based on past packets)
-* count the number of replies to this TTL 1 attempt.  That is the number of nodes we can reach without any rebroadcasts
-* repeat the study with a TTL of 2 and then 3.  stop once the # of replies stops going up.
-* it is important for any node to do listen before talk to prevent stomping on other rebroadcasters...
-* For these little networks I bet a max TTL would never be higher than 3?
-
-## approach 2
-
-* send a TTL1 broadcast, the replies let us build a list of the nodes (stored as a bitvector?) that we can see (and their rssis)
-* we then broadcast out that bitvector (also TTL1) asking "can any of ya'll (even indirectly) see anyone else?"
-* if a node can see someone I missed (and they are the best person to see that node), they reply (unidirectionally) with the missing nodes and their rssis (other nodes might sniff (and update their db) based on this reply but they don't have to)
-* given that the max number of nodes in this mesh will be like 20 (for normal cases), I bet globally updating this db of "nodenums and who has the best rssi for packets from that node" would be useful
-* once the global DB is shared, when a node wants to broadcast, it just sends out its broadcast . the first level receivers then make a decision "am I the best to rebroadcast to someone who likely missed this packet?" if so, rebroadcast
-
-## approach 3
-
-* when a node X wants to know other nodes positions, it broadcasts its position with want_replies=true.  Then each of the nodes that received that request broadcast their replies (possibly by using special timeslots?)
-* all nodes constantly update their local db based on replies they witnessed.
-* after 10s (or whatever) if node Y notices that it didn't hear a reply from node Z (that Y has heard from recently ) to that initial request, that means Z never heard the request from X.  Node Y will reply to X on Z's behalf.
-* could this work for more than one hop?  Is more than one hop needed?  Could it work for sending messages (i.e. for a msg sent to Z with want-reply set). 
-
-## approach 4
-
-look into the literature for this idea specifically.
-
-* don't view it as a mesh protocol as much as a "distributed db unification problem".  When nodes talk to nearby nodes they work together
-to update their nodedbs.  Each nodedb would have a last change date and any new changes that only one node has would get passed to the 
-other node.  This would nicely allow distant nodes to propogate their position to all other nodes (eventually).
-* handle group messages the same way, there would be a table of messages and time of creation.
-* when a node has a new position or message to send out, it does a broadcast.  All the adjacent nodes update their db instantly (this handles 90% of messages I'll bet).  
-* Occasionally a node might broadcast saying "anyone have anything newer than time X?"  If someone does, they send the diffs since that date.
-* essentially everything in this variant becomes broadcasts of "request db updates for >time X - for _all_ or for a particular nodenum" and nodes sending (either due to request or because they changed state) "here's a set of db updates".  Every node is constantly trying to
-build the most recent version of reality, and if some nodes are too far, then nodes closer in will eventually forward their changes to the distributed db.
-* construct non ambigious rules for who broadcasts to request db updates.  ideally the algorithm should nicely realize node X can see most other nodes, so they should just listen to all those nodes and minimize the # of broadcasts. the distributed picture of nodes rssi could be useful here?
-* possibly view the BLE protocol to the radio the same way - just a process of reconverging the node/msgdb database.
--- a/docs/software/sw-design.md
+++ b/docs/software/sw-design.md
@@ -0,0 +1,6 @@
+This is a mini design doc for various core behaviors...
+
+* [Power Management](power.md)
+* [Mesh algorithm](mesh-alg.md)
+* [Bluetooth API](bluetooth-api.md) and porting guide for new clients (iOS, python, etc...)
+* TODO: how to port the device code to a new device.
--- a/lib/BluetoothOTA/src/BluetoothSoftwareUpdate.cpp
+++ b/lib/BluetoothOTA/src/BluetoothSoftwareUpdate.cpp
@@ -117,21 +117,7 @@ void bluetoothRebootCheck()
 }

 /*
-SoftwareUpdateService UUID cb0b9a0b-a84c-4c0d-bdbb-442e3144ee30
-
-Characteristics
-
-UUID                                 properties          description
-e74dd9c0-a301-4a6f-95a1-f0e1dbea8e1e write|read          total image size, 32 bit, write this first, then read read back to see if it was acceptable (0 mean not accepted)
-e272ebac-d463-4b98-bc84-5cc1a39ee517 write               data, variable sized, recommended 512 bytes, write one for each block of file
-4826129c-c22a-43a3-b066-ce8f0d5bacc6 write               crc32, write last - writing this will complete the OTA operation, now you can read result
-5e134862-7411-4424-ac4a-210937432c77 read|notify         result code, readable but will notify when the OTA operation completes
-
-We also implement the following standard GATT entries because SW update probably needs them:
-
-ESP_GATT_UUID_SW_VERSION_STR/0x2a28
-ESP_GATT_UUID_MANU_NAME/0x2a29
-ESP_GATT_UUID_HW_VERSION_STR/0x2a27
+See bluetooth-api.md

 */
 BLEService *createUpdateService(BLEServer *server, std::string hwVendor, std::string swVersion, std::string hwVersion)
--- a/src/MeshBluetoothService.cpp
+++ b/src/MeshBluetoothService.cpp
@@ -238,60 +238,7 @@ void bluetoothNotifyFromNum(uint32_t newValue)
 BLEService *meshService;

 /*
-MeshBluetoothService UUID 6ba1b218-15a8-461f-9fa8-5dcae273eafd
-
-FIXME - notify vs indication for fromradio output.  Using notify for now, not sure if that is best
-FIXME - in the esp32 mesh managment code, occasionally mirror the current net db to flash, so that if we reboot we still have a good guess of users who are out there.
-FIXME - make sure this protocol is guaranteed robust and won't drop packets
-
-"According to the BLE specification the notification length can be max ATT_MTU - 3. The 3 bytes subtracted is the 3-byte header(OP-code (operation, 1 byte) and the attribute handle (2 bytes)).
-In BLE 4.1 the ATT_MTU is 23 bytes (20 bytes for payload), but in BLE 4.2 the ATT_MTU can be negotiated up to 247 bytes."
-
-MAXPACKET is 256? look into what the lora lib uses. FIXME
-
-Characteristics:
-UUID                                 
-properties          
-description
-
-8ba2bcc2-ee02-4a55-a531-c525c5e454d5                                 
-read                
-fromradio - contains a newly received packet destined towards the phone (up to MAXPACKET bytes? per packet).
-After reading the esp32 will put the next packet in this mailbox.  If the FIFO is empty it will put an empty packet in this
-mailbox.
-
-f75c76d2-129e-4dad-a1dd-7866124401e7                             
-write               
-toradio - write ToRadio protobufs to this charstic to send them (up to MAXPACKET len)
-
-ed9da18c-a800-4f66-a670-aa7547e34453                                  
-read|notify|write         
-fromnum - the current packet # in the message waiting inside fromradio, if the phone sees this notify it should read messages
-until it catches up with this number.
-  The phone can write to this register to go backwards up to FIXME packets, to handle the rare case of a fromradio packet was dropped after the esp32 
-callback was called, but before it arrives at the phone.  If the phone writes to this register the esp32 will discard older packets and put the next packet >= fromnum in fromradio.
-When the esp32 advances fromnum, it will delay doing the notify by 100ms, in the hopes that the notify will never actally need to be sent if the phone is already pulling from fromradio.
-  Note: that if the phone ever sees this number decrease, it means the esp32 has rebooted.
-
-meshMyNodeCharacteristic("ea9f3f82-8dc4-4733-9452-1f6da28892a2", BLECharacteristic::PROPERTY_READ)
-mynode - read this to access a MyNodeInfo protobuf
-
-meshNodeInfoCharacteristic("d31e02e0-c8ab-4d3f-9cc9-0b8466bdabe8", BLECharacteristic::PROPERTY_WRITE | BLECharacteristic::PROPERTY_READ),
-nodeinfo - read this to get a series of node infos (ending with a null empty record), write to this to restart the read statemachine that returns all the node infos
-
-meshRadioCharacteristic("b56786c8-839a-44a1-b98e-a1724c4a0262", BLECharacteristic::PROPERTY_WRITE | BLECharacteristic::PROPERTY_READ),
-radio - read/write this to access a RadioConfig protobuf
-
-meshOwnerCharacteristic("6ff1d8b6-e2de-41e3-8c0b-8fa384f64eb6", BLECharacteristic::PROPERTY_WRITE | BLECharacteristic::PROPERTY_READ)
-owner - read/write this to access a User protobuf
-
-Re: queue management
-Not all messages are kept in the fromradio queue (filtered based on SubPacket):
-* only the most recent Position and User messages for a particular node are kept
-* all Data SubPackets are kept
-* No WantNodeNum / DenyNodeNum messages are kept
-A variable keepAllPackets, if set to true will suppress this behavior and instead keep everything for forwarding to the phone (for debugging)
-
+See bluetooth-api.md for documentation.
 */
 BLEService *createMeshBluetoothService(BLEServer *server)
 {