Just got the latest news about the Continua Alliance and their final choice for a standardized wireless protocol. The results were…mixed? It turns out that Continua created two categories that they judged the protocols on: LAN and PAN. For the LAN, Zigbee was chosen, which was pretty much a given since the other competitors either were not an open standard or didn’t have multi-hop capabilities. For the PAN, Bluetooth Low Energy was chosen, which was a bit of a surprise. The reason I was surprised is that the final Bluetooth Low Energy spec will not be available until January of next year according to the roadmap, and the main killer, in my eyes, is that it won’t be able to leverage the existing Bluetooth Health and Fitness Device profile. This would mean that they need to develop a new spec, a medical device profile based on the new spec, and the compliance and compatibility testing specs before the first BLE-based Continua device can even hit the market. If anyone is wondering why it took Zigbee five years to make it this far, there's your answer. Standards committees are just not known for moving at lightning speed, which is why you can hear all the rusted gears groaning as Obama whips the US smart grid initiative to create a set of open standards.

After some thought and a bit of wine, I’ve started to think this might have been the best decision they could have made, although probably not deliberately. I think that from the beginning, they should have come out with an upper layer spec with generic hooks for defining the transport. After that, the market could decide which transport technology would prevail. Instead, they tried to standardize on one complete protocol and ended up hosting a carnival of competing wireless sensor network lackeys, each spouting marketing poop about the superiority of their protocols. The whole process seemed to be flawed from the beginning, where they had the demonstration in Barcelona and each protocol's backers were allowed to score themselves on how well they did. Uhhh...can you say kindergarten? Of course everyone gave themselves high marks, with Bluetooth Low Energy giving their protocol a perfect score, even though no protocol is available yet.

Continua then had the general vote in which Zigbee won both LAN and PAN. Again I was surprised that Zigbee won the PAN. Hell, with such a weak group of PAN competitors, I think nobody should have won the PAN.

Anyways, it ended up coming down to the Continua Technical Working Group vote which I believe was held on May 18th. Since the technical working group vote held the final say, the other two votes, the self-scoring one in Barcelona and the general vote, were basically meaningless. I think by this time, the technical group realized the farce that was going on and got cold feet about committing to one protocol. There was a lot of risk involved if only BLE was chosen because any delay in the spec development would also delay Continua. There was also risk involved if Zigbee won, since there is still a chance that Zigbee, or any other protocol for that matter, might not gain enough popularity to overcome the critical mass required to be ubiquitous.

So finally, if I may be so bold as to assert my mind-reading abilities on the Continua Technical Group, they felt that choosing any one protocol entailed too much risk and decided to make a decision that contained no commitment, ie: no risk. I would have done the same, or perhaps even put on my Fred Flintstone Water Buffaloes hat and declared it all a big practical joke, where the field was open to anyone all along.

In my opinion, choosing two protocols really means that no protocols were chosen. Although it looks like BLE is claiming victory (uhhh...by association with Bluetooth?), they have a mountain of problems they need to deal with themselves, the first of which is that they need a spec. And of course, it’s still up in the air about whether they’ll get the volume cell phone design-ins that they are predicting. See sidebar which is actually located at the bottom.

So now, we've gone full circle and we’re back at the original starting point. Continua-based devices and wireless medical monitoring will finally end up being decided by the market. Hopefully, this issue can be put to rest and Continua, Zigbee, and BLE can get back to doing what they are supposed to do…pump out those specs…

Let me sidetrack a bit and drop some historical knowledge…The USB-IF came up with a spec about 6-7 years ago called USB On-The-Go which allowed low-power, peer-to-peer communications between cell phones via a slightly watered down version of USB. The spec development and working group was mainly sponsored by Qualcomm, but had many big name manufacturers involved. They predicted that dual-mode chips, where the ICs would support either USB or USB On-The-Go, would inherit the ubiquity of USB. And of course the main selling point was that all companies would use hardware that supported dual-mode chips since the prices would be roughly the same as regular USB chips. Of course, purchasing agents for the manufacturers didn’t see it that way, because they know what chip costs are all about. If a chip could support dual-mode and be the same price as a regular USB chip, then that means the USB chip was overpriced since the dual-mode chip consumed more die area. Hence, they used that as a leverage point to drive down the cost of regular USB chips. The engineers also balked at needing to have two separate stacks inside the devices to support both protocols, and also at the fact that they’d need to pay extra to license the second stack. And there you have it…the dual-mode, ubiquity-inheriting business model went down in flames. Now if you read the BLE propaganda, they are touting exactly the same thing. If you replace Qualcomm with Nokia, it looks like almost an exact replay of what happened previously. It's like watching bad sitcom reruns on the USA channel...not counting MacGyver.

I thought I would spend this portion of the series discussing some of the more detailed parts of the data path. Since the tree and mesh routing is already explained in other articles, I’d like to talk about the other two forwarding methods: broadcasting and the neighbor table.

For those new to the Zigbee spec and trying to implement the broadcast functionality, it can be pretty confusing. At least it was for me when I had to figure it out, so hopefully this can help clear some of the haze. Broadcasting plays an important role in Zigbee and is used for many functions. Two of the most prominent are route discovery and group transmissions. Route discovery is the process of locating a path to a destination address whose route is unknown. Zigbee uses a modified form of AODV (Ad-hoc On-demand Distance Vector) which is just fancy terminology for “flood the network with pings until you hit the destination address”. The flooding part occurs by broadcasting route requests and having them propagate through the network until the destination is reached. Group transmissions are a method of transmitting data to all devices within a certain group. A broadcast is used to transmit the data and the frame will be discarded by any devices that don’t belong to the group. Along with those functions, there are numerous other smaller functions that utilize broadcasts in both the ZDO (Zigbee Device Object or endpoint 0 on all Zigbee devices) and the ZCL (Zigbee Cluster Library).

To understand broadcasts, it might make sense to discuss some of the different device types a bit. There are three types of Zigbee devices: the coordinator, routers, and end devices. The coordinator is just a router that starts the network. It always has a network address of 0 and mainly performs the function of scanning the network and selecting the channel and ID for the network. A router is a device that has the capability to forward frames and is usually able to accept child devices. An unfortunate attribute of routers and coordinators in Zigbee is that they’re unable to sleep, which is a standard complaint among many people that are investigating using Zigbee for wireless sensor networking. This limitation means that Zigbee routers usually need to be attached to a mains power supply.

An end device has no resources to forward frames and can only join and communicate with a parent router. The simplified communication capabilities allow most of the MAC, NWK, and APS management functions to be stripped out and should result in a very small memory footprint. Sleepy end devices are able to be duty-cycled, where they sleep most of the time and awaken periodically to poll their parent for any buffered messages. They use 802.15.4 indirect transmission for the polling, which is discussed in more detail in my 802.15.4 series. Duty cycling the end device allows it to consume very little power, thus increasing the battery life, which is one of the most important factors in wireless sensor networking.

So anyways, back to broadcasting. The reason I discussed the device types is because you can target your broadcasts based on the device type. There are four broadcast addresses that can be used depending on your broadcast audience:

Address   Audience
0xFFFF    All Devices
0xFFFD    All Devices with Receiver on Permanently
0xFFFC    Routers and Coordinators
0xFFFB    Low Power Routers

Just a note, although low power routers are specified, I haven’t heard of any actual implementations of them yet. Feel free to correct me on this.
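
The audience rules in the table can be sketched as a small filter. This is only an illustration and not code from any particular stack: the function name and the flags describing the local device are made up for the example.

```python
# Broadcast addresses from the table above.
BCAST_ALL = 0xFFFF
BCAST_RX_ON_WHEN_IDLE = 0xFFFD
BCAST_ROUTERS = 0xFFFC
BCAST_LOW_POWER_ROUTERS = 0xFFFB


def accepts_broadcast(dest_addr, is_router, rx_on_when_idle,
                      is_low_power_router=False):
    """Return True if a device with these (hypothetical) flags is in the audience."""
    if dest_addr == BCAST_ALL:
        return True
    if dest_addr == BCAST_RX_ON_WHEN_IDLE:
        return rx_on_when_idle
    if dest_addr == BCAST_ROUTERS:
        return is_router  # the coordinator counts as a router here
    if dest_addr == BCAST_LOW_POWER_ROUTERS:
        return is_router and is_low_power_router
    return False  # not a broadcast address at all


# A sleepy end device is only in the audience of the "all devices" address.
sleepy_all = accepts_broadcast(BCAST_ALL, is_router=False, rx_on_when_idle=False)
sleepy_rx_on = accepts_broadcast(BCAST_RX_ON_WHEN_IDLE, is_router=False,
                                 rx_on_when_idle=False)
```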

Transmitting a network broadcast frame in Zigbee actually sets off a chain of events. If a new broadcast is received, either from another device or from a higher layer, a broadcast transaction record is created. If the frame was received from another device, a copy of the frame is also made and sent up to the next layer for processing.

The broadcast transaction record is used to track the source address and sequence number of the broadcast. These two pieces of information are used to uniquely identify a broadcast frame. This is important because once the broadcast frame is forwarded, all neighbors within earshot will re-send the broadcast frame and you’ll get multiple copies of it. As long as you have the broadcast transaction record, you’ll know that you’ve already received and processed the frame so you can discard the copies.

Broadcast Transaction Record Entry:

Field             Description
Src Address       16-bit Network Address of Broadcast Initiator
Sequence Number   Network Layer Frame Sequence Number
Expiration Time   Amount of Time Before This Entry Expires
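
To make the duplicate check concrete, here is a rough sketch of a table keyed on source address and sequence number. The class name, the method name, and the 9 second lifetime are my own placeholders for illustration, not values taken from the spec or from any stack.

```python
class BroadcastRecords:
    """Remembers (source address, sequence number) pairs until they expire."""

    def __init__(self, lifetime_s=9.0):  # lifetime is an assumed value
        self.lifetime_s = lifetime_s
        self.records = {}  # (src_addr, seq_num) -> expiration time

    def seen_before(self, src_addr, seq_num, now):
        """Return True for a duplicate; otherwise record the new broadcast."""
        # Drop expired entries so the table does not grow forever.
        self.records = {k: t for k, t in self.records.items() if t > now}
        key = (src_addr, seq_num)
        if key in self.records:
            return True
        self.records[key] = now + self.lifetime_s
        return False


table = BroadcastRecords()
first = table.seen_before(0x1234, 7, now=0.0)   # new broadcast, gets recorded
echo = table.seen_before(0x1234, 7, now=1.0)    # copy relayed by a neighbor
later = table.seen_before(0x1234, 7, now=20.0)  # old record has expired
```

The first call creates the record, the second is recognized as a copy and can be discarded, and the third is treated as new again because the entry timed out.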


The record that was created actually goes into a table called the Broadcast Transaction Table, or BTT. The BTT implements what’s called a passive acknowledgement system, and is used to ensure that all known neighbors have received the broadcast sent by the device. As I mentioned previously, when a broadcast is transmitted, all devices that receive it will broadcast a copy. Each time a copy of the broadcast arrives, the address of the sender will be added to the BTT to mark that it has relayed the broadcast. After a broadcast timeout, if some neighbors still haven’t relayed the broadcast, meaning they aren’t present in the BTT, then the original sender will need to do a broadcast retry. This repeats until the max retries (usually 3) is reached or all the neighbors show up in the BTT.
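
A toy version of the passive ACK bookkeeping might look like the following. It is a sketch under a few assumptions: the names are invented, the neighbor set passed in contains only the neighbors that are expected to relay, and the timeout itself is driven by some timer that isn't shown.

```python
class PassiveAckTracker:
    """Tracks which neighbors have been overheard relaying one broadcast."""

    def __init__(self, neighbors, max_retries=3):
        self.waiting_on = set(neighbors)  # neighbors not yet heard from
        self.retries_left = max_retries

    def heard_relay(self, sender_addr):
        """A copy of our broadcast came back from this neighbor."""
        self.waiting_on.discard(sender_addr)

    def on_timeout(self):
        """Return True if the broadcast should be sent again."""
        if not self.waiting_on or self.retries_left == 0:
            return False
        self.retries_left -= 1
        return True


tracker = PassiveAckTracker({0x0001, 0x0002})
tracker.heard_relay(0x0001)
retry_needed = tracker.on_timeout()   # 0x0002 is still silent
tracker.heard_relay(0x0002)
retry_again = tracker.on_timeout()    # everyone has relayed
```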

Apologies if it might sound a bit confusing. I had to read the broadcast section more than a couple of times and actually wrote some simple simulation programs to gain an understanding of the behavior and the passive ACK mechanism.
 
The problem with the BTT is that its size is not very deterministic. If there are a lot of neighbors, ie: the network is dense, then the broadcast transaction table has the potential to become large, eating into the RAM. Thus there is an option in the Zigbee specification to forgo the broadcast transaction table. The tradeoff is that the device will need to broadcast the frame the maximum number of retries for any broadcast. At first glance, this would be desirable because you can get rid of the BTT, which has an unknown number of entries. However, this also means that each broadcast will be retried three times (the default retry number), taking a toll on all devices on the network. Each received frame, whether a duplicate or not, requires RAM since it needs to get to the network layer before it can be checked and discarded. If many devices on the network have no broadcast table, then each broadcast would generate a huge amount of traffic, possibly triggering some devices to run out of memory. So when you see warnings in the software documentation that broadcast transmissions should be used sparingly, they’re usually referring to the fact that a broadcast storm may chew up the available RAM in a node.

Since the mesh and tree routing mechanisms as well as the broadcasts have been covered, it's time to discuss the final method of data forwarding, which is the neighbor table. The neighbor table contains a list of the devices that are within transmission/reception range and provide a convenient single hop transmission to the destination. It is also used during the discovery or rejoin process to see if the joining device was previously a child of the node. According to the specification, the neighbor structure contains both mandatory and optional fields. However in actual usage, the optional fields are required since they will be needed by some of the ZDO functions. Just a little gotcha for those implementing their own stack:

Neighbor Table Entry - Mandatory

Field              Description
Extended Address   64-bit device address
Network Address    16-bit network address
Device Type        Coordinator, router, end device
Rx On When Idle    Flag to mark sleepy end devices
Relationship       Parent, child, sibling, no relationship
Transmit Failure   Transmission failure counter
LQI                Link quality indicator
Outgoing Cost      Cost of outgoing link as measured by neighbor. Only required if symmetrical links are used.
Age                Time since link status command was received. Only required if symmetrical links are used.

Neighbor Table Entry - Optional (Not)

Field              Description
Extended PAN ID    64-bit unique PAN ID
Channel            Operating channel
Depth              Tree depth of device
Beacon Order       802.15.4 Beacon Order. Zigbee uses no beacons so this should be 0x0F
Permit Joining     Flag to indicate whether device is accepting join requests
Potential Parent   Flag to indicate if neighbor satisfies parent criteria

The neighbor table is initially populated during device discovery when a device is searching for a parent to join to get on the network. I’ll discuss the join procedure later when I get into the network management side of things. Anyways, when a device tries to join a network, it will first perform a device discovery where it broadcasts a beacon request. All routers within earshot will respond with a beacon frame containing information about themselves. This information will get stored into the neighbor table. Unfortunately, the spec is a bit light on details about populating the neighbor table after the initial join procedure. In order to keep the table up-to-date, any beacons that are seen should be compared to the neighbor table and added to it if an entry doesn't exist.
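
As a rough sketch of that beacon handling, the snippet below keeps the neighbor table as a dictionary keyed on network address and only uses a handful of the fields from the tables above. The structure, the function name, and the choice to refresh LQI and the permit joining flag on an existing entry are my own assumptions, since the spec leaves this part open.

```python
from dataclasses import dataclass


@dataclass
class NeighborEntry:
    # A small subset of the fields listed in the tables above.
    nwk_addr: int
    ext_pan_id: int
    channel: int
    depth: int
    permit_joining: bool
    lqi: int
    relationship: str = "none"


def on_beacon(neighbor_table, beacon):
    """Add the beacon's sender to the table, or refresh it if already known.

    neighbor_table: dict mapping nwk_addr -> NeighborEntry
    beacon: dict of already-parsed beacon fields (hypothetical format)
    """
    entry = neighbor_table.get(beacon["nwk_addr"])
    if entry is None:
        entry = NeighborEntry(**beacon)
        neighbor_table[entry.nwk_addr] = entry
    else:
        # Assumed policy: update the values most likely to have changed.
        entry.lqi = beacon["lqi"]
        entry.permit_joining = beacon["permit_joining"]
    return entry


neighbors = {}
beacon = {"nwk_addr": 0x0001, "ext_pan_id": 0x1122334455667788,
          "channel": 15, "depth": 1, "permit_joining": True, "lqi": 200}
on_beacon(neighbors, beacon)
on_beacon(neighbors, dict(beacon, lqi=120, permit_joining=False))
```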

Well, that kind of takes care of most of the main points of the network layer’s data path. Within the Zigbee stack, or even the full protocol stack including 802.15.4, I’d say that the network layer data path is the most complex. Next up should be the network management which includes device discovery, joining, leaving, and network maintenance.

I left off last time explaining the transmit data path side of the Zigbee networking layer. The receive data path is fairly similar, but there are some minor complications.

When a frame arrives over the air, the radio driver will take it out of the buffer and store it somewhere. It should then signal the next higher layer (in this case the MAC) to retrieve the frame.

Incidentally, this part of incoming data handling is common to just about every protocol stack. I refer to it as the “launch” although I don’t know what the formal term is. I call it the launch because it’s the entry point of data into your stack. In general, most stacks that I’ve encountered handle things the same way where the hardware signals data arrival with an interrupt, the data is read out of the hardware buffers by the driver, it’s put into a holding area (list, queue, circular buffer),  and then the next layer is signaled to pick it up when it can. This way, the ISR is kept short and frame dropping is minimized in the case of heavy traffic.
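
Here's a bare-bones simulation of the launch pattern. Python obviously has no real interrupts, so the `isr` method just stands in for the interrupt handler, and the class name and queue depth are arbitrary choices for the example.

```python
from collections import deque


class RxLaunch:
    """Holding area between the radio driver and the next layer up."""

    def __init__(self, depth=4):  # queue depth is an arbitrary choice
        self.depth = depth
        self.queue = deque()
        self.dropped = 0

    def isr(self, hw_buffer):
        """Stand-in for the radio interrupt: copy the frame out and return."""
        if len(self.queue) >= self.depth:
            self.dropped += 1  # holding area is full, so the frame is lost
            return
        self.queue.append(bytes(hw_buffer))

    def poll(self, mac_handler):
        """Called from the main loop: hand queued frames to the next layer."""
        handled = 0
        while self.queue:
            mac_handler(self.queue.popleft())
            handled += 1
        return handled


launch = RxLaunch(depth=2)
for frame in ([1], [2], [3]):   # three frames arrive back to back
    launch.isr(frame)
received = []
handled = launch.poll(received.append)
```

The interrupt side does nothing but copy and queue, so all of the real processing happens later in `poll`. With a depth of 2, the third frame in the burst is dropped, which is the kind of loss a deeper holding area is meant to reduce.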

Anyways, back to the story. After the launch, the MAC layer would pick the frame up from its holding area, strip off and decode the MAC header, and then if it’s a MAC data frame, pass it up to the NWK layer.

The frame reaches the NWK layer via the MAC’s “Data Indication” service. Once in the network layer, the network header will get stripped off and processed. This is kind of the generic way to handle an incoming Zigbee frame and I think most stacks will do something fairly similar. From here, it pretty much depends on the implementation so I’ll describe how I handle the network frame processing in the FreakZ stack.

After I have the network header parsed and tucked away neatly inside a structure, the first thing that happens is that the frame needs to be decoded according to its type. There are only two frame types at the network layer: data frames and command frames. The frame type is contained in the header information so it’s easy to figure out what it is. In the case of a data frame, there are three cases that need to be handled:

The first case is the easiest one where the frame is meant for us. To determine this, you need to check the destination network address and see if it matches the device’s network address. As I mentioned in part 1, the destination address in the network header is absolute. So if the destination network address matches our network address, then we can be sure that we’re the final destination for this frame and just send it up to the next layer.

If the destination address doesn’t match our address, then the next thing I do is check to see if it’s a broadcast. If it’s a broadcast, then we need to initiate a special sequence of events because broadcasts need to be handled carefully. The dangerous thing about them is that broadcasts can grow exponentially if they’re not handled properly and they’ll quickly overwhelm the memory of your devices, causing your network to crash. I’ll be discussing broadcasts in more detail in the later parts where I examine each of the network services. Also, since a broadcast hop is still considered a hop, we need to decrement the frame's radius value. If you remember from part 1, the radius is used to limit the max number of hops a frame can travel. Basically any time a frame needs to be re-transmitted, the radius value will be decremented.

And finally, if we’re not the final destination for the frame, and it’s not a broadcast frame, that means that we’re just a hop on its way to the final destination. From there, the sequence of events is similar to the transmit data path discussed in part 1. We need to find the next hop address to send the frame, forward it there, and decrement the radius counter.

Since we have a commonality like this, I made a simple optimization where both the transmit and the receive side share the same forwarding function. 
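
The three data frame cases boil down to something like the sketch below. It isn't the FreakZ code; the function name and return format are invented, the broadcast audience and duplicate checks are left out, and the rule that a frame whose radius drops to zero is not re-transmitted is my reading of the radius handling, so check it against the spec.

```python
BROADCAST_ADDRS = {0xFFFF, 0xFFFD, 0xFFFC, 0xFFFB}


def handle_rx_data(dest_addr, radius, my_addr):
    """Classify an incoming NWK data frame.

    Returns (deliver_up, relay, radius_out) where relay is None,
    "rebroadcast", or "forward".
    """
    if dest_addr == my_addr:
        # Case 1: we are the final destination. The radius is not touched.
        return (True, None, radius)

    if dest_addr in BROADCAST_ADDRS:
        # Case 2: broadcast. A copy goes up and the frame may be relayed.
        deliver_up, relay = True, "rebroadcast"
    else:
        # Case 3: we are just a hop along the way.
        deliver_up, relay = False, "forward"

    # The radius is only looked at when the frame would be re-transmitted.
    radius_out = radius - 1
    if radius_out <= 0:
        relay = None  # assumed rule: out of hops, so don't re-transmit
    return (deliver_up, relay, radius_out)


for_us = handle_rx_data(0x0001, radius=5, my_addr=0x0001)
bcast = handle_rx_data(0xFFFF, radius=5, my_addr=0x0001)
passing = handle_rx_data(0x0002, radius=5, my_addr=0x0001)
```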

Zigbee Rx Data Path

In the case that the incoming frame is a command frame, then I just decode it based on the command ID. This will always be the first byte of the payload so it's fairly easy to get this value. From there, it’s just handled according to the management function that it needs to accomplish. The command frames in the network layer handle common network maintenance tasks like mesh route discovery, link maintenance, and rejoining and leaving the network. I’ll be going into this in more detail later on as well.
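
A dispatch table is one simple way to do the decoding. In this sketch the handlers are dummies and the command ID values are written from my memory of the Zigbee 2007 spec, so treat them as assumptions and verify them before relying on them.

```python
# Command IDs as I recall them from the spec (assumption, please verify).
CMD_ROUTE_REQUEST = 0x01
CMD_LEAVE = 0x04
CMD_LINK_STATUS = 0x08


def dispatch_nwk_command(payload, handlers):
    """Look at the first payload byte and call the matching handler.

    Returns the handler's result, or None for an unknown command ID.
    """
    if not payload:
        raise ValueError("command frame has no payload")
    cmd_id = payload[0]
    handler = handlers.get(cmd_id)
    if handler is None:
        return None  # unknown command, silently ignored in this sketch
    return handler(payload[1:])


# Dummy handlers that just report what they were given.
handlers = {
    CMD_ROUTE_REQUEST: lambda body: ("route_request", len(body)),
    CMD_LEAVE: lambda body: ("leave", len(body)),
    CMD_LINK_STATUS: lambda body: ("link_status", len(body)),
}

result = dispatch_nwk_command(bytes([CMD_LEAVE, 0x40]), handlers)
unknown = dispatch_nwk_command(bytes([0x7F]), handlers)
```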

Hmmm…that went by a lot quicker than I expected. That's disappointing because it seemed much more difficult when I was writing it...

Updated 2009-05-13: Thanks to some of the feedback from the readers as well as certain Zigbee spec authors, I changed how I discussed the radius handling. I previously had the radius decremented and checked at the entry point of the NWK rx function. However I changed it so that the radius is only checked if the frame will be re-transmitted. This is in compliance with the wording in the Zigbee specification on how the radius should be handled. The code was modified as well to reflect this.

I've been meaning to get around to this for a while now, but things have been so busy that I just haven't had the time. Or maybe I was just too lazy. In any case, today is a good time to kick off a series detailing the Zigbee NWK layer, mostly because I caught a cold and it pretty much rendered me useless for any heavy thinking. 

I know it's a bit late to start a series on the Zigbee NWK layer after they announced that they were transitioning to IP. However, I think that it'd be a good study on how Zigbee stands right now, where you'll be able to see the strengths and weaknesses of the networking protocol. Hopefully, you'll also be able to see what role IP will play in it, and what areas are and aren't a good fit for IP.

So here we go…

The Zigbee 2007 Networking Layer - Part 1

The Zigbee networking layer is probably one of the most complex layers in the protocol stack. Mostly it's because they spent a lot of time outlining all types of different conditions, routing methods, a bunch of tables, etc since it's the backbone of the transport part of the stack. Since it's quite a large topic, I think the best way to go is to divide it up into parts and dissect each part individually.

The first cut will be to split the layer into two sections: the data path and the management path. Both sides are fairly complex, but if you understand the data path well, then you can probably see why the management side is also complicated.

Below, you can see a very simple diagram of how the network layer looks from a data point of view. Data to be transmitted can flow down from an upper layer, received data can come up, and data can do a U-turn where received data comes up, gets analyzed, and then gets retransmitted (forwarded) to its next hop destination.

Zigbee Simple Network

On the transmit side, the Application Sub Layer (APS) sends data down to the network layer via a data request. Inside the data request, the APS specifies the destination address, whether or not route discovery will be allowed, and the radius. The radius is the maximum hops that the frame will be allowed to travel. This is mostly used in broadcasts where you might not want the broadcast to travel too far, or it's usually set to a default value to prevent endless hopping in case there's a loop in the routing tables.

Once inside the data request, the network header is filled out and then the address is examined. At this point, it's probably best to make a distinction between network addressing and MAC addressing. There are two sets of addresses in each Zigbee frame: the 802.15.4 source/destination addresses and the Zigbee network source/destination addresses. Actually, sometimes only the source or the destination is present depending on the situation, but that's another article.

The key to understanding the Zigbee routing algorithm is to understand how both addresses are used. The network source or destination address is what I'll call the absolute source or destination. This means that if a frame originated from address 0x1234, this will be the source address in the networking field no matter what. The same goes for the networking destination. On the other hand, the MAC address is used as a next hop address which could just be a stepping stone on the way to the destination. It's like if I wanted to go to Akihabara. I know where I came from, and where I want to go, but from a microscopic perspective, I have a couple of interim destinations (train platforms) I need to get to before I reach Akihabara. And one more thing about addressing, for the purpose of this series, I'm only going to be referring to the 16-bit short addresses used by the MAC layer. The MAC actually has a 64-bit address as well, but it would just confuse things.
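
The difference between the two address pairs is easy to see in a toy example. The frame here is just a dictionary with made-up field names, and `relay` is a hypothetical helper, not a function from any stack.

```python
def relay(frame, my_short_addr, next_hop_addr):
    """Return the frame as it would leave this node on its next hop."""
    hop = dict(frame)
    hop["mac_src"] = my_short_addr   # rewritten at every hop
    hop["mac_dst"] = next_hop_addr   # rewritten at every hop
    return hop                       # nwk_src and nwk_dst are left alone


# 0x1234 sends to 0x5678 by way of 0x0001 and then 0x0002.
frame = {"nwk_src": 0x1234, "nwk_dst": 0x5678,
         "mac_src": 0x1234, "mac_dst": 0x0001}
hop2 = relay(frame, my_short_addr=0x0001, next_hop_addr=0x0002)
hop3 = relay(hop2, my_short_addr=0x0002, next_hop_addr=0x5678)
```

At every hop the network addresses still say 0x1234 and 0x5678, while the MAC addresses only describe the current leg of the trip.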

Zigbee Addressing

Okay, so now where was I…ahhh yes, how the network data request processes the addresses. Inside the network layer on the transmit side, you only have one piece of real information which is the destination address. From that you'll have to figure out the next hop address for the MAC. You can say that most of the network layer exists just to figure out what the next hop address will be.

To figure out the next hop, you have to go through a series of comparisons. Although the order is dependent on the stack, I'll refer to the order that I use. The first comparison is to see if the destination is a broadcast. Zigbee has a few classes of broadcasts: broadcast to everyone, broadcast to routers, and broadcast to non-sleeping nodes. Upon checking the spec again, I see that there is a broadcast for low power routers as well, although I haven't seen this used in the spec.

I'll go into broadcasting in more detail later, but suffice it to say that if the destination address falls into the broadcast category, then the next hop address is simple…it will be the 802.15.4 universal broadcast address which is 0xFFFF. Case closed, ship that frame out to the MAC…errr…kind of. You'll see later on that broadcasting isn't that simple. Precautions need to be taken because you can easily crash a network with a poor broadcast.

Moving right along, if the destination address isn't a broadcast, then you need to determine how to get it to where it's supposed to go. I check the neighbor table first to see if the address matches any destination inside of it. The neighbor table, as you might imagine, is a list of all the neighbors that are within earshot of the node. I'll describe this in more detail later on as well, however this is the next logical place to check for the next hop. If the address is inside your neighbor table, then you can get the frame to its destination in one hop and don't have to go through all the fancy, shmancy routing algorithms.

If it's not inside your neighbor table, then you have to resort to forwarding it to an intermediate node on its way to the final destination and this is where things might get a little complicated. I check my routing tables first because if the destination address is inside my routing tables, the routing table will include the next hop address and I can just send it out. However if it's not inside the routing tables, then you have to decide if you want to do a route discovery or if you want to route it along the tree (my current stack supports tree routing as well...this won't be the case if you have a Zigbee Pro implementation which only supports the mesh routing).

The tree routing is the last resort for me so the next thing I do if I can't find the destination is to check to see if the frame will allow route discovery. Route discovery takes a bit of time, and also floods the network with broadcast frames so there may be some instances that you might not want to do this. However if the route discovery flag is true, then I buffer the frame in a holding queue and kick off a route discovery to find the destination within the network. For more information on this process, check out the article I wrote a long, long time ago on Zigbee mesh routing.

And finally, if I can't do a route discovery, then I just route it along the tree, using the algorithm outlined in the Zigbee spec. Again, for more information about tree routing, as well as why it sucks, check out this article I wrote on it a few months ago.

If all hell breaks loose and you can't do route discovery or tree routing (your upper layers are sadistic, pointy-haired bosses), then you just give up and return an error status in the confirmation.
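
Putting the whole chain of comparisons together gives something like the sketch below. It follows the order I described, but the function, its arguments, and the return values are invented for illustration, and the tree routing algorithm is passed in as a callback rather than implemented.

```python
MAC_BROADCAST = 0xFFFF
NWK_BROADCASTS = {0xFFFF, 0xFFFD, 0xFFFC, 0xFFFB}


def pick_next_hop(dest, neighbors, routes, discovery_allowed, tree_route=None):
    """Decide what to do with an outgoing frame.

    neighbors: set of short addresses within earshot
    routes: dict mapping destination -> next hop address
    tree_route: optional function(dest) -> next hop, if tree routing is supported
    Returns (action, next_hop) where action is "send", "discover", or "error".
    """
    if dest in NWK_BROADCASTS:
        return ("send", MAC_BROADCAST)
    if dest in neighbors:
        return ("send", dest)          # one hop, no routing needed
    if dest in routes:
        return ("send", routes[dest])  # known route
    if discovery_allowed:
        return ("discover", None)      # buffer the frame, start route discovery
    if tree_route is not None:
        return ("send", tree_route(dest))
    return ("error", None)             # give up, report it in the confirm


neighbors = {0x0001, 0x0002}
routes = {0x0050: 0x0002}
via_route = pick_next_hop(0x0050, neighbors, routes, discovery_allowed=True)
needs_discovery = pick_next_hop(0x0099, neighbors, routes, discovery_allowed=True)
no_luck = pick_next_hop(0x0099, neighbors, routes, discovery_allowed=False)
```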

Whew…that was quite an introduction to just the transmit side of the Zigbee network layer. Join me next time as I discuss the ever exciting receive side data path which is pretty much the same, but with an added twist.

I got an email this morning about the preliminary results of the Continua member company vote from the event in Barcelona. The numbers haven't been verified yet, so take them with a grain of salt, but at least all the readers of this blog can participate in the rumor mill...

There were two votes, one for the radio and protocol best suited for a mesh-routed LAN and one that's best suited for the body PAN. And the results are:

LAN vote:

                ANT   Bluetooth Low Energy   BodyLAN   Zigbee
Median          55    64                     49        78
Std Deviation   20    23                     17        26
Average         53    61                     46        69

PAN vote:

                ANT   Bluetooth Low Energy   BodyLAN   Sensium   Zigbee
Median          59    72                     54        38        74
Std Deviation   20    23                     19        14        26
Average         56    67                     51        38        69

 

Zigbee swept both categories!! BLE didn't have much chance on the LAN vote, but I was surprised that Zigbee actually took the PAN vote too. These numbers are from the member company vote. The next step is a vote by the Continua Technical Working Group. If Zigbee makes it past that, then they're in like Flynn.

After attending the BLE Developer's Preview, I was disappointed to hear that the BLE spec release is at least nine months away and that the Bluetooth Health Device Profile isn't compatible with it. In that case, Zigbee is far ahead of BLE in having a spec and at least being close to release on the Personal Home and Healthcare Device Profile. My old post on Continua and Zigbee was completely off target and I was wrong on both counts so I'm very happy that Zigbee did well in Continua!