It's been a pretty busy week for me. Been writing code all week, but still struggling to balance coding, blogging, and the j-o-b.

I got the mesh routing functions running and wrote some basic tests for it. The tests took longer than expected to write because the mesh routing consists of multiple steps. For example, to test route forwarding on an incoming packet, I had to generate an incoming frame, wait for the route request to come out of the stack, process the request in the test, generate a route reply, and then wait for the final forwarded frame to come out. What a pain in the ass. But without that test, it would have been hell debugging the code in a live system.

Since I had to parse so many headers, my eyeballs got tired and I decided to implement an automatic header checking system. I wrote a simple checking engine that you load with the expected values and when the frame comes in, it'll parse it and make sure that the header values are what I expected. I also implemented it so that it would check the headers in the MAC, NWK, and APS layers separately.
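
Here's a minimal sketch of what such a checking engine might look like. This is not the actual FreakZ code. The names (check_expect, check_verify, layer_t) and the fixed table size are assumptions made up for illustration. The test loads the expected values per layer, and once the frame has been parsed, the parsed values get compared in the same order.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layer tags; the real stack may name these differently. */
typedef enum { LAYER_MAC, LAYER_NWK, LAYER_APS, LAYER_COUNT } layer_t;

#define MAX_FIELDS 8   /* assumed limit on checked fields per layer */

/* One expected header field: a label for error messages plus its value. */
typedef struct {
    const char *name;
    uint32_t    expected;
} field_check_t;

/* Per-layer expectations, loaded by the test before the frame arrives. */
typedef struct {
    field_check_t fields[MAX_FIELDS];
    int           count;
} layer_check_t;

static layer_check_t checks[LAYER_COUNT];

/* Forget all expectations (call at the start of each test). */
void check_reset(void)
{
    for (int i = 0; i < LAYER_COUNT; i++)
        checks[i].count = 0;
}

/* Load one expected value; returns false if the layer's table is full. */
bool check_expect(layer_t layer, const char *name, uint32_t expected)
{
    layer_check_t *lc = &checks[layer];
    if (lc->count >= MAX_FIELDS)
        return false;
    lc->fields[lc->count].name = name;
    lc->fields[lc->count].expected = expected;
    lc->count++;
    return true;
}

/* Compare parsed values (same order as loaded) against the expectations.
 * Returns the number of mismatches; a field-count mismatch counts as one. */
int check_verify(layer_t layer, const uint32_t *actual, int n)
{
    const layer_check_t *lc = &checks[layer];
    int mismatches = (n == lc->count) ? 0 : 1;
    int limit = (n < lc->count) ? n : lc->count;

    for (int i = 0; i < limit; i++) {
        if (actual[i] != lc->fields[i].expected) {
            printf("%s: expected 0x%lx, got 0x%lx\n", lc->fields[i].name,
                   (unsigned long)lc->fields[i].expected,
                   (unsigned long)actual[i]);
            mismatches++;
        }
    }
    return mismatches;
}
```

Keeping a separate table per layer is what lets the MAC, NWK, and APS headers be checked independently, so a test that only cares about the NWK header doesn't need to load anything for the other two.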

After finishing the simple routing simulator that I wrote in C, I was confident that my implementation of the routing algorithm would work. So I then started to integrate the code into the main Zigbee NWK layer. I had to make a couple of decisions, and as usual, there were tradeoffs associated with the decisions.

One of the decisions I had to make was how to handle route discovery. According to the spec, there are two ways to initiate route discovery. The first is that the route discovery service is invoked by a higher layer (APS). The second is that a frame arrives with the route discovery option set (in the NWK frame control header) and the destination address of the frame does not exist in that device's routing table. My problem was mostly with the second case, where a frame arrives with an unknown destination. How should it be handled?

There were a couple of options, but the two most obvious ones were:

  1. Keep the frame and initiate route discovery. Don't forward the frame until the route reply arrives (giving the route) or the route discovery times out.
    • Advantage: Simplest method. Consumes least resources (RAM, code size).
    • Disadvantage: Performance suffers. Cannot process other frames until current one is finished. Route discovery can potentially take up to 10 seconds. 
  2. Buffer the frame in a waiting queue and initiate route discovery. Continue processing other frames until a route reply arrives.
    • Advantage: No hit in performance. Seriously, 10 seconds is tooooo long.
    • Disadvantage: More complexity. Yet another queue that needs to be implemented. RAM and flash required to implement.
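
To make the tradeoff in option 2 more concrete, here's a rough sketch of what a waiting queue could look like. Everything in it is an assumption for illustration: the names (pend_add, pend_take, pend_tick), the queue depth, and the tick-based timeout are not taken from the stack itself.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PEND_MAX  4     /* assumed queue depth */
#define FRAME_MAX 127   /* largest 802.15.4 PHY payload */

/* One frame parked while route discovery runs for its destination. */
typedef struct {
    bool     in_use;
    uint16_t dest;            /* NWK destination still lacking a route */
    uint8_t  ttl_ticks;       /* ticks left before discovery is given up */
    uint8_t  len;
    uint8_t  data[FRAME_MAX];
} pend_entry_t;

static pend_entry_t pend[PEND_MAX];

/* Park a frame; returns false if it is too long or the queue is full. */
bool pend_add(uint16_t dest, const uint8_t *frame, uint8_t len,
              uint8_t ttl_ticks)
{
    if (len > FRAME_MAX)
        return false;
    for (int i = 0; i < PEND_MAX; i++) {
        if (!pend[i].in_use) {
            pend[i].in_use = true;
            pend[i].dest = dest;
            pend[i].ttl_ticks = ttl_ticks;
            pend[i].len = len;
            memcpy(pend[i].data, frame, len);
            return true;
        }
    }
    return false;
}

/* A route reply arrived for dest: pop one waiting frame into out.
 * Call repeatedly until it returns false to drain all waiting frames. */
bool pend_take(uint16_t dest, uint8_t *out, uint8_t *len)
{
    for (int i = 0; i < PEND_MAX; i++) {
        if (pend[i].in_use && pend[i].dest == dest) {
            memcpy(out, pend[i].data, pend[i].len);
            *len = pend[i].len;
            pend[i].in_use = false;
            return true;
        }
    }
    return false;
}

/* Periodic timer tick: age every entry and drop the ones whose route
 * discovery timed out. Returns the number of frames dropped. */
int pend_tick(void)
{
    int dropped = 0;
    for (int i = 0; i < PEND_MAX; i++) {
        if (!pend[i].in_use)
            continue;
        if (pend[i].ttl_ticks > 0)
            pend[i].ttl_ticks--;
        if (pend[i].ttl_ticks == 0) {
            pend[i].in_use = false;
            dropped++;
        }
    }
    return dropped;
}
```

The RAM cost mentioned in the disadvantage shows up directly here: every queue slot holds a full frame, so even a depth of four costs around half a kilobyte on a small microcontroller.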

Well, I have to admit it. I got sidetracked for a few days. My wife does a lot of acting in independent theater and it turns out that actors and actresses are not very good with computers (if they even own one). So they asked her if she could make the playbill and the pamphlet. Of course she volunteered me to do it so I've been spending the last couple of days brushing up on my Adobe Photoshop, Illustrator, and InDesign skills. With the little time I have between work and the site maintenance, I had to choose between the software and keeping my wife happy. That's a no-brainer. Earning brownie points lets me work in peace and quiet in the future. Otherwise, if I make her lose face in front of the other thespians, life's gonna be a living hell at home.

So when I could catch a few breaths, I was able to spit out the chip comparison guide which turned out pretty nice. That was actually a benefit of working with the Adobe tools. InDesign is a great piece of software for creating layouts, including tables. When I started having problems with the HTML tables for the chip comparisons, I decided to do it in InDesign and it was a complete breeze. It even output a nice PDF for me.

On to the development part of this post... 

The APS layer, also known as the application sublayer, is the top of the main data path in a Zigbee stack. The only things that lie above it are application objects, which are implementation specific. The APS layer handles data transmission and reception as well as table management. The main tables that are located in the APS layer are: binding, discovery cache, address map, and endpoint grouping tables. We'll discuss the tables later on when they get implemented. Right now, I'd just like to focus on reliable and unreliable data transmission.

The APS data service is the main vehicle for device discovery and management. The application objects use this service to communicate descriptors to each other, handle client/server communications, and perform remote management and provisioning. The data path for this layer can get a bit complicated due to the many options available for transmission and reception. The binding, grouping, and address tables are all used when building the frames. However on a fundamental level, there are only two types of transmissions: reliable and unreliable. Reliable transmission is transmission which requires an APS level acknowledgement from the destination device. Unreliable transmission is when you just send it out and you don't care if it arrives or not.

The IEEE 802.15.4 spec already implements a form of acknowledgement, but this is a MAC level acknowledgement. Since most 802.15.4 radios implement the ACK in hardware, the 802.15.4 ACK just says that the frame was received into the chip's FIFO properly. The APS level ACK says that the frame was processed correctly as well, which is very important. Many things could go wrong between the radio and the APS layer. Some examples of a frame being dropped between reception and APS processing are:

1)    The received frame requires routing, but no route can be found for it.
2)    The received frame goes to a router with no routing resources and no tree routing ability.
3)    The MAC doesn't pull the received frame out of the chip's FIFO quickly enough and it gets overwritten by another frame.

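If a reliable transmission is needed, then an APS frame with the ACK required is probably the safest way to send it. However, it is also costly in terms of performance, since every data frame triggers an ACK frame coming back and a possible retransmission if that ACK doesn't show up.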
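
As a rough illustration of where that cost comes from, here's a small sketch of the bookkeeping a sender might do while waiting for an APS level ACK. The names (ack_start, ack_received, ack_tick) and the timeout and retry values are placeholders I picked for the example, not values from the Zigbee spec or from the stack.

```c
#include <stdbool.h>
#include <stdint.h>

#define ACK_WAIT_TICKS  2   /* assumed wait per attempt, in timer ticks */
#define ACK_MAX_RETRIES 3   /* assumed retry limit */

typedef enum { ACK_NONE, ACK_RETRANSMIT, ACK_FAILED } ack_action_t;

/* State for one outstanding reliable transmission. */
typedef struct {
    bool    waiting;
    uint8_t counter;       /* identifies the frame being acknowledged */
    uint8_t retries_left;
    uint8_t ticks_left;
} ack_wait_t;

/* Call right after sending a frame that requests an APS ACK. */
void ack_start(ack_wait_t *w, uint8_t counter)
{
    w->waiting = true;
    w->counter = counter;
    w->retries_left = ACK_MAX_RETRIES;
    w->ticks_left = ACK_WAIT_TICKS;
}

/* Call when an APS ACK arrives; returns true if it matches the frame
 * we are waiting on, in which case the transmission is complete. */
bool ack_received(ack_wait_t *w, uint8_t counter)
{
    if (!w->waiting || w->counter != counter)
        return false;
    w->waiting = false;
    return true;
}

/* Call on every timer tick; tells the caller what to do next. */
ack_action_t ack_tick(ack_wait_t *w)
{
    if (!w->waiting)
        return ACK_NONE;
    if (w->ticks_left > 0)
        w->ticks_left--;
    if (w->ticks_left > 0)
        return ACK_NONE;            /* still inside the wait window */
    if (w->retries_left > 0) {
        w->retries_left--;
        w->ticks_left = ACK_WAIT_TICKS;
        return ACK_RETRANSMIT;      /* caller resends the same frame */
    }
    w->waiting = false;
    return ACK_FAILED;              /* caller reports failure upward */
}
```

An unreliable transmission skips all of this: the frame goes out, the buffer is freed, and nothing is tracked afterwards.
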
With the buffer management system chosen and the frame buffer pool finished, the next job was to pass a frame up and down the stack. This meant that code needed to be written to process the headers.

There are three headers that need to be handled: the MAC, NWK, and APS headers. Going downstream, the headers needed to be built from the available information you pass into the function. Going upstream, the headers needed to be stripped off from the frame and processed so that they could be put into a structure for easy access.

Many of the people that work with TCP/IP stacks would laugh at putting the headers into a structure. That's because the fields in TCP and IP headers (and Ethernet headers for that matter) sit at fixed offsets. This means that you can just create a structure pointer (i.e., a pointer to an IP header struct) and point it at the start of the header in the buffer. That way, you can process all the fields in the buffer instead of copying them into a separate struct. Hope I didn't lose anyone with that explanation. You can save space by doing in-buffer processing because you don't need to use RAM to hold actual header structures. The 802.15.4 MAC header, on the other hand, changes length depending on the addressing modes set in its frame control field, so a single fixed struct can't simply be overlaid on the buffer.
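
For comparison, here's a minimal sketch of parsing an 802.15.4 MAC header into a structure. The struct layout and function names (mac_hdr_t, mac_parse_hdr) are my own assumptions for the example, and it ignores the auxiliary security header. The point is that the PAN ID and address fields only exist, and only have a given size, depending on bits in the frame control field.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 802.15.4 addressing modes as encoded in the frame control field. */
enum { ADDR_NONE = 0, ADDR_SHORT = 2, ADDR_EXT = 3 };

/* Parsed header; the layout is an illustration, not the stack's own. */
typedef struct {
    uint16_t fcf;          /* raw frame control field */
    uint8_t  seq;          /* sequence number */
    uint8_t  dest_mode;
    uint8_t  src_mode;
    uint16_t dest_pan;
    uint16_t src_pan;
    uint64_t dest_addr;    /* short addresses use the low 16 bits */
    uint64_t src_addr;
} mac_hdr_t;

/* Read an n-byte little-endian value. */
static uint64_t get_le(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    while (n > 0)
        v = (v << 8) | p[--n];
    return v;
}

static size_t addr_len(uint8_t mode)
{
    return mode == ADDR_SHORT ? 2 : mode == ADDR_EXT ? 8 : 0;
}

/* Parse the MAC header at buf. Returns the header length in bytes, or
 * 0 if the frame is truncated or uses a reserved addressing mode. */
size_t mac_parse_hdr(const uint8_t *buf, size_t len, mac_hdr_t *h)
{
    if (len < 3)
        return 0;
    h->fcf = (uint16_t)get_le(buf, 2);
    h->seq = buf[2];
    h->dest_mode = (uint8_t)((h->fcf >> 10) & 0x3);
    h->src_mode = (uint8_t)((h->fcf >> 14) & 0x3);
    bool pan_comp = ((h->fcf >> 6) & 0x1) != 0;  /* PAN ID compression */

    if (h->dest_mode == 1 || h->src_mode == 1)
        return 0;

    /* Work out how long this particular header is before reading it. */
    size_t need = 3;
    if (h->dest_mode != ADDR_NONE)
        need += 2 + addr_len(h->dest_mode);
    if (h->src_mode != ADDR_NONE)
        need += (pan_comp ? 0 : 2) + addr_len(h->src_mode);
    if (len < need)
        return 0;

    size_t pos = 3;
    h->dest_pan = h->src_pan = 0;
    h->dest_addr = h->src_addr = 0;
    if (h->dest_mode != ADDR_NONE) {
        h->dest_pan = (uint16_t)get_le(buf + pos, 2);
        pos += 2;
        h->dest_addr = get_le(buf + pos, addr_len(h->dest_mode));
        pos += addr_len(h->dest_mode);
    }
    if (h->src_mode != ADDR_NONE) {
        if (pan_comp) {
            h->src_pan = h->dest_pan;   /* source PAN ID omitted on air */
        } else {
            h->src_pan = (uint16_t)get_le(buf + pos, 2);
            pos += 2;
        }
        h->src_addr = get_le(buf + pos, addr_len(h->src_mode));
        pos += addr_len(h->src_mode);
    }
    return pos;
}
```

With short addresses and PAN ID compression the header comes out at 9 bytes, while two extended addresses with separate PAN IDs push it to 23, which is exactly why a fixed struct overlay doesn't work here.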

After I got the dummy functions in the data path going, I needed to create the data structures that would be traveling through this path. The data structures consist of a frame buffer pool and the request/indication parameter structures that conform to the Zigbee/802.15.4 specifications. It's at this point that you need to decide on the buffer management strategy. Yeah, I could just hear those yawns. I know buffer management strategy is not the most exciting thing in the world, but it's important. Really! So here's a brief and probably incomplete discussion on buffer management...
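
To give an idea of what a frame buffer pool involves, here's a minimal sketch of one common approach: a static array of fixed-size buffers with no malloc. The names (buffer_t, buf_get, buf_free, buf_prepend) and the pool size are assumptions for the example, and filling the buffer from the back so that headers can be prepended on the way down is just one possible strategy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_POOL_SIZE 4     /* assumed number of buffers */
#define BUF_SIZE      127   /* largest 802.15.4 PHY payload */

/* One frame buffer. Data is filled in from the back so that each layer
 * can prepend its header in front of what is already there. */
typedef struct {
    bool     in_use;
    uint8_t  len;             /* bytes of valid data */
    uint8_t *dptr;            /* start of valid data inside data[] */
    uint8_t  data[BUF_SIZE];
} buffer_t;

static buffer_t pool[BUF_POOL_SIZE];

/* Grab a free buffer, or NULL if the pool is exhausted. */
buffer_t *buf_get(void)
{
    for (int i = 0; i < BUF_POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = true;
            pool[i].len = 0;
            pool[i].dptr = pool[i].data + BUF_SIZE;  /* empty: at the end */
            return &pool[i];
        }
    }
    return NULL;
}

/* Return a buffer to the pool. */
void buf_free(buffer_t *b)
{
    if (b != NULL)
        b->in_use = false;
}

/* Reserve n bytes in front of the current data, e.g. for a header.
 * Returns a pointer to the reserved space, or NULL if it won't fit. */
uint8_t *buf_prepend(buffer_t *b, uint8_t n)
{
    if ((size_t)(b->dptr - b->data) < n)
        return NULL;
    b->dptr -= n;
    b->len = (uint8_t)(b->len + n);
    return b->dptr;
}
```

Going downstream, the APS, NWK, and MAC layers would each call something like buf_prepend in turn, so the frame ends up contiguous in memory without any copying between layers.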

I'm going to kick off the FreakZ development journal by going back in time a bit. I've been working on the stack with the Contiki OS for about a month now, in my spare time. One of the things that I always struggle with on a project is where to start. The problem with creating anything is that you start with nothing. You literally have to pull the design out of thin air.

On the Zigbee stack, this will be somewhat of a rewrite since I already had the driver, PHY, MAC, and part of the NWK layer going on my previous attempt. This gives me a partial head start. The standard way to start a design is usually by using one of the three methods: 

  • Top Down Design
  • Bottom Up Design
  • A mixture of Top Down and Bottom Up Design

My first attempt was basically a bottom up design. I started with the framework, followed by the driver, PHY, MAC, and NWK, roughly in that order. I found that, at least for this project, I started to run into the whole "can't see the forest for the trees" thing. I was spending a lot of time tweaking the MAC layer and driver to get the startup, association, management and data going, but I felt like I was losing sight of the upper layers.

I created a new category that will be used for the FreakZ development journal. Here, you'll see all my dirty thoughts while implementing the stack.

Continuing from Part 1…

At that point, I was pretty dejected thinking about all of the work that had to be done to define, debug, and tweak a new framework. One thing that I didn’t realize going into this was that it would require a lot of services that are provided by an operating system. That’s true in most cases of protocol stack design, which makes running any stack other than a trivial one on a microcontroller with no OS a challenge.

As an aside, the problem with using an operating system (say FreeRTOS) to implement a protocol stack is that you wouldn’t be able to run it on another operating system (say Linux) without significant porting effort. That’s because an OS requires memory to push the current context into when it performs a context switch. What that means is that you basically save all the CPU registers (program counter, etc…) and the stack variables on to memory allocated to that thread (old context), and load the contents of the next thread (new context) into the CPU.

The problem occurs when you try to run the stack under a different OS, like say Linux. You can’t run it natively since your stack has a threading structure and Linux has its own threading structure and mixing the two means almost certain death, if not some really nasty timing and memory conflicts. So normally, you would need to port the code on to the new OS in order to run it. That’s why many stacks come with an OS Abstraction Layer (OSAL) so that they can make it somewhat easier to port between OSes. However you still need to be familiar with the OS and its behavior to get it to run properly.

Okay, back to my story…

I’ve been talking about it for a while now so I guess I should provide more details on the open source Zigbee stack that I've been working on.  Before I start, perhaps I should provide some background info about me, the project, and my struggles wrestling with the Zigbee spec.

I had actually been involved with Zigbee since 2003 when the 802.15.4 MAC spec was first introduced and the Zigbee Alliance was trying to get the Zigbee spec out. I was helping a friend who had started a company based on the Zigbee and MAC standards. He had actually written a hardware 802.15.4 MAC (one of the first that I knew about) and had it prototyped in an FPGA. He also wrote the MAC software that went with it. At the time, we were able to get a copy of the MAC spec since he was on the IEEE committee (an IEEE fellow, no less). However, we were unable to get the Zigbee spec because it was only in draft form and apparently only available to the people working on it. In order to be a Zigbee contributor, you had to pay some big bucks. We spent a lot of time asking around for bootleg copies, but no one would hook us up. Time passed and I started getting busy with my day job. Zigbee kind of faded into the background for me, and I decided to focus on my job and try to help build the Japan office for the company that I was working for.

So when I stumbled upon the Zigbee site again in 2007, I was surprised that they had made the spec publicly available. I immediately downloaded it and took a quick look at it. It was difficult to understand since the spec kind of jumps around everywhere, and I decided to shelve it since I didn't have the time to go through it thoroughly. I was wondering why I had to implement so many things if all I wanted to do was flip a switch to remotely turn on a light.



Things are starting to look better now. After some elbow grease over the weekend and much research on Linux's interprocess communication, I was finally able to get my simple simulator up and running and connected to Contiki. The development process wasn't easy, and it took about a week (of hard work) to get it to the point where I can start simulating actual nodes. I'm actually quite satisfied with it. It's not as sophisticated as Cooja or Netsim (Contiki's native simulators) but it can do what I want and give me fine-grained control over how the simulation is run. Here are some of the details.

The simulator starts up as a command line shell. To add a node, you type "add". The simulator will then fork a process and run the FreakZ stack on it. The command shell also has two duplex communications channels to the node. One of the duplex communications channels is for data tx and rx. The other is for command tx and rx.
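
Here's a rough sketch of how an "add" like this could be done on Linux. I'm using socketpair() for the two duplex channels purely as an assumption for the example (a pair of pipes per channel would work too), and the names (sim_node_t, sim_add_node, sim_remove_node, echo_node) are made up. The echo_node function is only a stand-in for the code that would run the stack inside the child process.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The shell's handle on one simulated node. */
typedef struct {
    pid_t pid;
    int   data_fd;   /* duplex channel for frame tx/rx */
    int   cmd_fd;    /* duplex channel for command tx/rx */
} sim_node_t;

/* Stand-in for the node's main loop: echoes back whatever arrives on
 * the command channel until the shell closes its end. */
void echo_node(int data_fd, int cmd_fd)
{
    char buf[64];
    ssize_t n;

    (void)data_fd;
    while ((n = read(cmd_fd, buf, sizeof buf)) > 0) {
        if (write(cmd_fd, buf, (size_t)n) != n)
            break;
    }
}

/* Fork a new node process connected to the shell by two socket pairs.
 * Returns 0 on success, -1 on failure. */
int sim_add_node(sim_node_t *node, void (*node_main)(int, int))
{
    int data_sv[2], cmd_sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, data_sv) < 0)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, cmd_sv) < 0) {
        close(data_sv[0]);
        close(data_sv[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(data_sv[0]); close(data_sv[1]);
        close(cmd_sv[0]);  close(cmd_sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: keep only the node's ends of the two channels. A fuller
         * version would also close descriptors inherited for other nodes. */
        close(data_sv[0]);
        close(cmd_sv[0]);
        node_main(data_sv[1], cmd_sv[1]);
        _exit(0);
    }

    /* Parent (the shell): keep only the shell's ends. */
    close(data_sv[1]);
    close(cmd_sv[1]);
    node->pid = pid;
    node->data_fd = data_sv[0];
    node->cmd_fd = cmd_sv[0];
    return 0;
}

/* Close the shell's ends and wait for the node process to exit. */
void sim_remove_node(sim_node_t *node)
{
    close(node->data_fd);
    close(node->cmd_fd);
    waitpid(node->pid, NULL, 0);
}
```

Keeping commands and data on separate descriptors means the node can tell a simulator instruction from a received frame just by which descriptor became readable.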

The node runs a simple command parser and is constantly listening to the command rx pipe for instructions. When the user types in a command and addresses that node, the command will be sent directly to that node. Once the node receives the instruction, the command parser will then parse the instruction and arguments and carry out the task.
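
A command parser along these lines can be quite small. The sketch below splits a line into whitespace-separated tokens and looks the first one up in a table. The command name "form" and the handler are hypothetical, since the actual command set isn't listed here.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 8   /* assumed limit on tokens per command line */

typedef int (*cmd_fn_t)(int argc, char **argv);

typedef struct {
    const char *name;
    cmd_fn_t    fn;
} cmd_entry_t;

/* Hypothetical handler: a real one would kick off network formation.
 * It returns the number of arguments it was given. */
static int cmd_form(int argc, char **argv)
{
    printf("%s: %d argument(s)\n", argv[0], argc - 1);
    return argc - 1;
}

static const cmd_entry_t cmd_table[] = {
    { "form", cmd_form },
};

/* Split line in place into tokens and run the matching handler.
 * Returns the handler's result, or -1 for an empty or unknown command.
 * The line buffer is modified by strtok(). */
int cmd_dispatch(char *line)
{
    char *argv[MAX_ARGS];
    int argc = 0;
    const char *delim = " \t\r\n";

    for (char *tok = strtok(line, delim);
         tok != NULL && argc < MAX_ARGS;
         tok = strtok(NULL, delim)) {
        argv[argc++] = tok;
    }
    if (argc == 0)
        return -1;

    for (size_t i = 0; i < sizeof cmd_table / sizeof cmd_table[0]; i++) {
        if (strcmp(argv[0], cmd_table[i].name) == 0)
            return cmd_table[i].fn(argc, argv);
    }
    return -1;
}
```

Adding a new instruction is then just a matter of writing a handler and adding one row to the table.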

Probably the toughest part was the radio medium. To simulate a radio medium in the simplest case, when one node transmits, all nodes should ideally be able to hear. I know, it's impossible due to range, fading, etc...but I'm just talking about a simple case. When one node transmits, the data needs to be broadcast to all the nodes. Each node will then check the address and determine if the frame should be discarded or sent up the stack. This is how a standard 802.15.4 radio works.

The broadcasting part was difficult because there's not really a good way to do one-to-many interprocess communication on Linux. Some of the candidates were "named pipes", shared memory, and client/server using sockets. However each of them had some drawbacks, mostly in complexity. I experimented a bit with some of them, but after a while, I decided to take the easy way out. I just keep a list of all the nodes, and when one of them transmits data, it goes to the main shell which then forwards the data to each listening node. It's brute force and crude, but it was the easiest way to broadcast data to all my processes reliably. In the future, I'd like to add some features like transmission range or noise to the radio medium to make it more realistic. But right now, I just want to see how the nodes perform in a network setting.
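
The brute force forwarding boils down to a loop over a list of file descriptors. Here's a minimal sketch of the idea. The names (medium_add, medium_broadcast) are invented for the example, and skipping the sender so it doesn't hear its own frame is my assumption.

```c
#include <stddef.h>
#include <unistd.h>

#define MAX_NODES 8   /* assumed limit on simulated nodes */

/* The shell's write descriptors for each node's data channel. */
static int node_fds[MAX_NODES];
static int node_count;

/* Register a node's data channel; returns its index, or -1 if full. */
int medium_add(int fd)
{
    if (node_count >= MAX_NODES)
        return -1;
    node_fds[node_count] = fd;
    return node_count++;
}

/* Forward a frame from one node to every other node on the list.
 * Returns the number of nodes the whole frame was written to. */
int medium_broadcast(int sender_idx, const void *frame, size_t len)
{
    int delivered = 0;

    for (int i = 0; i < node_count; i++) {
        if (i == sender_idx)
            continue;   /* assumed: the sender doesn't hear itself */
        if (write(node_fds[i], frame, len) == (ssize_t)len)
            delivered++;
    }
    return delivered;
}
```

Features like transmission range or noise could later be added inside that loop, for example by skipping or corrupting the write for some sender/receiver pairs.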

So that brings me to today. I simulated my first node where I actually brought up all the layers and started a network formation request. Unfortunately, I found a bug immediately so I'm working on the fix now. Actually, it's strange because that same bug should have come up in my old test fixture (the single-process one), but I never saw it before. It's kind of a mystery that bears more investigation, but anyways it's better to catch bugs than let them stay dormant inside the code.

I found out the reason why I didn't see the problem in my test fixture. I tested the MAC components that make up NWK formation individually but didn't test them running together in the actual NWK formation code. Oops.