I was actually able to pull it off. In my last devjournal, I expressed some uncertainty about being able to port the FreakZ stack over to Cooja. Well, it took a lot less time than I expected to get it working under the simulator. There are still a lot of things that need to be done, but I'm almost 100% confident that I can get everything up and running now.

Don't get me wrong. It was anything but easy. It was like a warzone in my development directory after I was finished. Porting a stack under Cooja requires hacking C, makefiles, and Java. Actually the makefile hacking took the longest because there were a lot of dependencies that I needed to take care of as well. But in the end, I was able to get the stack to compile and be recognized by Cooja.

There were also a lot of times that I was completely stumped, and a few times where I just wanted to cry. I ran into name clashes with Contiki (e.g., mac.h exists in both Rime and FreakZ), and assert.h is overridden by the Contiki version. I also had a problem where everything was correct except for a space in my makefile. That one almost killed me. Anyhow, things finally worked out in the end and Cooja can now signal events to the FreakZ stack via Contiki.

The things left to do now are to modify the interfaces and plugins. I need to start examining the radio code and the radio monitor to see how many changes will be required so that I can start transmitting to and receiving from other nodes. I also need to figure out how I'm going to start up the network, since I can't just have all nodes turn on at the same time. But at least now I have more confidence in my ability to get things working with Cooja, which is a huge plus.

Here are some screenshots of the FreakZ stack working under Cooja.

 

I'm currently doing battle on two fronts. On one, an onslaught of cold viruses seems to be attacking my immune system and rendering my body immobile. Luckily, there is only a slight haze in my head so I still have the ability to ponder the second battle. That is, my battle with Cooja. I've been spending the past week studying Java and poking through the Cooja source code. I was pretty intimidated at first because Java is not a strong language for me at all. It's been a long time since I had to do object-oriented programming, and that was back in the college days when I spent most of my time ditching classes and going to dance practice. Cooja is written almost completely in Java and interacts with Contiki via the Java Native Interface (JNI). The JNI is like a translator that allows Java to call C functions and vice versa.

So here I am again with my Java books, reading through the source code of the Cooja simulator and trying to figure out how it works. There is a user manual available for the simulator, but it's a little bit thin. I think at a bare minimum, it takes the user guide, the PPT slides, and the master's thesis on the simulator by Fred Osterlind (the author of Cooja) to get an understanding of how it works. I've already been through those documents a few times.

I haven't updated the blog on the status of the stack for a while. The week before last, things got really busy due to the start of the Golden Week holidays in Japan so I ended up working a lot. That ate into a lot of the time that I wanted to spend on implementation.

Also, my wife got laid off from her job last week. Although it doesn't affect us too much financially, the fact that she's home all day is taking a toll on my implementation time. Since she has a lot of free time, she's taken to cleaning the apartment, and I mean thoroughly cleaning. I guess she needs to focus her energy somewhere, but she gets mad at me if she starts cleaning and I'm not on hand to help out. So I've been helping her clean the closets, her room, the floors, laundry, throwing things out, etc. My room only gets cleaned by me. The good news is that the apartment never looked better. The bad news is that it's cutting into my coding time about as much as working full time two weeks ago. Ahhh…the price of marriage.

I did manage to sneak out a bit last week and hide in a local coffee shop. Now that the weather's better, I've changed where I normally do my coding. Before, it was at the local hamburger shop, where I can buy a cheap coffee and hang out on the second floor for a couple of hours. That’s important because a lot of places get mad at you if you stay too long with just a cheap drink. But now that the weather's improving, I've been going to a café about 15 minutes away by bicycle. It has a nice veranda with a view of the park across the street. You don't get to see too much greenery in Tokyo so it's a nice change of scenery for me.

I feel pretty good today. It feels like I was able to accomplish a lot of things on the stack for a change. Sometimes I feel pretty lethargic when I code. It's like working out; hard to start, but once you get going, it's easy to lose yourself in it.

One of my major accomplishments was rewriting a lot of functions to remove some useless code. In the 802.15.4 and Zigbee specs, they use a large configuration structure called the Information Base (MAC Information Base (MIB) and Network Information Base (NIB)). They also specify how to access it, which is using set and get functions. The problem is that the structures are quite large and contain varied data types so the code to set and get all of the struct members is quite big.

I was thinking today if it was really worth the code space to have set/get functions instead of just accessing the structures via a pointer. That was when I decided to check the code size of the object files. The total was 3k of flash for the set/get functions. Holy Scheiser!

I sharpened my axe and started slashing left and right. Basically, I removed all of the code in the set/gets and even deleted the files. Then I just wrote a simple one-line function that would return a pointer to each of the structures. I had to do a lot of code surgery in other areas since I had the set/gets sprinkled all over the place. But in the end, in return for accessing the structure via a pointer, I got an extra 3 kB of code savings. Don't laugh...that's roughly 10% of a 32 kB flash MCU! Wow...I guess I am a geek. Anyways, accessing the structures directly is a bit unsafe if you're not careful, but I don't think set/gets add much more safety either.
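To make the trade-off concrete, here is a minimal sketch of the pointer-accessor approach. The struct members and the names mac_ib_t, nwk_ib_t, mac_ib_get, and nwk_ib_get are made up for illustration; the real information bases in the 802.15.4 and Zigbee specs have many more members.

```c
#include <stdint.h>

/* Hypothetical, heavily trimmed information bases (illustration only). */
typedef struct {
    uint16_t pan_id;
    uint16_t short_addr;
    uint8_t  max_csma_backoffs;
} mac_ib_t;

typedef struct {
    uint8_t max_depth;
    uint8_t seq_num;
} nwk_ib_t;

static mac_ib_t mac_ib;
static nwk_ib_t nwk_ib;

/* One-line accessors stand in for the per-member set/get functions. */
mac_ib_t *mac_ib_get(void) { return &mac_ib; }
nwk_ib_t *nwk_ib_get(void) { return &nwk_ib; }

/* Example caller: return the current NWK sequence number, then bump it. */
uint8_t nwk_next_seq(void)
{
    return nwk_ib_get()->seq_num++;
}
```

A caller that used to call a setter now just writes through the pointer, e.g. `mac_ib_get()->pan_id = 0x1234;`. Nothing stops a caller from writing an out-of-range value, which is the "a bit unsafe" part.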

In other unexciting coding news, I put the finishing touches on the network discovery and MAC scan functions and wrote a self-checking test to make sure they worked. After debugging a couple of bone-headed mistakes, I was able to get it to pass the simple test I wrote, as well as the regression where I run all the previous tests. That felt pretty good.

So off I go on to the network join functions. This one will take a while because there is a client side and server side to the join procedure and both of them are pretty involved. I'm also gonna be a bit busy this week since the part-time job is going to take up at least two days due to unforeseen busy-ness, and I'm spending one day to go to the Sensor show at the convention center in Tokyo. Hopefully, I can take some cool pictures and show everyone some of the new sensors that are coming out over here. 

That's about it for today. TTFN. 

Now that the data path is basically working, I've moved on to the network management functions. Network management services consist of starting, joining, and leaving a network, neighbor discovery, addressing, and transceiver control. There are a couple of other side functions too, but that's basically it. Many of these functions rely on data transmission, which is why I wanted to tackle the data path first. With the data services functional, implementation of the management services will be much easier.

Management Service Overview
The network management services are actually top level management functions that rely on the MAC management services. The MAC management services are channel scanning, association (joining), disassociation (leaving), polling, and some other side functions.

Well, I finished the last item on my checklist for the initial datapath design. Not all of the Zigbee data features are implemented yet (multicasting, binding, and endpoint grouping are still missing), but data can now be transmitted, received, routed, and sent indirectly. It's a start, and enough for a proof of concept on the design direction.

I'm quite satisfied with how the design is turning out. Contiki has been a real life-saver. Not just the OS, but I've discovered that there are a lot of code gems hiding inside. I've re-written most of my tables to use the LIST functions inside Contiki. Those are a library of generic linked list functions, and since they are already being used by the Contiki OS, my use of them to implement some of the queues and tables is basically free (in terms of code space).

Another one I liked was the callback timer. This is a hidden little gem inside the Rime stack that implements two functions in one: a timer and a callback. Once the timer expires, it will call the function of your choice and even send in data to it. Basically, it rocks! I rewrote the whole network layer timing functions to take advantage of this beautiful library. It got rid of one process and a whole lot of code. I also used it in the MAC layer and will probably modify the APS layer to use it as well. 
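For anyone who hasn't seen the pattern, here is a standalone model of a callback timer in plain C. This is not Contiki's implementation or API; the names cb_timer_t, cb_timer_set, and cb_timer_poll are invented for the sketch, and it assumes something else calls the poll function with the current tick count.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*cb_fn_t)(void *data);

/* Hypothetical callback timer: an expiry time plus a function and its data. */
typedef struct {
    uint32_t expiry;  /* tick at which the timer fires */
    cb_fn_t  fn;      /* function to call on expiry */
    void    *data;    /* opaque pointer handed to fn */
    int      active;
} cb_timer_t;

/* Arm the timer to call fn(data) once `delay` ticks after `now`. */
void cb_timer_set(cb_timer_t *t, uint32_t now, uint32_t delay,
                  cb_fn_t fn, void *data)
{
    t->expiry = now + delay;
    t->fn = fn;
    t->data = data;
    t->active = 1;
}

/* Fire the callback if the timer has expired. Returns 1 if it fired. */
int cb_timer_poll(cb_timer_t *t, uint32_t now)
{
    /* Signed difference keeps the comparison correct across tick wraparound. */
    if (!t->active || (int32_t)(now - t->expiry) < 0) {
        return 0;
    }
    t->active = 0;  /* one-shot: disarm before calling back */
    t->fn(t->data);
    return 1;
}

/* Example callback: count how many times a retry timeout has expired. */
void retry_expired(void *data)
{
    (*(int *)data)++;
}
```

The win is that the timeout handling lives in an ordinary function instead of a dedicated process that waits on a timer event, which is where the code savings come from.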

These two libraries, along with the Contiki process handling, saved me a lot of effort that would have probably extended both the design and debugging time of this project by a factor of 2 to 4.

Another thing that saved my ass multiple times is my self-checking tests. Sure it was a pain to write the tests, and to write the self-checking framework, and the expected values, but it caught so many bugs that it proved to me how much of an idiot I was. I made some of the stupidest mistakes in the world, and I probably wouldn't have caught them without it. A good example is the off-by-one bug I found recently. The self-checker caught a problem where the data length was one byte longer than it should be. It turned out that it was caused by a typo. Ha ha ha. I almost cried. But without the tests, it would have never gotten caught. And those off-by-one bugs really suck to debug in real hardware. 

Anyways, I'm satisfied with the data path in its limited current state, and will proceed on to the MAC and NWK management functions. But first, I'm gonna take a day or two away from this stack. It's become too obsessive for me, and I want to enjoy some of my new free time (I converted to part-time work at my job so I could spend more time with my wife, dog, and this fucking stack).

 TTYL.

I've spent the past couple of days working on the data path in the MAC layer and it wasn't as easy as I expected it to be. I think the toughest part was making some hard decisions, literally. The MAC layer is where the line between hardware and software starts to get blurry. Since it's so close to the hardware layer, you start running into issues where chip architecture starts to matter. Here is an example.

Many chips have a feature called auto-ACK. It's a nice feature because if the ACK request bit is set in the MAC frame, then the chip will automatically detect it and send an ACK back to the node that issued the frame. Easy, right? But here's the rub. IEEE 802.15.4 has a feature called indirect transfers. It's used for devices that sleep a lot, like battery-powered end devices. A router basically buffers any frame destined for the sleepy end device. When the device wakes up, it sends a data request to the router to see if there are any messages for it. If the router has a message for it, it indicates that in the ACK to the data request by setting the "frame pending" bit.

This screws up the whole auto-ACK thing because now you have two types of ACKs: a regular one which just acknowledges that the device received the frame, and a special one which indicates that you received the data request and that you have a frame pending for that device.
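Here is a rough sketch of what software has to do when it builds the ACK itself. The frame control bit positions (ACK frame type 2, frame pending in bit 4) come from the 802.15.4 frame format; everything else, including the tiny pending-address table and the function names, is hypothetical and only there to make the example stand on its own. The FCS is assumed to be appended by the radio.

```c
#include <stddef.h>
#include <stdint.h>

#define FCF_TYPE_ACK      0x02u      /* frame type field, bits 0-2 */
#define FCF_FRAME_PENDING (1u << 4)  /* frame pending bit */

#define MAX_PENDING 4

/* Hypothetical list of short addresses that have a frame buffered for them. */
static uint16_t pending_addr[MAX_PENDING];
static size_t   pending_cnt;

/* Remember that a frame is buffered for addr. Returns 0 if the list is full. */
int indirect_add(uint16_t addr)
{
    if (pending_cnt == MAX_PENDING) {
        return 0;
    }
    pending_addr[pending_cnt++] = addr;
    return 1;
}

/* Returns 1 if a frame is buffered for addr. */
int indirect_pending(uint16_t addr)
{
    for (size_t i = 0; i < pending_cnt; i++) {
        if (pending_addr[i] == addr) {
            return 1;
        }
    }
    return 0;
}

/* Build the ACK for a data request: frame control (little-endian) + sequence
   number. Returns the number of bytes written to buf (always 3). */
size_t mac_ack_data_request(uint8_t *buf, uint8_t seq, uint16_t src_addr)
{
    uint16_t fcf = FCF_TYPE_ACK;

    if (indirect_pending(src_addr)) {
        fcf |= FCF_FRAME_PENDING;  /* the "special" ACK */
    }
    buf[0] = (uint8_t)(fcf & 0xff);
    buf[1] = (uint8_t)(fcf >> 8);
    buf[2] = seq;
    return 3;
}
```

The catch is the table lookup: a chip doing auto-ACK in hardware has to send the ACK within a fixed turnaround time, so it either needs to be told the frame pending state ahead of time or the ACK has to be generated in software.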

It was pointed out to me that my introduction in the About button is woefully inadequate. Let me start over and introduce myself.

My name is Chris Wang, aka Akiba. Akiba is slang for Akihabara in Tokyo, which is nicknamed Electric City. It's also Japanese slang for geek (otaku).

My current (up to last week) occupation was as part of the sales team (I can already hear the eyes rolling) at a semiconductor company, and my specialty was in USB. I recently quit so I could spend more time with my family and try to finish the Zigbee project. Now let me tell you a bit about my background.

I didn't actually start my career out as an engineer. I was originally a professional dancer and toured with artists doing dance videos and concerts. My specialty at that time was breakdancing, locking, and new-school hip hop dance styles. At that time, I had a dance crew and we would always be together, either practicing or performing at gigs. That crew was called "Freaks of Nature", hence the name FreakLabs.

Unfortunately, as we got more known, people from the crew got recruited to become artists in Taiwan (since we were mostly Chinese). The girl became part of a pop group called Babes, two of the guys went on to form/become Machi, and another is part of a group called F4. That basically left me alone back in Southern California with nobody to dance or do gigs with. In case you were wondering, I did get one record contract, but my mom made me turn it down and go back to school.

Tired of the destitute life of a dancer, I went back to the university to finish my degree in Applied Physics. Apparently, it was even harder to get a job as a physicist than as a dancer, since you needed a PhD to really get anywhere. Luckily, applied physics carried over easily into electrical engineering, so I took two more classes and was able to squeeze out an EE degree as well.

My first job out of school was doing FPGA design at a startup. That was back when you still had to cobble all the tools together in the Xilinx Foundation package and squeeze your code to fit them into an XC40xx FPGA. I would compile the designs on a Pentium 100 and one compile run would take six hours to a day depending on the version of the Xilinx software. At that time, the state of FPGAs was horribly buggy. It doesn't seem like it's much different today.

Found a nasty bug today. I've been working on the MAC layer recently and stumbled across this one by mistake. Currently, when a frame is received, it gets thrown into a queue and a receive event is posted to the MAC layer. The MAC layer will then process the frame. However, I made a mistake when I was writing one of my tests and accidentally generated two receive interrupts back-to-back. Only one frame was taken out of the queue though, and the second one was lost.

It turns out that after an event is processed, it will be cleared, which means that the second event also got cleared from the event queue. Anyways, I modified the code to process frames until the buffer is empty in case two or more frames arrive back-to-back. It fixed the test and it logically removes the bug as well. Glad I took the time to set up a test environment and write a bunch of tests on the stack. That bug would have taken at least a few days to a week to debug in actual hardware.
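The fix boils down to treating the event as "there may be frames waiting" rather than "exactly one frame arrived". Here's a minimal sketch of that idea using a made-up ring buffer; the names rx_enqueue, mac_rx_event, and mac_process_frame are placeholders, not the actual FreakZ functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RX_QUEUE_LEN  4
#define MAX_FRAME_LEN 127

typedef struct {
    uint8_t len;
    uint8_t data[MAX_FRAME_LEN];
} frame_t;

static frame_t rx_queue[RX_QUEUE_LEN];
static size_t  rx_head;   /* index of the oldest queued frame */
static size_t  rx_count;  /* number of frames currently queued */

/* Receive path: copy a frame into the queue. Returns 0 if it was dropped. */
int rx_enqueue(const uint8_t *data, uint8_t len)
{
    if (rx_count == RX_QUEUE_LEN || len > MAX_FRAME_LEN) {
        return 0;
    }
    frame_t *f = &rx_queue[(rx_head + rx_count) % RX_QUEUE_LEN];
    f->len = len;
    memcpy(f->data, data, len);
    rx_count++;
    return 1;
}

/* Placeholder for the real MAC frame parsing. */
static void mac_process_frame(const frame_t *f)
{
    (void)f;
}

/* Handler for the receive event: drain the whole queue, since several
   frames may be covered by a single event. Returns frames handled. */
unsigned mac_rx_event(void)
{
    unsigned handled = 0;

    while (rx_count > 0) {
        mac_process_frame(&rx_queue[rx_head]);
        rx_head = (rx_head + 1) % RX_QUEUE_LEN;
        rx_count--;
        handled++;
    }
    return handled;
}
```

With an `if` instead of the `while`, two enqueues followed by one event would leave a frame stranded in the queue, which is exactly the bug described above.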

Now that I'm approximately into the sixth week of the FreakZ stack development, thought I'd provide everyone with a rough status update.

I'm basically finished with the NWK layer data path and the APS data path was also implemented, albeit without the group and binding tables. In my opinion, the data path is a major portion of the stack, or at least the most important one. It provides the majority of the functionality of the stack, with the rest being network management functions such as scanning for new networks, joining devices to the networks, removing them, etc. The data path is not fully implemented in each of the layers, but the basic functionality is there. The APS layer can handle ACK'd and non-ACK'd transmissions, the NWK layer can handle data routing and broadcasting, and the MAC can already handle unicasting and broadcasting, although more work needs to be done on this layer.

This all being said, I did a sizing on GCC, currently using the x86 port. I don't know how much it will change if you try a different target, but at least this can give you a rough idea:

Code size: ~16k
RAM size: ~3.2k

These sizes include a stripped-down version of the Contiki OS, which is already quite small. The stack is taking up about 13-14k and Contiki is roughly 3k. The main contributor to RAM usage is the buffer pool and the numerous tables that make up the Zigbee stack. I'm currently using six frame buffers which take up about 1k of RAM and most of my tables hold about ten entries. It's possible to reduce the frame buffers to about three (saving about 600 bytes of RAM), although you may get more dropped frames. You can even go down to two frame buffers if you feel adventurous.

I've been working on the NWK layer broadcasts all week. When I first started it, I thought it wouldn't be too difficult. I mean broadcasts are just frames that you send out with a broadcast address, right?

Unfortunately, that didn't turn out to be the case. There's this nasty little problem with broadcasts on wireless networks, where if you mishandle it, you end up with an infinite broadcast loop. I had nightmares of generating broadcast storms that would spread from node to node and never end. The problem is that broadcasts are quite sensitive to the stop conditions.

When you receive a broadcast from a neighbor node, you can't just forward it. You need to know more info about it such as:
1) is it a new broadcast?
2) is it a broadcast that you previously transmitted and that the neighbor node is forwarding back to you?
3) is it a new broadcast, but with the same identifier as a previous one?

Passive ACK
These are just some of the questions that come up when you handle these things, and the wrong response could make you end up in an infinite broadcast loop. Zigbee takes care of these situations by requiring the use of a broadcast transaction table (BTT). Each entry in the BTT consists of at least the address of the node that forwarded the broadcast and the broadcast identifier. Once you implement the BTT, you then need to create an entry for each broadcast that you receive and keep a record of which device sent it. That way, you can filter new broadcasts from ones that you have already forwarded. They call this a "passive ack" system, since you're not allowed to use the 802.15.4 ACK on broadcast frames.
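A stripped-down sketch of the filtering side of a BTT might look like this. It only keeps the two fields mentioned above (address and broadcast identifier), leaves out entry expiry entirely, and the names btt_entry_t and btt_check_and_add are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define BTT_SIZE 10

/* Hypothetical BTT entry: who sent the broadcast and its identifier. */
typedef struct {
    uint16_t addr;    /* address recorded for the broadcast */
    uint8_t  seq;     /* broadcast identifier */
    uint8_t  in_use;
} btt_entry_t;

static btt_entry_t btt[BTT_SIZE];

/* Returns 1 if the broadcast is new (and records it), or 0 if it is a
   duplicate or the table is full. In both 0 cases, don't forward it. */
int btt_check_and_add(uint16_t addr, uint8_t seq)
{
    btt_entry_t *free_slot = NULL;

    for (size_t i = 0; i < BTT_SIZE; i++) {
        if (btt[i].in_use) {
            if (btt[i].addr == addr && btt[i].seq == seq) {
                return 0;  /* already seen: this is the loop breaker */
            }
        } else if (free_slot == NULL) {
            free_slot = &btt[i];
        }
    }
    if (free_slot == NULL) {
        return 0;  /* no room to track it, so dropping is the safe choice */
    }
    free_slot->addr = addr;
    free_slot->seq = seq;
    free_slot->in_use = 1;
    return 1;
}
```

Without expiry, the table fills up and the third question in the list above (a new broadcast reusing an old identifier) can't be answered, so a real implementation also needs to age entries out.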