Well, after running simulations with more nodes, I've determined that having all processes write to the same stdout terminal is hopeless. It's impossible to debug when you have more than one or two processes interleaving debug printouts. Over the weekend, I started working on modifications to the simulator to have each node open its own console window when it starts. This removes the problem of interleaved print statements and makes things much easier to debug. Although it sounds easy, it's actually pretty difficult to open a new process in its own window. The only way I could figure out how to do it was to fork the process, exec a terminal emulator in the child, and have the node run as a standalone process inside the new terminal. If anyone has a better method, please let me know.

Anyways, the change means that I need to restructure the simulator and separate the node executable from the simulator executable. It's actually not too bad, and I've already done a lot of it. Since I was making modifications to the sim again, I decided to remove a lot of the hacks and actually try to turn it into a halfway decent application.

I was previously using pipes and the select() system call to mux the data, which was okay for a while but started to get messy. Using pipes and select() means that you're constantly polling the read pipe descriptors, and setting up the call is pretty ugly. If you're not familiar with select(), using it means that you have a lot of FD_SET and FD_ISSET macros in your function. In my opinion, the better way is to use a thread to handle each of the pipe reads, since you can just let them block instead of polling with select(). It seems more natural this way, since the pipe communications were meant to block, and the code flow is much better. I already converted one of the pipes to a thread and it made a big difference in how clean the code looks. So I decided to make the simulator a multi-threaded, multi-process application. Fun fun fun...

At least if I'm ever starving on the streets, I can try and get a job as a Linux system programmer.

I've fallen into radio silence mode again recently since I got the simulator working. The last few weeks have been pretty tough with all of my simulator issues, but for the past three days, it has been working well. After my big coding session on Thursday, I got things to the point where the sim runs stably which allowed me to move forward on actually testing the stack.

As of today, I was able to get 10 nodes joined onto the same network, which was quite a big accomplishment. That means that the first node was able to form the network, and the other nodes were able to carry out the handshake correctly to finish the join procedure. The addressing is also correct, following the distributed addressing formula in the spec.

Actually, it wasn't easy getting to my current point. Once I got the stack working in the simulator, I found and cleaned up many bugs. They ranged from stupid ones, like reversing the bit definitions in one of the header fields, to agonizing ones, like a re-entrancy issue I had when my (simulator) thread called a non-reentrant stack function. Also, things got exponentially more complicated to debug as the size of the network grew. I ran into some problems that would only occur after my network grew to 8 nodes. One of those turned out to be a buffer pool issue, but it was hard to track down because there were so many things going on simultaneously. Especially during the join procedure, there are a lot of broadcasts, so all of my node consoles were scrolling like crazy.

Recently, I've been doing a lot of testing in my sim. That means that I've been crawling through the code looking for the causes of different types of bugs. Some are obvious, some are mysterious. However, as a result of looking at the code so much recently, I found many areas that I'm dissatisfied with. One of them was my APS layer, or the Zigbee application sub-layer. This was the first part of the stack that I worked on, and it was written about three months ago. Now that I'm more familiar with the needs, behavior, and patterns of the Zigbee stack, I realized that there was a lot of code that didn't need to exist.

In the APS layer, if you recall my old post on it, I implemented a data queue to queue up the data from the different endpoints, a state machine to handle retries and acks, a transmit process, and some miscellaneous logic functions. All in all, it was a complicated way to implement the APS data request service. A lot of the complication was due to handling reliable acknowledgement. To do this, you needed to keep track of a timer and also buffer the data for re-sending.

Well, with a couple months of stack coding under my belt and a better understanding of when and how to use the Contiki services, I decided to rewrite the APS data request service. This is one of the key functions in the stack because all the endpoints, including the Zigbee Device Object, will be using this service heavily. The good thing about rewrites is that you usually have much more insight than you did when you originally wrote the code. Since I can now see from the application layer down to the radio driver, I knew exactly how the APS layer needed to behave and what the tx function needed to do. Here is a simple flowchart of how the new data request routine looks:

I was able to get rid of the data queue because it wasn't really needed at the APS layer. The tx process, state machine, and temporary retry buffer got replaced with a retry queue and a callback timer. This greatly simplified things because the callback timer performs the equivalent of a state machine, process, and timer all in one. The retry queue is actually an enhancement. Previously, I could only buffer one frame at a time, which meant that I could only transmit one reliable frame (ack requested) at a time. Then I had to wait for a timeout or an ACK before sending the next one. This is the main reason why I needed a data queue on the transmit side. The retry queue lets me set aside only those frames that need to be sent reliably. When a frame gets put in the retry queue, a callback timer is also set for it. If the callback timer expires, the frame automatically gets sent out again.

The revised design simplified the APS layer and reduced the lines of code by about half. Getting rid of all the complication also made things easier to debug. I've already tested it out and it works in the simulator too.

Now, I'm going to do the same for the MAC layer. The MAC layer has a similar behavioral pattern to the APS layer, in that it sends out data and needs to wait for the ACK. I'm pretty sure I can use the APS layer as a guide to simplify the MAC layer too. That way, I can decrease the size of the stack, make it more maintainable, and even add functionality. Also, the APS and MAC data request functions will be similar, which kind of gives the stack a nice symmetry...I hope.

Hit a major milestone over the weekend. After rewriting the mac, nwk, and aps data paths, and also rewriting most of the mesh networking code, I finally got everything to work. The reason behind the rewriting, besides cleaning things up, is that I now have a much clearer picture of how things should work and of the architecture that I want. So I wanted to make everything consistent with each other. This also includes the naming, the patterns (no, I don't use C design patterns, but many functions share a common pattern of execution), and the file layout.

I also simplified the mesh routing and improved the functionality. I can now handle multiple route requests, which I think may be needed at network startup. When a network starts, none of the nodes have entries in their routing tables. So any node sending data to a remote node that isn't an immediate neighbor will be doing route discovery, which means broadcasting route requests. Previously, I could only handle one at a time, which meant that I had to drop any other route requests. This might have been detrimental to finding good routes on network startup. Or I might just be overly cautious. Anyways, the routing code is smaller, cleaner, and can handle multiple requests, so I'm satisfied.

After all of this was implemented, I started testing late last week. I verified the mesh routing and tree routing on Saturday, which is a big event since those seem to be the most difficult parts of the stack to get right.

Once I got those working, I took Sunday off (my wife was complaining) and today, I finished the remaining services in the nwk layer. Things were much easier to implement now that the code was cleaned up. I also finished off the remaining services in the mac layer that are needed by Zigbee. I'm not implementing a full 802.15.4 mac because it's not required by Zigbee. The spec probably only uses about one-third of the available functionality of IEEE 802.15.4.

The remaining things to do are to verify the rest of the NWK layer and finish off the APS layer. On the APS layer, I still need to implement the binding table, group table, discovery cache, and address map. I'm hoping those won't be too hard. I'm a pro at implementing tables and queues now...Ha ha ha...that's actually pretty sad...

Also, I'm starting to look over the Zigbee 2007 protocol compliance documentation. My eventual goal is to get this stack certified as Zigbee compliant, although I'd have to cough up the ~$2k to join the Zigbee Alliance. Ugh...gotta figure out a way to make that happen.

I'll be going to California this Friday for a two-week stay over there. My part-time job is requesting all the FAEs (yes, I'm a part-time FAE...gotta pay the billz) to go there for training on the product line. Anyways, it'll give me a chance to visit my sister and parents. My sister is up in the Bay Area, so I'll be in Berkeley next week and then go down to Southern California the week after for the training. I'm hoping to claw my way through most of the remaining APS layer by the time I leave so I can have a clear conscience. Otherwise, it's gonna weigh on me and I'm going to end up working on it while I'm over there. That's the problem with obsessive behavior.

Other than that, nothing much else. My life seems pretty boring if you take the Zigbee part out of it. Hmmm...it seems pretty boring if you leave it in, too...

Greetings from the land of endless rain.

For some reason, the rainy season came early to Japan and it's been raining for almost two weeks straight here. I didn't realize it before, but when it rains this much, it becomes very difficult to walk your dog. That means that I have to spend extra time playing with my dog in my futile attempts to tire her out. I didn't go through this last year (it's my second year of having a dog) because the rainy season was pretty sparse, but interestingly enough, the rainy season is actually eating into my work schedule via dog playtime. Go figure...

On the subject of productivity, I came to the conclusion that my previous method of managing my schedule and tasks (i.e., doing whatever popped into my head) was insufficient. I now have a fairly large open source project to take care of, two part-time jobs, and a lot of company accounting and administration due to my recent incorporation. On top of that, I have to balance my wife-time and dog-time, and for some reason, I seem to have to do all the household chores. My wife still doesn't believe I have real work since I'm at home all the time while she's out of the house doing her "real" part-time job (i.e., waitressing). That's why she nags me to clean the house, since I'm home. I guess it doesn't matter, even if you pay the rent and the bills. Ha ha ha...sad...

Anyways, I've been experimenting with different methods of managing myself and my time since my recent anxiety attack. I believe most of it was due to my inability to fully grasp the amount of stuff I had to do. It just felt like I kept getting hit with so many things to do that I ended up shutting down. Since then, I've been trying to use a task list to manage my tasks and a calendar to start understanding my schedule. It was okay for a while, but I noticed my task list kept growing as things kept popping into my head and ending up on the list. Unfortunately, I also noticed my calendar was identical every day...stay at home...write software...I need to get more of a life...

On a friend's advice, I checked out the book "Getting Things Done" by David Allen. It seems to be the preferred geek method of handling productivity and efficiency. I think it's mostly because there is a lot of logic and organization that's required, which is pretty much the same as writing software, handling software projects, or even maintaining your file system on your computer.

The author pretty much says that simple to-do lists and calendars are insufficient for handling real-world projects since they don't capture all the information that's actually floating around in your head. That was pretty much enough to convince me to try it out. You basically need to dump all your projects into some type of organizing system (I actually made my own paper organizer) and keep running lists of all projects, broken down as far as possible into their simplest tasks. You also keep one main running list, the actual task list (next actions), which is composed of tasks from each project as well as one-off tasks like going grocery shopping. I don't want to get too much into the details here since it gets a bit technical, but you can check it out on Wikipedia or just by googling GTD (that's the slang for Getting Things Done). It seems to be quite popular in the geek community.

I've been on it for about a week and a half, and I'm slowly building my confidence in being able to handle so many things at once. It's a bit overwhelming when you have all your real projects written down on paper and you see the volume of things that actually need to get done. It's no wonder I had such a problem with it. I'll probably stick with this methodology for a while since it seems to be working out for me. Anyways, now that I'm independent, I figure I need to step up my game a bit.

On to the more interesting things...

Last week, I managed to write a self-checking testbench for the ZDO layer and finish testing it. I fixed all of the bugs that I found during testing so I'm pretty confident that the ZDO functions I implemented are working as I expect.

Unfortunately, the same test fixture can't be used for testing the NWK and MAC layers since they require more interaction with other nodes. Hence, I've decided to use my current simulator to simulate two nodes exchanging data as my basic test. My requirement is that my tests be self-checking, so I also decided to write a script parser with some language extensions that can control an independent program (process) running the checker. My last attempt at a script parser failed miserably, so I'm hoping this time I can get things right.

On top of that, I need to re-format all my function headers to be compatible with Doxygen so that I can have at least some crude documentation when I release the code. Well, that, and fill in the argument and return descriptions for those headers. From the looks of the amount of work I have to do, I'm thinking the release will most likely take place towards the latter part of this month. In case there are any doubting Thomases out there, I've already promised myself that I'm going to release the code in September, so it'll happen even if the code is non-working and there's no documentation. However, I'd prefer that not to be the case.

Until then, looks like I'm going to need to bust some ass.

Yaa-Taa is a Japanese expression of excitement. You hear it a lot in anime, which is probably where I picked it up.

Anyways, I just finished multiple 10 kB file transfers over four Zigbee router hops in my simulator, and the remote files were identical to the originals. It looks like my data transfers are pretty solid now. One step closer to getting this bad boy out...

Now, since it's a Sunday, I'm going to have a glass or three of wine and watch some anime.