Hi all. I just uploaded the latest FreakZ code to the SourceForge server. I gave up on trying to do a full release by the end of this month. Actually, I'm pretty exhausted and need a couple days rest to clear my head. I'm feeling mentally fatigued lately since I've been immersed in Zigbee for so long. And of course, I want to get my taxes taken care of before it gets too close to the deadline...ugh.

This is basically a code drop with the documentation to come probably around next week. This latest code includes the hardware driver for the AVR and radio, enhancements of the NWK, APS, and AF layers, and I filled in the missing required client/server functions in the ZDO that are needed for compliance. I've also added most of the main ZCL foundation commands and part of the Basic cluster. The ZCL architecture and framework is there so most of the rest of the ZCL is just implementation.

My original plan was to take the stack and go for Zigbee Residential certification, however I recently heard from a Zigbee Alliance insider that nobody really supports Zigbee Residential. Hence there won't be anyone to play with at the interoperability fests. Since it's no fun not having any friends (and believe me, I know exactly what that's like), it looks like I'm going to have to scrap the Zigbee Residential certification plans and go straight to Zigbee Pro. I checked out the compliance checklist and if I just go for the minimum requirements, it's only a couple of extra features. However if I'm going to go Pro, I'll probably look at providing the minimum features for the Smart Energy profile instead. It's slightly more intensive, but probably more useful as well. Of course it's going to take more time to get to certification because of this, but the overall stack architecture is fairly stable now, and I'm getting closer to the last few bends in the learning curve. I'm hoping that translates into me being more efficient at coding the stack.

The main part of the stack that is still missing is the security. However I want to get the functionality up first so that people can actually take it and feel confident that it's working. The current stack is targeted towards the Atmel Raven (the LCD board) and Jackdaw (the USB board), but I'm also planning on making some development boards that are easy to test and add on accessory boards. The Raven and Jackdaw are a great value for the price, but they use non-standard headers and they're tiny which makes development and customization on them difficult. 

I guess that's about it for now. I'll try and get the documentation out by next week and probably do a little more testing on the release. I know it's going to be tough for people to check out  the stack without the docs, but the code is already available at Sourceforge if you want a quick browse. I also tested it on Linux this time to make sure everything was kosher. Here's the link to the Subversion repository at Sourceforge and you can pack things into a tarball and download it from there also. And for the people that ask me why I don't use the Sourceforge Subversion repository as my main version control tool...it took me almost an hour to upload the code...

Subversion Repository Link

And with no further ado, here's the info about the latest code:

New stuff:

- Upgraded memory handling to non-fragmenting dynamic memory allocation
- Added hardware drivers for AVR Raven and Jackdaw boards (AT86RF230 and AVR drivers)
- Integrated FreakUSB device stack for AVR Jackdaw board
- Implemented new client/server handlers in ZDO required for certification
- Implemented ZCL architecture and framework
- Implemented part of ZCL basic clusters and the main foundation commands
- Implemented ZCL reporting
- Added binding table
- Added end device binding
- Added group and group ID tables
- Added new debug dump functions for simulator debugging and ability to turn on/off debug dumps by layer

Footprint:

50 kB flash, ~5.5 kB RAM on an AVR with an AT86RF230 radio

Good night!

 

Looks like I'm back on track with the stack development. I got sidelined a bit because I had to file my corporate taxes (due by March 16th) and fought off a bitter bout of bronchitis. It's difficult to say which felt worse. But anyways, I made it through my first tax year and actually was able to handle the corporate accounting. Let's just say that QuickBooks and TurboTax are your friends if you ever start a small business.

My understanding of the Zigbee spec has greatly improved since my last release in October because I can see the whole stack from top to bottom. Without fully understanding how the ZCL and ZDO tie into everything, it's tough to determine which features are important and which are not. Now that I've gone through the painful exercise of understanding and implementing the top level application objects, I have a clear view of the deficiencies in the lower layers that need to be addressed. These weren't immediately obvious to me when I first wrote those layers because I didn't fully understand the tie in between all the layers.

The sad thing is that if I took a top-down approach to designing the stack, then the opposite would probably have been true since a lot of the ZDO and ZCL features require an understanding of what goes on in the lower layers. The truth is that the Zigbee stack is quite large and much more complex than I initially assumed. I can only fully appreciate it now that I can look back at things a bit.

Luckily, I feel like I have a good grasp of all the layers now and the interdependencies between them. Here's an example: I had to make some major modifications to the heart of the stack (MAC and NWK layers) in order to facilitate implementation of some ZDO features like the LQI request/response.

The LQI request is used by a commissioning tool to read the neighbor table of a remote object. It requires the use of neighbor table fields that are marked as optional in the Zigbee spec and of course the neighbor table is in the NWK layer. The nipple twister is that even though the neighbor table fields are marked as optinal in the NWK layer, LQI handling is a mandatory feature in the Zigbee 2007 Residential compliance document. This means that you must implement those optional fields if you want compliance. Hence, they're not optional. This is just an example of how you can get stung if you don't have a complete view of the stack. You start to see little things like that as you get deeper into the implementation.

Anyhoo, I finally finished implementing the end device bind feature and testing it. This was probably one of the most painful features to implement in the ZDO, since it's a complicated handshake between three devices and there are a lot of hidden dependencies that aren't immediately apparent when you first read the spec. It looks like it's initially working in the simulator, but I'll need to test it more on real hardware to see how it performs. At least this validates that the binding tables are working properly.

Unfortunately, there's still a lot to do and a code drop at the end of the month is looking increasingly more likely with the documentation to follow in April. The task now is to finish off the ZDO features that I started at the last code drop. The architecture of the ZDO has changed to make memory handling more efficient and to eliminate code redundancies, but this breaks some of the old features.

Once I can finish this, then I need to go through the lower layers and implement any features that I missed but are needed for compliance. It's like dotting the i's and crossing the t's on the lower layers. If I can get through all that, then I can finish building out enough of the ZCL to implement a simple home automation example app. 

It sounds like a lot of work, but I seem to have become quite proficient at navigating my way around the stack now. Most of the learning curve and design/architecture is behind me  (except for security) so it's pretty much a matter of coding, fixing, and testing. As I mentioned before, I think the major time sink in developing a Zigbee stack, or probably any stack, is the learning curve and designing the architecture, both of which are interdependent. With that behind me, I'm hoping that I can start boosting the development speed of the stack. 

It's been a while since my last update on the status of the stack development. I'm still targeting a March release, but it looks like it's going to be really close. So as usual, if I can't finish by the end of the month, I'll post what I have and continue with the release into April.

Last week was a rough week. The part-time work suddenly spiked up to full-time and then some with a whole slew of customer sales visits that lasted from the morning well into night. In Japan, dinner and drinks are also part of the workday which just about killed me. All that human interaction really tires you out. Ah well, bills must be paid...

Anyways, enough of my pity-fishing. I've finished implementing the binding and group tables and am working on the parts of the ZDO and ZCL that have to deal with those two. I can't finish the ZCL without them. The end device bind feature is really a big PITA (Pain In The Application sub-layer).  I thought it would be easy street once I had the binding table up, but unfortunately that's not the case. 

Other than that, I'm trying to clean up the ZDO which I seem to have left in a horrible mess. Now that I'm trying to tie everything together, I'm seeing a lot of things that don't really need to be there, and some code that just puzzles me. I should have it all taken care of by the release, though.

Seems like this is going to be a short post because my eyes hurt and I have a glass of wine calling my name....

...chris....chris....drink me...i'm your friend...

Hmmm...I used to be pretty good about keeping everyone updated on the development status. Not sure what happened there, but I thought it would be good to let everyone know what's going on with the project.

So I got my ZCL architecture down now and it works in my little test bed that I had set up. Only the general clusters have been implemented so far, but (at least in my test setup) I can toggle lights on and off and move the levels correctly.

Now, I'm integrating the ZCL into the main stack and bringing back my test simulator again. Yep, re-awakening the giant. I thought I would be through with that thing and just run everything on real hardware, but it turns out that it's going to be useful once again. Basically, it's just so much faster to debug the stack in my PC environment than on real hardware, and catching bad pointers is much easier. The main goal is to confirm that the ZCL gets integrated into the stack correctly and that it behaves like I expect it too. I'm also going to write some code to automatically test the formatting of the ZCL frames. 

Once I can confirm the ZCL is behaving properly, then I need to tie up a lot of loose ends on the ZDO and implement binding and group tables. The current goal is to try and hit a March release, either mid-month or at the end. There are some things that will probably slow me down, such as servicing the part-time job, and of course taking care of my taxes (a major pain in the sphinx, esp since I probably won't be getting a tax return in California, even though I am entitled to one). That's why I'm trying to give myself a bit of wiggle room for the next release.

However I'm hoping that the next release will be a major one and people will actually be able to use the stack on real hardware (targeting Atmel Raven boards for now) and communicate using Zigbee Home Automation compliant frames (sans security). I should have some simple example apps like a simple light switch, dimmer switch, and on/off/dimmable light.

I don't want to give the wrong impression though. I'm not a huge home automation enthusiast, but the HA profile and devices are the easiest to implement for the first stage. The real goal will hopefully be to try and implement more interesting things like patient monitoring, energy savings, or basically devices that have more of a social meaning to them. 

Wish me luck!

I've been working on the project quite a bit recently and I must say that hardware is a completely different ballgame than working out of a simulator. The driver has been fairly stable for about a week and a half now, so that helps out a lot. I spent a couple of days getting the code working on both Atmel Raven boards. If you've never seen the Raven kits, they consist of a USB dongle and two satellite boards...all with radios.

Because the boards are different, they require two separate stack configurations. The USB dongle board requires the FreakUSB stack which also takes some configuration since I implemented a command line menu system to control the nodes. The command table is the same as the simulator so it was pretty nice. It was like a real life version of the PC-based simulator I wrote and it was fun to see things behaving the same way. 

The other two satellite boards actually have two AVRs on them as well as AT86RF230 radios. One of the AVRs is just for handling the wireless stack and the other AVR is for handling the LCD, joystick, temp sensor, and other peripherals. The inter-processor communication is carried out via UART. Luckily, the Contiki project already had a simple protocol for the IPC so I just used that. 

After trying to maintain a network with two different stack configurations, I finally gave up on it because it was getting complicated. Maybe I should spend a few words on why it was so...

In addition to maintaining two different stack configurations for Zigbee, you also need to maintain an additional code base that is completely separate to handle the satellite board's LCD and menu-ing system, as well as the USB device stack for interface to the dongle. The two-MCU satellite boards are especially tricky because when you want to add a new menu item or modify the behavior of an existing one, you have to update both the radio-controlling MCU and the LCD menu-ing MCU. 

Sure, that's no problem you say. Well, it was cool when I just had two nodes and was playing around, but at three nodes, it's started to get irritating. A USB stack, Zigbee stack, and LCD/Peripheral controller spread across three boards with five processors. To tell you the truth, I actually have two sets of Raven kits so I have a total of six boards and 10 processors. Any code modification is a nightmare since you have so many MCUs to download to...

Hence I'm trying to keep my life simple. I've simplified my network configuration to just using the satellite boards so I don't have to deal with the whole PC interface and USB. Then, I only need to maintain the FreakZ stack, LCD/Menu system, and the drivers all stay the same. That gives me four boards across my two kits, but I seemed to have brick'd (pancake'd?) one of the boards somehow so I'm down to three. Well...at least enough to build a network.

I'm beginning to see an over-the-air bootloader in my future. It will definitely save a lot of time, especially since I'm going to need to increase my network size soon to test the mesh networking thoroughly.

But in the meantime, it's nice just to be able to see three nodes up and running and providing bi-directional communication. Right now, I have them sending data to each other via the joystick on the Raven boards. The LCD on the receiving board will display the direction the joystick was pushed and all boards can both send and receive data. 

I've talked about Zigbee a lot in the past, but even with a simple three-node network, I'm amazed at the possible applications. My brother-in-law's beer temperature monitor looks like its coming soon. Can't wait to get the full mesh up and running...

Here's a screenshot of the Zigbee analyzer with the full Zigbee frames. Each Zigbee layer is color coded and goes up to the AF (Application Framework layer). I'm curious if there's a different color for the public profiles. It's like playing Super Mario Brothers where you keep wondering what the next boss is going to be like...

FreakZ Zigbee Analyzer Capture

 

Now that it's finally quiet here in Tokyo, and the rest of the tech industry as well, I finally got a chance to really focus on finishing the driver and getting it to work with the FreakZ stack. It actually took longer than I expected, but I finally got the driver working with the stack. Although I had somewhat of a head start since there was an Atmel driver in Contiki, my requirements are different than those for the Contiki implementation of 6LoWPAN. So I had to add some extra features, remove some unused ones, and also do other modifications.

One of the main issues I had was that the MAC layer needed to be ripped up since the Atmel radio implemented a lot of the MAC functions in hardware. More specifically, it implemented CSMA/CA, transmission retries, auto-ACK, and CRC generation and handling inside the chip, whereas I was previously doing it all manually in software. 

I ended up separating the MAC into two layers, a lower layer that uses the AT86RF230's hardware MAC, and an upper layer that handles all the remaining MAC functions. Other chips like the CC2520 have pretty much similar functionality where the hardware automates alot of the repetitive MAC functions, so I figure that separating it will make it easier to port the stack to other radios in the future as well. So now, the MAC consists of two parts: a hardware dependent portion and a hardware independent one. The hardware dependent part will be located in the same directory as the driver and the hardware independent one will be in the normal place (ie: the mac folder). 

Here's a shot of the Zigbee Network Join process between two nodes. This is one of the trickiest parts because it combines command frames, beacon frames, and indirect transfers and goes from the driver all the way up to the ZDO. Now that this is working, I need to start testing out data transfers and routing.

Zigbee Network Joining

Man...been a while since I did a status update.

I just finished writing the driver for the AVR and the AT86RF230 on the Raven boards. I unashamedly borrowed a lot of code from the Contiki Raven driver, but had to make a bunch of mods to massage it into the FreakZ stack. Boy, my copy and paste fingers sure are tired. I kept all the original SICS headers and credits in case anybody is worried. Anyways, after integrating the driver, modifying parts of the MAC to account for the hardware MAC in the AT86RF230, and writing a sample application, here are the stats for the FreakZ stack.

The memory footprint is as follows:

Flash size: 42.3 kB

RAM size: 4.3 kB

Actually, that size includes the FreakUSB stack since I'm using the Raven AVR USB dongle. If you adjust for the size of the FreakUSB stack, then the memory footprint becomes:

Flash size: ~36 kB

RAM size: 4.2 kB

FYI, I'm using a managed memory heap size of 2048 bytes with 30 memory elements. 

Not bad, since this is the Zigbee router code (as opposed to a Zigbee end device which should be much, much smaller). My original target was for the stack to be less than 60 kB so looks like I can still make it, even after I finish out the ZDO and implement the cluster library and profiles. Also, it looks like the managed memory lib I used from Contiki did a lot in reducing the RAM footprint.

Okay, okay, stop comparing the size to uIP. The code size is still obese compared to it, but not too shabby in comparison with other Zigbee stack sizes. It can't approach the size of uIP until the Zigbee spec goes on a diet and I transplant Adam Dunkel's brain into my head. 

Anyways, now comes the nasty part...testing and debugging. I'll keep everyone updated on the progress...

After my last post, Adam Dunkels commented that the way I accessed the data in the managed memory was incorrect. This was causing the error case that I had written about, and also can possibly create dangling pointers. My original way to access the managed memory looked like this. This examples gets the next hop address from the routing tables:
mem_ptr = list_head(rte_tbl); // grab the first mem ptr from the list
rte_entry = (rte_entry_t *)mem_ptr->mmem_ptr.ptr; // de-ref the ptr
next_hop = rte_entry->next_hop; // get the next hop
That was incorrect because I should not have stored the de-referenced memory pointer. The reason is that the position of the pointer is un-reliable due to the memory compacting that the managed memory performs. So instead, the correct way to use the memory pointers was via the built-in memory macro. Here's the same code again using the correct access method:'
mem_ptr = list_head(rte_tbl); 
next_hop = ((rte_entry_t *)MMEM_PTR(mem_ptr->mmem_ptr))->next_hop;
It's actually more concise, but since the stack has a lot of tables and table accesses, having a freaky line of code like that sprinkled around the stack starts to look intimidating. Because of that, I decided to try to pretty up the code a bit more and created some macros for the various table entries. So now, it looks more like this:
mem_ptr = list_head(rte_tbl);
next_hop = RTE_ENTRY(mem_ptr)->next_hop;

There are other macros for other table entries as well. They are basically just convenience macros that handle the de-referencing of the memory pointers and typecasting them for the particular table element, but they make the code a bit more readable. And now, the code is (hopefully) 100% safe from dangling pointers...at least from the managed memory.

If you're interested to check out the modifications to the stack to go from statically allocated table elements to a dynamically allocated heap, you can access the subversion repository (version control) from the following address:

https://freakz.svn.sourceforge.net/svnroot/freakz

I haven't done a formal release for this code yet since I haven't fully tested the code and updated the documentation. But its already working for the long file test and multi-hop tests and you can grab it straight from the repository.

Well, I'm just finishing up the conversion to using dynamically allocated memory using the Contiki managed memory library. It took a bit longer than
I expected because the memory handling is part of the stack framework and is used in many places. Changing the memory handling while keeping the stack behavior the same was a bit of a challenge. However it's pretty much  finished and the long file transfers are still working.

In my previous post, I don't think I did a good job of explaining why dynamic memory allocation is important. So let me try and re-explain it in terms of how the stack was.

The initial implementation of the stack had a fixed number of elements for each table. An example would be like allocating a max number of 10 entries or the routing table and 5 entries for the indirect queue. A problem occurs if a particular application requires only 8 routing entries but 6 indirect queue entries. In that case, you ran out of memory.

Its very difficult to recover from situations where you run out of table/queue entries because the behavior may be unpredictable. This means that to be on the safe side, you would need to over-specify the max number of entries for each table/queue/list in order to have some margin of safety. This ends up wasting a lot of RAM. Another problem with this is that the stack uses something like twenty different  tables to keep track of everything (yes, I know its a lot...please  complain to the Zigbee Alliance). The amount of effort to tune the stack for each application would be tremendous and the inefficiency of the RAM usage is no good for something that is supposed to be in small, embedded applications.

If a standard memory allocation was used such as malloc, it would result in memory fragmentation. You can see an example of memory fragmentation in the drawing. Memory fragmentation is also an inefficient use of memory because even though the total amount of free memory may be available, you also need the memory to be adjacent so that you can fit the block into it. This becomes a problem in situations where many different sized blocks are being allocated and freed. With fragmentation, it becomes tougher to fit the large blocks due to the need for adjacent memory.

The Contiki memory management library overcomes this problem by having a feature called memory compacting. Every time a memory block is freed, all the used blocks that come after it will be moved down. This leaves all of the free memory in one ajacent heap so that you won't run into the fragmentation issues I mentioned earlier.

The memory management comes with a price though. It accomplishes this by using an added level of indirection. This means that when you allocate memory, you don't get a pointer to the actual memory. You get a pointer to a pointer to the memory. When you allocate managed memory, it will create a memory block structure which contains the size of the block and the pointer to the memory. It will then return a pointer to the managed memory pointer which points to the underlying memory.


The reason for this is that when memory is compacted, the address of the actual memory changes, but the changes are reflected in the managed memory pointer. So as long as you use the managed memory pointer, you will always be pointing to the correct address for the allocated memory. So to use the managed memory, all of the tables needed to be modified to use the managed memory pointer instead of actual memory. This added code size to the stack, however decreasing the RAM utilization takes priority over having a smaller code size for me. Its easier and cheaper to trade up to a microcontroller with a bigger flash, but it sure is tough to find one with a bigger RAM.

Here's an example of how the table handling changed.

Before:

rte_entry = list_head(rte_tbl);  // grab the route entry

next_hop = rte_entry->next_hop;  // get the next hop in the route entry

After:

mem_ptr = list_head(rte_tbl); // grab the memory pointer to the route entry

rte_entry = mem_ptr->mmem_ptr.mem_block_ptr;  // de-reference the memory pointer to get to the data

next_hop = rte_entry->next_hop; // now we can use the entry

 

There is one dangerous point about using the managed memory and I included a diagram so that it can be more easily seen. To actually access table elements, you have to de-reference the managed memory pointer and access the direct memory. However if in between de-referencing the pointer and actually using the table data, a block of memory gets freed somewhere, the de-reference pointer is useless and dangerous because it is now a dangling pointer. Since Contiki is not pre-emptive, you won't have to worry about a stray thread causing this situation, however it can occur if you have a function in between de-referencing and using the pointer. If the function frees memory somewhere, then it will be dangerous. So its best to use the pointer as soon as possible after de-referencing it.

Here's an example of an error case:

mem_ptr = list_head(rte_tbl);  // grab the first mem pointer from the route table

rte_entry = mem_ptr->mmem_ptr.mem_block_ptr; // de-reference the mem pointer

some_random_func(); // a memory block gets freed somewhere in here

next_hop = rte_entry->next_hop;  // this will be incorrect. the entry now points to junk.

Updated 2008-10-26: Just learned from Adam Dunkels, the author of the managed memory library (as well as Contiki and uIP), that the memory should only be dereferenced as its used. And the MMEM_PTR macro should be used to access the memory blocks only. That way, the error case won't be encountered. So looks like its time to re-modify the code to handle this. It should lead to a more robust design without needing to "play with fire" or at least run the risk of dangling pointers.

According to Adam, the problematic case should be rewritten like this:

mem_ptr = list_head(rte_tbl); // grab the first mem pointer from the route table

/* rte_entry = mem_ptr->mmem_ptr.mem_block_ptr; never de-reference and store the mem pointer like this */

some_random_func(); // a memory block gets freed somewhere in here

next_hop = ((struct tbl *)MMEM_PTR(mem_ptr))->next_hop; /* this is the way to access the memory pointed to by mem_ptr - this works even if a memory block was freed in some_random_func(). */

End update. The example error case will now only show what can happen if you abuse the memory management system  ;)

I've tried to keep the managed memory pointer handling away from the main parts of the code and isolated to the table files. That way, from a user point of view, you just call functions like rte_table_get_next_hop() and it will return the next hop data without worrying about de-referencing, dangling pointers, etc... That way, stack modification should be much easier for everyone. In most places, you only need to worry about handling the managed memory pointers if you are modifying the behavior of a table.

Updated 2008-10-26: I should also mention that the managed memory library will only be used for table and list elements. Those are alloc'd and free'd on a relatively leisurely basis. For frame buffers, that I'm assuming will be alloc'd and free'd frequently, I will be implementing two memory pools of 64-byte and 140-byte sizes. This is because there is a performance impact when the managed memory is free'd and then compacted. I'm hoping this will provide a good balance between performance and ease of use. All in all, the mods will take the stack from tweaking approx 20 table sizes and the frame buffer pool to tweaking the shared heap memory and the amount of frame buffers in each pool.


Well, that's all for now. Have a nice weekend!

Just released v0.52 of the stack. Someone found that the simulator is almost unusable on Linux. I had imagined that if it worked on Cygwin, it would work on Linux however I seemed to have been dreaming this utopia. The reality is that there are significant behavioral differences between Cygwin and Linux, so the stack and sim needed to be modified to work on both systems.

Thanks to Jedrzej Solecki who pointed it out and also did a lot of work to get the simulator running on Linux. He had the dubious task of making me get off my ass and install VMWare and Ubuntu so that I could see for myself how bad it was on Linux. I was horrified.

The simulator and stack is now tested and working on both Cygwin and Linux. Most of the fixes were contributed by Jedrzej and I was pretty much the Tonto to his Lone Ranger. Thanks a lot!

The updated code can be found at Sourceforge:

http://www.sourceforge.net/projects/freakz

Whew...what a title. When I first started this project, I had no idea how important it would be to have a non-fragmenting dynamic memory allocator. In most protocol stack cases, you can get away with just having a statically allocated buffer for TX and RX frames. Apart from that, you would only have a few cases where you need to manage tables and queues. For a USB devices stack, you normally only need to allocate a couple packet buffers and some fixed arrays to hold the descriptor data.

However in Zigbee, there seem like an infinite amount of tables required to do anything. Indirect queues, routing, neighbor, broadcast, discovery, scan tables, etc... I don't even want to get into how many there are. The problem with statically allocating elements for each table/queue/map/list is that its a horrible waste of memory. In some cases, you may over specify the amount of elements needed for something like...say...a neighbor table. Then you might underspecify the amount of elements for a routing table. The problem here is that the stack will require tuning of multiple (ie: a fucking lot of) parameters in order to run properly. Underspecification of any one parameter could throw the stack out of whack. It's not a very good situation.

Because of the danger of underspecifying table elements, most people (like me) would overspecify the amount of elements needed, which usually causes a large amount of waste. If those elements are being used, they're just sitting there doing nothing. Its much more efficient to have a pool of memory that is shared by all the tables/queues/maps/lists so that the utilization can be improved. It also helps on tuning the stack because instead of specifying how much memory is required for each element, you only need to specify the size of the statically allocated shared memory block (the heap). 

However its not such an easy task. There is this nasty thing called fragmentation in dynamic memory allocators.  It's basically the reason why I haven't implemented a dynamic memory allocator yet. Fragmentation occurs based on the algorithm that the allocation algorithm uses. Here's an easy example:

You use memory pools and divide them up into 2 sizes: 32 bytes and 64 bytes. If you need to allocate memory for a 28-byte table element, then the first choice is to use a block from the 32-byte memory pool. If no blocks are available, then you would use a block from the 64-byte memory pool instead. Some fragmentation just occurred because you're using up 64 bytes to hold 28 bytes of memory. Say that this happens multiple times and you use up the 64 byte pool. Some time later, you have multiple 32 byte blocks open up, but you have a request for a 33-byte table element. Unfortunately, you're memory starved because you can't accomodate the request even though you have memory available. 

Here's another example:

You use a generic memory allocator that has a large heap and uses a buddy-system algorithm. The buddy system iteratively halves the size of an available memory block until it can accomodate the requested memory allocation. For example say you request 56 bytes of memory for a table element and the heap size is 1024 bytes. For the buddy system, you would separate the heap into two 512 byte blocks. Since the two  blocks are still large, you would separate one of the blocks into two 256 byte blocks. Since thats still too big, you would continue on until you got a 64 byte block. That would just accomodate the requested size while being less than double the size of it.

However if you request 65 bytes of memory, then you would require a 128 byte block to accomodate it. If this happens multiple times, you can use up your 1kB heap while utilizing less than half of it. In fact, even nastier things can happen. Say you request 65 bytes 8 times. That would use up your 1kB heap (128 bytes * 8 blocks = 1024 bytes). Then say that four of those blocks are freed, but they're not adjacent (ie: blocks 1,3,5,and 7 get freed). You then get a request for 140 bytes of memory (say a 802.15.4 frame with a full payload + control flags). You've just become memory starved because you can't accomodate the request.

Your 1kB heap was chopped into 128 byte segments initially. After the memory was freed, the blocks weren't adjacent so they couldn't be coalesced into larger blocks. You just ran out of memory while using 260 bytes out of your 1024, or slightly more than 25% utilization. Ouch!

As you can see, although it's nice to have a dynamic memory allocator, the issue of fragmenting is one that needs to be addressed. It's the reason why malloc is banned from safety-critical and high-reliability applications. Any embedded sytem that does a lot of alloc's and free's would need to confront this issue because the memory performance will be based on the malloc algorithm being used. It's one of those little gotchas in protocol stack design, however it also affects other architectures such as graphics, databases, etc....

Well, the main point of this post is that I've been looking for a usable dynamic memory allocator or algorithm on the side while trying to move forward with the stack design. Now that the initial release is over with, I've had a chance to revisit my search. After much searching on the internet, checking out other open source stacks, and reading documentation and appnotes from various manufacturers, I still couldn't find a satisfying memory allocator. Most of them used fixed size memory pools which is cool, but a bit wasteful. Otherwise, they require statically allocating table elements which is what I have now. That's also not very interesting.

I happened to run across the managed memory library file in Contiki again. I had previously tested it in my sandbox to see if it could be used for my table and linked list elements. The memory allocator in Contiki is seldom used in the actual OS, but it is very elegant. You can request any size memory you want from a fixed size heap. When you request it, a structure is used to point to the block in memory. The pointer to the structure is returned from the allocator. To get to the memory, you need to deref the pointer to the structure, and then the structure's pointer to the memory block. Or effectively, it adds a level of indirection to the memory allocation. The reason this is done is because when a memory block is freed, the whole memory is compacted so that all the used blocks are adjacent to each other. This way, there is no fragmentation.  

Whew...why would I even bother to explain that? Because when I first tested the memory manager, I rejected it. I allocated the memory and used it in my linked lists. However I stored the actual pointer to the memory in my linked lists instead of the pointer to the structure that points to the memory (ugh...it hurts to even type that). Basically, my linked lists and tables got trashed once I freed an element because the memory would get compacted, which means that it would change position. Since my linked lists use pointers to the next memory block, if that block changes position, then I'm just pointing to junk. 

Well, all I can say is..."operator error". I decided what the hell and decided to retry testing the memory allocator in Contiki. This time, I did a more thorough study of what the code was doing. The first time around, I kind of breezed through the comments, and there wasn't any documentation on this lib file (that I could find). This time, I studied the code thoroughly and wrote a couple of small, controlled test programs to see what the behavior was. Once I caught on to the concept (which was basically to provide an abstraction layer to the memory), I was able to get it working. That means that I can have fragmentation free dynamic memory allocation. Yay!!!

The two obstacles I have left on this stack is security and the fragmentation free dynamic memory allocation. Now that I have one of them out of the way, I just need to complete the security portion of the stack. If I can finish that, the rest of the stack development is just filling out the Zigbee features and making sure I comply with the Zigbee compliance checklist. 

The next step is to implement the Contiki memory allocator in the stack, and then port the stack to hardware. The memory allocator alone will probably save 1-2 kB of RAM which is why its important to do it before I go into hardware. In that case, it will be possible to run the stack on something like 2 kB of RAM and still be able to implement a decent sized network. 

Ahhh...the trials and tribulations of stack development...