ATmega1284P memory problem/fix
(Last updated 30 Sep 2009)

Over several months in late '08 and early '09, I struggled with a serious problem in an app I was coding for the Atmel ATmega1284P MCU.  The symptoms were corrupted RAM, as if I were getting a stack overflow or a runaway pointer.

The application in question was a BASIC interpreter, quite large and complex.  Had I only been developing on the '1284p, I would have written off the problem as a code issue.  But I was porting the same application to the ATmega128 and the AT90CAN128 simultaneously, and neither of those ports showed these issues.

I began scaling back the size of the app, trying to pin down where the corruption was happening.  Eventually, I reduced the system to the following minimum set capable of showing the problem:
Here is the entire program for demonstrating the problem:

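#include  <avr/io.h>
#include  <stdint.h>

#define  TESTPIN        0               /* LED on port B, bit 0 */

void            toggle(void);

int  main(void)
{
    volatile  uint16_t            n;    /* volatile so the delay loop is not optimized away */

    DDRB = DDRB | (1<<TESTPIN);         /* make the LED pin an output */
    while (1)
    {
        toggle();                       /* the call/return uses the stack */
        for (n=0; n<20000; n++)  ;      /* crude delay to set the blink rate */
    }
}

void  toggle(void)
{
    PORTB = PORTB ^ (1<<TESTPIN);       /* flip the LED pin */
}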

With the above setup running, the LED blinked at the expected rate.  I used a fairly slow rate; anything will work.  All that matters is that you be able to spot a difference if the app hits the tall weeds because of code runaway following a stack corruption.

The corruption can be triggered by injecting sharp-edged pulses onto port line RX0.  The simplest way to do this is to hook up an RS-232 level-shifter wired to a PC's serial port, but any other source of pulses will work.  I can cause the corruption by flashing RX0 using a bare wire connected to ground.  When the corruption occurs, the LED issues at least one blink of incorrect length as the code runs away, then eventually lands on the reset vector or on main() and restarts.

This problem is not unknown in the Atmel world, by the way.  The Uzebox group has noticed similar behavior with the ATmega644P; check here.

I described what I was seeing to the AVRFreaks list (an excellent community resource, by the way!) and asked for help.  Most of the responders were hung up on the RX0 line and the fact that the corruption coincided with serial traffic from the PC; I got lots of responses about baud rate and USART setup.  One respondent, however, understood what I was describing and provided the fix.  Ossi suggested putting a 1K resistor in series with RX0, and a 100 pF capacitor from RX0 to ground.  I followed his suggestion, using a 220 pF cap instead.  All signs of the RAM corruption disappeared.  If I modified the code to use RX0 as a USART, serial data from the PC was received properly.
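To put rough numbers on why the series resistor and capacitor tame sharp edges without hurting serial reception, here is a small sketch that treats the pair as an ideal first-order RC low-pass filter.  The function names are my own for illustration, and the model ignores the source impedance of the level-shifter and the pin capacitance, so treat the results as ballpark figures only.

```c
/* Time constant of a series-R, shunt-C low-pass filter, in seconds. */
double rc_time_constant(double r_ohms, double c_farads)
{
    return r_ohms * c_farads;
}

/* -3 dB corner frequency of the same filter, in hertz. */
double rc_cutoff_hz(double r_ohms, double c_farads)
{
    const double pi = 3.14159265358979323846;
    return 1.0 / (2.0 * pi * r_ohms * c_farads);
}

/* Duration of one serial bit, in seconds, at the given baud rate. */
double bit_time_s(double baud)
{
    return 1.0 / baud;
}
```

With 1K and 220 pF, rc_time_constant() gives about 220 ns and rc_cutoff_hz() gives roughly 723 kHz; with the 100 pF originally suggested, the figures are about 100 ns and 1.59 MHz.  Assuming a link speed of 115200 baud (my assumption; the actual rate isn't stated above), bit_time_s() gives about 8.7 usec per bit, dozens of time constants, which fits with the serial data still arriving intact after the filter was added.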

Based on the Uzebox link above, I believe this corruption involves noise injected onto the internal RAM bus and is somehow related to the picoPower version of the '1284p device.  I contacted Atmel about this issue and was told that the '1284p devices I used were pre-Production and that the target date for Production devices was Nov '09.  After a couple more emails to Atmel about this subject, I was informed that there was a new target date for Production devices, now Feb '10!

Note that the problem does NOT appear at 16 MHz!  This only seems to be a problem when you push the '1284p close to its maximum rated clock speed.

I further suspect that this corruption could be induced by the application, if the app used PD0 as an output and created pulses on PD0.  Note that I haven't tested this theory, but that would be a bear to track down if it happened!  :-)
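For anyone who wants to try that theory, here is one way the experiment might be sketched.  This is untested and rests on several assumptions: the guard buffer, its size, the fill pattern, and the helper names are all hypothetical, and nothing above says which RAM locations get hit, so a clean run would not prove the theory wrong.  The idea is to fill a block of RAM with a known pattern, toggle PD0 from software, and latch the pin on PB0 high if the pattern ever changes.

```c
#include <stdint.h>
#include <stddef.h>

#define CANARY_LEN   64u      /* hypothetical size of the guard region */
#define CANARY_BYTE  0xA5u    /* arbitrary fill pattern */

/* Fill a RAM region with a known pattern. */
void canary_fill(volatile uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = CANARY_BYTE;
}

/* Return the number of bytes that no longer hold the pattern. */
size_t canary_errors(const volatile uint8_t *buf, size_t len)
{
    size_t bad = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] != CANARY_BYTE)
            bad++;
    return bad;
}

#ifdef __AVR__                /* hardware part builds only with avr-gcc */
#include <avr/io.h>

#define TESTPIN   0           /* LED on PB0, as in the demo program */
#define PULSEPIN  0           /* PD0, the pin shared with RX0 */

static volatile uint8_t canary[CANARY_LEN];

int main(void)
{
    DDRB |= (1 << TESTPIN);   /* LED pin as output */
    DDRD |= (1 << PULSEPIN);  /* drive PD0 from the app, no external source */
    canary_fill(canary, CANARY_LEN);

    while (1)
    {
        PORTD ^= (1 << PULSEPIN);               /* generate edges on PD0 */
        if (canary_errors(canary, CANARY_LEN))  /* pattern was disturbed */
            PORTB |= (1 << TESTPIN);            /* latch PB0 high */
    }
}
#endif
```

If the corruption lands on the stack rather than in the guard buffer, the more likely symptom is the same runaway and restart described earlier, so watching for unexpected resets is probably as informative as watching the LED.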
