Memscape 2.4.2-2: Corrupted Memory Blocks are Confusing

04-02-2009, 06:49 AM
We've been struggling with random crashes in malloc_consolidate in our software, and have been working on this issue for a few weeks now. We are using MemoryScape, as well as other tools, to help diagnose the issue.

When I run our application through MemoryScape, I pause and check for corrupted blocks every hour or so. However, when we finally did see the corruption, it confused us.

Our Pre-Guard block is 8 bytes, filled with 0x77777777.
Our Post-Guard block is 8 bytes, filled with 0x99999999.

This is our preceding block, which appears to be de-allocated:

Our Corrupted Block is:

It appears that the first 4 bytes of our Pre-Guard block have been corrupted by the preceding block, which has already been de-allocated.

Are we misinterpreting these results? Or can anyone shed some light on what else we can be looking for? We are getting these crashes at random times, with 4 or 5 similar back traces.

04-07-2009, 06:49 AM
Jeff sent this to support@totalviewtech.com and I answered this there, but I thought I'd share the response for everyone's benefit.

As you probably know, the guard blocks are extra bits of memory that are allocated along with your original allocation request. In reality we just increase your allocation request by a certain amount and then hand back to your program an address in the middle of that allocation. This gives us space to fill in with the pre- and post-guard patterns. Note that this size may be smaller or larger depending on page alignment restrictions. We don't actually notify you immediately when these guard blocks are overwritten, as this is too expensive to do with the current method. However, when you free the block, MemoryScape checks the guard blocks to see if they have been overwritten, at which point you should get a notification event.
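
To make the mechanism above concrete, here is a minimal sketch in C of the general guard-block technique. It is not MemoryScape's actual implementation: the names guarded_malloc, guarded_check and guarded_free, the guard_header bookkeeping struct, and the single-byte fill values are all assumptions made for illustration. It only shows the idea of over-allocating, handing back an interior pointer, and checking the patterns at free time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 8      /* bytes per guard, matching the sizes in this thread */
#define PRE_FILL   0x77   /* pre-guard fill byte */
#define POST_FILL  0x99   /* post-guard fill byte */

/* Hypothetical bookkeeping placed in front of the pre-guard so the
 * check routine can locate the post-guard later. */
typedef struct { uint64_t user_size; } guard_header;

/* Over-allocate, fill both guards, return a pointer into the middle. */
void *guarded_malloc(size_t n) {
    unsigned char *raw = malloc(sizeof(guard_header) + GUARD_SIZE + n + GUARD_SIZE);
    if (raw == NULL) return NULL;
    guard_header h = { n };
    memcpy(raw, &h, sizeof h);
    unsigned char *pre = raw + sizeof(guard_header);
    memset(pre, PRE_FILL, GUARD_SIZE);
    memset(pre + GUARD_SIZE + n, POST_FILL, GUARD_SIZE);
    return pre + GUARD_SIZE;
}

/* Return 1 if all len bytes at p equal fill, otherwise 0. */
static int region_is(const unsigned char *p, unsigned char fill, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (p[i] != fill) return 0;
    return 1;
}

/* Bit 0 set: pre-guard damaged. Bit 1 set: post-guard damaged. */
int guarded_check(const void *user) {
    assert(user != NULL);
    const unsigned char *u = user;
    guard_header h;
    memcpy(&h, u - GUARD_SIZE - sizeof h, sizeof h);
    int result = 0;
    if (!region_is(u - GUARD_SIZE, PRE_FILL, GUARD_SIZE)) result |= 1;
    if (!region_is(u + h.user_size, POST_FILL, GUARD_SIZE)) result |= 2;
    return result;
}

/* Guards are only inspected here, which is why damage can go
 * unreported until the block is released or checked by hand. */
int guarded_free(void *user) {
    if (user == NULL) return 0;
    int result = guarded_check(user);
    free((unsigned char *)user - GUARD_SIZE - sizeof(guard_header));
    return result;
}
```

With this sketch, a one-byte underrun such as p[-1] = 0 on a block from guarded_malloc goes unnoticed until guarded_check or guarded_free runs, which then returns a non-zero value.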

So, your next question may be, why weren't you notified when you freed the previous block? This probably has to do with one of the properties of guard blocks. They have a certain size (8 bytes) and it is quite possible that the corruption skipped over the guard block and landed in the next one. And you didn't get a chance to notice this until you stopped and did the manual check.
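
The "skipping over" case can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: two guarded blocks A and B are laid out back to back in one array with no allocator metadata between them (a real heap will differ), and simulate_stray_write is a hypothetical helper, not a MemoryScape function. It performs a 4-byte write at a given offset from the start of A's data and reports which guards were hit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GUARD 8                        /* guard size in bytes */
#define DATA  16                       /* user bytes per block */
#define BLOCK (GUARD + DATA + GUARD)   /* pre-guard + data + post-guard */

/* Return 1 if every byte of the guard at g still equals fill. */
static int guard_intact(const unsigned char *g, unsigned char fill) {
    for (int i = 0; i < GUARD; i++)
        if (g[i] != fill) return 0;
    return 1;
}

/* Write 4 bytes at a_data[index] and report the damage:
 * bit 0 = A's post-guard hit, bit 1 = B's pre-guard hit. */
int simulate_stray_write(size_t index) {
    unsigned char arena[2 * BLOCK];

    /* Lay out blocks A and B: 0x77 pre-guard, zeroed data, 0x99 post-guard. */
    for (int b = 0; b < 2; b++) {
        unsigned char *base = arena + b * BLOCK;
        memset(base, 0x77, GUARD);
        memset(base + GUARD, 0x00, DATA);
        memset(base + GUARD + DATA, 0x99, GUARD);
    }

    /* The stray write must stay inside the simulated arena. */
    assert(GUARD + index + 4 <= sizeof arena);
    memset(arena + GUARD + index, 0xAB, 4);

    int damage = 0;
    if (!guard_intact(arena + GUARD + DATA, 0x99)) damage |= 1;
    if (!guard_intact(arena + BLOCK, 0x77))        damage |= 2;
    return damage;
}
```

A write at index DATA lands in A's own post-guard and would be reported when A is freed. A write at index DATA + GUARD jumps past A's post-guard entirely and damages only the first 4 bytes of B's pre-guard, so freeing A raises nothing. Raising GUARD widens the window in which an overrun from A still lands in A's own guard.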

Fortunately, you can increase the size of the guard block and thus should be able to get a better handle on where the corruption is happening. That should raise an event when that block is de-allocated, and you can then check the code between the allocation and de-allocation to see if you can determine where the corruption is taking place.

If you have a TotalView license and you are seeing this corruption always happening at the same address, I'd suggest setting up a watchpoint at that location so the program stops immediately when that address is overwritten. We are also working on improving out-of-bounds detection, but that is not yet available. Stay tuned...