Some combinations of GPFS and Linux kernels do not work with TotalView on the IA64 platform

05-16-2008, 11:25 AM
A user was seeing error messages that would eventually lead to TotalView failing or not working correctly with his program. Several error messages were involved, and the first one seen was:

Failed reading lm_name at 0x4000xxxxxx

The session might progress a bit further, but eventually reports:

Error unplanting action point 1.
ERROR: Failed, second (gather) read call for replaced instruction at

This was in a parallel session, and TotalView was never able to acquire any process other than the main process running on the host where TotalView was started.

Significant testing eventually narrowed this down to a failed ptrace call (TotalView uses ptrace to gather data from and control the target processes). The failure occurred only when using 2.4 kernels on IA64 machines.
Various 3.1.0-x versions of GPFS (IBM's parallel file system running on GigE) were tested. The same GPFS version did not show this problem when files were built on a 2.6 kernel. Files that were built on scratch disks, rather than on the GPFS system, also worked on the 2.4 systems.
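
To check quickly whether a host and file match the combination described above (IA64, 2.4 kernel, file on GPFS), something like the following Python sketch may help. It is not part of TotalView. The function names are hypothetical, and it assumes that GPFS mounts appear in /proc/mounts with the filesystem type "gpfs" and that the kernel release string starts with "major.minor". It only reports a match with the conditions seen in this case; it does not test ptrace itself.

```python
import os
import platform
import sys


def kernel_is_2_4(release):
    """Return True if a release string such as '2.4.21-sgi' names a 2.4 kernel."""
    parts = release.split(".")
    return len(parts) >= 2 and parts[0] == "2" and parts[1] == "4"


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path.

    mounts_text is the content of /proc/mounts, one mount per line:
    'device mountpoint fstype options ...'.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount, fstype = fields[1], fields[2]
        prefix = mount.rstrip("/") + "/"
        inside = path == mount or path.startswith(prefix)
        # On ties, the later entry wins because it shadows the earlier mount.
        if inside and len(mount) >= len(best_mount):
            best_mount, best_type = mount, fstype
    return best_type


def matches_reported_combination(release, machine, file_path, mounts_text):
    """True if all three conditions from this report hold (assumed type name 'gpfs')."""
    return (
        machine == "ia64"
        and kernel_is_2_4(release)
        and filesystem_type(file_path, mounts_text) == "gpfs"
    )


if __name__ == "__main__":
    # Usage: python check_combo.py /path/to/executable
    target = sys.argv[1] if len(sys.argv) > 1 else sys.executable
    with open("/proc/mounts") as f:
        mounts = f.read()
    print(matches_reported_combination(
        platform.release(), platform.machine(), target, mounts))
```

If this prints True for the program being debugged, rebuilding it on a scratch disk, as in the tests above, is one way to see whether the errors go away.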