Several users have reported that, after upgrading AIX, TotalView hangs while debugging an MPI job. TotalView becomes unresponsive and cannot be killed, leaving several processes in a zombie state that cannot be removed without rebooting the machine. Typically these jobs are started by poe, but that may not be true in all cases. The first reports were on AIX and, as listed by
lslpp -l | grep
This issue has been reported to IBM, and a fix is available on request as an ifix. There is not yet an official patch for the problem, so an IBM kernel engineer has asked each user who runs into it to file a PMR with IBM support. Once a PMR is filed, IBM seems able to supply an ifix fairly quickly.