About TotalView's ATTACH functionality



Robbie
01-12-2007, 06:55 AM
Hi,

I'm very interested in (and also confused by) TotalView's ATTACH functionality when it is used with srun.
When TotalView attaches to a job launched by srun, the pid of the srun process is used as the attach target.
IMHO, when a debugger attaches to a target, it just sets the target process's "tracing" flag and stops the process.
That is, TotalView should just stop the "srun" process.
But how can TotalView actually attach to the processes of the job launched by srun? It even launches tvdsvr on each node where those processes reside!

Is there some magic in the srun process?

Regards,
Robbie

PeterT-RogueWave
01-16-2007, 08:28 AM
Hey Robbie,

Yes, there is some magic there, but it's not very secret. Etnus provides a public interface which is implemented by MPI vendors. It consists of a number of structures and callback routines that allow TotalView to discover which processes have been started and on which nodes. When we attach to srun or prun or whatever the starting process is, we check for the existence of MPIR_proctable, which contains the info about the processes and nodes, and we use that information to fire off the tvdsvrs.

Note that attach does not work if the MPI provider has not implemented this interface, and we have some issues with MPI-2 as well. That is being worked on, but attach is not expected to work seamlessly and grab all the spawned MPI-2 processes; many MPI-2 implementations use Python to implement the MPI starter process.
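To make the process-table step a bit more concrete, here is a rough Python model of the idea, not TotalView's actual code. It assumes each table entry carries a host name, an executable name and a pid (the fields of an MPIR_PROCDESC entry, as far as I know). The `ProcDesc` and `group_by_host` names, the host names and the pids are all made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class ProcDesc(NamedTuple):
    """Illustrative model of one process-table entry (one per MPI rank)."""
    host_name: str
    executable_name: str
    pid: int


def group_by_host(proctable):
    """Map each host to the pids running on it, keeping rank order."""
    by_host = defaultdict(list)
    for entry in proctable:
        by_host[entry.host_name].append(entry.pid)
    return dict(by_host)


# Hypothetical table, as a starter process might publish it for a 4-rank job.
proctable = [
    ProcDesc("node01", "/home/user/a.out", 4101),
    ProcDesc("node01", "/home/user/a.out", 4102),
    ProcDesc("node02", "/home/user/a.out", 5201),
    ProcDesc("node02", "/home/user/a.out", 5202),
]

# One debug server per host, each told which local pids to attach to.
servers = group_by_host(proctable)
for host, pids in servers.items():
    print(f"{host}: one debug server attaches to pids {pids}")
```

The point of the sketch is only that the debugger never has to guess: it reads the table out of the starter process it attached to, then starts one server per distinct host and hands it the local pids.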

Does that make sense?

Regards,
Pete