Thread: About TotalView's ATTACH functionality

  1. #1
    Junior Member
    Join Date
    Nov 2006
    Posts
    12

    About TotalView's ATTACH functionality

    Hi,

    I'm very interested in (and also confused by) TotalView's ATTACH functionality when used with srun.
    When TotalView attaches to a job launched by srun, the pid of the srun process is used as the attach target.
    IMHO, when a debugger attaches to a target, it just sets the target process's "tracing" flag and stops the execution of that process.
    That is, TotalView should only stop the execution of the "srun" process itself.
    So how can TotalView actually attach to the processes of the job launched by srun, and even launch tvdsvr on each node where those processes reside?

    Is there some magic in the srun process?

    Regards,
    Robbie

  2. #2

    Re: [Robbie] About TotalView's ATTACH functionality

    Hey Robbie,

    Yes, there is some magic there, but it's not very secret. Etnus provides a public interface which is implemented by MPI vendors. It consists of a number of structures and callback routines that allow TotalView to discover which processes have been started and on which nodes. When we attach to srun or prun or whatever the starting process is, we check for the existence of the MPIR_proctable, which contains the info about the processes and nodes, and we use that information to fire off the tvdsvrs.

    Note that attach does not work if the MPI provider has not implemented this interface, and we have some issues with MPI-2 as well. That is being worked on, but for now attach is not expected to work seamlessly and grab all the spawned MPI-2 processes. Many MPI-2 implementations use Python to implement the MPI starter process.
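
    To make the idea concrete, here is a rough sketch of what a debugger could do once it has the process table in hand. It is only an illustration, not TotalView's actual code: the `ProcDesc` fields mirror the host name, executable name and pid that each table entry is generally described as carrying, while `group_by_host`, `plan_servers` and the sample table are hypothetical names and data made up for the example. Reading the table out of the starter process's memory is not shown.

    ```python
    from collections import defaultdict
    from dataclasses import dataclass


    @dataclass(frozen=True)
    class ProcDesc:
        """One process-table entry: where a rank runs and which pid it has."""
        host_name: str
        executable_name: str
        pid: int


    def group_by_host(proctable):
        """Map each host name to the pids running on it, in table order."""
        by_host = defaultdict(list)
        for entry in proctable:
            by_host[entry.host_name].append(entry.pid)
        return dict(by_host)


    def plan_servers(proctable):
        """Return one (host, pids) pair per node, sorted by host name.

        Each pair stands for one debug server to start on that host and
        the local pids it would attach to.
        """
        return sorted(group_by_host(proctable).items())


    # Hypothetical table for a 4-rank job spread over two nodes.
    example_table = [
        ProcDesc("node01", "/home/user/a.out", 4101),
        ProcDesc("node01", "/home/user/a.out", 4102),
        ProcDesc("node02", "/home/user/a.out", 5120),
        ProcDesc("node02", "/home/user/a.out", 5121),
    ]

    if __name__ == "__main__":
        for host, pids in plan_servers(example_table):
            print(f"start one debug server on {host}, attach to pids {pids}")
    ```

    The point is that the starter process (srun here) is only the entry point: the debugger stops it, reads the table, and from there knows every node and pid it has to reach.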

    Does that make sense?

    Regards,
    Pete
    Pete Thompson
    TotalView Customer Services
