
Thread: Startup error on Mac OS 10.9.5 : Fatal error: Reporting a wait event for the PROCESS

  1. #1
    Junior Member
    Join Date
    Jan 2015
    Posts
    5

    Startup error on Mac OS 10.9.5 : Fatal error: Reporting a wait event for the PROCESS

    Hi,

    I am encountering the following error when I attempt to start a parallel debugging session on 2 cores on a MacBook Pro running OS X 10.9.5.

    The error is attached below, including the log output.

    I launch TotalView from the command line so that it picks up all the modules loaded in that shell, and then enter the executable path and name, along with other parameters, from the GUI.

    # /Users/aike/Applications/TotalView.app/totalview
    Mac OS X Darwin x86 TotalView 8.13.0-0
    Copyright 2010-2014 by Rogue Wave Software Inc. ALL RIGHTS RESERVED.
    Copyright 2007-2010 by TotalView Technologies, LLC.
    Copyright 1999-2007 by Etnus, LLC.
    Copyright 1999 by Etnus, Inc.
    Copyright 1996-1998 by Dolphin Interconnect Solutions, Inc.
    Copyright 1989-1996 by BBN Inc.

    Testing MPI configuration with 'ompi_info -c'
    Exit value was 0 (expected 0), status: execute_command_t::exited
    Reading symbols for process 1, executing "./dmp.exe"
    Library ./dmp.exe, with 2 asects, was linked at 0x100000000, and initially loaded at 0xff00000090000000
    Reading 1665608 bytes of string table...done
    Reading and digesting 99794 loader symbols...done
    Skimming 99794 debug symbols...done
    Library /usr/lib/dyld(x86_64), with 2 asects, was linked at 0x7fff5fc00000, and initially loaded at 0xff000000f3b8e200
    Reading 45176 bytes of string table...done
    Reading and digesting 1189 loader symbols...done
    Skimming 1189 debug symbols...done
    Reading symbols for runtime loader /usr/lib/dyld(x86_64)
    Fatal error: Reporting a wait event for the PROCESS
    Terminated

    The code was compiled with Intel 2015 and the Open MPI 1.8.3 libraries (also built with the same compiler).
    I also have a crash log file generated by TotalView, which I can send if it would be useful for diagnosing the issue.

    Any suggestions for troubleshooting this problem would be much appreciated.
    Thanks,

    Aytekin

    P.S.

    The same executable, compiled with the Intel Fortran compiler for Mac OS X, runs fine with OpenMPI 1.8 up to the point where it fails, which is why I wanted to run the debugger.

    mpirun -np 2 ./dmp.exe


    Run name: TEST2D Time: 1: 2 Date: 1- 6-2015
    Memory required: 9.00 Mb
    ________________________________________________________________________


    ************************************************************
    From: GRIDMAP_INIT
    Parallel load balancing statistics:

    Comp. cells Processor
    maximum 3894 1
    minimum 3717 2
    average 3805 -N/A-

    Maximum speedup (Amdahls Law) = 0.250000000000000
    ************************************************************
    ....
    ......
    ....
    Elapsed CPU time = 0.000000E+00 sec
    t= 0.0000 Wrote SPx: 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, Disk= 1.40 Mb
    t= 0.0000 Wrote RES;
    WRITING VTU FILE : TEST2D_0000.vtu .................................................. . DONE.


    Time = 0.0000 Dt = 0.10000E-03 CPU time left = 0.000 s

    PE 0: nrows = 3363,start= 178,end= 3540

    PE 1: nrows = 3186,start= 3364,end= 6549
    libc++abi.dylib: terminating with uncaught exception of type int
    libc++abi.dylib: terminating with uncaught exception of type int

  2. #2
    Hi Aytekin,

    Can you tell me whether this fails just for this program, or whether TotalView shows the same error for a simple hello_world MPI program built with the same compiler and OpenMPI version?

    I don't have the exact combination you report, but I can get close. My own MacBook Pro is at 10.9.5, but I don't have the Intel compilers installed. We do have a machine with the Intel 15 compilers and an earlier version of Mavericks (10.9.2), and I can try building OpenMPI 1.8.3 on that. It would be useful to know whether this is a generic or OpenMPI-specific error. We have had occasional problems with OpenMPI versions in the past, but I haven't seen anything about this on the OpenMPI forums yet.

    You should also check to see if the latest TotalView (8.14.1-8 officially, with 8.15 in beta) resolves the issue.

    Let me know what you can and I'll check into some things on this side.

    Regards,
    Last edited by PeterT-RogueWave; 01-06-2015 at 02:35 PM. Reason: Misspelled name...
    Pete Thompson
    TotalView Customer Services

  3. #3
    Junior Member
    Join Date
    Jan 2015
    Posts
    5
    Hi Peter,

    Thanks for your quick reply.
    After your reply, I tested a simple hello world MPI F90 program compiled with Intel 2015 and OpenMPI 1.8.3.

    TotalView launches the debugging session without encountering any of the problems I reported earlier with my application code; I was able to debug a simple 4-core run.

    I have asked about the upgrade to 8.14 but haven't received a response yet. I also don't know how to get the 8.15 beta to try, and it looks like Intel 2015 is supported in that release. Could you please let me know the fastest way to obtain the beta release for testing?

    Also, for testing purposes, I will try compiling everything with the GNU C/C++ and Fortran (4.8) compilers to build the same application executable and see if I get the same issue. Hopefully this will help isolate whether or not the problem originates from the compiler.

    If you think the issue might originate from the OpenMPI libraries compiled with Intel 2015 on Mac OS, I can also test with an MPICH library built with the Intel compilers; I have MPICH 3.1.3 on my MacBook compiled with Intel 2015.

    On a separate note, in the Mac OS environment have you observed any issues with compiling for debugging versus compiling with low optimization (e.g. -O1 or -O2) but including symbolic information via the -g option? Does this affect how TotalView starts the debug session?

    Thanks for your help,

    Aytekin

  4. #4
    Hi Aytekin,

    The 8.15 beta period was somewhat short, so we kept it within a small number of sites. The official release will be out soon.

    Since you'll be using 8.13 with the gcc 4.8 suite, I'd recommend compiling with -gno-strict-dwarf or -gdwarf-2, assuming that has an effect on the Mac. On Linux, gcc 4.8 defaults to DWARF 4, and that causes some issues with seeing source code.

    Debugging with optimization is also a bit tricky. It's best to start with no optimization and get things as good as you can before turning optimization on to see whether it brings any new problems. This is true for all compilers and all platforms; some are better at it than others, but basic optimization techniques can cause the debugger to jump around the source as you're stepping (code movement), and variables to disappear or appear not to be updated when they should (code movement gets involved here too). Uninitialized variables are one of the biggest problems when optimizing, as they may be zeroed out automatically when compiling for debugging but pick up odd values when run optimized.

    I don't know of any particular problems with OpenMPI and the Intel compilers. I'm going to try building with the gcc suite and see what happens on my machine, though I just downloaded 1.8.4 and started building that first. Oh well, I can compare the two.

    Regards,
    Pete Thompson
    TotalView Customer Services

  5. #5
    Junior Member
    Join Date
    Jan 2015
    Posts
    5
    Hi Peter,

    After some trouble, I managed to recompile and link everything using GNU 4.8 with the -g -O0 flags and OpenMPI 1.8.3 (also compiled with GNU 4.8). The trouble stemmed from whether the libc++ or libstdc++ runtime library gets used at the link stage; it looks like a number of things changed with 10.9 (Mavericks), and some compiler installations get confused. I used the recommended compiler flag (-gno-strict-dwarf).

    I tested the executable manually and it ran to the same failing point.
    Then I tried with the TotalView installation I have and again got the exact same error.
    I started the GUI manually and entered the parameters for the executable and the OpenMPI selection.
    Attached is the transcript of the session; I shortened the paths and executable names for readability.

    > /Users/aike/Applications/TotalView.app/totalview

    Mac OS X Darwin x86 TotalView 8.13.0-0
    Copyright 2010-2014 by Rogue Wave Software Inc. ALL RIGHTS RESERVED.
    Copyright 2007-2010 by TotalView Technologies, LLC.
    Copyright 1999 by Etnus, Inc.
    Copyright 1996-1998 by Dolphin Interconnect Solutions, Inc.
    Copyright 1989-1996 by BBN Inc.
    Testing MPI configuration with 'ompi_info -c'
    Exit value was 0 (expected 0), status: execute_command_t::exited
    Reading symbols for process 1, executing "./dmp.exe"
    Library ./dmp.exe, with 2 asects, was linked at 0x100000000, and initially loaded at 0xff00000090000000
    Reading 2801328 bytes of string table...done
    Reading and digesting 104646 loader symbols...done
    Skimming 104646 debug symbols...done
    Library /usr/lib/dyld(x86_64), with 2 asects, was linked at 0x7fff5fc00000, and initially loaded at 0xff000000e6143200
    Reading 45176 bytes of string table...done
    Reading and digesting 1189 loader symbols...done
    Skimming 1189 debug symbols...done
    Reading symbols for runtime loader /usr/lib/dyld(x86_64)
    Fatal error: Reporting a wait event for the PROCESS
    Warning: Recursive call to cleanup_and_shutdown.
    ABORTING WITHOUT CLEANING UP.
    Terminated

    Thanks,

    Aytekin

    P.S.
    I tried to upload the crash log generated with the diagnostic tool, but for some reason the web interface didn't permit the upload. I can send it via e-mail if there is interest.

  6. #6
    Hi Aytekin,

    Is the program you are working on open source, by any chance? I'm wondering if I could try it here as well; it's always easier with a reproducer to work with. I've built 1.8.3 and 1.8.4 and then ran a simple MPI program with no problems, so maybe I need something a bit more complex to trigger the problem.

    Send the crash log to the support@roguewave.com address (along with a case number, if you got one originally) and I'll pick it up from there. You can let them know I said it was ok ;-)

    Regards,
    Pete Thompson
    TotalView Customer Services

  7. #7
    Junior Member
    Join Date
    Jan 2015
    Posts
    5
    Hi Peter,

    This is strange: I had to change the order of one of the dynamic libraries during compilation for an unrelated reason.
    While waiting on something else, I fired up TotalView to check, and interestingly it ran OK; I was able to debug for a while.
    When I then added a few lines, used a new module in the same code, recompiled, and tried again, I got the same problem and TotalView terminated.
    So I am wondering whether this is a sensitivity to which libraries are linked?

    Regarding the executable, do you need just the executable or the whole source tree? It is mostly open source, except for several parts. However, the build procedure is messy, as it requires linking against the Trilinos library and Open MPI.

    Thanks,

    Aytekin

  8. #8
    Junior Member
    Join Date
    Jan 2015
    Posts
    5
    Hi Peter,

    I tested with both the Intel and GNU compilers, and there seems to be a strange sensitivity. I change something in the code, recompile, and link against all the dynamically linked libraries, and it works; then I add something else or change the order of the dynamic library linking, and it stops working with the error message I initially sent. So I don't understand the root cause at this point.
    By the way, I sent one of the earlier crash logs (from the GNU-compiled executable) via the support e-mail address, but I don't know whether it was forwarded to you.

    Thanks again for your help and time,

    Aytekin
