Thread: Error initializing CUDA when debugging

  1. #1
    Junior Member, Join Date: May 2012, Posts: 1

    Error initializing CUDA when debugging

    Greetings,

    When I debug a program that uses CUDA, I get the following errors on stdout when cudaGetDeviceCount() is called:

    ERROR: cuda_trace_obj::initialize_cuda_library: Cuda initialize() returned CUDBG_ERROR_ALL_DEVICES_WATCHDOGGED(24)!
    ERROR: cuda_system_status_t::initialize: Error CUDBG_ERROR_UNINITIALIZED(5) getting device count

    Outside the debugger, the program works fine.

    CUDA 4.1
    TotalView 8.8.0, 8.9.1, 8.9.2, 8.10.0
    Ubuntu Linux 64-bit
    CentOS 5.4 Linux 64-bit
    RedHat 5.4 Linux 64-bit

    If I use TotalView 8.6.1, the program works, but of course the CUDA code can't be debugged.

    Googling for this hasn't turned up anything. Any pointers would be appreciated!

    Rodney

  2. #2
    The watchdog error is a common one. It basically indicates that the CUDA card is being used by another program. In most cases, if this is your own workstation, the problem may be that the card is busy driving the X11 display. While the watchdog timer will allow you to run computations on the card, it will not allow a debugger to attach to the kernel while the card is driving a display. You will see the same error with cuda-gdb.

    If you have more than one CUDA device, you should be able to debug on one that is not driving the display. Otherwise, you will need to find a way around this. You could run the system as a server, with no display. Or it might be possible to run a VNC session (man vncserver) and use that VNC session as your interface to the system. That MAY work, but I can't guarantee it, as I haven't had the opportunity or setup to try it myself.
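    One way to check which devices are affected is to ask the CUDA runtime whether each device has a kernel run-time limit enabled. This is a minimal sketch, assuming that the limit reported in the `kernelExecTimeoutEnabled` field of `cudaDeviceProp` is the same watchdog the debugger is complaining about; the helper name `firstDeviceWithoutWatchdog` is hypothetical. It lists every device and selects the first one without the limit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the index of the first CUDA device whose
// kernel run-time limit (the display "watchdog") is disabled, or -1 if
// every device has it enabled or the device count cannot be read.
int firstDeviceWithoutWatchdog()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return -1;
    }

    int chosen = -1;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;  // skip devices whose properties cannot be queried
        }
        // kernelExecTimeoutEnabled is non-zero when the driver enforces a
        // run-time limit on kernels, which is typically the case for a GPU
        // that is also driving a display.
        std::printf("device %d (%s): watchdog %s\n", dev, prop.name,
                    prop.kernelExecTimeoutEnabled ? "ON" : "off");
        if (chosen < 0 && !prop.kernelExecTimeoutEnabled) {
            chosen = dev;  // remember the first device without the limit
        }
    }
    return chosen;
}

int main()
{
    int dev = firstDeviceWithoutWatchdog();
    if (dev < 0) {
        std::printf("No CUDA device without a watchdog was found.\n");
        return 1;
    }
    if (cudaSetDevice(dev) != cudaSuccess) {
        std::printf("Could not select device %d.\n", dev);
        return 1;
    }
    std::printf("Using device %d for debugging.\n", dev);
    // ... launch the kernels you want to debug here ...
    return 0;
}
```

    Build it with nvcc and run it outside the debugger. If it reports no suitable device, every GPU in the machine is probably driving a display, and the no-display server or VNC options apply. If a device without the limit does exist, setting the CUDA_VISIBLE_DEVICES environment variable to its index (for example CUDA_VISIBLE_DEVICES=1) is another way to make the program under the debugger see only that device.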

    Hope that helps.

    Regards...
    Pete Thompson
    TotalView Customer Services

  3. #3
    It essentially means that the CUDA card is in use by another program. In most cases, if this is your own workstation, the problem may be that the card is driving the X11 display. Although the watchdog timer lets you run computations on the card, it does not allow the debugger to attach to the card while the card is driving the display.
