Nvidia OpenCL hangs on blocking buffer access
I have an OpenCL program that copies a bunch of values into an input buffer, processes these values, and copies the results back.
    // Map the input data buffer, which has CL_MEM_ALLOC_HOST_PTR
    cl_float* data = (cl_float*)clEnqueueMapBuffer(queue, data_buffer, CL_TRUE,
        CL_MAP_WRITE, 0, data_size, 0, NULL, NULL, NULL);

    // Set the input values
    for(size_t i = 0; i < n; ++i)
        data[i] = values[i];

    // Unmap the input buffer
    clEnqueueUnmapMemObject(queue, data_buffer, data, 0, NULL, NULL);

    // Run the kernels
    ...

    // Map the results buffer, which has CL_MEM_ALLOC_HOST_PTR
    cl_float* results = (cl_float*)clEnqueueMapBuffer(queue, results_buffer, CL_TRUE,
        CL_MAP_READ, 0, results_size, 0, NULL, NULL, NULL);

    // Processing
    ...

    // Unmap the results buffer
    clEnqueueUnmapMemObject(queue, results_buffer, results, 0, NULL, NULL);
(In the real code, I check for errors, etc.)
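For reference, the buffers are created roughly like this; the access flags below are just a plausible sketch, the relevant part is CL_MEM_ALLOC_HOST_PTR:

    // Sketch of the buffer setup. The access flags are illustrative;
    // CL_MEM_ALLOC_HOST_PTR is what lets the runtime back the buffer
    // with host memory so that map/unmap can be zero-copy.
    cl_int err;
    cl_mem data_buffer = clCreateBuffer(context,
        CL_MEM_READ_ONLY | CL_MEM_ALLOC_HOST_PTR, data_size, NULL, &err);
    cl_mem results_buffer = clCreateBuffer(context,
        CL_MEM_WRITE_ONLY | CL_MEM_ALLOC_HOST_PTR, results_size, NULL, &err);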
This works great on AMD and Intel architectures (both CPU and GPU). On Nvidia GPUs, however, the code is incredibly slow: a program that takes 10 seconds to run (5 seconds on the host, 5 seconds on the device) takes more than two and a half minutes on Nvidia cards.
However, I have found that this is not a straightforward optimisation problem or a zero-copy speed difference. Using a profiler, I see that the host time of the program is 5 seconds, as in the normal case. And using OpenCL profiling events, I see that the device time is also 5 seconds, as in the normal case!
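For the device side, I read the timings from profiling events, roughly like this (a sketch; it assumes the queue was created with CL_QUEUE_PROFILING_ENABLE, and I sum the timings over all enqueued commands):

    // Rough sketch of the device-time measurement via profiling events.
    cl_event ev;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, NULL,
        0, NULL, &ev);
    clWaitForEvents(1, &ev);

    cl_ulong t_start, t_end;
    clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START,
        sizeof(t_start), &t_start, NULL);
    clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END,
        sizeof(t_end), &t_end, NULL);
    printf("kernel time: %f ms\n", (t_end - t_start) * 1e-6); // timestamps are in ns
    clReleaseEvent(ev);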
So I used the poor man's profiler trick to figure out where the program spends its time on Nvidia GPUs. It shows that the program waits idly on both of the clEnqueueMapBuffer calls. I find this incomprehensible for the first call in particular, since the queue is empty at that point.
I repeat: I have profiled every map/unmap and kernel invocation, and the time does not show up there, so it is spent neither on the device nor on the host. In the stack profile I can see the process waiting on a semaphore instead. Does anyone know what's causing the hang?
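To convince myself that the queue really is empty when the map stalls, I timed the blocking map in isolation, along these lines (a diagnostic sketch, not my real code):

    // Drain the queue first, then time the blocking map on its own.
    // If this call alone accounts for the missing two minutes, the stall
    // is inside clEnqueueMapBuffer itself, not in pending queued work.
    clFinish(queue);  // the queue is provably empty past this point

    struct timespec t0, t1;  // clock_gettime needs <time.h> on a POSIX system
    clock_gettime(CLOCK_MONOTONIC, &t0);
    cl_float* results = (cl_float*)clEnqueueMapBuffer(queue, results_buffer,
        CL_TRUE, CL_MAP_READ, 0, results_size, 0, NULL, NULL, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("blocking map alone: %f s\n",
        (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9);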