3D Video learner: study: OpenCL programming

Memory in the GPU.

Relation between the OpenCL memory model and the AMD Radeon HD 6970

Synchronization:

event.getProfilingInfo() for debugging

event.setCallback() (clSetEventCallback() in the C API): calls a host function when the event reaches a given status

Native kernel: a host function run as a kernel (clEnqueueNativeKernel()); its arguments need unboxing

Fence operation: makes sure the memory reads/writes issued before the fence are done before those after it

Atomic operations: atomic_add(), atomic_xchg()

For constant data, we can use clGetDeviceInfo() to query the device limits: CL_DEVICE_MAX_CONSTANT_ARGS (maximum number of __constant arguments) and CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE (maximum size in bytes)

All work-items in one wavefront execute together, so a branch inside a wavefront is very inefficient.
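Why a branch inside a wavefront is costly can be sketched with a small host-side model. This is not an OpenCL or AMD API: the function name and the cost numbers are hypothetical, and the model only assumes that the wavefront runs its work-items in lockstep, so when they disagree on a branch both paths are issued one after the other.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model, not a real API.
 * takes_then[i] != 0 means work-item i takes the "then" path.
 * Because the wavefront executes in lockstep, every path taken by at
 * least one work-item is issued for the whole wavefront. */
unsigned wavefront_branch_cost(const int *takes_then, size_t n,
                               unsigned then_cost, unsigned else_cost)
{
    assert(takes_then != NULL && n > 0);

    int any_then = 0;
    int any_else = 0;
    for (size_t i = 0; i < n; ++i) {
        if (takes_then[i])
            any_then = 1;
        else
            any_else = 1;
    }

    /* A uniform wavefront pays for one path, a divergent one for both. */
    return (any_then ? then_cost : 0u) + (any_else ? else_cost : 0u);
}
```

With 64 work-items, a then-path of 10 and an else-path of 30, this model gives 10 when every work-item takes the then-path and 40 as soon as a single work-item takes the other one.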

memory access: channel and bank.

One wavefront should try to keep its accesses on one channel and bank (64 KB); this is the most efficient.
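How an address maps to a channel and a bank can be sketched as below. The layout is an assumption for illustration only (the struct, the function names and the numbers 256 bytes / 8 channels / 16 banks do not come from these notes); the real bit layout of a given GPU should be taken from the vendor's programming guide.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: the low address bits select a byte inside one
 * channel's interleave unit, the next bits select the channel, and
 * the bits above those select the bank. */
typedef struct {
    uint32_t interleave_bytes; /* bytes served by a channel before switching */
    uint32_t num_channels;
    uint32_t num_banks;
} mem_layout;

/* Channel that serves the byte at addr under the assumed layout. */
uint32_t channel_of(uint64_t addr, mem_layout m)
{
    assert(m.interleave_bytes > 0 && m.num_channels > 0);
    return (uint32_t)((addr / m.interleave_bytes) % m.num_channels);
}

/* Bank that serves the byte at addr under the assumed layout. */
uint32_t bank_of(uint64_t addr, mem_layout m)
{
    assert(m.interleave_bytes > 0 && m.num_channels > 0 && m.num_banks > 0);
    uint64_t bytes_per_bank_step = (uint64_t)m.interleave_bytes * m.num_channels;
    return (uint32_t)((addr / bytes_per_bank_step) % m.num_banks);
}
```

Under these assumed numbers, 64 work-items reading consecutive 4-byte values from a 256-byte aligned base cover exactly 256 bytes and stay on one channel and one bank, while work-items that stride by 2048 bytes all land on channel 0 and only change bank.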

Profiler:

AMD:
