bbx15: v4.4.x + OpenCL

Hey Everyone,

Finally got OpenCL working again on v4.4.x. There were lots of fun "dkms"
module issues, so I ripped all of that out..

mkbian@beaglebone:/usr/share/ti/examples/opencl/float_compute$ sudo modprobe cmemk
This example computes y[i] = M[i] * x[i] + C on single precision
floating point arrays of size 2097152
- Computation on the ARM is parallelized across the A15s using OpenMP.
- Computation on the DSP is performed by dispatching an OpenCL NDRange
kernel across the compute units (C66x cores) in the compute device.

Running.....

Average across 5 runs:
ARM (2 OpenMP threads) : 0.008669 secs
DSP (OpenCL NDRange kernel) : 0.007781 secs
OpenCL-DSP speedup : 1.114124

For more information on:

This Sunday's lxqt image will have all the fun bits..

For older images do this:

sudo apt-get update
sudo apt-get upgrade

sudo apt-get remove dkms --purge #get rid of dkms/etc..

cd /opt/scripts/tools/
git pull
sudo ./update_kernel.sh
sudo reboot

cd /usr/share/ti/examples/opencl/float_compute/
sudo make
sudo modprobe cmemk
sudo ./float_compute
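
Before running modprobe, it can help to confirm that a cmemk module file
actually exists for the running kernel, since modprobe looks under
/lib/modules/<kernel release>. A minimal sketch, assuming a POSIX shell;
the helper name module_present is hypothetical and not part of the
BeagleBoard scripts:

```shell
# Hypothetical helper (not from the original post).
# Returns 0 if <name>.ko (optionally compressed, e.g. .ko.xz) exists
# somewhere under the modules directory, 1 otherwise.
# $1 = module name, $2 = modules directory (defaults to the running kernel's).
module_present() {
    mp_name="$1"
    mp_dir="${2:-/lib/modules/$(uname -r)}"
    # No modules directory at all means the module cannot be there.
    [ -d "$mp_dir" ] || return 1
    # Succeed if find reports at least one matching file.
    [ -n "$(find "$mp_dir" -name "${mp_name}.ko*" -print 2>/dev/null | head -n 1)" ]
}
```

If `module_present cmemk` fails, the module is not installed for the kernel
you are currently booted into, and the update_kernel.sh + reboot steps above
are the thing to retry.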

Regards,

I get this error:

modprobe: FATAL: Module cmemk not found in directory /lib/modules/4.4.23-ti-r51

Any suggestions?

Thanks.

Chris

Well, that's expected on 4.4.23-ti-r51 :wink:

Like I mentioned:

cd /opt/scripts/tools/
git pull
sudo ./update_kernel.sh
sudo reboot

Regards,

For background, turns out the cmemk module as written doesn't like to
be loaded on a kernel built for THUMB2..

That was one of the big changes i made last week..
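
To see whether a given kernel was built for THUMB2, one option is to look
for CONFIG_THUMB2_KERNEL in its build configuration. A minimal sketch,
assuming the config is installed as /boot/config-<release> (the usual Debian
layout, not confirmed in this thread); kernel_is_thumb2 is a hypothetical
helper name:

```shell
# Hypothetical helper (not from the original post).
# Returns 0 if the kernel config enables CONFIG_THUMB2_KERNEL,
# 1 if it does not, and 2 if the config file cannot be read.
# $1 = config file (assumed default: /boot/config-<running release>).
kernel_is_thumb2() {
    kt_config="${1:-/boot/config-$(uname -r)}"
    [ -r "$kt_config" ] || return 2
    # An enabled option appears as "CONFIG_THUMB2_KERNEL=y";
    # a disabled one appears as a "# ... is not set" comment.
    grep -q '^CONFIG_THUMB2_KERNEL=y' "$kt_config"
}
```

Running `kernel_is_thumb2 && echo "THUMB2 kernel"` before and after the
kernel update is one way to see whether the build option changed.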

Regards,

I followed those steps, but I am on the 4.4.23-ti-r51 kernel. What kernel are you using? How do I get there?

OK, I found my problem and fixed it. I had a "bad PPA" that I needed to remove in order for the update_kernel.sh script to complete properly. Here's what I get from the example code:

This example computes y[i] = M[i] * x[i] + C on single precision floating point arrays of size 2097152
- Computation on the ARM is parallelized across the A15s using OpenMP.
- Computation on the DSP is performed by dispatching an OpenCL NDRange kernel across the compute units (C66x cores) in the compute device.

Running.....

Average across 5 runs:
ARM (2 OpenMP threads) : 0.007877 secs
DSP (OpenCL NDRange kernel) : 0.007614 secs
OpenCL-DSP speedup : 1.034475

Is that the expected result?

Chris

Yeah, i was getting around 1.1x on v4.4.x

When I last tried TI's SDK (v4.4.x based, on the alpha X15; no support
for the rev B yet), I was getting around a 0.7x/0.8x "speedup"...

Back in v4.1.x (about a year ago, with the alpha-x15) i thought it was
around 3x/4x speedup

So there's definitely a speed regression (maybe we are in a slow
clock state for the DSP?)
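
The speedup figure printed by float_compute is consistent with the ARM time
divided by the DSP time (0.008669 / 0.007781 is about 1.114). A small sketch
for redoing that arithmetic when comparing runs, assuming awk is available;
speedup is a hypothetical helper, not part of the TI example:

```shell
# Hypothetical helper (not from the original post).
# Prints ARM time / DSP time with six decimals.
# $1 = ARM seconds, $2 = DSP seconds. Returns 1 if the DSP time is not positive.
speedup() {
    awk -v arm="$1" -v dsp="$2" 'BEGIN {
        if (dsp + 0 <= 0) exit 1      # avoid dividing by zero
        printf "%.6f\n", arm / dsp
    }'
}
```

For example, `speedup 0.008669 0.007781` prints 1.114124, matching the first
run in this thread. Feeding in the rounded times from another run can differ
from the program's own figure in the last few digits, because the printed
times are already rounded.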

But at least it's working again... :wink:

Regards,