gcell solves the problem of efficiently distributing potentially small tasks across the SPEs by using a distributed, SPE-centric scheduler that pulls work to SPEs as they become available. In addition, gcell provides high-performance DMA of arguments to and from the SPEs, task completion notification to client processes, and binding and rendezvous between PPE and SPE code. Benchmarks show near-linear speedup from 1 to 16 SPEs on 2-way Cell blades.

gcell: an SPE Scheduler and Asynchronous RPC Mechanism for the Cell Broadband Engine
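The pull-based scheduling idea from the summary can be illustrated on an ordinary multicore host. The sketch below is not gcell code: plain threads stand in for SPEs, a shared atomic counter stands in for gcell's job queue, and the names busy_wait_us and run_pull_scheduler are hypothetical. Each worker claims the next job as soon as it finishes its previous one, so idle workers pull work instead of having it pushed to them by a central dispatcher.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical stand-in for SPE work: spin for `us` microseconds.
static void busy_wait_us(int us) {
  const auto end = std::chrono::steady_clock::now() + std::chrono::microseconds(us);
  while (std::chrono::steady_clock::now() < end) {
    // spin
  }
}

// Hypothetical pull scheduler: `nworkers` threads (standing in for SPEs)
// repeatedly claim the next unclaimed job index from a shared atomic counter.
// Returns the number of jobs each worker ended up running.
std::vector<int> run_pull_scheduler(int nworkers, int njobs, int work_us) {
  assert(nworkers > 0 && njobs >= 0);
  std::atomic<int> next{0};                  // index of the next unclaimed job
  std::vector<int> per_worker(nworkers, 0);  // each worker writes only its own slot
  std::vector<std::thread> workers;
  for (int w = 0; w < nworkers; ++w) {
    workers.emplace_back([&, w] {
      for (;;) {
        const int i = next.fetch_add(1);     // claim a job; no central dispatcher
        if (i >= njobs) break;               // queue drained: this worker exits
        busy_wait_us(work_us);               // "execute" job i
        ++per_worker[w];
      }
    });
  }
  for (auto& t : workers) t.join();          // wait for all jobs to complete
  return per_worker;
}
```

For example, run_pull_scheduler(4, 200, 5) runs all 200 jobs; the per-worker counts vary between runs because each thread takes whichever job is next when it becomes free.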

The Interface

The primary interface is given in gc_job_manager.h and gc_job_desc.h.

The benchmark code is benchmark_nop.cc.

Performance

Here's a snapshot of our current scaling performance. The test code submits 500k jobs and measures how long it takes to complete all of them as a function of the number of SPEs used. Several runs are made, varying how much work each job does: for the purposes of the benchmark, each job busy-waits on the SPE for a specified number of microseconds, called the 'work increment'. Thus, if the work increment is 10 us, the total useful_work is 10 us * 500k jobs = 5 seconds. The Y-axis plots useful_work divided by the real (wall-clock) time required to complete all jobs. This is effectively the speedup: with perfect scaling on N SPEs, the jobs finish in useful_work / N seconds, so the plotted value approaches N.
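The arithmetic behind the Y-axis can be written out as a small sketch. The function names below are hypothetical and are not taken from benchmark_nop.cc; they simply restate the definitions above: useful_work is the work increment times the number of jobs, and the plotted value is useful_work divided by the measured real time.

```cpp
#include <cassert>

// Total useful work in seconds: work increment (microseconds) times job count.
// The product is formed in integer microseconds to avoid rounding.
double useful_work_seconds(long long work_increment_us, long long num_jobs) {
  assert(work_increment_us >= 0 && num_jobs >= 0);
  return static_cast<double>(work_increment_us * num_jobs) / 1e6;
}

// The plotted "speedup": useful work divided by wall-clock time for all jobs.
double speedup(double useful_work_s, double real_time_s) {
  assert(real_time_s > 0.0);
  return useful_work_s / real_time_s;
}

// Fraction of ideal linear scaling achieved on `nspes` SPEs (1.0 = perfect).
double scaling_efficiency(double speedup_value, int nspes) {
  assert(nspes > 0);
  return speedup_value / nspes;
}
```

With a 10 us work increment and 500k jobs, useful_work_seconds(10, 500000) is 5.0. A hypothetical run that finished in 0.3125 s would plot as speedup(5.0, 0.3125) == 16.0, which is perfect linear scaling on 16 SPEs.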

http://comsec.com/papers/R-7700-20080214-2213.png