GRWG: Coprocessor Problems

AKA: Putting Blocks Somewhere Besides the General Purpose Processor on Which GNU Radio is Running

  • Diverse hardware platforms, each with unique attributes and challenges
  • Not practical to make GR a replacement for existing development tools (Xilinx ISE, TI Code Composer, etc.)
  • Dynamically scheduling what runs where, and when, is hard
    • Goal: enable hardware accelerator users, developers, and researchers to adopt GR as a framework for applications
  • Moving data
    • Creating buffers in desired memory region
    • Facilitating command/control and parameter loading
  • Permit “chains of operations” and “superblocks”
    • Allows configuration of accelerated portion at start-up (or not)
  • Need a unified accelerator API
    • Wrap the necessary parts of the driver interface
    • Present the desired functional interface to the flowgraph
    • Provide accelerator developers an easy, effective, and efficient way to use GR

Initial Goals

  • C++ Class API for GR buffer interface
    • Allow for multiple types of buffer allocation and usage, each of which must provide the same data guarantees to the scheduler
      • VM Circular; non-circular; non-host based via DMA (circular or not); others
      • Specifics defined by actual interface, inherited from parent class
    • Move current GR buffers onto this API, or build this API on top of the generic GR buffer interface if one is already in place
    • Arbitrary size, depending on the usage and needs of the block, but defaulting to a specific value for each buffer type
  • C++ Class API for coprocessor interface
    • Support a means of creating buffers for data transport between a specific coprocessor and main CPU memory (via new buffer API)
    • Separate data transport and kernel execution if/where possible, to minimize the latency before coprocessor work starts and to maximize data throughput while the coprocessor is processing
    • Support a means of executing a single kernel on the coprocessor
    • No support for multiple-kernel scheduling yet; multiple kernels are combined into a single kernel initially
    • Single threaded; asynchronous / no blocking (use internal state to keep tabs on processing)
    • Work flow: push data to coprocessor, execute kernel, pull data from coprocessor
    • Hopefully data push and pull can be made asynchronous to kernel execution
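
The two class APIs above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not an existing GNU Radio interface: every name here (buffer_base, host_circular_buffer, coprocessor_base, fake_coprocessor, submit, poll, make_buffer, demo_round_trip) is hypothetical, the item type is fixed to float, and the "coprocessor" is simulated on the host so the example can run anywhere.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical buffer contract: every allocation strategy (VM circular,
// non-circular, DMA-backed, ...) would give the scheduler these same calls.
class buffer_base {
public:
    virtual ~buffer_base() = default;
    virtual std::size_t items_available() const = 0;  // items ready to read
    virtual std::size_t space_available() const = 0;  // free slots for writing
    virtual std::size_t write(const float* src, std::size_t n) = 0;
    virtual std::size_t read(float* dst, std::size_t n) = 0;
};

// Host-memory circular buffer; 4096 items is an arbitrary per-type default.
class host_circular_buffer : public buffer_base {
public:
    explicit host_circular_buffer(std::size_t n = 4096)
        : d_data(std::max<std::size_t>(n, 1)) {}
    std::size_t items_available() const override { return d_count; }
    std::size_t space_available() const override { return d_data.size() - d_count; }
    std::size_t write(const float* src, std::size_t n) override {
        n = std::min(n, space_available());  // never overwrite unread items
        for (std::size_t i = 0; i < n; ++i)
            d_data[(d_head + d_count + i) % d_data.size()] = src[i];
        d_count += n;
        return n;
    }
    std::size_t read(float* dst, std::size_t n) override {
        n = std::min(n, d_count);
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = d_data[(d_head + i) % d_data.size()];
        d_head = (d_head + n) % d_data.size();
        d_count -= n;
        return n;
    }
private:
    std::vector<float> d_data;
    std::size_t d_head = 0, d_count = 0;
};

// Hypothetical coprocessor interface: one kernel, single-threaded, and
// non-blocking. Each poll() call advances an internal state machine one step.
class coprocessor_base {
public:
    enum class state { idle, pushing, executing, pulling };
    virtual ~coprocessor_base() = default;
    virtual std::unique_ptr<buffer_base> make_buffer(std::size_t n) = 0;
    virtual bool submit(buffer_base& in) = 0;  // start the push; false if busy
    virtual state poll(buffer_base& out) = 0;  // advance; returns the new state
};

// Simulated device so the sketch runs on the host; its kernel doubles samples.
class fake_coprocessor : public coprocessor_base {
public:
    std::unique_ptr<buffer_base> make_buffer(std::size_t n) override {
        return std::make_unique<host_circular_buffer>(n);
    }
    bool submit(buffer_base& in) override {
        if (d_state != state::idle) return false;
        d_work.resize(in.items_available());
        in.read(d_work.data(), d_work.size());
        d_state = state::pushing;
        return true;
    }
    state poll(buffer_base& out) override {
        if (d_state == state::pushing) {
            d_state = state::executing;               // pretend the transfer finished
        } else if (d_state == state::executing) {
            for (float& x : d_work) x *= 2.0f;        // the single "kernel"
            d_state = state::pulling;
        } else if (d_state == state::pulling) {
            out.write(d_work.data(), d_work.size());  // results back to the host side
            d_state = state::idle;
        }
        return d_state;
    }
private:
    state d_state = state::idle;
    std::vector<float> d_work;
};

// Usage: drive push, execute, pull from one thread until the device is idle.
std::vector<float> demo_round_trip(const std::vector<float>& samples) {
    fake_coprocessor dev;
    auto in = dev.make_buffer(samples.size());
    auto out = dev.make_buffer(samples.size());
    in->write(samples.data(), samples.size());
    if (!dev.submit(*in)) return {};
    while (dev.poll(*out) != coprocessor_base::state::idle) {
        // A real block would return to the scheduler here instead of spinning.
    }
    std::vector<float> result(out->items_available());
    out->read(result.data(), result.size());
    return result;
}
```

Keeping the state inside the coprocessor object is one way to satisfy the "single threaded; asynchronous / no blocking" goal: the caller never waits on the device, it only asks what state the work is in. A real implementation would replace the simulated transitions with driver completion queries.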

Future Goals

  • Allow kernel-per-block/thread, multi-kernel control via current host CPU-based scheduler, while maintaining data storage on coprocessor in-between relevant blocks
  • Dynamic block allocation on host CPU or coprocessor at flow graph start time
  • Dynamic block work location selection on host CPU or coprocessor during runtime
  • Support a means of creating buffers for data transport directly between specific coprocessors, to avoid having to return data to the host CPU
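
As a rough illustration of the start-time placement goal, the sketch below assigns each block in a linear chain to the host or to a single coprocessor and then decides what kind of buffer each connection needs. All names (location, link_kind, block_desc, has_kernel, place_blocks, plan_links) are hypothetical, the placement rule is deliberately naive, and transport between multiple coprocessors is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class location { host, coprocessor };
enum class link_kind { host_memory, host_device_transfer, device_resident };

// Hypothetical block description; has_kernel means a coprocessor
// implementation of the block is available.
struct block_desc {
    std::string name;
    bool has_kernel;
};

// Start-time placement: use the coprocessor whenever one is present and the
// block provides a kernel; otherwise fall back to the host CPU.
std::vector<location> place_blocks(const std::vector<block_desc>& chain,
                                   bool device_present) {
    std::vector<location> placed;
    placed.reserve(chain.size());
    for (const auto& b : chain)
        placed.push_back(device_present && b.has_kernel ? location::coprocessor
                                                        : location::host);
    return placed;
}

// Pick a buffer kind for each connection between consecutive blocks. Data
// stays on the device when both ends run there, so it never returns to host.
std::vector<link_kind> plan_links(const std::vector<location>& placed) {
    std::vector<link_kind> links;
    for (std::size_t i = 0; i + 1 < placed.size(); ++i) {
        const bool up = placed[i] == location::coprocessor;
        const bool down = placed[i + 1] == location::coprocessor;
        if (up && down)
            links.push_back(link_kind::device_resident);
        else if (up || down)
            links.push_back(link_kind::host_device_transfer);
        else
            links.push_back(link_kind::host_memory);
    }
    return links;
}
```

For a chain such as source, FIR, FFT, sink, where only the two middle blocks have kernels, this plan yields one transfer onto the device, one device-resident buffer, and one transfer back. Runtime relocation of work would need the same decision to be revisited while the flow graph is running, which this sketch does not attempt.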