
Timing Latency Questions

  • We can calculate the latency introduced by the USB, since a USB packet is sent only once enough data has collected in the USRP buffer. The smallest allowed USB packet is 512 bytes, and the largest is set by the user through two parameters, fusb_nblock and fusb_block_size. According to one paper, the USB delay (Tu) is given by:

    Tu = f(512, fusb_block_size * fusb_nblock) / (fs * sample_size)

    where f(x, y) depends on the amount of data in the buffer and is at least x and at most y, and fs is the sampling frequency. Since we use complex 16-bit samples, the sample size is

    sample_size = 2 * 16 bit = 4 bytes

    So does a large product of fusb_block_size and fusb_nblock increase the theoretical maximum delay?

    Yes, that is true. If you're trying to minimize latency, you want the smallest values that work reliably (no overruns or underruns) and with acceptable overhead. If you enable real-time scheduling, you can reliably use smaller values. Try fusb_block_size 2048 and fusb_nblock 4 or 8. You may be able to run with fusb_block_size 1024. It depends on your data rate across the USB.
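
The bounds implied by the Tu equation can be sketched in a few lines of Python. This is only an illustration: the function name usb_delay_bounds and the 1 MS/s sample rate are assumptions made for the example, not values from the USRP code. The 512-byte minimum and the 4-byte sample size come from the text above.

```python
# Bounds on the USB buffering delay:
#   Tu = f(512, fusb_block_size * fusb_nblock) / (fs * sample_size)

MIN_USB_PACKET_BYTES = 512  # smallest allowed USB packet (from the text)
SAMPLE_SIZE_BYTES = 4       # complex 16-bit samples: 2 * 16 bit = 4 bytes


def usb_delay_bounds(fs_hz, fusb_block_size, fusb_nblock):
    """Return (min_delay_s, max_delay_s) for a given sample rate in Hz."""
    byte_rate = fs_hz * SAMPLE_SIZE_BYTES  # bytes per second across the USB
    # The upper limit of f(x, y) can never fall below the lower limit.
    max_bytes = max(fusb_block_size * fusb_nblock, MIN_USB_PACKET_BYTES)
    return MIN_USB_PACKET_BYTES / byte_rate, max_bytes / byte_rate


if __name__ == "__main__":
    fs = 1e6  # hypothetical sample rate of 1 MS/s, chosen only for illustration
    for block_size, nblock in [(4096, 16), (2048, 8), (2048, 4), (1024, 4)]:
        lo, hi = usb_delay_bounds(fs, block_size, nblock)
        print("fusb_block_size=%d fusb_nblock=%d: %.0f us to %.0f us"
              % (block_size, nblock, lo * 1e6, hi * 1e6))
```

At the assumed 1 MS/s, fusb_block_size 2048 with fusb_nblock 4 gives a delay between 128 us and about 2.05 ms. Halving either parameter halves the upper bound, which is why smaller values help when latency matters.
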
  • What is the theoretical minimum roundtrip latency that we can achieve with the USRP?

    Different cases:

  1. On the host, I think the minimum number of packet buffers we can live with is 2 in each direction. This ensures that the EHCI USB host controller endpoint queue always has something in it for our endpoints. This is required to keep the throughput up and to keep us from suffering underruns or overruns on the USRP.
  2. On the FX2, the minimum we can use is two in each direction.
  3. On the output path to the FPGA, there's a FIFO, but the pipeline runs if there's any data in the FIFO. This doesn't appear to add any delay. On the USRP, we're currently running with 512-byte packets.

    Theoretical Calculations:

    Let PKTSIZE = 512 bytes. In both directions, we can't start the transport until we've assembled a full packet. Assuming we're running at 32 MB/sec, assembling a packet takes t0 = PKTSIZE / 32 MB/sec = 16 usec. The transport time of a single packet from the host controller to the FX2 is

    t1 = PKTSIZE (bytes) / 32 (Mbyte/sec) = 16 usec

    The transport time of a single packet from the FX2 to the FPGA is about

    t2 = 4 usec + PKTSIZE (bytes) / 96 (Mbyte/sec) = 9.33 usec

    [The 4 us is firmware overhead in the FX2. It could be improved]

    The best case:

    Outbound: t0 + t1 + t2 = 16 + 16 + 9.33 = 41.33 usec. Inbound is the same, 41.33 usec.

    With 512-byte packets, the total round-trip latency is

    2 * (t0 + t1 + t2) = 82.67 usec, or about 83 usec

    Lowering the packet size to 64 bytes would reduce this to

    2 * (2 + 2 + 4.67) = 17.3 usec

    [This is optimistic, since our throughput will drop, and there are lots of other unaccounted factors. E.g., greater USB overhead because of the small packets]
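
The calculation above can be written as a small model, which makes it easy to try other packet sizes. The function and parameter names are hypothetical. The 32 MB/sec and 96 MB/sec rates and the 4 usec FX2 firmware overhead are taken from the text, with 1 MB treated as 10^6 bytes so that bytes divided by MB/sec gives microseconds directly. Like the hand calculation, it ignores USB protocol overhead and host scheduling.

```python
def one_way_latency_us(pkt_bytes,
                       usb_mbytes_per_s=32.0,
                       fx2_to_fpga_mbytes_per_s=96.0,
                       fx2_overhead_us=4.0):
    """Best-case one-way latency in microseconds for one packet."""
    t0 = pkt_bytes / usb_mbytes_per_s  # time to assemble a full packet
    t1 = pkt_bytes / usb_mbytes_per_s  # host controller <-> FX2 transport
    t2 = fx2_overhead_us + pkt_bytes / fx2_to_fpga_mbytes_per_s  # FX2 <-> FPGA
    return t0 + t1 + t2


def round_trip_latency_us(pkt_bytes):
    """Outbound plus inbound latency; both directions are assumed equal."""
    return 2.0 * one_way_latency_us(pkt_bytes)


if __name__ == "__main__":
    for size in (512, 256, 128, 64):
        print("%4d-byte packets: round trip %.2f usec"
              % (size, round_trip_latency_us(size)))
```

The factor of two for the round trip applies to the whole sum t0 + t1 + t2, so 512-byte packets give about 82.7 usec and 64-byte packets about 17.3 usec. The fixed 4 usec firmware overhead per direction dominates as packets get smaller.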

  • How can we practically measure the round-trip latency?

    Here's an easy way to measure round trip latency:

    Hook up a signal generator to one of the Rx basic inputs and to one input of an oscilloscope. Hook up one Tx basic output to another input on the oscilloscope. Write some code on the host that reads from usrp.source_c and writes to usrp.sink_c. Then look at the delay between the two traces on the oscilloscope.
