Transfer latency

Preamble

Prophesee event-based sensor manuals refer to pixel latency and jitter as KPIs. Pixel latency is defined as the pixel response delay to a temporal contrast step, typically expressed in microseconds. Jitter corresponds to the temporal precision at pixel level, as illustrated in the figure below.

Fig. 1: Pixel latency and pixel jitter.
Note 1) Latency and jitter values are nearly equivalent and sensor settings have the same influence on both. Sometimes latency is incorrectly mentioned instead of jitter.
Note 2) The timestamping precision (two pixels that respond to the same temporal contrast must produce events with the same timestamp) depends on the pixel jitter but also on the ReadOut. On the Gen4.1, the pixel array can perform more than 10 million detections per second. The maximum throughput of the ReadOut is 3 Giga Events per second. If the ReadOut saturates, a pixel detection can remain pending for several hundred microseconds before being served. To avoid this situation, the pixel analog parameters (biases) should be tuned to reduce background noise, and a Region Of Interest should be defined to enable only the relevant pixels.

Algorithms can benefit from this timestamping precision, i.e. a few hundred microseconds (it can be better under particular conditions). Precise causality can be determined from events with different timestamps.

Events produced by the sensor have to be transferred to a processing unit. The transfer latency is the delay between the timestamping of an event by the sensor in the ReadOut and the moment when the event is taken into account by the algorithm. This transfer latency is distinct from the pixel latency: it depends on the communication interfaces used in the transfer.

Prophesee provides Evaluation Kits (EVKs) to stream events over USB. This article details what influences the transfer latency.

Evaluation Kit streaming pipeline

Prophesee EVK2 and EVK3 are based on a processing unit which transfers events received through MIPI from the sensor to the USB interface:
  1. on EVK2, a Zynq Ultrascale+ FPGA is programmed to execute the data transfer,
  2. on EVK3, a CX3 microcontroller from Cypress handles the transfer.
EVKs enumerate as USB3.0 devices with two bulk endpoints in addition to the control endpoint. On top of the USB stack, Prophesee developed a custom protocol named "Treuzell" to control both the sensor and the EVK processing unit and to stream data.

Prophesee sensors are connected to the EVK processing unit through a module called CCAM5. As Gen4.1/IMX636 sensors embed a MIPI TX interface, the CCAM5 differs depending on the sensor:
  1. the CCAM5-Gen4.1/IMX636 module simply routes the MIPI signals to the EVK processing unit, as illustrated in Fig. 2,
  2. the CCAM5-Gen3.1 module has a small FPGA that packs events sampled on a parallel interface into MIPI frames, as illustrated in Fig. 3.
EVKs have been designed to avoid bottlenecks due to the communication interfaces. However, the bandwidth offered by USB3.0 bulk endpoints is not guaranteed and can be shared with other devices connected to the same host. Prophesee recommends connecting only one device to a USB host controller.



Fig. 2: EVK streaming pipeline with a Gen4.1 sensor.




Fig. 3: EVK streaming pipeline with a Gen3.1 sensor.

The EVK processing unit transfers data from the MIPI interface to the USB device port. The data it receives are packed into MIPI frames. The frame period is fixed while the frame size is variable, meaning that the number of packets and the size of the last packet within a frame vary, as depicted in Fig. 4.

Fig. 4: MIPI frame.
(*) The time reference used to close a Frame Period comes from timestamps carried by internal events. There is a variable delay between event timestamping and MIPI packetization of the data, due to the digital filtering pipeline of the sensor. The worst-case delay on Gen4.1 does not exceed 200us. As a consequence, the Frame Period observed at the MIPI receiver may vary by 400us around the specified frame period.
(**) The position of the frame start is variable and not “as close as possible to the minimum packet spacing” as recommended in the MIPI CSI-2 specification. This position depends on the activity of the sensor.

The USB protocol is a master/slave protocol. An application such as Metavision creates USB Requests of any size with a buffer attached. The USB host controller initiates multiple fixed-size packet transfers through bulk IN endpoints populated by the device. A transfer can be aborted by the host if the buffer is not full after a certain delay (timeout), or it can be shortened by the device if the device sends a short packet or a zero-length packet.
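
The three ways a bulk IN request can end (buffer full, short or zero-length packet, timeout) can be sketched with a small simulation. This is a simplified model written for illustration, not the behaviour of a particular host controller or of the Metavision plugin: the function name and its parameters are hypothetical, packets are described only by arrival time and size, the timeout is counted from the submission of the request, and the request size is assumed to be a multiple of the maximum packet size.

```python
from typing import Iterable, Tuple


def simulate_bulk_in_request(
    packets: Iterable[Tuple[float, int]],
    request_size: int,
    timeout_s: float,
    max_packet_size: int = 1024,
) -> Tuple[float, int, str]:
    """Return (completion_time_s, bytes_received, reason) for one request.

    packets: (arrival_time_s, size_bytes) pairs sorted by arrival time,
    where time 0 is the moment the request is submitted (hypothetical model).
    request_size is assumed to be a multiple of max_packet_size.
    """
    received = 0
    for arrival, size in packets:
        if arrival > timeout_s:
            break  # the host aborts before this packet arrives
        if size > max_packet_size:
            raise ValueError("packet larger than the endpoint max packet size")
        received += size
        if received >= request_size:
            return arrival, received, "buffer full"
        if size < max_packet_size:
            # A short (or zero-length) packet ends the transfer early.
            return arrival, received, "short packet"
    return timeout_s, received, "timeout"
```

For instance, a single full-size packet followed by silence leaves the request pending until the timeout expires, which is the low-activity situation discussed in the rest of this article.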

Influence of packetization on the worst case latency

Transfer latency depends critically on the buffering required to pack the data or to perform burst transfers. The worst latency corresponds to situations with few contrast detections: the data rate is low, so packets take longer to fill, which delays the transfers.

As illustrated in Fig. 5, data are packed and unpacked at different places in the EVK pipeline:
  1. the sensor packs events into MIPI frames,
  2. MIPI frames are unpacked into the EVK processing unit memory,
  3. the EVK processing unit performs burst transfers to the USB device endpoint.

Fig. 5: Data packing within the EVK pipeline.

Within the sensor, the delay between the ReadOut and the MIPI TX interface is negligible. The sensor produces a MIPI packet when a packet is full, or at the end of the frame period to flush the data still pending. Therefore, a single event may wait a full MIPI frame period before being packed. By default, the MIPI frame period is set to 1ms.
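
The flush rule implies that an isolated event waits until the end of the frame period in which it was timestamped. A minimal sketch, assuming for illustration that frame boundaries fall on multiples of the frame period in the sensor time base and that no other event fills the packet earlier (the function name is hypothetical):

```python
def mipi_flush_wait_us(event_ts_us: int, frame_period_us: int = 1000) -> int:
    """Time an isolated event waits before the end-of-frame flush.

    Assumes frame boundaries on multiples of frame_period_us and no other
    traffic filling the packet earlier (illustrative assumptions only).
    """
    return frame_period_us - (event_ts_us % frame_period_us)
```

Under these assumptions, an event timestamped 250us after a frame boundary waits 750us, and an event timestamped exactly on a boundary waits the full 1ms period.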

MIPI frames received by the EVK processing unit are transferred into USB packets. A DMA executes the copy operation in multiple bursts (of 16KBytes by default):
  1. On EVK2, there is no timeout, so the operation can be delayed indefinitely until a block of 16KBytes has been received through the MIPI RX interface. Therefore, in theory, the worst latency cannot be bounded on EVK2. In practice, the background noise produces enough events to trigger the DMA copy in a reasonable amount of time.
  2. On EVK3, a DMA transfer is started when 16KBytes are ready to be transferred OR when a MIPI End of Frame has been detected (i.e. every ms by default). The worst-case latency due to burst transfer is therefore one MIPI frame period.
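
The two DMA trigger policies can be compared with a rough model. It assumes a constant incoming data rate in bytes per second and ignores the duration of the DMA copy itself; the function names are hypothetical and only serve the illustration.

```python
import math

BURST_BYTES = 16 * 1024  # default DMA burst size stated above


def evk2_dma_wait_s(data_rate_bytes_per_s: float) -> float:
    """EVK2: the burst starts only once 16 KBytes have accumulated."""
    if data_rate_bytes_per_s <= 0:
        return math.inf  # no timeout, so the wait is unbounded
    return BURST_BYTES / data_rate_bytes_per_s


def evk3_dma_wait_s(data_rate_bytes_per_s: float,
                    frame_period_s: float = 1e-3) -> float:
    """EVK3: the burst starts on 16 KBytes OR on a MIPI End of Frame."""
    return min(evk2_dma_wait_s(data_rate_bytes_per_s), frame_period_s)
```

In this model, the EVK2 wait grows without bound as the data rate tends to zero, while the EVK3 wait is capped at one frame period.
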
On the host side, the application executed on the computer creates USB Requests with a pre-allocated buffer attached. The DMA of the USB host interface copies the data from the received USB packets into these buffers. Multiple requests can be prepared and chained in advance so that the USB host never suspends its requests to the EVK. Once a request has been served, a callback is invoked. In Metavision, the data are copied and passed to the EVT format decoder, and the request is recycled.

By default, the Metavision plugin is configured to perform USB Requests of 128KB. A timeout (100ms by default) is associated with these requests. The maximum packet size depends on the USB Bulk IN endpoint configuration; here the packet size is 1024 bytes, and the last packet of a transfer can be smaller. When the timeout is triggered while less than 128KB have been received, the data received so far are immediately transferred to the application. Therefore, an event packed in the first 1024 bytes sent may reach the application up to 100ms later.
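
With these defaults, it is possible to estimate the data rate below which the timeout, rather than the buffer size, determines when the application receives data. The sketch below assumes a steady stream of full-size packets at a constant rate, which is a simplification; the names are hypothetical and not part of the Metavision API.

```python
REQUEST_BYTES = 128 * 1024  # default USB Request size
TIMEOUT_S = 0.100           # default request timeout


def request_completion_s(data_rate_bytes_per_s: float,
                         request_bytes: int = REQUEST_BYTES,
                         timeout_s: float = TIMEOUT_S) -> float:
    """Delay before the host hands the buffer to the application."""
    if data_rate_bytes_per_s <= 0:
        return timeout_s
    return min(request_bytes / data_rate_bytes_per_s, timeout_s)


# Data rate below which the timeout ends the request before the buffer is full.
threshold_bytes_per_s = REQUEST_BYTES / TIMEOUT_S
```

Under these assumptions the threshold is about 1.3 MBytes per second: below that rate, the first bytes of a request can wait close to the full 100ms.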

Kit  | Max sensor latency | Max transfer latency (MIPI IF to USB IF within the EVK processing unit) | Max latency due to USB transfer request
EVK2 | 1ms                | Not bounded                                                             | 100ms*
EVK3 | 1ms                | 1ms                                                                     | 100ms*

* In theory, filling a 1024-byte USB packet may take time, depending on the sensor activity (furthermore, EVK2 starts filling USB packets only once a 16KBytes transfer from the MIPI RX IF has completed). In this case, several successive USB Requests can abort on timeout while the little data available remains pending on the EVK. This situation is unlikely because background noise maintains a minimal data rate.

Reducing transfer latency

As explained previously, the worst case for transfer latency corresponds to a low data rate, i.e. low contrast detection activity and low background noise. The whole pipeline can be tuned for low latency and low bandwidth, but such settings are likely to degrade performance in usual use cases.

The longest delay is due to the USB communication. A transfer can start with a single 1024-byte packet, but the application may be notified only 100ms later if the expected 128KBytes have not been received before then.
This delay can be shortened by changing the timeout in the Metavision plugin. In theory, the timeout can be reduced down to 1ms. In this case, the worst case for the overall latency is 3ms. However, handling notifications in less than 1ms on the computer side is challenging for a non-real-time operating system. While most USB host hardware interfaces are designed to queue new requests in advance, software tasks will process bursts of completed transfers after an uncontrolled delay.
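
The 3ms figure follows from summing the worst-case contributions listed in the table above. The short computation below makes the budget explicit; the constant and function names are chosen for this illustration only.

```python
import math

# Worst-case contributions in milliseconds, taken from the table above.
SENSOR_MAX_MS = 1.0                            # one MIPI frame period
EVK_MAX_MS = {"EVK2": math.inf, "EVK3": 1.0}   # MIPI IF to USB IF


def worst_case_latency_ms(kit: str, usb_timeout_ms: float) -> float:
    """Sum of the three worst-case stages for the given kit."""
    return SENSOR_MAX_MS + EVK_MAX_MS[kit] + usb_timeout_ms


print(worst_case_latency_ms("EVK3", 100.0))  # 102.0
print(worst_case_latency_ms("EVK3", 1.0))    # 3.0
print(worst_case_latency_ms("EVK2", 1.0))    # inf
```

The EVK2 result stays unbounded in theory regardless of the USB timeout, since its DMA stage has no timeout.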

Reducing the 2ms latency on the EVKs requires changing the MIPI configuration and the EVK processing unit firmware, which demands a significant effort.





 





