Build your Event-based Application 2/5: Camera Settings

Camera Settings

Focus

To ensure you get relevant and usable data, you must focus your camera system. As with frame-based cameras, poor focus produces low-quality or even useless data. For event-based sensors, poor focus may generate a large number of events, increasing processing time, while also blurring the target and preventing the acquisition of useful vision data.

Depending on the optics you select, focus accuracy will be more or less critical. In particular, the depth of field, i.e. the distance range over which objects remain in focus, is directly determined by the lens.

The steps to correctly focus your event-based camera are described in the Focus Adjustment page of our SDK documentation. In short, the procedure consists of displaying a blinking pattern on one screen, visualizing the resulting events on another, and adjusting the lens focus until the pattern edges appear sharp. A dedicated video is also available on our YouTube channel.
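As a rough illustration (not the SDK’s official procedure), a simple sharpness score computed on an accumulated event frame can help compare focus settings objectively. Here, event_frame is an assumed 8-bit grayscale image built by accumulating events of the blinking pattern over a short time window:

import cv2

# Rough illustration, not the SDK's official procedure: score the sharpness of an
# accumulated event frame with the variance of its Laplacian while turning the
# focus ring. event_frame is assumed to be an 8-bit grayscale image built by
# accumulating events of the blinking pattern over a short time window.
sharpness = cv2.Laplacian(event_frame, cv2.CV_64F).var()
print(f"Sharpness score: {sharpness:.1f} (higher is sharper)")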

Figure 1. Example of focus pattern

Calibration

Intrinsic calibration

For some applications (though not all), the cameras need to be calibrated, starting with each camera individually through intrinsic calibration. This calibration estimates the optical lens distortion as well as the intrinsic parameters of the camera (focal length, aperture, field of view, resolution, etc.). It is a well-known procedure for frame-based cameras, useful whenever pixel measurements must be related to real-world metric measurements. For event-based cameras, it requires adaptation since no frames are natively produced. The most straightforward procedure is to use a blinking pattern and to integrate events into a frame. This framed representation of events can then be fed to typical frame-based calibration pipelines.

Note
Intrinsic calibration is not always mandatory. For instance, a surveillance system that only needs to classify people in a scene does not require any metric measurement, whereas a system that detects people and must locate them in the scene will require intrinsic calibration. Other examples where intrinsic calibration is necessary include Particle Size Measurement, SLAM, and EB-FB stereo systems.

This calibration procedure is provided in the calibration module of Metavision SDK and described in the Metavision SDK documentation.
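As an illustrative sketch (not the SDK pipeline itself), assuming the blinking-checkerboard events have already been integrated into grayscale frames stored in a hypothetical event_frames list, a standard OpenCV calibration can then be run on those frames:

import numpy as np
import cv2

# Assumptions for this sketch: event_frames is a list of 8-bit grayscale images
# obtained by integrating the blinking-checkerboard events, and the pattern has
# 9x6 inner corners with 25 mm squares (adjust to your target).
pattern_size = (9, 6)
square_size_mm = 25.0

# 3D coordinates of the checkerboard corners in the pattern's own coordinate frame
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square_size_mm

obj_points, img_points = [], []
for frame in event_frames:
    found, corners = cv2.findChessboardCorners(frame, pattern_size)
    if found:
        corners = cv2.cornerSubPix(
            frame, corners, (5, 5), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# Standard frame-based intrinsic calibration, run on the event frames
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, event_frames[0].shape[::-1], None, None)
print(f"RMS reprojection error: {rms:.3f} px")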

Figure 2. Blinking checkerboard events integrated into a frame and associated detected corners

Extrinsic calibration

If the product is composed of several cameras, or of several systems in general (laser projector, frame- or event-based camera, robotic arm, etc.), users often need to describe the pose (translation and rotation) of each system relative to the others. This is the case, for instance, for stereo-camera systems, and more generally for multi-camera systems.

For the common case where only two cameras need to be related, an extrinsic calibration must be performed. It usually consists of detecting features in each field of view, extracting the ones common to both, and finally providing them to an algorithm that computes the relative pose of the cameras.
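Continuing the sketch above under the same assumptions (matched checkerboard observations obj_points, img_points_1, img_points_2, and known intrinsics K1, dist1, K2, dist2), OpenCV’s stereo calibration can estimate the relative pose R, T:

import cv2

# Assumptions for this sketch: obj_points, img_points_1 and img_points_2 contain
# matched checkerboard observations seen simultaneously by both cameras, (K1, dist1)
# and (K2, dist2) are the intrinsics from the per-camera calibration above, and
# image_size is the sensor resolution as (width, height).
flags = cv2.CALIB_FIX_INTRINSIC  # keep intrinsics fixed, estimate only the relative pose
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 1e-5)

rms, _, _, _, _, R, T, E, F = cv2.stereoCalibrate(
    obj_points, img_points_1, img_points_2,
    K1, dist1, K2, dist2, image_size,
    criteria=criteria, flags=flags)

print("Rotation R from camera 1 to camera 2:\n", R)
print("Translation T from camera 1 to camera 2:\n", T)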

Figure 3. Stereo-camera setup extrinsic calibration (Rotation R and Translation T)

Bias tuning

Bias tuning is a key part of setting up an event-based camera. Not only does it directly impact the quality of the data to be processed, and therefore the quality of the processed output, it can also determine whether a project is feasible at all.

Note
Biases define how the sensor tracks light variations and produces events. They therefore need to be set before data acquisition and cannot be changed on a recording.

Poor bias selection can lead both to an excess of generated events and to a loss of useful data. Biases are deeply use-case specific and need to be chosen with care.

There is no "perfect bias set": first because not all sensors offer the same tuning capabilities, and also because a set of biases is tailored to a specific target. Some biases will be better suited to detecting very high-speed changes, while others will be better at detecting people walking.

Bias tuning mainly consists of defining a trade-off between pixel sensitivity, pixel reactivity, and background noise generation. The ultimate goal is to produce data only for relevant changes while keeping the data rate under control.

At the sensor level, biases can be set to control:

  • Contrast sensitivity: the relative amount of light variation a pixel can detect

  • Low-pass and High-pass filters: they filter out fast and slow light changes, respectively

  • Refractory period: how soon a pixel can generate a new event after a previous one

  • Background rate: the amount of noise events generated

To illustrate the importance of bias tuning, let’s have a look at the Metavision SDK Active Markers sample using the IMX636 sensor. It requires blinking LEDs in the camera FOV. With the sensor’s default biases, there can be a lot of information to process, much of it useless, when only the LED blinking is of interest. In that case, tuning the biases with the sample’s dedicated values listed below filters out all events not related to the LED blinking.

0     % bias_diff
180   % bias_diff_off
60    % bias_diff_on
30    % bias_fo
140   % bias_hpf
0     % bias_refr
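As a minimal sketch, assuming the Metavision SDK Python bindings (facility names and accepted bias ranges may differ between sensors and SDK versions), these biases could be applied to a live camera through the HAL bias facility:

from metavision_hal import DeviceDiscovery

# Open the first available live camera (assumes the Metavision SDK Python bindings).
device = DeviceDiscovery.open("")

# Apply the Active Markers biases listed above through the HAL bias facility.
# bias_diff is left at its default; accepted names and ranges depend on the sensor (IMX636 here).
ll_biases = device.get_i_ll_biases()
for name, value in {"bias_diff_off": 180, "bias_diff_on": 60,
                    "bias_fo": 30, "bias_hpf": 140, "bias_refr": 0}.items():
    ll_biases.set(name, value)

print({name: ll_biases.get(name) for name in ("bias_diff_on", "bias_diff_off")})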

In both scenes below, one hand is waved across the FOV while another holds the LED board. With the tuned biases, only the LEDs remain visible.

Impact of bias tuning
Figure 4. Default biases
Figure 5. Tuned biases

This comparison clearly shows that tuning the biases can lower the background noise while providing more relevant events to the algorithm pipeline. In particular, the event rate is more than halved simply by slightly tuning the biases. This lowers the burden on the whole event processing pipeline, facilitates algorithm design and tuning, and improves robustness.

Synchronization/Trigger

Prophesee’s event-based sensors support two important functionalities: synchronization and triggers. These are detailed in the SDK documentation.

Synchronization allows all synchronized systems to share a common timebase. For instance, two synchronized event-based cameras produce events whose timestamps share the same time origin, which allows both event streams to be processed by a single algorithm. In other words, a change occurring at time t in the first camera will be seen at the same time t in the second one. In practice, a master camera is chosen; it generates a SYNC OUT signal that all slave cameras receive as their SYNC IN signal. Note that the master camera should be turned on last, after all slave cameras have been turned on. This feature is necessary, for instance, to build a fully event-based stereo setup.
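A minimal sketch of this configuration, assuming the Metavision SDK Python bindings and cameras exposing the synchronization facility (slave_serial is a placeholder for the second camera’s serial number; the exact facility may differ by camera and SDK version):

from metavision_hal import DeviceDiscovery

# Sketch: one camera acts as master, the other as slave, so both event streams share
# a common time origin. slave_serial is a placeholder for the second camera's serial.
master = DeviceDiscovery.open("")
slave = DeviceDiscovery.open(slave_serial)

slave.get_i_camera_synchronization().set_mode_slave()
master.get_i_camera_synchronization().set_mode_master()
# Start streaming on the slave(s) first, then on the master, as noted above.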

Figure 6. Example of possible synchronization setup between three event-based cameras

Triggers allow a level change on a sensor I/O pin to be translated into a specific event type that is inserted into the event stream. Typically, in a hybrid stereo setup, a trigger event can be inserted into the event stream at the exact time when a frame-based camera starts the exposure of a new frame. This way, the event pipeline shares a common timing reference with the frame camera.
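A minimal sketch of enabling the trigger input, assuming the Metavision SDK Python bindings and a camera exposing this facility (the available trigger channel depends on the camera and SDK version; MAIN is assumed here):

import metavision_hal
from metavision_hal import DeviceDiscovery

# Sketch: enable the external trigger input so that level changes on the trigger pin
# are inserted as external-trigger events in the event stream. The available channel
# depends on the camera; MAIN is assumed here.
device = DeviceDiscovery.open("")
trigger_in = device.get_i_trigger_in()
trigger_in.enable(metavision_hal.I_TriggerIn.Channel.MAIN)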

Figure 7. Example of possible synchronization setup between a frame-based and an event-based camera
Warning

Some cameras using Prophesee’s event-based sensors might not support synchronization or triggers, depending on the camera maker and the product version. Contact your camera provider for more information.

Region of Interest (ROI)

By default, the full pixel array is used to produce events. In some cases, certain regions of the pixel array are simply of no interest. In others, activity occurring in a given zone of the field of view (FOV) might negatively impact event production and event processing.

To address this, the ROI feature available on some sensors allows only certain regions to be activated, so that fewer events are produced and only in zones of interest. Less data is produced and therefore less data is processed, which reduces the output event rate and, consequently, the power consumption and the burden on the processing unit.

The granularity of the masking depends on the sensor itself, ranging from simple rectangular ROIs to individual pixel masking. Line ROIs may also be available; these are very useful for some algorithms, such as Particle Size Measurement (PSM), which deals with particles falling through the FOV and processes only the events generated on a small number of horizontal lines (around 6, for instance) as particles pass through them.
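A minimal sketch of setting a rectangular ROI, assuming the Metavision SDK Python bindings and a sensor exposing the ROI facility (the window coordinates are arbitrary examples):

import metavision_hal
from metavision_hal import DeviceDiscovery

# Sketch: produce events only inside a 320x240 window whose top-left corner is at
# (200, 150); the coordinates are arbitrary examples.
device = DeviceDiscovery.open("")
i_roi = device.get_i_roi()
i_roi.set_window(metavision_hal.I_ROI.Window(200, 150, 320, 240))
i_roi.enable(True)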

Below is an example of a rectangular ROI on the IMX636 sensor and of pixelwise masking on the GenX320 (reproducing the Prophesee logo). In the first case, a hand passes in front of the whole FOV but only the part inside the rectangular ROI produces events. In the second case, a sequence of lines moving downwards is displayed in front of the sensor set in pixelwise ROI mode, with the active zone shaped like the Prophesee logo.

ROI examples
Figure 8. Example of rectangle ROI with IMX636
Figure 9. Example of pixelwise ROI for GenX320

Event Signal Processing (ESP)

To accelerate and simplify event processing by extracting only relevant information, Event Signal Processing (ESP) blocks can be used for a variety of tasks. These are hardware processing blocks embedded directly in the sensor. Being hard-wired, each of them has defined temporal and spatial limits.

Alternatively, some ESP sensor blocks have a software counterpart at the host-platform level, which provides more flexibility for development purposes, often at the cost of efficiency. Here, only the in-sensor processing blocks are described.

Anti-Flickering (AFK)

In many typical scenes, flickering lights are present: phone screens, indoor lighting, outdoor flood lighting, etc. Flickering refers to any source of illumination rapidly turning on and off. For event-based cameras, this luminance variation can generate many events that do not correspond to any movement or change of interest. Anti-Flickering aims to mitigate the impact of these very fast light variations on the event stream. It operates on pixel blocks, typically 4x4.

The example below is typical: a screen in the FOV (a TV here) flickers at 120 Hz. In this situation, AFK can be set to filter frequencies between 100 Hz and 140 Hz. It drastically filters out the events generated by the flickering while still letting changes outside this frequency band produce events, as shown in the last image. The event rate thus drops from 60 MEvts/s to 400 kEvts/s, which makes it significantly easier to extract information from the events.
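A minimal sketch of this AFK configuration, assuming the Metavision SDK Python bindings and a sensor exposing the anti-flicker facility (method names may vary with the SDK version):

from metavision_hal import DeviceDiscovery

# Sketch: reject flicker in the 100-140 Hz band directly in the sensor.
device = DeviceDiscovery.open("")
afk = device.get_i_antiflicker_module()
afk.set_frequency_band(100, 140)  # band to filter out, in Hz
afk.enable(True)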

Figure 10. Reference scene
Figure 11. TV flickering, AFK OFF, Event rate 60 MEvts/s
Figure 12. TV flickering, AFK ON, Event rate 400 kEvts/s
Figure 13. TV flickering, AFK ON, Event rate 300 kEvts/s

Event Rate Control (ERC)

In some scenes, the amount of light change or movement can vary widely over time. In the context of a subway surveillance camera, for instance, the biases might be well suited to the low activity between trains, but a peak of events will occur when a train arrives and passengers flood the platform. To avoid overwhelming the computing unit with such peaks, the sensor can spatially and/or temporally drop events to limit the event rate. This event-dropping mechanism can be tuned both in time and in space, and works on groups of neighboring pixels (typically 32x32).

The target event rate is configured by the user as a target number of events per reference period. The ERC block then computes the incoming event rate and the associated drop rate according to the programmed threshold, and applies event decimation to the events within each period.
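A minimal sketch, assuming the Metavision SDK Python bindings and a sensor exposing the ERC facility (the 20 MEvts/s target is an arbitrary example):

from metavision_hal import DeviceDiscovery

# Sketch: cap the sensor output at 20 MEvts/s; events above this budget are decimated.
device = DeviceDiscovery.open("")
erc = device.get_i_erc_module()
erc.set_cd_event_rate(20_000_000)  # target rate, in events per second
erc.enable(True)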

Figure 14. ERC removing activity peaks

Spatio-Temporal Contrast filter (STC) / NFL Filter

Event filtering at the sensor level makes it possible to remove targeted information, such as background noise or the trail of events left after an edge passes, which can be considered redundant or undesired.

Since a moving edge may generate a trail of several events, the STC, for instance, operates at the pixel level to filter out low-frequency noise and keep only the relevant "motion" event with the highest temporal accuracy. More precisely, it removes the first event of a trail, forwards the second one, and can discard the following ones. Note that these filters are usually tuned with a time threshold, which must be set according to the observed movement (typically, the faster the movement, the lower the threshold, although it also depends on the light level, lens aperture, etc.).
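A minimal sketch of enabling STC with the 10 ms threshold used in the figures below, assuming the Metavision SDK Python bindings and a sensor exposing the event trail filter facility (type names may vary with the SDK version):

import metavision_hal
from metavision_hal import DeviceDiscovery

# Sketch: enable the event trail filter in STC mode with a 10 ms threshold,
# matching the comparison shown in the figures below.
device = DeviceDiscovery.open("")
stc = device.get_i_event_trail_filter_module()
stc.set_type(metavision_hal.I_EventTrailFilterModule.Type.STC_CUT_TRAIL)
stc.set_threshold(10000)  # threshold in microseconds (10 ms)
stc.enable(True)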

Figure 15. STC filtering output

The following figures demonstrate the impact of STC. The first two images show how STC filters out the background noise as well as the trailing events left after an edge passes over a pixel (visible as a higher event density when visualizing events over an integration time).

In practice, STC is frequently used to reduce redundant data, but depending on the use case it can be counterproductive. For instance, it is not used for the active-marker use case, in which the LEDs produce very fast, high-contrast light changes. If STC is applied, the events corresponding to the LED blinking are filtered out as well (as visible in the next two images), which prevents the algorithm from working.

Impact of STC
Figure 16. STC OFF
Figure 17. STC ON (threshold at 10ms)
Figure 18. Active Marker biases + STC OFF
Figure 19. Active Marker biases + STC ON (threshold at 10ms)

The NFL, on the other hand, is a band-pass filter on the global event rate: it does not work locally like the STC but on the whole pixel matrix at once. Its low threshold can remove background noise, while its high threshold targets bursts or flashes of events.

Event Data Formatter (EDF)

This last common ESP block allows the event format to be chosen.

There are currently three event formats used in Prophesee technology: EVT2.0, EVT2.1, and EVT3.0. The readout process natively provides events in EVT2.1 format, so if this is the desired format, the block can be bypassed. More formats will come in the future, enabling better data-streaming performance for certain use cases.

In practice:

  • EVT2.1 is natively produced by the readout process. It is a vectorized format (vectors along a sensor row).

  • EVT2.0 is easier to decode at the platform level as it does not apply data vectorization. It is thus more appropriate for low event-rate scenes.

  • EVT3.0 format applies event compression, which is useful for high-event rate scenes (typically when lots of events are generated on the same rows).

The event format impacts not only the data-streaming performance but also the sensor’s power consumption. The graph below shows an example of power consumption for Prophesee’s GenX320 sensor as a function of the event rate. As shown on the graph, high event-rate scenes consume less power with the EVT3.0 format than with EVT2.1. Thanks to its data compression, EVT3.0 can also handle higher event rates than EVT2.1 before overloading.

Figure 20. Impact of event format on overall sensor consumption

Output mode

Most event-based sensors provide a stream of data describing a sequence of events, each defined by its (x,y) position, its polarity (corresponding to an increase or decrease in light intensity), and its timestamp. This information can be encoded in the different formats described above, which are appropriate for many applications and provide high granularity for quality processing.

On the other hand, some event-based sensors can provide other, pre-processed output formats. For instance, histograms can be used in event-based algorithms: while software can build histograms from events, retrieving them directly from the sensor saves a lot of computation time and power. This feature is available, for instance, on Prophesee’s GenX320 sensor.

Figure 21. Example of possible sensor preprocessing
