《中国工程科学》(Strategic Study of CAE), 2024, Vol. 26, No. 1. doi: 10.15302/J-SSCAE-2024.01.012
Spike Vision Research for Autonomous Driving Scenarios
1. National Engineering Research Center of Video and Visual Technology, Beijing 100871, China;
2. Institute for Artificial Intelligence, Peking University, Beijing 100871, China
Abstract
Autonomous driving is an important research direction of computer vision with broad application prospects, and vision-only perception schemes hold significant research value in driving scenarios. Unlike conventional cameras, the spike vision sensor senses photons far more sensitively, images more than a thousand times faster than conventional video, and offers high temporal resolution, high dynamic range, low data redundancy, and low power consumption. Focusing on autonomous driving scenarios, this paper summarizes the imaging principle, perceptual capabilities, and advantages of the spike camera; around driving-related vision tasks, it details the principles and methods of spike-based image reconstruction and discusses technical routes to image enhancement based on sensor fusion; it reviews the algorithmic progress and technical routes of spike-camera-based optical flow estimation, object recognition, detection, segmentation, and tracking, as well as depth estimation of three-dimensional scenes; it surveys the development status of spike-camera data and perception systems, analyzes the research challenges of spike vision, and proposes potential solutions and future research directions. The spike camera, together with its algorithms and systems, has great potential in autonomous driving and represents one of the main future research directions of computer vision.
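To make the sampling model concrete, the following is a minimal Python sketch, under simplified assumptions rather than the paper's implementation, of the spike camera's integrate-and-fire pixel and of brightness recovery from inter-spike intervals: each pixel accumulates incident luminance, emits a one-bit spike when its accumulator crosses a firing threshold, and resets by subtracting the threshold; brightness can then be estimated as the threshold divided by the spacing of adjacent spikes. The threshold `theta`, the step counts, and all function names here are illustrative assumptions.

```python
import numpy as np

def spike_sampling(luminance, n_steps, theta=255.0):
    """Simplified integrate-and-fire sampling: every time step, each
    pixel adds the incident luminance to an accumulator and fires a
    one-bit spike (resetting by threshold subtraction) whenever the
    accumulator reaches the firing threshold theta."""
    h, w = luminance.shape
    acc = np.zeros((h, w))                              # per-pixel integrator
    spikes = np.zeros((n_steps, h, w), dtype=np.uint8)  # binary spike stream
    for t in range(n_steps):
        acc += luminance                                # photon accumulation
        fired = acc >= theta
        spikes[t][fired] = 1
        acc[fired] -= theta                             # reset by subtraction
    return spikes

def reconstruct_isi(spikes, t, theta=255.0):
    """Recover brightness at time t from inter-spike intervals: a
    brighter pixel fires more often, so its intensity is estimated as
    theta divided by the interval between the spikes enclosing t."""
    n_steps, h, w = spikes.shape
    image = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            times = np.flatnonzero(spikes[:, i, j])
            before, after = times[times <= t], times[times > t]
            if before.size and after.size:
                image[i, j] = theta / (after[0] - before[-1])
    return image

# Toy usage: a horizontal luminance ramp sampled for 200 time steps.
scene = np.tile(np.linspace(5.0, 120.0, 64), (64, 1))
spikes = spike_sampling(scene, n_steps=200)
recon = reconstruct_isi(spikes, t=100)
print(np.abs(recon - scene).mean())  # mean reconstruction error is small
```

This interval-based estimate trades spatial quality for temporal resolution; the learning-based and biologically inspired reconstruction methods surveyed in the paper build on this basic principle with more sophisticated temporal integration.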