
Engram-Driven Videography
Lu Fang, Mengqi Ji, Xiaoyun Yuan, Jing He, Jianing Zhang, Yinheng Zhu, Tian Zheng, Leyao Liu, Bin Wang, Qionghai Dai
Engineering ›› 2023, Vol. 25 ›› Issue (6): 101‒109.
Wide-field, high-resolution light-field imaging is key to sensing and understanding large-scale scenes with multiple objects. Existing structured-array imaging systems inherit the uniform-sensing principle of the single image sensor, so the hardware is rigid and hard to scale, and stitching-based reconstruction lacks robustness. Meanwhile, existing visual-understanding architectures strictly follow a single feedforward path of imaging first and processing afterward, leaving perceptual reconstruction and semantic understanding independent of each other. In contrast, the human visual system has both a feedforward and a feedback pathway: the feedforward pathway extracts object representations (called memory engrams) from visual input, while the feedback pathway reactivates the associated engrams to generate high-resolution imaginations of the objects. Inspired by this, we propose a dual-pathway imaging mechanism called engram-driven videography. We first extract a holistic representation of the large scene, abstract instance-level local objects into memory engrams, and then build bidirectional associations between the scene and the local objects to achieve sensing and understanding. The imaging system alternates between an excitation–inhibition state and an association state: in the former, pixel-level details are dynamically consolidated into memory engrams or inhibited; in the association state, the engram of an object is reactivated by the currently observed image of that object and, through a spatially and temporally consistent mapping, synthesizes high-resolution image detail for the current moment. Experimental results show that engram-driven videography promises to change the conventional imaging paradigm and has great potential for sensing and understanding the complex relationships among multiple objects in large-scale scenes.
Sensing and understanding large-scale dynamic scenes require a high-performance imaging system. Conventional imaging systems pursue higher capability simply by increasing pixel resolution, stitching cameras together at the expense of a bulky system. Moreover, they strictly follow a feedforward pathway: their pixel-level sensing is independent of semantic understanding. In contrast, the human visual system draws its strength from both feedforward and feedback pathways: the feedforward pathway extracts object representations (referred to as memory engrams) from visual inputs, while, in the feedback pathway, the associated engram is reactivated to generate hypotheses about an object. Inspired by this, we propose a dual-pathway imaging mechanism called engram-driven videography. We start by abstracting a holistic representation of the scene, which is associated bidirectionally with local details through instance-level engrams. Technically, the entire system works by alternating between the excitation–inhibition and association states. In the former state, pixel-level details are dynamically consolidated or inhibited to strengthen the instance-level engram. In the association state, spatially and temporally consistent content is synthesized, driven by the engram, to image future scenes with outstanding videography quality. Results of extensive simulations and experiments demonstrate that the proposed system revolutionizes the conventional videography paradigm and shows great potential for videography of large-scale scenes with multiple objects.
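To make the alternation concrete, below is a minimal Python sketch of the two-state loop described above. Every name in it (EngramBank, dual_pathway_step, the entropy threshold) is a hypothetical stand-in, and the Shannon-entropy test is only a crude illustrative proxy for the consolidate-or-inhibit decision; it is not the authors' implementation.

import numpy as np

class EngramBank:
    """Stores one consolidated high-resolution patch ("engram") per instance."""

    def __init__(self):
        self.engrams = {}  # instance id -> consolidated detail patch

    def consolidate(self, inst_id, patch):
        # Excitation: keep (or refresh) the detail patch for this instance.
        self.engrams[inst_id] = patch

    def inhibit(self, inst_id):
        # Inhibition: drop detail that no longer strengthens the representation.
        self.engrams.pop(inst_id, None)

    def reactivate(self, inst_id):
        # Association: recall the stored detail for synthesis.
        return self.engrams.get(inst_id)

def patch_entropy(patch, bins=32):
    # Shannon entropy of a patch with values in [0, 1]; used here as a crude
    # proxy for how informative a detail observation is.
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def dual_pathway_step(bank, detail_observations, threshold=3.0):
    # Excitation-inhibition state: consolidate informative detail, inhibit the rest.
    for inst_id, patch in detail_observations.items():
        if patch_entropy(patch) > threshold:
            bank.consolidate(inst_id, patch)
        else:
            bank.inhibit(inst_id)
    # Association state: reactivate each engram to synthesize high-resolution
    # content for the current frame. The spatially and temporally consistent
    # warp onto the low-resolution global view is elided here.
    return {inst_id: bank.reactivate(inst_id) for inst_id in list(bank.engrams)}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bank = EngramBank()
    observations = {0: rng.random((16, 16)),   # textured patch: consolidated
                    1: np.full((16, 16), 0.5)} # flat patch: inhibited
    synthesized = dual_pathway_step(bank, observations)
    print(sorted(synthesized))  # -> [0]

In the actual system, the reactivation step would be followed by a cross-resolution warp that aligns the engram with the current observation; the dictionary assignment above merely marks where that synthesis would happen.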
Instance-level videography / Memory engram / Large-scale dynamic scene / Feedforward and feedback / Entropy equilibrium