
Engineering, 2023, Volume 25, Issue 6. doi: 10.1016/j.eng.2021.12.012

Engram-Driven Videography

a Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
b Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
c Tsinghua-Berkeley Shenzhen Institute, Shenzhen 518071, China
d Institute for Brain and Cognitive Science, Tsinghua University, Beijing 100084, China
e Department of Automation, Tsinghua University, Beijing 100084, China
f Beijing Laboratory of Brain and Cognitive Intelligence, Beijing Municipal Education Commission, Beijing 100010, China
g Hangzhou Hikvision Digital Technology Co., Ltd., Hangzhou 310012, China

Received: 2021-06-10 Revised: 2021-12-03 Accepted: 2021-12-08 Available online: 2022-02-17


Abstract

Sensing and understanding large-scale dynamic scenes require a high-performance imaging system. Conventional imaging systems pursue higher capability simply by increasing pixel resolution through camera stitching, at the expense of a bulky system. Moreover, they strictly follow the feedforward pathway: Their pixel-level sensing is independent of semantic understanding. In contrast, the human visual system derives its superiority from both feedforward and feedback pathways: The feedforward pathway extracts an object representation (referred to as a memory engram) from visual inputs, while, in the feedback pathway, the associated engram is reactivated to generate hypotheses about the object. Inspired by this, we propose a dual-pathway imaging mechanism called engram-driven videography. We start by abstracting a holistic representation of the scene, which is bidirectionally associated with local details and driven by an instance-level engram. Technically, the entire system works by alternating between excitation–inhibition and association states. In the excitation–inhibition state, pixel-level details are dynamically consolidated or inhibited to strengthen the instance-level engram. In the association state, the system images future scenes by synthesizing spatially and temporally consistent content driven by the engram. Results of extensive simulations and experiments demonstrate that the proposed system goes beyond the conventional videography paradigm and shows great potential for videography of large-scale scenes containing multiple objects.
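The alternating two-state operation described above can be summarized as a simple control loop. The following minimal Python sketch is an illustration only, not the authors' implementation: the feature-vector engram, the cosine-similarity gating in consolidate(), and the blending in associate() are all assumptions chosen to make the excitation–inhibition/association alternation concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def consolidate(engram, detail, rate=0.5):
    """Excitation-inhibition state (illustrative): details that agree with
    the current engram are consolidated into it; conflicting details are
    inhibited via a non-negative similarity gate."""
    sim = np.dot(engram, detail) / (
        np.linalg.norm(engram) * np.linalg.norm(detail) + 1e-8)
    gate = max(float(sim), 0.0)  # inhibition: unsupported evidence is gated out
    return (1.0 - rate * gate) * engram + rate * gate * detail

def associate(engram, coarse, blend=0.7):
    """Association state (illustrative): synthesize content for a future
    frame by reactivating the engram and blending it with the coarse,
    wide-field observation."""
    return blend * engram + (1.0 - blend) * coarse

# Toy run: a 64-dim feature vector stands in for one instance's appearance.
engram = rng.normal(size=64)
for frame in range(10):
    detail = engram + 0.1 * rng.normal(size=64)  # simulated local-detail capture
    engram = consolidate(engram, detail)          # excitation-inhibition state
    coarse = engram + 0.5 * rng.normal(size=64)   # simulated wide-field view
    output = associate(engram, coarse)            # association state
    print(f"frame {frame}: synthesis deviation = "
          f"{np.linalg.norm(output - engram):.3f}")
```

In the actual system, the engram would be a learned instance-level representation and the association step a learned synthesis network; the scalar gate here only mimics how weakly supported details are inhibited while consistent details are consolidated.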

