
Engineering >> 2021, Vol. 7, No. 6. doi: 10.1016/j.eng.2020.08.027

Gaze Estimation via a Differential Eyes' Appearances Network with a Reference Grid

a Chengdu Aeronautic Polytechnic, Chengdu 610100, China
b Department of Production Engineering, KTH Royal Institute of Technology, Stockholm 10044, Sweden

Received: 2019-11-08; Revised: 2020-06-11; Accepted: 2020-08-06; Published: 2021-04-30


Abstract

Human gaze effectively conveys intention, which makes gaze estimation an important research topic for intention communication in intelligent manufacturing. Many methods regress the gaze direction by analyzing images of the eye region, known as eye patches. However, because eye appearance varies considerably across individuals, it is difficult for such methods to build a person-independent model for gaze estimation. In this paper, the authors assume that differences in eye appearance are directly related to differences in gaze direction. Based on this assumption, a differential eyes' appearances network (DEANet) is constructed and trained on public datasets; it estimates the difference between two gaze directions from the appearance difference between the corresponding pairs of binocular eye patches. The proposed DEANet is built on a Siamese neural network (SNNet) and contains two branches with identical structure. Multi-stream inputs are fed into the two branches, which share the same weights to extract features from the eye patches; the features are then concatenated to regress the gaze-direction difference. Once the differential gaze model has been trained, the gaze difference of unseen samples can be estimated with only a few calibration patches. Because subject-specific eye information is incorporated at the test stage, estimation accuracy is further improved. Moreover, the proposed approach avoids the large amounts of data required to train person-specific models. A reference grid strategy is also proposed to efficiently select a few reference eye patches at the test stage and feed them into the network as part of its input, which improves accuracy further still. Experiments on public datasets show that the proposed approach outperforms state-of-the-art methods.
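As a conceptual illustration of the differential scheme described in the abstract, the sketch below mimics the two shared-weight Siamese branches, the feature concatenation, and the test-time averaging over calibration (reference) patches. It is not the authors' implementation: untrained random linear maps stand in for the CNN branches, and all dimensions, function names, and the averaging step are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): a flattened eye patch and its feature vector.
PATCH_DIM, FEAT_DIM = 36 * 60, 128

# Shared-weight "branch": a stand-in for the CNN feature extractor.
# Both Siamese branches use the SAME parameters W_enc.
W_enc = rng.normal(scale=0.01, size=(PATCH_DIM, FEAT_DIM))

def encode(patch):
    """Extract features from one eye patch (weights shared by both branches)."""
    return np.tanh(patch @ W_enc)

# Regression head: concatenated features -> 2D gaze-direction difference (yaw, pitch).
W_head = rng.normal(scale=0.01, size=(2 * FEAT_DIM, 2))

def gaze_difference(patch_a, patch_b):
    """DEANet-style forward pass: encode both patches with shared weights,
    concatenate the features, and regress the gaze difference g(a) - g(b)."""
    features = np.concatenate([encode(patch_a), encode(patch_b)])
    return features @ W_head

def estimate_gaze(target_patch, reference_patches, reference_gazes):
    """Test-time inference: for each calibration patch with known gaze,
    predict the difference to the target and add it to the reference gaze;
    average over the reference patches selected by the reference grid."""
    predictions = [g_ref + gaze_difference(target_patch, p_ref)
                   for p_ref, g_ref in zip(reference_patches, reference_gazes)]
    return np.mean(predictions, axis=0)

# Toy usage with random arrays in place of real normalized eye patches.
target = rng.normal(size=PATCH_DIM)
refs = [rng.normal(size=PATCH_DIM) for _ in range(4)]
ref_gazes = [rng.normal(scale=0.1, size=2) for _ in range(4)]
gaze = estimate_gaze(target, refs, ref_gazes)
print(gaze.shape)  # (2,)
```

The key design point this sketch preserves is that the person-specific appearance bias cancels in the pairwise difference: the network only ever predicts relative gaze, and absolute gaze is recovered by anchoring to calibration patches whose gaze directions are known.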

Figures: Fig. 1 – Fig. 8

