Gaze Estimation via a Differential Eyes' Appearances Network with a Reference Grid

Song Gu, Lihui Wang, Long He, Xianding He, Jian Wang

Engineering ›› 2021, Vol. 7 ›› Issue (6): 777-786. DOI: 10.1016/j.eng.2020.08.027

Research Article



Abstract

A person's eye gaze can effectively express that person's intentions. Thus, gaze estimation is an important approach in intelligent manufacturing to analyze a person's intentions. Many gaze estimation methods regress the gaze direction by analyzing images of the eyes, also known as eye patches. However, due to individual differences, it is very difficult to construct a person-independent model that can estimate an accurate gaze direction for every person. In this paper, we hypothesize that the difference in the appearance of a person's eyes is directly related to the difference in the corresponding gaze directions. Based on this hypothesis, a differential eyes' appearances network (DEANet) is trained on public datasets to predict the gaze difference of pairwise eye patches belonging to the same individual. Our proposed DEANet is based on a Siamese neural network (SNNet) framework with two identical branches, and multi-stream data are fed into the two branches. Both branches, which share the same weights, extract the features of the patches; the features are then concatenated to obtain the difference of the gaze directions. Once the differential gaze model has been trained, a new person's gaze direction can be estimated when a few calibrated eye patches for that person are provided. Because person-specific calibrated eye patches are involved in the testing stage, the estimation accuracy is improved. Furthermore, the problem of requiring a large amount of data when training a person-specific model is effectively avoided. A reference grid strategy is also proposed in order to efficiently select a few reference eye patches in the testing stage, based directly on the estimation values, as part of the DEANet's inputs, thereby further improving the estimation accuracy. Experiments on public datasets show that our proposed approach outperforms the state-of-the-art methods.
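To make the differential idea concrete, below is a minimal PyTorch sketch of how such a Siamese differential-gaze model and its calibration-based inference could look. The backbone, layer sizes, input format (grayscale eye patches), and all names (`EyeEncoder`, `DEANetSketch`, `estimate_gaze`) are illustrative assumptions rather than the authors' implementation; in particular, plain averaging over a few calibrated references stands in for the paper's reference grid selection.

```python
# Minimal sketch of a Siamese differential-gaze model, assuming grayscale
# eye patches and a 2D gaze representation (yaw, pitch). Not the authors'
# architecture; the layer sizes and names are illustrative only.
import torch
import torch.nn as nn


class EyeEncoder(nn.Module):
    """One branch: extracts a feature vector from a single eye patch."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.fc = nn.Linear(64 * 4 * 4, feat_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))


class DEANetSketch(nn.Module):
    """Siamese framework: both patches pass through the SAME encoder
    (shared weights); the concatenated features are regressed to the
    gaze difference between the two patches."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.encoder = EyeEncoder(feat_dim)  # one encoder => shared weights
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 2),  # predicted difference: gaze_b - gaze_a
        )

    def forward(self, patch_a, patch_b):
        f = torch.cat([self.encoder(patch_a), self.encoder(patch_b)], dim=1)
        return self.head(f)


def estimate_gaze(model, test_patch, ref_patches, ref_gazes):
    """Infer a new person's gaze from a few calibrated references:
    g_test ~= mean_i( g_ref_i + predicted_diff(ref_i -> test) ).
    Averaging over references is a simplification of the paper's
    reference grid selection."""
    model.eval()
    with torch.no_grad():
        n = ref_patches.shape[0]
        diffs = model(ref_patches, test_patch.expand(n, -1, -1, -1))
        return (ref_gazes + diffs).mean(dim=0)
```

During training, pairs of patches from the same person would be sampled and the regression target would simply be the difference of their ground-truth gaze labels (e.g., an MSE loss against `gaze_b - gaze_a`); at test time only a handful of calibrated patches per person is needed, which is the property the abstract highlights.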


Keywords

Gaze estimation / Differential gaze / Siamese neural network / Cross-person evaluations / Human–robot collaboration

Cite this article

Song Gu, Lihui Wang, Long He, et al. Gaze Estimation via a Differential Eyes' Appearances Network with a Reference Grid. Engineering. 2021, 7(6): 777-786. https://doi.org/10.1016/j.eng.2020.08.027
