
Engineering, 2021, Volume 7, Issue 6. doi: 10.1016/j.eng.2020.08.027

Gaze Estimation via a Differential Eyes’ Appearances Network with a Reference Grid

a Chengdu Aeronautic Polytechnic, Chengdu 610100, China
b Department of Production Engineering, KTH Royal Institute of Technology, Stockholm 10044, Sweden

Received: 2019-11-08 Revised: 2020-06-11 Accepted: 2020-08-06 Available online: 2021-04-30


Abstract

A person’s eye gaze can effectively express that person’s intentions. Gaze estimation is therefore an important approach for analyzing a person’s intentions in intelligent manufacturing. Many gaze estimation methods regress the gaze direction by analyzing images of the eyes, also known as eye patches. However, due to individual differences, it is very difficult to construct a person-independent model that estimates an accurate gaze direction for every person. In this paper, we hypothesize that the difference between the appearances of two of a person’s eye patches is related to the difference between the corresponding gaze directions. Based on this hypothesis, a differential eyes’ appearances network (DEANet) is trained on public datasets to predict the gaze difference of pairwise eye patches belonging to the same individual. Our proposed DEANet is based on a Siamese neural network (SNNet) framework with two identical branches, each of which adopts a multi-stream architecture. The two branches of the DEANet share the same weights and extract the features of the patches; the features are then concatenated to obtain the difference between the gaze directions. Once the differential gaze model has been trained, a new person’s gaze direction can be estimated from a few calibrated eye patches of that person. Because person-specific calibrated eye patches are involved in the testing stage, the estimation accuracy is improved; furthermore, the problem of requiring a large amount of data to train a person-specific model is effectively avoided. A reference grid strategy is also proposed in order to select a few references, based directly on the estimated values, as part of the DEANet’s inputs, thereby further improving the estimation accuracy. Experiments on public datasets show that our proposed approach outperforms state-of-the-art methods.
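The differential formulation can be made concrete with a short sketch. The following is a minimal, hypothetical illustration of the idea described in the abstract, not the authors’ exact DEANet: the backbone layers, feature dimensions, sign convention of the predicted difference, and the names EyeBranch, DifferentialGazeNet, and estimate_gaze are all assumptions made for illustration.

# Minimal sketch of a differential Siamese gaze model (assumed sizes/names;
# not the authors' exact DEANet architecture).
import torch
import torch.nn as nn


class EyeBranch(nn.Module):
    # One SNNet branch; both inputs pass through this same module,
    # so the two branches share weights by construction.
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4),
        )
        self.fc = nn.Linear(64 * 4 * 4, feat_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))


class DifferentialGazeNet(nn.Module):
    # Predicts gaze(a) - gaze(b) as (yaw, pitch) offsets for two eye
    # patches of the same person (sign convention is an assumption).
    def __init__(self, feat_dim=128):
        super().__init__()
        self.branch = EyeBranch(feat_dim)
        self.head = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(), nn.Linear(64, 2),
        )

    def forward(self, patch_a, patch_b):
        f_a = self.branch(patch_a)          # shared-weight feature extraction
        f_b = self.branch(patch_b)
        return self.head(torch.cat([f_a, f_b], dim=1))


@torch.no_grad()
def estimate_gaze(model, new_patch, ref_patches, ref_gazes):
    # Differential inference: for each calibrated reference patch with a
    # known gaze, add the predicted difference to the reference gaze, then
    # average the candidates. Shapes: new_patch (3, H, W),
    # ref_patches (n, 3, H, W), ref_gazes (n, 2).
    n = ref_patches.shape[0]
    repeated = new_patch.unsqueeze(0).expand(n, -1, -1, -1)
    diffs = model(repeated, ref_patches)    # gaze(new) - gaze(ref), (n, 2)
    return (ref_gazes + diffs).mean(dim=0)  # averaged estimate, (2,)

One plausible way to realize the reference grid strategy on top of this sketch would be to form a coarse estimate using all calibrated references and then re-run estimate_gaze with only the references whose gaze labels lie closest to that estimate; this refinement loop is an assumption about the strategy, not its exact specification.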

