Engineering >> 2020, Volume 6, Issue 3 doi: 10.1016/j.eng.2020.01.011

Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Humanlike Common Sense

a Center for Vision, Cognition, Learning, and Autonomy, University of California, Los Angeles, CA 90095, USA

b Center for Brains, Minds, and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

Received: 2019-09-16 Revised: 2019-12-11 Accepted: 2020-01-03 Available online: 2020-02-22

Abstract

Recent progress in deep learning is essentially based on a “big data for small tasks” paradigm, under which massive amounts of data are used to train a classifier for a single narrow task. In this paper, we call for a shift that flips this paradigm upside down. Specifically, we propose a “small data for big tasks” paradigm, wherein a single artificial intelligence (AI) system is challenged to develop “common sense,” enabling it to solve a wide range of tasks with little training data. We illustrate the potential power of this new paradigm by reviewing models of common sense that synthesize recent breakthroughs in both machine and human vision. We identify functionality, physics, intent, causality, and utility (FPICU) as the five core domains of cognitive AI with humanlike common sense. Taken as a unified concept, FPICU addresses the questions of “why” and “how,” going beyond the dominant “what” and “where” framework for understanding vision. These five domains are invisible in terms of pixels, yet they drive the creation, maintenance, and development of visual scenes. We therefore call them the “dark matter” of vision. Just as our universe cannot be understood by studying observable matter alone, we argue that vision cannot be understood without studying FPICU. We demonstrate the power of this perspective for developing cognitive AI systems with humanlike common sense by showing how to observe and apply FPICU with little training data to solve a wide range of challenging tasks, including tool use, planning, utility inference, and social learning. In summary, we argue that the next generation of AI must embrace “dark” humanlike common sense to solve novel tasks.

References

[ 1 ] Marr D. Vision: a computational investigation into the human representation and processing of visual information. San Francisco: W.H. Freeman and Company; 1982. link1

[ 2 ] Mishkin M, Ungerleider LG, Macko KA. Object vision and spatial vision: two cortical pathways. Trends Neurosci 1983;6:414–7. link1

[ 3 ] Ikeuchi K, Hebert M. Task-oriented vision. In: Landy MS, Maloney LT, Pavel M, editors. Exploratory vision. New York: Springer; 1996. p. 257–77. link1

[ 4 ] Land M, Mennie N, Rusted J. The roles of vision and eye movements in the control of activities of daily living. Perception 1999;28(11):1311–28. link1

[ 5 ] Fang F, He S. Cortical responses to invisible objects in the human dorsal and ventral pathways. Nat Neurosci 2005;8(10):1380–5. link1

[ 6 ] Creem-Regehr SH, Lee JN. Neural representations of graspable objects: are tools special? Brain Res Cogn Brain Res 2005;22(3):457–69. link1

[ 7 ] Potter MC. Meaning in visual search. Science 1975;187(4180):965–6. link1

[ 8 ] Potter MC. Short-term conceptual memory for pictures. J Exp Psychol Hum Learn 1976;2(5):509–22. link1

[ 9 ] Schyns PG, Oliva A. From blobs to boundary edges: evidence for time- and spatial-scale-dependent scene recognition. Psychol Sci 1994;5(4):195–200. link1

[10] Thorpe S, Fize D, Marlot C. Speed of processing in the human visual system. Nature 1996;381(6582):520–2. link1

[11] Greene MR, Oliva A. The briefest of glances: the time course of natural scene understanding. Psychol Sci 2009;20(4):464–72. link1

[12] Greene MR, Oliva A. Recognition of natural scenes from global properties: seeing the forest without representing the trees. Cognit Psychol 2009;58 (2):137–76. link1

[13] Li FF, Iyer A, Koch C, Perona P. What do we perceive in a glance of a real-world scene? J Vis 2007;7(1):10. link1

[14] Rousselet G, Joubert O, Fabre-Thorpe M. How long to get to the ‘‘gist” of realworld natural scenes? Vis Cognit 2005;12(6):852–77. link1

[15] Oliva A, Torralba A. Modeling the shape of the scene: a holistic representation of the spatial envelope. Int J Comput Vis 2001;42(3):145–75. link1

[16] Delorme A, Richard G, Fabre-Thorpe M. Ultra-rapid categorisation of natural scenes does not rely on colour cues: a study in monkeys and humans. Vision Res 2000;40(16):2187–200. link1

[17] Serre T, Oliva A, Poggio T. A feedforward architecture accounts for rapid categorization. Proc Natl Acad Sci USA 2007;104(15):6424–9. link1

[18] Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 2012 Neural Information Processing Systems; 2012 Dec 3–6; Lake Tahoe, NV, USA; 2012.

[19] Kavukcuoglu K, Sermanet P, Boureau YL, Gregor K, Mathieu M, Cun YL. Learning convolutional feature hierarchies for visual recognition. In: Proceedings of the 2010 Neural Information Processing Systems; 2010 Dec 6–11; Vancouver, BC, Canada; 2010.

[20] Deng J, Dong W, Socher R, Li LJ, Li K, Li FF. ImageNet: a large-scale hierarchical image database. In: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition; 2009 Jun 20–25; Miami, FL, USA; 2009.

[21] Rajalingham R, Issa EB, Bashivan P, Kar K, Schmidt K, DiCarlo JJ. Large-scale, high-resolution comparison of the core visual object recognition behavior of humans, monkeys, and state-of-the-art deep artificial neural networks. J Neurosci 2018;38(33):7255–69. link1

[22] Oliva A, Schyns PG. Coarse blobs or fine edges? Evidence that information diagnosticity changes the perception of complex visual stimuli. Cognit Psychol 1997;34(1):72–107. link1

[23] Schyns PG. Diagnostic recognition: task constraints, object information, and their interactions. Cognition 1998;67(1–2):147–79. link1

[24] Malcolm GL, Nuthmann A, Schyns PG. Beyond gist: strategic and incremental information accumulation for scene categorization. Psychol Sci 2014;25 (5):1087–97. link1

[25] Qi S, Huang S, Wei P, Zhu SC. Predicting human activities using stochastic grammar. In: Proceedings of the 2017 IEEE International Conference on Computer Vision; 2017 Oct 22–29; Venice, Italy; 2017. p. 1164–72.

[26] Pei M, Jia Y, Zhu SC. Parsing video events with goal inference and intent prediction. In: Proceedings of the 2011 IEEE International Conference on Computer Vision; 2011 Nov 6–13; Barcelona, Spain; 2011.

[27] Gosselin F, Schyns PG. Bubbles: a technique to reveal the use of information in recognition tasks. Vision Res 2001;41(17):2261–71. link1

[28] Ikeuchi K, Hebert M. Task oriented vision. In: Proceedings of the 1992 IEEE/ RSJ International Conference on Intelligent Robots and Systems; 1992 Jul 7– 10; Raleigh, NC, USA; 1992. p. 2187–94.

[29] Hartley R, Zisserman A. Multiple view geometry in computer vision. 2nd ed. Cambridge: Cambridge University Press; 2003. link1

[30] Ma Y, Soatto S, Kosecka J, Sastry SS. An invitation to 3-D vision: from images to geometric models. New York: Springer Science & Business Media; 2012. link1

[31] Gupta A, Hebert M, Kanade T, Blei DM. Estimating spatial layout of rooms using volumetric reasoning about objects and surfaces. In: Proceedings of the 2010 Neural Information Processing Systems; 2010 Dec 6–11; Vancouver, BC, Canada; 2010.

[32] Schwing AG, Fidler S, Pollefeys M, Urtasun R. Box in the box: joint 3D layout and object reasoning from single images. In: In: Proceedings of the 2013 IEEE International Conference on Computer Vision; 2013 Dec 1–8; Sydney, Australia. p. 353–60. link1

[33] Choi W, Chao YW, Pantofaru C, Savarese S. Understanding indoor scenes using 3D geometric phrases. In: Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition; 2013 Jun 25–27; Portland, OR, USA; 2013. p. 33–40.

[34] Zhao Y, Zhu SC. Scene parsing by integrating function, geometry and appearance models. In: Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition; 2013 Jun 25–27; Portland, OR, USA; 2013. p. 3119–26.

[35] Liu X, Zhao Y, Zhu SC. Single-view 3D scene reconstruction and parsing by attribute grammar. IEEE Trans Pattern Anal Mach Intell 2018;40(3):710–25. link1

[36] Huang S, Qi S, Zhu Y, Xiao Y, Xu Y, Zhu SC. Holistic 3D scene parsing and reconstruction from a single RGB image. In: Proceedings of the 2018 European Conference on Computer Vision; 2018 Sep 8–14; Munich, Germany; 2018.

[37] Chen Y, Huang S, Yuan T, Qi S, Zhu Y, Zhu SC. Holistic++ scene understanding: single-view 3D holistic scene parsing and human pose estimation with human–object interaction and physical commonsense. In: Proceedings of the 2019 IEEE International Conference on Computer Vision; 2019 Oct 27–Nov 2; Seoul, Korea. p. 8648–57. link1

[38] Huang S, Chen Y, Yuan T, Qi S, Zhu Y, Zhu SC. PerspectiveNet: 3D object detection from a single RGB image via perspective points. In: Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, editors. Advances in neural information processing systems 32: proceedings of Neural Information Processing Systems 2019; 2019 Dec 8 14; Vancouver, BC, Canada; 2019. p. 8903 15.

[39] Tolman EC. Cognitive maps in rats and men. Psychol Rev 1948;55 (4):189–208. link1

[40] Wang RF, Spelke ES. Comparative approaches to human navigation. In: Jeffery KJ, editor. The neurobiology of spatial behaviour. Oxford: Oxford University Press; 2003. p. 119–43. link1

[41] Koenderink JJ, van Doorn AJ, Kappers AM, Lappin JS. Large-scale visual frontoparallels under full-cue conditions. Perception 2002;31(12):1467–75. link1

[42] Warren WH, Rothman DB, Schnapp BH, Ericson JD. Wormholes in virtual space: from cognitive maps to cognitive graphs. Cognition 2017;166:152–63. link1

[43] Gillner S, Mallot HA. Navigation and acquisition of spatial knowledge in a virtual maze. J Cogn Neurosci 1998;10(4):445–63. link1

[44] Foo P, Warren WH, Duchon A, Tarr MJ. Do humans integrate routes into a cognitive map? Map-versus landmark-based navigation of novel shortcuts. J Exp Psychol Learn Mem Cogn 2005;31(2):195–215. link1

[45] Chrastil ER, Warren WH. From cognitive maps to cognitive graphs. PLoS ONE 2014;9(11):e112544. link1

[46] Byrne RW. Memory for urban geography. Q J Exp Psychol 1979;31(1):147–54. link1

[47] Tversky B. Distortions in cognitive maps. Geoforum 1992;23(2):131–8. link1

[48] Ogle KN. Researches in binocular vision. Philadelphia: WB Saunders; 1950. link1

[49] Foley JM. Binocular distance perception. Psychol Rev 1980;87(5):411–34. link1

[50] Luneburg RK. Mathematical analysis of binocular vision. Princeton: Princeton University Press; 1947. link1

[51] Indow T. A critical review of Luneburg’s model with regard to global structure of visual space. Psychol Rev 1991;98(3):430–53. link1

[52] Gogel WC. A theory of phenomenal geometry and its applications. Percept Psychophys 1990;48(2):105–23. link1

[53] Glennerster A, Tcheang L, Gilson SJ, Fitzgibbon AW, Parker AJ. Humans ignore motion and stereo cues in favor of a fictional stable world. Curr Biol 2006;16 (4):428–32. link1

[54] Hafting T, Fyhn M, Molden S, Moser MB, Moser EI. Microstructure of a spatial map in the entorhinal cortex. Nature 2005;436(7052):801–6. link1

[55] Killian NJ, Jutras MJ, Buffalo EA. A map of visual space in the primate entorhinal cortex. Nature 2012;491(7426):761–4. link1

[56] O’Keefe J, Nadel L. The hippocampus as a cognitive map. Oxford: Clarendon Press; 1978. link1

[57] Jacobs J, Weidemann CT, Miller JF, Solway A, Burke JF, Wei XX, et al. Direct recordings of grid-like neuronal activity in human spatial navigation. Nat Neurosci 2013;16(9):1188–90. link1

[58] Fyhn M, Hafting T, Witter MP, Moser EI, Moser MB. Grid cells in mice. Hippocampus 2008;18(12):1230–8. link1

[59] Doeller CF, Barry C, Burgess N. Evidence for grid cells in a human memory network. Nature 2010;463(7281):657–61. link1

[60] Yartsev MM, Witter MP, Ulanovsky N. Grid cells without theta oscillations in the entorhinal cortex of bats. Nature 2011;479(7371):103–7. link1

[61] Gao R, Xie J, Zhu SC, Wu Y. Learning grid cells as vector representation of selfposition coupled with matrix representation of self-motion. In: Proceedings of the 2019 International Conference on Learning Representations; 2019 May 6–9; New Orleans, LA, USA; 2019.

[62] Xie J, Gao R, Nijkamp E, Zhu S, Wu YN. Representation learning: a statistical perspective. Annu Rev Stat Appl 2020:7. link1

[63] Gootjes-Dreesbach L, Pickup LC, Fitzgibbon AW, Glennerster A. Comparison of view-based and reconstruction-based models of human navigational strategy. J Vis 2017;17(9):11. link1

[64] Vuong J, Fitzgibbon AW, Glennerster A. Human pointing errors suggest a flattened, task-dependent representation of space. bioRxiv 2018:390088. link1

[65] Choi H, Scholl BJ. Perceiving causality after the fact: postdiction in the temporal dynamics of causal perception. Perception 2006;35(3):385–99. link1

[66] Scholl BJ, Nakayama K. Illusory causal crescents: misperceived spatial relations due to perceived causality. Perception 2004;33(4):455–69. link1

[67] Scholl BJ, Gao T. Perceiving animacy and intentionality: visual processing or higher-level judgment. In: Rutherford MD, Kuhlmeier VA, editors. Social perception: detection and interpretation of animacy, agency, and intention. Cambridge: The MIT Press; 2013. p. 197–229. link1

[68] Scholl BJ. Objects and attention: the state of the art. Cognition 2001;80(1– 2):1–46. link1

[69] Vul E, Alvarez G, Tenenbaum JB, Black MJ. Explaining human multiple object tracking as resource-constrained approximate inference in a dynamic probabilistic model. In: Proceedings of the 2009 Neural Information Processing Systems; 2009 Dec 7–10; Vancouver, BC, Canada; 2009.

[70] Battaglia PW, Hamrick JB, Tenenbaum JB. Simulation as an engine of physical scene understanding. Proc Natl Acad Sci USA 2013;110(45):18327–32. link1

[71] Hamrick J, Battaglia P, Tenenbaum JB. Internal physics models guide probabilistic judgments about object dynamics. In: Proceedings of the 2011 Annual Meeting of the Cognitive Science Society; 2011 Jul 20–23; Boston, MA, USA; 2011.

[72] Xie D, Shu T, Todorovic S, Zhu SC. Learning and inferring ‘‘dark matter” and predicting human intents and trajectories in videos. IEEE Trans Pattern Anal Mach Intell 2018;40(7):1639–52. link1

[73] Ullman T, Stuhlmüller A, Goodman N, Tenenbaum JB. Learning physics from dynamical scenes. In: Proceedings of the 2014 Annual Meeting of the Cognitive Science Society; 2014 Jul 23–26; Quebec City, QC, Canada; 2014.

[74] Gerstenberg T, Tenenbaum JB. Intuitive theories. In: Waldmann MR, editor. Oxford handbook of causal reasoning. New York: Oxford University Press; 2017. p. 515–48. link1

[75] Newton I, Colson J. The method of fluxions and infinite series; with its application to the geometry of curve-lines. London: Henry Woodfall; 1736. link1

[76] Maclaurin C. A treatise of fluxions: in two books. München: Ruddimans; 1742. link1

[77] Mueller ET. Commonsense reasoning: an event calculus based approach. 2nd ed. Amsterdam: Morgan Kaufmann; 2014. link1

[78] Mueller ET. Daydreaming in humans and machines: a computer model of the stream of thought. Norwood: Ablex Publishing Corporation; 1990. link1

[79] Michotte A. The perception of causality. 2nd ed. London: Methuen & Co; 1963. link1

[80] Carey S. The origin of concepts. New York: Oxford University Press; 2009. link1

[81] Farhadi A, Endres I, Hoiem D, Forsyth D. Describing objects by their attributes. In: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition; 2009 Jun 20–25; Miami, FL, USA; 2009. p. 1778–85.

[82] Parikh D, Grauman K. Relative attributes. In: Proceedings of the 2011 International Conference on Computer Vision; 2011 Nov 6–13; Barcelona, Spain; 2011. p. 503–10.

[83] Laptev I, Marszałek M, Schmid C, Rozenfeld B. Learning realistic human actions from movies. In: Proceedings of the 2008 Conference on Computer Vision and Pattern Recognition; 2008 Jun 24–26; Anchorage, AK, USA; 2008.

[84] Yao B, Zhu SC. Learning deformable action templates from cluttered videos. Proceedings of the 2009 International Conference on Computer Vision; 2009 Sep 29–Oct 2; Kyoto, Japan, 2009. link1

[85] Yao BZ, Nie BX, Liu Z, Zhu SC. Animated pose templates for modeling and detecting human actions. IEEE Trans Pattern Anal Mach Intell 2013;36 (3):436–52. link1

[86] Wang J, Liu Z, Wu Y, Yuan J. Mining actionlet ensemble for action recognition with depth cameras. Proceedings of the 2012 Conference on Computer Vision and Pattern Recognition; 2012 Jun 16–21; Providence, RI, USA, 2012. link1

[87] Dalal N, Triggs B. Histograms of oriented gradients for human detection. In: Proceedings of the 2005 Conference on Computer Vision and Pattern Recognition; 2005 Jun 20–26; San Diego, CA, USA; 2005.

[88] Sadanand S, Corso JJ. Action bank: a high-level representation of activity in video. Proceedings of the 2012 Conference on Computer Vision and Pattern Recognition; 2012 Jun 16–21; Providence, RI, USA, 2012. link1

[89] Fleming RW, Barnett-Cowan M, Bülthoff HH. Perceived object stability is affected by the internal representation of gravity. Perception 2010;39:109. link1

[90] Zago M, Lacquaniti F. Visual perception and interception of falling objects: a review of evidence for an internal model of gravity. J Neural Eng 2005;2(3): S198–208. link1

[91] Kellman PJ, Spelke ES. Perception of partly occluded objects in infancy. Cognit Psychol 1983;15(4):483–524. link1

[92] Baillargeon R, Spelke ES, Wasserman S. Object permanence in five-month-old infants. Cognition 1985;20(3):191–208. link1

[93] Johnson SP, Aslin RN. Perception of object unity in 2-month-old infants. Dev Psychol 1995;31(5):739–45. link1

[94] Needham A. Factors affecting infants’ use of featural information in object segregation. Curr Dir Psychol Sci 1997;6(2):26–33. link1

[95] Baillargeon R. Infants’ physical world. Curr Dir Psychol Sci 2004;13(3):89–94. link1

[96] Zheng B, Zhao Y, Yu JC, Ikeuchi K, Zhu SC. Detecting potential falling objects by inferring human action and natural disturbance. In: Proceedings of the 2014 International Conference on Robotics and Automation; 2014 May 31– Jun 7; Hong Kong, China; 2014.

[97] Zheng B, Zhao Y, Yu JC, Ikeuchi K, Zhu SC. Beyond point clouds: scene understanding by reasoning geometry and physics. In: Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition; 2013 Jun 23–28; Portland, OR, USA; 2013. p. 3127–34.

[98] Zheng B, Zhao Y, Yu JC, Ikeuchi K, Zhu SC. Scene understanding by reasoning stability and safety. Int J Comput Vis 2015;112(2):221–38. link1

[99] Qi S, Zhu Y, Huang S, Jiang C, Zhu SC. Human-centric indoor scene synthesis using stochastic grammar. In: Proceedings of the 2018 Conference on Computer Vision and Pattern Recognition; 2018 Jun 18–22; Salt Lake City, UT, USA; 2018.

[100] Huang S, Qi S, Xiao Y, Zhu Y, Wu YN, Zhu SC. Cooperative holistic scene understanding: unifying 3D object, layout, and camera pose estimation. In: Proceedings of the 2018 Neural Information Processing Systems; 2018 Dec 3– 8; Montreal, QC, Canada; 2018.

[101] Gupta A, Satkin S, Efros AA, Hebert M. From 3D scene geometry to human workspace. In: Proceedings of the 2011 Conference on Computer Vision and Pattern Recognition; 2011 Jun 20–25; Providence, RI, USA; 2011.

[102] Iacoboni M, Molnar-Szakacs I, Gallese V, Buccino G, Mazziotta JC, Rizzolatti G. Grasping the intentions of others with one’s own mirror neuron system. PLoS Biol 2005;3(3):e79. link1

[103] Csibra G, Gergely G. ‘Obsessed with goals’: functions and mechanisms of teleological interpretation of actions in humans. Acta Psychol 2007;124 (1):60–78. link1

[104] Baker CL, Tenenbaum JB, Saxe RR. Goal inference as inverse planning. In: Proceedings of the 2007 Annual Meeting of the Cognitive Science Society; 2007 Aug 1–4; Austin, TX, USA; 2007.

[105] Baker CL, Goodman ND, Tenenbaum JB. Theory-based social goal inference. In: Proceedings of the 2008 Annual Meeting of the Cognitive Science Society; 2008 Jul 23–27; Washington, DC, USA; 2008. p. 1447–52.

[106] Hoai M, De la Torre F. Max-margin early event detectors. Int J Comput Vis 2014;107(2):191–202. link1

[107] Turek MW, Hoogs A, Collins R. Unsupervised learning of functional categories in video scenes. In: Proceedings of the 2010 European Conference on Computer Vision; 2010 Sep 5–11; Heraklion, Greece. p. 664–77. link1

[108] Grabner H, Gall J, van Gool L. What makes a chair a chair? In: Proceedings of the 2011 Conference on Computer Vision and Pattern Recognition; 2011 Jun 20–25; Providence, RI, USA; 2011. p. 1529–36.

[109] Jia Z, Gallagher A, Saxena A, Chen T. 3D-based reasoning with blocks, support, and stability. In: Proceedings of the 2013 Conference on Computer Vision and Pattern Recognition; 2013 Jun 23–28; Portland, OR, USA; 2013. p. 1–8.

[110] Jiang Y, Koppula H, Saxena A. Hallucinated humans as the hidden context for labeling 3D scenes. In: Proceedings of the 2013 Conference on Computer Vision and Pattern Recognition; 2013 Jun 23–28; Portland, OR, USA; 2013. p. 2993–3000.

[111] Shu T, Thurman SM, Chen D, Zhu SC, Lu H. Critical features of joint actions that signal human interaction. In: Proceedings of the 2016 Annual Meeting of the Cognitive Science Society; 2016 Aug 10–13; Philadelphia, PA, USA; 2016.

[112] Shu T, Peng Y, Fan L, Lu H, Zhu SC. Perception of human interaction based on motion trajectories: from aerial videos to decontextualized animations. Top Cogn Sci 2018;10(1):225–41. link1

[113] Shu T, Peng Y, Lu H, Zhu SC. Partitioning the perception of physical and social events within a unified psychological space. In: Proceedings of the 2019 Annual Meeting of the Cognitive Science Society; 2019 Jul 24–27; Montreal, QC, Canada; 2019.

[114] Baker C, Saxe R, Tenenbaum J. Bayesian theory of mind: modeling joint beliefdesire attribution. In: Proceedings of the 2011 Annual Meeting of the Cognitive Science Society; 2011 Jul 20–23; Boston, MA, USA; 2011.

[115] Zhao Y, Holtzen S, Gao T, Zhu SC. Represent and infer human theory of mind for human–robot interaction. Proceedings of the 2015 AAAI Fall Symposium Series; 2015 Nov 12–14; Arlington, VA, USA, 2015. link1

[116] Nisan N, Ronen A. Algorithmic mechanism design. Games Econ Behav 2001;35(1–2):166–96. link1

[117] Bentham J. An introduction to the principles of morals. London: Athlone; 1935. link1

[118] Nishant S. Utility learning, non-Markovian planning, and task-oriented programming language [dissertation]. Los Angeles: University of California; 2019. link1

[119] Robb AA. Optical geometry of motion: a new view of the theory of relativity. W Heffer 1911. link1

[120] Malament DB. The class of continuous timelike curves determines the topology of spacetime. J Math Phys 1977;18(7):1399–404. link1

[121] Robb AA. Geometry of time and space. New York: Cambridge University Press; 2014. link1

[122] Corrigan R, Denton P. Causal understanding as a developmental primitive. Dev Rev 1996;16(2):162–202. link1

[123] White PA. Causal processing: origins and development. Psychol Bull 1988;104(1):36–52. link1

[124] Chen YC, Scholl BJ. The perception of history: seeing causal history in static shapes induces illusory motion perception. Psychol Sci 2016;27(6):923–30. link1

[125] Holyoak KJ, Cheng PW. Causal learning and inference as a rational process: the new synthesis. Annu Rev Psychol 2011;62(1):135–63. link1

[126] Shanks DR, Dickinson A. Associative accounts of causality judgment. Psychol Learn Motiv 1988;21:229–61. link1

[127] Rescorla RA, Wagner AR. A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: Black AH, Prokasy WF, editors. Classical conditioning II: current research and theory. New York: Appleton-Century-Crofts; 1972. p. 64–99. link1

[128] Lu H, Yuille AL, Liljeholm M, Cheng PW, Holyoak KJ. Bayesian generic priors for causal learning. Psychol Rev 2008;115(4):955–84. link1

[129] Edmonds M, Qi S, Zhu Y, Kubricht J, Zhu SC, Lu H. Decomposing human causal learning: bottom-up associative learning and top-down schema reasoning. In: Proceedings of the 2019 Annual Meeting of the Cognitive Science Society; 2019 Jul 24–27; Montreal, QC, Canada; 2019.

[130] Waldmann MR, Holyoak KJ. Predictive and diagnostic learning within causal models: asymmetries in cue competition. J Exp Psychol Gen 1992;121 (2):222–36. link1

[131] Edmonds M, Kubricht J, Summers C, Zhu Y, Rothrock B, Zhu SC, et al. Human causal transfer: challenges for deep reinforcement learning. In: Proceedings of the 2018 Annual Meeting of the Cognitive Science Society; 2018 Jul 25–28; Madison, CT, USA; 2018.

[132] Cheng PW. From covariation to causation: a causal power theory. Psychol Rev 1997;104(2):367–405. link1

[133] Scholl BJ, Tremoulet PD. Perceptual causality and animacy. Trends Cogn Sci 2000;4(8):299–309. link1

[134] Rolfs M, Dambacher M, Cavanagh P. Visual adaptation of the perception of causality. Curr Biol 2013;23(3):250–4. link1

[135] McCollough C. Color adaptation of edge-detectors in the human visual system. Science 1965;149(3688):1115–6. link1

[136] Kominsky JF, Scholl BJ. Retinotopically specific visual adaptation reveals the structure of causal events in perception. In: Proceedings of the 2018 Annual Meeting of the Cognitive Science Society; 2018 Jul 25–28; Madison, CT, USA; 2018.

[137] Gerstenberg T, Peterson MF, Goodman ND, Lagnado DA, Tenenbaum JB. Eyetracking causality. Psychol Sci 2017;28(12):1731–44. link1

[138] Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, et al. Human-level control through deep reinforcement learning. Nature 2015;518 (7540):529–33. link1

[139] Schulman J, Levine S, Abbeel P, Jordan M, Moritz P. Trust region policy optimization. In: Proceedings of the 2015 International Conference on Machine Learning; 2015 Jul 6–11; Lille, France; 2015.

[140] Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van den Driessche G, et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016;529(7587):484–9. link1

[141] Levine S, Finn C, Darrell T, Abbeel P. End-to-end training of deep visuomotor policies. J Mach Learn Res 2016;17(1):1334–73. link1

[142] Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. 2017. arXiv:1707.06347.

[143] Zhang C, Vinyals O, Munos R, Bengio S. A study on overfitting in deep reinforcement learning. 2018. arXiv:1804.06893.

[144] Kansky K, Silver T, Mély DA, Eldawy M, Lázaro-Gredilla M, Lou X, et al. Schema networks: zero-shot transfer with a generative causal model of intuitive physics. 2017. arXiv:1706.04317.

[145] Edmonds M, Ma X, Qi S, Zhu Y, Lu H, Zhu SC. Theory-based causal transfer: integrating instance-level induction and abstract-level structure learning. 2019. arXiv:1911.11185.

[146] Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol 1974;66(5):688–701. link1

[147] Imbens GW, Rubin DB. Causal inference for statistics, social, and biomedical sciences. New York: Cambridge University Press; 2015. link1

[148] Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70(1):41–55. link1

[149] Pearl J. Causality: models, reasoning and inference. New York: Cambridge University Press; 2000. link1

[150] Spirtes P, Glymour C, Scheines R, Heckerman D, Meek C, Cooper GF, et al. Causation, prediction, and search. 2nd ed. Cambridge: MIT Press; 2000. link1

[151] Chickering DW. Optimal structure identification with greedy search. J Mach Learn Res 2002;3:507–54. link1

[152] Peters J, Mooij JM, Janzing D, Schölkopf B. Causal discovery with continuous additive noise models. J Mach Learn Res 2014;15(1):2009–53. link1

[153] He YB, Geng Z. Active learning of causal networks with intervention experiments and optimal designs. J Mach Learn Res 2008;9(11):2523–47. link1

[154] Bramley NR, Dayan P, Griffiths TL, Lagnado DA. Formalizing Neurath’s ship: approximate algorithms for online causal learning. Psychol Rev 2017;124 (3):301–38. link1

[155] Fisher RA. The design of experiments. London: Oliver and Boyd; 1935. link1

[156] Fire A, Zhu SC. Learning perceptual causality from video. ACM Trans Intell Syst Technol 2016;7(2):23. link1

[157] Fire A, Zhu SC. Using causal induction in humans to learn and infer causality from video. In: Proceedings of the 2013 Annual Meeting of the Cognitive Science Society; 2013 Jul 31–Aug 3; Berlin, Germany; 2013.

[158] Zhu SC, Wu YN, Mumford D. Minimax entropy principle and its application to texture modeling. Neural Comput 1997;9(8):1627–60. link1

[159] Xu Y, Qin L, Liu X, Xie J, Zhu SC. A causal and–or graph model for visibility fluent reasoning in tracking interacting objects. In: Proceedings of the 2018 Conference on Computer Vision and Pattern Recognition; 2018 Jun 18–22; Salt Lake City, UT, USA; 2018. p. 2178–87.

[160] Xiong C, Shukla N, Xiong W, Zhu SC. Robot learning with a spatial, temporal, and causal and–or graph. In: Proceedings of the 2016 IEEE International Conference on Robotics and Automation; 2016 May 16–21; Stockholm, Sweden; 2016.

[161] McCloskey M, Washburn A, Felch L. Intuitive physics: the straight-down belief and its origin. J Exp Psychol Learn Mem Cogn 1983;9(4):636–49. link1

[162] McCloskey M, Caramazza A, Green B. Curvilinear motion in the absence of external forces: naive beliefs about the motion of objects. Science 1980;210 (4474):1139–41. link1

[163] DiSessa AA. Unlearning Aristotelian physics: a study of knowledge-based learning. Cogn Sci 1982;6(1):37–75. link1

[164] Kaiser MK, Jonides J, Alexander J. Intuitive reasoning about abstract and familiar physics problems. Mem Cognit 1986;14(4):308–12. link1

[165] Smith KA, Battaglia P, Vul E. Consistent physics underlying ballistic motion prediction. In: Proceedings of the 2013 Annual Meeting of the Cognitive Science Society; 2013 Jul 31–Aug 3; Berlin, Germany; 2013.

[166] Kaiser MK, Proffitt DR, Whelan SM, Hecht H. Influence of animation on dynamical judgments. J Exp Psychol Hum Percept Perform 1992;18(3):669–89.

[167] Kaiser MK, Proffitt DR, Anderson K. Judgments of natural and anomalous trajectories in the presence and absence of motion. J Exp Psychol Learn Mem Cogn 1985;11(4):795–803.

[168] Kim IK, Spelke ES. Perception and understanding of effects of gravity and inertia on object motion. Dev Sci 1999;2(3):339–62.

[169] Piaget J, Cook MT. The origins of intelligence in children. New York: International Universities Press; 1952.

[170] Piaget J, Cook MT. The construction of reality in the child. New York: Basic Books; 1954.

[171] Hespos SJ, Baillargeon R. Décalage in infants’ knowledge about occlusion and containment events: converging evidence from action tasks. Cognition 2006;99(2):B31–41.

[172] Hespos SJ, Baillargeon R. Young infants’ actions reveal their developing knowledge of support variables: converging evidence for violation-of-expectation findings. Cognition 2008;107(1):304–16.

[173] Bower TGR. Development in infancy. New York: WH Freeman; 1974.

[174] Leslie AM, Keeble S. Do six-month-old infants perceive causality? Cognition 1987;25(3):265–88.

[175] Luo Y, Baillargeon R, Brueckner L, Munakata Y. Reasoning about a hidden object after a delay: evidence for robust representations in 5-month-old infants. Cognition 2003;88(3):B23–32.

[176] Baillargeon R, Li J, Ng W, Yuan S. An account of infants’ physical reasoning. In: Woodward A, Needham A, editors. Learning and the infant mind. New York: Oxford University Press; 2009. p. 66–116.

[177] Baillargeon R. The acquisition of physical knowledge in infancy: a summary in eight lessons. Blackwell Handb Child Cognit Dev 2002;1:46–83.

[178] Achinstein P. The nature of explanation. New York: Oxford University Press; 1983.

[179] Fischer J, Mikhael JG, Tenenbaum JB, Kanwisher N. Functional neuroanatomy of intuitive physical inference. Proc Natl Acad Sci USA 2016;113(34):E5072–81.

[180] Ullman TD, Spelke E, Battaglia P, Tenenbaum JB. Mind games: game engines as an architecture for intuitive physics. Trends Cogn Sci 2017;21(9):649–65.

[181] Bates C, Yildirim I, Tenenbaum JB, Battaglia PW. Humans predict liquid dynamics using probabilistic simulation. In: Proceedings of the 2015 Annual Meeting of the Cognitive Science Society; 2015 Jul 23–25; Pasadena, CA, USA; 2015.

[182] Kubricht J, Jiang C, Zhu Y, Zhu SC, Terzopoulos D, Lu H. Probabilistic simulation predicts human performance on viscous fluid-pouring problem. In: Proceedings of the 2016 Annual Meeting of the Cognitive Science Society; 2016 Aug 10–13; Philadelphia, PA, USA; 2016.

[183] Kubricht J, Zhu Y, Jiang C, Terzopoulos D, Zhu SC, Lu H. Consistent probabilistic simulation underlying human judgment in substance dynamics. In: Proceedings of the 2017 Annual Meeting of the Cognitive Science Society; 2017 Jul 26–29; London, UK; 2017.

[184] Kubricht JR, Holyoak KJ, Lu H. Intuitive physics: current research and controversies. Trends Cogn Sci 2017;21(10):749–59.

[185] Mumford D, Desolneux A. Pattern theory: the stochastic analysis of real-world signals. Boca Raton: CRC Press; 2010.

[186] Mumford D. Pattern theory: a unifying perspective. In: Joseph A, Mignot F, Murat F, Prum B, Rentschler R, editors. First European congress of mathematics. Heidelberg: Springer; 1994. p. 187–224.

[187] Julesz B. Visual pattern discrimination. IRE Trans Inf Theory 1962;8(2):84–92.

[188] Zhu SC, Wu Y, Mumford D. Filters, random fields and maximum entropy (FRAME): towards a unified theory for texture modeling. Int J Comput Vis 1998;27(2):107–26.

[189] Julesz B. Textons, the elements of texture perception, and their interactions. Nature 1981;290(5802):91–7.

[190] Zhu SC, Guo CE, Wang Y, Xu Z. What are textons? Int J Comput Vis 2005;62(1–2):121–43.

[191] Guo C, Zhu SC, Wu YN. Towards a mathematical theory of primal sketch and sketchability. In: Proceedings of the 9th IEEE International Conference on Computer Vision; 2003 Oct 13–16; Nice, France; 2003.

[192] Guo C, Zhu SC, Wu YN. Primal sketch: integrating structure and texture. Comput Vis Image Underst 2007;106(1):5–19.

[193] Nitzberg M, Mumford DB. The 2.1-D sketch. In: Proceedings of the 3rd International Conference on Computer Vision; 1990 Dec 4–7; Osaka, Japan; 1990.

[194] Wang JYA, Adelson EH. Layered representation for motion analysis. In: Proceedings of the 1993 IEEE Conference on Computer Vision and Pattern Recognition; 1993 Jun 15–17; New York, NY, USA; 1993.

[195] Wang JYA, Adelson EH. Representing moving images with layers. IEEE Trans Image Process 1994;3(5):625–38.

[196] Marr D, Nishihara HK. Representation and recognition of the spatial organization of three-dimensional shapes. Proc R Soc Lond B Biol Sci 1978;200(1140):269–94.

[197] Binford TO. Visual perception by computer. In: Proceedings of the 1971 IEEE Conference of Systems and Control; 1971 Dec 15–17; Miami Beach, FL, USA; 1971.

[198] Brooks RA. Symbolic reasoning among 3-D models and 2-D images. Artif Intell 1981;17(1–3):285–348.

[199] Kanade T. Recovery of the three-dimensional shape of an object from a single view. Artif Intell 1981;17(1–3):409–60.

[200] Broadbent D. A question of levels: comment on McClelland and Rumelhart. J Exp Psychol Gen 1985;114(2):189–92.

[201] Lowe D. Perceptual organization and visual recognition. Boston: Springer Science & Business Media; 1985.

[202] Pentland AP. Perceptual organization and the representation of natural form. In: Fischler MA, Firschein O, editors. Readings in computer vision. Amsterdam: Elsevier; 1987. p. 680–99.

[203] Wertheimer M. [Experimental studies on the seeing of motion]. Z Psychol Z Angew Psychol 1912;61(3):161–265. German.

[204] Wagemans J, Elder JH, Kubovy M, Palmer SE, Peterson MA, Singh M, et al. A century of Gestalt psychology in visual perception: I. Perceptual grouping and figure–ground organization. Psychol Bull 2012;138(6):1172–217.

[205] Wagemans J, Feldman J, Gepshtein S, Kimchi R, Pomerantz JR, van der Helm PA, et al. A century of Gestalt psychology in visual perception: II. Conceptual and theoretical foundations. Psychol Bull 2012;138(6):1218–52.

[206] Köhler W. [The physical Gestalten at rest and in steady state]. Braunschweig: Vieweg und Sohn; 1920. German.

[207] Köhler W. Physical Gestalten. In: Ellis WD, editor. A source book of Gestalt psychology. London: Routledge & Kegan Paul; 1938. p. 17–54.

[208] Wertheimer M. [Investigations in gestalt theory: II. laws of organization in perceptual forms]. Psychol Forsch 1923;4(1):301–50. German.

[209] Wertheimer M. Laws of organization in perceptual forms. In: Ellis WD, editor. A source book of Gestalt psychology. London: Routledge & Kegan Paul; 1938. p. 71–94.

[210] Koffka K. Principles of Gestalt psychology. London: Routledge; 1935.

[211] Waltz D. Understanding line drawings of scenes with shadows. In: Winston PH, Horn B, editors. The psychology of computer vision. New York: McGraw-Hill Companies; 1975.

[212] Barrow HG, Tenenbaum JM. Interpreting line drawings as three-dimensional surfaces. Artif Intell 1981;17(1–3):75–116.

[213] Lowe DG. Three-dimensional object recognition from single two-dimensional images. Artif Intell 1987;31(3):355–95.

[214] Lowe DG. Distinctive image features from scale-invariant keypoints. Int J Comput Vis 2004;60(2):91–110.

[215] Solso RL, MacLin MK, MacLin OH. Cognitive psychology. 7th ed. New York: Pearson Education; 2005.

[216] Dayan P, Hinton GE, Neal RM, Zemel RS. The Helmholtz machine. Neural Comput 1995;7(5):889–904.

[217] Roberts LG. Machine perception of three-dimensional solids [dissertation]. Cambridge: Massachusetts Institute of Technology; 1963.

[218] Biederman I, Mezzanotte RJ, Rabinowitz JC. Scene perception: detecting and judging objects undergoing relational violations. Cognit Psychol 1982;14(2):143–77.

[219] Blum M, Griffith A, Neumann B. A stability test for configurations of blocks. Technical report. Cambridge: Massachusetts Institute of Technology; 1970.

[220] Brand M, Cooper P, Birnbaum L. Seeing physics, or: physics is for prediction. In: Proceedings of the Workshop on Physics-based Modeling in Computer Vision; 1995 Jun 18–19; Cambridge, MA, USA; 1995. p. 144–50.

[221] Gupta A, Efros AA, Hebert M. Blocks world revisited: image understanding using qualitative geometry and mechanics. In: Proceedings of the 2010 European Conference on Computer Vision; 2010 Sep 5–11; Heraklion, Greece; 2010. p. 482–96.

[222] Hedau V, Hoiem D, Forsyth D. Recovering the spatial layout of cluttered rooms. In: Proceedings of the 2009 International Conference on Computer Vision; 2009 Sep 29–Oct 2; Kyoto, Japan; 2009. p. 1849–56.

[223] Lee DC, Hebert M, Kanade T. Geometric reasoning for single image structure recovery. In: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition; 2009 Jun 20–25; Miami, FL, USA; 2009. p. 2136–43.

[224] Hedau V, Hoiem D, Forsyth D. Recovering free space of indoor scenes from a single image. In: Proceedings of the 2012 Conference on Computer Vision and Pattern Recognition; 2012 Jun 16–21; Providence, RI, USA; 2012. p. 2807–14.

[225] Silberman N, Hoiem D, Kohli P, Fergus R. Indoor segmentation and support inference from RGBD images. In: Proceedings of the 2012 European Conference on Computer Vision; 2012 Oct 7–13; Florence, Italy; 2012. p. 746–60.

[226] Schwing AG, Hazan T, Pollefeys M, Urtasun R. Efficient structured prediction for 3D indoor scene understanding. In: Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition; 2012 Jun 16–21; Providence, RI, USA; 2012. p. 2815–22.

[227] Guo R, Hoiem D. Support surface prediction in indoor scenes. In: Proceedings of the 2013 IEEE International Conference on Computer Vision; 2013 Dec 1–8; Sydney, NSW, Australia; 2013. p. 2144–51.

[228] Shao T, Monszpart A, Zheng Y, Koo B, Xu W, Zhou K, et al. Imagining the unseen: stability-based cuboid arrangements for scene understanding. ACM Trans Graph 2014;33(6):1–11.

[229] Du Y, Liu Z, Basevi H, Leonardis A, Freeman B, Tenenbaum J, et al. Learning to exploit stability for 3D scene parsing. In: Proceedings of the 2018 Neural Information Processing Systems; 2018 Dec 3–8; Montreal, QC, Canada; 2018.

[230] Wu J, Yildirim I, Lim JJ, Freeman B, Tenenbaum J. Galileo: perceiving physical object properties by integrating a physics engine with deep learning. In: Proceedings of the 2015 Neural Information Processing Systems; 2015 Dec 7–12; Montreal, QC, Canada; 2015.

[231] Wu J, Lim JJ, Zhang H, Tenenbaum JB, Freeman WT. Physics 101: learning physical object properties from unlabeled videos. In: Proceedings of the 2016 British Machine Vision Conference; 2016 Sep 19–22; York, UK; 2016.

[232] Zhu Y, Zhao Y, Zhu SC. Understanding tools: task-oriented object modeling, learning and recognition. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition; 2015 Jun 7–12; Boston, MA, USA; 2015. p. 2855–64.

[233] Zhu Y, Jiang C, Zhao Y, Terzopoulos D, Zhu SC. Inferring forces and learning human utilities from videos. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition; 2016 Jun 26–Jul 1; Las Vegas, NV, USA; 2016.

[234] Brubaker MA, Fleet DJ. The kneed walker for human pose tracking. In: Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition; 2008 Jun 23–28; Anchorage, AK, USA; 2008. p. 1–8.

[235] Brubaker MA, Sigal L, Fleet DJ. Estimating contact dynamics. In: Proceedings of the 2009 IEEE International Conference on Computer Vision; 2009 Sep 29–Oct 2; Kyoto, Japan; 2009. p. 2389–96.

[236] Brubaker MA, Fleet DJ, Hertzmann A. Physics-based person tracking using the anthropomorphic walker. Int J Comput Vis 2010;87(1–2):140–55.

[237] Pham TH, Kheddar A, Qammaz A, Argyros AA. Towards force sensing from vision: observing hand-object interactions to infer manipulation forces. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition; 2015 Jun 7–12; Boston, MA, USA. p. 2810–9.

[238] Wang Y, Min J, Zhang J, Liu Y, Xu F, Dai Q, et al. Video-based hand manipulation capture through composite motion control. ACM Trans Graph 2013;32(4):43.

[239] Zhao W, Zhang J, Min J, Chai J. Robust realtime physics-based motion control for human grasping. ACM Trans Graph 2013;32(6):207.

[240] Gibson JJ. The perception of the visual world. Boston: Houghton Mifflin; 1950.

[241] Gibson JJ. The senses considered as perceptual systems. Boston: Houghton Mifflin; 1966.

[242] Nelson K. Concept, word, and sentence: interrelations in acquisition and development. Psychol Rev 1974;81(4):267–85.

[243] Gibson JJ. The theory of affordances. In: Gieseking JJ, Mangold W, Katz C, Low S, Saegert S, editors. The people, place, and space reader. New York: Routledge; 2014.

[244] Hassanin M, Khan S, Tahtali M. Visual affordance and function understanding: a survey. 2018. arXiv:1807.06775.

[245] Min H, Yi C, Luo R, Zhu J, Bi S. Affordance research in developmental robotics: a survey. IEEE Trans Cogn Dev Syst 2016;8(4):237–55.

[246] Bohg J, Morales A, Asfour T, Kragic D. Data-driven grasp synthesis—a survey. IEEE Trans Robot 2014;30(2):289–309.

[247] Yamanobe N, Wan W, Ramirez-Alpizar IG, Petit D, Tsuji T, Akizuki S, et al. A brief review of affordance in robotic manipulation research. Adv Robot 2017;31(19–20):1086–101.

[248] Köhler W. The mentality of apes. New York: Routledge; 1925.

[249] Thorpe WH. Learning and instinct in animals. Cambridge: Harvard University Press; 1956.

[250] Oakley KP. Man the tool-maker. Chicago: University of Chicago Press; 1968.

[251] Goodall J. The chimpanzees of Gombe: patterns of behavior. Cambridge: Belknap Press of the Harvard University Press; 1986.

[252] Whiten A, Goodall J, McGrew WC, Nishida T, Reynolds V, Sugiyama Y, et al. Cultures in chimpanzees. Nature 1999;399(6737):682–5.

[253] Byrne R, Whiten A, editors. Machiavellian intelligence: social expertise and the evolution of intellect in monkeys, apes, and humans. New York: Oxford University Press; 1988.

[254] Santos LR, Rosati A, Sproul C, Spaulding B, Hauser MD. Means-means-end tool choice in cotton-top tamarins (Saguinus oedipus): finding the limits on primates’ knowledge of tools. Anim Cogn 2005;8:236–46.

[255] Hunt GR. Manufacture and use of hook-tools by New Caledonian crows. Nature 1996;379(6562):249–51.

[256] Weir AA, Chappell J, Kacelnik A. Shaping of hooks in New Caledonian crows. Science 2002;297(5583):981.

[257] McCoy DE, Schiestl M, Neilands P, Hassall R, Gray RD, Taylor AH. New Caledonian crows behave optimistically after using tools. Curr Biol 2019;29(16):2737–42.

[258] Beck BB. Animal tool behavior: the use and manufacture of tools by animals. New York: Garland STPM Press; 1980.

[259] Bird CD, Emery NJ. Insightful problem solving and creative tool modification by captive nontool-using rooks. Proc Natl Acad Sci USA 2009;106(25):10370–5.

[260] Freeman P, Newell A. A model for functional reasoning in design. In: Proceedings of the 1971 International Joint Conference on Artificial Intelligence; 1971 Sep 1–3; London, England; 1971.

[261] Winston PH. Learning structural descriptions from examples. Technical report. Cambridge: Massachusetts Institute of Technology; 1970.

[262] Winston PH, Binford TO, Katz B, Lowry M. Learning physical descriptions from functional definitions, examples, and precedents. In: Proceedings of the 1983 AAAI Conference on Artificial Intelligence; 1983 Aug 22–26; Washington, DC, USA; 1983.

[263] Brady M, Agre PE. The mechanic’s mate. In: Proceedings of the 6th European Conference on Artificial Intelligence; 1984 Sep 5–7; Pisa, Italy; 1984. p. 79–94.

[264] Connell JH, Brady M. Generating and generalizing models of visual objects. Artif Intell 1987;31(2):159–83.

[265] Ho SB. Representing and using functional definitions for visual recognition [dissertation]. Madison: The University of Wisconsin-Madison; 1987.

[266] DiManzo M, Trucco E, Giunchiglia F, Ricci F. FUR: understanding functional reasoning. Int J Intell Syst 1989;4(4):431–57.

[267] Minsky M. The society of mind. New York: Simon and Schuster Paperbacks; 1988.

[268] Stark L, Bowyer K. Achieving generalized object recognition through reasoning about association of function to structure. IEEE Trans Pattern Anal Mach Intell 1991;13(10):1097–104.

[269] Liu Z, Freeman WT, Tenenbaum JB, Wu J. Physical primitive decomposition. In: Proceedings of the 2018 European Conference on Computer Vision; 2018 Sep 8–14; Munich, Germany; 2018.

[270] Baber C. Cognition and tool use: forms of engagement in human and animal use of tools. London: CRC Press; 2003.

[271] Inhelder B, Piaget J. The growth of logical thinking from childhood to adolescence: an essay on the construction of formal operational structures. London: Psychology Press; 1958.

[272] Hespos SJ, Baillargeon R. Reasoning about containment events in very young infants. Cognition 2001;78(3):207–45.

[273] Wang SH, Baillargeon R, Paterson S. Detecting continuity violations in infancy: a new account and new evidence from covering and tube events. Cognition 2005;95(2):129–73.

[274] Hespos SJ, Spelke ES. Precursors to spatial language: the case of containment. In: Aurnague M, Hickmann M, editors. The categorization of spatial entities in language and cognition. Amsterdam: John Benjamins Publishing; 2007. p. 233–45.

[275] Strickland B, Scholl BJ. Visual perception involves event-type representations: the case of containment versus occlusion. J Exp Psychol Gen 2015;144(3):570–80.

[276] Casasola M, Cohen LB. Infant categorization of containment, support and tight-fit spatial relationships. Dev Sci 2002;5(2):247–64.

[277] Davis E, Marcus G, Frazier-Logue N. Commonsense reasoning about containers using radically incomplete information. Artif Intell 2017;248:46–84.

[278] Davis E. How does a box work? A study in the qualitative dynamics of solid objects. Artif Intell 2011;175(1):299–345.

[279] Davis E. Pouring liquids: a study in commonsense physical reasoning. Artif Intell 2008;172(12–13):1540–78.

[280] Cohn AG. Qualitative spatial representation and reasoning techniques. In: Proceedings of the 1997 Annual Conference on Artificial Intelligence; 1997 Sep 9–12; Freiburg, Germany; 1997. p. 1–30.

[281] Cohn AG, Hazarika SM. Qualitative spatial representation and reasoning: an overview. Fundam Inform 2001;46(1–2):1–29.

[282] Liang W, Zhao Y, Zhu Y, Zhu SC. Evaluating human cognition of containing relations with physical simulation. In: Proceedings of the 2015 Annual Meeting of the Cognitive Science Society; 2015 Jul 23–25; Pasadena, CA, USA; 2015.

[283] Yu LF, Duncan N, Yeung SK. Fill and transfer: a simple physics-based approach for containability reasoning. In: Proceedings of the 2015 International Conference on Computer Vision; 2015 Dec 11–18; Santiago, Chile; 2015.

[284] Mottaghi R, Schenck C, Fox D, Farhadi A. See the glass half full: reasoning about liquid containers, their volume and content. In: Proceedings of the 2017 International Conference on Computer Vision; 2017 Oct 22–29; Venice, Italy; 2017.

[285] Liang W, Zhao Y, Zhu Y, Zhu SC. What is where: inferring containment relations from videos. In: Proceedings of the 2016 International Joint Conference on Artificial Intelligence; 2016 Jul 9–15; New York, NY, USA; 2016.

[286] Liang W, Zhu Y, Zhu SC. Tracking occluded objects and recovering incomplete trajectories by reasoning about containment relations and human actions. In: Proceedings of the 2018 AAAI Conference on Artificial Intelligence; 2018 Feb 2–7; New Orleans, LA, USA; 2018.

[287] Jiang Y, Lim M, Saxena A. Learning object arrangements in 3D scenes using human context. In: Proceedings of the 29th International Conference on Machine Learning; 2012 Jun 26–Jul 1; Edinburgh, Scotland. p. 907–14.

[288] Jiang C, Qi S, Zhu Y, Huang S, Lin J, Yu LF, et al. Configurable 3D scene synthesis and 2D image rendering with per-pixel ground truth using stochastic grammars. Int J Comput Vis 2018;126(9):920–41.

[289] Dautenhahn K, Nehaniv CL, editors. Imitation in animals and artifacts. Cambridge: MIT Press; 2002.

[290] Argall BD, Chernova S, Veloso M, Browning B. A survey of robot learning from demonstration. Robot Auton Syst 2009;57(5):469–83.

[291] Osa T, Pajarinen J, Neumann G, Bagnell JA, Abbeel P, Peters J. An algorithmic perspective on imitation learning. Found Trends Rob 2018;7(1–2):1–179.

[292] Gu Y, Sheng W, Liu M, Ou Y. Fine manipulative action recognition through sensor fusion. In: Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems; 2015 Sep 28–Oct 2; Hamburg, Germany; 2015.

[293] Hammond FL, Mengüç Y, Wood RJ. Toward a modular soft sensor-embedded glove for human hand motion and tactile pressure measurement. In: Proceedings of the 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems; 2014 Sep 14–18; Chicago, IL, USA. p. 4000–7.

[294] Liu H, Xie X, Millar M, Edmonds M, Gao F, Zhu Y, et al. A glove-based system for studying hand-object manipulation via joint pose and force sensing. In: Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems; 2017 Sep 24–28; Vancouver, BC, USA. p. 6617–24.

[295] Edmonds M, Gao F, Xie X, Liu H, Qi S, Zhu Y, et al. Feeling the force: integrating force and pose for fluent discovery through imitation learning to open medicine bottles. In: Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems; 2017 Sep 24–28; Vancouver, BC, USA. p. 3530–7.

[296] Liu H, Zhang Y, Si W, Xie X, Zhu Y, Zhu SC. Interactive robot knowledge patching using augmented reality. In: Proceedings of the 2018 IEEE International Conference on Robotics and Automation; 2018 May 21–25; Brisbane, QLD, Australia. p. 1947–54.

[297] Edmonds M, Gao F, Liu H, Xie X, Qi S, Rothrock B, et al. A tale of two explanations: enhancing human trust by explaining robot behavior. Sci Robot 2019;4(37):eaay4663.

[298] Liu H, Zhang C, Zhu Y, Jiang C, Zhu SC. Mirroring without overimitation: learning functionally equivalent manipulation actions. In: Proceedings of the 2019 AAAI Conference on Artificial Intelligence; 2019 Jan 27–Feb 1; Honolulu, HI, USA; 2019.

[299] Dennett DC. The intentional stance. Cambridge: MIT Press; 1989.

[300] Heider F. The psychology of interpersonal relations. London: Psychology Press; 2013.

[301] Gergely G, Nádasdy Z, Csibra G, Bíró S. Taking the intentional stance at 12 months of age. Cognition 1995;56(2):165–93.

[302] Premack D, Woodruff G. Does the chimpanzee have a theory of mind? Behav Brain Sci 1978;1(4):515–26.

[303] Baldwin DA, Baird JA. Discerning intentions in dynamic human action. Trends Cogn Sci 2001;5(4):171–8.

[304] Woodward AL. Infants selectively encode the goal object of an actor’s reach. Cognition 1998;69(1):1–34.

[305] Meltzoff AN, Brooks R. “Like me” as a building block for understanding other minds: bodily acts, attention, and intention. In: Malle BF, Moses LJ, Baldwin DA, editors. Intentions and intentionality: foundations of social cognition. Cambridge: MIT Press; 2001. p. 171–92.

[306] Baldwin DA, Baird JA, Saylor MM, Clark MA. Infants parse dynamic action. Child Dev 2001;72(3):708–17.

[307] Tomasello M, Carpenter M, Call J, Behne T, Moll H. Understanding and sharing intentions: the origins of cultural cognition. Behav Brain Sci 2005;28(5):675–91.

[308] Biro S, Hommel B. Becoming an intentional agent: introduction to the special issue. Acta Psychol 2007;124(1):1–7.

[309] Gergely G, Bekkering H, Király I. Rational imitation in preverbal infants. Nature 2002;415(6873):755.

[310] Woodward AL, Sommerville JA, Gerson S, Henderson AME, Buresh J. The emergence of intention attribution in infancy. Psychol Learn Motiv 2009;51:187–222.

[311] Zelazo PD, Astington JW, Olson DR, editors. Developing theories of intention: social understanding and self-control. Mahwah: Lawrence Erlbaum Associates Publishers; 1999.

[312] Bloom P. Intention, history, and artifact concepts. Cognition 1996;60(1):1–29.

[313] Heider F, Simmel M. An experimental study of apparent behavior. Am J Psychol 1944;57(2):243–59.

[314] Berry DS, Misovich SJ. Methodological approaches to the study of social event perception. Pers Soc Psychol Bull 1994;20(2):139–52.

[315] Bassili JN. Temporal and spatial contingencies in the perception of social events. J Pers Soc Psychol 1976;33(6):680–5.

[316] Dittrich WH, Lea SE. Visual perception of intentional motion. Perception 1994;23(3):253–68.

[317] Dennett DC. Précis of the intentional stance. Behav Brain Sci 1988;11(3):495–505.

[318] Liu S, Brooks NB, Spelke ES. Origins of the concepts cause, cost, and goal in prereaching infants. Proc Natl Acad Sci USA 2019;116(36):17747–52.

[319] Gao T, Newman GE, Scholl BJ. The psychophysics of chasing: a case study in the perception of animacy. Cognit Psychol 2009;59(2):154–79.

[320] Liu S, Spelke ES. Six-month-old infants expect agents to minimize the cost of their actions. Cognition 2017;160:35–42.

[321] Gergely G, Csibra G. Teleological reasoning in infancy: the naïve theory of rational action. Trends Cogn Sci 2003;7(7):287–92.

[322] Baker CL, Saxe R, Tenenbaum JB. Action understanding as inverse planning. Cognition 2009;113(3):329–49.

[323] Pereira LM, Anh HT. Intention recognition via causal Bayes networks plus plan generation. In: Proceedings of the 14th Portuguese Conference on Artificial Intelligence; 2009 Oct 12–15; Aveiro, Portugal; 2009. p. 138–49.

[324] Narang S, Best A, Manocha D. Inferring user intent using Bayesian theory of mind in shared avatar-agent virtual environments. IEEE Trans Vis Comput Graph 2019;25(5):2113–22.

[325] Nakahashi R, Baker CL, Tenenbaum JB. Modeling human understanding of complex intentional action with a Bayesian nonparametric subgoal model. In: Proceedings of the 2016 AAAI Conference on Artificial Intelligence; 2016 Feb 12–17; Phoenix, AZ, USA; 2016.

[326] Holtzen S, Zhao Y, Gao T, Tenenbaum JB, Zhu SC. Inferring human intent from video by sampling hierarchical plans. In: Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems; 2016 Oct 9–14; Daejeon, Korea. p. 1489–96.

[327] Kong Y, Fu Y. Human action recognition and prediction: a survey. 2018. arXiv:1806.11230.

[328] Blakemore SJ, Decety J. From the perception of action to the understanding of intention. Nat Rev Neurosci 2001;2(8):561–7.

[329] Elsner B, Hommel B. Effect anticipation and action control. J Exp Psychol Hum Percept Perform 2001;27(1):229–40.

[330] Elsner B. Infants’ imitation of goal-directed actions: the role of movements and action effects. Acta Psychol 2007;124(1):44–59.

[331] Rizzolatti G, Craighero L. The mirror–neuron system. Annu Rev Neurosci 2004;27(1):169–92.

[332] Kaplan JT, Iacoboni M. Getting a grip on other minds: mirror neurons, intention understanding, and cognitive empathy. Soc Neurosci 2006;1(3–4):175–83.

[333] Reid VM, Csibra G, Belsky J, Johnson MH. Neural correlates of the perception of goal-directed action in infants. Acta Psychol 2007;124(1):129–38.

[334] Csibra G, Gergely G. The teleological origins of mentalistic action explanations: a developmental hypothesis. Dev Sci 1998;1(2):255–9.

[335] Gergely G. The development of understanding self and agency. In: Goswami U, editor. Blackwell handbook of childhood cognitive development. Oxford: Blackwell Publishers Ltd.; 2002. p. 26–46.

[336] Kleinke CL. Gaze and eye contact: a research review. Psychol Bull 1986;100(1):78–100.

[337] Emery NJ. The eyes have it: the neuroethology, function and evolution of social gaze. Neurosci Biobehav Rev 2000;24(6):581–604.

[338] Burgoon JK, Guerrero LK, Floyd K. Nonverbal communication. New York: Routledge; 2016.

[339] Wei P, Liu Y, Shu T, Zheng N, Zhu SC. Where and why are they looking? Jointly inferring human attention and intentions in complex tasks. In: Proceedings of the 2018 Conference on Computer Vision and Pattern Recognition; 2018 Jun 18–22; Salt Lake City, UT, USA; 2018. p. 6801–9.

[340] Melis AP, Tomasello M. Chimpanzees (Pan troglodytes) coordinate by communicating in a collaborative problem-solving task. Proc R Soc B 2019;286(1901):20190408.

[341] Fan L, Chen Y, Wei P, Wang W, Zhu SC. Inferring shared attention in social scene videos. In: Proceedings of the 2018 Conference on Computer Vision and Pattern Recognition; 2018 Jun 18–22; Salt Lake City, UT, USA; 2018. p. 6460–8.

[342] Fan L, Wang W, Huang S, Tang X, Zhu SC. Understanding human gaze communication by spatio-temporal graph reasoning. In: Proceedings of the 2019 International Conference on Computer Vision; 2019 Oct 27–Nov 2; Seoul, Korea. p. 5724–33.

[343] Trick S, Koert D, Peters J, Rothkopf C. Multimodal uncertainty reduction for intention recognition in human–robot interaction. 2019. arXiv:1907.02426.

[344] Shu T, Ryoo MS, Zhu SC. Learning social affordance for human–robot interaction. In: Proceedings of the 25th International Joint Conference on Artificial Intelligence; 2016 Jul 9–15; New York, NY, USA; 2016. p. 3454–61.

[345] Shu T, Gao X, Ryoo MS, Zhu SC. Learning social affordance grammar from videos: transferring human interactions to human–robot interactions. In: Proceedings of the 2017 IEEE International Conference on Robotics and Automation; 2017 May 29–Jun 3; Singapore, Singapore; 2017.

[346] Russell SJ, Norvig P. Artificial intelligence: a modern approach. 3rd ed. New York: Pearson Education Limited; 2016.

[347] Hutcheson F. An inquiry into the original of our ideas of beauty and virtue: in two treatises. 2nd ed. London: Darby J, Bettesworth A, Fayram F, Pemberton J, Rivington C, Hooke J, Clay F, Batley J, Symon E; 1726.

[348] Mill JS. Utilitarianism. 12th ed. New York: Longmans, Green and Company; 1895.

[349] Shukla N, He Y, Chen F, Zhu SC. Learning human utility from video demonstrations for deductive planning in robotics. In: Proceedings of the 1st Annual Conference on Robot Learning; 2017 Nov 13–15; Mountain View, CA, USA. p. 448–57.

[350] Grice HP, Cole P, Morgan J. Logic and conversation. In: Ezcurdia M, Stainton RJ, editors. The semantics–pragmatics boundary in philosophy. Toronto: Broadview Press; 2013.

[351] Goodman ND, Frank MC. Pragmatic language interpretation as probabilistic inference. Trends Cogn Sci 2016;20(11):818–29.

[352] Lewis D. Convention: a philosophical study. Oxford: Blackwell Publishers; 2002.

[353] Sperber D, Wilson D. Relevance: communication and cognition. Cambridge: Harvard University Press; 1986.

[354] Wittgenstein L. Philosophical investigations. New York: Macmillan; 1953.

[355] Clark HH. Using language. Cambridge: Cambridge University Press; 1996.

[356] Qing C, Franke M. Variations on a Bayesian theme: comparing Bayesian models of referential reasoning. In: Zeevat H, Schmitz HC, editors. Bayesian natural language semantics and pragmatics. Heidelberg: Springer; 2015. p. 201–20.

[357] Goodman ND, Stuhlmüller A. Knowledge and implicature: modeling language understanding as social cognition. Top Cogn Sci 2013;5(1):173–84.

[358] Dale R, Reiter E. Computational interpretations of the Gricean maxims in the generation of referring expressions. Cogn Sci 1995;19(2):233–63.

[359] Benz A, Jäger G, van Rooij R. An introduction to game theory for linguists. In: Benz A, Jäger G, van Rooij R, editors. Game theory and pragmatics. London: Palgrave Macmillan; 2006. p. 1–82.

[360] Jäger G. Applications of game theory in linguistics. Lang Linguist Compass 2008;2(3):406–21.

[361] Frank MC, Goodman ND. Predicting pragmatic reasoning in language games. Science 2012;336(6084):998.

[362] Kleiman-Weiner M, Gerstenberg T, Levine S, Tenenbaum JB. Inference of intention and permissibility in moral decision making. In: Proceedings of the 2015 Annual Meeting of the Cognitive Science Society; 2015 Jul 23–25; Pasadena, CA, USA; 2015.

[363] Kleiman-Weiner M, Ho MK, Austerweil JL, Littman ML, Tenenbaum JB. Coordinate to cooperate or compete: abstract goals and joint intentions in social interaction. In: Proceedings of the 2016 Annual Meeting of the Cognitive Science Society; 2016 Aug 10–13; Philadelphia, PA, USA; 2016.

[364] Shum M, Kleiman-Weiner M, Littman ML, Tenenbaum JB. Theory of minds: understanding behavior in groups through inverse planning. In: Proceedings of the 2019 AAAI Conference on Artificial Intelligence; 2019 Jan 27–Feb 1; Honolulu, HI, USA; 2019.

[365] Kleiman-Weiner M, Shaw A, Tenenbaum JB. Constructing social preferences from anticipated judgments: when impartial inequity is fair and why? In: Proceedings of the 2017 Annual Meeting of the Cognitive Science Society; 2017 Jul 26–29; London, UK; 2017.

[366] Kleiman-Weiner M, Saxe R, Tenenbaum JB. Learning a commonsense moral theory. Cognition 2017;167:107–23. link1

[367] Kinney M, Tsatsoulis C. Learning communication strategies in multiagent systems. Appl Intell 1998;9(1):71–91. link1

[368] Lowe R, Wu Y, Tamar A, Harb J, Abbeel OP, Mordatch I. Multi-agent actorcritic for mixed cooperative–competitive environments. In: Proceedings of the 2017 Neural Information Processing Systems; 2017 Dec 3–9; Long Beach, CA, USA; 2017.

[369] Foerster J, Assael IA, de Freitas N, Whiteson S. Learning to communicate with deep multi-agent reinforcement learning. Proceedings of the 2016 Neural Information Processing Systems; 2016 Dec 5–10; Barcelona, Spain, 2016. link1

[370] Foerster J, Nardelli N, Farquhar G, Afouras T, Torr PHS, Kohli P, et al. Stabilising experience replay for deep multi-agent reinforcement learning. In: Proceedings of the 34th International Conference on Machine Learning; 2017 Aug 6–11; Sydney, NSW, Australia. p. 1146–55. link1

[371] Holyoak KJ. Analogy and relational reasoning. In: Holyoak KJ, Morrison RG, editors. The Oxford handbook of thinking and reasoning. New York: Oxford University Press; 2012. p. 234–59. link1

[372] Raven JC. Raven progressive matrices. Torrance: Western Psychological Services; 1938. link1

[373] Zhang C, Gao F, Jia B, Zhu Y, Zhu SC. RAVEN: a dataset for relational and analogical visual reasoning. In: Proceedings of the 2019 Conference on Computer Vision and Pattern Recognition; 2019 Jun 16–20; Long Beach, CA, USA. p. 5317–27. link1

[374] Legg S, Hutter M. Universal intelligence: a definition of machine intelligence. Minds Mach 2007;17(4):391–444. link1

[375] Mo K, Zhu S, Chang AX, Yi L, Tripathi S, Guibas LJ, et al. PartNet: a large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding. In: Proceedings of the 2019 Conference on Computer Vision and Pattern Recognition; 2019 Jun 16–20; Long Beach, CA, USA. p. 909–18. link1

[376] Chang AX, Funkhouser T, Guibas L, Hanrahan P, Huang Q, Li Z, et al. ShapeNet: an information-rich 3D model repository. 2015. arXiv:1512.03012.

[377] Feng T, Yu LF, Yeung SK, Yin K, Zhou K. Crowd-driven mid-scale layout design. ACM Trans Graph 2016;35(4):132. link1

[378] Savva M, Chang AX, Dosovitskiy A, Funkhouser T, Koltun V. MINOS: multimodal indoor simulator for navigation in complex environments. 2017. arXiv:1712.03931.

[379] Brodeur S, Perez E, Anand A, Golemo F, Celotti L, Strub F, et al. HoME: a household multimodal environment. 2017. arXiv:1711.11017.

[380] Xia F, Zamir AR, He Z, Sax A, Malik J, Savarese S. Gibson Env: real-world perception for embodied agents. In: Proceedings of the 2018 Conference on Computer Vision and Pattern Recognition; 2018 Jun 18–22; Salt Lake City, UT, USA; 2018. p. 9068–79.

[381] Wu Y, Wu YX, Gkioxari G, Tian Y. Building generalizable agents with a realistic and rich 3D environment. 2018. arXiv:1801.02209.

[382] Kolve E, Mottaghi R, Han W, VanderBilt E, Weihs L, Herrasti A, et al. AI2- THOR: an interactive 3D environment for visual AI. 2017. arXiv:1712.05474.

[383] Puig X, Ra K, Boben M, Li J, Wang T, Fidler S, et al. VirtualHome: simulating household activities via programs. In: Proceedings of the 2018 Conference on Computer Vision and Pattern Recognition; 2018 Jun 18–22; Salt Lake City, UT, USA; 2018. p. 8494–502.

[384] Xie X, Liu H, Zhang Z, Qiu Y, Gao F, Qi S, et al. VRGym: a virtual testbed for physical and interactive AI. In: Proceedings of the ACM TURC; 2019 May 17– 19; Chengdu, China; 2019.

[385] Gao X, Gong R, Shu T, Xie X, Wang S, Zhu SC. VRKitchen: an interactive 3D virtual environment for task-oriented learning. 2019. arXiv:1903.05757.

[386] Shah S, Dey D, Lovett C, Kapoor A. AirSim: high-fidelity visual and physical simulation for autonomous vehicles. In: Hutter M, Siegwart R, editors. Field and service robotics. Cham: Springer; 2018. p. 621–35. link1

[387] Gao M, Wang X, Wu K, Pradhana A, Sifakis E, Yuksel C, et al. GPU optimization of material point methods. ACM Trans Graph 2018;37(6):254. link1

[388] Terzopoulos D, Platt J, Barr A, Fleischer K. Elastically deformable models. In: Stone MC, editor. Proceedings of the 14th Annual Conference on Computer Graphics and Interactive Techniques; 1987 July 27–31; Anaheim, CA, USA. New York: Association for Computing Machinery; 1987. p. 205–14.

[389] Terzopoulos D, Fleischer K. Modeling inelastic deformation: viscolelasticity, plasticity, fracture. In: Beach RJ, editor. Proceedings of the 15th Annual Conference on Computer Graphics and Interactive Techniques; 1988 Aug 1– 5; Atlanta, GA, USA; New York: Association for Computing Machinery; 1988. p. 269–78.

[390] Foster N, Metaxas D. Realistic animation of liquids. Graph Models Image Proc 1996;58(5):471–83. link1

[391] Stam J. Stable fluids. ACM Trans Graph 1999;99:121–8. link1

[392] Bridson R. Fluid simulation for computer graphics. London: CRC Press; 2015. link1

[393] Bonet J, Wood RD. Nonlinear continuum mechanics for finite element analysis. New York: Cambridge University Press; 1997. link1

[394] Blemker S, Teran J, Sifakis E, Fedkiw R, Delp S. Fast 3D muscle simulations using a new quasistatic invertible finite-element algorithm. In: Proceedings of the 2005 International Symposium on Computer Simulation in Biomechanics; 2005 Jul 28–30; Cleveland, OH, USA; 2005.

[395] Hegemann J, Jiang C, Schroeder C, Teran JM. A level set method for ductile fracture. In: Proceedings of the 12th ACM SIGGRAPH/Eurographics Symposium on Computer Animation; 2013 Jul 19–21; Anaheim, CA, USA; 2013. p. 193–201.

[396] Gast TF, Schroeder C, Stomakhin A, Jiang C, Teran JM. Optimization integrator for large time steps. IEEE Trans Vis Comput Graph 2015;21(10):1103–15. link1

[397] Li M, Gao M, Langlois T, Jiang C, Kaufman DM. Decomposed optimization time integrator for large-step elastodynamics. ACM Trans Graph 2019;38(4):70. link1

[398] Wang Y, Jiang C, Schroeder C, Teran J. An adaptive virtual node algorithm with robust mesh cutting. In: Proceedings of the 2014 ACM SIGGRAPH/ Eurographics Symposium on Computer Animation; 2014 Jul 21–23; Copenhagen, Denmark; 2014. p. 77–85.

[399] Monaghan JJ. Smoothed particle hydrodynamics. Annu Rev Astron Astrophys 1992;30(1):543–74. link1

[400] Liu WK, Jun S, Zhang YF. Reproducing kernel particle methods. Int J Numer Methods Fluids 1995;20(8–9):1081–106. link1

[401] Li S, Liu WK. Meshfree and particle methods and their applications. Appl Mech Rev 2002;55(1):1–34. link1

[402] Donea J, Giuliani S, Halleux JP. An arbitrary Lagrangian-Eulerian finite element method for transient dynamic fluid–structure interactions. Comput Methods Appl Mech Eng 1982;33(1–3):689–723. link1

[403] Brackbill JU, Ruppel HM. FLIP: a method for adaptively zoned, particle-in-cell calculations of fluid flows in two dimensions. J Comput Phys 1986;65 (2):314–43. link1

[404] Jiang C, Schroeder C, Selle A, Teran J, Stomakhin A. The affine particle-in-cell method. ACM Trans Graph 2015;34(4):51. link1

[405] Sulsky D, Chen Z, Schreyer HL. A particle method for history-dependent materials. Comput Methods Appl Mech Eng 1994;118(1–2):179–96. link1

[406] Sulsky D, Zhou SJ, Schreyer HL. Application of a particle-in-cell method to solid mechanics. Comput Phys Commun 1995;87(1–2):236–52. link1

[407] Stomakhin A, Schroeder C, Chai L, Teran J, Selle A. A material point method for snow simulation. ACM Trans Graph 2013;32(4):102. link1

[408] Gaume J, Gast T, Teran J, van Herwijnen A, Jiang C. Dynamic anticrack propagation in snow. Nat Commun 2018;9(1):3047. link1

[409] Ram D, Gast T, Jiang C, Schroeder C, Stomakhin A, Teran J, et al. A material point method for viscoelastic fluids, foams and sponges. In: Proceedings of the 14th ACM SIGGRAPH/Eurographics Symposium on Computer Animation; 2015 Aug 7–9; Los Angeles, CA, USA; 2015. p. 157–63.

[410] Yue Y, Smith B, Batty C, Zheng C, Grinspun E. Continuum foam: a material point method for shear-dependent flows. ACM Trans Graph 2015;34 (5):160. link1

[411] Fang Y, Li M, Gao M, Jiang C. Silly rubber: an implicit material point method for simulating non-equilibrated viscoelastic and elastoplastic solids. ACM Trans Graph 2019;38(4):118. link1

[412] Klár G, Gast T, Pradhana A, Fu C, Schroeder C, Jiang C, et al. Drucker-Prager elastoplasticity for sand animation. ACM Trans Graph 2016;35(4):103. link1

[413] Daviet G, Bertails-Descoubes F. A semi-implicit material point method for the continuum simulation of granular materials. ACM Trans Graph 2016;35 (4):102. link1

[414] Hu Y, Fang Y, Ge Z, Qu Z, Zhu Y, Pradhana A, et al. A moving least squares material point method with displacement discontinuity and two-way rigid body coupling. ACM Trans Graph 2018;37(4):150. link1

[415] Wang S, Ding M, Gast TF, Zhu L, Gagniere S, Jiang C, et al. Simulation and visualization of ductile fracture with the material point method. ACM Trans Graph 2019;2(2):18. link1

[416] Wolper J, Fang Y, Li M, Lu J, Gao M, Jiang C. CD-MPM: continuum damage material point methods for dynamic fracture animation. ACM Trans Graph 2019;38(4):119. link1

[417] Jiang C, Gast T, Teran J. Anisotropic elastoplasticity for cloth, knit and hair frictional contact. ACM Trans Graph 2017;36(4):152. link1

[418] Han X, Gast TF, Guo Q, Wang S, Jiang C, Teran J. A hybrid material point method for frictional contact with diverse materials. ACM Trans Graph 2019;2(2):17. link1

[419] Fu C, Guo Q, Gast T, Jiang C, Teran J. A polynomial particle-in-cell method. ACM Trans Graph 2017;36(6):222. link1

[420] Stomakhin A, Schroeder C, Jiang C, Chai L, Teran J, Selle A. Augmented MPM for phase-change and varied materials. ACM Trans Graph 2014;33(4):138. link1

[421] Tampubolon AP, Gast T, Klár G, Fu C, Teran J, Jiang C, et al. Multi-species simulation of porous sand and water mixtures. ACM Trans Graph 2017;36 (4):105. link1

[422] Gao M, Pradhana A, Han X, Guo Q, Kot G, Sifakis E, et al. Animating fluid sediment mixture in particle-laden flows. ACM Trans Graph 2018;37(4):149. link1

[423] Nairn JA. Material point method calculations with explicit cracks. Comput Model Eng Sci 2003;4(6):649–64. link1

[424] Chen Z, Shen L, Mai YW, Shen YG. A bifurcation-based decohesion model for simulating the transition from localization to decohesion with the MPM. Z Angew Math Phys 2005;56(5):908–30. link1

[425] Schreyer HL, Sulsky DL, Zhou SJ. Modeling delamination as a strong discontinuity with the material point method. Comput Methods Appl Mech Eng 2002;191(23–24):2483–507. link1

[426] Sulsky D, Schreyer HL. Axisymmetric form of the material point method with applications to upsetting and Taylor impact problems. Comput Methods Appl Mech Eng 1996;139(1–4):409–29. link1

[427] Huang P, Zhang X, Ma S, Wang HK. Shared memory OpenMP parallelization of explicit MPM and its application to hypervelocity impact. Comput Model Eng Sci 2008;38(2):119–48. link1

[428] Hu W, Chen Z. Model-based simulation of the synergistic effects of blast and fragmentation on a concrete wall using the MPM. Int J Impact Eng 2006;32 (12):2066–96. link1

[429] York AR, Sulsky D, Schreyer HL. Fluid-membrane interaction based on the material point method. Int J Numer Methods Eng 2000;48(6):901–24. link1

[430] Bandara S, Soga K. Coupling of soil deformation and pore fluid flow using material point method. Comput Geotech 2015;63:199–214. link1

[431] Guilkey JE, Hoying JB, Weiss JA. Computational modeling of multicellular constructs with the material point method. J Biomech 2006;39(11): 2074–86. link1

[432] Huang P. Material point method for metal and soil impact dynamics problems. Beijing: Tsinghua University; 2010. link1

[433] Fang Y, Hu Y, Hu SM, Jiang C. A temporally adaptive material point method with regional time stepping. Comput Graph Forum 2018;37(8):195–204. link1

[434] Bardenhagen SG, Kober EM. The generalized interpolation material point method. Comput Model Eng Sci 2004;5(6):477–96. link1

[435] Gao M, Tampubolon AP, Jiang C, Sifakis E. An adaptive generalized interpolation material point method for simulating elastoplastic materials. ACM Trans Graph 2017;36(6):223. link1

[436] Sadeghirad A, Brannon RM, Burghardt J. A convected particle domain interpolation technique to extend applicability of the material point method for problems involving massive deformations. Int J Numer Methods Eng 2011;86(12):1435–56. link1

[437] Zhang DZ, Ma X, Giguere PT. Material point method enhanced by modified gradient of shape function. J Comput Phys 2011;230(16):6379–98. link1

[438] Bernstein DS, Givan R, Immerman N, Zilberstein S. The complexity of decentralized control of Markov decision processes. Math Oper Res 2002;27 (4):819–40. link1

[439] Goldman CV, Zilberstein S. Optimizing information exchange in cooperative multi-agent systems. In: Proceedings of the 2nd International Joint Conference on Autonomous Agents and Multiagent Systems; 2003 Jul 14– 18; Melbourne, VIC, Australia. p. 137–44. link1

[440] Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, et al. Playing Atari with deep reinforcement learning. 2013. arXiv:1312.5602.

[441] Tampuu A, Matiisen T, Kodelja D, Kuzovkin I, Korjus K, Aru J, et al. Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE 2017;12(4):e0172395. link1

[442] Foerster JN, Farquhar G, Afouras T, Nardelli N, Whiteson S. Counterfactual multi-agent policy gradients. In: Proceedings of the 2018 AAAI Conference on Artificial Intelligence; 2018 Feb 2–7; New Orleans, LA, USA; 2018.

[443] Sukhbaatar S, Fergus R. Learning multiagent communication with backpropagation. In: Proceedings of the 2016 Neural Information Processing Systems; 2016 Dec 5–10; Barcelona, Spain; 2016. p. 2244–52.

[444] Mordatch I, Abbeel P. Emergence of grounded compositional language in multi-agent populations. In: Proceedings of the 2018 AAAI Conference on Artificial Intelligence; 2018 Feb 2–7; New Orleans, LA, USA; 2018.

[445] Lazaridou A, Peysakhovich A, Baroni M. Multi-agent cooperation and the emergence of (natural) language. In: Proceedings of the 5th International Conference on Learning Representations; 2017 Apr 24–26; Toulon, France; 2017.

[446] Havrylov S, Titov I. Emergence of language with multi-agent games: learning to communicate with sequences of symbols. In: Proceedings of the 2017 Neural Information Processing Systems; 2017 Dec 3–9; Long Beach, CA, USA; 2017.

[447] Evtimova K, Drozdov A, Kiela D, Cho K. Emergent language in a multi-modal, multi-step referential game. 2017. arXiv:1705.10369.

[448] Lazaridou A, Hermann KM, Tuyls K, Clark S. Emergence of linguistic communication from referential games with symbolic and pixel input. In: Proceedings of the 2018 International Conference on Learning Representations; 2018 Apr 30–May 3; Vancouver, BC, Canada; 2018.

[449] Wagner K, Reggia JA, Uriagereka J, Wilkinson GS. Progress in the simulation of emergent communication and language. Adapt Behav 2003;11(1):37–69. link1

[450] Ibsen-Jensen R, Tkadlec J, Chatterjee K, Nowak MA. Language acquisition with communication between learners. J R Soc Interface 2018;15(140):20180073. link1

[451] Graesser L, Cho K, Kiela D. Emergent linguistic phenomena in multi-agent communication games. 2019. arXiv:1901.08706.

[452] Dupoux E, Jacob P. Universal moral grammar: a critical appraisal. Trends Cogn Sci 2007;11(9):373–8. link1

[453] Mikhail J. Elements of moral cognition: Rawls’ linguistic analogy and the cognitive science of moral and legal judgment. New York: Cambridge University Press; 2011. link1

[454] Blake PR, McAuliffe K, Corbit J, Callaghan TC, Barry O, Bowie A, et al. The ontogeny of fairness in seven societies. Nature 2015;528(7581):258–61. link1

[455] Henrich J, Boyd R, Bowles S, Camerer C, Fehr E, Gintis H, et al. In search of homo economicus: behavioral experiments in 15 small-scale societies. Am Econ Rev 2001;91(2):73–8. link1

[456] House BR, Silk JB, Henrich J, Barrett HC, Scelza BA, Boyette AH, et al. Ontogeny of prosocial behavior across diverse societies. Proc Natl Acad Sci USA 2013;110(36):14586–91. link1

[457] Graham J, Meindl P, Beall E, Johnson KM, Zhang L. Cultural differences in moral judgment and behavior, across and within societies. Curr Opin Psychol 2016;8:125–30. link1

[458] Hurka T. Virtue, vice, and value. Cambridge: Oxford University Press; 2000. link1

[459] Rawls J. A theory of justice. Cambridge: Harvard University Press; 1971. link1

[460] Haidt J. The new synthesis in moral psychology. Science 2007;316 (5827):998–1002. link1

[461] Hamlin JK. Moral judgment and action in preverbal infants and toddlers: evidence for an innate moral core. Curr Dir Psychol Sci 2013;22 (3):186–93. link1

[462] Kim R, Kleiman-Weiner M, Abeliuk A, Awad E, Dsouza S, Tenenbaum JB, et al. A computational model of commonsense moral decision making. In: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society; 2018 Feb 2–3; New Orleans, LA, USA; 2018. p. 197–203.

[463] Holyoak KJ, Thagard P. The analogical mind. Am Psychol 1997;52(1): 35–44. link1

[464] Buehner MJ, Cheng PW. Causal learning. In: Holyoak KJ, Morrison RG, editors. The Oxford handbook of thinking and reasoning. New York: Oxford University Press; 2012. p. 210–33. link1

[465] Hesse MB. Models and analogies in science. South Bend: Notre Dame University Press; 1966. link1

[466] Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. In: Proceedings of the 2013 Neural Information Processing Systems; 2013 Dec 5–8; Lake Tahoe, NV, USA; 2013.

[467] Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. 2013. arXiv:1301.3781.

[468] Carpenter PA, Just MA, Shell P. What one intelligence test measures: a theoretical account of the processing in the Raven progressive matrices test. Psychol Rev 1990;97(3):404–31. link1

[469] Antol S, Agrawal A, Lu J, Mitchell M, Batra D, Zitnick CL, et al. VQA: visual question answering. In: Proceedings of the 2015 International Conference on Computer Vision; 2015 Dec 11–18; Santiago, Chile; 2015. p. 2425–33.

[470] Snow RE, Kyllonen PC, Marshalek B. The topography of ability and learning correlations. Adv Psychol Hum Intell 1984;2(S47):103. link1

[471] Jaeggi SM, Buschkuehl M, Jonides J, Perrig WJ. Improving fluid intelligence with training on working memory. Proc Natl Acad Sci USA 2008;105 (19):6829–33. link1

[472] Bower GH. A contrast effect in differential conditioning. J Exp Psychol 1961;62(2):196–9. link1

[473] Meyer DR. The effects of differential rewards on discrimination reversal learning by monkeys. J Exp Psychol 1951;41(4):268–74. link1

[474] Schrier AM, Harlow HF. Effect of amount of incentive on discrimination learning by monkeys. J Comp Physiol Psychol 1956;49(2):117–22. link1

[475] Shapley RM, Victor JD. The effect of contrast on the transfer properties of cat retinal ganglion cells. J Physiol 1978;285(1):275–98. link1

[476] Lawson R. Brightness discrimination performance and secondary reward strength as a function of primary reward amount. J Comp Physiol Psychol 1957;50(1):35–9. link1

[477] Amsel A. Frustrative nonreward in partial reinforcement and discrimination learning: some recent history and a theoretical extension. Psychol Rev 1962;69(4):306–28. link1

[478] Gibson JJ, Gibson EJ. Perceptual learning; differentiation or enrichment? Psychol Rev 1955;62(1):32–41. link1

[479] Gibson JJ. The ecological approach to visual perception: classic edition. London: Psychology Press; 2014. link1

[480] Catrambone R, Holyoak KJ. Overcoming contextual limitations on problemsolving transfer. J Exp Psychol Learn Mem Cogn 1989;15(6):1147–56. link1

[481] Gentner D, Gunn V. Structural alignment facilitates the noticing of differences. Mem Cognit 2001;29(4):565–77. link1

[482] Hammer R, Diesendruck G, Weinshall D, Hochstein S. The development of category learning strategies: what makes the difference? Cognition 2009;112 (1):105–19. link1

[483] Gick ML, Paterson K. Do contrasting examples facilitate schema acquisition and analogical transfer? Can J Psychol 1992;46(4):539. link1

[484] Haryu E, Imai M, Okada H. Object similarity bootstraps young children to action-based verb extension. Child Dev 2011;82(2):674–86. link1

[485] Smith L, Gentner D. The role of difference–detection in learning contrastive categories. In: Proceedings of the 2014 Annual Meeting of the Cognitive Science Society; 2014 Jul 23–26; Quebec City, QC, Canada; 2014.

[486] Gentner D. Structure-mapping: a theoretical framework for analogy. Cogn Sci 1983;7(2):155–70. link1

[487] Gentner D, Markman AB. Structural alignment in comparison: no difference without similarity. Psychol Sci 1994;5(3):152–8. link1

[488] Schwartz DL, Chase CC, Oppezzo MA, Chin DB. Practicing versus inventing with contrasting cases: the effects of telling first on learning and transfer. J Educ Psychol 2011;103(4):759–75. link1

[489] Zhang C, Jia B, Gao F, Zhu Y, Lu H, Zhu SC. Learning perceptual inference by contrasting. In: Proceedings of the 2019 Neural Information Processing Systems; 2019 Dec 8–14; Vancouver, BC, Canada; 2019.

[490] Dehaene S. The number sense: how the mind creates mathematics. New York: Oxford University Press; 2011. link1

[491] Zhang W, Zhang C, Zhu Y, Zhu SC. Machine number sense: a dataset of visual arithmetic problems for abstract and relational reasoning. In: Proceedings of the 2020 AAAI Conference on Artificial Intelligence; 2020 Feb 7–12; New York, NY, USA; 2020.

Related Research