A Geometric Understanding of Deep Learning

Na Lei; Dongsheng An; Yang Guo; Kehua Su; Shixia Liu; Zhongxuan Luo; Shing-Tung Yau; Xianfeng Gu

doi:10.1016/j.eng.2019.09.010


Engineering, 2020, Vol. 6, Issue 3: 361–374

Research Article




Abstract

This work offers a geometric understanding of deep learning. In particular, it introduces an optimal transportation (OT) view of generative adversarial networks (GANs). Natural datasets have intrinsic patterns, which can be summarized as the manifold distribution principle: the distribution of a class of high-dimensional data is concentrated near a low-dimensional manifold. GANs mainly accomplish two tasks: manifold learning and probability distribution transformation. The latter can be carried out using the classical OT method. From the OT perspective, the generator computes the OT map, while the discriminator computes the Wasserstein distance between the generated data distribution and the real data distribution. Both tasks can be reduced to a convex geometric optimization process. Furthermore, OT theory reveals that the intrinsic relation between the generator and the discriminator is collaborative rather than competitive, and it explains the fundamental reason for mode collapse. We also propose a novel generative model, which uses an autoencoder (AE) for manifold learning and an OT map for probability distribution transformation. This AE–OT model improves theoretical rigor and transparency, as well as computational stability and efficiency. In particular, it avoids mode collapse. The experimental results validate our hypothesis and demonstrate the advantages of the proposed model.
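The claim that computing the OT map reduces to a convex optimization can be illustrated with semi-discrete OT, following the variational principle of Gu, Luo, Sun, and Yau. Consider a map from a continuous measure mu to a discrete measure nu = sum_i nu_i delta(y_i). This map is the gradient of a piecewise-linear convex (Brenier) potential u_h(x) = max_i (<x, y_i> + h_i). The heights h minimize a convex energy whose gradient is w(h) - nu, where w_i(h) is the mu-mass of the cell on which the i-th plane attains the maximum.

The Python below is a minimal sketch under stated assumptions. It is not the authors' AE–OT implementation. The assumptions are:
- mu is uniform on [0, 1]^d and is approximated by fixed Monte Carlo samples.
- The optimizer is plain gradient descent with a fixed step.
- The function names assign and semi_discrete_ot, and all default parameter values, are hypothetical.

```python
import random


def assign(x, targets, heights):
    """Return the index i maximizing <x, y_i> + h_i, i.e. the cell of x in the
    power diagram induced by the potential u_h(x) = max_i (<x, y_i> + h_i)."""
    best_i, best_val = 0, float("-inf")
    for i, (y, h_i) in enumerate(zip(targets, heights)):
        val = sum(a * b for a, b in zip(x, y)) + h_i
        if val > best_val:
            best_i, best_val = i, val
    return best_i


def semi_discrete_ot(targets, nu, n_samples=1000, steps=150, lr=0.2, seed=0):
    """Gradient descent on the convex energy E(h), whose gradient is w(h) - nu.

    Assumptions (illustrative only): the source measure mu is uniform on
    [0, 1]^d and is replaced by fixed Monte Carlo samples; nu sums to 1.
    Returns the heights h, the estimated cell masses w(h), and a Monte Carlo
    estimate of the quadratic transport cost E|x - T(x)|^2 (approx. W_2^2).
    """
    rng = random.Random(seed)
    d = len(targets[0])
    xs = [[rng.random() for _ in range(d)] for _ in range(n_samples)]
    heights = [0.0] * len(targets)

    def cell_masses(h):
        # Fraction of the Monte Carlo samples falling in each cell.
        counts = [0] * len(targets)
        for x in xs:
            counts[assign(x, targets, h)] += 1
        return [c / n_samples for c in counts]

    for _ in range(steps):
        w = cell_masses(heights)
        # A cell holding more mass than its target gets a lower height and shrinks.
        heights = [h_i - lr * (w_i - v_i) for h_i, w_i, v_i in zip(heights, w, nu)]

    # The OT map sends every x in cell i to y_i: T(x) = y_{assign(x)} = grad u_h(x).
    cost = 0.0
    for x in xs:
        y = targets[assign(x, targets, heights)]
        cost += sum((a - b) ** 2 for a, b in zip(x, y))
    return heights, cell_masses(heights), cost / n_samples
```

For example, semi_discrete_ot([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [0.5, 0.25, 0.25]) should return cell masses close to the targets. The returned cost is then a Monte Carlo estimate of the squared Wasserstein-2 distance between mu and nu, the quantity the abstract assigns to the discriminator. In an AE–OT-style pipeline, one would run this computation in the autoencoder's latent space, with the encoded training samples as the y_i.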

Citation: Na Lei, Dongsheng An, Yang Guo, et al. A Geometric Understanding of Deep Learning. Engineering 2020;6(3):361–374. https://doi.org/10.1016/j.eng.2019.09.010


Received: 02 Mar 2019
Published: 24 Jan 2020
Issue date: 14 Jun 2024
