A Geometric Understanding of Deep Learning

Na Lei, Dongsheng An, Yang Guo, Kehua Su, Shixia Liu, Zhongxuan Luo, Shing-Tung Yau, Xianfeng Gu

Engineering, 2020, 6(3): 361-374. DOI: 10.1016/j.eng.2019.09.010
Research Article

Abstract

This work introduces an optimal transportation (OT) view of generative adversarial networks (GANs). Natural datasets have intrinsic patterns, which can be summarized as the manifold distribution principle: the distribution of a class of high-dimensional data is close to a low-dimensional manifold. GANs mainly accomplish two tasks: manifold learning and probability distribution transformation. The latter can be carried out using the classical OT method. From the OT perspective, the generator computes the OT map, while the discriminator computes the Wasserstein distance between the generated data distribution and the real data distribution; both can be reduced to a convex geometric optimization process. Furthermore, OT theory reveals that the intrinsic relation between the generator and the discriminator is collaborative rather than competitive, and it explains the fundamental reason for mode collapse. We also propose a novel generative model, which uses an autoencoder (AE) for manifold learning and an OT map for probability distribution transformation. This AE–OT model improves theoretical rigor and transparency, as well as computational stability and efficiency; in particular, it eliminates mode collapse. The experimental results validate our hypothesis and demonstrate the advantages of the proposed model.
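To make the OT claims above concrete, the following standard definitions may help; they are background material, not new results from the paper. The Monge problem seeks a map T that pushes the source distribution μ forward to the target distribution ν at minimal total cost, and its optimal value is the Wasserstein distance computed by the discriminator:

\[
W_c(\mu,\nu)\;=\;\min_{T_{\#}\mu=\nu}\int_X c\big(x,T(x)\big)\,\mathrm{d}\mu(x).
\]

For the quadratic cost \(c(x,y)=\tfrac{1}{2}\lVert x-y\rVert^2\), Brenier's theorem states that the optimal map is the gradient of a convex potential, \(T=\nabla u\). When the target is discrete, \(\nu=\sum_i \nu_i\,\delta_{y_i}\), the potential takes the form \(u_h(x)=\max_i\big(\langle x,y_i\rangle+h_i\big)\), and finding the heights \(h\) reduces to minimizing the convex energy

\[
E(h)\;=\;\int_X \max_i\big(\langle x,y_i\rangle+h_i\big)\,\mathrm{d}\mu(x)\;-\;\sum_i h_i\,\nu_i,
\]

which is the convex geometric optimization the abstract refers to; the gradient of the minimizing potential gives the piecewise-constant OT map sending each source point to the target point of its cell. The sketch below shows one way such a semi-discrete problem can be solved by Monte Carlo gradient descent. It is a minimal illustration under assumptions of our own (a uniform source on the unit cube, invented function names and hyperparameters), not the authors' implementation.

import numpy as np

def semi_discrete_ot(targets, nu, n_iters=500, n_samples=20000, lr=0.5, seed=0):
    """Minimize the convex energy E(h) by stochastic gradient descent.

    targets: (n, d) array of discrete target points y_i (e.g., latent codes)
    nu:      (n,) target weights summing to 1
    Returns heights h defining the Brenier potential u_h(x) = max_i(<x, y_i> + h_i).
    """
    rng = np.random.default_rng(seed)
    n, d = targets.shape
    h = np.zeros(n)
    for _ in range(n_iters):
        # sample the (assumed) source measure: uniform on the unit cube
        x = rng.random((n_samples, d))
        # each sample lands in the cell of the index attaining the max
        idx = np.argmax(x @ targets.T + h, axis=1)
        # empirical cell measures w_i(h); the gradient of E is w(h) - nu
        w = np.bincount(idx, minlength=n) / n_samples
        h -= lr * (w - nu)
        h -= h.mean()  # E is invariant under adding a constant to h
    return h

def ot_map(x, targets, h):
    # piecewise-constant OT map: x goes to the target point of its cell
    return targets[np.argmax(x @ targets.T + h, axis=1)]

# toy usage: 10 random 2D target points with uniform weights
y = np.random.default_rng(1).random((10, 2))
h = semi_discrete_ot(y, np.full(10, 0.1))
print(ot_map(np.array([[0.5, 0.5]]), y, h))

Because E(h) is convex, this optimization has a unique minimum (up to the additive constant) and no adversarial saddle-point dynamics, which is the basis for the stability claims above.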

Keywords

Generative / Adversarial / Deep learning / Optimal transportation / Mode collapse

Cite This Article

Na Lei, Dongsheng An, Yang Guo, et al. A Geometric Understanding of Deep Learning. Engineering, 2020, 6(3): 361-374. https://doi.org/10.1016/j.eng.2019.09.010
