Data-Driven Learning for Data Rights, Data Pricing, and Privacy Computing

Jimin Xu; Nuanxin Hong; Zhening Xu; Zhou Zhao; Chao Wu; Kun Kuang; Jiaping Wang; Mingjie Zhu; Jingren Zhou; Kui Ren; Xiaohu Yang; Cewu Lu; Jian Pei; Harry Shum

doi:10.1016/j.eng.2022.12.008

Engineering, 2023, Vol. 25, Issue 6: 66-76. DOI: 10.1016/j.eng.2022.12.008

Research Article

Review

Abstract

In recent years, data has become one of the most important resources in the digital economy. Unlike traditional resources, data is digital in nature, which makes it difficult to value and contract. Therefore, establishing an efficient and standard data-transaction market system would be beneficial for lowering costs and improving productivity among the parties in this industry. Although numerous studies have been dedicated to complying with data regulations and to other data-transaction issues such as privacy and pricing, little work has been done to provide a comprehensive review of these studies in the fields of machine learning and data science. To provide a complete and up-to-date understanding of this topic, this review covers three key issues of data transaction: data rights, data pricing, and privacy computing. By connecting these topics, this paper provides a big picture of a data ecosystem in which data is generated by data subjects such as individuals, research agencies, and governments, while data processors acquire data for innovation or operational purposes, and benefits are allocated according to the data's respective ownership via an appropriate price. With the long-term goal of making artificial intelligence (AI) beneficial to human society, AI algorithms will then be assessed by data protection regulations (i.e., privacy protection regulations) to help build trustworthy AI systems for daily life.

Cite this article

Jimin Xu, Nuanxin Hong, Zhening Xu, et al. Data-Driven Learning for Data Rights, Data Pricing, and Privacy Computing. Engineering. 2023, 25(6): 66-76. https://doi.org/10.1016/j.eng.2022.12.008

Received: 11 Jan 2022
Published: 24 Jan 2023
Issue date: 13 Jun 2024
