
Next-Generation Database Benchmark for Financial Scenarios
As the most important financial institutions in China, banks impose high performance and security requirements on databases and data service solutions. With the rapid development of financial data application services, the data types and business scenarios that bank databases must support have become increasingly diverse, which makes it difficult for users to choose the best option among a wide range of database products and data service solutions. In light of the data application needs of the financial industry, this study uses literature research and theoretical analysis to comprehensively analyze the current state of database applications in banking, particularly the progress and challenges of database localization (replacing foreign database products with domestic ones) in recent years. We also systematically survey the major database benchmarks in China and abroad, and discuss the necessity and importance of building a next-generation database benchmark for financial scenarios. We find that existing database benchmarks have several shortcomings and face many challenges when applied to database testing in financial scenarios, owing to more complex business logic, more diverse data schemas, and stricter security requirements. On this basis, we propose targeted suggestions and requirements for building a next-generation database benchmark for financial scenarios, covering workloads, data schemas, metrics, and technical architecture.
Keywords: financial industry; bank; financial data; database; benchmark