Multi-Mode Data Organization and File Retrieval Based on a Primer Library in Large-Scale Digital DNA Storage

doi:10.1016/j.eng.2023.10.021

Engineering, 2025, Vol. 48, Issue 5: 151-162. DOI: 10.1016/j.eng.2023.10.021

Research

Article

Shu-Fang Zhang^a,*^, Yu-Hui Li^a^, Rui-Xian Zhang, Bing-Zhi Li^b,c,*^, Qing Wang^a^

Abstract

Polymerase chain reaction (PCR) amplification is currently the most commonly used and effective method of file retrieval in DNA storage. However, the number of orthogonal primers limits the number of files that can be accurately accessed, which in turn limits the storage density of a single oligo pool. This paper proposes a multi-mode DNA sequence design method based on PCR file retrieval in a single oligonucleotide pool for high-capacity DNA data storage. First, an analysis of the maximum number of orthogonal primers at each predicted primer length shows that the maximum number of available primers does not increase linearly with primer length, and that the maximum number of orthogonal primers is on the order of 10⁴. Next, the maximum address space capacity of DNA sequences with different types of primer binding sites is analyzed for file mapping. When the capacity of the primer library is $\mathbb{R}$ (where $\mathbb{R}$ is even), the number of address spaces that can be mapped by the proposed single-primer DNA sequence design scheme is four times that of the previous scheme, and the two-level primer DNA sequence design scheme can reach $\left[\frac{\mathbb{R}}{2} \cdot \left(\frac{\mathbb{R}}{2}-1\right)\right]^{2}$ times. Finally, a multi-mode DNA sequence generation method is designed according to the number of files to be stored in the oligonucleotide pool, so that target files can be randomly retrieved from a pool containing a large number of files. The performance of the primers generated by the proposed orthogonal primer library generator is verified: the average Gibbs free energy of the most stable heterodimer formed between the orthogonal primers is -1 kcal∙(mol∙L⁻¹)⁻¹ (1 kcal = 4.184 kJ). In addition, when DNA sequences with two-level primer binding sites are selectively PCR-amplified for random access, the target sequence can be accurately read with a minimum of 10³ reads, provided that the primer binding site sequences at different positions are mutually different. This paper provides a pipeline for orthogonal primer library generation and multi-mode mapping schemes between files and primers, which can help achieve precise random access to files in large-scale DNA oligo pools.
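
To give a feel for the scale of the two factors quoted in the abstract, the following minimal Python sketch evaluates them for a few primer library capacities. It only computes the stated expressions: the function name `two_level_factor`, the constant name, and the sample values of R are illustrative assumptions, and the abstract does not define the baseline address count, so only the multiplicative factors are shown.

```python
# Illustrative sketch only: evaluates the scaling factors quoted in the
# abstract. Names and sample values of R are assumptions for this example.

# Factor stated in the abstract for the single-primer design scheme.
SINGLE_PRIMER_FACTOR = 4


def two_level_factor(r: int) -> int:
    """Evaluate [R/2 * (R/2 - 1)]^2 for an even primer library capacity R."""
    if r <= 0 or r % 2 != 0:
        raise ValueError("R must be a positive even integer")
    half = r // 2
    return (half * (half - 1)) ** 2


if __name__ == "__main__":
    # 10_000 matches the order of magnitude (10^4) reported for the
    # maximum number of orthogonal primers.
    for r in (10, 100, 10_000):
        print(f"R = {r}: single-primer factor = {SINGLE_PRIMER_FACTOR}, "
              f"two-level factor = {two_level_factor(r)}")
```

For R on the order of 10⁴, the two-level expression is on the order of 10¹⁴, which suggests why the two-level design is attractive for oligo pools holding a very large number of files.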

Keywords

DNA storage / File retrieval / Orthogonal primer / PCR-amplifying / DNA sequence design

Cite this article

Shu-Fang Zhang, Yu-Hui Li, Rui-Xian Zhang, Bing-Zhi Li, Qing Wang. Multi-Mode Data Organization and File Retrieval Based on a Primer Library in Large-Scale Digital DNA Storage. Engineering, 2025, 48(5): 151‒162 https://doi.org/10.1016/j.eng.2023.10.021
