《1. Introduction》

1. Introduction

In 1961, Feynman [1] proposed the idea of molecular computing, which was demonstrated by Adleman 33 years later. In 1994, Adleman [2] presented a DNA computing model in which DNA molecules were used as data, and enzymes and biological operations were used as tools in information processing. Since then, the DNA computing model has been developed in several new dimensions including theory, experiments, and applications. Studies in the past decade have shown that the DNA computing model is superior in solving optimal and graphical problems, especially in non-deterministic polynomial-time (NP)-complete problems.

However, the main challenge of this model is that the size of DNA molecules used as data in information processing grows exponentially with an increase of problem size. This phenomenon is called exponential explosion, and is the biggest bottleneck preventing the development of DNA computing. To solve this problem, a parallel type of DNA computing model is proposed in this article, and several methods are used to overcome the biggest bottleneck.

In this section, we briefly introduce the latest developments in DNA computing research, and then outline the major innovative points of this article.

《1.1 Previous results》

1.1 Previous results

In 1994, Adleman [2] first explored the computing feasibility of DNA molecules by presenting a DNA computing model for the Hamiltonian path problem. After that, many studies were designed to show the advantage of the huge parallelism that is inherent in DNA-based computing. In 1995, Lipton [3] proposed a DNA computing model to solve the satisfiability (SAT) problem. In 1997, Ouyang et al. [4] designed a DNA computing model for the maximal clique problem. By using the DNA hairpin formation, Sakamoto et al. [5] solved the SAT problem. Rothemund [6] attempted to use DNA as instruments to implement Turing machine. In addition, DNA computing was applied to manipulate gene expression [7,8]. Winfree [9] proposed a sticker model by DNA self-assembly. In 2002, Adleman’s research group invented a DNA computer to solve a 20-variable 3-SAT problem with a searching ability of 220 [10].

Graph vertex coloring is a way of coloring the vertices of graph G such that no two adjacent vertices share the same color. Although it is a classical NP-hard problem, graph coloring arises naturally in a variety of applications such as timetable schedules [11], register allocation [12,13], and so forth. The analysis of  approximation algorithms for graph coloring started with the work  of Johnson [14]. Subsequently, O(n2/5log8/5n) colors were used to color a 3-colorable graph with n vertices by Blum [15]. Karger et al. [16] provided a randomized polynomial time algorithm that colors a 3-colorable graph on vertices with min {O(∆1/3log1/2∆log n), O(n1/4 log1/2n)} colors. Schiermeyer[17] gave a complicated algorithm for deciding 3-colorability in less than time O(1.415n). Beigel and Eppstein [18] improved the bound by giving a fast algorithm for (3, 2)-constraint satisfaction problem (CSP) for solving the 3-coloring problem in time O (1.3289n).

The major research into graph vertex coloring based on DNA computing is as follows: In 1999, Jonoska et al. [19] proposed the potential application of three-dimensional (3D) structures in the field of DNA computing, and showed that some DNA structures could theoretically be used in DNA computing. In 2003, they constructed 3-arm and 4-arm DNA molecules to solve the graph vertex 3-coloring problem based on the aforementioned studies [20,21]. Liu et al. [22] proposed a DNA algorithm for the graph coloring problem based on surfaces. In 2003, Gao and Xu [23] presented a new DNA computing model for solving the graph vertex coloring problem, in which the technology of enzyme cutting was used to eliminate false solutions. In 2006, using the magnetic beads separation technology, Xu et al. [24] established a DNA computing model to solve the graph vertex coloring problem; this model was validated by a series of experiments.

《1.2 Our results》

1.2 Our results

We now present a novel DNA computing model for the graph vertex coloring problem; although we focus on analyzing k = 3, this method can be generalized to the situation of k > 3.

The basic idea of our model is to reduce the initial solution space by using the optimal method and process the data by a parallel polymerase chain reaction (PCR) operation. First, a given graph is divided into several subgraphs in order to make the bio-operations easier and delete as many false solutions possible. The subgraphs can then be solved in parallel according to the following steps: Determine the order of the vertices, determine the color set of each vertex in the subgraph, encode the DNA sequences, determine the probes, construct the initial solution space, and delete the false solutions (see Sections 3.3–3.7). Finally, the subgraphs are combined into the graph and implemented to delete false solutions gradually (see Section 3.8).

The main points of this article are as follows: ① The exponential explosion problem is solved by dividing subgraphs, reducing the vertex colors without losing solutions, and ordering the vertices in subgraphs; ② the bio-operation times can be greatly reduced by using a parallel PCR technology. Thus, the processing speed is remarkably improved in this model.

A 3-colorable graph with 61 vertices (Fig. 1) is used as an example to demonstrate how to use this computing model to solve the graph vertex coloring problem. In general, for a 3-colorable graph, the computing complexity is 3n. Therefore, the computing capacity of our parallel DNA computing model can reach O(359), when the colors of vertices v1 and vn are given.

《Fig.1》

Fig. 1. A 3-colorable graph G with 61 vertices.

《1.3. Outline of the article》

1.3. Outline of the article

This article is laid out as follows. In Section 2, we explain some notations and definitions of the graph coloring problem and PCR technology. In Section 3, the DNA computing model is illustrated, including algorithm steps, bio-operations, and technologies. In Section 4, a 3-coloring problem of a graph with 61 vertices is then solved by this computing model, and the specific experimental operations are described. Section 5 contains theory analysis, as we carefully examine the complexity of overcoming the exponential explosion phenomenon. In the last section, we conclude the article and point out the next possible research direction.

《2. Notation and definition》

2. Notation and definition

《2.1 The graph coloring problem》

2.1 The graph coloring problem

The work in this paper is always limited to finite, simple, and undirected graphs. In a given graph G, V(G), E(G), dG(v), and ΓG(v)  denote the vertex set, edges set, degree of vertex v, and set of vertices adjacent to v of graph G, respectively, and are denoted using the short forms V, E, dG, and ΓG, respectively. We let V = {v1, v2, ..., vn} be the vertex set of a graph G, and denote the degree of vi as d(vi) (abbreviated as di, i = 1, 2, .. ., n). A walk denoted by W is an alternating sequence of vertices and edges, beginning and ending with a vertex, respectively, where each vertex is incident to both the edge that precedes it and the edge that follows it in the sequence, and where the vertices that precede and follow an edge are the end of that edge. A sequence of vertices is denoted by W, where W = v1...vk (k≥0), beginning with vertex v1 and ending with vertex vk, and where vi and vi+1 are adjacent (i = 1, 2, .. ., k - 1). The length of a walk is the number of edges in it. A path is a walk in which all the vertices are distinct.

The vertex coloring of a graph G is an assignment from its vertex set to a color set such that two ends of each edge are assigned two different  colors.  Namely,  it is a partition of V,  where V(G)=V1∪V2∪...∪Vk, Vi≠∅ (empty set), and Vi∩Vj=∅ ,i=1, 2, …,k.

The minimum number of colors needed for such a coloring is called the chromatic number of G, and is usually denoted by χ(G) . If a graph G can be colored by k(k≥χ(G)) colors, then the color set is denoted by Ck(G). In this article, we limit ourselves to a 3-colorable graph G and we always assume that C3(G) = {r, b, y}, where r, b, and y denote red, blue, and yellow, respectively. In other words, the 3-coloring problem of a given graph G can be solved if we find a map f: V(G) →{r, b, y} with the property ∀uv∈E(G) , f(u) ≠f(v). The 3-coloring problem is a classical NP-complete problem [25]. Any terms and notations not defined here can be found in Ref. [26].

Definition 2.1: Let G be a simple graph with the vertex set V(G) = {v1, v2, .. ., vn}. Let vi1vi2.. .vin be an ordering of V(G), where vij ∈V(G), j = 1, 2, .. ., n, and vij ≠ vil if j≠ l. The number of edges in the ordering vi1vi2.. .vin, denoted by Nedge(vi1vi2.. .vin), is defined as follows:

《2.2 Polymerase chain reaction》

2.2 Polymerase chain reaction

PCR is a technique that amplifies a few copies of DNA across several orders of magnitude, thereby generating millions or more copies of a particular DNA sequence. The three parts of PCR are carried out in the same vial with different temperatures. The three steps in PCR are: the denaturation of the strands, annealing the primer to the template, and the extension of new strands. At the end of a cycle, each DNA template in the vial has been exponentially duplicated. After n cycles, the number of the DNA molecules is y = (1 + x)n, where y is the copy number of DNA molecules, x is the efficiency of amplification, and n is the number of cycles.

《3. DNA computing model》

3. DNA computing model

In this section, we describe the DNA computing model in detail, including algorithm steps, bio-operations, and technologies.

《3.1 The algorithm of the model》

3.1 The algorithm of the model

In this model, limited PCR operations are exploited to delete the false solutions of a graph, and the DNA strands representing the true solutions of a given graph are sequenced to obtain the coloring of the given graph. The algorithm of this model is shown in Fig. 2.

《Fig.2》

Fig. 2. A flowchart of the algorithm.

《3.2. Subgraph division and bridge vertices determination》

3.2. Subgraph division and bridge vertices determination

Subgraph division can improve the computing efficiency in two aspects: On the one hand, the experiment can be operated more easily; on the other hand, most of false solutions can be deleted when constructing the initial solution space. Therefore, the appropriate division of a given graph G is of great significance in our model. Here Gj, V(Gj) ∈V(G), E(Gj) ∈E(G), where j = 1, 2, .. ., m-1, is used to denote the primary subgraph. The method of subgraph division is demonstrated as follows:

Step 1: Determine the size of the subgraphs. The number of vertices in each subgraph should be consistent, ranging from 15 to 20. In addition, the edges in a subgraph should be as numerous as possible. In this way, the experiments can be conveniently performed, and most of the false solutions will be deleted when the initial solution space is constructed for each subgraph. Taking the graph G in Fig. 3 as an example, there are two divisions. In one case, the graph can be divided into two subgraphs G01 with vertex set V'1= {1,2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22}, and G'2 with V'2 ={8, 9, 10, 11, 12, 13, 14, 15, 23, 24, 25, 26, 27, 28, 29, 30 } In another case, the graph can be divided into two subgraphs G'3 with V'3 ={1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15} and G'4 with V'4 ={16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30}. Obviously, the latter division method is better because the number of edges in subgraphs G'3 and G'4 is greater than in G'1 and G'2 .

《Fig.3》

Fig. 3. An example of primary subgraph division. (a) Graph G; (b) subgraph G'1 ; (c)subgraph G'2 ; (d) subgraph G'3 ; (e) subgraph G'4 .

Step 2: Obtain the first subgraph G'1 with the bridge vertices u1 and u2. The bridge vertices should be adjacent and the sum of the degree of the bridge vertices should be maximal. If more than one pair of bridge vertices satisfy these conditions, we select the one pair (u1 and u2) that result in Nedge(u1 = vi1 vi2 .. . vin = u2) being the greatest (see Section 3.3 for more details). If more than one pair of bridge vertices satisfy this condition, we then choose any pair.

Step 3: Determine the second subgraph G'2 containing bridge vertiex u2 and a new bridge vertex u3, which can be found using the following methods: The bridge vertex should be adjacent to the vertex u2, and the degree is maximal in G'2. If more than one pair of bridge vertices satisfy these conditions, then choose one pair according to Step 2.

Step 4: Perform similar operations as those in Step 3 to obtain the other subgraphs with their bridge vertices.

It should be noted that: ① the size of the vertex set of the last subgraph Gm may not be equal to the others; and ② the bridge vertices in the last subgraph Gm may be u1 and um-1, which is the bridge vertex of Gm-1 if u1 and um-1 are adjacent.

In our DNA computing model, the bridge vertices are very important for the subgraph combination. The principle in choosing bridge vertices is that the sum of the degrees of two bridge vertices in one subgraph should be as large as possible. In this way, the size of the color set of the vertices adjacent to bridge vertices can be reduced (see Section 3.3). Moreover, the size of the initial solution space can also be reduced.

According to the steps above, the four divided subgraphs of graph G in Fig. 1 are shown in Fig. 4; the bridge vertices are 1,  16, 31, 46, and 61, respectively.

《Fig.4》

Fig. 4. Four primary subgraphs of graph G (shown in Fig. 1). (a) Subgraph G1; (b) subgraph G2; (c) subgraph G3; (d) subgraph G4.

《3.3 Determining the order and color set for each vertex in the subgraphs》

3.3 Determining the order and color set for each vertex in the subgraphs

After subgraph division, the bridge vertices in each subgraph are determined. The algorithm for ordering the vertices in a subgraph should satisfy the following conditions: ① The bridge vertices must be considered as two endpoints in the ordering of the vertices subgraphs; and ② the number of edges in ordering (see Definition 2.1) should be as great as possible. Here, we take subgraph G1 as an example. Suppose that the vertex set of subgraph G1 is V(G1) = {v1, v2, ..., vt} with the bridge vertices v1 and vt. Without loss of generality, let the vertex sequence be v1 = vi1vi2.. .vit = vt, which satisfies the following constraint:

here, we give the algorithm for ordering vertices. Let v be a vertex of graph G. We start by defining the set of v-rooted paths, denoted by P(v), then define all the paths starting with the vertex v in G. The set of v-rooted paths P(v) in G can be determined as follows: First, the neighborhood of v is determined; next, the neighborhood of each vertex in Γ(v) that is represented as Γ2(v) without v is obtained. Thus, Γ3(v), Γ4(v) , etc. are obtained by the same method above, which is called a pruning algorithm. Because the number of vertices in each subgraph is less than 20, this algorithm can be implemented easily.

Suppose that G1 with V(G) = {v1, v2, .. ., vt} is a subgraph, where v1 and vt are the bridge vertices. We now give the algorithm for ordering the vertices of the subgraph G1:

Step 1: Determine on the rooted paths P(v1) and P(vt). If the v1-rooted paths P'(v1) contain a Hamiltonian path of subgraph G1, then order the vertices according to the Hamiltonian path. Otherwise, order the vertices according to the following steps.

Step 2: If P(v1) ∪ P(vt) = V(G1), then we consider path P1 in P(v1) and path P2 in P(vt), respectively, where P1 and Pt satisfy the following conditions: ① P1 = v1vi2... .vir  and Pt = visvi(s+1).. .vt; ② max{|E(P1)| + |E(Pt)|}; and ③ V(P1) ∩  V(P2) =∅. 

Step 3: If V'=V(G1)- P(v1) ∪ P(vt)=∅  , then we further find the longest path from the induced subgraph G[V '] and represent it as P'=u1u2...um , If V''=V(G1)-V(P1)-V(Pt)-V'=∅, then the ordering of vertices for the subgraph G1 is v1vi2vi3.. .viru1u2... umvisvi(s+1).. .vt. Otherwise, if V'' =V(G1)—V (P1)— V(Pt)— V'={ui1, ui2, ... ; uiq≠∅ , then the ordering of vertices for the subgraph G1 is v1vi2vi3.. .viru1u2.. .umui1ui2.. .uiqvisvi(s+1).. .vt.

Consider subgraph G'3 with the bridge vertices 1 and 13 as an example (Fig. 3). The path 1–10–11–2 is picked out from P(1), and the matched path 13–14–7–8–15–3–9–4 is also obtained from P(13). Then the path 5–6 is found in the remaining vertices. Finally, V'' ={12}. A vertex sequence of G'3 and the edges incident to the vertices are shown in Fig. 5(a). A similar vertex sequence of G'3 is shown in Fig. 5(b).

《Fig.5》

Fig. 5. The ordering vertices and set of the corresponding 11 edges for subgraph G'3 in Fig. 3(d).

The ordering vertices 1–2–.. .–16 in G1 in Fig. 4 are the best order because P1 = 1–2–.. .–16. P1 = 1–2–.. .–16 is a Hamiltonian path of G1, and 1 and 16 are the bridge vertices in G1. Another ordering of the vertices in G1 is 1–3–5–7–9–11–13–15–2–4–6–8– 10–12–14–16, which has only five edges {5; 7} ,{9; 11}, {13; 15},{15; 2}, and {6; 8}.

Step 4: Relabel the vertices in the subgraphs. Let the vertex sequence be v1v2.. .vt before labeling, and vσ(1)vσ(2)...vσ(t)  after reordering. Obviously, the latter vertex ordering is obtained by the permutation of the vertex subscript:

For example, for the vertex ordering of subgraph G'3 in Fig. 5, the mapping of the reordered vertices can be shown as follows:

In general, the vertices of a subgraph should be relabeled after ordering, and the vertex σ(i) will be relabeled by i, where i = 1, 2, ..., t.

The chromatic number k in this article is 3, and the color set C(3) = {red, yellow, blue} = {r, y, b}. The variables ri, yi, and bi indicate that the color of vi can be chosen as red, yellow, or blue.

Considering subgraph G1, where V(G1) = {v1, v2, .. ., vt}, the vertices v1 and vt are the bridge vertices in G1. In our model, the bridge vertices are always adjacent, so vertices v1 and vt can be colored red (r1) and blue (bt) in advance. (It should be noted that vertex vcan be colored red or yellow, of course, when it is not adjacent with a vertex colored blue. The true solutions with rt or yt can be obtained by color permutation according to the coloring with bt.) The vertices belonging to the set  will be blue or yellow, and those belonging to will be red or yellow. The color sets of the vertices of the four subgraphs in Fig. 4 are shown in Fig. 6.

《Fig.6》

Fig. 6. All possible colors for the vertices of the subgraph when C(1) ={r1}, C(16) ={b16}, C(31) = {r31}, C(46)={b46}, and C(61)={y61} are designated in advance. Color set of each vertex in (a) subgraph G1, (b) subgraph G2, (c) subgraph G3, and (d) subgraph G4.

This method can be generalized to any situation where k > 3. If a graph G can be colored by k(k≥χ(G))  colors, then the color set is denoted by Ck(G), which can be defined as C3(G) = {c1, c2, .. ., ck}. Then the k-coloring problem of a given graph G can be solved if we find a map f: V(G)→C3(G) with the property that "uv 2 E(G), f(u) ≠f(v). The other steps are completely the same as for the situation where k = 3.

This method greatly reduces the complexity of the initial solution space. The relevant formula and the analysis are given in Sections 3.2 and 5.2, respectively.

《3.4. Encoding》

3.4. Encoding

As DNA molecules work through specific hybridization in the computing process, encoding in DNA computing will be constrained by many factors such as the Gibbs free energy of chemistry, melting temperature, similarity of DNA sequences, length of DNA sequences, experimental environment, scale of problems, and so forth. Therefore, encoding is a very complicated and difficult problem in DNA computing, and is considered to be an NP- complete problem by several scholars. Many algorithms of DNA encoding based on various constraints have been formulated thus far. To provide good DNA strands for both specific hybridization and PCR in this article, the DNA strands are designed based on the following constraints:

• All sequences have no occurrence of five or more consecutive identical bases;

• The GC content of all sequences is 40%–60%;

• There are no more than eight of the same bases between any two sequences;

• There are no more than four matched bases in every sequence;

• For every sequence, five consecutive bases  in both the 3' and 5' ends are mismatched with any five bases in other sequences;

• The absolute value of the change in the Gibbs free energy of chemistry (│ΔG│) of the primer dimer formed by its own sequence is no more than 6.0 kcal·mol-1 (1 cal = 4.1868 J); and

• The absolute value of the change in the Gibbs free energy of chemistry (│ΔG│) of the primer dimer is no more than 9.0 kcal·mol-1, and the absolute value of the change in the Gibbs free energy of chemistry (│ΔG│) of the primer dimer formed at the 30 end is no more than 6.0 kcal mol-1.

Based on these constraints, the method described in Ref. [27] was used to design the DNA strands; next, the proper sequence was selected using the software Primer Premier 5.0.

It is easy to prove the following theorem.

Theorem 3.1: Let G be a 3-coloring graph with bridge vertices v1 and vn. According to the possible color set of each vertex, it can be determined that the number of DNA sequences representing vertex colors for any 3-coloring graph with n vertices is

In this article, 129 oligonucleotides (Table 1) were designed to represent the possible colors of the vertices in graph G. These oligonucleotides and their Watson–Crick complementary sequence were denoted by xi and , where i = 1, 2, .. ., 61, respectively.

《Table 1》

Table 1 DNA sequences for each xi.

   

《3.5. Determining probes》

3.5. Determining probes

The advantages of DNA computing are its huge parallelism and its great capacity for information storage. Over decades of research, it has been recognized that the biological activity in artificial DNA molecules is the main feature of the DNA computer. In particular, the gravitation between hydrogen bonds dominates the process. By making use of this feature, most of the false solutions can be eliminated from the initial solution space. This eliminating process cannot be carried out by electronic computers. To implement this idea, the method of designing hybridization probes plays a significant role in deleting false solutions.

In this model, the method of designing probes is described as follows (with chromatic number k = 3 as an example):

Step 1: Determine the color sets of each vertex. Let V(G) = {v1, v2, .. ., vn}. Without loss of generality, let its ordering of vertices be v1v2.. .vn. The set of possible colors for each vertex vi is denoted by Cvi , where i = 1, 2, .. ., n.

Step 2: Determine the set of probes. The probes are used to con stitute the initial solution space. Probes are only set up between two consecutive labeled vertices vi and vi+1, denoted by , where  xi ∈Cvi ,  xi+1 ∈ Cvi+1 ,  and  i = 1,  2,  .. .,  n — 1. If  then xi and xi+1 cannot have the same color, and vice versa. For example, Cv7 ={r7, y7}, Cv8 ={r8; y8}, and e = v7v8 ∈E(G1) exist in subgraph G1 (see Fig. 4), so only two probes  are constructed.

Step 3: Determine the DNA sequence for each probe. The probes  consist of the last half of the DNA sequence of xi and the first half of the DNA sequence of xi+1. In this article, the length of the probe is 20 bases. For example, if y7 =5'-AATACGCACTC ATCACATCG-3' and r8 = 5'-GACCTTACCGTTTAGAGTCG-3', then =5'-CGGTAAGGTCCGATGTGATG-3'.

To explain probe sets clearly, we utilize the graph theory method. This method is very useful for reducing initial solution space and theoretical analysis.

The probe graph of a graph G with n vertices, denoted by B G (abbreviated as B), is defined as follows:

According to the definition of probes and Fig. 6, the 3-coloring probe graphs of four subgraphs (see Fig. 4) are shown as B1, B2, B3, and B4 in Fig. 7. It is easy to identify information on probe numbers and distribution from each probe graph.

《Fig.7》

Fig. 7. Probe graphs for the subgraph shown in Fig. 4. (a) B1; (b) B2; (c) B3; (d) B4.

《3.6. Construction of the initial solution space》

3.6. Construction of the initial solution space

The method presented by Adleman [2] is used to construct the initial solution space of each deduced subgraph. The DNA sequences representing the possible solutions in the initial solution space are amplified in a series of operations:

• First, the 5' -ends of DNA sequences xi are phosphorylated by T4 polynucleotide kinase (PNK);

• Second, the product of phosphorylation and probes are mixed in 10 × T4 PNK buffer for annealing;

• Third, the annealed products are ligated by T4 DNA ligase in 10 × T4 DNA ligase buffer; and

• Finally, the ligated products are amplified by PCR with the primer pair . The amplified DNA strands represent all possible solutions of each subgraph.

in this article, the initial solution space of subgraph Gj is denoted by S(Gj), where j = 1, 2, .. ., m, and m is the number of sub- graphs. Let t = |Gj| and V(G) = {v1, v2, .. ., vt}. Then S(Gj) = {x1x2.. .xt}, where xi∈ Cvj c {r, b, y}, and i = 1, 2, .. ., t. If each xi has l bases, then the length of the DNA sequence in the initial solution space will be t × l.

 It should be noted that if e = vivi+1 ∈E(G) then xi and xi+1 cannot share the same color when constructing the probes. In this way, the false solutions in which adjacent vertices have the same color will be deleted when constructing the initial solution space. Thus, the more edges e = {vi, vi+1} there are, the more false solutions will be deleted in the initial solution space. The details of the theoretical analysis will be shown in Section 5.2.

《3.7. Deleting false solutions》

3.7. Deleting false solutions

In this section, the methods of deleting the false solutions of each subgraph are described. Let Ej1 = {vivi+1 ∈E(G), vivt ∈ E(Gj); i = 1, 2, .. ., t — 1}, and Ej2 = {v1vi∈E(Gj), i = 2, 3, .. ., t}. Then E3j = E(Gj) — Ej1 — Ej2 and j = 1, 2, .. ., m.

In fact, the process of deleting false solutions involves eliminating DNA strands that represent the improper coloring of the vertices incident to the edges in Ej3 from the initial solution space S(Gj). In the process of bio-operation, PCR is used to delete false solutions. To improve the processing speed, the edges in Ej3 are divided into two groups and each group is operated using different procedures.

(1) Forward Path. In our experiment, a false solution is deleted by a so-called parallel PCR operation on the forward paths. Here, we introduce a definition of forward path.

Let G1 be a subgraph with V(G1) = {v1, v2, .. ., vt}, and let v1, v2, .. ., vt be a vertex ordering of G1 according to the method shown in Section 3.3. For , if i<j  , then e = vivj is a forward edge; otherwise, it is a backward edge. If all the edges in a path are forward edges, then the path is a forward path; otherwise, it is a backward path.

Now,  for example, we take a path P= vivjvyvz , 1 < i < j < y < z < t (other paths are treated in the same way). In this path, there are three edges: vivj, vjvy , and vyvz . First, PCR is done with primer pairs:, and  , where xi and xj, xj and xy, and xy and xz should not have the same color at the same time. In this way, the DNA strands are separated into five fragments. Subsequently, these small fragments will be combined gradually to generate DNA strands with t×l base pairs (bp). Thus, at the end of PCR operations, the DNA strands representing the improper coloring of these vertices incident to the three edges will be eliminated from S(Gj). Detailed experimental methods and the theoretical analysis are given in Sections 4.4 and 5.3, respectively.

(2) Single edge. Some single edges exist in Ej3 after deleting all false solutions from forward paths. Let e= vivj, xi ∈ Cvi , and xj ∈ Cvj . PCR is run with primer pairs  ,Here, xi and xj should not have the same color. In this way, the DNA strands representing improper coloring of the vertices incident to the edge are not amplified. That is, the DNA strands representing false solutions are eliminated. When the DNA strands representing improper coloring of the vertices incident to the edges in Ej3 are all deleted, the amplified DNA strands in S(Gj) represent the colorings of subgraph Gj, where j = 1, 2, .. ., m.

《3.8. Subgraph combination and deletion of false solutions》

3.8. Subgraph combination and deletion of false solutions

After obtaining the true solutions of each subgraph, these subgraphs are combined to generate secondary subgraphs. Then the process of deleting false solutions continues. This process involves the following three steps:

Step 1: Combine subgraphs Gj and Gj+1, where j = 1, 2, .. ., m-1. The DNA strands representing the improper colors of the subgraphs will be linked together to generate new DNA strands by the DNA sequence of the bridge vertices. These new DNA strands will be processed as the initial solution space of the secondary graph G[V(Gj) ∪ V(Gj+1)], If ∣V(Gj)∣= tand ∣V(Gj+1)∣= t2 ,then the length of the DNA strands generated is (t1 +t2 -1)×l.

Step 2: Delete the false solutions of secondary graph G[V(Gj )∪V (Gj+1)] according to the method described in Section 3.7. That is, the DNA strands representing the color of the vertices incident to the edge in Ej,j+1 = {uv∈E(G); u∈V (Gj); v∈V (Gj+1)} will be amplified, but the false solutions will be eliminated.

Step 3: Repeat Steps 1 and 2 for the secondary and new combined subgraphs until the original graph is completely reshaped. Thus, the DNA strands representing the true solutions of G will be obtained.

《3.9. Detecting the true solutions》

3.9. Detecting the true solutions

The DNA strands representing the true solutions are sequenced and analyzed by Gene Tools to obtain the proper coloring of the graph.

《4. Implementation of the model》

4. Implementation of the model

To implement this DNA computing model, consider the graph with 61 vertices shown in Fig. 1 as an example. The scale of this problem is the largest to be solved by DNA computing. The protocol of the assay is given in this section. The result illustrates that the computational capability of the DNA computing model proposed here is as high as O(359).

《4.1. Dividing the graph and determining the color set》

4.1. Dividing the graph and determining the color set

According to the methods demonstrated in Section 3.2, graph G (Fig. 1) can be divided into four subgraphs (Fig. 4); the color set of each vertex is shown in Fig. 6.

《4.2. Encoding》

4.2. Encoding

According to the methods in Section 3.5, 129 oligonucleotides (Table 1) were designed to represent the possible colors of all vertices, and 185 probes were constructed according to the probe graph (Fig. 7).

All the oligonucleotides used in this article were synthesized by Sangon Biotech (Shanghai) Co., Ltd.

《4.3. Implementing the initial solution space of each subgraph》

4.3. Implementing the initial solution space of each subgraph

Here, we use the subgraph G1 as an example to show how to construct the initial solution space; the result is shown in Fig. 8(a). The steps and reaction systems were as follows:

Step 1: Phosphorylation. DNA sequences xi, where vi ∈ V(G1), xi∈Cv , and i =1,2...16, were phosphorylated at the 50 end by T4 PNK. In a sterile 0.2 mL amplification tube, the phosphonated reaction components were: 16.5 μL of oligonucleotide (0.5 μL for each xi; vi ∈ V (G1), xi ∈ Cv , i = 1; 2; ... ; 16), 3 μL of T4 PNK, 3 μL of buffer forT4 PNK, 5 μL of adenosine triphosphate (ATP) (10 mmol·L-1), and 2.5 μL of double-distilled water. The total volume of the mixture was 30 μL. The mixture was incubated at 37 °C for an hour.

Step 2: Annealing. In a sterile 0.2 mL amplification tube, the annealing reaction components were: 30 μL of the phosphorylated product, 23 μL of probes (0.5 μL for each probe), 3 μL of buffer for T4 PNK, and 4 μL of double-distiμLed water for a total volume of 60 μL. The mixture was incubated at 94 °C for 5 min and at 50 °C for 10 min, and was then slowly cooled to room temperature.

Step 3: Ligation. In a sterile 0.2 mL amplification tube, the ligation reaction components were: 6 μL of the annealing product, 2 μL of T4 DNA ligase, and 2 μL of buffer for T4 DNA ligase. The total volume of the mixture was 10 μL. The mixture was incubated at 16 °C overnight.

Step 4: PCR amplification. The ligation product was amplified by the primer pairs  . In a sterile 0.2 mL amplification tube, the PCR reaction components were: 1 μL of the template, 2 μL of the primer pairs (1 μL of each), 4 μL of dNTP Mix (2.5 mmol·L-1 for each), 0.5 μL of the Taq polymerase, and 5 μL of the buffer for the Taq polymerase, in a total volume of 30 μL. The reaction conditions were: 94 °C for 5 min; then 94 °C for 30 s, 54 °C for 30 s, and 72 °C for 45 s for 35 cycles; then 72 °C for 10 min.

After PCR, 4% agarose gel electrophoresis was used to display the products, and the 320 bp bands were cut and purified with a Gel Extraction Kit (U-gene) in a final volume of 50 μL. The DNA strands representing the possible solutions of subgraph G1 were denoted by S (G1) , where S(G1) = {x1x2.. .x16}.

These steps were also used to construct the initial solution space of G2, G3, and G4; the results are shown in Fig. 8(b–d). These results were denoted by S(G2), S(G3), and S(G4), respectively, where S(G2) = {x16x17.. .x31} ,  S(G3) = {x31x32.. .x46}, and  S(G4) = {x46x47.. .x61} , with χi∈Cvi and vi∈V(G).

PCR was also used to test the completeness of each initial solution space of each subgraph. Taking S(G1) as an example, a 50-fold dilution of the initial solution space S(G1) was amplified with primer pairs:  , where i =1, 2, ... ,16. These products are shown in Fig. 8(e,f).

《Fig.8》

《4.4. Deleting false solutions of the subgraph》

4.4. Deleting false solutions of the subgraph

In this section, we describe how to delete false solutions of the subgraph. PCR is also used to delete false solutions. According to the different walks, PCR operations were classified into three basic types:

The first type was designed for the walk or circle, and parallel PCR was applied to find the true solutions. We give the steps using a path P = 16–20–22–26–30 of subgraph G2 as an example.

First, the 50-fold dilutions of S(G2) were amplified by different primer pairs:    After PCR, five different DNA segments with different lengths were  obtained. The DNA segments of 100 bp  60 bp ,100 bp , 100 bp  and 40 bp  are shown in Fig. 9(a) and (b), where xi∈Cvi , i = 16, 17, .. ., 31. Lane 5 in Fig. 9(a) shows no product with the primer pair , which means that there is no DNA strand with r22 and y26 in the initial solution space. This is the first PCR.

《Fig.9》

Next, the second PCR was run using different templates and different primer pairs, and the five DNA segments were combined to form three segments. Here Ti (i = 1, 2, 3) is used to represent the different templates. T1 represents the mixtures of PCR products from Lanes 1 and 3 in Fig. 9, T2 represents the mixtures from Lanes 1 and 4 in Fig. 9(b) as the template, and T3 represents the mixtures from Lanes 2 and 5 in Fig. 9(b). With the primer pairs ,  , and  , three segments with 140 bp, 120 bp, and 120 bp, respectively, were acquired. The results are shown in Fig. 9(c).

Subsequently, the third PCR was run, and a 220 bp band was acquired with the primer pair  by using the mixtures of PCR products from Lane 6 in Fig. 9(a) and Lane 1 in Fig. 9(c). The result is shown in Fig. 9(d).

Finally, the mixtures from Lane 1 in Fig. 9(d) and Lane 2 in Fig. 9(c) and the mixtures from Lane 1 in Fig. 9(d) and Lane 3 in Fig. 9(c) were amplified separately by primer pair  , and two different 320 bp bands were acquired (see Fig. 9(e)). Thus, at the end of the PCR operations, two sets consisting of 320 bp DNA sequences were obtained. In these sets, the coloring of vertices 20, 22, 26, and 30 were ascertained as follows: b16r20y22 r26y30 r31 and b16r20y22 r26b30r31. These steps are shown in Fig. 9(f).

The second type was designed for the single edge whose two end vertices colors were undetermined. Here, we take e ={4, 8} in subgraph G1 as an example.

At first, the PCR products of the third cycle of subgraph G1 were amplified by the primer pairs   , and the 320 bp full-length DNA strands were divided into sequences of 80 bp , 100 bp , and 180 bp , respectively, where (χi=Cvi, i=1,2,...,16)  . The results are shown in Fig. 10(a); there is no product for .

《Fig.10》

Next, PCR was run with primer pair  , using the mixtures of PCR products from Lanes 1 and 3 in Fig. 10(a) as a template, and a 160 bp band was acquired (see Fig. 10(b)).

At last, the mixtures of PCR products from Lane 5 in Fig. 10(a) and Lane 1 in Fig. 10(b) were amplified by primer pair , and a 320 bp band was acquired. To test the color of v4 and v8, the 320 bp band was amplified by primer pairs , respectively. No product was obtained. Thus, the colors of vertices 4 and 8 were determined as r4 and y8.

The third type was designed to check the edges whose two end vertices colors had been determined. Here, e={39; 42 }was used to illustrate the operation.

After finishing all PCRs of C = 34–38–42–46–34 and the first PCR of C = 34–38–42–46–34 of subgraph G3, there were a total of six solutions (marked as (1) to (6) in Fig. 11), and the color of v39 and v42 could be determined as b39, y39, r42, and y42. In these cases, only the DNA strands with y39 and y42 are false solutions. To delete the false solutions, these six solutions were amplified by primer pair  separately; only the false solutions would produce 80 bp bands after PCR. The results are shown in Fig. 11(a), and the principle is shown in Fig. 11(b).

《Fig.11》

Fig. 11. PCR results for e ={39; 42}. (a) All lanes correspond to primer pair : Lanes 1 and 2 correspond to solution (1), Lanes 3 and 4 correspond to solution (2), Lanes 5 and 6 correspond to solution (3), Lanes 7 and 8 correspond to solution (4), Lanes 9 and 10 correspond to solution (5), and Lanes 11 and 12 correspond to solution (6); (b) an illustration of deleting false solutions; "/" stands for deleting false solutions.

《4.5. Finding the true solution of graph G》

4.5. Finding the true solution of graph G

To find the solutions of graph G; the solution spaces of the subgraphs must be combined, and then the false solutions must be deleted according to the operations described in Section 4.4.

First, the solution spaces mixtures of subgraphs G1 and G2 were amplified by primer pair . The PCR results were DNA strands with a 620 bp band, which represented all possible three-colorings of G[V1∪V2]. Meanwhile, false solutions might also exist in this space, because there were five edges connecting G1 and G2. After deleting the false solutions, four true solutions of G[V1∪V2] (X1–X4) were obtained (shown below). The results are shown in Fig. 12(a) and (b).

《Fig.12》

Fig. 12. The solutions for G[V1∪V2] . (a) Lanes 1 and 2 correspond to solutions X1 and X2 , respectively; (b) Lanes 1 and 2 correspond to solution X3 , and Lanes 3 and 4 correspond to solution X4 ; (c) Lanes 1 and 2 correspond to solution Y1 , and Lane 3 corresponds to solution Y2 ; (d) Lane 1 corresponds to solution Y3 , and Lane 2 corresponds to solution Y4 ; (e) Lane 1 corresponds to solution Y5 , Lane 2 corresponds to solution Y6 , Lane 3 corresponds to a positive control, and Lanes 4 and 5 correspond non-solution. All   the solutions denoted as Yj, where j = 1; 2; 3; 4; 5; 6, correspond to primer pair

Similarly, the solution spaces mixtures of subgraphs G3 and G4 were amplified by primer pair  to obtain the possible solutions of G[V3∪V4]After deleting the false solutions, six true solutions Y1–Y6 remained (shown below). The results are shown in Fig. 12(c)–(e).

Finally, the solution space of subgraphs G[V1∪V2] and G[V3∪V4] were combined by primer pair . This primer pair only resulted in a band corresponding to 1220 bp. After deleting false solutions, eight true solutions Z1–Z8 were obtained (shown below). The results are shown in Fig. 13.

《Fig.13》

Fig. 13. The solutions for G[V3[V4]. M3 is the DNA molecular weight marker 150 bp ladder. Lane 1 to Lane 8 correspond to solutions Z1 to Z8, respectively.

All eight solutions Z1–Z8 were purified and sequenced by Beijing AuGCT Biology Company. The sequences were analyzed by Gene Tools and improved to be the true solutions of graph G.

《5. Complexity analysis》

5. Complexity analysis

In this section, we analyze the complexity of this DNA computing model in two aspects: the complexity of the initial solution space and the complexity of the experiments. We provide a calculation formula to estimate the amount of initial solution space for typical and for worst cases.

《5.1. Calculation of the walks in a given graph》

5.1. Calculation of the walks in a given graph

To illustrate the complexity of the initial solution space, we first introduce a formula, which can be used to calculate the number of the walks between any two vertices.

Lemma 5.1 [28]: Let G be a graph with n vertices and V(G) = {v1,v2, .. ., vn}. A is the adjacent matrix of G. The number of walks of length l in G, from vi to vj, is the entry in position (i, j) of the matrix Al.

《5.2. Complexity analysis of decreasing the initial solution space》

5.2. Complexity analysis of decreasing the initial solution space

In previous DNA computing models for solving NP-complete problems, a large number of DNA molecules were frequently used to encode and enumerate an infinite amount of data. However, those models rarely considered decreasing false solutions by constructing the initial solution space; rather, they used the enumerating method. In fact, this enumerating idea does not work for large-scale problems for two reasons. First, with an increase of problem scale, the number of DNA molecules becomes too large to obtain. Second, even if the number of DNA molecules can be designed, the experiments will be too difficult to implement due to the larger size and greater number of molecules. Thus, it is necessary to find more efficient methods to decrease the initial solution space when designing a DNA computing model to solve NP problems. In this article, we designed a DNA computing model for the coloring problem by using an optimization idea to construct an initial solution space, rather than by enumerating.

Theorem 5.2: Suppose that G is a graph with |V(G)| = n and G1 is its subgraph, as proposed by Section 3.2, where V(G)={v1, v2, ..., vn}, with the bridge vertices v1 and vt. Let B(G1) be the probe graph of G1. Then the number of DNA sequences representing all possible coloring solutions in the initial solution space to G1 is equal to the number of paths connecting the bridge vertices v1 and vt with length t-1 in B(G1), denoted by Np(v1, vt), which is given by the following formula:

Proof: The vertex xi in 99:9997% represents a coloring of the vertex  vi  in  G1  by  the  definition  of  B(G1).  The  edge {xi, zi+1} ∈ B(G1) represents the probe between xi and zi+1. That is, when the color of vis x, the color of vi+1 can be chosen as z, where x; z ∈{r; y; b}. Consequently, each path from vertex r1 to vertex bt denotes one possible coloring of subgraph G1. Furthermore, all the paths between r1 and bt in B(G1)represent all the possible solutions of the initial solution space.

Considering the structure of the probe graph B(G1), we divide V(B( G1 )) into vt parts, where the vertex with subscript i is the ith part. It is concluded that there must at least exist an edge connect ing the ith part and the (i + 1)th part by the definition of a probe graph. Thus, the length of the paths between r1 and bt in B(G1) is only t-1. And each walk from r1 to bt is just the path between r1 and bt. Thus, the theorem is proved by Lemma 5.1.

Corollary 5.3: The number of DNA strands in initial solution spaces S(G1), S(G2), S(G3), and S(G4) of the subgraphs G1, G2, G3, and G4 shown in Fig. 4 are 89, 81, 412, and 151, respectively. The corresponding percentage of DNA strands deleted in the enumerable initial solution spaces are 99.9998%, 99.9998%, 99.999%, and 99.9997%, respectively.

Proof: Notice that vertices v1 and v16 are the bridge vertices of subgraph G1. Thus, the number of DNA strands in S(G1) is N33(r1,b16), which is the number of paths with length 15 from vertex r1 to b16 in B1 (see Fig. 7) by Theorem 5.2.

The adjacent matrix A(B1) of B1 is

The correspondence between the vertices of A(B1) and B1 is

According to Lemma 5.1, we have A(B1)15 (1; 33)=89, and 89/43046721≈0:000002=0.0002%. The result shows that 99.9998% of the false solutions are deleted when constructing the initial solution space S(G1).

Similarly, we have A(B2)15 (1, 32)=81, and 81/43046721≈0:00000188=0.000188 % for subgraph G2; A(B3)15 (1, 35)=412, and 412/43046721≈0:0000096=0.00096% for subgraph G3; and A(B4)15 (1, 32)=151, and 151/43046721≈0.0000035=0:00035% for subgraph G4, respectively. The results show that 99.9998%, 99.999%, and 99.9997% of the false solutions were deleted when constructing the initial solution spaces S(G2), S(G3), and S(G4), respectively.

《 5.3 Reducing operation complexity by parallel PCR technology》

 5.3 Reducing operation complexity by parallel PCR technology

In our experiment, false solutions are deleted by designed parallel PCR operation on the forward paths (see Section 3.7). In this way, the bio-operation times are greatly reduced. The bio-operation complexity is analyzed in this section.

Let P = vi0vi1.. .vim with │V(P)│ = m+ 1 be a forward path in subgraph G1 with bridge vertices v1 and vt . Then we must consider the relationships between the two bridge vertices and path P = vi0vi1.. .vim with three cases when we design corresponding parallel PCR operation:

Case 1: There are no bridge vertices in V(P). That is, vi0≠v1 and vim≠vt . For this case, we need to add the two bridge vertices so as to analyze the advantages of the parallel PCR operations (see Fig. 14(a)).

Case 2: There is one bridge vertex in V (P). That is, vi0 = v1 but vim≠v, or vi0≠v1 but vim = v. For this case, we need to add the one bridge vertex v1 or vt (see Fig. 14(b)).

Case 3: There are two bridge vertices in V(P). That is, vi0 = vand vim=vt . For this case, we need only to consider the ordering vertices in path P (see Fig. 14(c)).

Since the color of the bridge vertex v1 (or vt ) is fixed in advance, its adjacent vertices in Cases 2 and 3 are taken as the other colors aside from the color of v1 (or vt ). Since there are three cases shown in Fig. 14, we will only discuses Case 1, because the other two cases can be treated as special cases of Case 1. Let P belongs to Case 1, then P'=P-{vi, vi0} or P'=P-{vm, vt} should belongs to Case 2, and P''=P-{vi, vi0}- {vm, vt} will be Case 3.

Next, let us discuss the parallel PCR method. First, we determine the number and sequence of the vertices for the forward path P = vi0vi1...vim using the approach given in Fig. 14. Next, we perform the PCR operations. In practice, we do several times of PCR to fulfill one deleting false solutions operation. We found that the times of PCR operation are only relevant to the number of vertices or edges. Taking the first case above as an example, it is necessary to amplify the strands with the primer pairs <vi, vi0>, <vi0, vil>,... <vi(m-1), vim> and <vim, vt> in the first PCR operation. The second PCR operation is designed to connect the consecutive two segments of the results of the previous PCR operation. The following PCR processes perform similar operations to the second PCR operation. In Fig. 15, we refer to cases with five vertices (a path with two edges) and six vertices (a path with three edges) as examples in order to illustrate the process of parallel PCR, where the edges mean how we do the PCR operation to connect the DNA segments to get a completed DNA sequence.

It is concluded from Fig. 15 that the number of PCR operation is 3 for the case of the path with two edges, and for the case of the path with six edges the number of PCR operation is 4. Thus, we have Theorem 5.4.

Theorem 5.4: Suppose that P is the forward path of a subgraph with x edges. If x satisfies 2l-1 < x ≤2l, we define,< x > =2l. Let the number of PCR operation be represented as PCR (x). Then, it can be concluded that

 

Proof: First, we consider the case x + 2 = 2l. Suppose that the number of edges in path P is x. That is, the number of vertices in path P is x + 1. We already analyze the three cases between the two bridge vertices in a forward path in Fig. 14, and Cases 2 and 3 are the special cased of Case 1. So without loss of generality,  we focus our attention on Case 1.

《Fig.14》

Fig. 14. Three cases between the two bridge vertices and a forward path. (a) vi0 ≠v1  and vim ≠ vt ; (b) vi0 =v1  but vim ≠ vt , or vi0 ≠ v1  but vim = vt ; (c) vi0 = v1 and vim = vt .  

Since there are no bridge vertices in path P, it is necessary to increase two bridge vertices according to the requirements of the PCR operations, such that the number of vertices increases to x + 3. Then, we obtain the following equation: X+3 =2l +1

《Fig.15》

Fig. 15. An illustration of parallel PCR technology.

As a result, the number of primer pairs of the first PCR can be concluded as shown below:

   

Following the above operations, the primer pair number in the second PCR is 2l-1 . In general, the primer number in the lth PCR is concluded to be:

Thus, the PCR can be accomplished in the (l + 1)th time. In other words, the number of PCR operation required are l + 1 when x + 2 = 2l. Then, we obtain the following mathematical relation:

Similarly, if x + 2 = 2l+1, then the number of PCR operation satisfy the formula as follows:

Here, we demonstrate the case x + 2 = 2l + 1:

As shown above, the number of vertices in path P is x + 3 by increasing two bridge vertices. Thus, when x + 2 = 2l + 1, there are 2l + 2 vertices. The number of primer pairs in the first PCR is 2l + 1, as induced by the following formula: (x + 3) 1 = x + 2 = 2l + 1.

The number of primer pairs in the second PCR is then 2l-1 + 1. Similarly, the number of primer pairs in the (l + 1)th PCR is 2. The (l + 2)th PCR can be completed.

On the other hand, we can obtain the number of the PCR operation as follows:

Thus, the conclusion above is verified when x + 2 = 2l + 1.

For 2 ≤y < 2l , the number of PCR operation whenx + 2 = 2l + y is more than that of the case x + 2 = 2l+1 . The number of PCR operation for these two cases above is l+2. Thus, the conclusion holds when 2 ≤ y < 2l and x + 2 = 2l +y. Consequently, Theorem 5.4 is demonstrated. 

Theorem 5.4 fully shows the relations between the length x of forward path P and the number of the PCR operation. Of course, the number can be reduced by parallel PCR operations, which will greatly accelerate the computation speed. We can now see the advantages of parallel PCR operations, as shown in Table 2 and Fig. 16.

《Table 2》

Table 2 A comparison between parallel PCR operations and general PCR operations for a single edge

x is the number of edges, and 3x and PCR(x) denote the number of operation of general PCR and parallel PCR for a single edge, respectively.

《Fig.16》

Fig. 16. A comparison between parallel PCR and general PCR operations for a single edge.

Corollary 5.5: Let P be the forward path in a graph G1 with x edges. It can be induced by using Theorem 5.4 that

Proof: Let x +2=2l + y, where 1≤ y ≤2l . Then, we have

In addition, since y is less than 2l, then it is induced that 2l+1 >y + 2l. Consequently, the following consecutive inequalities hold: 

That is, x satisfies the relation below:

Then, we obtain

On the basis of L’Hospital’s rule, we obtain

Therefore, the formula is true.

《6. Conclusions》

6. Conclusions

Since shortly after the discovery of the double helix structure of DNA, the idea of using DNA molecules to process information has been a dream, one that originated from Feynman’s concept of constructing submicroscopic computers [1]. This dream was ultimately fulfilled by Adleman in 1994, when he established a DNA computing model for a Hamiltonian graph with seven vertices. These works greatly contributed to the development of DNA computing.

The success of our DNA computing model not only demonstrates its huge computational capability, but also provides a seemingly feasible method to solve the exponential explosion problem, which may be the biggest bottleneck preventing DNA computing from solving larger-scale NP problems. In this paper,  a graph vertex coloring problem with a computing complexity of O(359) is solved by using the advantages of the DNA computing model: the construction of an un-enumerating initial solution space, the division and combination of subgraphs, and parallel bio-operation.

By dividing subgraphs, reducing the vertex colors without losing solutions, and ordering the vertices in subgraphs, the exponential explosion problem is effectively avoided. According to Lemma 5.1, the initial solution space of each subgraph is 89, 81, 412, and 151, respectively, all of which are much lower than 316 (the worst complexity). Therefore, the values of DNA sequences in the unenumerating initial solution space are 99.9998%, 99.9998%, 99.999%, and 99.9997% lower than those in an enumerating initial solution space.

In this model, we also improve the computing efficiency by dividing the graph into four subgraphs and simultaneously deleting the false solutions of these subgraphs. After obtaining the solutions of each subgraph, the true solutions are found by combining the subgraphs. It is possible that the ability of this model to solve the graph vertex coloring problem will reach 100 vertices.

In this model, false solutions are deleted according to the path rather than the edge. Thus, all edges in one path are dealt with at the same time. In this way, the number of PCR operation required decreases. The efficiency of this DNA computing model is greatly improved by a significant innovation: the ‘‘parallel deleting method.”

In general, many researchers use PCR for amplification with only two primers and one template, because they think that PCR is not very reliable with many templates. However, in this model, we use PCR to amplify all templates in a test-tube. In order to obtain reliable results, the primer pairs are strictly designed according to the constraints of PCR, and the reaction conditions and the concentration of the components are optimized for each PCR. Only the bands representing the specific amplification are cut and purified for the next operation. In addition, contrast tests are implemented to avoid false positives and false negatives in the experiments. The sequencing shows that the results of the experiments are reliable. Because of its reliability and its ability to reduce the solution space, we assert that this DNA computing model can be used to solve larger-size problems, and that DNA molecular tools can be used to carry out larger-scale NP problems. In future, it may be possible to use DNA computing for certain practical engineering problems.

《Acknowledgements》

Acknowledgements

The authors are grateful for the support from the National Natural Science Foundation of China (61632002, 61379059, and 61572046).

《Compliance with ethics guidelines》

Compliance with ethics guidelines

Jin Xu, Xiaoli Qiang, Kai Zhang, Cheng Zhang, and Jing Yang declare that they have no conflict of interest or financial conflicts to disclose.