《1. Introduction》

1. Introduction

In 1961, Feynman [1] proposed the idea of molecular computing, which was demonstrated by Adleman 33 years later. In 1994, Adleman [2] presented a DNA computing model in which DNA molecules were used as data, and enzymes and biological operations were used as tools in information processing. Since then, the DNA computing model has been developed in several new dimensions including theory, experiments, and applications. Studies in the past decade have shown that the DNA computing model is superior in solving optimal and graphical problems, especially in non-deterministic polynomial-time (NP)-complete problems.

However, the main challenge of this model is that the size of DNA molecules used as data in information processing grows exponentially with an increase of problem size. This phenomenon is called exponential explosion, and is the biggest bottleneck preventing the development of DNA computing. To solve this problem, a parallel type of DNA computing model is proposed in this article, and several methods are used to overcome the biggest bottleneck.

In this section, we briefly introduce the latest developments in DNA computing research, and then outline the major innovative points of this article.

《1.1 Previous results》1.1 Previous results

In 1994, Adleman [2] first explored the computing feasibility of DNA molecules by presenting a DNA computing model for the Hamiltonian path problem. After that, many studies were designed to show the advantage of the huge parallelism that is inherent in DNA-based computing. In 1995, Lipton [3] proposed a DNA computing model to solve the satisfiability (SAT) problem. In 1997, Ouyang et al. [4] designed a DNA computing model for the maximal clique problem. By using the DNA hairpin formation, Sakamoto et al. [5] solved the SAT problem. Rothemund [6] attempted to use DNA as instruments to implement Turing machine. In addition, DNA computing was applied to manipulate gene expression [7,8]. Winfree [9] proposed a sticker model by DNA self-assembly. In 2002, Adleman’s research group invented a DNA computer to solve a 20-variable 3-SAT problem with a searching ability of 2^{20} [10].

Graph vertex coloring is a way of coloring the vertices of graph G such that no two adjacent vertices share the same color. Although it is a classical NP-hard problem, graph coloring arises naturally in a variety of applications such as timetable schedules [11], register allocation [12,13], and so forth. The analysis of approximation algorithms for graph coloring started with the work of Johnson [14]. Subsequently, O（n^{2/5}log^{8/5}n） colors were used to color a 3-colorable graph with n vertices by Blum [15]. Karger et al. [16] provided a randomized polynomial time algorithm that colors a 3-colorable graph on vertices with min {O(∆^{1/3}log^{1/2}∆log n), O(n^{1/4} log^{1/2}n)} colors. Schiermeyer[17] gave a complicated algorithm for deciding 3-colorability in less than time O(1.415^{n}). Beigel and Eppstein [18] improved the bound by giving a fast algorithm for (3, 2)-constraint satisfaction problem (CSP) for solving the 3-coloring problem in time O (1.3289^{n}).

The major research into graph vertex coloring based on DNA computing is as follows: In 1999, Jonoska et al. [19] proposed the potential application of three-dimensional (3D) structures in the field of DNA computing, and showed that some DNA structures could theoretically be used in DNA computing. In 2003, they constructed 3-arm and 4-arm DNA molecules to solve the graph vertex 3-coloring problem based on the aforementioned studies [20,21]. Liu et al. [22] proposed a DNA algorithm for the graph coloring problem based on surfaces. In 2003, Gao and Xu [23] presented a new DNA computing model for solving the graph vertex coloring problem, in which the technology of enzyme cutting was used to eliminate false solutions. In 2006, using the magnetic beads separation technology, Xu et al. [24] established a DNA computing model to solve the graph vertex coloring problem; this model was validated by a series of experiments.

《1.2 Our results》1.2 Our results

We now present a novel DNA computing model for the graph vertex coloring problem; although we focus on analyzing k = 3, this method can be generalized to the situation of k > 3.

The basic idea of our model is to reduce the initial solution space by using the optimal method and process the data by a parallel polymerase chain reaction (PCR) operation. First, a given graph is divided into several subgraphs in order to make the bio-operations easier and delete as many false solutions possible. The subgraphs can then be solved in parallel according to the following steps: Determine the order of the vertices, determine the color set of each vertex in the subgraph, encode the DNA sequences, determine the probes, construct the initial solution space, and delete the false solutions (see Sections 3.3–3.7). Finally, the subgraphs are combined into the graph and implemented to delete false solutions gradually (see Section 3.8).

The main points of this article are as follows: ① The exponential explosion problem is solved by dividing subgraphs, reducing the vertex colors without losing solutions, and ordering the vertices in subgraphs; ② the bio-operation times can be greatly reduced by using a parallel PCR technology. Thus, the processing speed is remarkably improved in this model.

A 3-colorable graph with 61 vertices (Fig. 1) is used as an example to demonstrate how to use this computing model to solve the graph vertex coloring problem. In general, for a 3-colorable graph, the computing complexity is 3^{n}. Therefore, the computing capacity of our parallel DNA computing model can reach O(3^{59}), when the colors of vertices v_{1} and v_{n} are given.

《Fig.1》

**Fig. 1. A 3-colorable graph G with 61 vertices.**

1.3. Outline of the article

This article is laid out as follows. In Section 2, we explain some notations and definitions of the graph coloring problem and PCR technology. In Section 3, the DNA computing model is illustrated, including algorithm steps, bio-operations, and technologies. In Section 4, a 3-coloring problem of a graph with 61 vertices is then solved by this computing model, and the specific experimental operations are described. Section 5 contains theory analysis, as we carefully examine the complexity of overcoming the exponential explosion phenomenon. In the last section, we conclude the article and point out the next possible research direction.

《2. Notation and definition》2. Notation and definition

《2.1 The graph coloring problem》2.1 The graph coloring problem

The work in this paper is always limited to finite, simple, and undirected graphs. In a given graph G, V(G), E(G), d_{G}(v), and Γ_{G}(v) denote the vertex set, edges set, degree of vertex v, and set of vertices adjacent to v of graph G, respectively, and are denoted using the short forms V, E, d_{G}, and Γ_{G}, respectively. We let V = {v_{1}, v_{2}, ..., v_{n}} be the vertex set of a graph G, and denote the degree of v_{i} as d(v_{i}) (abbreviated as d_{i}, i = 1, 2, .. ., n). A walk denoted by W is an alternating sequence of vertices and edges, beginning and ending with a vertex, respectively, where each vertex is incident to both the edge that precedes it and the edge that follows it in the sequence, and where the vertices that precede and follow an edge are the end of that edge. A sequence of vertices is denoted by W, where W = v_{1...}v_{k} (k≥0), beginning with vertex v1 and ending with vertex v_{k}, and where v_{i} and v_{i+1} are adjacent (i = 1, 2, .. ., k - 1). The length of a walk is the number of edges in it. A path is a walk in which all the vertices are distinct.

The vertex coloring of a graph G is an assignment from its vertex set to a color set such that two ends of each edge are assigned two different colors. Namely, it is a partition of V, where V(G)=V_{1}∪V_{2}∪...∪V_{k}, V_{i}≠∅ (empty set), and V_{i}∩V_{j}=∅ ,i=1, 2, …,k.

The minimum number of colors needed for such a coloring is called the chromatic number of G, and is usually denoted by χ(G) . If a graph G can be colored by k(k≥χ(G)) colors, then the color set is denoted by C_{k}(G). In this article, we limit ourselves to a 3-colorable graph G and we always assume that C_{3}(G) = {r, b, y}, where r, b, and y denote red, blue, and yellow, respectively. In other words, the 3-coloring problem of a given graph G can be solved if we find a map f: V(G) →{r, b, y} with the property ∀uv∈E(G) , f(u) ≠f(v). The 3-coloring problem is a classical NP-complete problem [25]. Any terms and notations not defined here can be found in Ref. [26].

Definition 2.1: Let G be a simple graph with the vertex set V(G) = {v_{1}, v_{2}, .. ., v_{n}}. Let v_{i1}v_{i2}.. .v_{in} be an ordering of V(G), where v_{ij }∈V(G), j = 1, 2, .. ., n, and v_{ij} ≠ v_{il} if j≠ l. The number of edges in the ordering v_{i1}v_{i2}.. .v_{in}, denoted by N_{edge}(v_{i1}v_{i2}.. .v_{in}), is defined as follows:

《2.2 Polymerase chain reaction》

2.2 Polymerase chain reaction

PCR is a technique that amplifies a few copies of DNA across several orders of magnitude, thereby generating millions or more copies of a particular DNA sequence. The three parts of PCR are carried out in the same vial with different temperatures. The three steps in PCR are: the denaturation of the strands, annealing the primer to the template, and the extension of new strands. At the end of a cycle, each DNA template in the vial has been exponentially duplicated. After n cycles, the number of the DNA molecules is y = (1 + x)^{n}, where y is the copy number of DNA molecules, x is the efficiency of amplification, and n is the number of cycles.

3. DNA computing model

In this section, we describe the DNA computing model in detail, including algorithm steps, bio-operations, and technologies.

《3.1 The algorithm of the model》3.1 The algorithm of the model

In this model, limited PCR operations are exploited to delete the false solutions of a graph, and the DNA strands representing the true solutions of a given graph are sequenced to obtain the coloring of the given graph. The algorithm of this model is shown in Fig. 2.

《Fig.2》

**Fig. 2. A flowchart of the algorithm.**

3.2. Subgraph division and bridge vertices determination

Subgraph division can improve the computing efficiency in two aspects: On the one hand, the experiment can be operated more easily; on the other hand, most of false solutions can be deleted when constructing the initial solution space. Therefore, the appropriate division of a given graph G is of great significance in our model. Here G_{j}, V(G_{j}) ∈V(G), E(G_{j}) ∈E(G), where j = 1, 2, .. ., m-1, is used to denote the primary subgraph. The method of subgraph division is demonstrated as follows:

**Step 1: **Determine the size of the subgraphs. The number of vertices in each subgraph should be consistent, ranging from 15 to 20. In addition, the edges in a subgraph should be as numerous as possible. In this way, the experiments can be conveniently performed, and most of the false solutions will be deleted when the initial solution space is constructed for each subgraph. Taking the graph G in Fig. 3 as an example, there are two divisions. In one case, the graph can be divided into two subgraphs G01 with vertex set V'_{1}= {1,2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22}, and G'_{2} with V'_{2} ={8, 9, 10, 11, 12, 13, 14, 15, 23, 24, 25, 26, 27, 28, 29, 30 } In another case, the graph can be divided into two subgraphs G'_{3} with V'_{3} ={1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15} and G'_{4} with V'_{4} ={16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30}. Obviously, the latter division method is better because the number of edges in subgraphs G'_{3} and G'_{4} is greater than in G'_{1} and G'_{2} .

《Fig.3》

**Fig. 3. An example of primary subgraph division. (a) Graph G; (b) subgraph G ^{'}_{1} ; (c)subgraph G^{'}_{2} ; (d) subgraph G^{'}_{3} ; (e) subgraph G^{'}_{4} .**

**Step 2:** Obtain the first subgraph G'_{1} with the bridge vertices u_{1} and u_{2}. The bridge vertices should be adjacent and the sum of the degree of the bridge vertices should be maximal. If more than one pair of bridge vertices satisfy these conditions, we select the one pair (u_{1} and u_{2}) that result in N_{edge}(u_{1} = v_{i1} v_{i2} .. . v_{in} = u_{2}) being the greatest (see Section 3.3 for more details). If more than one pair of bridge vertices satisfy this condition, we then choose any pair.

**Step 3:** Determine the second subgraph G^{'}_{2} containing bridge vertiex u_{2} and a new bridge vertex u_{3}, which can be found using the following methods: The bridge vertex should be adjacent to the vertex u_{2}, and the degree is maximal in G^{'}_{2}. If more than one pair of bridge vertices satisfy these conditions, then choose one pair according to Step 2.

**Step 4: **Perform similar operations as those in Step 3 to obtain the other subgraphs with their bridge vertices.

It should be noted that: ① the size of the vertex set of the last subgraph G_{m} may not be equal to the others; and ② the bridge vertices in the last subgraph G_{m} may be u_{1} and u_{m-1}, which is the bridge vertex of G_{m-1} if u_{1} and u_{m-1} are adjacent.

In our DNA computing model, the bridge vertices are very important for the subgraph combination. The principle in choosing bridge vertices is that the sum of the degrees of two bridge vertices in one subgraph should be as large as possible. In this way, the size of the color set of the vertices adjacent to bridge vertices can be reduced (see Section 3.3). Moreover, the size of the initial solution space can also be reduced.

According to the steps above, the four divided subgraphs of graph G in Fig. 1 are shown in Fig. 4; the bridge vertices are 1, 16, 31, 46, and 61, respectively.

《Fig.4》

**Fig. 4. Four primary subgraphs of graph G (shown in Fig. 1). (a) Subgraph G _{1}; (b) subgraph G_{2}; (c) subgraph G_{3}; (d) subgraph G_{4}.**

3.3 Determining the order and color set for each vertex in the subgraphs

After subgraph division, the bridge vertices in each subgraph are determined. The algorithm for ordering the vertices in a subgraph should satisfy the following conditions: ① The bridge vertices must be considered as two endpoints in the ordering of the vertices subgraphs; and ② the number of edges in ordering (see Definition 2.1) should be as great as possible. Here, we take subgraph G_{1 }as an example. Suppose that the vertex set of subgraph G_{1} is V(G_{1}) = {v_{1}, v_{2}, ..., v_{t}} with the bridge vertices v_{1} and v_{t}. Without loss of generality, let the vertex sequence be v_{1} = v_{i1}v_{i2}.. .v_{it }= v_{t}, which satisfies the following constraint:

here, we give the algorithm for ordering vertices. Let v be a vertex of graph G. We start by defining the set of v-rooted paths, denoted by P(v), then define all the paths starting with the vertex v in G. The set of v-rooted paths P(v) in G can be determined as follows: First, the neighborhood of v is determined; next, the neighborhood of each vertex in Γ(v) that is represented as Γ^{2}(v) without v is obtained. Thus, Γ^{3}(v), Γ^{4}(v) , etc. are obtained by the same method above, which is called a pruning algorithm. Because the number of vertices in each subgraph is less than 20, this algorithm can be implemented easily.

Suppose that G_{1} with V(G) = {v_{1}, v_{2}, .. ., v_{t}} is a subgraph, where v_{1} and v_{t} are the bridge vertices. We now give the algorithm for ordering the vertices of the subgraph G_{1}:

**Step 1**: Determine on the rooted paths P(v_{1}) and P(v_{t}). If the v_{1}-rooted paths P'(v_{1}) contain a Hamiltonian path of subgraph G_{1}, then order the vertices according to the Hamiltonian path. Otherwise, order the vertices according to the following steps.

**Step 2**: If P(v_{1}) ∪ P(v_{t}) = V(G_{1}), then we consider path P_{1} in P(v_{1}) and path P_{2} in P(v_{t}), respectively, where P_{1} and P_{t} satisfy the following conditions: ① P_{1} = v_{1}v_{i2.}.. .v_{ir} and P_{t} = v_{is}v_{i(s+1)}.. .vt; ② max{|E(P_{1})| + |E(P_{t})|}; and ③ V(P_{1}) ∩ V(P_{2}) =∅.

**Step 3:** If V'=V(G_{1})- P(v_{1}) ∪ P(v_{t})=∅ , then we further find the longest path from the induced subgraph G[V '] and represent it as P'=u_{1}u_{2.}..u_{m} , If V''=V(G_{1})-V(P_{1})-V(P_{t})-V'=∅, then the ordering of vertices for the subgraph G_{1} is v_{1}v_{i2}v_{i3}.. .v_{ir}u_{1}u_{2}... u_{m}v_{is}v_{i(s+1)}.. .v_{t}. Otherwise, if V'' =V(G_{1})—V (P_{1})— V(P_{t})— V'={u_{i1}, u_{i2}, ... ; u_{iq}≠∅ , then the ordering of vertices for the subgraph G_{1} is v_{1}v_{i2}v_{i3}.. .v_{ir}u_{1}u_{2}.. .u_{m}u_{i1}u_{i2}.. .u_{iq}v_{is}v_{i(s+1)}.. .v_{t}.

Consider subgraph G'_{3} with the bridge vertices 1 and 13 as an example (Fig. 3). The path 1–10–11–2 is picked out from P(1), and the matched path 13–14–7–8–15–3–9–4 is also obtained from P(13). Then the path 5–6 is found in the remaining vertices. Finally, V'' ={12}. A vertex sequence of G'_{3} and the edges incident to the vertices are shown in Fig. 5(a). A similar vertex sequence of G'_{3} is shown in Fig. 5(b).

《Fig.5》

**Fig. 5. The ordering vertices and set of the corresponding 11 edges for subgraph G' _{3} in Fig. 3(d).**

The ordering vertices 1–2–.. .–16 in G1 in Fig. 4 are the best order because P_{1} = 1–2–.. .–16. P_{1} = 1–2–.. .–16 is a Hamiltonian path of G_{1}, and 1 and 16 are the bridge vertices in G_{1}. Another ordering of the vertices in G_{1} is 1–3–5–7–9–11–13–15–2–4–6–8– 10–12–14–16, which has only five edges {5; 7} ,{9; 11}, {13; 15},{15; 2}, and {6; 8}.

**Step 4: **Relabel the vertices in the subgraphs. Let the vertex sequence be v_{1}v_{2}.. .v_{t} before labeling, and v_{σ(1)}v_{σ(2)}...v_{σ(t)} after reordering. Obviously, the latter vertex ordering is obtained by the permutation of the vertex subscript:

For example, for the vertex ordering of subgraph G'_{3} in Fig. 5, the mapping of the reordered vertices can be shown as follows:

In general, the vertices of a subgraph should be relabeled after ordering, and the vertex σ(i) will be relabeled by i, where i = 1, 2, ..., t.

The chromatic number k in this article is 3, and the color set C(3) = {red, yellow, blue} = {r, y, b}. The variables r_{i}, y_{i}, and b_{i }indicate that the color of vi can be chosen as red, yellow, or blue.

Considering subgraph G_{1}, where V(G_{1}) = {v_{1}, v_{2}, .. ., v_{t}}, the vertices v_{1} and v_{t} are the bridge vertices in G_{1}. In our model, the bridge vertices are always adjacent, so vertices v_{1} and vt can be colored red (r_{1}) and blue (b_{t}) in advance. (It should be noted that vertex v_{t }can be colored red or yellow, of course, when it is not adjacent with a vertex colored blue. The true solutions with r_{t} or y_{t} can be obtained by color permutation according to the coloring with b_{t}.) The vertices belonging to the set will be blue or yellow, and those belonging to will be red or yellow. The color sets of the vertices of the four subgraphs in Fig. 4 are shown in Fig. 6.

《Fig.6》

**Fig. 6. All possible colors for the vertices of the subgraph when C(1) ={r1}, C(16) ={b _{16}}, C(31) = {r_{31}}, C(46)={b46}, and C(61)={y_{61}} are designated in advance. Color set of each vertex in (a) subgraph G_{1}, (b) subgraph G_{2}, (c) subgraph G_{3}, and (d) subgraph G_{4}**.

This method can be generalized to any situation where k > 3. If a graph G can be colored by k(k≥χ(G)) colors, then the color set is denoted by C_{k}(G), which can be defined as C_{3}(G) = {c_{1}, c_{2}, .. ., c_{k}}. Then the k-coloring problem of a given graph G can be solved if we find a map f: V(G)→C3(G) with the property that "uv 2 E(G), f(u) ≠f(v). The other steps are completely the same as for the situation where k = 3.

This method greatly reduces the complexity of the initial solution space. The relevant formula and the analysis are given in Sections 3.2 and 5.2, respectively.

3.4. Encoding

As DNA molecules work through specific hybridization in the computing process, encoding in DNA computing will be constrained by many factors such as the Gibbs free energy of chemistry, melting temperature, similarity of DNA sequences, length of DNA sequences, experimental environment, scale of problems, and so forth. Therefore, encoding is a very complicated and difficult problem in DNA computing, and is considered to be an NP- complete problem by several scholars. Many algorithms of DNA encoding based on various constraints have been formulated thus far. To provide good DNA strands for both specific hybridization and PCR in this article, the DNA strands are designed based on the following constraints:

• All sequences have no occurrence of five or more consecutive identical bases;

• The GC content of all sequences is 40%–60%;

• There are no more than eight of the same bases between any two sequences;

• There are no more than four matched bases in every sequence;

• For every sequence, five consecutive bases in both the 3' and 5' ends are mismatched with any five bases in other sequences;

• The absolute value of the change in the Gibbs free energy of chemistry （│ΔG│） of the primer dimer formed by its own sequence is no more than 6.0 kcal·mol^{-1} (1 cal = 4.1868 J); and

• The absolute value of the change in the Gibbs free energy of chemistry （│ΔG│） of the primer dimer is no more than 9.0 kcal·mol^{-1}, and the absolute value of the change in the Gibbs free energy of chemistry （│ΔG│） of the primer dimer formed at the 30 end is no more than 6.0 kcal mol^{-1}.

Based on these constraints, the method described in Ref. [27] was used to design the DNA strands; next, the proper sequence was selected using the software Primer Premier 5.0.

It is easy to prove the following theorem.

**Theorem 3.1:** Let G be a 3-coloring graph with bridge vertices v1 and vn. According to the possible color set of each vertex, it can be determined that the number of DNA sequences representing vertex colors for any 3-coloring graph with n vertices is

**Table 1 ****DNA sequences for each x _{i}.**

《3.5. Determining probes》

3.5. Determining probes

The advantages of DNA computing are its huge parallelism and its great capacity for information storage. Over decades of research, it has been recognized that the biological activity in artificial DNA molecules is the main feature of the DNA computer. In particular, the gravitation between hydrogen bonds dominates the process. By making use of this feature, most of the false solutions can be eliminated from the initial solution space. This eliminating process cannot be carried out by electronic computers. To implement this idea, the method of designing hybridization probes plays a significant role in deleting false solutions.

In this model, the method of designing probes is described as follows (with chromatic number k = 3 as an example):

**Step 1: **Determine the color sets of each vertex. Let V(G) = {v_{1}, v_{2}, .. ., v_{n}}. Without loss of generality, let its ordering of vertices be v_{1}v_{2}.. .v_{n}. The set of possible colors for each vertex v_{i} is denoted by C_{vi} , where i = 1, 2, .. ., n.

**Step 2: **Determine the set of probes. The probes are used to con_{i} and v_{i+1}, denoted by , where x_{i }∈Cvi , x_{i+1} ∈ C_{vi+1} , and i = 1, 2, .. ., n — 1. If then xi and xi+1 cannot have the same color, and vice versa. For example, C_{v7} ={r_{7, }y_{7}}, C_{v8} ={r_{8}; y_{8}}, and _{7}v_{8} ∈E（G_{1}） exist in subgraph G_{1} (see Fig. 4), so only two probes are constructed.

**Step 3: **Determine the DNA sequence for each probe. The probes consist of the last half of the DNA sequence of xi and the first half of the DNA sequence of x_{i+1}. In this article, the length of the probe is 20 bases. For example, if y_{7} =5'-AATACGCACTC ATCACATCG-3' and r_{8} = 5'-GACCTTACCGTTTAGAGTCG-3', then =5'-CGGTAAGGTCCGATGTGATG-3'.

To explain probe sets clearly, we utilize the graph theory method. This method is very useful for reducing initial solution space and theoretical analysis.

The probe graph of a graph G with n vertices, denoted by B G (abbreviated as B), is defined as follows:

According to the definition of probes and Fig. 6, the 3-coloring probe graphs of four subgraphs (see Fig. 4) are shown as B_{1}, B_{2}, B_{3}, and B_{4} in Fig. 7. It is easy to identify information on probe numbers and distribution from each probe graph.

《Fig.7》

**Fig. 7. Probe graphs for the subgraph shown in Fig. 4. (a) B _{1}; (b) B_{2}; (c) B_{3}; (d) B_{4}.**

《3.6. Construction of the initial solution space》

3.6. Construction of the initial solution space

The method presented by Adleman [2] is used to construct the initial solution space of each deduced subgraph. The DNA sequences representing the possible solutions in the initial solution space are amplified in a series of operations:

• First, the 5' -ends of DNA sequences xi are phosphorylated by T4 polynucleotide kinase (PNK);

• Second, the product of phosphorylation and probes are mixed in 10 × T4 PNK buffer for annealing;

• Third, the annealed products are ligated by T4 DNA ligase in 10 × T4 DNA ligase buffer; and

• Finally, the ligated products are amplified by PCR with the primer pair . The amplified DNA strands represent all possible solutions of each subgraph.

in this article, the initial solution space of subgraph Gj is denoted by S(G_{j}), where j = 1, 2, .. ., m, and m is the number of sub- graphs. Let t = |G_{j}| and V(G) = {v_{1}, v_{2}, .. ., vt}. Then S(G_{j}) = {x_{1}x_{2}.. .x_{t}}, where x_{i}∈ C_{vj} c {r, b, y}, and i = 1, 2, .. ., t. If each xi has l bases, then the length of the DNA sequence in the initial solution space will be t × l.

It should be noted that if e = v_{i}v_{i+1} ∈E(G) then x_{i }and x_{i+1} cannot share the same color when constructing the probes. In this way, the false solutions in which adjacent vertices have the same color will be deleted when constructing the initial solution space. Thus, the more edges e = {v_{i}, v_{i+1}} there are, the more false solutions will be deleted in the initial solution space. The details of the theoretical analysis will be shown in Section 5.2.

3.7. Deleting false solutions

In this section, the methods of deleting the false solutions of each subgraph are described. Let E_{j}^{1} = {v_{i}v_{i+1} ∈E(G), v_{i}v_{t} ∈ E(G_{j}); i = 1, 2, .. ., t — 1}, and E_{j}^{2} = {v_{1}v_{i}∈E(G_{j}), i = 2, 3, .. ., t}. Then E^{3}_{j }= E(G_{j}) — E_{j}^{1} — E_{j}^{2} and j = 1, 2, .. ., m.

In fact, the process of deleting false solutions involves eliminating DNA strands that represent the improper coloring of the vertices incident to the edges in E_{j}^{3} from the initial solution space S(G_{j}). In the process of bio-operation, PCR is used to delete false solutions. To improve the processing speed, the edges in E_{j}^{3} are divided into two groups and each group is operated using different procedures.

(1) **Forward Path. **In our experiment, a false solution is deleted by a so-called parallel PCR operation on the forward paths. Here, we introduce a definition of forward path.

Let G_{1} be a subgraph with V(G_{1}) = {v_{1}, v_{2,} .. ., v_{t}}, and let v_{1}, v_{2,} .. ., v_{t} be a vertex ordering of G_{1} according to the method shown in Section 3.3. For , if i＜j , then e = v_{i}v_{j} is a forward edge; otherwise, it is a backward edge. If all the edges in a path are forward edges, then the path is a forward path; otherwise, it is a backward path.

Now, for example, we take a path P= v_{i}v_{j}v_{y}v_{z} , _{i}v_{j}, v_{j}v_{y} , and v_{y}v_{z} . First, PCR is done with primer pairs:, and , where x_{i} and x_{j}, x_{j} and x_{y}, and x_{y} and x_{z} should not have the same color at the same time. In this way, the DNA strands are separated into five fragments. Subsequently, these small fragments will be combined gradually to generate DNA strands with t×l base pairs (bp). Thus, at the end of PCR operations, the DNA strands representing the improper coloring of these vertices incident to the three edges will be eliminated from S（G_{j}）. Detailed experimental methods and the theoretical analysis are given in Sections 4.4 and 5.3, respectively.

(2) **Single edge.** Some single edges exist in E_{j}^{3} after deleting all false solutions from forward paths. Let e= v_{i}v_{j}, x_{i }∈ C_{vi} , and x_{j} ∈ C_{vj} . PCR is run with primer pairs ,Here, x_{i} and x_{j} should not have the same color. In this way, the DNA strands representing improper coloring of the vertices incident to the edge are not amplified. That is, the DNA strands representing false solutions are eliminated. When the DNA strands representing improper coloring of the vertices incident to the edges in E_{j}^{3} are all deleted, the amplified DNA strands in S(G_{j}) represent the colorings of subgraph G_{j}, where j = 1, 2, .. ., m.

《3.8. Subgraph combination and deletion of false solutions》

3.8. Subgraph combination and deletion of false solutions

After obtaining the true solutions of each subgraph, these subgraphs are combined to generate secondary subgraphs. Then the process of deleting false solutions continues. This process involves the following three steps:

**Step 1:** Combine subgraphs G_{j} and G_{j+1}, where j = 1, 2, .. ., m-1. The DNA strands representing the improper colors of the subgraphs will be linked together to generate new DNA strands by the DNA sequence of the bridge vertices. These new DNA strands will be processed as the initial solution space of the secondary graph G[V(G_{j}) ∪ V(G_{j+1})], If ∣V(Gj)∣= t_{1 }and ∣V(Gj+1)∣= t_{2} ,then the length of the DNA strands generated is (t_{1} +t_{2} -1)×l.

**Step 2: **Delete the false solutions of secondary graph G[V(G_{j} )∪V (G_{j+1})] according to the method described in Section 3.7. That is, the DNA strands representing the color of the vertices incident to the edge in E_{j},_{j+1} = {uv∈E(G); u∈V (G_{j}); v∈V (G_{j+1})} will be amplified, but the false solutions will be eliminated.

**Step 3:** Repeat Steps 1 and 2 for the secondary and new combined subgraphs until the original graph is completely reshaped. Thus, the DNA strands representing the true solutions of G will be obtained.

《3.9. Detecting the true solutions》

3.9. Detecting the true solutions

The DNA strands representing the true solutions are sequenced and analyzed by Gene Tools to obtain the proper coloring of the graph.

4. Implementation of the model

To implement this DNA computing model, consider the graph with 61 vertices shown in Fig. 1 as an example. The scale of this problem is the largest to be solved by DNA computing. The protocol of the assay is given in this section. The result illustrates that the computational capability of the DNA computing model proposed here is as high as O(3^{59}).

4.1. Dividing the graph and determining the color set

According to the methods demonstrated in Section 3.2, graph G (Fig. 1) can be divided into four subgraphs (Fig. 4); the color set of each vertex is shown in Fig. 6.

《4.2. Encoding》4.2. Encoding

According to the methods in Section 3.5, 129 oligonucleotides (Table 1) were designed to represent the possible colors of all vertices, and 185 probes were constructed according to the probe graph (Fig. 7).

All the oligonucleotides used in this article were synthesized by Sangon Biotech (Shanghai) Co., Ltd.

《4.3. Implementing the initial solution space of each subgraph》4.3. Implementing the initial solution space of each subgraph

Here, we use the subgraph G_{1} as an example to show how to construct the initial solution space; the result is shown in Fig. 8(a). The steps and reaction systems were as follows:

**Step 1: Phosphorylation.** DNA sequences xi, where vi ∈ V(G_{1}), x_{i}∈Cv , and i =1,2...16, were phosphorylated at the 50 end by T4 PNK. In a sterile 0.2 mL amplification tube, the phosphonated reaction components were: 16.5 μL of oligonucleotide (0.5 μL for each x_{i}; v_{i} ∈ V （G_{1}）, xi ∈ Cv , i = 1; 2; ... ; 16), 3 μL of T4 PNK, 3 μL of buffer forT4 PNK, 5 μL of adenosine triphosphate (ATP) (10 mmol·L^{-1}), and 2.5 μL of double-distilled water. The total volume of the mixture was 30 μL. The mixture was incubated at 37 °C for an hour.

**Step 2: Annealing. **In a sterile 0.2 mL amplification tube, the annealing reaction components were: 30 μL of the phosphorylated product, 23 μL of probes (0.5 μL for each probe), 3 μL of buffer for T4 PNK, and 4 μL of double-distiμLed water for a total volume of 60 μL. The mixture was incubated at 94 °C for 5 min and at 50 °C for 10 min, and was then slowly cooled to room temperature.

**Step 3: Ligation. **In a sterile 0.2 mL amplification tube, the ligation reaction components were: 6 μL of the annealing product, 2 μL of T4 DNA ligase, and 2 μL of buffer for T4 DNA ligase. The total volume of the mixture was 10 μL. The mixture was incubated at 16 °C overnight.

**Step 4: PCR amplification.** The ligation product was amplified by the primer pairs . In a sterile 0.2 mL amplification tube, the PCR reaction components were: 1 μL of the template, 2 μL of the primer pairs (1 μL of each), 4 μL of dNTP Mix (2.5 mmol·L^{-1} for each), 0.5 μL of the Taq polymerase, and 5 μL of the buffer for the Taq polymerase, in a total volume of 30 μL. The reaction conditions were: 94 °C for 5 min; then 94 °C for 30 s, 54 °C for 30 s, and 72 °C for 45 s for 35 cycles; then 72 °C for 10 min.

After PCR, 4% agarose gel electrophoresis was used to display the products, and the 320 bp bands were cut and purified with a Gel Extraction Kit (U-gene) in a final volume of 50 μL. The DNA strands representing the possible solutions of subgraph G_{1} were denoted by S (G_{1}) , where S(G_{1}) = {x_{1}x_{2}.. .x_{16}}.

These steps were also used to construct the initial solution space of G_{2}, G_{3}, and G_{4}; the results are shown in Fig. 8(b–d). These results were denoted by S(G_{2}), S(G_{3}), and S(G_{4}), respectively, where S(G_{2}) = {x_{16}x_{17}.. .x_{31}} , S(G_{3}) = {x_{31}x_{32}.. .x_{46}}, and S(G_{4}) = {x_{46}x_{47}.. .x_{61}} , with χ_{i}∈C_{vi} and v_{i}∈V(G).

PCR was also used to test the completeness of each initial solution space of each subgraph. Taking S(G_{1}) as an example, a 50-fold dilution of the initial solution space S(G_{1}) was amplified with primer pairs: , where i =1, 2, ... ,16. These products are shown in Fig. 8(e,f).

《Fig.8》

《4.4. Deleting false solutions of the subgraph》4.4. Deleting false solutions of the subgraph

In this section, we describe how to delete false solutions of the subgraph. PCR is also used to delete false solutions. According to the different walks, PCR operations were classified into three basic types:

The first type was designed for the walk or circle, and parallel PCR was applied to find the true solutions. We give the steps using a path P = 16–20–22–26–30 of subgraph G2 as an example.

_{2}) were amplified by different primer pairs: After PCR, five different DNA segments with different lengths were _{i}∈C_{vi} , i = 16, 17, .. ., 31. Lane 5 in Fig. 9(a) shows no product with the primer pair , which means that there is no DNA strand with r22 and y_{26} in the initial solution space. This is the first PCR.

《Fig.9》

Next, the second PCR was run using different templates and different primer pairs, and the five DNA segments were combined to form three segments. Here T_{i} (i = 1, 2, 3) is used to represent the different templates. T_{1} represents the mixtures of PCR products from Lanes 1 and 3 in Fig. 9, T_{2} represents the mixtures from Lanes 1 and 4 in Fig. 9(b) as the template, and T_{3} represents the mixtures from Lanes 2 and 5 in Fig. 9(b). With the primer pairs , , and , three segments with 140 bp, 120 bp, and 120 bp, respectively, were acquired. The results are shown in Fig. 9(c).

Subsequently, the third PCR was run, and a 220 bp band was acquired with the primer pair by using the mixtures of PCR products from Lane 6 in Fig. 9(a) and Lane 1 in Fig. 9(c). The result is shown in Fig. 9(d).

Finally, the mixtures from Lane 1 in Fig. 9(d) and Lane 2 in Fig. 9(c) and the mixtures from Lane 1 in Fig. 9(d) and Lane 3 in Fig. 9(c) were amplified separately by primer pair , and two different 320 bp bands were acquired (see Fig. 9(e)). Thus, at the end of the PCR operations, two sets consisting of 320 bp DNA sequences were obtained. In these sets, the coloring of vertices 20, 22, 26, and 30 were ascertained as follows: b_{16}r_{20}y_{22} r_{26}y_{30} r_{31} and b_{16}r_{20}y_{22} r_{26}b_{30}r_{31}. These steps are shown in Fig. 9(f).

The second type was designed for the single edge whose two end vertices colors were undetermined. Here, we take e ={4, 8} in subgraph G_{1} as an example.

_{1} were amplified by the primer pairs , and the 320 bp full-length DNA strands were divided into sequences of 80 bp , 100 bp , and 180 bp , respectively, where (χi=Cvi, i=1,2,...,16) . The results are shown in Fig. 10(a); there is no product for .

《Fig.10》

Next, PCR was run with primer pair , using the mixtures of PCR products from Lanes 1 and 3 in Fig. 10(a) as a template, and a 160 bp band was acquired (see Fig. 10(b)).

At last, the mixtures of PCR products from Lane 5 in Fig. 10(a) _{4} and v_{8}, the 320 bp band was amplified by primer pairs , respectively. No product was obtained. Thus, the colors of vertices 4 and 8 were determined as r_{4} and y_{8}.

The third type was designed to check the edges whose two end vertices colors had been determined. Here, e={39; 42 }was used to illustrate the operation.

After finishing all PCRs of C = 34–38–42–46–34 and the first PCR of C = 34–38–42–46–34 of subgraph G_{3}, there were a total of six solutions (marked as (1) to (6) in Fig. 11), and the color of v39 and v42 could be determined as b39, y_{39}, r_{42}, and y_{42}. In these cases, only the DNA strands with y_{39} and y_{42} are false solutions. To delete the false solutions, these six solutions were amplified by primer pair separately; only the false solutions would produce 80 bp bands after PCR. The results are shown in Fig. 11(a), and the principle is shown in Fig. 11(b).

《Fig.11》

**Fig. 11. PCR results for e ={39; 42}. (a) All lanes correspond to primer pair : Lanes 1 and 2 correspond to solution (1), Lanes 3 and 4 correspond to solution (2), Lanes 5 and 6 correspond to solution (3), Lanes 7 and 8 correspond to solution (4), Lanes 9 and 10 correspond to solution (5), and Lanes 11 and 12 correspond to solution (6); (b) an illustration of deleting false solutions; "/" stands for deleting false solutions.**

To find the solutions of graph G; the solution spaces of the subgraphs must be combined, and then the false solutions must be deleted according to the operations described in Section 4.4.

First, the solution spaces mixtures of subgraphs G1 and G2 were amplified by primer pair . The PCR results were DNA strands with a 620 bp band, which represented all possible three-colorings of G[V_{1}∪V_{2}]. Meanwhile, false solutions might also exist in this space, because there were five edges connecting G_{1} and G_{2}. After deleting the false solutions, four true solutions of G[V_{1}∪V_{2}] (X_{1}–X_{4}) were obtained (shown below). The results are shown in Fig. 12(a) and (b).

《Fig.12》

**Fig. 12. The solutions for G[V _{1}∪V_{2}] . (a) Lanes 1 and 2 correspond to solutions X_{1} and X_{2} , respectively; (b) Lanes 1 and 2 correspond to solution X_{3} , and Lanes 3 and 4 correspond to solution X_{4} ; (c) Lanes 1 and 2 correspond to solution Y_{1} , and Lane 3 corresponds to solution Y_{2} ; (d) Lane 1 corresponds to solution Y_{3} , and Lane 2 corresponds to solution Y_{4} ; (e) Lane 1 corresponds to solution Y_{5} , Lane 2 corresponds to solution Y_{6} , Lane 3 corresponds to a positive control, and Lanes 4 and 5 correspond non-solution. All the solutions denoted as Y_{j}, where j = 1; 2; 3; 4; 5; 6, correspond to primer pair **

Similarly, the solution spaces mixtures of subgraphs G_{3} and G_{4} were amplified by primer pair to obtain the possible solutions of G[V_{3}∪V_{4}]After deleting the false solutions, six true solutions Y1–Y6 remained (shown below). The results are shown in Fig. 12(c)–(e).

Finally, the solution space of subgraphs G[V_{1}∪V_{2}] and G[V_{3}∪V_{4}] were combined by primer pair . This primer pair only resulted in a band corresponding to 1220 bp. After deleting false solutions, eight true solutions Z_{1}–Z_{8} were obtained (shown below). The results are shown in Fig. 13.

《Fig.13》

**Fig. 13. The solutions for G[V _{3}[V_{4}]. M3 is the DNA molecular weight marker 150 bp ladder. Lane 1 to Lane 8 correspond to solutions Z_{1} to Z_{8}, respectively.**

All eight solutions Z_{1}–Z_{8} were purified and sequenced by Beijing AuGCT Biology Company. The sequences were analyzed by Gene Tools and improved to be the true solutions of graph G.

5. Complexity analysis

In this section, we analyze the complexity of this DNA computing model in two aspects: the complexity of the initial solution space and the complexity of the experiments. We provide a calculation formula to estimate the amount of initial solution space for typical and for worst cases.

《5.1. Calculation of the walks in a given graph》

5.1. Calculation of the walks in a given graph

To illustrate the complexity of the initial solution space, we first introduce a formula, which can be used to calculate the number of the walks between any two vertices.

**Lemma 5.1** [28]: Let G be a graph with n vertices and V(G) = {v_{1},v_{2}, .. ., v_{n}}. A is the adjacent matrix of G. The number of walks of length l in G, from v_{i} to v_{j}, is the entry in position (i, j) of the matrix A^{l}.

5.2. Complexity analysis of decreasing the initial solution space

In previous DNA computing models for solving NP-complete problems, a large number of DNA molecules were frequently used to encode and enumerate an infinite amount of data. However, those models rarely considered decreasing false solutions by constructing the initial solution space; rather, they used the enumerating method. In fact, this enumerating idea does not work for large-scale problems for two reasons. First, with an increase of problem scale, the number of DNA molecules becomes too large to obtain. Second, even if the number of DNA molecules can be designed, the experiments will be too difficult to implement due to the larger size and greater number of molecules. Thus, it is necessary to find more efficient methods to decrease the initial solution space when designing a DNA computing model to solve NP problems. In this article, we designed a DNA computing model for the coloring problem by using an optimization idea to construct an initial solution space, rather than by enumerating.

**Theorem 5.2: **Suppose that G is a graph with |V(G)| = n and G_{1} is its subgraph, as proposed by Section 3.2, where V(G)={v_{1}, v_{2}, ..., v_{n}}, with the bridge vertices v_{1} and v_{t}. Let B(G_{1}) be the probe graph of G_{1}. Then the number of DNA sequences representing all possible coloring solutions in the initial solution space to G_{1} is equal to the number of paths connecting the bridge vertices v_{1} and v_{t} with length t-1 in B(G_{1}), denoted by N_{p}(v_{1}, v_{t}), which is given by the following formula:

**Proof:** The vertex x_{i} in 99:9997% represents a coloring of the vertex v_{i} in G_{1} by the definition of B(G_{1}). The edge {x_{i}, z_{i+1}} ∈ B(G_{1}) represents the probe between x_{i} and z_{i+1}. That is, when the color of v_{i }is x, the color of v_{i+1} can be chosen as z, where x; z ∈{r; y; b}. Consequently, each path from vertex r_{1} to vertex b_{t} denotes one possible coloring of subgraph G_{1}. Furthermore, all the paths between r_{1} and b_{t} in B(G_{1})represent all the possible solutions of the initial solution space.

Considering the structure of the probe graph B(G_{1}), we divide V(B( G_{1} )) into v_{t} parts, where the vertex with subscript i is the ith part. It is concluded that there must at least exist an edge connect ing the ith part and the (i + 1)th part by the definition of a probe graph. Thus, the length of the paths between r1 and bt in B(G_{1}) is only t-1. And each walk from r1 to bt is just the path between r1 and bt. Thus, the theorem is proved by Lemma 5.1.

**Corollary 5.3: **The number of DNA strands in initial solution spaces S(G_{1}), S(G_{2}), S(G_{3}), and S(G_{4}) of the subgraphs G_{1}, G_{2}, G_{3}, and G_{4} shown in Fig. 4 are 89, 81, 412, and 151, respectively. The corresponding percentage of DNA strands deleted in the enumerable initial solution spaces are 99.9998%, 99.9998%, 99.999%, and 99.9997%, respectively.

**Proof: **Notice that vertices v_{1} and v_{16} are the bridge vertices of subgraph G_{1}. Thus, the number of DNA strands in S(G_{1}) is N_{33}(r_{1},b_{16}), which is the number of paths with length 15 from vertex r_{1} to b_{16} in B_{1} (see Fig. 7) by Theorem 5.2.

The adjacent matrix A(B_{1}) of B_{1} is

The correspondence between the vertices of A(B_{1}) and B_{1} is

According to Lemma 5.1, we have A(B_{1})^{15} (1; 33)=89, and 89/43046721≈0:000002=0.0002%. The result shows that 99.9998% of the false solutions are deleted when constructing the initial solution space S(G_{1}).

Similarly, we have A(B_{2})^{15 }(1, 32)=81, and 81/43046721≈0:00000188=0.000188 % for subgraph G_{2}; A(B_{3})^{15} (1, 35)=412, and 412/43046721≈0:0000096=0.00096% for subgraph G_{3}; and A(B_{4})^{15 }(1, 32)=151, and 151/43046721≈0.0000035=0:00035% for subgraph G_{4}, respectively. The results show that 99.9998%, 99.999%, and 99.9997% of the false solutions were deleted when constructing the initial solution spaces S(G_{2}), S(G_{3}), and S(G_{4}), respectively.

5.3 Reducing operation complexity by parallel PCR technology

In our experiment, false solutions are deleted by designed parallel PCR operation on the forward paths (see Section 3.7). In this way, the bio-operation times are greatly reduced. The bio-operation complexity is analyzed in this section.

Let P = v_{i0}v_{i1}.. .v_{im} with │V(P)│ = m+ 1 be a forward path in subgraph G_{1} with bridge vertices v_{1} and v_{t} . Then we must consider the relationships between the two bridge vertices and path P = v_{i0}v_{i1}.. .v_{im} with three cases when we design corresponding parallel PCR operation:

**Case 1:** There are no bridge vertices in V(P). That is, v_{i0}≠v_{1} and v_{im}≠v_{t} . For this case, we need to add the two bridge vertices so as to analyze the advantages of the parallel PCR operations (see Fig. 14(a)).

**Case 2: **There is one bridge vertex in V （P）. That is, v_{i0} = v_{1} but v_{im}≠v_{t }, or v_{i0}≠v_{1} but v_{im} = v_{t }. For this case, we need to add the one bridge vertex v_{1} or v_{t} (see Fig. 14(b)).

**Case 3: **There are two bridge vertices in V（P）. That is, v_{i0} = v_{1 }and v_{im}=v_{t} . For this case, we need only to consider the ordering vertices in path P (see Fig. 14(c)).

Since the color of the bridge vertex v_{1} (or v_{t} ) is fixed in advance, its adjacent vertices in Cases 2 and 3 are taken as the other colors aside from the color of v_{1} (or v_{t} ). Since there are three cases shown in Fig. 14, we will only discuses Case 1, because the other two cases can be treated as special cases of Case 1. Let P belongs to Case 1, then P'=P-{v_{i}, v_{i0}} or P'=P-{v_{m}, v_{t}} should belongs to Case 2, and P''=P-{v_{i}, v_{i0}}- {v_{m}, v_{t}} will be Case 3.

Next, let us discuss the parallel PCR method. First, we determine the number and sequence of the vertices for the forward path P = v_{i0}v_{i1}...v_{im} using the approach given in Fig. 14. Next, we perform the PCR operations. In practice, we do several times of PCR to fulfill one deleting false solutions operation. We found that the times of PCR operation are only relevant to the number of vertices or edges. Taking the first case above as an example, it is necessary to amplify the strands with the primer pairs <v_{i}, v_{i0}>, <v_{i0, }v_{il}>,... <v_{i(m-1)}, v_{im}> and <v_{im}, v_{t}> in the first PCR operation. The second PCR operation is designed to connect the consecutive two segments of the results of the previous PCR operation. The following PCR processes perform similar operations to the second PCR operation. In Fig. 15, we refer to cases with five vertices (a path with two edges) and six vertices (a path with three edges) as examples in order to illustrate the process of parallel PCR, where the edges mean how we do the PCR operation to connect the DNA segments to get a completed DNA sequence.

It is concluded from Fig. 15 that the number of PCR operation is 3 for the case of the path with two edges, and for the case of the path with six edges the number of PCR operation is 4. Thus, we have Theorem 5.4.

**Theorem 5.4: **Suppose that P is the forward path of a subgraph with x edges. If x satisfies 2^{l-1} < x ≤2^{l}, we define，< x > =2^{l}. Let the number of PCR operation be represented as PCR (x). Then, it can be concluded that

**Proof: **First, we consider the case x + 2 = 2^{l}. Suppose that the number of edges in path P is x. That is, the number of vertices in path P is x + 1. We already analyze the three cases between the two bridge vertices in a forward path in Fig. 14, and Cases 2 and 3 are the special cased of Case 1. So without loss of generality, we focus our attention on Case 1.

《Fig.14》

**Fig. 14. Three cases between the two bridge vertices and a forward path. (a) v _{i0} ≠v_{1} and v_{im} ≠ v_{t} ; (b) v_{i0} =v_{1} but v_{im} ≠ v_{t} , or v_{i0} ≠ v_{1 } but v_{im} = v_{t }; (c) v_{i0} = v_{1} and v_{im} = v_{t} . **

Since there are no bridge vertices in path P, it is necessary to increase two bridge vertices according to the requirements of the PCR operations, such that the number of vertices increases to x + 3. Then, we obtain the following equation: X+3 =2^{l} +1

《Fig.15》

**Fig. 15. An illustration of parallel PCR technology.**

As a result, the number of primer pairs of the first PCR can be concluded as shown below:

Following the above operations, the primer pair number in the second PCR is 2^{l-1} . In general, the primer number in the lth PCR is concluded to be:

Thus, the PCR can be accomplished in the (l + 1)th time. In other words, the number of PCR operation required are l + 1 when x + 2 = 2^{l}. Then, we obtain the following mathematical relation:

Similarly, if x + 2 = 2^{l+1}, then the number of PCR operation satisfy the formula as follows:

Here, we demonstrate the case x + 2 = 2^{l} + 1:

As shown above, the number of vertices in path P is x + 3 by increasing two bridge vertices. Thus, when x + 2 = 2^{l} + 1, there are 2^{l} + 2 vertices. The number of primer pairs in the first PCR is 2^{l }+ 1, as induced by the following formula: (x + 3) 1 = x + 2 = 2^{l} + 1.

The number of primer pairs in the second PCR is then 2^{l-1} + 1. Similarly, the number of primer pairs in the (l + 1)th PCR is 2. The (l + 2)th PCR can be completed.

On the other hand, we can obtain the number of the PCR operation as follows:

Thus, the conclusion above is verified when x + 2 = 2^{l} + 1.

For 2 ≤y < 2^{l} , the number of PCR operation whenx + 2 = 2^{l} + y is more than that of the case x + 2 = 2^{l+1} . The number of PCR operation for these two cases above is l+2. Thus, the conclusion holds when 2 ≤ y < 2^{l} and x + 2 = 2^{l}_{ }+y. Consequently, Theorem 5.4 is demonstrated.

**Theorem 5.4 **fully shows the relations between the length x of forward path P and the number of the PCR operation. Of course, the number can be reduced by parallel PCR operations, which will greatly accelerate the computation speed. We can now see the advantages of parallel PCR operations, as shown in Table 2 and Fig. 16.

《Table 2》

**Table 2 ****A comparison between parallel PCR operations and general PCR operations for a single edge**

x is the number of edges, and 3x and PCR(x) denote the number of operation of general PCR and parallel PCR for a single edge, respectively.

《Fig.16》

**Fig. 16. A comparison between parallel PCR and general PCR operations for a single edge.**

**Corollary 5.5:** Let P be the forward path in a graph G_{1} with x edges. It can be induced by using Theorem 5.4 that

**Proof:** Let x +2=2^{l} + y, where 1≤ y ≤2^{l} . Then, we have

In addition, since y is less than 2^{l}, then it is induced that 2^{l}+1 >y + 2^{l}. Consequently, the following consecutive inequalities hold:

That is, x satisfies the relation below:

Then, we obtain

On the basis of L’Hospital’s rule, we obtain

Therefore, the formula is true.

《6. Conclusions》

6. Conclusions

Since shortly after the discovery of the double helix structure of DNA, the idea of using DNA molecules to process information has been a dream, one that originated from Feynman’s concept of constructing submicroscopic computers [1]. This dream was ultimately fulfilled by Adleman in 1994, when he established a DNA computing model for a Hamiltonian graph with seven vertices. These works greatly contributed to the development of DNA computing.

The success of our DNA computing model not only demonstrates its huge computational capability, but also provides a seemingly feasible method to solve the exponential explosion problem, which may be the biggest bottleneck preventing DNA computing from solving larger-scale NP problems. In this paper, a graph vertex coloring problem with a computing complexity of O(3^{59}) is solved by using the advantages of the DNA computing model: the construction of an un-enumerating initial solution space, the division and combination of subgraphs, and parallel bio-operation.

By dividing subgraphs, reducing the vertex colors without losing solutions, and ordering the vertices in subgraphs, the exponential explosion problem is effectively avoided. According to Lemma 5.1, the initial solution space of each subgraph is 89, 81, 412, and 151, respectively, all of which are much lower than 3^{16} (the worst complexity). Therefore, the values of DNA sequences in the unenumerating initial solution space are 99.9998%, 99.9998%, 99.999%, and 99.9997% lower than those in an enumerating initial solution space.

In this model, we also improve the computing efficiency by dividing the graph into four subgraphs and simultaneously deleting the false solutions of these subgraphs. After obtaining the solutions of each subgraph, the true solutions are found by combining the subgraphs. It is possible that the ability of this model to solve the graph vertex coloring problem will reach 100 vertices.

In this model, false solutions are deleted according to the path rather than the edge. Thus, all edges in one path are dealt with at the same time. In this way, the number of PCR operation required decreases. The efficiency of this DNA computing model is greatly improved by a significant innovation: the ‘‘parallel deleting method.”

In general, many researchers use PCR for amplification with only two primers and one template, because they think that PCR is not very reliable with many templates. However, in this model, we use PCR to amplify all templates in a test-tube. In order to obtain reliable results, the primer pairs are strictly designed according to the constraints of PCR, and the reaction conditions and the concentration of the components are optimized for each PCR. Only the bands representing the specific amplification are cut and purified for the next operation. In addition, contrast tests are implemented to avoid false positives and false negatives in the experiments. The sequencing shows that the results of the experiments are reliable. Because of its reliability and its ability to reduce the solution space, we assert that this DNA computing model can be used to solve larger-size problems, and that DNA molecular tools can be used to carry out larger-scale NP problems. In future, it may be possible to use DNA computing for certain practical engineering problems.

《Acknowledgements》

Acknowledgements

The authors are grateful for the support from the National Natural Science Foundation of China (61632002, 61379059, and 61572046).

《Compliance with ethics guidelines》

Compliance with ethics guidelines

Jin Xu, Xiaoli Qiang, Kai Zhang, Cheng Zhang, and Jing Yang declare that they have no conflict of interest or financial conflicts to disclose.