Processing DNA Storage through Programmable Assembly in a Droplet‐Based Fluidics System

Abstract DNA can be used to store digital data, and synthetic short‐sequence DNA pools have been developed to store large quantities of digital data. However, synthetic DNA data cannot be actively processed in DNA pools. Here, an active DNA data editing process is developed using splint ligation in a droplet‐controlled fluidics (DCF) system. DNA fragments of discrete sizes (100–500 bps) are synthesized for droplet assembly, in which programmed exchange of sequence information occurs. The encoded DNA sequences are processed in series and in parallel to synthesize the determined DNA pools, enabling random access using polymerase chain reaction amplification. The sequencing results of the assembled DNA data pools can be aligned in order for decoding and show high fidelity through address primer scanning. Furthermore, eight 90 bps DNA pools with pixel information (png: 0.27–0.28 kB), encoded by codons, are synthesized to create eight 270 bps DNA pools with an animation movie clip file (mp4: 12 kB) in the DCF system.

For the macroscale fluidics mold of the PDMS fluidics chip, 3D printing was used to construct the complex structure with mm-level height, because 2D patterning by photolithography with a high-viscosity photoresist such as SU-8 is limited to sub-mm heights. Unlike in our previous 3D-printed microfluidics method, the 3D-printed mold was not coated with a polymer to smooth its surface. For strong bonding, the PDMS chip, which has a non-smooth surface, was bonded onto the PDMS substrate.

Supporting Information 2
To obtain droplets of a few microliters, a 3D mold was designed in AutoCAD 2022. The channels were designed with a height and width of 1 mm (Figure S1). In practice, the 3D-printed mold cured to a channel width of ~2 mm, owing to UV intensity and polymer curing.
Our 500 bps DNA samples were analyzed using Illumina MiSeq (Macrogen, Inc., Seoul, South Korea), which reads 300 bps in paired-end mode. For analysis of the reverse complementary sequences, the sequences were matched using the primer information. The sequencing results were aligned and statistically analyzed using MATLAB 2022a. Figures S2 and S3 show the consensus sequencing results with a high quality per base pair.
To create the single-word DNA sequences with Reed-Solomon (RS) error correction code (ECC) redundancy, each codeword (of length n) was defined as a message word (of length k), based on 7-bit American Standard Code for Information Interchange (ASCII) code, followed by two symbols of RS ECC redundancy. For ECC encoding of the single-word DNA fragments, a shortened RS (n, k) code [1] over the Galois field GF(2^7) was applied to each DNA sequence, where n is the code length and k is the message length. In our protocol, the code can correct one alphanumeric character using 2 RS parity symbols within 7 nt, located at the end of the payload sequence. With the message length k set to suit each word, the word information was automatically encoded into a DNA sequence using MATLAB 2022a.
Additionally, to assess correction efficiency, two RS ECC codes, one correcting a single character (7-bit ASCII) and one correcting 7 bits, were compared between single-sentence DNAs in the sequencing file. The RS ECC code for one character was applied to the "DNAdata" and "Apple" sequences, and the RS ECC code for 7 bits was applied to the "Stores" sequence. Table S2 shows that the RS ECC bit correction was 200, while the RS ECC ASCII character corrections were 2 and 105.
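As an illustration of the RS scheme described above, the following Python sketch encodes a word with two parity symbols over GF(2^7) and corrects a single corrupted character. The primitive polynomial (x^7 + x^3 + 1), the generator roots, the 2-bit nucleotide map, and all function names are assumptions made for illustration; the MATLAB implementation used in this work is not reproduced here.

```python
# Sketch of a shortened RS(n, k) code over GF(2^7) with 2 parity symbols.
# ASSUMPTIONS (not from the source): primitive polynomial x^7 + x^3 + 1,
# generator roots alpha^1 and alpha^2, and the map 00->A, 01->C, 10->G, 11->T.

PRIM_POLY = 0b10001001   # x^7 + x^3 + 1
FIELD_ORDER = 127        # number of nonzero elements of GF(2^7)

# Exponent and logarithm tables for GF(2^7).
EXP = [0] * (2 * FIELD_ORDER)
LOG = [0] * (FIELD_ORDER + 1)
value = 1
for power in range(FIELD_ORDER):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0x80:
        value ^= PRIM_POLY
for power in range(FIELD_ORDER, 2 * FIELD_ORDER):
    EXP[power] = EXP[power - FIELD_ORDER]


def gf_mul(a, b):
    """Multiply two field elements."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    """Divide a by a nonzero field element b."""
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % FIELD_ORDER]


# Generator g(x) = (x + alpha)(x + alpha^2) = x^2 + G1*x + G0.
G1 = EXP[1] ^ EXP[2]
G0 = EXP[3]


def rs_encode(message):
    """Append two parity symbols (remainder of m(x)*x^2 divided by g(x))."""
    p1, p0 = 0, 0
    for symbol in message:
        feedback = symbol ^ p1
        p1 = p0 ^ gf_mul(feedback, G1)
        p0 = gf_mul(feedback, G0)
    return list(message) + [p1, p0]


def poly_eval(codeword, x):
    """Evaluate the codeword polynomial (highest degree first) at x."""
    acc = 0
    for symbol in codeword:
        acc = gf_mul(acc, x) ^ symbol
    return acc


def rs_correct(received):
    """Correct at most one symbol error using the two syndromes."""
    s1 = poly_eval(received, EXP[1])
    s2 = poly_eval(received, EXP[2])
    if s1 == 0 and s2 == 0:
        return list(received)
    if s1 == 0 or s2 == 0:
        raise ValueError("more than one symbol error")
    degree = LOG[gf_div(s2, s1)]          # power of x at the error position
    index = len(received) - 1 - degree
    if index < 0:
        raise ValueError("error located outside the shortened codeword")
    fixed = list(received)
    fixed[index] ^= gf_div(s1, EXP[degree])
    return fixed


def symbols_to_nt(symbols):
    """Map 7-bit symbols to nucleotides at 2 bits per nt (assumed map)."""
    bits = "".join(format(s, "07b") for s in symbols)
    bits += "0" * (len(bits) % 2)         # pad to an even number of bits
    return "".join("ACGT"[int(bits[i:i + 2], 2)] for i in range(0, len(bits), 2))


message = [ord(ch) for ch in "Apple"]
codeword = rs_encode(message)
damaged = list(codeword)
damaged[1] ^= 0x2A                        # corrupt one character
recovered = rs_correct(damaged)
print("".join(chr(s) for s in recovered[:-2]))   # prints "Apple"
print(len(symbols_to_nt(codeword[-2:])))         # prints 7
```

Two 7-bit parity symbols occupy 14 bits, which corresponds to 7 nt at 2 bits per nucleotide, consistent with the 7 nt parity region described above.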
Table S3. RS ECC code for correction of single.
The decoding protocol for sorting and arranging word data in sequence using primer indexing involves extracting the sentences from the payload section, determining the size and order of the flexible-length sentences, and then arranging the sentences in sequence using the primers. This ensures data integrity and minimizes errors, and yields the final sentence data.
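The decoding steps above can be sketched in Python as follows. This is a minimal illustration, not the protocol's actual software: the primer sequences are short hypothetical placeholders, primers are located by exact matching, and the function names are invented for this example.

```python
from collections import Counter


def extract_payload(read, forward, reverse):
    """Return the sequence between the two primers, or None if either is missing."""
    start = read.find(forward)
    if start < 0:
        return None
    start += len(forward)
    end = read.find(reverse, start)
    if end < 0:
        return None
    return read[start:end]


def sort_by_primer(reads, primer_pairs):
    """Group payloads by the index of the primer pair that flanks them."""
    groups = {index: [] for index in range(len(primer_pairs))}
    for read in reads:
        for index, (forward, reverse) in enumerate(primer_pairs):
            payload = extract_payload(read, forward, reverse)
            if payload is not None:
                groups[index].append(payload)
                break
    return groups


def arrange(groups):
    """Take the most frequent payload of each group, in primer-index order."""
    return [Counter(groups[i]).most_common(1)[0][0]
            for i in sorted(groups) if groups[i]]


# Hypothetical primer pairs; their list order defines the order of the data.
primer_pairs = [("ACGTAC", "TTGGCC"), ("GATTAC", "CCAAGG")]
reads = [
    "GATTAC" + "CCCC" + "CCAAGG",
    "ACGTAC" + "AAAA" + "TTGGCC",
    "ACGTAC" + "AAAA" + "TTGGCC",
    "TTTTTTTT",                      # no primer: discarded
]
groups = sort_by_primer(reads, primer_pairs)
ordered = arrange(groups)
print(ordered)                       # prints ['AAAA', 'CCCC']
```

Because the payload is taken as whatever lies between the primers, its length does not need to be fixed in advance, which matches the flexible sentence length described above.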
Supporting Information 8
Figure S5 displays the consensus sequencing outcome for DNA sentences (lanes A, B, C, and D), which were constructed using DNA droplets. The primer sequences, indicated by diagonal lines, were excluded, and the payload data sequences containing 7 nt RS parities were decoded sequentially to form sentences.
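A consensus of this kind can be sketched as a per-position majority vote. The Python example below assumes that the reads are already aligned and of equal length, and that the primer lengths are known; the function names and toy reads are hypothetical, and the MATLAB analysis used in this work may differ.

```python
from collections import Counter


def consensus(reads):
    """Per-position majority base over equal-length, pre-aligned reads."""
    if not reads:
        return ""
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to the same length")
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*reads))


def strip_primers(sequence, forward_len, reverse_len):
    """Drop the primer regions and keep the payload (data plus RS parities)."""
    return sequence[forward_len:len(sequence) - reverse_len]


# Toy reads: two of them carry a single substitution each.
toy_reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTTC"]
print(consensus(toy_reads))               # prints "ACGTAC"
print(strip_primers("AAACGTTT", 2, 3))    # prints "ACG"
```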

Supporting Information 10
Since the sentence DNAs were aligned using primer sequences, the consensus sequencing results and the count of perfectly matching sequences were analyzed as a function of the number of primer-matched bases, from 15 bps to 20 bps. Figure S5 shows the consensus sequencing results, including payload data with and without primer alignment.
Varying the number of primer-matched bases from 15 bps to 20 bps, we counted the reads sorted by primer alignment against the 20 bps forward and reverse primers. Additionally, we counted the perfectly matched reads and the reads whose payload data length matched. To monitor the sequencing fidelity of the data DNA with ECC, we also counted the perfectly matched reads after applying ECC.
For random access of DNA storage, PCR-based file selection is widely used. Since the DNA data were programmably synthesized with primer information, we implemented random access in software by searching with primer sorting. The DNA pools, which were synthesized with various information and primer sequences, were sequenced by NGS. The sequencing results file was aligned and sorted by primer information. According to the base sequence position, the DNA data can be categorized as shown in Figure S7c. Thus, DNA data associated with process in memory (PIM) has merits for processing as well as decoding. Furthermore, DNA data synthesized by PIM allows random access by PCR, which is used to physically select DNA data files.
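The primer-based read selection and counting described above can be sketched as follows. The matching rule is an assumption made for illustration: a read is accepted when at least k of the 20 bases at each end agree, position by position, with the forward and reverse primers. The primer sequences and function names are hypothetical.

```python
def matched_bases(a, b):
    """Number of positions at which two sequences carry the same base."""
    return sum(x == y for x, y in zip(a, b))


def select_reads(reads, forward, reverse, k):
    """Software random access: keep reads whose ends match both primers in >= k bases."""
    return [
        read for read in reads
        if len(read) >= len(forward) + len(reverse)
        and matched_bases(read[:len(forward)], forward) >= k
        and matched_bases(read[-len(reverse):], reverse) >= k
    ]


def count_sorted_reads(reads, forward, reverse, thresholds=range(15, 21)):
    """Count accepted reads for each required number of matched primer bases."""
    return {k: len(select_reads(reads, forward, reverse, k)) for k in thresholds}


# Hypothetical 20 nt primers and a toy payload.
forward = "ACGTACGTACGTACGTACGT"
reverse = "TTGGCCAATTGGCCAATTGG"
payload = "GATTACA"
pool = [
    forward + payload + reverse,                 # perfect primers
    "TT" + forward[2:] + payload + reverse,      # 18 of 20 forward bases match
    "G" * 47,                                    # unrelated read
]
counts = count_sorted_reads(pool, forward, reverse)
print(counts)   # prints {15: 2, 16: 2, 17: 2, 18: 2, 19: 1, 20: 1}
```

Lowering k admits reads with sequencing errors in the primer region, so the count of sorted reads can only stay the same or grow as k decreases.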

Figure S1. The 3D mold image of the droplet-controlled fluidic system.
Figure S4. NGS results with the quality score for DNA fragments in the series order DNA A, B, C, D, and E.

Figure S5. NGS results with the quality score for DNA fragments in the series order DNA A, D, C, B, and E.

Figure S6. The sequencing results of sentence DNAs.

Figure S7. The sequencing analysis for sentence DNAs.

Figure S8. The sequencing results of sentence DNAs after primer sorting.

Figure S9. The number of counts for address sequence alignment, perfect match, payload data length match, and applying ECC, by the number of matched address sequence bases from 15 bps to 20 bps, for the sentences a) "DNAdata Stores Apple", b) "JamesWatson Likes Orange", c) "FrancisCrick Likes Grape", and d) "JohnVonNeumann Has Apple".

Figure S10. a) A diagram of the consensus sequencing for DNA pools by primer-based sorting. b) Consensus sequencing results of whole DNA pools. c) Consensus sequencing results of DNA pools by primer-based sorting.

Table S1. Primer and splint DNA sequences for DNA A, B, C, D, and E.

Table S4. Splint DNA information and PCR primer information.