PolyGrammar: Grammar for Digital Polymer Representation and Generation

Abstract
Polymers are widely studied materials with diverse properties and applications determined by molecular structures. It is essential to represent these structures clearly and explore the full space of achievable chemical designs. However, existing approaches cannot offer comprehensive design models for polymers because of their inherent scale and structural complexity. Here, a parametric, context-sensitive grammar designed specifically for polymers (PolyGrammar) is proposed. Using the symbolic hypergraph representation and 14 simple production rules, PolyGrammar can represent and generate all valid polyurethane structures. An algorithm is presented to translate any polyurethane structure from the popular Simplified Molecular-Input Line-entry System (SMILES) string format into the PolyGrammar representation. The representative power of PolyGrammar is tested by translating a dataset of over 600 polyurethane samples collected from the literature. Furthermore, it is shown that PolyGrammar can be easily extended to other copolymers and homopolymers. By offering a complete, explicit representation scheme and an explainable generative model with validity guarantees, PolyGrammar takes an essential step toward a more comprehensive and practical system for polymer discovery and exploration. As the first bridge between formal languages and chemistry, PolyGrammar also serves as a critical blueprint to inform the design of similar grammars for other chemistries, including organic and inorganic molecules.


Introduction
Polymers are important materials with diverse structural variations and applications.
Ideally, a chemical design model would include three components: (1) a well-defined representation capable of capturing known structures, (2) a generative model capable of enumerating all structures in a given class, and (3) an inverse modeling procedure capable of translating known molecular structures into the representation. For a given class of molecules, an ideal chemical design model should satisfy the following five criteria:
i. Complete: the representation is able to encode all possible structures in the given class.
ii. Explicit: the representation directly specifies the molecular structure.
iii. Valid: every generated output is a physically valid chemical structure in the given class.
iv. Explainable: the generation process is understandable to the user.
v. Invertible: the inverse procedure can translate molecular structures into the given representation.
However, designing a chemical model that meets all these criteria is challenging, especially for structurally complex molecules. Most existing approaches are limited to small, simple chemical structures [32][33][34][35][36]. Even with this limited scope, the design is labor intensive: the representation language is typically developed first, then extended for generation and inverse modeling. In particular, there have been many systems for molecular line notations [32,33] and fragment-level description [34,35], which were then used as the basis for generative and inverse schemes [5][6][7][8].
Yet, a comprehensive chemical design model for large polymers remains elusive due to the polymers' inherent complexity. We present a detailed account for each property, including polymer-specific challenges and the performance of existing methods (see Table 1). Some of the most popular methods like SMILES and BigSMILES are only partial design models, as they define a representation but not a generative model. In this case, we assume the simplest generative model for our comparison: randomly chosen strings of permissible symbols. Other chemical design models like auto-encoders (AE) [5][6][7][8][36] have a direct mapping to our framework: the learned latent space is the representation, the decoder is the generative model, and the encoder is the inverse model. After exploring the state of the art for all five properties, we give an overview of our proposed approach.
Table 1. Comparison with related chemical design models. Since SMILES and BigSMILES only explicitly provide a representation, we assume the simplest generative modeling scheme: randomly choosing strings of permissible symbols. Our PolyGrammar is the only approach that satisfies all five properties.
detailed arrangement of the polymer's components. As for the AE, the latent variable is an implicit representation and it is impractical to understand the polymer structures merely from the numeric vector.
Valid. Generative models that build on a well-defined representation scheme are highly coveted [40] , particularly for their ability to efficiently build large corpora of example structures. However, the result is only useful if the examples generated by the model are guaranteed to be chemically valid. This is challenging to enforce for polymers, as there are many hard chemical constraints (e.g., valency conditions) and other restrictions to account for. The likelihood of violating these constraints increases as the target molecules get larger.
Machine learning techniques including support vector machines (SVM) [41], recurrent neural networks (RNN) [1][2][3][4], generative adversarial networks (GAN) [9][10][11][12], and AE have been used as generative models for molecules. However, these methods often produce outputs that are chemically invalid, even when limited to small molecules. It is even more challenging for these methods to generate valid polymers, due to the large number of generation steps required to realize such large molecules. Although several recent efforts based on AE [35,36] and reinforcement learning (RL) [42,43] have been proposed to produce valid polymers, it is not clear how well they generalize; i.e., the AE may be unable to ensure validity when generating polymers that significantly deviate from the training data. Non-learning methods also struggle to enforce validity, particularly with simple probabilistic generative models (e.g., randomly choosing SMILES/BigSMILES strings). Even with additional considerations for line notation syntax and additional semantic constraints, these probabilistic generation schemes can produce invalid line notations [44].
Explainable. To ensure confidence in the results of the generative model, the generation process itself must be fully transparent and understandable to chemists. This property is not necessarily more challenging for large polymers (compared to small molecules), but it is much more critical to facilitate understanding of the resulting polymer structure. Interpretable generation processes also aid the exploration of possible polymer variations. AE and other deep learning based generative models [1][2][3][4]9,10,45] produce structures based on implicit latent variables. These models are effectively black-box functions that cannot be easily interpreted. By contrast, the generative model of SMILES can be interpreted since each generated symbol has an explainable meaning: it either indicates the type of the generated atom, or the bonding relationship. The generative model based on BigSMILES is not explainable since it cannot show the detailed arrangement of constituent monomers.
Invertible. When designing a new chemical design model, it is critical to ensure compatibility with existing notations. In particular, it should be possible (via an inverse modelling procedure) to translate any final representation from an existing scheme into the proposed representation. This inverse procedure should yield the same process and final representation as if the structure were created via the integrated generative model. This is critical for two reasons: (i) it makes existing knowledge accessible in the new representation, and (ii) it confirms the representative power of the new chemical design model.
To judge invertibility for polymer models, we consider translation from one of the most popular molecule notations: SMILES. As shown in Table 1, invertibility is already an important feature common to many existing methods. For example, the encoder of a chemical AE takes a SMILES string as input, then outputs the corresponding latent variable. BigSMILES is built directly upon SMILES, so it can easily convert SMILES strings of polymers into the BigSMILES representation. When building our own representation, we also consider "invertibility" with respect to the SMILES format. However, in principle, it is possible to design inverse procedures that translate from other existing representation schemes as well.
Our Approach. In this paper, we propose a new chemical design model for polymers that respects all five of the ideal properties discussed above. We introduce PolyGrammar, a parametric context-sensitive grammar for polymers. In formal language theory, a grammar describes how to build strings from a language's alphabet following a set of production rules.
PolyGrammar represents the chain structure as a hypergraph. In particular, each polymer chain is represented as a string of symbols, each of which refers to a particular molecular fragment of the original chain. This symbolic hypergraph representation supports explicit descriptions of an unbounded number of diversely structured polymer chains by changing the form of the symbolic strings.
Based on this representation, we establish a set of production rules that can effectively generate chemically valid symbolic strings. The recursive nature of grammar production makes it possible to generate any polymer in our given class using only a simple set of production rules. In particular, it is possible for PolyGrammar to enumerate all valid polymer structures within a given class.
Figure 1. Schematic of our chemistry design model, PolyGrammar, which represents molecular chain structure as a string of symbols (center). PolyGrammar consists of a set of production rules $\{p_i \mid i = 1, \dots, 14\}$ (left). The generation process starts from an initial symbol 𝒳. At each iteration, each non-terminal symbol (h or s) in the current string is replaced by the successor of a production rule whose predecessor matches the symbol. The generation process concludes when the string does not contain any non-terminal symbols. The resulting symbol string (center) is then translated to a polymer chain (right) by hypergraph conversion.
As a demonstrative example, we focus on a particular class of polymers: polyurethanes. We choose polyurethanes due to their wide-ranging applications, including antistatic coatings [46], foams [47], elastomers [48], and drug delivery for cancer therapy [49]. Consider generating a polyurethane of chain length 20, using 1 polyol type (e.g., PTMO) and 1 isocyanate type (e.g., MDI). Under these assumptions (which are representative of the average polyurethane chain [50]), PolyGrammar can generate more than $2 \times 10^6$ distinct polyurethane chains using only 14 production rules. Moreover, we show that PolyGrammar can be easily extended to other types of polymers, including both copolymers and homopolymers. We further propose an inverse modeling algorithm that translates a polymer's SMILES string into the sequence of production rules used to generate it. More than 600 polyurethanes collected from the literature are validated by this inverse model, demonstrating the representative power of PolyGrammar. A schematic of PolyGrammar is shown in Figure 1.

Hypergraph-based Symbolic Representation
In this section, we introduce the hypergraph representation of polyurethane structures and describe how to use symbolic strings to represent polyurethane chains.
Figure 2. The structure produced by the reaction of two monomers (1,3-bis(isocyanatomethyl)cyclohexane and diethylene glycol). The standard graph representation (i) uses 21 nodes and 21 edges, but the hypergraph (ii) only requires two hyperedges. Each hyperedge corresponds to the nodes of a given monomer. Both hyperedges have the urethane group in common. We use the line graph (iii) to visualize the hypergraph representation in the remaining figures of the paper for convenience.

Polymers as Hypergraphs
It is a common practice [7,12,51,52] to regard the structural formula of a molecule as an ordinary graph, where atoms are nodes, bonds are edges, and edges connect exactly two nodes.
For polyurethanes, ordinary graph depictions would require prohibitively many nodes and edges. To address this, we employ a generalized graph called a hypergraph [53], which allows an individual edge to join any number of nodes. Any edge that connects a subset of the nodes in the hypergraph is called a hyperedge. Consider the product of two monomers (1,3-bis(isocyanatomethyl)cyclohexane and diethylene glycol) as shown in Figure 2. For increased convenience, we will visualize the hypergraph representations using the line graph [54] form shown in Figure 2(iii).
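The relationship between a hypergraph and its line graph can be illustrated with a minimal Python sketch. The node labels, hyperedge names, and the `line_graph` helper are hypothetical placeholders rather than part of PolyGrammar; the only property taken from the text is that the two monomer hyperedges overlap on the urethane group.

```python
from itertools import combinations


def line_graph(hyperedges):
    """Return the edges of the line graph: one vertex per hyperedge,
    and an edge between two hyperedges whenever they share a node."""
    edges = set()
    for (a, nodes_a), (b, nodes_b) in combinations(sorted(hyperedges.items()), 2):
        if nodes_a & nodes_b:  # non-empty intersection means shared atoms
            edges.add((a, b))
    return edges


# Toy hypergraph: one hyperedge per monomer. The atoms "N", "C", "O"
# stand in for the shared urethane group; all labels are placeholders.
hyperedges = {
    "isocyanate": {"c1", "c2", "c3", "N", "C", "O"},
    "polyol": {"N", "C", "O", "c4", "c5"},
}

print(line_graph(hyperedges))
```

Storing each hyperedge as a set keeps the overlap test to a single intersection, which is why two hyperedges are enough here where an atom-level graph would need every atom and bond listed.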

Symbolic Representation
Given the hypergraph of a polyurethane chain, we construct a corresponding symbolic string for use in PolyGrammar. In the symbolic string, the hyperedges corresponding to the isocyanate (hard segment) are denoted with "ℋ" and those corresponding to the polyol (soft segment) are denoted as "𝒮". The chain extenders are omitted, since they can only exist between two adjacent ℋ (or 𝒮) symbols. For those polyurethanes containing multiple isocyanate or polyol types, we use subscripts $i = 1, 2, \dots$ to distinguish different subtypes of a given hyperedge. For instance, if two different types of isocyanates are used [38], we use $\mathcal{H}_1$ and $\mathcal{H}_2$ to distinguish the hyperedges corresponding to each hard-segment type. These rules allow us to represent any polyurethane chain as a string of symbols. Examples are shown in Figure 4. We emphasize that our symbolic representation is invertible, such that a symbolic string can be converted back to the corresponding chemical structure if the constituent isocyanate(s), polyol(s) and chain extender(s) are specified. We call this process hypergraph conversion. The invertibility of the hypergraph representation ensures our PolyGrammar can simultaneously serve as a representation and a generative model for polyurethanes.
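As a rough illustration of hypergraph conversion, the following Python sketch expands a symbolic string into a list of named fragments. It is a simplification under stated assumptions: "H1" and "S1" are ASCII stand-ins for the subscripted hard- and soft-segment symbols, the function name is hypothetical, and one chain extender is assumed between every pair of adjacent same-kind segments (the text only says that extenders can occur there). It returns fragment names, not a chemical structure.

```python
def hypergraph_conversion(symbols, components, extender):
    """Expand a symbolic string into a list of named fragments.

    symbols    -- e.g. ["H1", "H1", "S1"]; "H"/"S" are ASCII stand-ins
                  for the hard- and soft-segment symbols
    components -- maps each symbol to a monomer name
    extender   -- chain extender re-inserted between two adjacent
                  segments of the same kind (assumption of this sketch)
    """
    chain = []
    for prev, cur in zip([None] + symbols, symbols):
        if prev is not None and prev[0] == cur[0]:
            chain.append(extender)
        chain.append(components[cur])
    return chain


components = {"H1": "MDI", "S1": "PTMO"}
print(hypergraph_conversion(["H1", "H1", "S1"], components, "EG"))
# ['MDI', 'EG', 'MDI', 'PTMO']
```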

PolyGrammar
In this section, we first present the basic mechanism of grammar production using an illustrative example. Then, we introduce our parametric context-sensitive PolyGrammar comprehensively. Finally, we propose several advanced features based on our basic PolyGrammar for the representation of polyurethanes, which encourage the generation of more general structures.

Figure 5. An illustrative example of grammar production. Starting from the initial symbol 𝒳, we sequentially invoke four production rules from P = {𝒳 → ℋh; h → ℋh; h → s; s → 𝒮}. The process continues until all symbols in the string are terminal symbols. By specifying the constituent structures, i.e., isophorone diisocyanate (IPDI), polyhexamethylene carbonate glycol (PHA) and EG, the string of the symbols can be translated to the corresponding polyurethane chain via hypergraph conversion.

Basic PolyGrammar
In formal language theory, a grammar G = (N, Σ, P) is used to describe a language, where N is a set of non-terminal symbols, Σ is a set of terminal symbols and P is a set of production rules, each of which consists of a predecessor and a successor separated by a right arrow "→". In the language represented by the grammar G, each word is a finite-length string containing both terminal and non-terminal symbols. The non-terminal symbols in a word can be further replaced and expanded by invoking one production rule from P at a step. In our PolyGrammar, the set of non-terminal symbols is {𝒳, h, s} and the set of terminal symbols is {ℋ, 𝒮}. Figure 5 shows an illustrative example to demonstrate the process for producing a string via the grammar G. This example uses four production rules: P = {𝒳 → ℋh; h → ℋh; h → s; s → 𝒮}. Starting from the initial symbol 𝒳, at each iteration, each non-terminal symbol in the current string is replaced with the successor of a production rule whose predecessor matches the symbol. The process continues until no non-terminal symbols exist in the string.
According to Chomsky's classification [55] , the grammar used in this illustrative example is a Type-2 grammar, also called context-free grammar, where the predecessor of each production rule consists of only one single non-terminal symbol. Similar paradigms are also utilized in L-systems to model the morphology of organisms [56,57] .
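The four-rule context-free example can be traced with a short Python sketch. The ASCII names are stand-ins chosen for this illustration ("X" for the initial symbol, "h" and "s" for the non-terminals, "H" and "S" for the hard- and soft-segment terminals), and `derive` is a hypothetical helper that applies a given sequence of rule numbers to the leftmost matching symbol.

```python
# Rule number -> (predecessor, successor), mirroring the four-rule example.
RULES = {
    1: ("X", ["H", "h"]),
    2: ("h", ["H", "h"]),
    3: ("h", ["s"]),
    4: ("s", ["S"]),
}
NON_TERMINALS = {"X", "h", "s"}


def derive(rule_sequence, start="X"):
    """Apply the listed rules in order and return the resulting string."""
    string = [start]
    for idx in rule_sequence:
        pred, succ = RULES[idx]
        pos = string.index(pred)  # leftmost match; ValueError if rule does not apply
        string[pos:pos + 1] = succ
    return string


result = derive([1, 2, 3, 4])
print(result)  # ['H', 'H', 'S']
```

Because every predecessor is a single non-terminal and no context is checked, the string can only grow at the position of the one remaining non-terminal, i.e. in one direction.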

Context-Sensitive Grammar
The context-free grammar discussed above is insufficient to imitate the polyurethane generation process because the symbolic string can only expand along one direction; however, polyurethanes generally grow along two opposite directions to form chain structures. To address this, our PolyGrammar utilizes a context-sensitive grammar. In particular, our PolyGrammar is a Type-1 grammar, a more general form of Type-2 grammar [58], where the production rules also consider the context (i.e., the surrounding symbols) of the given non-terminal symbol within the string.
By considering the symbol contexts, the production rules of a context-sensitive grammar can explicitly depict the growing direction of the polyurethane chain. The production rules are as follows: In each production rule, the non-terminal symbol to be replaced is inside the angle brackets "< >" of the predecessor. The contexts are the symbols located at both sides of "< >" in the predecessor ( indicates no constraints). The rule can only be deployed when both contexts of the symbol have been matched.
Each rule has an intuitive function. Rules $p_1$ and $p_2$ initialize the start symbol 𝒳, while $p_5$, $p_8$, $p_{11}$ and $p_{14}$ terminate the growth. Rules $p_3$, $p_4$, $p_6$ and $p_7$ extend the string along the left direction, and $p_9$, $p_{10}$, $p_{12}$ and $p_{13}$ extend the string along the right direction. $p_3$ and $p_9$ indicate the reaction between two isocyanates, imitating the formation of the hard segment, while $p_7$ and $p_{13}$ indicate the reaction between two polyols, imitating the formation of the soft segment. Lastly, $p_4$, $p_6$, $p_{10}$ and $p_{12}$ imitate the formation of the urethane group.
Another important feature of the PolyGrammar is that there are multiple possible production rules to expand a given symbol. For instance, $p_3$, $p_4$ and $p_5$ share the same predecessor and expand the non-terminal symbol h along the left direction. There are many possible schemes for selecting among these options, including hand-tuned heuristics or manual intervention to guide the scheme toward particular results. For simplicity, we have implemented a uniformly random selection technique: at each iteration, we randomly sample one rule from all of the candidate rules that meet the contexts and apply it to the symbol. An example of the production process is illustrated in Figure 6.
Figure 6. Example of context-sensitive grammar. At each production step, only the rules that match the non-terminal symbol's context are adopted. Hence, the production process can explicitly depict the growing direction of the polyurethane chain. If there are multiple candidate rules at a given step, selection can be done manually or randomly. The selected rule is then applied to the symbol to continue production.
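Context matching and uniformly random rule selection can be sketched as follows. This is a toy grammar written for illustration, not the 14 PolyGrammar rules: the five rules, their names, the `ANY` wildcard, and the single terminal "H" are all assumptions. What it demonstrates is that the same non-terminal "h" grows to the left or to the right depending only on which side its terminal neighbour sits.

```python
import random
from dataclasses import dataclass

ANY = "*"  # context wildcard: no constraint on that side


@dataclass(frozen=True)
class Rule:
    name: str
    left: str         # required left neighbour, or ANY
    symbol: str       # non-terminal to be replaced
    right: str        # required right neighbour, or ANY
    successor: tuple  # symbols that replace `symbol`


# Hypothetical rules for illustration only.
RULES = [
    Rule("init", ANY, "X", ANY, ("h", "H", "h")),
    Rule("grow_left", ANY, "h", "H", ("h", "H")),
    Rule("stop_left", ANY, "h", "H", ()),
    Rule("grow_right", "H", "h", ANY, ("H", "h")),
    Rule("stop_right", "H", "h", ANY, ()),
]
NON_TERMINALS = {"X", "h"}


def candidates(string, pos):
    """Rules whose predecessor and both contexts match position `pos`."""
    left = string[pos - 1] if pos > 0 else None
    right = string[pos + 1] if pos + 1 < len(string) else None
    return [r for r in RULES
            if r.symbol == string[pos]
            and r.left in (ANY, left)
            and r.right in (ANY, right)]


def generate(rng, max_steps=1000):
    """Rewrite every non-terminal once per step, picking rules uniformly."""
    string = ["X"]
    for _ in range(max_steps):
        positions = [i for i, s in enumerate(string) if s in NON_TERMINALS]
        if not positions:
            return string
        for pos in reversed(positions):  # right-to-left keeps indices valid
            rule = rng.choice(candidates(string, pos))
            string[pos:pos + 1] = rule.successor
    if not any(s in NON_TERMINALS for s in string):
        return string
    raise RuntimeError("derivation did not terminate")


print(generate(random.Random(0)))
```

Swapping `rng.choice` for a heuristic or an interactive prompt would give the hand-tuned or manual selection schemes mentioned in the text without touching the matching logic.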

Parametric Grammar
Although the context-sensitive grammar makes it possible to generate a variety of polyurethane chain structures, its modeling power is still limited. One important problem is that the total chain length of the generated polyurethanes cannot be controlled. In practice, the chain length is an essential factor that influences the physical and chemical properties of the polyurethanes [50,59]. It is non-trivial to control the chain length of each generated polyurethane merely using the grammar discussed above due to the stochastic production. In order to address this problem, we introduce a parameter associated with each terminal symbol in the grammar and augment our PolyGrammar as a parametric context-sensitive grammar. The proposed parametric grammar is illustrated as follows,
Figure 7. Example of parametric grammar. To control the length of generated polyurethane, we introduce parameters, denoted with parentheses "( )" after terminal symbols.
The production rules now feature parameters, which are denoted with parentheses "( )" following terminal symbols. Furthermore, each production rule is augmented with a logical "condition" that determines whether the rule can be invoked or not (None indicates no constraints). By specifying $L$ (the initial value of parameter $x$ in production rules $p_1$ and $p_2$), the grammar can produce strings with length $2L + 1$, corresponding to polyurethane chains with length $2L + 1$. By varying the value of $L$, the chain length of generated polyurethanes can be controlled. An example of this production process is illustrated in Figure 7.
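The way a parameter and a logical condition fix the chain length can be sketched in Python. The rules below are hypothetical and much simpler than the parametric PolyGrammar: each end non-terminal h(x) carries a remaining budget x, the condition x > 0 allows one more segment to be added, and the condition x == 0 forces termination. Position in the string stands in for the left and right contexts.

```python
import random


def generate(L, rng):
    """Derive a terminal string of length 2L + 1 (hypothetical rules)."""
    # init rule: X -> h(L) H h(L); each h carries the remaining budget x
    string = [("h", L), ("H", None), ("h", L)]
    while any(sym == "h" for sym, _ in string):
        nxt = []
        for i, (sym, x) in enumerate(string):
            if sym != "h":
                nxt.append((sym, x))   # terminals are copied unchanged
            elif x == 0:
                continue               # condition x == 0: h(x) -> empty
            else:
                seg = (rng.choice("HS"), None)  # hard or soft segment
                if i == 0:             # left end: h(x) -> h(x-1) seg
                    nxt += [("h", x - 1), seg]
                else:                  # right end: h(x) -> seg h(x-1)
                    nxt += [seg, ("h", x - 1)]
        string = nxt
    return [sym for sym, _ in string]


chain = generate(10, random.Random(0))
print(len(chain))  # 21
```

Each side adds exactly L segments around the initial one, so the length is 2L + 1 regardless of which segment types the random choices produce.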

Extensions for Branched Polyurethanes
So far, all of our polyurethanes have featured linear chain structures. However, it is possible for polyurethanes to have branched structures [60] , as shown in Figure 3(ii). To generate branched polyurethanes, we augment the parametric context-sensitive grammar with several rules:

Global Controllable Parameters
We have already discussed the use of parameters for controlling the chain length of the generated polyurethanes. However, it is still difficult for our baseline parametric grammar to achieve more advanced controllable parameters such as the ratio of hard segment to soft segment. This is because the context-sensitive grammar only captures "local" information about the chain during the generation process, as the view of each production rule is limited to the context immediately surrounding the predecessor symbol. When it comes to global constraints, such as specific ratios of hard versus soft segments, the generative model needs to be aware of the relevant information (number of hard segments, chain length) over the whole chain. It is non-trivial to handle these constraints with the basic PolyGrammar discussed in previous sections.
To address this issue, we introduce an additional symbol "ℳ" which serves as a mes-

PolyGrammar as a Generative Model
Generative models are critical for the efficient, thorough exploration of possible polymer structures. These models are also particularly powerful in conjunction with machine learning algorithms, in order to address complicated problems like human-guided exploration and property prediction. In this section, we discuss how our parametric context-sensitive PolyGrammar can serve as a generative model.
The generation process of PolyGrammar begins with a simple string that contains the initial symbol 𝒳. On each step, we traverse the symbols in the current string and find the position of all the non-terminal symbols. For each non-terminal symbol, we identify a candidate set of production rules. Each candidate production rule must meet the following conditions: 1) the context in the predecessor clause matches the context of the current symbol in the string, and 2) the logical condition of the rule is satisfied.
Using our generative model, it is possible to enumerate all valid polyurethane structures in a target class (e.g., length 20 with 1 type of polyol and 1 type of isocyanate). In particular, any distinct sequence of production rules on the start symbol yields a distinct string, which in turn represents a unique polyurethane chain. Since the production rules encode all permissible local configurations of the constituent molecules, it follows that our grammar is able to generate any valid polyurethane.
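Exhaustive enumeration can be illustrated on the small four-rule context-free grammar from the earlier illustrative example, used here as a stand-in for the full rule set. The sketch explores every derivation breadth-first and collects all terminal strings up to a length bound. The ASCII symbol names and the `enumerate_strings` helper are assumptions of this illustration.

```python
from collections import deque

# Four-rule toy grammar (ASCII stand-ins for the paper's symbols).
RULES = [
    ("X", ("H", "h")),
    ("h", ("H", "h")),
    ("h", ("s",)),
    ("s", ("S",)),
]
NON_TERMINALS = {"X", "h", "s"}


def enumerate_strings(max_len):
    """Return every terminal string with at most `max_len` symbols."""
    results = []
    queue = deque([("X",)])
    while queue:
        string = queue.popleft()
        # position of the first non-terminal, or None if fully terminal
        pos = next((i for i, s in enumerate(string) if s in NON_TERMINALS), None)
        if pos is None:
            results.append("".join(string))
            continue
        for pred, succ in RULES:
            if pred == string[pos]:
                new = string[:pos] + succ + string[pos + 1:]
                if len(new) <= max_len:  # prune derivations that are too long
                    queue.append(new)
    return results


print(enumerate_strings(4))  # ['HS', 'HHS', 'HHHS']
```

The length bound plays the role of the target class: without it the queue would never empty, because the growth rule can always be applied once more.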
To emphasize the volume of achievable molecules, we also quantitatively analyze the diversity of generated chains for our PolyGrammar. Given a chain length parameter $L$ and the number of isocyanate and polyol types ($n_h$ and $n_s$, respectively), the basic PolyGrammar (with 14 production rules) allows the generation of a total number of $N = \sum_{i=0}^{2L+1} \dots$ polyurethane chains with different structures. With $L = 10$, $n_h = 1$, $n_s = 1$, which are representative of an average polyurethane chain [50], $N$ is more than $2 \times 10^6$. This demonstrates the powerful capacity of our PolyGrammar. Several polyurethane chains generated using PolyGrammar are shown in Figure 8. More examples can be found in Supporting Information.
Figure 9. Schematic for translating a polyurethane from a SMILES string into our PolyGrammar representation, which also reveals the complete sequence of rules required for its generation. The pipeline can be regarded as a search process. Starting from the initial symbol, we iteratively select and invoke production rules until all symbols in the string are terminal symbols. Then given the component types, we convert the symbolic string into a polyurethane structure by hypergraph conversion and compare it with the input structure. The total process repeats until the search structure matches with the input structure.
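The order of magnitude quoted above can be sanity-checked with a short calculation. The sketch below does not reproduce the paper's counting formula; it assumes, for illustration only, that each of the 2L + 1 positions in the symbolic string can independently hold any of the hard- or soft-segment types, which gives a count of (n_h + n_s)^(2L+1). The names `count_chains`, `n_h`, and `n_s` are hypothetical.

```python
from itertools import product


def count_chains(L, n_h, n_s):
    """Number of strings of length 2L + 1 over n_h + n_s segment symbols
    (independence assumption of this sketch, not the paper's formula)."""
    return (n_h + n_s) ** (2 * L + 1)


# Brute-force check of the closed form on a tiny case (L = 1).
alphabet = ["H1", "S1"]
assert len(list(product(alphabet, repeat=3))) == count_chains(1, 1, 1)

print(count_chains(10, 1, 1))  # 2097152
```

Under this assumption the value for a chain length parameter of 10 with one isocyanate and one polyol type is 2^21 = 2,097,152, which is consistent with the stated figure of more than 2 × 10^6.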

Translation from SMILES
To complete our chemical design model, we also develop an inverse model capable of translating a SMILES string into the corresponding sequence of PolyGrammar production rules. The overall pipeline of translation from SMILES can be regarded as a search process, as shown in Figure 9. Starting from the initial symbol, we iteratively select and invoke production rules until all symbols in the string are terminal symbols. Once we have a complete string and the specific component types, we use hypergraph conversion to convert the symbolic string into a polyurethane structure. We then compare our result with the input structure; if they do not match, we restart our search from scratch. The process repeats until our structure matches the original input.
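The generate-compare-restart loop can be sketched in a few lines of Python. This is a deliberately naive illustration: `random_chain` is a hypothetical stand-in for a full grammar derivation followed by hypergraph conversion, and the comparison is done directly on symbol lists rather than on chemical structures.

```python
import random


def random_chain(rng, max_len=6):
    """Stand-in for one complete random derivation (hypothetical)."""
    length = rng.randint(1, max_len)
    return [rng.choice("HS") for _ in range(length)]


def search(target, rng, max_tries=100000):
    """Restart from scratch until a generated chain matches the target."""
    for tries in range(1, max_tries + 1):
        candidate = random_chain(rng)
        if candidate == target:
            return candidate, tries
    return None, max_tries


found, tries = search(["H", "S", "H"], random.Random(0))
print(found, tries)
```

The number of restarts grows quickly with chain length, which is why the more directed procedure described next is preferable for long chains.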
Specifically, our inverse model proceeds as follows. Given the SMILES string of the polyurethane chain, we break it into multiple molecular fragments by disconnecting all of the  [63] ) to identify its type: an isocyanate, a polyol or a chain extender. During the enumeration, we also record the connectivity between fragments. Based on the types and the connectivity of the fragments, we can obtain a hypergraph representation of the original SMILES string. The final step is to convert the hypergraph into the sequence of the production rules of PolyGrammar. We traverse the hypergraph using the breadth-first search (BFS) algorithm, which explores all of the neighbouring hyperedges at the present depth before moving on to the hyperedges at the next depth level.
BFS starts at the tree root, which is an arbitrary hyperedge of the hypergraph. Each step of the exploration returns a tuple of two hyperedges, which is then matched with a specific production rule in the PolyGrammar. Hence, the sequence of the production rules can be obtained once the entire hypergraph has been explored. The pipeline of this algorithm is illustrated in Figure 10 and the corresponding pseudo-code is in Supporting Information.
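The traversal step can be sketched as a breadth-first search over the line graph of the hypergraph. The adjacency list, hyperedge names, and `rule_kind` labels below are hypothetical; in particular, the mapping from a pair of hyperedges to a rule is reduced to the three reaction categories described for the production rules (hard segment, soft segment, urethane group) rather than to specific rule indices.

```python
from collections import deque


def bfs_pairs(adjacency, root):
    """Yield (parent, child) hyperedge pairs in breadth-first order."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
                yield current, neighbour


def rule_kind(kinds, pair):
    """Classify a hyperedge pair by the reaction it corresponds to."""
    a, b = (kinds[e] for e in pair)
    if a == b == "isocyanate":
        return "hard-segment rule"
    if a == b == "polyol":
        return "soft-segment rule"
    return "urethane rule"


# Linear chain of four hyperedges: e0 - e1 - e2 - e3 (toy data).
adjacency = {"e0": ["e1"], "e1": ["e0", "e2"], "e2": ["e1", "e3"], "e3": ["e2"]}
kinds = {"e0": "isocyanate", "e1": "isocyanate", "e2": "polyol", "e3": "polyol"}

for pair in bfs_pairs(adjacency, "e1"):
    print(pair, rule_kind(kinds, pair))
```

Starting from an interior hyperedge such as "e1" makes the traversal visit neighbours on both sides, which parallels the two-directional growth of the chain.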
This pipeline is sufficient for our needs, but it could be improved with a heuristic search such as A* search [64], best-first search [65], or learned heuristic search [66], where a heuristic function accelerates the search process by directing attention toward the most promising regions of the search space.
To validate our approach and demonstrate the capacity of our proposed PolyGrammar, we have collected and inversely modelled over 600 polyurethane structures from literature.

Generalization to Other Polymers
Our PolyGrammar can also be easily extended to new classes of polymers. These extensions would use the same framework described above, with very few modifications. In the Supporting Information, we illustrate the extended PolyGrammar for different types of copolymers, including alternating copolymers and block copolymers. Note that our PolyGrammar in the main paper can already cover random copolymers, branched copolymers, and graft copolymers. Users only need to add new types of reactants to the symbolic representation in order to determine the species of monomer.
For now, PolyGrammar focuses on the backbone structure, i.e., the arrangement of monomers, which largely determines the properties of copolymers (derived from more than one species of monomer). The grammar treats the monomer fragment as a whole and distinguishes different monomer types using different symbols. However, there is also a wide range of pol-

Discussion
PolyGrammar is an effective chemistry design model that satisfies all five desirable properties discussed in Introduction. In particular, our symbolic representation can convey all possible polyurethane structures in an explicit yet concise manner. The generative model based on this representation is exhaustive (it is capable of generating any polyurethane) and trustworthy (every generated polyurethane is guaranteed to be valid). Moreover, the generation process is fully transparent and understandable to the user, as it returns a sequence of meaningful production rules that yield our model's result. Lastly, the generation process is invertible, so molecules can be translated from other popular representations such as SMILES.
These superior properties make PolyGrammar more comprehensive and practical than existing representation schemes and generative models. Our full chemical design model (representation, generative model, and inverse model) is also simple and efficient to use in practice. For a polyurethane chain of length 20, the average generation time via PolyGrammar is 4 ms and its translation from SMILES costs 11 ms on a PC with an Intel Core i7 CPU.
For now, our PolyGrammar focuses on single-chained molecular structures. However, real synthesized polyurethanes are a mixture of differently structured chains, where interactions between chains such as hydrogen bonding and crosslinking may occur [47,48]. These interactions influence the physical and chemical properties of the polyurethane, largely determining whether the synthesized polyurethane is thermoset or thermoplastic. Such interactions are not currently addressed in PolyGrammar, but they could be added by augmenting the production rules to support interactions between multiple chains.
The current generative model of PolyGrammar also only imitates chain-growth polymerization. Although this polymerization mechanism has some benefits for the simulation of polyurethane chains [61], it would be ideal for our PolyGrammar to imitate step-growth polymerization as well. A more advanced grammar, such as a universal grammar [62], would help to achieve this.
These aforementioned features are intriguing and will be implemented and demonstrated in future work. However, even without these augmentations, our proposed PolyGrammar takes an important step toward a more practical and comprehensive system for polymer discovery and exploration.

Conclusion
In summary, we propose a parametric context-sensitive grammar, called PolyGrammar, for the representation and generation of polymers. The recursive nature of grammar production enables the generation of any polymer chain using only a simple set of production rules. We also implement an algorithm that can translate a SMILES string of a polymer chain into the sequence of production rules used to generate it. Capable of reproducing a large literature-collected dataset, this algorithm demonstrates the completeness and effectiveness of our PolyGrammar. Our PolyGrammar will benefit the polymer community in several ways.
The most immediate contribution is our ability to efficiently generate an exhaustive collection of polymer samples. This corpus could be very powerful in conjunction with other methods (e.g., machine learning) to guide the synthesis of physical polymers and facilitate complex tasks like molecular discovery [2][3][4] and property optimization [13,14,17] . PolyGrammar is also helpful for the reverse engineering of polymer design and production. Our PolyGrammar serves as a blueprint to construct chemical design models for different classes of chemistries, including both organic and inorganic molecules. Eventually, PolyGrammar could improve chemical communication and exploration, by providing a more efficient and effective representation scheme that is widely suitable for complicated polymers.

Supporting Information
Supporting Information is available from the author: production rules of global controllable

S1. Production Rules of Global Controllable Grammar
The symbol to be replaced is inside the "< >". The contexts are the symbols located at both sides of "< >" in the predecessor ( indicates no constraints). Parameters of the message symbol ℳ are located inside "( )". The logic condition of the parameters for each production rule is between ":" and "→" ( also indicates no constraints here). During the production process, a production rule can only be applied to a symbol when both its context and the logic condition are satisfied. The production process stops when no production rules can be invoked, i.e., for each symbol of the string, there are no production rules that meet the condition and the contexts of the symbol.

S4. Generalized PolyGrammar to Other Polymers
The extended PolyGrammar for different types of copolymers and functional groups is illustrated as follows, • for block copolymers, • for alternating copolymers,

S5. Pseudo-code of Translation from SMILES
The inverse design process contains three parts: disconnecting carbamate bonds, constructing the hypergraph, and BFS-based rule matching. The pseudo-code is as follows.