Quick Subset Construction

A finite automaton can be either deterministic (DFA) or nondeterministic (NFA). An automaton-based task is in general more efficient when performed with a DFA rather than an NFA. For any NFA there is an equivalent DFA, which can be generated by the classical Subset Construction algorithm. When, however, a large NFA may be transformed into an equivalent DFA by a series of actions operating directly on the NFA, Subset Construction may be unnecessarily expensive in computation, as a (possibly large) deterministic portion of the NFA is regenerated as is, a waste of processing. This is why a conservative algorithm for NFA determinization is proposed, called Quick Subset Construction, which progressively transforms an NFA into an equivalent DFA instead of generating the DFA from scratch, thereby avoiding unnecessary processing. Quick Subset Construction is proven, both formally and empirically, to be equivalent to Subset Construction, inasmuch as it generates exactly the same DFA. Experimental results indicate that, the smaller the number of repair actions performed on the NFA as compared to the size of the equivalent DFA, the faster Quick Subset Construction is than Subset Construction.

(possibly large) portion of the NFA that is deterministic already is unnecessarily processed. Put another way, a large NFA may possibly be determinized more efficiently than Subset Construction does by fixing the NFA directly instead of generating an equivalent DFA from scratch. Fixing an NFA means applying a series of repair actions on the nondeterministic transition function that progressively transform the NFA into exactly the same equivalent DFA produced by Subset Construction. This alternative technique for the determinization of NFAs is said to be inward-oriented.
Consider, for example, an autonomous agent (robot) that is designed to learn by experience the environment in which it is being operated. To this end, the robot may construct incrementally a model of the environment based on a DFA, where states represent configurations of the environment, whereas transitions indicate actions of the robot that cause a change in the environment configuration. Given the current instance of the DFA, an interaction of the robot with the environment is bound to extend the DFA with new transitions and states. If such an extension comes with nondeterminism, the DFA stops being deterministic, thereby becoming an NFA. This NFA may be very large (certainly so if the previous DFA is large), with nondeterminism being confined to the newly-created transitions. Now, since the model of the environment in which the robot is being operated is supposed to be a DFA, just-in-time determinization of the NFA is required in order to update the deterministic model of the environment. Determinization by Subset Construction would generate the new equivalent DFA from scratch, however, even though the new DFA is likely to be almost equal to the previous DFA. From a computational point of view, regenerating the DFA from scratch on the fly may be impractical, especially so if performed under stringent time constraints.
A more practical approach may be to update the NFA directly by fixing the points in which the transition function becomes nondeterministic, thereby avoiding the unnecessary regeneration of the deterministic portion of the NFA. This alternative approach to NFA determinization is substantiated by a novel algorithm, named Quick Subset Construction, which is the subject of this article. Unlike Subset Construction, the proposed algorithm progressively transforms the NFA into an equivalent DFA that, remarkably, is identical to the DFA generated from scratch by Subset Construction. In this sense, Quick Subset Construction is equivalent to Subset Construction, even though the same DFA is obtained in a very different mode.
Quick Subset Construction was not conceived overnight: it is the result of a series of algorithms designed for a specific research domain of artificial intelligence named model-based diagnosis.[2][6][7] To our knowledge, in the literature there is no inward-oriented determinization algorithm. Indeed, starting from Subset Construction, all the alternative determinization algorithms are invariably outward-oriented, where the equivalent DFA is generated from scratch.* Perhaps, in this respect, the closest algorithm to Quick Subset Construction is Subset Restructuring,[7] which was designed for the determinization of mutating automata. A mutating finite automaton is an NFA that changes its morphology over discrete time by a sequence of mutations, where a mutation causes some changes in the transition function of an NFA, such as the insertion and/or removal of some transitions as well as the involved states. This results in a temporal sequence of NFAs, namely 𝒩_0, 𝒩_1, … , 𝒩_i, … , where 𝒩_0 is the initial NFA, whereas each subsequent NFA results from a mutation of the preceding NFA. The idea is to generate on the fly a DFA 𝒟_{i+1} that is equivalent to 𝒩_{i+1}, based on the DFA 𝒟_i (generated previously), which is equivalent to 𝒩_i, and on the difference (in the transition function) between 𝒩_i and 𝒩_{i+1}. Instead of generating 𝒟_{i+1} from scratch as Subset Construction does, 𝒟_{i+1} is obtained by updating 𝒟_i while accounting for the mutation leading 𝒩_i to 𝒩_{i+1}. On the other hand, Subset Restructuring cannot cope with the determinization of any given NFA outside the narrow context of mutations, which is instead possible for Quick Subset Construction, which can therefore be regarded as a general-purpose algorithm for NFA determinization, exactly like Subset Construction, although based on a very different approach.
Specifically, the main contributions of the current paper are:
1. A detailed presentation of the Quick Subset Construction determinization algorithm;
2. A proof of equivalence of Quick Subset Construction, which demonstrates that the resulting DFA is identical to the DFA generated from scratch by Subset Construction;
3. An implementation of a software framework (available online as open source) for experimenting with Quick Subset Construction, as well as for generating pseudo-random test cases based on given configuration parameters;
4. A variety of experimental results that compare Quick Subset Construction with Subset Construction.
In the rest of the article, Section 2 recalls the notion of a finite automaton, specifically DFA and NFA. Section 3 presents the classical outward-oriented technique for NFA determinization by Subset Construction. Section 4 illustrates in detail the inward-oriented technique for NFA determinization by Quick Subset Construction. The equivalence of Subset Construction and Quick Subset Construction is formally proven in Section 5. Experimental results are presented in Section 6, along with a discussion on the convenience of using Quick Subset Construction rather than Subset Construction. Conclusions are drawn in Section 7.

FINITE AUTOMATA
A finite automaton (FA) is a mathematical formalism for the modeling of a variety of systems,[9] which can be either abstract or concrete. Abstract formal systems modeled as FAs are designed to solve problems like pattern matching based on regular expressions,[10] language recognition and translation,[11] as well as the analysis of protein sequences.[12] For instance, a lexical analyzer, which is designed to recognize the words of a language, can be implemented as an FA, where the input symbols are the characters of a given alphabet, while a state represents a prefix of the word being recognized, for example, an identifier in a programming language. The (possibly infinite) set of strings that can be recognized by an FA is called a regular language. Every regular language can be specified by a regular expression, and each language specified by a regular expression is regular and can be recognized by an FA. Put another way, FAs and regular expressions share the same expressive power.
As pointed out, an FA can also model a concrete system, which is typically discrete and dynamic. A discrete system is characterized by input variables and system states that can be represented by discrete values. A dynamic (or time-varying) system evolves over time from one state to another. Moreover, in a system modeled by an FA, both the domain of input symbols and the set of states are finite. In both control theory and artificial intelligence, FAs are exploited to model a class of physical systems called discrete-event systems.[13][16][17][18][19][20][21][22][23][24][25][26] Formally, an FA can be either deterministic (DFA) or nondeterministic (NFA). Specifically, a DFA is a 5-tuple,

𝒟 = (Σ, Q, δ, q_0, F),   (1)

where Σ is the alphabet (a finite set of symbols called hereafter labels), Q is the (finite) set of states, δ is the transition function, δ : Q_Σ → Q, where Q_Σ ⊆ Q × Σ, q_0 is the initial state, and F is the set of final states, where F ⊆ Q. Determinism in 𝒟 comes from the peculiarity of the transition function mapping each pair (q, ℓ) ∈ Q_Σ into a single state q′, namely δ(q, ℓ) = q′.

Example 1 (DFA). Centered in Figure 1 is a graphical representation of a DFA, which includes six states and eight transitions, where A is the initial state and G is the (unique) final state. Determinism is grounded on the fact that all the transitions exiting the same state have different symbols of the alphabet; in other words, the transition function maps each pair in Q_Σ into a single state, for instance, δ(D, a) = E and δ(D, b) = G.
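The 5-tuple definition of a DFA can be sketched directly in code. The following is a minimal illustration, not the paper's framework; the two-state automaton at the bottom is a hypothetical example, not the DFA of Figure 1.

```python
# A minimal DFA sketch following the 5-tuple D = (Sigma, Q, delta, q0, F).
class DFA:
    def __init__(self, sigma, states, delta, q0, finals):
        self.sigma, self.states = sigma, states
        self.delta = delta        # partial function: (state, label) -> state
        self.q0, self.finals = q0, finals

    def accepts(self, word):
        """Run the word; delta is partial, so a missing entry rejects."""
        q = self.q0
        for ch in word:
            if (q, ch) not in self.delta:
                return False
            q = self.delta[(q, ch)]
        return q in self.finals

# Hypothetical two-state DFA over {a, b} accepting words ending in 'b'.
d = DFA({'a', 'b'}, {'A', 'B'},
        {('A', 'a'): 'A', ('A', 'b'): 'B', ('B', 'a'): 'A', ('B', 'b'): 'B'},
        'A', {'B'})
print(d.accepts('aab'))   # True
print(d.accepts('aba'))   # False
```

Note how determinism shows up in the data structure itself: the dictionary maps each pair (q, ℓ) to a single state, exactly as δ does in Equation (1).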
F I G U R E 1 An NFA (left) and an equivalent DFA (center), which recognize the regular language (aba*|ba+)b. A slight variation of the left NFA is displayed on the right side.
An NFA is defined almost identically to a DFA, with the exception of the transition function. Formally, an NFA is a 5-tuple,

𝒩 = (Σ, Q, δ, q_0, F),   (2)

where Σ, Q, q_0, and F are defined as in Equation (1), whereas the transition function δ is defined differently, δ : Q_εΣ → 2^Q, where Q_εΣ ⊆ Q × (Σ ∪ {ε}), with ε being the empty symbol (ε ∉ Σ). Nondeterminism in 𝒩 lies in the transition function, which, on the one hand, involves the empty symbol ε, which allows for spontaneous state transitions, and, on the other, maps a pair (q, ℓ), where q ∈ Q and ℓ ∈ Σ ∪ {ε}, into a set of states. Intuitively, an NFA may be in several states when recognizing a string of symbols, which is impossible for a DFA, which is always in a single state during processing.† Roughly, the regular language recognized by an FA, either deterministic or nondeterministic, is the set of strings (composed of the symbols in the alphabet) that we obtain by traversing all paths of transitions from the initial state to a final state, where the instances of the empty symbol ε are immaterial. Two FAs are equivalent if they share (recognize) the same regular language. From a practical standpoint, however, equivalence is not synonymous with equality in computation. Given an NFA and an equivalent DFA, the processing based on the NFA is in general more complex than the processing based on the DFA. In lexical analysis, for instance, the recognition of a string based on a DFA requires the processing just to keep track of one state, namely the state computed by the transition function δ(q, ℓ), where q is the current state and ℓ is the next character in input. By contrast, the same recognition based on an NFA requires the lexical analyzer to keep track of a set of states. In fact, since the transition function maps one state to several states, the NFA may be in several states during processing, namely in a set Q = {q_1, … , q_k}. Hence, the recognition of the next character ℓ in input will move the NFA to a set of states Q_ℓ ∪ Q*_ℓ, where Q_ℓ = δ(q_1, ℓ) ∪ ⋯ ∪ δ(q_k, ℓ) and Q*_ℓ is the set of states that are reachable from any state in Q_ℓ by paths of ε-transitions. We have, therefore, good reasons to prefer an equivalent DFA over an NFA, especially considering that for any NFA there is always an equivalent DFA; more generally, several (possibly an infinite number of) equivalent DFAs.
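The set-of-states bookkeeping described above can be sketched as follows. This is an illustrative implementation under our own encoding (a transition dict where the empty string stands for ε); the NFA at the bottom is hypothetical, not one from the figures.

```python
# Sketch of NFA string recognition by tracking a set of current states.
def eps_closure(delta, states):
    """All states reachable from `states` by paths of epsilon-transitions."""
    stack, closed = list(states), set(states)
    while stack:
        q = stack.pop()
        for q2 in delta.get((q, ''), set()):   # '' stands for epsilon
            if q2 not in closed:
                closed.add(q2)
                stack.append(q2)
    return closed

def nfa_accepts(delta, q0, finals, word):
    current = eps_closure(delta, {q0})
    for ch in word:
        # Q_l: union of delta(q, ch) over the current states, then its eps-closure.
        moved = set()
        for q in current:
            moved |= delta.get((q, ch), set())
        current = eps_closure(delta, moved)
    return bool(current & finals)

# Hypothetical NFA over {a, b} accepting words containing 'ab':
delta = {('0', 'a'): {'0', '1'}, ('0', 'b'): {'0'}, ('1', 'b'): {'2'},
         ('2', 'a'): {'2'}, ('2', 'b'): {'2'}}
print(nfa_accepts(delta, '0', {'2'}, 'aab'))  # True
print(nfa_accepts(delta, '0', {'2'}, 'ba'))   # False
```

The inner loop computes exactly the set Q_ℓ ∪ Q*_ℓ of the text: the union of the ℓ-successors of all current states, closed under ε-transitions.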
Example 2 (Equivalent FAs). With reference to Figure 1, note how the DFA displayed in the center is equivalent to the NFA on the left side. In fact, they both recognize the same regular language, which is defined by the regular expression (aba*|ba+)b. For instance, the string bab is recognized by the DFA via a sequence of transitions leading from the initial state A to the final state G.

Before presenting the classical algorithm for NFA determinization, we introduce some handy terminology. Given a state s of an FA and a label ℓ of the alphabet, a transition mapping (s, ℓ) is called an ℓ-transition. Let n be a state of an NFA 𝒩. The ε-closure of n, denoted ε-closure(n, 𝒩), is the set of states in 𝒩 that is composed of n and all the states that are reachable from n by paths of ε-transitions. The ε-closure of a set N of states in 𝒩 is defined as

ε-closure(N, 𝒩) = ⋃_{n ∈ N} ε-closure(n, 𝒩).   (5)

† In the literature, there is a distinction between an NFA and an ε-NFA, where the former does not include ε-transitions, while the latter does. In this article, we do not make that distinction and consider a more general notion of an NFA, which may or may not include ε-transitions.

Let n be a state of an NFA 𝒩 and let ℓ be a label in the alphabet of 𝒩. The ℓ-mapping of n, denoted ℓ-mapping(n, 𝒩), is the ε-closure of the set of states generated by the transition function when applied to (n, ℓ). The ℓ-mapping of a set N of states in 𝒩 is defined as

ℓ-mapping(N, 𝒩) = ⋃_{n ∈ N} ℓ-mapping(n, 𝒩).

Example 4 (ℓ-mapping). Considering the NFA 𝒩 displayed on the right of Figure 1, we have a-mapping(2, 𝒩) = {3, 4, 5} and b-mapping({0, 1}, 𝒩) = {2, 3, 5}.
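The two operations just defined, ε-closure and ℓ-mapping, can be sketched directly. The three-state NFA below is a hypothetical toy example (not the right-side NFA of Figure 1), using our own dict encoding where the empty string stands for ε.

```python
# Sketch of the eps-closure and l-mapping operations defined above.
def eps_closure(delta, states):
    stack, closed = list(states), set(states)
    while stack:
        q = stack.pop()
        for q2 in delta.get((q, ''), set()):  # '' stands for epsilon
            if q2 not in closed:
                closed.add(q2)
                stack.append(q2)
    return closed

def ell_mapping(delta, states, ell):
    """eps-closure of the states reached from `states` by l-transitions."""
    targets = set()
    for q in states:
        targets |= delta.get((q, ell), set())
    return eps_closure(delta, targets)

# Hypothetical NFA: 0 -a-> 1, 1 -eps-> 2, 0 -b-> 2
delta = {('0', 'a'): {'1'}, ('1', ''): {'2'}, ('0', 'b'): {'2'}}
print(ell_mapping(delta, {'0'}, 'a'))  # {'1', '2'}
```

Note that ℓ-mapping composes the two definitions exactly as in the text: first the transition function is applied to every state in the set, then the ε-closure of the union is taken.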

OUTWARD-ORIENTED DETERMINIZATION BY SUBSET CONSTRUCTION
The classical algorithm for NFA determinization is Subset Construction.[1] The reason for this name stems from the mode in which each state of the equivalent DFA is constructed, namely as a subset of the NFA states. Moreover, if the NFA includes m states, in the worst case, the DFA will include 2^m − 1 states, which is the number of possible subsets of NFA states (empty set excluded). In other words, in the worst case, the size of the DFA generated by Subset Construction is exponential in the number of NFA states. In practice, however, this bound is very pessimistic, as possibly a large number of subsets of NFA states are not involved in the constructed DFA.
The modus operandi of Subset Construction is outward-oriented: it maps an NFA to an equivalent DFA that is constructed from scratch, which, for this reason, is said to be SC-equivalent to the input NFA. Although there are in general several DFAs that are equivalent to a given NFA, nonetheless there exists just one minimal equivalent DFA, namely a DFA including the minimum number of states.[28][29][30][31][32][33][34] Therefore, even if the SC-equivalent DFA is not minimal, it can be minimized automatically.
The pseudocode of Subset Construction is listed in Algorithm 1 (lines 1-23). It takes as input an NFA 𝒩 and generates as output a DFA 𝒟 that is equivalent to 𝒩. For practical reasons, the algorithm makes a distinction between the identifier d of a state of the output DFA and the subset of NFA states marking d, called the extension of d, denoted ||d||. It exploits a stack onto which the newly-created states are put. The idea is to pop one state at a time from the stack and to generate the relevant transition function, which may cause the creation of new states. The processing terminates when the stack becomes empty, that is, when the transition function of all states has been completed. Specifically, the set D of states, the set δ_d of transitions, where a transition δ_d(d, ℓ) = d′ is denoted as an arc ⟨d, ℓ, d′⟩, and the set F_d of final states are initialized as empty (line 5). In line 6, the initial state d_0 is inserted into D, where ||d_0|| = ε-closure(n_0, 𝒩). In other words, the extension of d_0 includes the initial state n_0 of 𝒩 plus the states of 𝒩 that are reachable from n_0 by ε-transitions only. If the extension of d_0 includes a final state of 𝒩, then d_0 is inserted into the set of final states of 𝒟 also (line 7). Then, d_0 is pushed onto the stack (line 8). Afterwards, the loop in lines 9-22 is executed until the stack of states becomes empty. At each iteration, a state d is popped from the stack (line 10) and the transitions exiting d in 𝒟 are generated within the nested loop in lines 11-21. To this end, each label ℓ of the alphabet Σ relevant to a transition exiting an NFA state in ||d|| is considered (line 11). In line 12, a set N of states in 𝒩 is computed as ℓ-mapping(||d||, 𝒩), that is, the set of states of 𝒩 that are reached by the ℓ-transitions exiting the states in ||d||, plus the states in 𝒩 that are reachable from these states by ε-transitions only. N is the extension of the target state of the ℓ-transition exiting d in 𝒟. Hence, if there is no such state (line 13), then it is created, namely d′ with extension N (line 14), and possibly added to the set of final states provided that ||d′|| includes at least one final state of 𝒩 (line 15); then, since it is a newly-created state, d′ is pushed onto the stack (line 16). If, instead, that state exists already, then it is referenced as d′ (line 18). In either case, a new transition ⟨d, ℓ, d′⟩ is eventually created in 𝒟 (line 20). Notice that, since the set of states of 𝒩 is finite, the set of states that can be generated for 𝒟 is finite also, as it is bounded by the powerset of the NFA states; hence, sooner or later, Subset Construction terminates and 𝒟 is in fact a DFA, namely the SC-equivalent DFA of 𝒩.
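The walkthrough above can be condensed into a short Python sketch. This is not the paper's Algorithm 1 verbatim: for simplicity, each DFA state is identified by its extension (a frozenset of NFA states) rather than by a separate identifier, and the empty string stands for ε. The example NFA at the bottom is hypothetical.

```python
# Minimal sketch of Subset Construction (stack of unprocessed DFA states).
def eps_closure(delta, states):
    stack, closed = list(states), set(states)
    while stack:
        q = stack.pop()
        for q2 in delta.get((q, ''), set()):
            if q2 not in closed:
                closed.add(q2)
                stack.append(q2)
    return closed

def subset_construction(delta, n0, finals):
    d0 = frozenset(eps_closure(delta, {n0}))      # ||d0|| = eps-closure(n0)
    D, delta_d, F_d = {d0}, {}, set()
    if d0 & finals:
        F_d.add(d0)
    stack = [d0]
    while stack:
        d = stack.pop()
        # labels of transitions exiting some NFA state in ||d|| (line 11)
        labels = {l for (q, l) in delta if q in d and l != ''}
        for l in labels:
            targets = set()
            for q in d:
                targets |= delta.get((q, l), set())
            d2 = frozenset(eps_closure(delta, targets))  # N = l-mapping(||d||)
            if d2 not in D:                              # no such state: create it
                D.add(d2)
                if d2 & finals:
                    F_d.add(d2)
                stack.append(d2)
            delta_d[(d, l)] = d2                         # arc <d, l, d'>
    return D, delta_d, d0, F_d

# Hypothetical NFA: 0 -a-> {0, 1}, 1 -eps-> 2, final state 2.
delta = {('0', 'a'): {'0', '1'}, ('1', ''): {'2'}}
D, delta_d, d0, F_d = subset_construction(delta, '0', {'2'})
print(len(D))  # 2
```

Termination follows the argument in the text: the reachable DFA states are a subset of the powerset of the NFA states, so the stack eventually empties.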
Example 5 (Subset Construction). Detailed in Figure 2 is the generation, performed by Subset Construction, of the DFA that is SC-equivalent to the NFA depicted on the left side of Figure 1. Specifically, each intermediate instance of the DFA is associated with the content of the stack (placed on top of each instance), which grows from left to right; hence, the rightmost state (in bold) represents the top of the stack, that is, the next state to be processed. At the beginning of the main loop (lines 9-22), the DFA is composed of the initial state only, namely A, which is also the unique element in the stack. Then, state A is popped from the stack and the relevant transition function is materialized by means of two exiting transitions, while the two newly-created states, namely B and C, are pushed onto the stack. The algorithm continues popping one state from the stack and generating the relevant transition function, while putting possible new states onto the stack. For instance, the processing of state C gives rise to the creation of the new state E, where ||E|| = a-mapping({2}) = {3, 4}. When the stack becomes empty (right side of Figure 2), the DFA is complete. Note that the resulting DFA is identical to the DFA displayed in the center of Figure 1, except that each state is marked with the relevant extension, a by-product of the mode in which Subset Construction identifies the DFA states. Once the DFA is generated, however, all the extensions may be removed and the states are simply identified by chosen symbols (in our example, capital letters).
Since Subset Construction is a function, for each NFA 𝒩 there is one and only one DFA that is SC-equivalent to 𝒩, albeit, generally speaking, several DFAs might exist that are equivalent to 𝒩.

INWARD-ORIENTED DETERMINIZATION BY QUICK SUBSET CONSTRUCTION
In order to generate an equivalent DFA, the Subset Construction algorithm does not operate directly on the NFA: the DFA is constructed from scratch, leaving the input NFA untouched. Yet, this outward-oriented approach is not strictly necessary to the determinization task. In fact, the same equivalent DFA can also be obtained by a series of repair actions operating directly on the input NFA: the nondeterminism is progressively removed from the NFA by fixing the transition function wherever necessary, thereby eventually obtaining the SC-equivalent DFA. This idea is substantiated by an alternative algorithm for NFA determinization, called Quick Subset Construction.‡ To this end, each point of nondeterminism in the transition function of the NFA is considered and a specific repair action is applied in order to remove the nondeterminism. Nondeterminism occurs when a state is exited either by an ε-transition or by several transitions with the same label.

Singularities
In order to capture all the nondeterminism points, the relevant states of the NFA are marked with a set of labels in Σ ∪ {ε}, where Σ is the alphabet of the NFA. The initial marking is based on three rules:
1. If an ε-transition exits the initial state n_0 of the NFA, then n_0 is marked with ε.
2. If there is a transition ⟨n, ℓ, n′⟩, where ℓ ≠ ε, and n′ is exited by an ε-transition, then n is marked with ℓ.
3. If a state n is exited by several ℓ-transitions, where ℓ ≠ ε, then n is marked with ℓ.
If a label ℓ marks a state n, then (n, ℓ) is called a singularity. Since the collection of labels marking a state is an ordered set, no duplication of the same singularity is allowed. A singularity initially marking the NFA indicates that the transition function mapping (n, ℓ) results in several states; in other words, nondeterminism arises; consequently, a repair action is required in order to fix that singularity. Fixing a singularity may cause the creation of additional singularities, which indicate that the transition function of the relevant states needs to be either adjusted or completed with new transitions.
Any given NFA is therefore associated with a set of initial singularities. The notion of an initial singularity is exploited for providing a precise measure of the quantity of nondeterminism affecting an NFA (Definition 1).§

Definition 1 (quantity of nondeterminism). The quantity of nondeterminism in an NFA is the number of initial singularities.

Definition 1 should not be misleading, however: as shown in Section 6.6, the quantity of nondeterminism in an NFA is generally unrelated to the number of singularities actually processed, which, in some circumstances, can be far larger than the number of initial singularities. That is, a small quantity of nondeterminism may translate into a large number of repair actions.
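The three marking rules can be sketched as a small function. This is an illustrative reading of the rules under our own dict encoding (the empty string stands for ε; "several ℓ-transitions" appears as one dictionary entry with several target states); it is not the paper's implementation.

```python
# Sketch of the three initial-marking rules: compute the initial singularities
# of an NFA given as a dict (state, label) -> set of target states.
def initial_singularities(delta, n0):
    sing = set()
    # Rule 1: an eps-transition exits the initial state.
    if (n0, '') in delta:
        sing.add((n0, ''))
    for (n, l), targets in delta.items():
        if l == '':
            continue
        # Rule 3: several l-transitions exit n (here: several targets for (n, l)).
        if len(targets) > 1:
            sing.add((n, l))
        # Rule 2: some target of an l-transition is exited by an eps-transition.
        if any((n2, '') in delta for n2 in targets):
            sing.add((n, l))
    return sing

# Hypothetical NFA: 0 -a-> {1, 2}, 1 -eps-> {3}, 2 -b-> {3}.
delta = {('0', 'a'): {'1', '2'}, ('1', ''): {'3'}, ('2', 'b'): {'3'}}
print(initial_singularities(delta, '0'))  # {('0', 'a')}
```

By Definition 1, the quantity of nondeterminism of the toy NFA above is 1: the single singularity (0, a), triggered by both rule 2 and rule 3, with no duplication since singularities form a set.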
The order in which singularities are processed is important. Based on our experience, choosing the singularities randomly may cause the disconnection of the resulting DFA and/or the nontermination of the algorithm. In order to avoid both disconnection and nontermination, singularities are sorted based on the level of the states. The level of a state s in an FA with initial state s_0, denoted λ(s), is the minimum number of transitions (including ε-transitions) that connect s_0 with s. Operationally, the level of a state can be defined inductively by the following two rules:

λ(s_0) = 0;
λ(s) = 1 + min{λ(s′) | s′ is a parent of s}.¶

F I G U R E 3 NFA (left) and DFA (right), with states marked with corresponding levels (cf. Figure 1).
Example 6 (level). Shown in Figure 3 are the NFA (left) and DFA (right) displayed in Figure 1, where each state is marked with the relevant level.
Since the level imposes only a partial order on states, as several states may have the same level, in order for Quick Subset Construction to behave deterministically,# the singularities (q, ℓ) are totally ordered based not only on λ(q) but also on the natural (ascending) order of the identifiers q and ℓ (with the label ε being in first position). Therefore, the singularities (q, ℓ) are sorted based first on λ(q), then on the identifier of q, and finally on ℓ. The resulting (sorted) list of singularities is called a singularity list. Since processing a singularity may cause the insertion/removal of transitions, with possible changes in the level of states, the position of each singularity within the singularity list may change during processing.
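Both ingredients of the singularity list, the level of each state and the total order, can be sketched as follows. Levels are computed by a breadth-first search from the initial state (ε-transitions included), which yields the minimum number of transitions by construction; the automaton is a hypothetical toy example in our own dict encoding, where the empty string stands for ε and conveniently sorts before every proper label.

```python
from collections import deque

# Level computation: BFS from the initial state, counting eps-transitions too.
def levels(delta, s0):
    lam = {s0: 0}
    queue = deque([s0])
    while queue:
        q = queue.popleft()
        for (p, l), targets in delta.items():
            if p != q:
                continue
            for q2 in targets:
                if q2 not in lam:          # first visit = minimum distance
                    lam[q2] = lam[q] + 1
                    queue.append(q2)
    return lam

def sort_singularities(sings, lam):
    # First on level of q, then on the identifier of q, then on the label
    # ('' encodes eps and sorts first).
    return sorted(sings, key=lambda s: (lam[s[0]], s[0], s[1]))

# Hypothetical automaton: 0 -a-> 1, 0 -eps-> 2, 2 -b-> 3
delta = {('0', 'a'): {'1'}, ('0', ''): {'2'}, ('2', 'b'): {'3'}}
lam = levels(delta, '0')
print(lam)  # {'0': 0, '1': 1, '2': 1, '3': 2}
```

In a full implementation the list would be re-sorted (or incrementally adjusted) whenever a repair action changes the level of some state, as noted in the text.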
As every state q in the FA being manipulated by Quick Subset Construction is marked with a nonempty subset of the states of the NFA, as in Subset Construction, the new algorithm makes a distinction between an identifier q and the subset of NFA states marking q, namely the extension ||q||. Note that the extension of a state q can be changed by the repair actions, whereas the identifier q cannot. For formal reasons, we widen the notion of extension to an NFA state n, namely ||n|| = {n}. Also, given a set Q of states in the automaton being manipulated by Quick Subset Construction, ||Q|| denotes the union of the extensions of the states in Q, that is, a subset of the states of the input NFA.

Quick Subset Construction
The pseudocode of Quick Subset Construction is listed in Algorithm 2 (lines 1-47).|| The algorithm exploits five auxiliary functions/procedures, namely Unsafe, Enlarge, New, Level, and Unify (detailed in Section 4.3). Quick Subset Construction takes as input an NFA 𝒩 and generates a DFA that is SC-equivalent to 𝒩. To this end, first a copy 𝒬 of 𝒩 is created along with its initial singularities (line 5). Upon the termination of the algorithm, 𝒬 has been transformed into the equivalent DFA. For the determinization of 𝒬, the singularities are considered one by one within the singularity list. The repair actions associated with the singularity considered, however, depend on the relevant scenario. Three scenarios are defined, called 𝒮_0, 𝒮_1, and 𝒮_2, respectively. Scenario 𝒮_0 (lines 6-19) occurs for the singularity (q_0, ε) only, called the ε-singularity, where q_0 is the initial state of 𝒬. This means that q_0 is exited by at least one ε-transition. Since the ε-singularity is in the first position in the singularity list, this is the first singularity that is processed. Besides, since no singularity (q, ε) can be generated subsequently, neither for the initial state nor for any other state, the processing of the ε-singularity is placed upfront in the algorithm, before the main loop (lines 20-46), in which scenarios 𝒮_1 and 𝒮_2 are possibly considered several times. Roughly, the repair actions associated with (q_0, ε) remove the ε-transitions exiting q_0 while enlarging the extension of q_0. First, the sets Q, Ū, and N are computed (line 7), where Q is the ε-closure of q_0, Ū is the set of the unsafe states in Q, and N is the set of NFA states involved in Q. Intuitively, a state q is unsafe if the removal of the ε-transitions may cause q to be no longer reachable from the initial state; in other words, 𝒬 may become disconnected. To know whether a state is unsafe, the auxiliary function Unsafe is called (Algorithm 3, Section 4.3.1). Next, the extension of q_0 is expanded to N (line 8) by means of the Enlarge auxiliary procedure (Algorithm 4, Section 4.3.2), which may also create new singularities for q_0. Then, the ε-transitions exiting q_0 are removed (line 9). Next, for each transition ⟨u, ℓ, q⟩ in 𝒬 where u is unsafe, ℓ ≠ ε, and q is not in the set of unsafe states (possibly outside the set Q), a transition ⟨q_0, ℓ, q⟩ is created (lines 10-12), while the level of q and successive states is possibly updated by the Level auxiliary procedure (Algorithm 6, Section 4.3.4). These new transitions prevent 𝒬 from becoming disconnected owing to the subsequent removal of transitions. Subsequently, every transition ⟨q, ℓ, u⟩ exiting a state that is not unsafe (possibly outside the set Q) and entering an unsafe state is removed and, if ℓ ≠ ε, a singularity (q, ℓ) is created (lines 13-16). This is required in order to avoid dangling transitions resulting from the subsequent removal of the unsafe states. Eventually, the unsafe states and the relevant transitions are removed from 𝒬 (line 17), as well as the ε-singularity from the singularity list (line 18).

¶ A state s′ is a parent of a state s when there is a transition from s′ to s, namely ⟨s′, ℓ, s⟩.
# The deterministic behavior of Quick Subset Construction should not be confused with the deterministic transition function of a DFA: it means that Quick Subset Construction always performs the same sequence of actions for the same input NFA to generate the SC-equivalent DFA.
|| The processing of final states is omitted: as in Subset Construction, a DFA state is final when its extension includes an NFA state that is final.
Example 7 (Scenario 𝒮_0). Shown in Figure 4 is the processing of scenario 𝒮_0 for a fragment of an NFA (the initial configuration of 𝒬), where A is the initial state. Notice that, since the ε-singularity is the first being processed, 𝒬 equals the input NFA 𝒩; hence, the extensions of the states in 𝒬 are all singletons involving the corresponding NFA state. Highlighted in the first configuration of 𝒬 displayed on the left side of Figure 4 are the states in Q, where the unsafe states are dashed; the repair actions in lines 7-9 bring 𝒬 to the second configuration in Figure 4. The processing in lines 10-12 causes the insertion of the transitions ⟨A, b, G⟩ and ⟨A, b, H⟩, while lines 13-16 provoke the removal of the transition ⟨B, b, C⟩, which brings 𝒬 to the third configuration in Figure 4. Eventually, the processing in lines 17 and 18 brings 𝒬 to the last configuration (right side of Figure 4), where the singularity (F, c) has been removed (owing to the removal of F), which terminates the processing of scenario 𝒮_0.
Scenario 𝒮_1 (lines 22-28) occurs when no ℓ-transition exits q. Hence, a transition mapping (q, ℓ) to a state q′ such that ||q′|| = N needs to be created. Two cases are possible: either q′ exists or it does not. If q′ exists already, then a transition ⟨q, ℓ, q′⟩ is inserted (lines 23 and 24). Since the insertion of a new transition may shorten the level of the target state and possibly of other connected states, the auxiliary procedure Level is called (Algorithm 6, Section 4.3.4) for the adjustment of the relevant levels. If, instead, q′ does not exist, then it is created with extension ||q′|| = N by means of the New auxiliary function (Algorithm 5, Section 4.3.3) in line 26, which also generates the relevant singularities for the newly-created state. Moreover, a new transition ⟨q, ℓ, q′⟩ is inserted and the level of q′ is assigned (line 27).
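The case analysis of scenario 𝒮_1 (reuse an existing state whose extension equals N, or create it) can be sketched as follows. This is a loose illustration, not the paper's Algorithm 2: the data structures (integer state ids, an extension map, a transition dict) and the name repair_s1 are our own, and the level/singularity bookkeeping is omitted.

```python
# Hedged sketch of scenario S1: given a singularity (q, l) with no l-transition
# exiting q, link q to the (possibly new) state whose extension equals N.
def repair_s1(states, extension, delta_q, q, l, N):
    """states: set of int ids; extension: id -> frozenset of NFA states;
    delta_q: (id, label) -> id."""
    N = frozenset(N)
    target = next((q2 for q2 in states if extension[q2] == N), None)
    if target is None:                 # q' does not exist: create it
        target = max(states) + 1 if states else 0
        states.add(target)
        extension[target] = N
    delta_q[(q, l)] = target           # insert the transition <q, l, q'>
    return target

states = {0, 1}
extension = {0: frozenset({'n0'}), 1: frozenset({'n1'})}
delta_q = {}
t = repair_s1(states, extension, delta_q, 0, 'a', {'n1'})
print(t)  # 1: the existing state with extension {'n1'} is reused
```

In the actual algorithm, the "create" branch would additionally generate the singularities of the new state (via New) and the "reuse" branch would adjust levels (via Level); both are elided here.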
Scenario 𝒮_2 (lines 29-43) occurs when an ℓ-transition exiting q exists already. This scenario is reminiscent of scenario 𝒮_0 and, hence, the repair actions are somewhat similar. The conditions in lines 29 and 30 avoid processing a singularity that does not change 𝒬, thereby enabling the repair actions only if the transition mapping (q, ℓ) actually needs adjustment. If so, first the sets Q and Ū are computed (line 31). Then a state q′ is created by means of the auxiliary function New (line 32), where ||q′|| = N (with N being the ℓ-mapping of ||q|| generated in line 21). Then, the ℓ-transitions exiting q are removed (line 33). Next, for each transition ⟨u, ℓ′, q′′⟩ where u is unsafe, ℓ′ ≠ ε, and q′′ is not among the unsafe states, a transition ⟨q′, ℓ′, q′′⟩ is inserted (lines 34-36). As in scenario 𝒮_0, these new transitions are meant to prevent 𝒬 from becoming disconnected owing to the subsequent removal of transitions in lines 37-40, where for each transition ⟨q′′, ℓ′, u⟩ removed, with ℓ′ ≠ ε, a singularity (q′′, ℓ′) is created. Then, all the unsafe states and relevant transitions are removed (line 41). Afterwards, in line 42, a new transition ⟨q, ℓ, q′⟩ is inserted into 𝒬, where q′ is the new state created in line 32. The insertion of the new transition requires the setting of the level of q′ as well as the possible update of the level of other states by means of the auxiliary procedure Level. In case another existing state q′′ turns out to have the same extension as q′, in order to avoid duplication of states in 𝒬, q′ and q′′ are merged into a single state (line 43) by means of the Unify procedure (Algorithm 7, Section 4.3.5).

F I G U R E 4 Processing of scenario 𝒮_0 by Quick Subset Construction. On top of each intermediate graph is the content of the singularity list, where the head (in bold) is the ε-singularity under processing (note how new singularities are generated). The states in gray are those in Q (namely, the ε-closure of q_0, line 7), where the unsafe states (those in Ū) are dashed.
Example 8 (Scenario S₂). Shown in Figure 5 is the processing of scenario S₂ for a fragment of an intermediate configuration of 𝒜, where the relevant singularity is (C, a), with C being exited by two a-transitions. Highlighted in the configuration of 𝒜 on the left side of Figure 5 are the states in Q̄ (line 31), where the unsafe states are dashed, namely E and G. Then, based on lines 32 and 33, assuming N = {3, 4, 5, 6, 7, 8} in line 21, a new state I is generated, with ‖I‖ = N, and the a-transitions exiting C are removed (second configuration of 𝒜 in Figure 5). The processing in lines 34–36 causes the insertion of the transition ⟨I, c, H⟩, while lines 37–40 provoke the removal of the transition ⟨F, a, E⟩, which brings 𝒜 to the third configuration. Eventually, the processing in lines 41–43 brings 𝒜 to the last configuration in Figure 5, which terminates the processing of scenario S₂.
Example 9 (Quick Subset Construction). We now revisit the determinization of the NFA depicted on the left side of Figure 1 using Quick Subset Construction, which has been carried out already by Subset Construction in Example 5 (cf. Figure 2). Shown in Figure 6 are the various configurations of 𝒜, which correspond to the processing of the singularities involved. Note that on top of each configuration of 𝒜 is the relevant content of the singularity list, where the singularity in first position (the head of the list, the first singularity to be processed) is in bold. On the left side of Figure 6 is the initial configuration of 𝒜 (the input NFA), along with the initial singularities, namely (C, a), since C is exited by two a-transitions, and (D, a), as there is a transition exiting D that enters the state D′, which is exited by an ε-transition (cf. the rules for the initial singularities in Section 4.1). The first singularity (C, a) leads to scenario S₂ which, once processed, brings 𝒜 to the second configuration, with two new singularities (relevant to E) being generated. The processing of the singularity (D, a) leads to scenario S₁, which inserts the new transition ⟨D, a, E⟩ (third configuration in Figure 6). Processing the singularity (E, a) leads again to scenario S₁, which inserts the new auto-transition ⟨E, a, E⟩ (fourth configuration).

FIGURE 5 Processing of scenario S₂ by Quick Subset Construction. On top of each intermediate graph is the content of the singularity list, where the head (in bold) is the singularity under processing (note how new singularities are generated). The states in gray are those in Q̄ (namely, the σ-mapping of q, line 31), where the unsafe states (those in Ū) are dashed.

FIGURE 6 Transformation by Quick Subset Construction of an NFA into its SC-equivalent DFA (cf. Figure 1). On top of each intermediate automaton is the content of the singularity list, where the head (in bold) is the singularity under processing. The algorithm terminates when the singularity list is empty.

Algorithm 3. Unsafe (auxiliary function)
1: function Unsafe((q, σ), q̄)
2: input: (q, σ): a singularity,
3: q̄: a state in Q
4: output: flag: a Boolean value indicating whether q̄ is unsafe
5: begin
6: flag ← q̄ ≠ q₀ and there is no transition ⟨q*, σ*, q̄⟩ in δ such that (q*, σ*) ≠ (q, σ) and ℓ(q*) ≤ ℓ(q̄)
7: end function

Eventually, the singularity (E, b) has no effect because none of the conditions in lines 29 and 30 is fulfilled, thereby bringing 𝒜 to the final configuration, where the singularity list is empty, which terminates the execution of the algorithm. As expected, the resulting DFA (right side of Figure 6) is identical to the DFA generated from scratch by Subset Construction in Example 5 (cf. Figure 2). In other words, Quick Subset Construction has progressively transformed the NFA into its SC-equivalent DFA.
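For reference, the behavior of classical Subset Construction, which Quick Subset Construction reproduces, can be sketched as follows. This is a minimal illustration, not the article's implementation: the NFA encoding (a transition map keyed by state and label, with the empty string standing for ε) and the function names are our own assumptions.

```python
from collections import deque

def eps_closure(states, delta):
    """The set of states reachable from `states` via epsilon-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, ""), ()):   # "" stands for epsilon
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(n0, delta, alphabet):
    """Classical Subset Construction: DFA states are sets of NFA states."""
    d0 = eps_closure({n0}, delta)
    dfa, queue, seen = {}, deque([d0]), {d0}
    while queue:
        d = queue.popleft()
        for a in alphabet:
            # sigma-mapping: epsilon-closure of the targets of the a-transitions
            targets = set()
            for s in d:
                targets |= set(delta.get((s, a), ()))
            if not targets:
                continue
            d2 = eps_closure(targets, delta)
            dfa[(d, a)] = d2
            if d2 not in seen:
                seen.add(d2)
                queue.append(d2)
    return d0, dfa
```

Quick Subset Construction produces exactly this DFA, but by repairing the NFA in place rather than by exploring subsets from scratch.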

Auxiliaries
This section describes the five auxiliary functions/procedures exploited by Quick Subset Construction, namely Unsafe, Enlarge, New, Level, and Unify.

Unsafe
The pseudocode of the Unsafe Boolean function is listed in Algorithm 3 (lines 1–7). It takes as input a singularity (q, σ) and a state q̄ of 𝒜. It generates as output a flag indicating whether q̄ is unsafe or not. The state q̄ is unsafe if the removal of a σ-transition exiting q might cause the disconnection of q̄ from the initial state q₀. The condition of unsafety is expressed in line 6, namely, when q̄ is not the initial state and there is no transition ⟨q*, σ*, q̄⟩, either not exiting q or exiting q with a different label, such that the level of q* is not greater than the level of q̄. If so, since the level may increase from one state to a successive one, every parent state of q̄ might be a state that is reachable from q̄ and, hence, removing the σ-transition exiting q may result in the disconnection of q̄ and possibly of other states that are reachable from q̄.
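The safety test of line 6 can be rendered as follows. This is an illustrative sketch under assumed data structures (transitions as ⟨source, label, target⟩ triples and a dictionary of levels), not the article's code.

```python
def unsafe(singularity, qbar, q0, transitions, level):
    """A state qbar is unsafe w.r.t. the singularity (q, sigma) when it is not
    the initial state and no transition entering qbar, other than a
    sigma-transition exiting q, comes from a state whose level does not
    exceed the level of qbar."""
    q, sigma = singularity
    if qbar == q0:
        return False
    for (src, lab, dst) in transitions:
        if dst == qbar and (src, lab) != (q, sigma) and level[src] <= level[qbar]:
            return False  # a safe entering transition survives the removal
    return True
```

With the levels of Example 10 (where ℓ(B) = 1), state D would not be unsafe thanks to the transition ⟨B, b, D⟩, while a state entered only by a-transitions from C would be.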
Example 10 (Unsafe). Shown in Figure 7 is a fragment of 𝒜 when the Unsafe function is called, where (q, σ) = (C, a). Within the set of gray states, the Boolean flag is true when q̄ is a dashed state (E, F, H, I, and J), while it is false when the state is plain (D and G). For instance, the state E is unsafe because the condition

FIGURE 7 Portion of a configuration of 𝒜 when the auxiliary function Unsafe is called, where (q, σ) = (C, a).

Algorithm 4. Enlarge (auxiliary procedure)
1: procedure Enlarge(q, N)
2: input: q: a state in Q,
3: N: a set of states of the NFA 𝒩
4: side effects: the extension of q is enlarged by N and new singularities are possibly generated for q
5: begin
6: if N ⊈ ‖q‖ then
7: for all σ ∈ Σ such that there is a state n ∈ (N ⧵ ‖q‖) that is exited by a σ-transition in 𝒩 do
8: if a singularity (q, σ) does not exist then generate a new singularity (q, σ) end if
9: end for
10: ‖q‖ ← ‖q‖ ∪ N
11: end if
12: end procedure

in line 6 is not fulfilled, specifically, because E is not the initial state, nor is there a transition ⟨q*, σ*, E⟩ such that (q*, σ*) ≠ (C, a) and ℓ(q*) ≤ ℓ(E). By contrast, the state D is not unsafe owing to the transition ⟨B, b, D⟩, which fulfills the condition in line 6 (in fact, ℓ(B) = ℓ(C) = 1). This means that the removal of the transition ⟨C, a, D⟩ will leave D still reachable from the initial state thanks to the connection with B.**

Enlarge
The pseudocode of the Enlarge procedure is listed in Algorithm 4 (lines 1–12). It takes as input a state q of 𝒜 and a set N of states of 𝒩 (the NFA given in input to Quick Subset Construction). As a side effect, the extension of q is enlarged by N, possibly with new singularities being created for q. The condition in line 6 prevents the expansion from being immaterial. Thus, if there is at least one state in N that is not included in the current extension of q, then new singularities for q are possibly generated in lines 7–9, specifically, a singularity (q, σ) for each label σ marking a transition exiting a state of 𝒩 that is in N but not in the current extension of q, thereby allowing for the subsequent adjustment of the transition function relevant to these labels. Eventually, the extension of q is enlarged by N (line 10).
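The procedure can be sketched as follows (a simplified rendition with assumed data structures: the NFA transition map keyed by (state, label), an extension dictionary, and the singularity list as a plain Python list; not the article's code):

```python
def enlarge(q, N, extension, nfa_delta, singularities):
    """Enlarge the extension of q by the NFA states in N, generating a
    singularity (q, sigma) for each label exiting a genuinely new state."""
    new_states = N - extension[q]
    if new_states:                            # line 6: skip immaterial expansions
        for (n, sigma) in nfa_delta:          # lines 7-9: one singularity per label
            if n in new_states and (q, sigma) not in singularities:
                singularities.append((q, sigma))
        extension[q] |= N                     # line 10: enlarge the extension
```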
** A state qualified as unsafe may still be reachable from the initial state after the removal of a σ-transition exiting q. For instance, although being qualified as unsafe, state J is still connected with the initial state when all the a-transitions exiting C are removed, owing to its connection with G. In other words, an unsafe state might become disconnected from the initial state, but not for sure. By contrast, if a state is not unsafe, it will remain connected with the initial state for sure.

Algorithm 5. New (auxiliary function)
1: function New(N)
2: input: N: a set of states of the NFA 𝒩
3: side effects: a new state q is generated in 𝒜, where ‖q‖ = N, along with relevant singularities
4: output: q: the newly-created state
5: begin
6: Insert a new state q into Q, where ‖q‖ = ∅
7: Enlarge(q, N)
8: end function

Algorithm 6. Level (auxiliary procedure)
1: procedure Level(Λ)
2: input: Λ = [(q, ℓ)]: a singleton list, where q ∈ Q and ℓ is the actual level of q
3: side effects: the level of q and of other states reachable from q is possibly updated
4: begin
5: repeat
6: Remove the head (q̄, λ) from the list Λ
7: if ℓ(q̄) is unassigned or ℓ(q̄) > λ then
8: ℓ(q̄) ← λ
9: for all transitions ⟨q̄, σ, q′⟩ ∈ δ do
10: Append (q′, λ + 1) to Λ
11: end for
12: end if
13: until Λ is empty
14: end procedure

New
The pseudocode of the New function is listed in Algorithm 5 (lines 1–8). It takes as input a set N of states of 𝒩 (the NFA given in input to Quick Subset Construction) and creates a new state q in 𝒜 having extension N. First, q is created as an empty state (line 6). Then, the (empty) extension of q is enlarged by N by calling the Enlarge procedure, hence possibly generating new singularities for q as well (cf. Algorithm 4).

Level
The pseudocode of the Level procedure is listed in Algorithm 6 (lines 1–14). It takes as input a singleton list Λ = [(q, ℓ)], where q is a state of 𝒜, possibly with unassigned level, and ℓ is the actual level of q. As a side effect, the level of q, as well as of other states that are reachable from q, is possibly updated. This is necessary whenever a new transition inserted into 𝒜 may shorten the level of some states (lines 11, 24, and 42 of Quick Subset Construction). A loop is performed in lines 5–13, where the first pair (q̄, λ) in Λ is considered (line 6). If the level of q̄ is not yet assigned or the current level of q̄ is greater than the actual level λ, then the level of q̄ is assigned λ (line 8). Since this assignment may cause a change in the levels of the successor states of q̄, for every state q′ that is entered by a transition exiting q̄, a new pair (q′, λ + 1) is appended to Λ in order to subsequently propagate the change of levels to the states that are reachable from q̄ (lines 9–11). The loop terminates when Λ becomes empty, that is, when all relevant levels have been updated (line 13).
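The level-propagation loop can be sketched as follows (an illustrative rendition: transitions are assumed to be stored as a map from (state, label) to target tuples, and an unassigned level is rendered as a missing dictionary entry; not the article's code):

```python
def relevel(q, new_level, delta, level):
    """Propagate a (possibly shorter) level from q to its successors,
    mirroring the loop in lines 5-13 of the Level procedure."""
    worklist = [(q, new_level)]            # the singleton list Lambda
    while worklist:                        # until Lambda is empty (line 13)
        qbar, lam = worklist.pop(0)        # remove the head (line 6)
        if level.get(qbar) is None or level[qbar] > lam:
            level[qbar] = lam              # assign the shorter level (line 8)
            for (src, _lab), dsts in delta.items():
                if src == qbar:            # append the successors (lines 9-11)
                    worklist.extend((q2, lam + 1) for q2 in dsts)
```

Since a pair is re-enqueued only when it strictly lowers a level, the worklist eventually empties, which is the termination argument used in the proof of Lemma 1 below.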
Example 11 (Level). Shown on the left side of Figure 8 is the initial configuration of the graph. According to lines 7–12 of Level, the processing of the pair (H, 1) changes the level of H to 1 and appends the new pairs (E, 2) and (F, 2) to the list Λ (second graph). The processing of (E, 2) has no effect on the levels, thereby leaving the graph unchanged. Instead, the subsequent processing of (F, 2) changes the level of F to 2, while appending the pairs (G, 3) and (I, 3) (third graph). Next, processing (G, 3) changes the level of G to 3 and appends the pair (D, 4) (fourth graph). Processing (I, 3) changes the level of I to 3 and appends the pair (H, 4) (last graph on the right). Eventually, the processing of both (D, 4) and (H, 4) has no effect on the levels, thereby leaving the graph unchanged. Since Λ is now empty, the procedure Level terminates. As expected, the levels of the states F, G, H, and I are reduced in accordance with the new topology of the graph after the insertion of the transition.

Unify
The pseudocode of the Unify procedure is listed in Algorithm 7 (lines 1–14). It takes as input a newly-created state q′ in 𝒜 (cf. scenario S₂ in Quick Subset Construction) and another state q′′ in 𝒜 that has the same extension as q′. In order to avoid duplication of states, all the transitions and singularities relevant to q′ are inherited by q′′, whereas the duplicated state q′ is removed. First, the unique transition entering q′ is redirected towards q′′ (line 6). Since this redirection may cause a change in the level of q′′, as well as of other states reached by q′′, level relocation is performed by the Level procedure. Then, all the transitions exiting q′ are moved to q′′ (lines 7 and 8). In this case too, since the redirection may cause changes in the levels of the states entered by these transitions, Level needs to be called. Afterwards, every singularity relevant to q′ is transformed into a singularity relevant to q′′ (lines 10–12). The (isolated) duplicated state q′ is eventually removed from Q (line 13).
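The merging step can be sketched as follows (a simplified illustration with transitions as a mutable list of ⟨source, label, target⟩ triples; the calls to Level for level relocation are omitted for brevity, and the names are our own assumptions):

```python
def unify(q1, q2, transitions, singularities, states):
    """Merge the newly-created state q1 into its duplicate q2: redirect the
    transition entering q1, move the transitions exiting q1 to q2, transfer
    the singularities of q1, and finally drop q1."""
    for i, (src, lab, dst) in enumerate(transitions):
        if dst == q1:
            transitions[i] = (src, lab, q2)   # redirect entering transition (line 6)
        elif src == q1:
            transitions[i] = (q2, lab, dst)   # move exiting transitions (lines 7-8)
    for i, (q, sigma) in enumerate(singularities):
        if q == q1:
            singularities[i] = (q2, sigma)    # inherit singularities (lines 10-12)
    states.remove(q1)                         # remove the duplicate (line 13)
```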
Example 12 (Unify). Shown on the left side of Figure 9 is a configuration of 𝒜 when the Unify procedure is called by Quick Subset Construction (line 43) with input parameters q′ = F (to be removed) and q′′ = G (to be preserved). The new configuration of 𝒜 resulting from the unification of F and G is displayed on the right side of the figure, where F has been removed, while all its entering/exiting transitions (along with the relevant singularities, omitted in the figure) have been moved to G. Note how G is eventually exited by two b-transitions, which in principle would require generating a singularity (G, b). More generally, redirecting towards q′′ the transitions exiting q′ may cause the transition function of q′′ to become nondeterministic, which therefore requires the generation of new singularities for q′′. In fact, however, according to line 32 of Quick Subset Construction, the creation of the new state q′ by New (cf. Section 4.3.3) invariably creates all the singularities of q′ by means of the Enlarge auxiliary procedure (cf. Section 4.3.2). Consequently, when q′′ inherits the singularities of q′, it also inherits the singularities that should be created owing to the possible new nondeterminism arising in q′. In other words, there is no need for the creation of new singularities for q′′.

Algorithm 7. Unify (auxiliary procedure)
1: procedure Unify(q′, q′′)
2: input: q′: a newly-created state in Q (scenario S₂),
3: q′′: an existing state in Q where ‖q′′‖ = ‖q′‖,
4: side effects: the transitions and singularities relevant to q′ are inherited by q′′, and q′ is removed from Q
5: begin
6: Redirect towards q′′ the transition entering q′, and update the levels by Level
7: for all transitions ⟨q′, σ, q̄⟩ exiting q′ do
8: Replace ⟨q′, σ, q̄⟩ with ⟨q′′, σ, q̄⟩, and update the levels by Level
9: end for
10: for all singularities (q′, σ) do
11: Replace (q′, σ) with (q′′, σ)
12: end for
13: Remove q′ from Q
14: end procedure

ALGORITHM EQUIVALENCE AND DFA IDENTITY
Massive experimentation has convincingly shown that Quick Subset Construction generates a DFA that is SC-equivalent to the input NFA; in other words, Quick Subset Construction is equivalent to Subset Construction, inasmuch as it generates a DFA that is identical to the DFA generated by Subset Construction. Empirical evidence, however, cannot prove the equivalence of the two algorithms conclusively. This is why a proof of correctness of Quick Subset Construction is provided hereafter (Theorem 1), which shows that the DFA resulting from the repair actions performed on the NFA by this algorithm is identical to the DFA generated from scratch by Subset Construction. The proof makes use of the notion of a path of the algorithm.
Definition 2 (execution point and path). Let 𝒩 be an NFA applied to Quick Subset Construction and let 𝒜 be the FA being processed by Quick Subset Construction. Let 𝒜_i, i ≥ 0, be a configuration of 𝒜 when a singularity is considered (in either line 6 or 20), and let Λ_i be the configuration of the singularity list when such a singularity is considered. The pair (𝒜_i, Λ_i) is an execution point of Quick Subset Construction.
The sequence [(𝒜_0, Λ_0), (𝒜_1, Λ_1), …] of execution points relevant to the processing of all the singularities is the path of Quick Subset Construction when applied to 𝒩.
Theorem 1. Let 𝒩 = (Σ, N, δ_n, n₀, F_n) be an NFA, where every state is reachable from the initial state, a final state is reachable from every state, and there is no (auto) ε-transition ⟨n, ε, n⟩.†† Let 𝒟 = (Σ, D, δ_d, d₀, F_d) be the DFA generated by the application of Subset Construction to 𝒩 (namely the DFA that is SC-equivalent to 𝒩), and let 𝒟′ = (Σ, Q, δ_q, q₀, F_q) be the FA resulting from the application of Quick Subset Construction to 𝒩. We have 𝒟′ = 𝒟.‡‡

Proof. The proof of the theorem is grounded on Definitions 3 and 4 and Lemmas 1–7. ▪

Definition 3 (completable transition). Let (𝒜_i, Λ_i) be an execution point of Quick Subset Construction and let ⟨q, σ, q′⟩, σ ≠ ε, be a transition in 𝒜_i. If ‖q′‖ = σ-mapping(‖q‖, 𝒩), then ⟨q, σ, q′⟩ is complete; otherwise, the transition is incomplete. An incomplete transition ⟨q, σ, q′⟩ in 𝒜_i is completable if and only if Λ_i includes a singularity (q, σ).
Definition 4 (viability). Let (𝒜_i, Λ_i) be an execution point of Quick Subset Construction. If every incomplete transition in 𝒜_i is completable, then the execution point (𝒜_i, Λ_i) is viable.

Lemma 1. Quick Subset Construction terminates.

Proof. By contradiction, assume that Quick Subset Construction may not terminate. Since no recursive call is performed, this requires that a loop is executed endlessly. With regard to the auxiliary functions/procedures, only Level includes a loop to be considered for possible nontermination, as the input (singleton) list Λ is possibly extended within the loop, whose termination requires the emptiness of Λ. Assume that Level does not terminate. Since each new pair (q′, λ + 1) appended to Λ in line 10 of Level is such that q′ is a successor of the state relevant to the head pair, sooner or later the same state needs to be considered again in line 6. However, the associated level λ cannot be less than the current level of that state, as λ keeps growing in the processing; in other words, the loop in Level must terminate, a contradiction. Hence, the only mode in which Quick Subset Construction may not terminate is by traversing an endless path π. Since both the set of states in 𝒜_i and the alphabet Σ are finite, Λ_i is finite also, being an ordered set of pairs (q, σ), where q is a state in 𝒜_i and σ a symbol in Σ.
Besides, since the set of transitions in 𝒜_i is finite, the set of possible execution points is certainly finite. Hence, a necessary condition for the path π to be infinite is to reach an execution point at least twice, that is, π = […, (𝒜_i, Λ_i), …, (𝒜_j, Λ_j), …] where (𝒜_i, Λ_i) = (𝒜_j, Λ_j). Notice that, since the processing of Quick Subset Construction is deterministic, if an execution point is repeated, then the algorithm loops endlessly on this point (and on all the other points within the loop). Hence, since the set of execution points is finite, to prove that π is finite (that is, that Quick Subset Construction terminates), it suffices to show that no execution point can be reached more than once in the path π. To this end, we consider each of the three scenarios in which Quick Subset Construction processes a singularity (q, σ).
In scenario S₀, the singularity being processed is (q₀, ε), while the execution point is (𝒜_0, Λ_0), that is, the first one in the path. Since the singularity (q₀, ε) cannot be generated again subsequently, (𝒜_0, Λ_0) cannot be reached again.
In scenario S₁, assuming an execution point (𝒜_i, Λ_i), the singularity (q, σ) to be processed is such that there is no σ-transition exiting q; thus, a transition ⟨q, σ, q′⟩ is generated, with q′ being possibly created. Notice that, in subsequent processing, q′ cannot be unsafe, as the entering transition ⟨q, σ, q′⟩ is such that ℓ(q) ≤ ℓ(q′). Hence, the transition ⟨q, σ, q′⟩ cannot be deleted subsequently so as to possibly repeat the execution point (𝒜_i, Λ_i). Even a subsequent unification of another state with q by Unify in a new execution point (𝒜_j, Λ_j), i < j, cannot repeat (𝒜_i, Λ_i), because q will be exited by a σ-transition, a condition that makes 𝒜_j different from 𝒜_i.
In scenario S₂, similarly to scenario S₁, assuming an execution point (𝒜_i, Λ_i), the generation of the new transition ⟨q, σ, q′⟩ makes q′ safe in subsequent processing, so that this transition cannot be removed afterwards. Hence, even a subsequent unification of another state with q by Unify in a new execution point (𝒜_j, Λ_j), i < j, cannot repeat (𝒜_i, Λ_i), because either there will be a single σ-transition exiting q (rather than several ones) or ‖q′‖ will differ from the extension of the target state of the (single) σ-transition exiting q in 𝒜_i; in other

‡‡ In other words, when the same NFA 𝒩 is given in input to both Subset Construction and Quick Subset Construction, the DFA generated from scratch by Subset Construction is identical to the DFA obtained by means of the repair actions performed by Quick Subset Construction.
words, (𝒜_i, Λ_i) cannot be reached again. In conclusion, assuming that the algorithm does not terminate leads to a contradiction; hence, Quick Subset Construction does terminate. ▪

Lemma 2. Every state in 𝒟′ is reachable from the initial state.
Proof. The proof is by induction on the (finite) path of Quick Subset Construction.

(Basis) Every state in 𝒜_0 is reachable from the initial state. Since 𝒜_0 = 𝒩, this is true by assumption.
(Induction) If every state in 𝒜_i, i ≥ 0, is reachable from the initial state, then every state in 𝒜_i+1 is reachable from the initial state. Note that the only way to have a disconnection from the initial state q₀ is by the removal of transitions, which may occur in scenarios S₀ and S₂ only. In scenario S₀, all the ε-transitions exiting q₀ are removed (line 9). Among the target states of the ε-transitions removed, the states that are not unsafe keep being reached from q₀. Apparently, instead, all the (unsafe) states in Ū (including the target states of the ε-transitions removed) might become disconnected from q₀. Moreover, since all the states in Ū are removed along with their entering/exiting transitions (line 17), the target states of these exiting transitions might be disconnected also. However, the transitions ⟨q₀, σ, q⟩ inserted in line 11 make these target states still reachable from q₀. Finally, the transitions ⟨q, σ, u⟩ removed in line 14 are irrelevant to the disconnection of the target (unsafe) states u ∈ Ū, because these states are removed and, as shown above, the removal of their exiting transitions does not cause any disconnection from q₀. The actions performed in scenario S₀ to preserve the connection with q₀ are somewhat replicated in scenario S₂, with the exception of the generation of a new transition ⟨q, σ, q′⟩ (line 42). In line 33, all the σ-transitions exiting q are removed. Among the target states of the σ-transitions removed, the states that are not unsafe keep being reached from q and, hence, from q₀. All the (unsafe) states in Ū (including the target states of the σ-transitions removed), instead, might become disconnected from q and, consequently, from q₀. Besides, since all the states in Ū are removed along with their entering/exiting transitions (line 41), the target states of these exiting transitions might be disconnected also. However, the transitions ⟨q′, σ′, q′′⟩ inserted in line 35 make these target states still reachable from q and,
hence, from q₀. Finally, the transitions ⟨q′′, σ′, u⟩ removed in line 38 are irrelevant to the disconnection of the target (unsafe) states u ∈ Ū, because these states are removed and, as shown above, the removal of their exiting transitions does not cause any disconnection from q₀. Lastly, the possible unification of the new state q′ with an existing state q′′ in line 43 by means of Unify is apparently another source of possible disconnection. Based on Unify, the transition entering q′ as well as all the transitions exiting q′ are moved to q′′ (lines 6 and 8, respectively), whereas q′ is eventually removed (line 13). However, since q′′ is reachable from q₀, all the target states of the transitions exiting q′ and inherited by q′′ continue being reachable from q₀. ▪

Lemma 3. Every transition in 𝒟′ is complete.
Proof. Let π = [(𝒜_0, Λ_0), (𝒜_1, Λ_1), …, (𝒜_m, Λ_m)] be a path of Quick Subset Construction, which, according to Lemma 1, is finite; in other words, 𝒜_m = 𝒟′ and Λ_m is empty. We show by induction that every execution point in π is viable.

(Basis) Every incomplete transition in 𝒜_0 is completable. In fact, 𝒜_0 = 𝒩 and Λ_0 is the list of initial singularities (cf. Section 4.1), which are relevant to the points of nondeterminism in 𝒩. Apart from the singularity (q₀, ε), which is however irrelevant to the notion of completeness as it refers to the ε-transitions exiting q₀, each other singularity (q, σ) is associated with the σ-transitions exiting q, which are therefore incomplete. In fact, either several σ-transitions exit q or we have ⟨q, σ, q′⟩ where q′ is exited by an ε-transition. In either case, ‖q′‖ ≠ σ-mapping(‖q‖, 𝒩); that is, these transitions are incomplete. Based on Definition 3, however, these incomplete transitions are completable owing to the initial singularities (q, σ).
(Induction) If every incomplete transition in 𝒜_i is completable, i ≥ 0, then every incomplete transition in 𝒜_i+1 is completable. We have to show that all incomplete transitions in 𝒜_i+1 are still completable once the relevant singularity (q, σ) has been processed in either scenario S₁ or S₂. In scenario S₁, each of the two transitions generated in lines 24 and 27, respectively, is complete. In scenario S₂, the transition generated in line 42 is complete, while every incomplete transition generated in line 35 is completable, as the creation of q′ in line 32 comes with the relevant singularities. If unification comes into play (line 43), then the inheritance by q′′ (the state that has the same extension as q′) of the singularities relevant to q′ makes each incomplete transition exiting q′′ still completable. Hence, all incomplete transitions in 𝒜_i+1 are completable; in other words, (𝒜_i+1, Λ_i+1) is viable.
Eventually, since in 𝒟′ = 𝒜_m every incomplete transition is completable and Λ_m is empty (no singularity exists), no transition is incomplete (otherwise a singularity would exist); in other words, every transition in 𝒟′ is complete. ▪

Lemma 4. 𝒟′ is deterministic.

Proof. This is a consequence of Lemma 3. By contradiction, assume that 𝒟′ is nondeterministic. As such, 𝒟′ either includes an ε-transition or a state q that is exited by several σ-transitions. But, on the one hand, no ε-transition may exist in 𝒟′, because scenario S₀ removes all the ε-transitions exiting the initial state q₀, while all the other ε-transitions in 𝒩 are eliminated by the relevant initial singularities, after which no other ε-transition may be generated. On the other hand, if there were in 𝒟′ two σ-transitions ⟨q, σ, q′⟩ and ⟨q, σ, q′′⟩ exiting q, where ‖q′‖ = σ-mapping(‖q‖, 𝒩) = ‖q′′‖ (both transitions being complete by Lemma 3), then q′ and q′′ would be distinct states with the same extension, which is impossible, as duplicated states are invariably merged by Unify; hence, 𝒟′ is deterministic. ▪

Lemma 5. The initial state of 𝒟′ equals the initial state of 𝒟.

Proof. If n₀ is not exited by any ε-transition, then ‖q₀‖ = {n₀} = ‖d₀‖, as there is no ε-singularity (q₀, ε) in Λ_0; in other words, scenario S₀ is not applicable. If, instead, n₀ is exited by at least one ε-transition, then there is the initial singularity (q₀, ε) in Λ_0, which, based on scenario S₀, provokes in line 8 the enlargement of ‖q₀‖ to N = ε-closure(n₀, 𝒩), as Q̄ is the set of singleton states in 𝒜 corresponding to the states in ε-closure(n₀, 𝒩). Since the extension of q₀ cannot be changed afterwards by Quick Subset Construction, we have in any case ‖q₀‖ = ‖d₀‖. ▪

Lemma 6. The transition function of 𝒟′ equals the transition function of 𝒟.
Proof. Let d ∈ D and q ∈ Q, where ‖d‖ = ‖q‖. We first show that the transition function of d (the set of transitions exiting d in 𝒟) equals the transition function of q (the set of transitions exiting q in 𝒟′).
Since the transition function of d equals the transition function of q, based on Lemma 5 it follows inductively that δ_d = δ_q. ▪

Lemma 7. The set of states of 𝒟′ equals the set of states of 𝒟.
Proof. This is a consequence of Lemmas 5 and 6. ▪
Based on the lemmas above, assuming that the set of final states is computed correctly by Quick Subset Construction, the proof of Theorem 1 is eventually grounded on these facts: Quick Subset Construction always terminates (Lemma 1); 𝒟′ is deterministic (Lemma 4); each state in 𝒟′ is reachable from the initial state (Lemma 2); the initial state of 𝒟′ equals the initial state of 𝒟 (Lemma 5); the set of states in 𝒟′ equals the set of states in 𝒟 (Lemma 7); and the transition function of 𝒟′ equals the transition function of 𝒟 (Lemma 6). ▪

EXPERIMENTATION
Both Subset Construction and Quick Subset Construction were implemented in a software framework by means of the C++ programming language, under the GNU/Linux 5.4.0-42-generic x86_64 (Ubuntu 18.04.5 LTS) operating system, on a machine with an Intel(R) Xeon(R) Gold 6140M CPU (2.30 GHz) and 128 GB of working memory.§§ This framework was also required to generate pseudo-random test cases based on a variety of user-defined parameters, as well as to analyze the aggregate data generated by the relevant experiments. The software framework was exploited mainly to:

1. Verify the equivalence of Quick Subset Construction and Subset Construction empirically;
2. Compare the performance of the two algorithms in terms of processing time;
3. Suggest the conditions under which Quick Subset Construction may perform better than Subset Construction.
The main parameters considered for the pseudo-random generation of NFAs are:

• Number of states;
• Branching factor: the average number of transitions exiting a state;
• Epsilon density: the percentage of ε-transitions;
• Height: the number of strata composing a stratified NFA (cf. Section 6.1);
• Determinism: the percentage of contiguous strata with deterministic transition function in a stratified NFA (cf. Section 6.1);¶¶
• Alphabet cardinality: the number of labels in the alphabet;
• Final density: the percentage of final states.
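For concreteness, the generation parameters above can be gathered in a configuration record; the field names below are our own, not the framework's identifiers:

```python
from dataclasses import dataclass

@dataclass
class NfaGenParams:
    """Configuration record for pseudo-random NFA generation.
    All field names are illustrative assumptions."""
    n_states: int              # number of states
    branching_factor: float    # average number of transitions exiting a state
    eps_density: float         # percentage of epsilon-transitions
    height: int                # number of strata of a stratified NFA
    determinism: float         # percentage of contiguous deterministic strata
    alphabet_cardinality: int  # number of labels in the alphabet
    final_density: float       # percentage of final states
```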
Once the framework was implemented, a phase of experimentation was conducted in order to test the correctness of Quick Subset Construction empirically and to compare its performance with that of Subset Construction. A large number of experiments were carried out on various typologies of finite automata based on different parameter configurations.

Preliminaries
A recurring conjecture of this article is that the raison d'être of Quick Subset Construction lies in its ability to outperform Subset Construction when the NFA is large and the nondeterminism affecting the NFA can be removed by a limited number of repair actions (as compared to the size of the SC-equivalent DFA). Intuitively, these two conditions combined allow Quick Subset Construction to operate more efficiently than Subset Construction, because only a (possibly tiny) fraction of the NFA is updated, thereby leaving a (possibly large) deterministic portion of the NFA untouched. As pointed out in Section 4.1, however, the number of repair actions actually performed (that is, the number of singularities processed) is not directly related to the quantity of nondeterminism in the NFA (that is, the number of initial singularities, cf. Definition 1). In other words, a limited quantity of nondeterminism in an NFA does not necessarily translate into limited processing by Quick Subset Construction: much depends on how the states are connected to one another. For example, if a singularity affects the initial state of the NFA, in the worst case all the states of the NFA may be involved in the repair actions, owing to the cascade effect of the generation of new singularities (cf. the experimentation on bad NFAs illustrated in Section 6.6).
A more effective notion, roughly suggesting the amount of processing required by Quick Subset Construction in comparison with the processing required by Subset Construction, is the impact.

§§ The software framework is open source and available at https://github.com/MicheleDusi/QuickSubsetConstruction.
¶¶ Stratified NFAs allow for tuning the degree of nondeterminism. Specifically, varying the determinism parameter in different problem configurations is key to varying the number of singularities processed by Quick Subset Construction, which is directly related to the processing time spent to complete the task. This configuration parameter, however, should not be confused with the notion of quantity of nondeterminism introduced in Section 4.1 (cf. Definition 1), which is applicable to any NFA.
Definition 5 (impact). Let 𝒩 be an NFA and 𝒟 the corresponding SC-equivalent DFA. The impact ℑ of Quick Subset Construction in transforming 𝒩 into 𝒟 is the ratio between the number n_s of singularities processed and the number n_t of transitions in 𝒟, namely ℑ = n_s ∕ n_t.

Roughly, the smaller the impact, the more convenient Quick Subset Construction over Subset Construction.## We can reformulate the statement above as follows: a limited quantity of nondeterminism in the NFA does not necessarily translate into a limited impact.
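Definition 5 amounts to a one-line computation (the parameter names are ours):

```python
def impact(n_singularities, n_dfa_transitions):
    """Impact (Definition 5): ratio between the number of singularities
    processed by Quick Subset Construction and the number of transitions
    in the SC-equivalent DFA."""
    return n_singularities / n_dfa_transitions
```

For instance, 50 singularities processed against a DFA with 10,000 transitions yields an impact of 0.005, a regime where Quick Subset Construction is expected to be convenient.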
A precise measure of the benefit (if any) of using Quick Subset Construction is provided by comparing the actual processing time spent by the two algorithms in the determinization of the same NFA. The notion of gain serves this purpose.
The gain provides precise information on how much one algorithm outperforms the other. The convenience of using Quick Subset Construction versus Subset Construction occurs when the gain is positive: the larger the gain, the greater the convenience. Like the impact, the gain too is measured a posteriori, after the termination of both algorithms.|||| In the experimentation, five classes of NFAs have been considered, namely random NFAs, acyclic NFAs, stratified NFAs, weak NFAs, and bad NFAs. Experimental results for each NFA class considered are shown hereafter.
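Since Definition 6 does not appear in this excerpt, we sketch a plausible reading consistent with the text (positive gain exactly when Quick Subset Construction is faster); the normalization by the Subset Construction time is our assumption:

```python
def gain(t_sc, t_qsc):
    """Assumed form of the gain: the relative saving in processing time of
    Quick Subset Construction (t_qsc) over Subset Construction (t_sc).
    Positive exactly when Quick Subset Construction is the faster of the two."""
    return (t_sc - t_qsc) / t_sc
```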

Random NFAs
As the name suggests, a random NFA is generated (pseudo-)randomly based on given configuration parameters. We present hereafter the results of three experiments on random NFAs, named R1, R2, and R3, respectively, where each experiment is designed to compare the performance of Quick Subset Construction with that of Subset Construction by varying one of two relevant configuration parameters, specifically the number of NFA states and the ε-density, while keeping the other parameters fixed.*** The results of each experiment are spread over two graphs (e.g., Figure 12). In the left-hand graph, the x-axis indicates the range of the varying parameter, the y-axis on the left indicates the range of the processing time, and the y-axis on the right indicates the range of the gain (cf. Definition 6 in Section 6.1). Three curves are plotted on that graph: the processing time of Subset Construction, the processing time of Quick Subset Construction, and the gain. The processing time indicated corresponds to an average over 100 different executions of the corresponding determinization algorithm (this holds for all the experiments, not only for random NFAs), each one relevant to a different NFA generated based on the same parameter values.
In the second (right-hand) graph, which is specific to Quick Subset Construction only, the x-axis indicates the range of the varying parameter (the same as in the left-hand graph), while the y-axis indicates the range of the singularities processed.Displayed on the graph are the bars indicating the (average) number of singularities processed for each different value of the varying parameter.Each bar incorporates a triangle representing the number of initial singularities.A triangle ## The impact, however, can be measured only a posteriori, when Quick Subset Construction terminates; therefore, it cannot be exploited a priori for measuring the convenience in using that algorithm instead of Subset Construction, nor does it provide a precise measure of that convenience, as processing a singularity requires in general more time than creating a transition by Subset Construction (cf. the discussion in Section 6.8).|||| In contrast with the gain, however, the impact requires only the execution of Quick Subset Construction, which provides both the number of singularities processed and the number of transitions in the output DFA.*** These parameters are the branching factor (3), the alphabet cardinality (10), and the final density (0.1).Note how increasing the branching factor is bound to increase the quantity of nondeterminism in the NFA, as well as the impact.Conversely, extending the cardinality of the alphabet is likely to decrease the quantity of nondeterminism, as well as the impact.Instead, the percentage of final states is irrelevant to the impact.pointing upwards (as for all bars in Figure 10) indicates that the total number of singularities processed is larger than the number of initial singularities.If, instead, the triangle points downwards, then the actual number of singularities processed is smaller than the number of initial singularities (cf. 
Figure 17), a condition that may appear inconsistent.The explanation is simple: a singularity (q, ) may be removed from the singularity list before being processed owing to the removal of state q, which comes with the removal of the associated singularities also.
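To make the two measures concrete, the gain (Definition 6) and the impact (Definition 5) can be sketched in code as follows; the function names and the exact formulas are our reading of the prose (positive gain when Quick Subset Construction is faster, impact as a ratio of singularities to DFA transitions), not definitions reproduced from the paper:

```python
def gain(t_sc: float, t_qsc: float) -> float:
    """A-posteriori gain of Quick Subset Construction (QSC) over
    Subset Construction (SC): positive when QSC is faster."""
    return (t_sc - t_qsc) / t_sc

def impact(singularities_processed: int, dfa_transitions: int) -> float:
    """Ratio between the singularities processed by QSC and the
    transitions of the SC-equivalent DFA; needs only a run of QSC."""
    return singularities_processed / dfa_transitions

# Figures quoted later for the weak-NFA experiment W2:
# SC takes about 60 s, QSC about 0.3 s.
print(round(gain(60.0, 0.3), 3))  # → 0.995
```

With these formulas, equal processing times yield a gain of zero, matching the "approximately zero" gains reported below for random NFAs.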
The results of the first experiment (R1) are displayed in Figure 10, where the varying parameter is the number of NFA states, while no ε-transitions are involved (ε-density = 0). Based on the left-hand graph, Quick Subset Construction compares with Subset Construction (the two curves are practically identical, causing the gain to be approximately zero). Based on the right-hand graph in Figure 10, note how the number of singularities processed by Quick Subset Construction increases linearly with the number of NFA states, which is consistent with an increasing processing time.
Displayed in Figure 11 are the results of the second experiment (R2), where the varying parameter is again the number of NFA states, while 50% of the transitions are ε-transitions (ε-density = 0.5). Similarly to the first experiment, Quick Subset Construction compares with Subset Construction, with the gain being approximately zero. Also, the number of singularities processed grows linearly with the number of NFA states.
Finally, shown in Figure 12 are the results of R3, the third experiment, where the varying parameter is the ε-density (percentage of ε-transitions), ranging from 0 (no ε-transitions) to 1 (100% ε-transitions), while the number of NFA states is fixed (200). In this case, Subset Construction performs slightly better than Quick Subset Construction. The time curves of both algorithms, however, grow monotonically up to a peak (when the ε-density is approximately 0.3), and then decrease almost symmetrically as the ε-density grows further. Unsurprisingly, the trend of the number of singularities processed mimics the trend of the processing time, with a peak when the ε-density is 0.3. The experimental results presented in Figure 12 raise a natural question: why is the curve of the processing time of both algorithms spire-shaped when varying the ε-density? Considering Subset Construction, when no ε-transitions exist in the NFA, nondeterminism is caused by multiple transitions exiting the same state with the same label ℓ. Hence, only the target states of these transitions (the ℓ-mapping) will be embodied in the extension of the corresponding DFA target state. Afterwards, inserting an increasing percentage of ε-transitions is bound to increase the quantity of nondeterminism and to make the extensions of the DFA states increasingly large, as the ℓ-mapping of a DFA state is likely to include more and more states. This results in a larger number of states as well as a larger number of transitions in the DFA. However, after a certain threshold (about one third of ε-transitions), the extensions of the DFA states are so large that the number of actual subsets keeps decreasing until it becomes just one (when the ε-density is 1), whose extension includes the whole set of NFA states. On the other hand, considering Quick Subset Construction, we need to recall how initial singularities are created when ε-transitions occur. Based on the second rule defined in Section 4.1, if an NFA state n′ is exited by an ε-transition, then, for each transition ⟨n, ℓ, n′⟩ entering n′, where ℓ ≠ ε, a singularity (n, ℓ) is created. Hence, a single ε-transition is bound to cause the creation of several (possibly many) singularities, whose processing makes Quick Subset Construction slow down. Now, consider the bar graph displayed on the right side of Figure 12, indicating the number of singularities processed when varying the ε-density. Unsurprisingly, the number of singularities grows up to the same ε-density threshold at which the processing time peaks, and then decreases monotonically when the ε-density exceeds that value. The point is that, on the one hand, increasing a small ε-density causes the creation of more and more singularities (as pointed out above); on the other hand, the larger the number of ε-transitions, the smaller the probability that a state n′ exited by an ε-transition is also entered by a non-ε transition ⟨n, ℓ, n′⟩, thereby causing a continuous decrease in the number of singularities created and, hence, a shorter and shorter processing time.
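The second rule of Section 4.1, as recalled above, can be read concretely as follows; this is a minimal sketch, where the (source, label, target) triple encoding and the 'eps' sentinel for ε are our own assumptions:

```python
def initial_singularities_rule2(transitions):
    """Second rule of Section 4.1: if a state n' is exited by an
    ε-transition, then each non-ε transition (n, l, n') entering n'
    gives rise to the initial singularity (n, l)."""
    eps_exited = {src for (src, lbl, _) in transitions if lbl == 'eps'}
    singularities = set()
    for (n, lbl, n1) in transitions:
        if n1 in eps_exited and lbl != 'eps':
            singularities.add((n, lbl))
    return singularities
```

For instance, a single ε-transition exiting a state entered by two labeled transitions already yields two singularities, which illustrates why ε-transitions inflate the initial singularity list.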
In summary, the experimental results presented above suggest that there is no significant difference between the performance of Subset Construction and Quick Subset Construction when they are applied to random NFAs. This means that the number of singularities processed is comparable to the number of transitions in the SC-equivalent DFA (impact ≃ 1).

Acyclic NFAs
As the name suggests, acyclic NFAs do not include any cyclic path of transitions. Two experiments have been conducted on acyclic NFAs, namely A1 and A2, which are designed to compare Quick Subset Construction with Subset Construction by varying either the number of NFA states or the ε-density. The results of the first experiment (A1) are displayed in Figure 13, where no ε-transitions are involved, while the number of NFA states varies. Based on the left-hand graph, Quick Subset Construction is invariably outperformed by Subset Construction, with an almost constant (negative) gain. As expected, the right-hand graph in Figure 13 shows how the number of singularities processed by Quick Subset Construction follows the trend of its processing time.
Shown in Figure 14 are the results of the second experiment (A2), where the varying parameter is the ε-density, ranging from 0 (no ε-transitions) to 1 (all ε-transitions), while the number of states is kept constant (1000). In this experiment too, Quick Subset Construction is outperformed by Subset Construction, in a way similar to experiment R3 (random NFAs), with a time peak around the middle of the ε-density range. The number of singularities processed shown in the right-hand graph is consistent with the trend of the processing time of Quick Subset Construction.
The results of the two experiments above suggest that Quick Subset Construction is not convenient when NFAs are acyclic. But why? A plausible explanation is that Quick Subset Construction was conceived for cyclic NFAs. In a sense, this algorithm is oversized when applied to acyclic NFAs, mainly because the determinization of an acyclic NFA does not require the management of the levels of states. For example, testing the unsafety of a state (cf. the Unsafe function in Section 4.3.1) does not require any reasoning on the levels of the parent states: a state is unsafe simply when it is not entered by any other transition. Also, as shown in the left-hand side of Figure 33 in Section 6.8, the impact (cf. Definition 5) is invariably high (about 0.8), which is bound to make Quick Subset Construction not so convenient regardless of its implementation.
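The acyclic shortcut just mentioned can be sketched as a hypothetical helper (this is not the paper's Unsafe function, which in the general cyclic case also reasons on the levels of the parent states; the transition-triple encoding is an assumption):

```python
def unsafe_acyclic(state, transitions):
    """In an acyclic NFA, a state is unsafe exactly when no transition
    enters it; no reasoning on parent levels is required."""
    return all(target != state for (_source, _label, target) in transitions)
```

The point of the comparison is that the cyclic-case check is strictly more work, which is wasted on acyclic inputs.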
To appreciate the possible advantage of Quick Subset Construction over Subset Construction, we need to consider NFAs whose determinization is low-impact. Stratified NFAs serve this purpose.

Stratified NFAs
Since Quick Subset Construction is bound to outperform Subset Construction when the impact is low, that is, when the number of singularities processed is considerably smaller than the number of transitions in the SC-equivalent DFA, we have considered a class of NFAs that allows for indirect control of the impact, called stratified NFAs.
Definition 7 (stratified NFA). Let 𝒩 be an NFA, with levels of states in the range [0 .. m]. A stratum of 𝒩 at level i, namely S_i, is the set of states of 𝒩 at level i. The NFA 𝒩 is stratified if and only if every transition exiting a state s enters a state s′ that is either in the same stratum as s or in the next stratum, namely s ∈ S_i ⟹ s′ ∈ S_i ∪ S_{i+1}. The height of 𝒩 is the number of strata in 𝒩, that is, m + 1.
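The stratification condition of Definition 7 can be checked mechanically; a minimal sketch, where the state-to-level mapping and the transition triples are assumed encodings:

```python
def is_stratified(level, transitions):
    """level: dict mapping each state to its level.
    A transition exiting state s may only enter a state in the same
    stratum as s or in the next one (level[s] + 1)."""
    return all(level[t] in (level[s], level[s] + 1)
               for (s, _label, t) in transitions)
```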
Example 13 (stratified NFA). Depicted on the left side of Figure 15 is the same NFA shown on the left side of Figure 1. According to Definition 7, this NFA is stratified in four strata, namely S_0, …, S_3. Notice how the first stratum S_0 involves just the initial state, a property that holds for every stratified NFA.
Proposition 1. Let 𝒩 be a stratified NFA, let 𝒟 be the DFA SC-equivalent to 𝒩, and let ⟨d, ℓ, d′⟩ be a transition in 𝒟. Then, min{level(n′) | n′ ∈ ||d′||} ≥ min{level(n) | n ∈ ||d||}. This property can be proven by considering that, based on Subset Construction, ||d′|| is the ℓ-mapping of ||d|| in 𝒩. Owing to the property that a transition exiting a state n in 𝒩 cannot reach a state in a previous stratum, all the NFA states in ||d′|| will have a level that cannot be lower than the minimum level of the states in ||d||.
Based on Proposition 1, it follows that the actual set of states in the SC-equivalent DFA of 𝒩 is constrained by the condition expressed in (10), which therefore is bound to limit the number of possible states, that is, the size of the equivalent DFA and, ultimately, the impact of Quick Subset Construction. In order to actually control the impact, it suffices to confine the nondeterminism to a suffix of the stratified NFA, thereby leaving the rest of the NFA (a prefix of strata) deterministic.
Definition 8 (determinism in a stratified NFA). Let 𝒩 be a stratified NFA with height h. The determinism of 𝒩 is a number Δ, 0 ≤ Δ ≤ 1, defined as follows. If there is a stratum S_k in 𝒩, 0 ≤ k < h, such that the transition function of every state in all strata S_0, …, S_k is deterministic and either k = h − 1 or the transition function of the states in S_{k+1} is in general nondeterministic, then Δ = (k + 1) / h. Otherwise, Δ = 0.
Intuitively, the determinism of 𝒩 is the percentage of contiguous strata composing a prefix of 𝒩 in which the transition function is deterministic. Consequently, that portion of 𝒩 will not involve any singularities, thereby remaining untouched by the execution of Quick Subset Construction.
Example 14 (determinism in a stratified NFA). Shown on the right side of Figure 15 is a slight variation of the NFA on the left side, where the label of the transition ⟨2, a, 4⟩ has been replaced by b. This allows the stratified NFA to have all states in both strata S_0 and S_1 with a deterministic transition function. Based on Definition 8, the determinism of this stratified NFA is Δ = (1 + 1) / 4 = 0.5. Note how the determinism of the NFA on the left side of Figure 15 is 0.25 since, based on Definition 8, we have k = 0.
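The computation of Δ can be sketched as follows; the per-stratum determinism flags are an assumed encoding, and the (k + 1)/h formula is the one reconstructed from Definition 8 and Example 14:

```python
def determinism(strata_deterministic):
    """strata_deterministic[i] tells whether every state of stratum S_i
    has a deterministic transition function. Returns Delta = (k + 1) / h
    for the longest deterministic prefix S_0 .. S_k, or 0 if even S_0
    is nondeterministic."""
    h = len(strata_deterministic)
    if not strata_deterministic[0]:
        return 0.0
    k = 0
    while k + 1 < h and strata_deterministic[k + 1]:
        k += 1
    return (k + 1) / h

# The two NFAs of Figure 15 (height h = 4): on the right, S_0 and S_1
# are deterministic (Delta = 0.5); on the left, only S_0 (Delta = 0.25).
print(determinism([True, True, False, False]))   # → 0.5
print(determinism([True, False, False, False]))  # → 0.25
```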
The degree of determinism in a stratified NFA is key to tuning the impact of Quick Subset Construction. Roughly, based on Proposition 1, the larger the determinism, the smaller the number of states in the SC-equivalent DFA and, consequently, the smaller the impact. In other words, we can indirectly control the impact by varying the degree of determinism in the stratified NFA.
We present hereafter the results of five experiments on stratified NFAs, named S1, …, S5, where each experiment is designed to compare the performance of Quick Subset Construction with that of Subset Construction by varying one of the relevant configuration parameters, specifically the number of NFA states, the ε-density, the height of the NFA (number of strata), and the degree of determinism in the NFA, while keeping the other three parameters fixed.
Displayed in Figure 16 are the results of the first experiment (S1), where the varying parameter is the number of NFA states, while no ε-transitions are involved (ε-density = 0). Based on the left-hand graph, Quick Subset Construction outperforms Subset Construction with an almost constant gain of approximately 0.5. In other words, the execution of Quick Subset Construction takes about 50% less time than Subset Construction to perform the determinization task. Based on the right-hand graph in Figure 16, note how the number of singularities processed by Quick Subset Construction increases with the number of NFA states.
The results of the second experiment (S2) are displayed in Figure 17, where the varying parameter is the ε-density (percentage of ε-transitions), while the number of NFA states is fixed (10,000). In contrast with experiment S1 (cf. Figure 16), Quick Subset Construction outperforms Subset Construction only up to a certain value of ε-density (approximately 15% of ε-transitions), beyond which it is increasingly outperformed by Subset Construction, as clearly indicated by a negative gain, which becomes overwhelmingly so as the ε-density approaches the maximum value (100% of ε-transitions).
Note how, in contrast with the results presented in Figure 12 for a similar experiment on random NFAs (R3), the processing time of Subset Construction is very different from that of Quick Subset Construction, as it decreases monotonically with increasing ε-density, owing presumably to the particular topology of stratified NFAs. In fact, according to Proposition 1, the extension of the states in the SC-equivalent DFA is constrained by condition (9), which limits the number of possible subsets, a property that does not hold for random NFAs. Hence, increasing the percentage of ε-transitions is bound to generate in the equivalent DFA a smaller number of states (with larger extensions), as well as a smaller number of transitions. By contrast, the number of initial singularities created according to the second rule in Section 4.1 and processed by Quick Subset Construction does not depend on the stratification of the NFA; therefore, a wave-shaped curve (substantially similar to the spire-shaped curve in Figure 12) is produced for the same reasons discussed in Section 6.2 for experiment R3. Also, based on the bar graph displayed on the right side of Figure 17, note how there are executions of Quick Subset Construction in which the number of singularities processed is smaller than the number of initial singularities, meaning that a subset of the initial singularities are removed before being processed owing to the removal of the associated states.
Shown in Figure 18 are the results of S3, the third experiment, where the varying parameter is the height (ranging in [10 .. 10,000]), while both the number of NFA states (10,000) and the ε-density (0.5) are fixed. According to the left-hand graph, Quick Subset Construction is outperformed by Subset Construction, with an approximately constant negative gain of about −0.6. Apparently, the large percentage of ε-transitions (half of all transitions) degrades the performance of Quick Subset Construction to such an extent that it is considerably outperformed by Subset Construction. This conjecture may be tested by means of a similar experiment in which ε-transitions are missing (cf. experiment S4).
Experiment S4 is very similar to experiment S3. The only difference lies in the lack of ε-transitions (ε-density = 0), as shown in Figure 19. Removing the ε-transitions makes Quick Subset Construction increasingly outperform Subset Construction. This is corroborated by the right-hand diagram, which shows that the number of singularities processed decreases when the height increases, thereby reducing the impact.
The last experiment (S5) is meant to control the impact indirectly by leveraging the degree of determinism in the NFA. As shown in Figure 20, despite a fixed 10% of ε-transitions, Quick Subset Construction outperforms Subset Construction with increasing gain. This result is consistent with the right-hand graph, which shows how the number of singularities processed decreases linearly with increasing determinism. As conjectured above, a stratified NFA with a deterministic prefix of contiguous strata is bound to reduce the set of DFA states, because the extension of these states cannot include any NFA state in the deterministic prefix. The larger the determinism, the smaller the impact and, consequently, the shorter the processing time.
In summary, the experimental results for stratified NFAs suggest that Quick Subset Construction outperforms Subset Construction when either ε-transitions are missing or the determinism of the NFA is high (cf. Definition 8).
In the next section, we consider a different class of NFAs, called weak NFAs, in which the nondeterminism is localized in the transition function of a single state.

Weak NFAs
A weak NFA is reminiscent of an environment that is continuously modeled by a learning robot (cf. Section 1), where a DFA (the model of the environment) is extended with some transitions/states that cause the DFA to be transformed into an NFA, thereby requiring on-the-fly determinization of an NFA at each new interaction of the robot with the environment.
The results of three experiments on weak NFAs are presented, namely W1, W2, and W3, where each experiment is designed to compare the performance of Quick Subset Construction with that of Subset Construction by varying the number of NFA states, while keeping the other relevant parameters fixed. Since the NFAs are weak, that is, the nondeterminism affects the transition function of a single state, a small impact of Quick Subset Construction is expected and, hence, a large gain.
The results of experiment W1 are shown in Figure 21, where in each NFA the nondeterminism is caused by a single ε-transition, while the number of NFA states ranges in [200 .. 2500]. While the processing time of Subset Construction grows rapidly with the size of the NFA, Quick Subset Construction performs much better, with a processing time growing slowly. This result is corroborated by the limited number of singularities processed (right side of Figure 21), which is of the order of tens, despite the number of states reaching the order of thousands. As a result, the gain is close to 1 (cf. the left side of Figure 21), meaning that Quick Subset Construction is actually much faster than Subset Construction.
The second experiment (W2) is similar to W1, as there is just one single ε-transition causing the nondeterminism in the NFA. Now, however, the range of the number of NFA states is enlarged to [100 .. 12,975]. The results are shown in Figure 22, where the processing time is displayed in logarithmic scale. Notice how the time of Subset Construction is plotted as a straight line, meaning that it grows exponentially with the size of the NFA. The processing time of Quick Subset Construction also appears approximately as a straight line, but with a lower slope, thereby resulting in better performance, as testified by the curve of the gain growing closer and closer to 1. For example, with the largest NFA, the processing time of Subset Construction is about 60 s, while that of Quick Subset Construction is about 0.3 s, giving rise to a gain of 0.995; in other words, the same equivalent DFA is generated by saving 99.5% of the processing time. As with experiment W1, the number of singularities processed (right side of Figure 22) is kept within a limited range despite a growing number of NFA states.
The last experiment (W3) is a variant of experiment W2, where the ε-transition is substituted by an ℓ-transition, ℓ ≠ ε, that causes nondeterminism owing to the existence of another ℓ-transition exiting the same state. The results are shown in Figure 23, where the processing time is still in logarithmic scale. Note how the evolution of both the processing time and the number of singularities processed is similar to the corresponding evolution in W2 (cf. Figure 22), indicating that the type of nondeterminism in a weak NFA (ℓ-transition rather than ε-transition) is irrelevant to the performance of the determinization algorithm. In summary, these experimental results show that Quick Subset Construction performs much better than Subset Construction when applied to weak NFAs.

Bad NFAs
When the NFA is large and the impact is small, chances are that Quick Subset Construction outperforms Subset Construction considerably. When, instead, the number of singularities processed compares with the number of transitions in the DFA, that is, when the impact is high, Quick Subset Construction is likely to perform similarly to, if not worse than, Subset Construction. To illustrate this point, we have conducted an experiment on so-called bad NFAs. Shown in Figure 24 is one generic such NFA, which includes n + 1 states, two alphabet labels, namely a and b, and just one singularity (0, b), irrespective of the actual value of n. Notwithstanding this, it is shown in the literature (cf. Reference 9, Section 2.3.6: A bad case for the subset construction) that the number of states in the SC-equivalent DFA is invariably 2^n, thereby growing exponentially and becoming close to the worst case of 2^(n+1) − 1 DFA states. Since, by increasing n, the number of DFA states becomes overwhelmingly larger than the number of NFA states, Quick Subset Construction is required to generate the SC-equivalent DFA almost entirely, as Subset Construction does. Consequently, we expect the two algorithms to perform similarly. The results of the experiment are shown in Figure 25, where the only varying parameter is the number of NFA states, ranging from 3 to 18, in other words, 2 ≤ n ≤ 17; hence, the number of states in the SC-equivalent DFA ranges from 2^2 to 2^17. Note how the processing time of the two algorithms (expressed in logarithmic scale) is similar and grows exponentially. This result is corroborated by the bar chart on the right, which clearly shows the number of singularities processed growing exponentially (linearly in logarithmic scale).
One may ask why the processing time of Quick Subset Construction is almost identical to that of Subset Construction, practically so for n ≥ 10 (the gain is approximately zero). It looks like the effort in processing a singularity by Quick Subset Construction compares with the effort of generating a transition by Subset Construction. Why? The reason stems from the actual scenarios involved in the processing of singularities. It turns out that only n singularities are relevant to scenario S2, while all the other singularities (the vast majority) are processed in the context of scenario S1. The point is that processing scenario S1 roughly mimics the processing by Subset Construction, as there is no transition exiting the current state that is marked with the label ℓ involved in the singularity: what is to be done is generating the new transition (marked with ℓ) directed either to an existing state or to a newly-created state, which is exactly what Subset Construction does.
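Since Figure 24 is not reproduced here, the exponential blow-up can be illustrated with the classical bad case of Reference 9 (an NFA that accepts the strings whose n-th symbol from the end is b), relabeled so that the single nondeterministic choice sits on (0, b); the exact topology is an assumption, not necessarily the automaton of Figure 24:

```python
def count_dfa_states(n):
    """Subset Construction on a bad NFA with n + 1 states (0..n) over
    {a, b}: state 0 loops on a and b and also reaches 1 on b (the single
    singularity (0, b)); every state 0 < i < n reaches i + 1 on both
    labels. Returns the number of reachable DFA states."""
    def move(subset, label):
        out = set()
        for q in subset:
            if q == 0:
                out.add(0)               # self-loop on both labels
                if label == 'b':
                    out.add(1)           # the nondeterministic choice
            elif q < n:
                out.add(q + 1)           # advance on both labels
        return frozenset(out)

    start = frozenset({0})
    seen, frontier = {start}, [start]
    while frontier:
        subset = frontier.pop()
        for label in ('a', 'b'):
            nxt = move(subset, label)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)

print(count_dfa_states(10))  # → 1024 = 2^10
```

Every reachable subset has the form {0} ∪ S with S ⊆ {1, …, n}, and all 2^n such subsets are reachable, which matches the exponential growth reported above.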

Upfront removal of 𝜺-transitions
The experimental results presented in Section 6.4 for stratified NFAs clearly indicate that Quick Subset Construction underperforms when the NFA includes ε-transitions: the higher the ε-density, the worse its performance in comparison with Subset Construction. (Very high densities of ε-transitions do not normally occur in classical application domains; see, for example, the domain of regular expressions considered in Reference 35. By contrast, in our experience, high ε-densities can emerge quite naturally in translated automata.) A natural question arising from this behavior is whether the upfront removal of ε-transitions carried out by a standard polynomial-time algorithm may improve the performance of Quick Subset Construction. To answer this question, five additional experiments have been conducted, named P1, …, P5, where Subset Construction and Quick Subset Construction were run with and without ε-transitions in the NFAs, thereby allowing for a comparison of the processing time in both conditions.
The results of experiment P1 are shown in Figure 26, where the varying parameter is the number of NFA states. Plotted in the graph on the left side are the curves of the processing time of the two algorithms running on both the original NFA (embodying the ε-transitions) and the NFA resulting from the elimination of the ε-transitions. In the latter case, the processing time represents the sum of the time for the elimination of the ε-transitions (which is independent of the determinization algorithm) and the time for the determinization of the resulting NFA (without ε-transitions). The curves indicate that both algorithms perform slightly better when ε-transitions are removed, with Quick Subset Construction still being outperformed by Subset Construction. The bar graph on the right side of the figure shows the corresponding number of singularities processed by Quick Subset Construction in both conditions (with and without ε-transitions), which happen to be substantially the same.
Displayed in Figure 27 are the results of the second experiment (P2), where the varying parameter is the determinism of the stratified NFA (cf. Definition 8). Note how, with ε-transitions, Quick Subset Construction starts outperforming Subset Construction at determinism = 0.2. Without ε-transitions, the performance of Subset Construction deteriorates, while that of Quick Subset Construction improves up to determinism = 0.8, remaining in any case better than the performance of Subset Construction for every value of determinism. This improvement is corroborated by the bar graph on the right side of Figure 27, indicating that the number of singularities processed decreases when ε-transitions are removed. Experiment P3 is a variation of P2, with the exception that the NFAs are larger (including 3000 states), with results shown in Figure 28. Remarkably, the time curves of Subset Construction almost coincide in both conditions, while Quick Subset Construction improves its performance when the ε-transitions are removed.
In the next experiment (P4), the varying parameter is the ε-density. Notice the recurring spire-shaped curve in all four conditions (cf. Figures 12 and 17). These curves indicate that eliminating the ε-transitions considerably mitigates the poor performance of both algorithms in the subrange of ε-density around 0.7, with Subset Construction performing slightly better than Quick Subset Construction over almost the entire range.
In the last experiment (P5) too, the varying parameter is the ε-density, where the NFAs considered are acyclic. As shown in Figure 30, the results are somewhat similar to those of experiment P4 (cf. Figure 29): both algorithms perform better when ε-transitions are removed, at least up to a certain threshold of ε-density. In summary, removing the ε-transitions upfront and then applying the determinization process to the resulting NFA seems to improve the performance of both algorithms. Moreover, assuming that ε-transitions are missing allows for some simplifications of the inward-oriented determinization performed by Quick Subset Construction. First, the first two rules for the creation of the initial singularities are no longer applicable (cf. Section 4.1), thereby leaving just one rule, which applies when a state of the NFA is exited by several transitions marked with the same label ℓ. Second, scenario S0 of the algorithm disappears, as the initial state of the NFA cannot be exited by any ε-transitions, thereby leaving only two scenarios: S1 and S2. Third, the computation of an ℓ-mapping (e.g., line 21) is simplified, as there is no need to compute the ε-closure of the set of states reached by the transitions exiting such states with the same label.
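The upfront ε-removal discussed above is the standard polynomial-time construction; a compact sketch follows, where the transition triples and the 'eps' sentinel are assumed encodings, not the paper's notation:

```python
def eps_closure(state, transitions):
    """States reachable from state via ε-transitions only."""
    closure, stack = {state}, [state]
    while stack:
        q = stack.pop()
        for (src, lbl, tgt) in transitions:
            if src == q and lbl == 'eps' and tgt not in closure:
                closure.add(tgt)
                stack.append(tgt)
    return closure

def remove_eps(states, transitions, finals):
    """Standard upfront ε-removal: each state q inherits the non-ε
    transitions exiting its ε-closure, and becomes final if its
    ε-closure contains a final state."""
    new_trans, new_finals = set(), set(finals)
    for q in states:
        cl = eps_closure(q, transitions)
        if cl & set(finals):
            new_finals.add(q)
        for (src, lbl, tgt) in transitions:
            if src in cl and lbl != 'eps':
                new_trans.add((q, lbl, tgt))
    return new_trans, new_finals
```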

6.8 Is it all about algorithm implementation?
A natural question raised by the experimental results is whether a different implementation of the algorithms might reduce the performance gap between them, possibly in favor of Subset Construction. For example, to support the efficiency of Quick Subset Construction, the set of the entering (in addition to the exiting) transitions is maintained for each automaton state.(§) Also, the level associated with states is necessary to Quick Subset Construction but completely irrelevant to Subset Construction. The processing of this additional data penalizes Subset Construction, which is forced to compute something both costly and unnecessary. Other "unorthodox" implementation choices might penalize both algorithms, such as identifying a DFA state by a string representing a subset of the NFA states rather than by a vector of these states. Intuitively, however, when both algorithms are equally penalized, the performance gap is expected to remain unchanged, although the time response may (equally) deteriorate for both algorithms. In order to test this conjecture in practice, we have developed an improved implementation of Subset Construction, namely Subset Construction′, where all the data structures specifically designed to support Quick Subset Construction have been removed, thereby resulting in lighter code that no longer has the burden of processing unnecessary information, like the entering transitions or the level attached to each automaton state. Moreover, in order to improve the performance of both algorithms, we have also ameliorated the processing of the data structures that are necessary to both Subset Construction and Quick Subset Construction, such as the identification of DFA states.
Then, we have performed four experiments, which are reminiscent of experiments R1 (random NFAs, cf. Figure 10), A1 (acyclic NFAs, cf. Figure 13), S1 (stratified NFAs, cf. Figure 16), and W1 (weak NFAs, cf. Figure 21), where the NFAs have been generated based on the same configuration parameters adopted in the corresponding original experiments.
In each new experiment, Subset Construction, Subset Construction′, and Quick Subset Construction were run on 100 instances of each NFA (based on the same parameters), eventually plotting the average processing time. The results, shown in Figure 31, clearly indicate that Subset Construction is invariably outperformed by Subset Construction′, although only slightly. This could make a difference when Quick Subset Construction compares with Subset Construction, as in the experiment for random NFAs (A). However, when Quick Subset Construction outperforms Subset Construction to a major extent, such as with stratified (C) and weak NFAs (D), the code optimization in Subset Construction′ is insufficient to make a practical difference.
Comparing the results of each new experiment with those of the corresponding original experiment, notice how, despite the substantial similarity of the time curves of Subset Construction and Quick Subset Construction, the actual time values are considerably smaller than in the original experiments. This is the effect of the code optimization carried out on the data structures that keep being shared by both algorithms.
That said, our claim is that, whatever the implementation technique adopted for either algorithm, Quick Subset Construction may still outperform Subset Construction when the impact is below a given threshold; in other words, the better the implementation of Subset Construction, the lower the impact needs to be for Quick Subset Construction to outperform it, and vice versa.
We first support this claim with the help of a so-called lake-race metaphor. With reference to Figure 32, imagine a race in which two competing athletes, namely SC and QSC, are expected to reach a point finish starting from a point start, both points lying on the edge of a lake (blue region). The peculiarity of the race is that there is absolutely no constraint on the path chosen or on the moving technique of each athlete. Since SC is a runner and QSC is a swimmer, SC is expected to run on the ground (red curved line), while QSC prefers swimming across the lake (straight blue line). The question is: who will win the race? The answer is not obvious: even if the runner is considerably faster than the swimmer, the path of the swimmer is considerably shorter than the path of the runner; everything depends on the actual lengths of the paths and the actual speeds of the athletes.
Let S_SC be the length of the path of the runner, S_QSC the length of the path of the swimmer, V_SC the (average) speed of the runner, V_QSC the (average) speed of the swimmer, and t_SC and t_QSC the times taken by the runner and the swimmer, respectively, to complete the race. The two athletes reach finish at the same time when t_SC = t_QSC, that is, when S_SC/V_SC = S_QSC/V_QSC; in other words, when S_QSC/S_SC = V_QSC/V_SC. Hence, denoting by ℑ (impact) the ratio S_QSC/S_SC, whatever the speeds of the two athletes, the swimmer will outperform the runner when ℑ < V_QSC/V_SC. In other words, the faster the runner, the smaller ℑ needs to be in order for the swimmer to outperform the runner, and vice versa. For example, assuming V_SC = 5 m/s and V_QSC = 1.25 m/s, ℑ needs to be less than 1.25/5 = 0.25; in other words, the swimmer will outperform the runner provided that the swim path is less than a quarter of the ground path. If the runner increases the speed to 6.25 m/s, then ℑ needs to be less than 1.25/6.25 = 0.2. Subsequently, if the swimmer increases the speed to 1.5 m/s, then ℑ needs to be less than 1.5/6.25 = 0.24. Out of metaphor, whatever the speed of the two algorithms, which partly depends on their implementation, there will always be a threshold of the impact (cf. Definition 5) under which Quick Subset Construction outperforms Subset Construction. Since the impact does not provide a precise measure of the convenience of using Quick Subset Construction, as chances are that processing a singularity requires, on average, more time than creating a transition by Subset Construction, we introduce the notion of time impact, which provides a precise measure of that convenience by comparing the actual processing time spent by the two algorithms in the determinization of the same NFA.

Definition 9 (time impact). Let 𝒩 be an NFA and 𝒟 the corresponding SC-equivalent DFA. The time impact ℑ_t of Quick Subset Construction to transform 𝒩 into 𝒟 is the ratio between the time t_QSC spent by Quick Subset Construction to obtain 𝒟 and the time t_SC spent by Subset Construction to generate 𝒟 from scratch, namely ℑ_t = t_QSC / t_SC. Clearly, Quick Subset Construction becomes more convenient than Subset Construction when ℑ_t < 1.

§§§ Typically, this extra information allows for an efficient check of the condition in line 6 of the Unsafe auxiliary function (cf. Section 4.3.1).
¶¶¶ In the case of DFA state identification by strings, this is questionable, as comparing two vectors of states is not necessarily more efficient than comparing two strings.

FIGURE 31 Results from extended experimentation with random NFAs (cf. experiment R1 in Figure 10) (A), acyclic NFAs (cf. experiment A1 in Figure 13) (B), stratified NFAs (cf. experiment S1 in Figure 16) (C), and weak NFAs (cf. experiment W1 in Figure 21) (D), where SC′ is a lighter implementation of Subset Construction.

FIGURE 32 The lake-race metaphor: the QSC swimmer (blue path across the lake) against the SC runner (red path on the ground).
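The two quantities just introduced can be captured in a few lines (an illustrative sketch; the function names are ours, and the numeric values come from the metaphor above):

```python
def time_impact(t_qsc, t_sc):
    """Time impact (Definition 9): ratio of QSC time to SC time."""
    return t_qsc / t_sc

def swimmer_wins(impact, v_qsc, v_sc):
    """Lake-race condition: the swimmer (QSC) wins when the path ratio
    (impact) is below the speed ratio V_QSC / V_SC."""
    return impact < v_qsc / v_sc

# Numbers from the metaphor: V_SC = 5 m/s, V_QSC = 1.25 m/s, threshold 0.25.
assert not swimmer_wins(0.30, 1.25, 5.0)  # path ratio above 0.25: runner wins
assert swimmer_wins(0.20, 1.25, 5.0)      # path ratio below 0.25: swimmer wins

# QSC is more convenient than SC exactly when the time impact is below 1.
assert time_impact(t_qsc=4.0, t_sc=5.0) < 1
```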
Example 15 (impact and time impact). Shown in Figure 33 are the bar charts of the singularities processed in experiments A1 (left) for acyclic NFAs (cf. Figure 13) and W1 (right) for weak NFAs (cf. Figure 21), which have been extended with the curves of the impact and the time impact. In the chart on the left, the impact is considerably high (above 0.75). Although the impact remains under 1, the time impact is always above 1. Consequently, Quick Subset Construction is invariably outperformed by Subset Construction, as already known from Figure 13 (left). By contrast, in the chart on the right, since the impact is always very low, the time impact remains well under the threshold of 1, thereby indicating that Quick Subset Construction invariably outperforms Subset Construction, as already known from Figure 21 (left).
Note how, based on Definition 9, the gain (cf. Definition 6 in Section 6.1) may be expressed in terms of the time impact. Assuming (reasonably) that processing a singularity requires on average more time than generating a transition by Subset Construction, we expect ℑ_t > ℑ. The notion of scale factor provides a quantitative relation between ℑ_t and ℑ.
Definition 10 (scale factor). Let ℑ and ℑ_t be the impact and the time impact, respectively, of Quick Subset Construction for the determinization of an NFA. The scale factor is the ratio ℑ_t / ℑ. In other words, based on (7) and (13), the scale factor can be expressed as the ratio V_SC / V_QSC, where V_SC is the speed of Subset Construction (the number of transitions generated per time unit) and V_QSC is the speed of Quick Subset Construction (the number of singularities processed per time unit). Intuitively, the scale factor indicates how much faster Subset Construction is than Quick Subset Construction and, hence, how small the impact needs to be in order for Subset Construction to be outperformed by Quick Subset Construction. The notion of neutral impact provides a threshold for that convenience.
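The identity between the two expressions of the scale factor can be checked numerically (an illustrative sketch; the function name and sample figures are ours, and we assume, per Definition 5 as referenced above, that the impact is the number of singularities over the number of DFA transitions):

```python
def scale_factor(n_transitions, n_singularities, t_sc, t_qsc):
    """Scale factor (Definition 10): ratio of time impact to impact.
    Returns (sigma, V_SC / V_QSC) so the identity can be checked directly."""
    impact = n_singularities / n_transitions  # singularities over DFA transitions
    t_impact = t_qsc / t_sc                   # Definition 9
    sigma = t_impact / impact
    v_sc = n_transitions / t_sc               # transitions generated per time unit
    v_qsc = n_singularities / t_qsc           # singularities processed per time unit
    return sigma, v_sc / v_qsc

sigma, speed_ratio = scale_factor(n_transitions=1500, n_singularities=300,
                                  t_sc=10.0, t_qsc=4.0)
assert abs(sigma - speed_ratio) < 1e-12  # scale factor equals V_SC / V_QSC
assert abs(sigma - 2.0) < 1e-12          # (4/10) / (300/1500) = 0.4 / 0.2
```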
Definition 11 (neutral impact). The neutral impact ℑ̄ is the value of the impact that makes the processing time of Quick Subset Construction equal to that of Subset Construction in the determinization of an NFA.
Proposition 2. The neutral impact is the inverse of the scale factor, namely ℑ̄ = V_QSC / V_SC. Proof. Based on Definition 11, the neutral impact is the impact that makes ℑ_t = 1; since ℑ_t / ℑ = V_SC / V_QSC, setting ℑ_t = 1 yields ℑ̄ = V_QSC / V_SC. ▪ This means that Quick Subset Construction starts outperforming Subset Construction when ℑ < ℑ̄, in other words, when ℑ < V_QSC / V_SC, a result already known from our lake-race metaphor above.
Example 16 (neutral impact). Assume that Subset Construction generates 150 transitions per time unit and that Quick Subset Construction processes 90 singularities in the same time unit. The neutral impact is ℑ̄ = 90/150 = 0.6. That is, Quick Subset Construction is more convenient than Subset Construction when the number of singularities processed is smaller than 60% of the total number of transitions in the SC-equivalent DFA. If, by code amelioration, we improve the speed of Quick Subset Construction from 90 to 105, the neutral impact rises to ℑ̄ = 105/150 = 0.7, which allows the threshold of singularities processed to increase while still outperforming Subset Construction. Subsequently, if we improve the speed of Subset Construction from 150 to 175, the neutral impact returns to ℑ̄ = 105/175 = 0.6. So, is it all about algorithm implementation? Apparently not: the faster Subset Construction is relative to Quick Subset Construction, the smaller the impact required for Quick Subset Construction to outperform it. In other words, whatever the quality of the code implementing the two algorithms, in theory there will always be a threshold of the impact under which Quick Subset Construction remains more convenient than Subset Construction.
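The arithmetic of Example 16 can be replayed directly (an illustrative sketch; the function name is ours):

```python
def neutral_impact(v_qsc, v_sc):
    """Neutral impact (Proposition 2): the inverse of the scale factor,
    i.e. V_QSC / V_SC."""
    return v_qsc / v_sc

# Figures from Example 16.
assert neutral_impact(90, 150) == 0.6    # baseline threshold
assert neutral_impact(105, 150) == 0.7   # QSC sped up: the threshold rises
assert neutral_impact(105, 175) == 0.6   # SC sped up too: the threshold drops back
```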

CONCLUSION
In this article, we presented an algorithm for NFA determinization, named Quick Subset Construction, as an alternative to the classical Subset Construction algorithm. Although the two algorithms generate the same (SC-equivalent) DFA when applied to the same input NFA (cf. Theorem 1), that DFA is obtained very differently. The approach of Subset Construction is outward-oriented: the DFA is generated from scratch, irrespective of the nature of the NFA. The modus operandi of Quick Subset Construction, instead, is inward-oriented: the NFA is progressively transformed into the equivalent DFA by applying a series of repair actions that remove the nondeterminism by changing the transition function of the states involved.
final, the string is recognized. The same string is recognized by the equivalent NFA on the left of Figure 1 as follows: matching the first character moves the NFA to the singleton δ(0, b) = {2}; next, matching the second character moves the NFA to the set of states δ(2, a) = {3, 4}; finally, matching the third character moves the NFA to the set of states δ(3, b) ∪ δ(4, b) = {5}. Since this last set contains a final state, the string is recognized.
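The subset simulation just described can be sketched as follows (a sketch, not the paper's code: Figure 1 is not reproduced here, so the transition table below is an assumption reconstructed from the fragment above, with δ(3, b) and δ(4, b) both taken to reach state 5, so that their union is {5} as stated):

```python
# Hypothetical transition table reconstructed from the text; state 5 is final.
DELTA = {
    (0, "b"): {2},
    (2, "a"): {3, 4},
    (3, "b"): {5},
    (4, "b"): {5},
}
FINALS = {5}

def nfa_accepts(delta, initial, finals, string):
    """Run the NFA on `string`, tracking the set of currently reachable states."""
    current = {initial}
    for symbol in string:
        current = set().union(*(delta.get((q, symbol), set()) for q in current))
    return bool(current & finals)

assert nfa_accepts(DELTA, 0, FINALS, "bab")      # the string from the text
assert not nfa_accepts(DELTA, 0, FINALS, "ba")   # stops in {3, 4}: no final state
```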

Algorithm 1. Subset Construction
1: function SubsetConstruction(𝒩) → 𝒟
2: input: 𝒩 = (Σ, N, δ_n, n_0, F_n): an NFA
3: output: 𝒟 = (Σ, D, δ_d, d_0, F_d): a DFA that is equivalent to 𝒩, namely the SC-equivalent DFA of 𝒩
4: begin
5: Initialize D, δ_d, and F_d as empty sets
6: Insert the initial state d_0 into D, where ‖d_0‖ = ε-closure(n_0, 𝒩)
7: if ‖d_0‖ includes a final state of 𝒩 then insert d_0 into F_d end if
8: Initialize a stack of states by including d_0
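The listing above can be turned into runnable code along the following lines (a sketch under our own encoding assumptions, not the authors' implementation: the NFA transition function is a dict mapping (state, symbol) to sets of states, with the empty string standing for ε, and each DFA state is the frozenset of NFA states forming its extension):

```python
from collections import deque

def eps_closure(states, delta):
    """All NFA states reachable from `states` via epsilon ("") transitions."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def subset_construction(alphabet, delta, n0, nfa_finals):
    """Return (dfa_states, dfa_delta, d0, dfa_finals)."""
    d0 = eps_closure({n0}, delta)
    dfa_states, dfa_delta, dfa_finals = {d0}, {}, set()
    stack = deque([d0])
    while stack:
        d = stack.pop()
        if d & nfa_finals:          # extension contains a final NFA state
            dfa_finals.add(d)
        for a in alphabet:
            target = set()
            for q in d:
                target |= delta.get((q, a), set())
            if not target:
                continue
            t = eps_closure(target, delta)
            dfa_delta[(d, a)] = t
            if t not in dfa_states:  # newly discovered DFA state
                dfa_states.add(t)
                stack.append(t)
    return dfa_states, dfa_delta, d0, dfa_finals

# Tiny NFA: 0 -eps-> 1, 0 -a-> 0, 1 -b-> 2, with final state 2.
delta = {(0, ""): {1}, (0, "a"): {0}, (1, "b"): {2}}
states, d_delta, d0, finals = subset_construction({"a", "b"}, delta, 0, {2})
assert d0 == frozenset({0, 1})                 # eps-closure of the initial state
assert d_delta[(d0, "b")] == frozenset({2})
assert frozenset({2}) in finals
```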

FIGURE 2 Generation by Subset Construction of the DFA SC-equivalent to the NFA displayed on the left of Figure 1. Displayed on top of each intermediate graph is the content of the stack (growing from left to right).

Algorithm 5. New (auxiliary function)
1: function New(N) → q
2: input: N: a set of states of the NFA 𝒩
3:

FIGURE 15 Stratified NFAs.

FIGURE 21 Results of experiment W1 (weak NFAs): one ε-transition, one state with nondeterministic transition function, and number of NFA states varying in range [200 .. 2500].

FIGURE 22 Results of experiment W2 (weak NFAs, processing time in logarithmic scale): one ε-transition, one state with nondeterministic transition function, and number of NFA states varying in range [100 .. 12,975].
FIGURE 23 Results of experiment W3 (weak NFAs, processing time in logarithmic scale): ε-density = 0, one state with nondeterministic transition function, and number of NFA states varying in range [100 .. 12,975].

FIGURE 24 Bad NFA, with n + 1 states and just one initial singularity (0, b), whose SC-equivalent DFA includes 2^n states.
FIGURE 25 Results of the experiment on bad NFAs (processing time and singularities in logarithmic scale): number of NFA states varying in range [3 .. 18] (hence, number of states in the equivalent DFA ranging in [2^2 .. 2^17] = [4 .. 131,072]).
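Since Figure 24 is not reproduced here, the following sketch assumes the classic bad NFA matching the caption (n + 1 states, a single singularity (0, b), SC-equivalent DFA with 2^n states): the automaton for strings over {a, b} whose n-th symbol from the end is b. It determinizes the NFA by enumerating the reachable subsets:

```python
def bad_nfa_dfa_size(n):
    """Determinize the (n+1)-state bad NFA and return the DFA state count."""
    # NFA transitions: 0 loops on a and b, 0 -b-> 1 (the singularity (0, b)),
    # and i -a,b-> i+1 for 1 <= i < n; state n is final and has no successors.
    delta = {(0, "a"): {0}, (0, "b"): {0, 1}}
    for i in range(1, n):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    # Enumerate the DFA states reachable from {0} by subset construction.
    seen, stack = set(), [frozenset({0})]
    while stack:
        d = stack.pop()
        if d in seen:
            continue
        seen.add(d)
        for symbol in "ab":
            t = frozenset().union(*(delta.get((q, symbol), set()) for q in d))
            if t and t not in seen:
                stack.append(t)
    return len(seen)

# The exponential blow-up stated in the caption.
assert bad_nfa_dfa_size(3) == 2 ** 3   # 8 DFA states
assert bad_nfa_dfa_size(5) == 2 ** 5   # 32 DFA states
```

Every reachable subset contains state 0 (which loops on both symbols), and the remaining membership pattern over {1, .., n} records which of the last n symbols were b, so all 2^n subsets occur.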

FIGURE 33 Bar charts for experiments A1 (left) and W1 (right), which are extended with the impact and the time impact.
are the states in Q (line 7), where the dashed ones are the unsafe states (cf. the Unsafe auxiliary function in Section 4.3.1), namely C, D, and F. Then, the extension of the initial state is enlarged by the set of NFA states involved in Q (line 8), namely {1, 2, 3, 5} (cf. the Enlarge auxiliary procedure in Section 4.3.2), while the ε-transitions exiting the initial state are removed (line 9), which brings the automaton to the second configuration in Figure

FIGURE 8 Processing by the auxiliary procedure Level after the insertion of a transition from A to H performed by Quick Subset Construction, which causes a reduction in the levels of the states F, G, H, and I (transition labels are omitted). Displayed below each graph configuration is the instance of Λ, where the processing of pairs in gray has no effect on levels.

is a configuration of the FA being manipulated by Quick Subset Construction, where each state is marked with the relevant level (whereas transition labels are omitted). Let us assume now that a new transition from A to H is inserted (dashed gray arrow). This insertion causes a call to procedure Level with input parameter Λ = [(H, 1)], as expressed below the graph.
′‖ ≠ ‖q′′‖, then, owing to Lemma 3, ‖q′‖ = ‖q′′‖ = ε-mapping(‖q‖, 𝒩), a contradiction. Hence, nondeterminism cannot hold in the resulting automaton; in other words, it is deterministic. ▪

Lemma 5. The extension of the initial state of the resulting automaton equals the extension of the initial state of the SC-equivalent DFA.
Proof. We have to show that ‖q_0‖ = ‖d_0‖. Based on Subset Construction, ‖d_0‖ = ε-closure(n_0, 𝒩), where n_0 is the initial state of 𝒩. If n_0 is not exited by any ε-transition in 𝒩, then ‖d_0‖ = {n_0} = ‖q_0‖, as there is

Definition 6 (gain). Let t_SC and t_QSC denote the processing time of Subset Construction and Quick Subset Construction, respectively, for generating the DFA that is SC-equivalent to the same NFA. The gain of Quick Subset Construction, ranging in [−1 .. 1], is the relative portion of processing time that is either saved (if positive) or wasted (if negative) by Quick Subset Construction over Subset Construction.

In a weak NFA, nondeterminism is caused either by one ε-transition or by two equally labeled transitions exiting the same (single) state. A weak NFA may be generated starting from a DFA that is extended with a single transition exiting a state d, carrying either the label ε or a label ℓ of the alphabet that is the same as that of another transition already exiting d. In other words, the transition inserted is either ⟨d, ε, d′⟩, where d′ ≠ d, or ⟨d, ℓ, d′⟩, where ℓ ≠ ε and there is another transition ⟨d, ℓ, d′′⟩ with d′ ≠ d′′.
FIGURE 30 Results of experiment P5 (acyclic NFAs with upfront removal of ε-transitions): number of NFA states = 1000 and ε-density varying in range [0 .. 1].