From DFAs to regular expressions (Arden’s rule)
Given a DFA A, Arden’s lemma (below) gives a way to construct a regular expression r s.t. \mathscr{L}(r)=\mathscr{L}(A).1 The same approach adapts naturally to \lambda-NFAs but for simplicity we describe it for DFAs.
Lemma 1 (Arden’s lemma) Given A,B\subseteq \Sigma^*, consider the equation
X=A X\cup B\ ,
\tag{1} then X=A^* B is the smallest solution of Equation 1.
If \lambda\notin A then, this is the only solution. If \lambda\in A, then for every C\subseteq \Sigma^*, X=A^*(B\cup C) is a solution of Equation 1.
Let A=(S,Q,\delta,q_0,F) be a DFA (S is the alphabet). For each q_j\in Q, let X_j be the language of all words that would be accepted by A if q_j were the initial state (instead of q_0). Notice that X_j= \begin{cases} \displaystyle \bigcup_{\substack{s\in S\\\delta(q_j,s)=q_i}} \{s\}X_i&\text{ if } q_j\notin F\ , \\ \displaystyle \{\lambda\}\cup \bigcup_{\substack{s\in S\\\delta(q_j,s)=q_i}} \{s\}X_i &\text{ if } q_j\in F\ . \end{cases} \tag{2} This give a system of equations where the variables are X_j and with Arden’s lemma we can find a way to express each X_j only using unions, concatenations, and Kleene stars, that is we have found a regular expression for it. This is analogous to the way to solve linear equations by eliminating variables. This analogy is even more clear if we write down Equation 2 using the notation for regular expressions: X_j= \begin{cases} \displaystyle \sum_{\substack{s\in S\\\delta(q_j,s)=q_i}} sX_i&\text{ if } q_j\notin F\ , \\ \displaystyle \lambda+ \sum_{\substack{s\in S\\\delta(q_j,s)=q_i}} sX_i &\text{ if } q_j\in F\ . \end{cases} Therefore, to find a regular expression for \mathscr{L}(A) we just need to solve the system above and the regular expression for X_0 gives the answer.
Example 1 Consider the language L=\{w\in \{a,b\}^* \mid |w|_a\in 2\mathbb N\}. The minimal DFA recognizing it is the following.
The system of equations associated to it is
\left\{ \begin{align*} X_0 &= \{b\}X_0\cup \{a\}X_1\cup\{\lambda\}\\ X_1 &= \{b\}X_1\cup \{a\}X_0 \end{align*} \right. Since in the end we are interested into giving a regular expression usually the system above is written directly as equations on regular expressions \left\{ \begin{align*} X_0 &= bX_0+ aX_1+\lambda\\ X_1 &= bX_1+ aX_0 \end{align*} \right. We can apply Arden’s rule to both equations, but since in the end we are interested into writing down a regular expression for X_0 let’s use the rule on the second equation:
X_1=b^*aX_0\ , and substitute it into the first: \begin{align*} X_0 &= bX_0+ ab^*aX_0+\lambda \\ & =(b+ab^*a)X_0+\lambda\ . \end{align*} Applying Arden’s rule again we get \begin{align*} X_0 &= (b+ab^*a)^*\lambda \\ &= (b+ab^*a)^*\ . \end{align*} Hence we got a regular expression for L: (b+ab^*a)^*. \diamond
Footnotes
Recall that with \mathscr{L}(\cdot) we denote the language associated to a DFA/NFA/regular expression/CFG/PDA/TM/…↩︎