4
$\begingroup$

This question is asking if the solution $x=v^\ast w$ to the equation

$$x=vx+w$$

(where all constants and variables are regular expressions) is unique or not, and the accepted answer states that it is wrong to "formulate everything in terms of equations" and instead one should formulate things in terms of inequalities. It's not exactly clear to me what 'everything' means here, and I'd like to better understand what 'everything' refers to. Let me make the question more specific.

Given a deterministic finite automaton (DFA), there is a way to construct a regular expression that has the same language as the language of that automaton. One such way arises from solving a system of equations. This way is described at the beginning of this video lecture by Alexandra Silva. This way is about solving equations of the form $$x=vx+w$$ where all variables and constants are regular expressions. It is mentioned in the video that the least solution of this equation is $v^\ast w$. (I guess "least" means least with respect to the relation $\leq$ which is defined by: $r\leq s \iff r+s\leq s$). Given a DFA, one constructs a system of equations of the above form using DFA's states and transitions and solves it using the above formula. This results in a regular expression whose language is claimed to be the same as the language of the original DFA.

Does the accepted answer in the question referenced above imply that this way of going from DFA to regex is wrong? In other words, does this transition fall under the term "everything" which is used in the answer? If it's wrong, then what should the procedure of producing a regex from DFA be in terms of inequalities?

$\endgroup$
2
  • $\begingroup$ Hi and welcome to MSE! Please only ask one question at a time. $\endgroup$ Commented Aug 13, 2024 at 13:22
  • 1
    $\begingroup$ @BenSteffan I believe my additional questions (which I didn't ask you to answer and only included as part of the background on what I don't understand) might have been helpful in answering the main question. But anyway, I deleted them. $\endgroup$ Commented Aug 14, 2024 at 13:57

1 Answer 1

2
$\begingroup$

Let me use capital letters for languages and lower case letters for words and $1$ for the empty word. Let also use $+$ for the union, since we are working in the semiring of languages. Consider the equation $$ X = KX + L\quad (1) $$ where $K$ and $L$ are (non necessarily regular) languages and $X$ is the unknown language.

Its solution has little to do with regular languages and automata and, contrary to the claim of the accepted answer to the referenced question, it does not require to use inequalities.

Proposition [Arden's Lemma] If $K$ does not contain the empty word, then $X=K^*L$ is the unique solution of the equation $X = KX + L$.

Proof. Replacing $X$ by $K^*L$ in the expression $KX+L$, one gets $$ K(K^*L)+L = K^+L + L = (K^+ + 1)L = K^*L, $$ and hence $X=K^*L$ is a solution of (1). To prove uniqueness, consider two solutions $X_1$ and $X_2$ of (1). By symmetry, it suffices to show that each word $u$ of $X_1$ also belongs to $X_2$. Let us prove this result by induction on the length of $u$.

If $|u|=0$, $u$ is the empty word and if $u \in X_1 = KX_1 +L$, then necessarily $u \in L$ since $1 \notin K$. But in this case, $u \in KX_2 + L= X_2$. For the induction step, consider a word $u$ of $X_1$ of length $n+1$. Since $X_1 = KX_1 + L$, $u$ belongs either to $L$ or to $KX_1$. If $u \in L$, then $u \in KX_2 + L = X_2$. If $u \in KX_1$ then $u = kx$ for some $k\in K$ and $x \in X_1$. Since $k$ is not the empty word, one has necessarily $|x| \leqslant n$ and hence by induction $x\in X_2$. It follows that $u\in KX_2$ and finally $u\in X_2$. This concludes the induction and the proof of the proposition.

If $K$ contains the empty word, uniqueness is lost.

Proposition. If $K$ contains the empty word, the solutions of (1) are the languages of the form $K^*M$ with $L \subseteq M$.

Proof. Since $K$ contains the empty word, one has $K^+ = K^*$. If $L\subseteq M$, one has $L\subseteq M \subseteq K^*M$. It follows that the language $K^*M$ is solution of (1) since $$ K(K^*M) + L = K^+M + L = K^*M + L = K^*M. $$ Conversely, let $X$ be a solution of (1). Then $L \subseteq X$ and $KX \subseteq X$. Consequently, $K^2X \subseteq KX \subseteq X$ and by induction, $K^nX \subseteq X$ for all $n$. It follows that $X \subseteq K^*X = \sum_{n \geq 0}K^nX \subseteq X$. Thus $X = K^*X$ and $L \subseteq X$, which shows that the language $X$ can be written as $K^*M$ with $L \subseteq M$: it suffices to take $M = X$.

$\endgroup$

You must log in to answer this question.