A special class of languages: "circular" languages. Is this known?


Define the following class of "circular" languages over a finite alphabet $\Sigma$. The name is actually already in use in the field of DNA computing for a different, similar-looking notion; AFAICT, this is a different class of languages.

A language $L$ is circular if, for every word $w$ in $\Sigma^*$:

$w$ belongs to $L$ if and only if $w^k$ belongs to $L$ for every integer $k > 0$.
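Since $k = 1$ makes the "if" direction trivial, a violation of the definition is a word $w \in L$ with $w^k \notin L$ for some $k \geq 2$. The sketch below searches for one by brute force; the helper name `find_violation` and the length and power bounds are illustrative assumptions, not part of the question. It can refute circularity but never prove it.

```python
from itertools import product

def find_violation(in_lang, alphabet, max_len, max_power):
    """Bounded search for a counterexample to circularity.

    Returns (w, k) with w in L but w^k not in L, or None if no such pair
    exists with |w| <= max_len and 2 <= k <= max_power. None is NOT a
    proof that the language is circular.
    """
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            w = "".join(letters)
            if not in_lang(w):
                continue  # only words in L can violate the definition
            for k in range(2, max_power + 1):
                if not in_lang(w * k):
                    return (w, k)
    return None

def even_as(w):
    return w.count("a") % 2 == 0   # circular

def odd_as(w):
    return w.count("a") % 2 == 1   # not circular: "a" is in, "aa" is not

print(find_violation(even_as, "ab", 6, 5))  # None
print(find_violation(odd_as, "ab", 6, 5))   # ('a', 2)
```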

Is this class of languages known? I am interested in the circular languages that are also regular, and in particular in:

a name for them, if they are already known
the decidability of the problem of deciding, given an automaton (in particular, a DFA), whether the accepted language satisfies the definition above

fl.formal-languages automata-theory regular-language

— vincenzoml


This is a very interesting question. Two related questions: 1) if we have a regular language L and an associated DFA, can we make it circular? 2) Given any language L, is it true that circ(L) is regular, or does it have nice properties?

— Suresh Venkat

PS: Maybe this is obvious, but why do you think circular languages are a subclass of the regular languages?

— Suresh Venkat


@Suresh, I think he is defining a language to be circular if it a) is regular; b) satisfies a closure property $\forall w \in L, n \in \mathbb{N} : w^n \in L$.

— Peter Taylor

Cross-posted on MO.

— Hsien-Chih Chang


Maybe I shouldn't post thanks, but this was my first question and I really appreciated the quality of the comments, answers and discussion. Thank you.

— vincenzoml


In the first part, we show an exponential algorithm for deciding circularity. In the second part, we show that the problem is coNP-hard. In the third part, we show that every circular language is a union of languages of the form $r^+$ (here $r$ can be the empty regexp); the union is not necessarily disjoint. In the fourth part, we exhibit a circular language that cannot be written as a disjoint sum $\sum r_i^+$.

Edit: Incorporated some corrections following Mark's comments. In particular, my earlier claims that circularity is coNP-complete or NP-hard have been corrected.

Edit: Fixed the normal form from $\sum r_i^*$ to $\sum r_i^+$. Exhibited an "inherently ambiguous" language.

Following up on Peter Taylor's comment, here is how to decide (extremely inefficiently) whether a language is circular, given its DFA. Let $n$ be the number of states of the DFA. Construct a new DFA whose states are $n$-tuples of the old states. This new DFA runs $n$ copies of the old DFA in parallel.

If the language is not circular, then there is a word $w$ such that, if we run it through the DFA repeatedly starting from the initial state $s_0$, we obtain states $s_1,\ldots,s_n$ such that $s_1$ is accepting but one of the others is not (if all of them are accepting, then the sequence $s_0,\ldots,s_n$, which has $n+1$ entries, must cycle, so that every word in $w^+$ is in the language). In other words, we have a path from $s_0,\ldots,s_{n-1}$ to $s_1,\ldots,s_n$ such that $s_1$ is accepting but one of the others is not. Conversely, if the language is circular, this cannot happen.

We have thus reduced the problem to a simple directed reachability test (just check all possible "bad" $n$-tuples).
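A direct transcription of the tuple construction, as a sketch: the DFA encoding (states `0..n-1`, with `delta[q][a]` giving the next state) and the function names are assumptions of this example. For each guessed tuple $(s_0,\ldots,s_{n-1})$ with $s_0$ the initial state, it explores the product automaton and looks for a reachable tuple equal to the shift $(s_1,\ldots,s_n)$ that is "bad". The running time is exponential in $n$.

```python
from collections import deque
from itertools import product

def reachable_tuples(delta, alphabet, start_tuple):
    """All tuples reachable by feeding the same word to every component."""
    seen = {start_tuple}
    queue = deque([start_tuple])
    while queue:
        cur = queue.popleft()
        for a in alphabet:
            nxt = tuple(delta[q][a] for q in cur)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def is_circular_tuples(n, alphabet, delta, start, accepting):
    for rest in product(range(n), repeat=n - 1):
        guess = (start,) + rest                  # s_0, ..., s_{n-1}
        for image in reachable_tuples(delta, alphabet, guess):
            # image = (delta(s_0, w), ..., delta(s_{n-1}, w)) for some word w
            if image[:-1] != guess[1:]:
                continue                         # not the shift (s_1, ..., s_n)
            seq = guess + (image[-1],)           # s_0, ..., s_n
            if seq[1] in accepting and any(q not in accepting for q in seq[2:]):
                return False                     # w is in L, but some w^k is not
    return True

# Even number of a's (accept {0}, circular) vs. odd number of a's (accept {1}).
even_a = [{"a": 1, "b": 0}, {"a": 0, "b": 1}]
print(is_circular_tuples(2, "ab", even_a, 0, {0}))  # True
print(is_circular_tuples(2, "ab", even_a, 0, {1}))  # False
```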

The circularity problem is coNP-hard. Suppose we are given a 3SAT instance with $n$ variables $\vec{x}$ and $m$ clauses $C_1,\ldots,C_m$. We can assume that $n = m$ (add dummy variables) and that $n$ is prime (otherwise find a prime between $n$ and $2n$ using the AKS primality test, and add dummy variables and clauses).

Consider the following language: "the input is not of the form $\vec{x}_1 \cdots \vec{x}_n$, where each $\vec{x}_i$ is an assignment satisfying $C_i$". It is easy to construct a DFA of size $O(n^2)$ for this language. If the language is not circular, then there is a word $w$ in the language some power of which is not in the language. Since the only words not in the language have length $n^2$, and $|w|$ divides $n^2$ with $n$ prime, $w$ must have length $1$ or $n$. If it has length $1$, consider $w^n$ instead (it is still in the language), so that we may assume $w$ has length $n$, $w$ is in the language and $w^n$ is not. The fact that $w^n$ is not in the language means that $w$ satisfies every clause, i.e., $w$ is a satisfying assignment.

Conversely, any satisfying assignment translates into a word $w$ witnessing the non-circularity of the language: the satisfying assignment $w$ belongs to the language, but $w^n$ does not. Hence the language is circular if and only if the 3SAT instance is unsatisfiable.
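To make the reduction concrete, the sketch below tests membership in the constructed language directly rather than building its $O(n^2)$ DFA. It assumes clauses are given as lists of DIMACS-style integer literals ($v$ or $-v$), an encoding chosen only for this example, and that the number of variables equals the number of clauses, as arranged above. The function names are hypothetical.

```python
def satisfies(block, clause):
    # block[v - 1] is the value ('0' or '1') of variable v
    return any((block[abs(lit) - 1] == "1") == (lit > 0) for lit in clause)

def in_language(word, clauses):
    """Membership in: 'word is not of the form x_1 ... x_n with x_i satisfying C_i'."""
    n = len(clauses)                     # assumes n variables and n clauses
    if len(word) != n * n:
        return True                      # only words of length n^2 can be excluded
    blocks = [word[i * n:(i + 1) * n] for i in range(n)]
    return not all(satisfies(b, c) for b, c in zip(blocks, clauses))

def witnesses_noncircularity(w, clauses):
    n = len(clauses)
    return in_language(w, clauses) and not in_language(w * n, clauses)

clauses = [[1, 2, 3], [-1, 2, 3], [1, -2, 3]]   # n = m = 3, and 3 is prime
print(witnesses_noncircularity("111", clauses))   # True: "111" satisfies every clause
print(witnesses_noncircularity("000", clauses))   # False: "000" falsifies C_1
```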

In this part we discuss a normal form for circular languages. Consider some DFA for a circular language $L$. A sequence of states $C = C_0, C_1, \ldots$ is real if $C_0 = s$ (the initial state), all other states are accepting, and $C_i = C_j$ implies $C_{i+1} = C_{j+1}$. Thus every real sequence is eventually periodic, and there are only finitely many real sequences (since the DFA has finitely many states).

We say that a word $w$ behaves according to $C$ if it takes the DFA from state $C_i$ to state $C_{i+1}$, for every $i$. The set $E(C)$ of all such words is regular (the argument is similar to the first part of this answer). Note that $E(C)$ is a subset of $L$.
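Because a real sequence is eventually periodic, the condition "for every $i$" has only finitely many distinct instances. The sketch below assumes, as a representation choice of this example, that $C$ is stored as a finite `prefix` followed by a repeating `cycle`, with the DFA encoded as `delta[q][a]`; the function names are hypothetical.

```python
def run(delta, q, word):
    for ch in word:
        q = delta[q][ch]
    return q

def state_at(prefix, cycle, t):
    """C_t for the eventually periodic sequence prefix + cycle + cycle + ..."""
    if t < len(prefix):
        return prefix[t]
    return cycle[(t - len(prefix)) % len(cycle)]

def behaves_according_to(delta, word, prefix, cycle):
    # t < len(prefix) + len(cycle) already covers every distinct pair (C_t, C_{t+1}).
    return all(
        run(delta, state_at(prefix, cycle, t), word) == state_at(prefix, cycle, t + 1)
        for t in range(len(prefix) + len(cycle))
    )

# DFA for "even number of a's"; C = 0, 0, 0, ... is real (0 is initial and accepting).
even_a = [{"a": 1, "b": 0}, {"a": 0, "b": 1}]
print(behaves_according_to(even_a, "aab", [], [0]))  # True
print(behaves_according_to(even_a, "ab", [], [0]))   # False
```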

Given a real sequence $C$ and an integer $k \geq 1$, define $C^k$ as the sequence $C^k(t) = C(kt)$. The sequence $C^k$ is also real. Since there are only finitely many different sequences $C^k$, the language $D(C)$, which is the union of all $E(C^k)$, is also regular.

We claim that $D(C)$ has the property that if $x,y \in D(C)$ then $xy \in D(C)$. Indeed, suppose that $x \in E(C^k)$ and $y \in E(C^l)$. Then $xy \in E(C^{k+l})$. Thus $D(C) = D(C)^+$, and so $D(C)$ can be written in the form $r^+$ for some regular expression $r$.

Each word $w$ in the language corresponds to some real sequence $C$; that is, there is a real sequence $C$ such that $w$ behaves according to $C$. Thus $L$ is the union of $D(C)$ over all real sequences $C$. Hence every circular language has a representation of the form $\sum r_i^+$. Conversely, every language of this form is circular (trivially).
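The real sequence attached to a word $w$ is $C_i = \delta(s, w^i)$. The sketch below computes it as a prefix followed by a cycle, under an assumed DFA encoding (`delta[q][a]`) and with hypothetical function names. Since the sequence is generated deterministically, the condition $C_i = C_j \Rightarrow C_{i+1} = C_{j+1}$ holds automatically, so `is_real` only checks the initial state and that every $C_i$ with $i \geq 1$ is accepting.

```python
def run(delta, q, word):
    for ch in word:
        q = delta[q][ch]
    return q

def word_sequence(delta, start, word):
    """Return (prefix, cycle) with C_i = delta(start, word^i) = prefix + cycle + cycle + ..."""
    index = {}          # state -> first position at which it occurs
    seq = []
    q = start
    while q not in index:
        index[q] = len(seq)
        seq.append(q)
        q = run(delta, q, word)
    i = index[q]        # the sequence re-enters seq[i] and cycles from there
    return seq[:i], seq[i:]

def is_real(prefix, cycle, start, accepting):
    seq = prefix + cycle
    later = set(seq[1:]) | set(cycle)   # every state C_i with i >= 1
    return seq[0] == start and later <= accepting

# Counting a's mod 3, accepting {0}.
mod3 = [{"a": 1}, {"a": 2}, {"a": 0}]
print(word_sequence(mod3, 0, "aaa"))     # ([], [0])
print(word_sequence(mod3, 0, "a"))       # ([], [0, 1, 2])
print(is_real([], [0], 0, {0}))          # True
```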

Consider the circular language $L$ of all words over $\{a,b\}$ that contain an even number of $a$'s or an even number of $b$'s (or both). We show that $L$ cannot be written as a disjoint sum $\sum r_i^+$; by "disjoint" we mean that $r_i^+ \cap r_j^+ = \varnothing$ for $i \neq j$.

Let $N_i$ be the size of some DFA for $r_i^+$, and let $N > \max N_i$ be some odd integer. Consider $x = a^N b^{N!}$. Since $x \in L$, we have $x \in r_i^+$ for some $i$. By the pumping lemma, we can pump a prefix of $x$ of length at most $N$. Thus $r_i^+$ generates $z = a^{N!} b^{N!}$. Similarly, $y = a^{N!} b^N$ is generated by some $r_j^+$, which also generates $z$. Note that $i \neq j$ since $xy \notin L$. Thus the representation cannot be disjoint.
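The parity facts this argument relies on can be checked mechanically for a small odd $N$. The value $N = 5$ below is an arbitrary choice for illustration; the snippet verifies only the membership claims, not the pumping step.

```python
from math import factorial

def in_L(word):
    # even number of a's, or even number of b's (or both)
    return word.count("a") % 2 == 0 or word.count("b") % 2 == 0

N = 5                      # any odd N; N! is even for N >= 2
F = factorial(N)
x = "a" * N + "b" * F
y = "a" * F + "b" * N
z = "a" * F + "b" * F
print(in_L(x), in_L(y), in_L(z))  # True True True
print(in_L(x + y))                 # False: N + N! a's and N! + N b's, both odd
```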

— Yuval Filmus

There seem to be a number of errors here. You're reducing from UNSAT, not SAT, so you're showing it's coNP-hard. What's your polynomial time witness for (non)-membership?

— Mark Reitblatt

"Since the only words not in the language have length $n^2$": shouldn't that be $nm$?

— Mark Reitblatt

I don't think it's "trivially in coNP". At least, it's not trivially obvious to me. The "obvious" certificate would be a string $l$ in the language and a power $k$ such that $l^k$ isn't in the language. But it's not immediately obvious to me why such a word must be polynomially sized. Maybe it follows from a simple fact of automata theory that I'm overlooking.

— Mark Reitblatt

An even more serious apparent flaw is that you jump from each clause being satisfiable individually to the whole formula being satisfiable. Unless I am misreading, of course.

— Mark Reitblatt

I agree that it's not clear that circularity is in coNP. On the other hand, I see no problems in the rest of the argument (now that I've put $n = m$). If each clause is satisfied by the same assignment, then the 3SAT instance is satisfied by this assignment.

— Yuval Filmus


Here are some papers that discuss these languages:

Thierry Cachat, The power of one-letter rational languages, DLT 2001, Springer LNCS #2295 (2002), 145-154.

S. Horváth, P. Leupold, and G. Lischke, Roots and powers of regular languages, DLT 2002, Springer LNCS #2450 (2003), 220-230.

H. Bordihn, Context-freeness of the power of context-free languages is undecidable, TCS 314 (2004), 445-449.

— Jeffrey Shallit


@Dave Clarke, L = a*|b* would be circular, but L* would be (a|b)*.

In terms of decidability, a language $L$ is circular if there is an $L'$ such that $L$ is the closure under + of $L'$ or if it is a finite union of circular languages.

(I'm dying to redefine "circular" replacing your $>$ with $\ge$ . It simplifies things a lot. We can then characterise the circular languages as those for which there exists a NDFA whose starting state has only epsilon-transitions to accepting states and has an epsilon-transition to each accepting state).

— Peter Taylor

You are right. I've removed my incorrect post.

— Dave Clarke

Regarding the adaptation with $\geq$: I am thinking that a minimal DFA should always have exactly one accepting state, namely the start state. Maybe more accepting states can happen, but then they need an $\varepsilon$-transition to the start state.

— Raphael


@Raphael, consider again L = a*|b*. A DFA whose start state is the only accepting state and which accepts a and b must accept (a|b)*.

— Peter Taylor

On the question of decidability, again: suppose you have a DFA with $n$ states, of which $n_a$ are accepting. Suppose it accepts a word $w$, and also accepts $w^2$, $w^3$, ..., $w^{n_a+1}$. Then it accepts $w^x$ for all $x > 0$. (The proof is a straightforward application of the pigeonhole principle.) If it's possible to show that the minimal (minimising $|w|$) counterexample ($w$, $x$) to the circularity of the language accepted by the DFA has length bounded by a function of $n$, then brute-force testing is possible. I suspect that $|w| \le n+1$, but I haven't proved it.
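The pigeonhole fact translates into a finite test for a single word: run $w$ from the start state $n_a + 1$ times and check that every state reached is accepting. A minimal sketch, assuming a DFA encoded as `delta[q][a]`; the function names are hypothetical.

```python
def run(delta, q, word):
    for ch in word:
        q = delta[q][ch]
    return q

def all_powers_accepted(delta, start, accepting, word):
    """True iff w^x is accepted for every x > 0.

    Only x = 1, ..., n_a + 1 are tested, where n_a = len(accepting): if all of
    these states are accepting, two of them coincide (pigeonhole) and the run
    cycles through accepting states forever.
    """
    q = start
    for _ in range(len(accepting) + 1):
        q = run(delta, q, word)
        if q not in accepting:
            return False
    return True

mod3 = [{"a": 1, "b": 0}, {"a": 2, "b": 1}, {"a": 0, "b": 2}]  # counts a's mod 3
print(all_powers_accepted(mod3, 0, {0, 1}, "a"))    # False: "aa" ends in state 2
print(all_powers_accepted(mod3, 0, {0, 1}, "aaa"))  # True
```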

— Peter Taylor

To follow up on @Raphael's idea above. The idea of start state = only accept state is wrong for this problem, but it does capture some interesting property. When M is a minDFA, the start state is the only accept state if and only if L(M) is the Kleene star of a prefix-free language. This is one of my favorite DFA trivia tidbits and thus I am quick to share it! ;)

— mikero


Edit: A complete (simplified) PSPACE-completeness proof appears below.

Two updates. First, the normal form described in my other answer appears already in a paper by Calbrix and Nivat titled Prefix and period languages of rational $\omega$-languages, unfortunately not available online.

Second, deciding whether a language is circular given its DFA is PSPACE-complete.

Circularity in PSPACE. Since NPSPACE=PSPACE by Savitch's theorem, it is enough to give an NPSPACE algorithm for non-circularity. Let $A = (Q,\Sigma,\delta,q_0,F)$ be a DFA with $|Q|=n$ states. The fact that the syntactic monoid of $L(A)$ has size at most $n^n$ implies that if $L(A)$ is not circular then there is a word $w$ of length at most $n^n$ such that $w \in L(A)$ but $w^k \notin L(A)$ for some $k \leq n$. The algorithm guesses $w$ and computes $\delta_w(q) = \delta(q,w)$ for all $q \in Q$, using $O(n\log n)$ space (to store $\delta_w$ and to count up to $n^n$). It then verifies that $\delta_w(q_0) \in F$ but $\delta_w^{(k)}(q_0) \notin F$ for some $k \leq n$.
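A deterministic, exponential-time counterpart of this check, as a sketch: instead of guessing $w$, it enumerates every transformation $\delta_w$ of the transition monoid by breadth-first search and applies the same test to each. It gives up the small space bound for simplicity, so it illustrates the certificate rather than the NPSPACE algorithm itself. The encoding (states `0..n-1`, `delta[q][a]`) and the function name are assumptions of this example.

```python
from collections import deque

def is_circular_monoid(n, alphabet, delta, q0, accepting):
    """Exhaustive check over the transition monoid (exponential time and space).

    Each transformation f is stored as a tuple with f[q] = delta(q, w) for some word w.
    """
    identity = tuple(range(n))          # transformation of the empty word
    seen = {identity}
    queue = deque([identity])
    while queue:
        f = queue.popleft()
        if f[q0] in accepting:          # w is in L(A) ...
            q = q0
            for _ in range(n):          # ... so check w^1, ..., w^n
                q = f[q]
                if q not in accepting:
                    return False        # w in L(A) but some w^k is not
        for a in alphabet:
            g = tuple(delta[f[p]][a] for p in range(n))  # transformation of w + a
            if g not in seen:
                seen.add(g)
                queue.append(g)
    return True

even_a = [{"a": 1, "b": 0}, {"a": 0, "b": 1}]
print(is_circular_monoid(2, "ab", even_a, 0, {0}))  # True
print(is_circular_monoid(2, "ab", even_a, 0, {1}))  # False
```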

Circularity is PSPACE-hard. Kozen showed in his classic 1977 paper Lower bounds for natural proof systems that it is PSPACE-hard to decide, given a list of DFAs, whether the intersection of the languages accepted by them is empty. We reduce this problem to circularity. Given binary DFAs $A_1,\ldots,A_n$ , we find a prime $p \in [n,2n]$ and construct a ternary DFA $A$ accepting the language

$L(A) = \overline{\{2w_12w_2\cdots2w_p : w_i \in L(A_{1+(i \bmod n)})\}}.$

(With some more effort, we can make $A$ binary as well.) It is not difficult to see (using the fact that $p$ is prime) that $L(A)$ is circular if and only if the intersection $L(A_1) \cap \cdots \cap L(A_n)$ is empty.
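A sketch of the membership test for $L(A)$, useful for checking small instances of the reduction by hand. It assumes, for this example only, that each $A_j$ is given as a membership predicate on binary strings rather than as an explicit DFA; the function and predicate names are hypothetical. If some $u$ lies in every $L(A_j)$, then $2u \in L(A)$ but $(2u)^p \notin L(A)$.

```python
def in_reduction_language(word, automata, p):
    """Membership in the complement of {2 w_1 2 w_2 ... 2 w_p : w_i in L(A_{1 + (i mod n)})}.

    `automata` is a list [A_1, ..., A_n] of membership predicates on binary strings
    (stand-ins for the DFAs A_i).
    """
    n = len(automata)
    if not word.startswith("2"):
        return True
    blocks = word.split("2")[1:]        # w_1, ..., w_k, binary and possibly empty
    if len(blocks) != p:
        return True
    # automata[i % n] is A_{1 + (i mod n)}, since the list is 0-indexed
    return not all(automata[i % n](blocks[i - 1]) for i in range(1, p + 1))

def even_length(s):
    return len(s) % 2 == 0

def ends_in_1(s):
    return s.endswith("1")

# n = 2 and p = 3, a prime in [n, 2n]; "01" lies in both languages.
A = [even_length, ends_in_1]
print(in_reduction_language("201", A, 3))      # True: only one block
print(in_reduction_language("201" * 3, A, 3))  # False: so L(A) is not circular
```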

— Yuval Filmus


Every $s \in L$ of length $p>0$ can be written as $xy^{i}z$ where $x = z = \epsilon$ , $y = w \neq \epsilon$ . It's obvious that $|xy| \leq p$ and $|y| = |w| > 0$ . It follows that the language is regular for non-empty inputs, by the pumping lemma.

For $w= \epsilon$ , the definition holds, since a NDFA that accepts the empty string will also accept any number of empty strings.

The union of the above languages is the language L and since regular languages are closed under union, it follows that every circular language is regular.

By Rice's theorem, $CIRCULARITY/TM$ is undecidable. The proof is similar to the one for regularity.

— chazisop


The pumping lemma is a necessary, but not sufficient, condition for regularity. In particular, there are nonregular languages satisfying the pumping condition. Also, Rice's theorem would say that $\{\langle M\rangle \mid L(M)\text{ is circular}\}$ is undecidable. This does not mean that $\{\langle D\rangle \mid L(D)\text{ is circular}\}$ is undecidable (where $D$ is a DFA and $M$ a TM)! For instance, emptiness testing for DFAs is decidable, while emptiness testing for TMs is not.

— alpoge


Here's a non-computable circular language. Let $D = \{ 0^x 1 : x \in R\}$, where $R$ is some non-computable language (e.g. codes of halting TMs). Then $D^*$ is circular but clearly non-computable (an oracle for $D^*$ can be used to decide $R$).

— Yuval Filmus


@Peter, have you read this answer? It was trying to prove that any circular language (without the condition of regularity) is regular.

— Yuval Filmus


@Yuval, my mistake. @chazisop, the pumping lemma is useful for proving non-regularity of languages, but not regularity. (Besides, the assertion of your first sentence reduces to "Every $s \in L$ of length $p > 0$ can be written as $y^i$ where $y \ne \epsilon$", which is clearly false.)

— Peter Taylor


Yes, I use CIRCULARITY/TM to refer to this. CIRCULARITY/DFA is probably decidable.

— chazisop