Archive

Monthly Archives: April 2013

No subject is more core to Computer Science than Automata Theory and yet in my humble opinion is the one that many students neglect and despise. If the subject is not properly introduced the majority will find it too abstract, dry and pointless. This combined with the academic requirement of reproducing the formal proofs drains all the sweetness from its study.  And the fact is, yes the subject is abstractness at its glory and yet if approached the right way students can easily give form and structure that they can identify and relate. The same goes with the proofs, and yes there is a lot to deal, the formal language used in the canonical texts usually become abstruse if handled without help. Thus towards the end of the semester students try to vacate as much space in their heads as possible to stuff in the writings of the pioneers in the field, just to flush out the trash on the day of reckoning (end-semester exams :D).

If the above resonates with you in one manner or the other, I guess you are either doing CS or have done CS. With these assumptions let me get straight to the topic, which is an attempt to convey my understanding of ‘Pumping Lemma’ for regular languages. If you think you haven’t really understood the idea behind this method of disproving regularity of a formal language, be happy to give it another try. If otherwise, I hope not to disappoint you and urge you to have a look.

Let me get you into the mood my brushing up the basics.Regular languages are the ones accepted by a Finite Automaton, and are the simplest of the formal languages. Finite automaton are the machines that do not have access to a working memory and are informally defined as a set of states and rules that define transition from one state to another on a particular input. These machines do all their work by jumping from one state to another. A finite automaton constructed for a regular language only accepts strings in that particular language. That is a finite automaton of a regular language decides whether an arbitrary string belongs to the given language. A language is not regular if we cannot design an Finite automaton for it. But trying to prove a language is not regular by failing to come up with an equivalent Finite automaton is futile. Instead methods like the Pumping Lemma are used to bring out characteristics of a language that are contrary to regular languages. I hope this small intro has brought back those forgotten ideas or have reinforced what you have learned. I wish I do write about these in detail in the future.

Before we start to prove certain languages are not regular we need to understand what the pumping lemma is. A lemma by definition is an idea which in itself is not very useful but in conjunction with other ideas can be put to use. The same goes with pumping lemma, which in itself forms the base axiom and is derived from the Pigeon Hole principle. You must have heard about the pigeon hole principle, and probably must have readily discarded it as common sense. And yes it is common sense. If there are m pigeons and n pigeon holes (nests) ,if m>n and all pigeon gets a pigeon hole then it implies that one or more pigeon holes contain more than one pigeon. The same idea when applied to finite automaton forms the pumping lemma. Every state in a FA (finite automaton) consumes an input symbol and switches to another state. A state consumes or takes in more than one input if the state has a loop. On consuming an input, a state with loop transitions to itself and thus allowing itself to co nsume more input symbols. So without any looping states, if the length of the input string is m the number of states in the FA is n=m+1 . (the first state doesn’t consume any input , and is the extra 1) . Now what shall we conclude if the length of the accepted string is greater than the number of states by more than one. n>m+1 .  That is, the number of symbols in the input string is much greater the number of states. Well we can readily conclude that there is one or more states that consume more than one input, or one or more states have loops. The input symbols consumed by loops become a substring. The substring that is consumed by the looping states can repeat again and again and still be consumed by those looping states.

Those familiar with pumping lemma might say this this is just pigeon hole principle and the lemma has more things to it. But I think this is pretty much soul of the lemma and the additions are just for specificity and for things to work. So for being precise we say there is a string thats is accepted by a FA (thus belongs to a regular language) and is longer than the number of states . This string as its longer than the number of states has a substring that is accepted by a loop. Because these loops don’t restrict the number of such substrings the substrings can repeat. When theses substrings repeat they produce strings longer than the original. These new and longer strings are also accepted by the FA. Therefore these new strings should also belong to the regular language the FA represented. The repetition of the substring is the ‘pumping’ in the pumping lemma. Thus we can say if a language is regular its strings can be pumped. Therefore if we disprove the pump-ability of a language then the language is not regular. The lemma is not yet fully specific there are certain restrictions which shall follow.

Now in order to start proving that certain languages are non regular we need to enter a challenge mode. We need someone, an adversary who claims that a particular language is regular. The adversary surely must have built the FA for it (which is a lie) but doesn’t want to show it to us, lets guess he/she doesn’t believe in the principles of free software. Your adversary may not want to show the FA, but lets ask him/her the number of states in the FA. And lets say the reply is n. Now lets take a string, z, that belongs to the regular language under scrutiny and in length is greater than or equal to n. Surely there must be some looping states in the FA ( |z|\geq n doesn’t necessarily mean looping states, but the following conditions will make it right). Now we ask the adversary , since he/she knows the innards of his FA, to partition the string z such that he/she marks out the pump-able substring as z=u(v)^{i}w . Where v is the looping substring, and the super-script i means it can be pumped. With the condition that |uv|\leq n and |v|\geq 1 .

Now lets assume we have an adversary who claims that a language given as 0^{m}1^{m} | m>0 is regular. The example strings that belong to this language are 01,0011,0000011111  . The exchanges between you and the adversary is as follows.

  •  You ask the adversary how many states the FA has.
  •  Adversary replies with say 4 ( n=4 )
  •  You choose a string , z , that belongs to the language who’s length is a function of n.  The string you choose is 0^{4}1^{4} \Rightarrow 00001111 ( |z|\geq n )
  •  The adversary verifies that z indeed belongs to the language.
  •  You now ask the adversary to partition the string , marking out the repeating substring as z=u(v)^{i}w , because the length of the string you chose was greater than n. The condition being that |uv|\leq n and |v|\geq 1
  •  The adversary now partitions the string as he/she sees fit. (the adversary is clearly lying or doesn’t really know how to construct a FA). The partitioning the adversary proposes is {\xrightarrow[00]{u}}{\xrightarrow[00]{v}}{\xrightarrow[1111]{w}} .
  •  You verify if the conditions were met. Now you chooses a i=2 and pump in the substring v   twice. You get 00\overbrace{0000}1111 \Rightarrow 0000001111 , which clearly doesn’t belong to the language 0^{m}1^{m} . If the language was regular according to the pumping lemma the new substring must also belong to the language. Here that doesn’t happen.
  •  You declare the language has been proven non-regular and the adversary goes back home defeated.

The challenge method is the correct way to prove a language is non regular using the pumping lemma. But that really means you have to play double role, and act out the adversary by yourself.Now let me try to windup by answering the obvious questions that you should have asked. Why you chose z based on n, why the conditions are important and why you chose i=2 and pumped it only twice.

The conditions and the choice of z are to make sure the lemma works, otherwise the adversary can partition the string in ways that could be pumped. Lets look at these one by one.First lets consider the most obvious one, |v|\geq 1. Well if v was empty, |v|=0 , then v could be pumped any number of times and the string would be the same and surely be accepted again by the FA. Thus v should not be empty. To justify the other conditions we need another language. Well the other conditions aren’t really required for disproving that 0^{m}1^{m} is non regular. But for example another language given as ww | w\in\left \{ 0|1 \right \}^{*} can be proved to be regular if we don’t enforce those conditions.Possible strings in this language are 00, 0101, 1111,100001,.. etc. Assume we don’t enforce |uv|\leq n and the input we gave was z=001001 . The adversary can now partition the string as \xrightarrow[]{u}\xrightarrow[001001]{v}\xrightarrow[]{w} , with u=w=\epsilon (empty string), v=z . Now this could be pumped and still be in the language. Now on choosing the right string z. Assume we are dealing with same language and we chose z=01010101 . Now the adversary can partition the string as z=\xrightarrow[01]{u}\xrightarrow[0101]{v}\xrightarrow[01]{w} , where v=0101 , and pump v any number of times and still produce strings in the same language. So choosing the string in the first place in very important. The best way to choose z would be to incorporate n , the number of states, into the string. For example 0^{n}10^{n}1 is good enough for disproving this one. Now in disproving 0^{m}1^{m} , why did you choose  i=2 , well you can choose any value for i . If one value doesn’t help, choose another.

Before stopping let me also make it certain that the pumping lemma can only be used to prove a language is not regular. It can not be used to prove a language is regular. If a string cannot be pumped then language is non regular but if it can be pumped then its not necessary that language is regular. So thats it. What do you think? Did this help you understand the lemma, or did I just confuse you more, or did I give some false ideas, or do you think this was just overly explained and unwontedly stretched? Go ahead and comment about it.

 
 
PS.
-The examples were taken from MIT OCW Automata, Computability, and Complexity lecture notes, there’s more examples in there.  (kinda feel i should have just provided this link here instead of this write-up )
-This was the first time i used Latex, should have been easy but wordpress made it painful..really painful.
-and the first time i wrote a full prose in Vim.. VimRocks!!