|
Non-deterministic Finite State Machines
Non-deterministic finite state machines are finite state machines where a given input from a particular state can lead to more than one different state.
For example, let’s say we want to build a finite state machine that can recognize strings of letters that:
- Start with the letter ‘a’
- and are then followed by zero or more occurrences of the letter ‘b’
- or, zero or more occurrences of the letter ‘c’
- are terminated by the next letter of the alphabet.
Valid strings would be:
- abbbbbbbbbc
- abbbc
- acccd
- acccccd
- ac (zero occurrences of b)
- ad (zero occurrences of c)
So it will recognize the letter ‘a’ followed by zero or more of the same letter of ‘b’ or ‘c’, followed by the next letter of the alphabet.
A very simple way to represent this is with a state machine that looks like the one below, where a final state of t means that the string was accepted and matches the pattern.
graphviz.svg
(2.95 KB, 下载次数: 51)
Pattern matching finite state machine
Do you see the problem? From starting point s, we don’t know which path to take. If we read the letter ‘a’, we don’t know whether to go to the state q or r.
There are a few ways to solve this problem. One is by backtracking. You simply take all the possible paths, and ignore or back out of the ones where you get stuck.
This is basically how most chess playing computers work. They look at all the possibilities — and all the possibilities of those possibilities — and choose the path that gives them the greatest number of advantages over their opponent.
The other option is to convert the non-deterministic machine into a deterministic machine.
One of the interesting attributes of a non-deterministic machine is that there exists an algorithm to turn any non-deterministic machine into a deterministic one. However, it is often much more complicated.
Fortunately for us, the example above is only slightly more complicated. In fact, this one is simple enough that we can transform it into a deterministic machine in our head, without the aid of a formal algorithm.
The machine below is a deterministic version of the non-deterministic machine above. In the machine below, a final state of t or v is reached by any string that is accepted by the machine.
graphviz-1.svg
(3.43 KB, 下载次数: 44)
A deterministic finite state machine
The non-deterministic model has four states and six transitions. The deterministic model has six states, ten transitions and two possible final states.
That isn’t that much more, but complexity usually grows exponentially. A moderately sized non-deterministic machine can produce an absolutely huge deterministic machine.
Regular Expressions
If you have done any type of programming, you’ve probably encountered regular expressions. Regular expressions and finite state machines are functionally equivalent. Anything you can accept or match with a regular expression, can be accepted or matched with a state machine.
For example, the pattern described above could be matched with the regular expression: a(b*c|c*d)
Regular expressions and finite state machines also have the same limitations. In particular, they both can only match or accept patterns that can be handled with finite memory.
So what type of patterns can’t they match? Let’s say you want to only match strings of ‘a’ and ‘b’, where there are a number of ‘a’s followed by an equal number of ‘b’s. Or n ‘a’s followed by n ‘b’s, where n is some number.
Examples would be:
- ab
- aabb
- aaaaaabbbbbb
- aaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbb
At first, this looks like an easy job for a finite state machine. The problem is that you’ll quickly run out of states, or you’ll have to assume an infinite number of states — at which point it is no longer a finite state machine.
Let’s say you create a finite state machine that can accept up to 20 ‘a’s followed by 20 ‘b’s. That works fine, until you get a string of 21 ‘a’s followed by 21 ‘b’s — at which point you will need to rewrite your machine to handle a longer string.
For any string you can recognize, there is one just a little bit longer that your machine can’t recognize because it runs out of memory.
This is known as the Pumping Lemma which basically says: “if your pattern has a section that can be repeated (like the one) above, then the pattern is not regular”.
In other words, neither a regular expression nor a finite state machine can be constructed that will recognize all the strings that do match the pattern.
If you look carefully, you’ll notice that this type of pattern where every ‘a’ has a matching ‘b’, looks very similar to HTML. Within any pair of tags, you may have any number of other matching pairs of tags.
So, while you may be able to use a regular expression or finite state machine to recognize if a page of HTML has the <html>, <head>; and <body> elements in the correct order, you can’t use a regular expression to tell if an entire HTML page is valid or not — because HTML is not a regular pattern.
Turing Machines
So how do you recognize non-regular patterns?
There is a theoretical device that is similar to a state machine, called a Turing Machine. It is similar to a finite state machine in that it has a paper strip which it reads. But, a Turing Machine can erase and write on the paper tape.
Explaining a Turing Machine will take more space that we have here, but there are a few important points relevant to our discussion of finite state machines and regular expressions.
Turing Machines are computationally complete — meaning anything that can be computed, can be computed on a Turing Machine.
Since a Turing Machine can write as well as read from the paper tape, it is not limited to a finite number of states. The paper tape can be assumed to be infinite in length. Of course, actual computers don’t have an infinite amount of memory. But, they usually do contain enough memory so you don’t hit the limit for the type of problems they process.
Turing Machines give us an imaginary mechanical device that lets us visualize and understand how the computational process works. It is particularly useful in understanding the limits of computation. If there is interest I’ll do another article on Turing Machines in the future. |
|