Finite State Machines

hbghlyj · 2022-8-19 13:30

Last edited by hbghlyj 2022-8-31 10:05logicbox - a programming puzzle game - Re-invented in HTML 5!
字符串操作的游戏
给定输入(到后面是多个), 排列方块, 使其可以达到给定的输出.
类似于Manufactoria

kuing · 2022-8-19 15:45

这啥东西啊

看起来是个游戏但又不知怎么玩完全看不懂啥意思。

发上来这里做什么？

hbghlyj · 2022-8-19 20:29

Finite-state machine
A finite-state machine (FSM) or finite-state automaton (FSA, plural: automata), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number of states at any given time. The FSM can change from one state to another in response to some inputs; the change from one state to another is called a transition. An FSM is defined by a list of its states, its initial state, and the inputs that trigger each transition.
FreeCodeCamp - State Machines - Basics of Computer Science
A finite state machine is a mathematical abstraction used to design algorithms.
In simpler terms, a state machine will read a series of inputs. When it reads an input, it will switch to a different state. Each state specifies which state to switch to, for a given input. This sounds complicated but it is really quite simple.
Imagine a device that reads a long piece of paper. For every inch of paper there is a single letter printed on it–either the letter ‘a’ or the letter ‘b’.
As the state machine reads each letter, it changes state. Here is a very simple state machine: $type

graph.svg (14.37 KB, Downloads: 131) The circles are “states” that the machine can be in. The arrows are the transitions. So, if you are in state s and read an ‘a’, you’ll transition to state q. If you read a ‘b’, you’ll stay in state s.
So if we start on s and read the paper tape above from left to right, we will read the ‘a’ and move to state q.
Then, we’ll read ‘b’ and move back to state s.
Another ‘b’ will keep us on s, followed by an ‘a’ — which moves us back to the q state. Simple enough, but what’s the point?
Well, it turns out that you can run a tape through the state machine and tell something about the sequence of letters, by examining the state you end up on.
In our simple state machine above, if we end in state s, the tape must end with a letter ‘b’. If we end in state q, the tape ends with the letter ‘a’.
...
The output of a state machine is its final state. It goes through all its processing, and then the final state is read, and then an action is taken. A state machine doesn’t do anything as it moves from state to state.
It processes, and when it gets to the end, the state is read and something external triggers the desired action (for example, dispensing a soda can). This is an important concept when it comes to non-deterministic finite state machines.

hbghlyj · 2022-8-19 21:47

Non-deterministic Finite State Machines
Non-deterministic finite state machines are finite state machines where a given input from a particular state can lead to more than one different state.
For example, let’s say we want to build a finite state machine that can recognize strings of letters that:

Start with the letter ‘a’
and are then followed by zero or more occurrences of the letter ‘b’
or, zero or more occurrences of the letter ‘c’
are terminated by the next letter of the alphabet.

Valid strings would be:

abbbbbbbbbc
abbbc
acccd
acccccd
ac (zero occurrences of b)
ad (zero occurrences of c)

So it will recognize the letter ‘a’ followed by zero or more of the same letter of ‘b’ or ‘c’, followed by the next letter of the alphabet.
A very simple way to represent this is with a state machine that looks like the one below, where a final state of t means that the string was accepted and matches the pattern. $type

graphviz.svg (2.95 KB, Downloads: 108) Pattern matching finite state machine Do you see the problem? From starting point s, we don’t know which path to take. If we read the letter ‘a’, we don’t know whether to go to the state q or r.
There are a few ways to solve this problem. One is by backtracking. You simply take all the possible paths, and ignore or back out of the ones where you get stuck.
This is basically how most chess playing computers work. They look at all the possibilities — and all the possibilities of those possibilities — and choose the path that gives them the greatest number of advantages over their opponent.
The other option is to convert the non-deterministic machine into a deterministic machine.
One of the interesting attributes of a non-deterministic machine is that there exists an algorithm to turn any non-deterministic machine into a deterministic one. However, it is often much more complicated.
Fortunately for us, the example above is only slightly more complicated. In fact, this one is simple enough that we can transform it into a deterministic machine in our head, without the aid of a formal algorithm.
The machine below is a deterministic version of the non-deterministic machine above. In the machine below, a final state of t or v is reached by any string that is accepted by the machine. $type

graphviz-1.svg (3.43 KB, Downloads: 97) A deterministic finite state machine The non-deterministic model has four states and six transitions. The deterministic model has six states, ten transitions and two possible final states.
That isn’t that much more, but complexity usually grows exponentially. A moderately sized non-deterministic machine can produce an absolutely huge deterministic machine.

Regular Expressions

If you have done any type of programming, you’ve probably encountered regular expressions. Regular expressions and finite state machines are functionally equivalent. Anything you can accept or match with a regular expression, can be accepted or matched with a state machine.
For example, the pattern described above could be matched with the regular expression: a(b*c|c*d)
Regular expressions and finite state machines also have the same limitations. In particular, they both can only match or accept patterns that can be handled with finite memory.
So what type of patterns can’t they match? Let’s say you want to only match strings of ‘a’ and ‘b’, where there are a number of ‘a’s followed by an equal number of ‘b’s. Or n ‘a’s followed by n ‘b’s, where n is some number.
Examples would be:

ab
aabb
aaaaaabbbbbb
aaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbb

At first, this looks like an easy job for a finite state machine. The problem is that you’ll quickly run out of states, or you’ll have to assume an infinite number of states — at which point it is no longer a finite state machine.
Let’s say you create a finite state machine that can accept up to 20 ‘a’s followed by 20 ‘b’s. That works fine, until you get a string of 21 ‘a’s followed by 21 ‘b’s — at which point you will need to rewrite your machine to handle a longer string.
For any string you can recognize, there is one just a little bit longer that your machine can’t recognize because it runs out of memory.
This is known as the Pumping Lemma which basically says: “if your pattern has a section that can be repeated (like the one) above, then the pattern is not regular”.
In other words, neither a regular expression nor a finite state machine can be constructed that will recognize all the strings that do match the pattern.
If you look carefully, you’ll notice that this type of pattern where every ‘a’ has a matching ‘b’, looks very similar to HTML. Within any pair of tags, you may have any number of other matching pairs of tags.
So, while you may be able to use a regular expression or finite state machine to recognize if a page of HTML has the <html>, <head>; and <body> elements in the correct order, you can’t use a regular expression to tell if an entire HTML page is valid or not — because HTML is not a regular pattern.

Turing Machines

So how do you recognize non-regular patterns?
There is a theoretical device that is similar to a state machine, called a Turing Machine. It is similar to a finite state machine in that it has a paper strip which it reads. But, a Turing Machine can erase and write on the paper tape.
Explaining a Turing Machine will take more space that we have here, but there are a few important points relevant to our discussion of finite state machines and regular expressions.
Turing Machines are computationally complete — meaning anything that can be computed, can be computed on a Turing Machine.
Since a Turing Machine can write as well as read from the paper tape, it is not limited to a finite number of states. The paper tape can be assumed to be infinite in length. Of course, actual computers don’t have an infinite amount of memory. But, they usually do contain enough memory so you don’t hit the limit for the type of problems they process.
Turing Machines give us an imaginary mechanical device that lets us visualize and understand how the computational process works. It is particularly useful in understanding the limits of computation. If there is interest I’ll do another article on Turing Machines in the future.

hbghlyj · 2022-8-19 23:16

Pumping lemma for regular languages
In simple words, for any regular language $L$, any sufficiently long string $w$ (in $L$) can be split into 3 parts. i.e. $w = xyz$ , such that all the strings $xy^nz$ for $n \geq 0$ are also in $L$.

For example, the following image shows an FSA. $type

An_automat_accepting_the_language_a(bc) d.svg (5.54 KB, Downloads: 77) The FSA accepts the string: abcd. Since this string has a length at least as large as the number of states, which is four, the pigeonhole principle indicates that there must be at least one repeated state among the start state and the next four visited states. In this example, only $q_1$ is a repeated state. Since the substring bc takes the machine through transitions that start at state $q_1$ and end at state $q_1$, that portion could be repeated and the FSA would still accept, giving the string abcbcd. Alternatively, the bc portion could be removed and the FSA would still accept giving the string ad. In terms of the pumping lemma, the string abcd is broken into an $x$ portion a, a $y$ portion bc and a $z$ portion d.

hbghlyj · 2022-8-19 23:34

zh.wikipedia.org/wiki/迈希尔－尼罗德定理
在形式语言理论中，Myhill–Nerode 定理提供了一个语言是正则语言的必要和充分条件。它近乎专门的被用来证明一个给定语言不是正则的。
定理陈述
给定一个字母集 (alphabet) $\Sigma$ 以及其生成的语言 (language) $L\subseteq\Sigma^*$，我們先來定義兩個字串 $x, y\in\Sigma^*$ 之間彼此的關係：如果對於所有的後接字串 $z\in \Sigma^*$ (注意！$z$ 可以是空字串) 都會讓 $xz$ 和 $yz$ 同時屬於或不屬於 $L$，那麼我們說 $x$ 和 $y$ 不可區分 (indistinguishable)；相反地如果存在某個後接字串 $z\in \Sigma^*$ 讓 $xz$ 和 $yz$ 其中一個屬於而另外一個不屬於 $L$，則我們說 $x$ 和 $y$ 是可區分的 (distinguishable)。如果 $x$ 和 $y$ 不可區分，我們說它們滿足關係 (relation) $R_L$，數學式子寫成 $x\ R_L\ y$。我們也很容易证明這種關係是種等价关系 (equivalence relation)，因此它可以把 $\Sigma^*$ 划分成一个或多个等价类 (equivalence class)。

Myhill–Nerode 定理声称一套語言 $L$ 是正規的「若且唯若」$\Sigma^*$ 被 $R_L$ 所切分出的等价类数目是有限的，此外這個等價類數量又恰好是可辨識 $L$ 的最小 DFA 的狀態數量。在詳細的證明中「一個等價類會恰好對應到最小 DFA 裡的一個狀態」，如果先給定一个自动机，则任意兩個能驱使它走到同一個状态的字串 $x$ 和 $y$ 必在同一个等价类中；如果先給定一個等价类划分，那可以轻易地构造出一个自动机使其状态恰好對應到等价类。
定理證明

方向一：如果 $\Sigma^*$ 被 $R_L$ 所切分出的等价类数目是無限多的，那麼語言 $L$ 無法正規
這個方向比較簡單，我們只要證明一個引理 (lemma)：可辨識 $L$ 的 DFA 狀態數量必須不小於等價類數目即可，而這可以使用反證法 (鴿籠原理) 說明。如果 DFA 狀態數量少於等價類數量，那代表必定有兩個不屬於同一個等價類的字串 $x$ 和 $y$ 會引導 DFA 到同一個狀態，如果它們又繼續有後接字串 $z$，就會讓 $xz$ 和 $yz$ 也走到同一個狀態，如此一來根據定義，$x$ 和 $y$ 應該要屬於同一個等價類，造成矛盾，因此 DFA 狀態數量必須不小於等價類數目。由這個結論可以推得，當等价类数目無限多時，這個 DFA 的狀態數量也必須是無限多，和「有限」狀態機的定義矛盾，換句話說我們找不到一台 DFA 可辨識 $L$，因此 $L$ 不正規。
方向二：如果 $\Sigma^*$ 被 $R_L$ 所切分出的等价类数目是有限的，那麼必存在一台同等數量狀態的 DFA 可辨識 $L$ 使其正規
我們先構造出一台 DFA，先假設它的每一個狀態都剛好代表一個等價類 (也就是一個狀態承裝一個等價類裡的所有字串，從起點開始輸入該字串務必到達該狀態)，而轉移函數的決定方式就是從一個狀態隨意抓取一個代表字串，看看它吃了一個字母之後的字串會落在哪個狀態裡面，就讓它走去那裏。這個走向會唯一，理由很簡單可以自己想。對於每個狀態和每個轉移字母都窮舉一遍，最後再觀察哪些狀態包含了被接受的字串，直接讓它們變成終點態 (final state)。一個等價類裡如果有一個字串被接受，那同一個類裡的所有其他字串也會通通被接受，因為根據定義，後接字串可以是空字串。到目前為止，我們有了一定數量的狀態 (圓圈圈)、所有的轉移函數、一個唯一的起點 (看空字串在哪裡即可)、所有的終點，是一台合法的 DFA。現在的問題是，要怎麼證明這台 DFA 能確實辨識 $L$，也就是說對於任意字串輸入都能驅使 DFA 走到我們當初賦予每個狀態必須各自代表一個等價類的任務？

這必須對字串輸入的長度作數學歸納法才能證明。如果字串輸入長度為 0，也就是空字串，那它必須留在起點，而我們起點的取法就剛好是看空字串的位置，所以輸入為空字串時的狀態落點正確；如果對於所有長度不超過 $\ell$ 的字串輸入，它們的狀態落點皆正確，那麼對於任何長度為 $\ell + 1$ 的字串輸入 $w$，皆可分解為 $w = ua$，其中 $u$ 的長度為 $\ell$ 而 $a$ 的長度為 $1$ (字元)，走完 $u$ 之後的落點根據假設是正確的，剩下的一個字元 $a$ 根據我們上述決定轉移函數的方式，自然也會繼續走向我們當初所期望的落點；如此依序遞迴下去，就能說明對於任意字串輸入，這台 DFA 確實能妥善的把 $\Sigma^*$ 裡的每個字串分到正確的類別，並決定接受與否。於是乎我們根據這個 $R_L$ 創造了一台能辨識 $L$ 的 DFA，而且這台 DFA 的狀態數量剛好就是 $\Sigma^*$ 被 $R_L$ 所切分出的等价类數量。

用途和结论
Myhill–Nerode 定理的一个结论是语言 $L$ 是正则的(就是说可被有限状态机接受，当且仅当 $R_L$ 的等价类的数目是有限的。

直接推论是如果一个语言定义等价类的无限集合，它就不是正则的。这个推论经常被用来证明一个语言不是正则的。

例如，由可以被 3 整除的二进制数组成的语言是正则的。有三个等价类 3 - 被 3 除的时候余数是 0, 1 和 2 的数。接受这个语言的极小自动机有对应于等价类的三个状态。

再給一例，我們想說明 $A = \{0^n1^n:n\ge0\}$ 不是正規的。原因是，給定任意兩個不同長度的 $0$ 字串，我們都能接上一個後接字串將這兩個字串分辨開來，舉例來說，給定 $000$ 和 $00000$，我們就能接上 $111$ 讓 $000111$ 合法而 $00000111$ 不合法。所以有無限多個字串 $0$、$00$、$000$、$0000$、... 等彼此之間都是不可分辨的，這意味著需要無限多個狀態的自動機才能辨識這個語言，和正規語言定義矛盾，因此 $A$ 不是正規語言。

Account		Remember me	Forgot password
Password			Register account

Finite State Machines

Related threads

Regular Expressions

Turing Machines

Quick Reply