Alok Menghrajani
Previously: security engineer at Square, co-author of HackLang, put the 's' in https at Facebook. Maker of CTFs.
This blog does not use any tracking cookies and does not serve any ads. Enjoy your anonymity; I have no idea who you are, where you came from, and where you are headed to. Let's dream of an Internet from times past.
Home | Contact me | Github | RSS feed | Consulting services | Tools & games
Can you write a regular expression which matches multiples of 3 only? I.e. you want to match "3", "03", "24", "1122", but not "5", "10" or "20".
Keep reading to learn how I did it.
Scroll down to see solution ↓
The hard way to find the solution
You might think that you can simply write the solution as:
/^(0|10*2|10*10*1|20*1)+$/
(where 0 is actually [0369], 1 is [147] and 2 is [258]).
You will however end up missing cases like: 1122, 2211 or 222. You can keep adding cases, and you'll eventually
get to the solution, but it's going to be hard...
Using finite state machines
An easier way to solve the problem requires understanding that regular expressions can be written as finite state machines, and vice versa.
Let's start by building the state machine which matches multiples of 3. We know that a number is a multiple of 3, if and only if the sums of the digits are a multiple of 3 (divisibility by three proof).
The state machine is going to process the input and keep track of the sum of the digits. We only need the sum modulo 3, so we'll have a pretty simple state machine with 3 states: state A (starting state), state B (we are off by 1), state C (we are off by 2).
When we are in state A, if we get a "0", "3", "6" or "9", we remain on state A. If we get a "1", "4" or "7" we move to state B. If we get a "2", "5" or "8" we move to state C:
Similar rules apply when we are in state B:
And finally, here is the full state machine:
Converting the state machine
To convert the state machine into a regular expression, we first need to write down some equations.
For each state, we write down how to get there:
A = ∅ | A[0369] | B[258] | C[147] B = A[147] | B[0369] | C[258] C = A[258] | B[147] | C[0369]
∅ represents the initial state. Having to write [0369], [147] and [258] is cumbersome, so we'll only write 0, 1 and 2 respectively:
A = ∅ | A0 | B2 | C1 B = A1 | B0 | C2 C = A2 | B1 | C0
The goal now is to substitute B and C, in order to
get a forumula for A. Recursive rules like
X = Xw | Y | Z
become X = Yw* | Zw*
,
which can also be written as X = (Y | Z) w*
:
A = ( ∅ | B2 | C1 ) 0* B = ( A1 | C2 ) 0* C = ( A2 | B1 ) 0*
C is easy to substitute:
A = ( ∅ | B2 | ( A2 | B1 ) 0* 1 ) 0* B = (A1 | ( A2 | B1 ) 0* 2 ) 0*
We repeat this process:
A = 0* | B2 0* | ( A2 | B1 ) 0* 1 0* B = A1 0* | ( A2 | B1 ) 0* 2 0*
A = 0* | B2 0* | A2 0* 1 0* | B1 0* 1 0* B = A1 0* | A2 0* 2 0* | B1 0* 2 0*
A = ( 0* | B2 0* | B1 0* 1 0* ) ( 2 0* 1 0* )* B = ( A1 0* | A2 0* 2 0* ) ( 1 0* 2 0* )*
We can now substitute B:
A = ( 0* | ( A1 0* | A2 0* 2 0* ) (1 0* 2 0* )* 2 0* | ( A1 0* | A2 0* 2 0* ) (1 0* 2 0* )* 1 0* 1 0* ) (2 0* 1 0* )*
A = 0* ( 2 0* 1 0* )* | A1 0* ( 1 0* 2 0* )* 2 0* ( 2 0* 1 0* )* | A2 0* 2 0* ( 1 0* 2 0* )* 2 0* ( 2 0* 1 0* )* | A1 0* ( 1 0* 2 0* )* 1 0* 1 0* ( 2 0* 1 0* )* | A2 0* 2 0* ( 1 0* 2 0* )* 1 0* 1 0* ( 2 0* 1 0* )*
At this point, we have something like X = a | Xb | Xc
,
which can be expressed as X = a (b | c)*
:
A = 0* ( 2 0* 1 0* )* ( 1 0* ( 1 0* 2 0* )* 2 0* ( 2 0* 1 0* )* | 2 0* 2 0* ( 1 0* 2 0* )* 2 0* ( 2 0* 1 0* )* | 1 0* ( 1 0* 2 0* )* 1 0* 1 0* ( 2 0* 1 0* )* | 2 0* 2 0* ( 1 0* 2 0* )* 1 0* 1 0* ( 2 0* 1 0* )* )*
We found a solution, this rule can be converted to a regular expression! We can however simplify it a little.
a* ( b a* )*
becomes ( a | b )*
:
A = 0* ( 2 0* 1 0* | 1 0* ( 1 0* 2 0* )* 2 0* | 2 0* 2 0* ( 1 0* 2 0* )* 2 0* | 1 0* ( 1 0* 2 0* )* 1 0* 1 0* | 2 0* 2 0* ( 1 0* 2 0* )* 1 0* 1 0* )*
Using the same rule again:
A = ( 0* | 2 0* 1 | 1 ( 0 | 1 0* 2 )* 2 | 2 0* 2 ( 0 | 1 0* 2 )* 2 | 1 ( 0 | 1 0* 2 )* 1 0* 1 | 2 0* 2 ( 0 | 1 0* 2 )* 1 0* 1 )*
Solution
The above rule can be converted to a regular expression, by anchoring the regexp with ^ and $:
/^(0|20*1|1(0|10*2)*2|20*2(0|10*2)*2|1(0|10*2)*10*1|20*2(0|10*2)*10*1)*$/
We also need to replace back the 0 with [0369], 1 with [147] and 2 with [258]:
/^([0369]|[258][0369]*[147]|[147]([0369]|[147][0369]*[258])*[258]|[258][0369]*[258]([0369]|[147][0369]*[258])*[258]|[147]([0369]|[147][0369]*[258])*[147][0369]*[147]|[258][0369]*[258]([0369]|[147][0369]*[258])*[147][0369]*[147])*$/
Don't believe me? Try it out:
Some links
- regex.alf.nu: test your regexp skills
- A regular expression crossword (pdf) (source: MIT Mystery Hunt 2013)
- https://www.quaxio.com/regexp_lint/: A tool I wrote to visulize regexps
- greenery: FSM/regex conversion Ruby library
Credits
- Erling for challenging me with this stuff :)
- Joel for finding a shorter solution:
/^(0|20*1|(1|20*2)(0|10*2)*(2|10*1))*$/
- Martin Camitz for pointing out a mistake