A regular expression is a sequence of characters which forms a pattern, used to match character combinations in a string. Regular expressions are very useful when you are dealing with textual data. These can be used to extract useful information from a string, and can also be used for find and replace use cases. I have used regular expressions extensively during my thesis project in Indian Institute of Technology, Delhi.
Some of the use cases where I have used Regular expressions are- extract the required information like name of the person and his date of birth from the data scraped from the internet, check if the date is in the required format, filter out the unuseful data from the text, extract the relationships among the entities based on the pattern etc. For all these use cases, I have used python programming language. A few weeks back, I was asked to write a REPL interpreter in Scala to evaluate a given mathematical expression. This blog post covers how to create REPL interpreter using Regex in Scala?
The requirements of the assignment as follows-
(a + b ) x (a + c) -> a x ( b + c)
0 + e -> e
e + 0 -> e
1 x e -> e
e x 1 -> e
e x 0 -> 0
0 x e -> 0
There is a built in util class for regular expressions in scala- Regex. The class is defined in the package scala.util.matching. An instance of the class Regex represents a compiled regular expression pattern. The compilation of regular expressions is an expensive operation, therefore, it is recommended to define frequently used regex in the program once, outside any loop.
Creating a Regex object is an easy task. A Regex object can be created easily from the string using the implicit method r. For solving the above requirements, I have created multiple Lists of Regexes. Let us consider the requirements of simplifying the expressions using multiple mathematical identities specified in requirement (3).
Given an expression, the problem is to identify if the expression matches the left hand side of any of the mathematical identities. I solved the problem using Regex. I defined a regular expression for the left hand side of each of the mathematical identities. Regular expression for LHS of distributive property-
raw"\( (\d+) \* (\d+) \) \+ \( (\d+) \* (\d+) \)".r("f11", "f12", "s11", "s12")
raw"^(0 \+ (\d+))".r
raw"^((\d+) \+ 0 )".r
raw"^((\d+) \* 1 )".r
raw"^(1 \* (\d+))".r
raw"( 0 \+ (\d+))".r
raw"( (\d+) \+ 0 )".r
raw"( (\d+) \* 1 )".r
raw"( 1 \* (\d+))".r
raw"(0 \* \d+)|(\d+ \* 0)".r
Decoding symbols used in above regex-
After defining regular expressions for all the mathematical identities, the next step is to simplify the expression by replacing the expression with the RHS of the identities wherever applicable. The identities are divided into 3 categories- distributive identity, identities returning number itself and identities returning zero. I used the following methods for simplifying the expressions-
replaceAllIn
replaceAllIn(target: CharSequence, replacer: (Match) ⇒ String): String
group
method of Match class
subgroups
method of Match class
var expression = <input>
listOfRegex.foreach(regex =>
expression = regex.replaceAllIn(expression, m => m.subgroups(1))
)
I have used similar logic to further simplify the expression using distributive and zero identities.
The above logic of search and match continues until you cannot further simplify the expression.
I have used a map to store the value of variables. For the expressions which are assigning values to any variable, the idea is to solve the RHS and then update the value of the variable in the map. For the expressions, in which a variable is part of the RHS of the expression, I iterated through the map and replaced the variable names with the value stored in the map. I have used a similar logic to find and replace the variables with the corresponding values as for simplifying the expression using regular expressions.
You can find the complete code here on github.
That's it for today. Thanks for reading, I hope this article is helpful!