We write our own parser of mathematical expressions and a command-line calculator – InformTFB

We write our own parser of mathematical expressions and a command-line calculator

We write our own parser of mathematical expressions and a command-line calculator

I was thinking about this task while doing engineering work for my master’s thesis. It turned out that I spend a lot of time searching for the same physical units of measurement to check the transformations and the correctness of my calculations.

The annoyance of monotonous work helped me soon realize that units and their transformations never change. Therefore, I don’t need to search for them every time, just save the information once, and then design a simple tool that allows you to automate the verification of transformations and correctness!

As a result, I wrote my own small command-line calculator that parses a semantic mathematical expression and calculates the result with the necessary error handling. The fact that the calculator can handle expressions with SI units made it even more difficult to create. This little “side quest” has significantly accelerated my workflow and, to be honest, I still use it daily in the programming process, even though it took only two days to create it.

Note: the expression parser I designed is a recursive descent parser that uses 

precedence hoisting. I’m not a linguist, so I’m not familiar with concepts like context-free grammar.
In this article, I will talk about my attempt to create a custom, dependency-free microcalculator that can parse mathematical expressions with physical units of measurement and speed up your calculation process on the command line.

I believe that this is not only a curious, but also useful lesson, which can serve as a reference point for creating any other semantic system for processing mathematical expressions! This project allowed me to solve many riddles about the principles of semantic mathematical programs. If you love coding and are interested in linguistics and mathematics, then this article is for you.

Note: of course, I understand that there are ready-made libraries for solving such problems, but this seemed to me a curious project for which I already have a good solution concept. I decided to implement it anyway and am very happy with the result. Examples of such programs are insect, qalculate!, wcalc. An important difference in my solution is that the program does not require “running”, but simply runs from the console.

My first command line calculator

The main task of the command-line calculator is to parse a human-readable mathematical expression with units of measurement, return its value (if it can be calculated), and point the user to the place where there is an error (if it cannot be calculated).

In the next section, I’ll explain how to implement a nifty, self-made command-line calculator in standard C++ in about 350 lines of code.

To make the algorithm and process more clear, we will observe the following example of a mathematical expression::

the result of which should be equal to 1.35 m.

Note: I use square brackets because bash doesn't like round ones. Tildes are equivalent to minus signs, and must be written in such a way as to distinguish them from binary subtraction.

Syntax of mathematical expressions with units of measurement

The calculated mathematical expression must be represented as a uniquely interpreted string of characters. To evaluate an expression, we need to answer a number of important questions:

How are numbers expressed with units of measurement and how can actions be performed with them?

What characters are allowed in the expression and what do they mean?

How are symbols grouped to create structures/operations with mathematical meaning?

How do I convert an expression based on its structure to get a mathematical calculation?

Representation of numbers and units of measurement

In order for various operations to work with “number-unit of measurement” pairs, you need to specify their structure and operators. In the next section, I will briefly explain how to implement such number-unit pairs.

SI units — simple implementation in C++

In the international system of units, physical quantities are expressed as the product of 7 basic units: timelengthmasselectric currentthermodynamic temperaturequantity of matter, and luminous intensity.

Each physical quantity can be created from the product of the powers of these units. We call the complete complex derived unit the “dimension”. Let’s create a structure that reflects this fact, storing in it a vector of degrees associated with each basic unit:

struct unit{              // Generic SI Derived-Unit

  vector<double> dim;     // Vector of base-unit powers

  unit(){};               // Constructors
  unit(dlist d){
    for(auto&e: d)
      dim.push_back(e);
  }

};

void fatal(string err){
  cout<<err<<endl;
  exit(0);
}

//Comparison Operators
bool operator==(const unit& l, const unit& r){
  if(l.dim.size() != r.dim.size())
    fatal("Dimension mismatch");
  for(int i = 0; i < l.dim.size(); i++)
    if(l.dim[i] != r.dim[i]) return false;
  return true;
}

bool operator!=(const unit& l, const unit& r){
  return !(l == r);
}

We can set each basic unit as a constant of type unit:

const unit D   = unit({0, 0, 0, 0, 0, 0, 0}); //No Dimension
const unit s   = unit({1, 0, 0, 0, 0, 0, 0});
const unit m   = unit({0, 1, 0, 0, 0, 0, 0});
const unit kg  = unit({0, 0, 1, 0, 0, 0, 0});
const unit A   = unit({0, 0, 0, 1, 0, 0, 0});
const unit K   = unit({0, 0, 0, 0, 1, 0, 0});
const unit mol = unit({0, 0, 0, 0, 0, 1, 0});
const unit cd  = unit({0, 0, 0, 0, 0, 0, 1});

Let’s define a set of overloaded operators for the units structure. You can multiply/divide different units, but you can’t add or subtract them. When adding / subtracting identical units, the same unit is obtained. A unit without dimension cannot be used as a power, but a unit can be raised to a power:

unit operator+(unit l, unit r){
  if(l == r) return l;
  fatal("Unit mismatch in operator +");
}

unit operator-(unit l, unit r){
  if(l == r) return l;
  fatal("Unit mismatch in operator -");
}

unit operator/(unit l, unit r){
  if(l.dim.size() != r.dim.size())
    fatal("Dimension mismatch");
  for(int i = 0; i < l.dim.size(); i++)
    l.dim[i] -= r.dim[i];
  return l;
}

unit operator*(unit l, unit r){
  if(l.dim.size() != r.dim.size()) 
    fatal("Dimension mismatch");
  for(int i = 0; i < l.dim.size(); i++)
    l.dim[i] += r.dim[i];
  return l;
}

unit operator%(unit l, unit r){
  if(l == r) return l;
  fatal("Unit mismatch in operator %");
}

template<typename T>
unit operator^(unit l, const T f){
  for(int i = 0; i < l.dim.size(); i++)
    l.dim[i] *= f;
  return l;
}

Pairs of values “, the integer-unit»

Numbers associated with units are called “values” and are defined as follows:

struct val{
  double n = 1.0;  // Magnitude (default = unity)
  unit u;          // Dimension

  val(){};  //Dimensionless Single Value
  val(double _n, unit _u):n{_n},u(_u){};
};

bool operator==(const val& l, const val& r){
  if(l.u != r.u) return false;
  if(l.n != r.n) return false;
  return true;
}

Similarly to units, we will define a set of overloaded operators that operate between values and return a new value:

val operator+(val l, const val& r){
  l.u = l.u + r.u;
  l.n = l.n + r.n;
  return l;
}

val operator-(val l, const val& r){
  l.u = l.u - r.u;
  l.n = l.n - r.n;
  return l;
}

val operator*(val l, const val& r){
  l.n = l.n * r.n;
  l.u = l.u * r.u;
  return l;
}

val operator/(val l, const val& r){
  l.n = l.n / r.n;
  l.u = l.u / r.u;
  return l;
}

val operator%(val l, const val& r){
  l.n -= (int)(l.n/r.n)*r.n;
  l.u = l.u%r.u;
  return l;
}

val operator^(val l, const val& r){
  if(r.u != D) fatal("Non-Dimensionless Exponent");
  l.n = pow(l.n, r.n);
  l.u = l.u ^ r.n;
  return l;
}

Derived units are combined as the basic unit

Having a string representing a unit or other physical quantity, we can extract the number-unit pair using a lookup table. Values are created by multiplying basic units and stored in map with a character reference:

map <string, val> ud = {

  //Base Unit Scalings
  {"min", val(60, s)},
  {"km",  val(1E+3, m)},
  //...

  //Derived Units (Examples)
  {"J",   val(1, kg*(m^2)/(s^2))},    //Joule   (Energy)
  //...

  //Physical Constants
  {"R",   val(8.31446261815324, kg*(m^2)/(s^2)/K/mol)},
  //...

  //Mathematical Constants
  {"pi",  val(3.14159265359, D)},
  //...

};

//Extract Value
val getval(string s){
  auto search = ud.find(s);
  if(search != ud.end()){
    val m = ud[s];
    return m;
  }

  cout<<"Could not identify unit \""<<s<<"\""<<endl;
  exit(0);
}

If you specify that some values are dimensionless, you can also include mathematical constants in the search table.

Note: combined units are usually represented by a” key ” or string, which people may understand differently. In contrast, the search table is easy to modify!

Примечание: оператор ^ в таблице поиска требует заключения в скобки из-за его низкого порядка предшествования.

Expression parsing

For my calculator, I set five components of the expression: numbersunitsoperatorsparentheses, and special characters.

Each character in a valid expression can be assigned to one of these categories. Therefore, the first step is to determine which class each character belongs to and store it in a form that includes this information.

Note: “special” characters denote unary operators that transform a value. The code examples are written in C++ and use the standard template library.

We will define a ” parsing class “using a simple numberator and define a” parsing tuple ” as a pair between a symbol and its parsing class. The string is parsed into a “parsing vector” containing ordered parsing tuples.

enum pc {                         //Parse Class
  NUM, UNT, OPR, BRC, SPC
};

using pt = std::pair<pc,char>;    //Parse Tuple

using pv = std::vector<pt>;       //Parse Vector

We create a parsing vector by simply comparing each character to the set of characters contained in each class.

// Error Handling

void unrecognized(int i, char c){

  cout<<"Unrecognized character \""<<c<<"\" at position "<<i<<endl;

  exit(0);

}

// Construct a parse vector from a string!

pv parse(string e){

  pv parsevec;

  // Iterate over the string

  for(string::size_type i = 0; i < e.size(); i++){

    const char c = e[i]; // Get the next character

    // Permissible characters for each class

    string brackets = "[]";
    string operators = "+-*/^%";    //Binary Operators
    string special = "!~E";         //Unary Operators
    string numbers = "0123456789.";

    // Identify the class and add a parse tuple to the vector

    if(numbers.find(c) != string::npos)
      parsevec.push_back(pt(NUM, c));

    else if(isalpha(c))
      parsevec.push_back(pt(UNT, c));

    else if(operators.find(c) != string::npos)
      parsevec.push_back(pt(OPR, c));

    else if(brackets.find(c) != string::npos)
      parsevec.push_back(pt(BRC, c));

    else if(special.find(c) != string::npos)
      parsevec.push_back(pt(SPC, c));

    else unrecognized(i, c);

  }

  return parsevec;

}

The structure of our main program consists in compressing an expression, constructing a parsing vector, and passing it to the evaluator that returns the calculated expression:

// MAIN FILE

using namespace std;

// ...

// Compress the command line input
string compress(int ac, char* as[]){
  string t;
  for(int i = 1; i < ac; i++)
    t += as[i]; // append to string
  return t;
}   // Note that spaces are automatically sliced out

// Compress, Parse, Evaluate
int main( int argc, char* args[] ) {

  string expression = compress(argc, args);
  pv parsevec = parse(expression);

  val n = eval(parsevec, 0);
  cout<<n<<endl;

  return 0;

}

For our example expression, it will look like this:

A parsed example of an expression. Each character in a string can be assigned to a specific category. Numbers are shown in red, units in orange, operators in blue, parentheses in purple, and special characters in green.

We’re done with the easiest one. A parsing vector has been created from incoming command-line data, and now we can work with numbers associated with values.

How do I calculate the parsing vector to get a single value?

Evaluating expressions

We just need to calculate the parsing vector, which requires information about the structure of semantic mathematical expressions.

Here you can make the following important observations::

  1. The numbers and units that are related to each other in an expression are always next to each other
  2. Unary operators can be applied directly to their associated number-one pairs
  3. Binary operators are used between number-one pairs, returning one new number-one pair
  4. Binary operators have a specific order of computation
  5. Parentheses contain expressions that can themselves be evaluated to produce a number-one
  6. pair A balanced expression without unary operators or parentheses always consists of N values and N-1 operators

After creating number-one pairs and applying unary operators, the parsing vector is converted to a balanced sequence of values and operators.

At the same time, parentheses are evaluated recursively inside to reduce them to a single value.

In General, this means that the parsing vector can be calculated using a recursive function that returns a number-one pair.

Recursive function for calculating the parsing vector

The recursive eval function is described by the following process:

  1. Iteratively traversing the parsing vector
  2. When we encounter a parenthesis, we call eval for the internal expression to get the value
  3. Extract the number-one pairs using unary operations and save them to a vector
  4. Extract the binary operations and save them to a vector
  5. We apply the binary operators in the correct order, using our balanced vectors of values and operators

Creating an ordered sequence of values and operators

Let’s start by defining the eval function, which receives an additional parameter n, denoting the starting point:

val eval(pv pvec, int n){

  if(pvec.empty()) return val(1.0, D);
  if(pvec[0].first == OPR) invalid(n);

  vector<val> vvec;   //Value Sequence Vector
  vector<char> ovec;  //Binary Operator Sequence Vector

  // ...

Note: an empty parsing vector returns a dimensionless unit value, and the parsing vector cannot start with an operator.

We iterate through the string, keeping track of the start and end point of the current observable sequence. When we encounter a parenthesis, we find the end point of the parenthesis and call the evaluation function for the inner expression:

  // ...

  size_t i = 0, j = 0;  //Parse section start and end
  while(j < pvec.size()){

    //Bracketing
    if(pvec[j].second == '['){

      i = ++j;  //Start after bracket
      for(int nbrackets = 0; j < pvec.size(); j++){
        if(pvec[j].second == '[') //Open Bracket
          nbrackets++;
        else if(pvec[j].second == ']'){
          if(nbrackets == 0) //Successful close
            break;
          nbrackets--; //Decrement open brackets
        }
      }

      //Open Bracket at End
      if(j == pvec.size())
        invalid(n+i-1);

      //Recursive sub-vector evaluate
      pv newvec(pvec.begin()+i, pvec.begin()+j);
      vvec.push_back(eval(newvec, n+j));
    }

    // ...

This will cause recursion until a parsing vector expression is found that does not contain parentheses, and therefore consists of a balanced sequence of values and operators.

When evaluating an internal parenthesis expression, a certain value is returned. After all parentheses are destroyed, only a balanced sequence of values and operators remains.

Operators can be added directly, and values are set using a combination of numbers, ones, and unary operators. We can create a value by finding a sequence of consecutive parsing tuples representing it, and constructing it accordingly:

    // ...

    //Add Operator
    if(pvec[j].first == OPR)
      ovec.push_back(pvec[j].second);

    //Add Value
    if(pvec[j].first == NUM ||
       pvec[j].first == UNT ||
       pvec[j].first == SPC ){

      i = j; //Start at position j
      while(pvec[j].first != OPR &&
            pvec[j].first != BRC &&
            j < pvec.size()) j++; //increment

      //Construct the value and decrease j one time
      pv newvec(pvec.begin()+i, pvec.begin()+j);
      vvec.push_back(construct(newvec, n+j));
      j--;
    }

    j++; //Increment j

    //Out-of-Place closing bracket
    if(pvec[j].second == ']')
      invalid(n+j);

  }

  // Check if sequence is balanced
  if(ovec.size() + 1 != vvec.size())
    fatal("Operator count mismatch");

  // ...

We construct values from their sequence of parsing tuples, identifying numeric components, units, and unary operators.

The pairing of “number one”, and unary operators

Since this calculator supports floating-point representation, the value can consist of both pre-exponents and degrees. In addition, both of these values can have a sign.

The sign is extracted as the first character followed by a sequence of numbers. The value is extracted using stof:

val construct(pv pvec, int n){

  unit u  = D;       //Unit Initially Dimensionless
  double f = 1.0;    //Pre-Exponential Value
  double p = 1.0;    //Power
  double fsgn = 1.0; //Signs
  double psgn = 0.0;

  size_t i = 0;      //Current Index
  string s;
  bool fp = false;   //Floating Point Number?

  //First Character Negation
  if(pvec[i].second == '~'){
    fsgn = -1.0;
    i++;
  }

  //Get Numerical Elements
  while(i < pvec.size() && pvec[i].first == NUM){
    s.push_back(pvec[i].second);
    i++;
  }

  if(!s.empty()) f = stof(s);   //Extract Value
  s.clear();                    //Clear String

  //...

Note: a unary operator that denotes a sign change, such as -1, is represented here as a tilde (~), because this makes it easier to distinguish it from a binary subtraction operator.

Next, we check for the presence of a floating-point entry and repeat this to get the sign and magnitude of the degree:

  //...

  //Test for Floating Point
  if(pvec[i].second == 'E'){
    i++;
    psgn = 1.0;

    if(pvec[i].second == '~'){
      psgn = -1.0;
      i++;
    }

    while(i < pvec.size() && pvec[i].first == NUM){ //Get Number
      s.push_back(pvec[i].second);
      i++;
    }
    if(!s.empty()) p = stof(s);
    else fatal("Missing exponent in floating point representation.");
    s.clear();
  }

  //Double floating point attempt
  if(pvec[i].second == 'E')
    invalid(n+i);

  //...

Finally, we extract the remaining unit as a string and get the corresponding value. Apply the remaining unary operators at the end and return the final form of the value:

  //...

  //Get Unit String
  while(i < pvec.size() && pvec[i].first == UNT){
    s.push_back(pvec[i].second);
    i++;
  }

  if(!s.empty()){
    val m = getval(s);
    f *= m.n; //Scale f by m.n
    u = m.u;  //Set the unit
  }

  if(pvec[i].second == '!'){
    f = fac(f);
    i++;
  }

  if(i != pvec.size())  //Trailing characters
    invalid(n+i);

  return val(fsgn*f*pow(10,psgn*p), u);
}

This process reduces all combinations of unary operators, numbers, and ones to value structures that can be manipulated using binary operators.:

Here you can see how an expression with a unary operator is reduced to a value.

Note: the position is passed to the creation function so that the absolute position in the error handling expression is known.

Compressing an ordered sequence of values and operators

Finally, we have a balanced sequence of values and binary operators that we can compress using the correct order of operations. Two values linked by an operator between them can be compressed using a simple function:

val eval(val a, val b, char op){
  if(op == '+') a = a + b;
  else if(op == '-') a = a - b;
  else if(op == '*') a = a * b;
  else if(op == '/') a = a / b;
  else if(op == '^') a = a ^ b;
  else if(op == '%') a = a % b;
  else{
    std::cout<<"Operator "<<op<<" not recognized"<<std::endl;
    exit(0);
  }
  return a;
}

To compress the entire balanced sequence, we iteratively traverse the parsing vector once for each operator type in the correct order and perform binary calculations. It is worth noting that multiplication and division can occur simultaneously, as well as addition and subtraction.

  //...

  function<void(string)> operate = [&](string op){
    for(size_t i = 0; i < ovec.size();){
      if(op.find(ovec[i]) != string::npos){
        vvec[i] = eval(vvec[i], vvec[i+1], ovec[i]);
        ovec.erase(ovec.begin()+i);
        vvec.erase(vvec.begin()+i+1, vvec.begin()+i+2);
      }
      else i++;
    }
  };

  operate("%");
  operate("^");
  operate("*/");
  operate("+-");

  return vvec[0];
}

We get the final result for the zero index and return it.

Parentheses are evaluated recursively, as an internally balanced sequence of values and operators. After removing all parentheses from the main expression, the final value is returned.

Note: this recursive structure reflects the tree-like nature of the semantics of a mathematical expression.

Results and conclusions

The framework described above was wrapped in a simple command-line tool that I called ” dima “(short for”dimensional analysis”). The full source code can be found here.

This simple command-line calculator will correctly calculate expressions with units of measurement.

Units are correctly combined and laid out:

 dima J
1 kg m^2 s^-2
> dima J / N
1 m
> dima J/N + 2cm
1.02 m

at the same time allowing you to quickly find out the values of constants:

> dima R
8.31446 kg m^2 K^-1 mol^-1 s^-2

If necessary, the expression lookup table with units can be modified.

You can extend this system by adding a way to specify named functions/transformations.

Another potential improvement: you can add a calculation buffer that stores variables in a new dictionary using the assignment operator, which can be accessed during other subsequent calculations. However, such persistence you will need an entry in the file, or the start of the process.

This semantic mathematical parser can be fully converted and other useful mathematical programs can be created.

For example, you can try algebraically transforming expressions with variables to find a more elegant representation based on some method of evaluation (length, symmetry, repeated expressions, etc.)

Another possible variation is the auxiliary differentiation function, which uses the algorithmic nature of derivatives.

Anderson
Anderson
Web site editor and tester.

Leave a Reply

Your email address will not be published. Required fields are marked *