On 17 October 2014 18:48, Yoann Padioleau <pad@fb.com> wrote:
Also is there a place explaining the naming conventions?
I have a hard time understanding what is the difference
between tcom(), acom(), xcom(), and then there’s  complex() …

the compiler assumes you've know your  Aho and Ullmann  either by reading them or by working with them.
this will be useful later.

complex runs tcom, ccom, acom, xcom in that order on a node (and each of those can recurse).
the compiler works by a series of tree transformations, informed by attributes attached to each node,
representing types, notional costs, addressability, and other ... stuff.

tcom has a comment!

  /*
   * evaluate types
   * evaluate lvalues (addable == 1)
   */

so that's sorted. "evaluate types" means it does most of the type checking and making casts explicit.
t->addable is set to 1 if t is an l-value.  L-value and R-value are terms introduced by Christopher Strachey (see https://www.itu.dk/courses/BPRD/E2009/fundamental-1967.pdf): briefly, lvalues
are forms legal on the left-hand side of an assignment ("address-like objects"), and r-values legal only the right-hand side ("value-like objects")
where "objects" means what it used to mean before it was ruined.

acom is Aho and Ullman's acommute (with all the exercises done). basically it does code improvements
taking advantage of associative-commutative operations, and extends that to look for further optimisations
mentioned in Thompson's paper on the compiler. Ritchie's PDP-11 C compiler did something similar (but he
called it acommute and distrib). comments such as "bust terms out" and "look for factorable terms c1*i + c1*c2*j -> c1*(i + c2*j)
suggest what it's doing). essentially it flattens levels of an expression tree using associative and commutative operators into an array
and rearranges them.

xcom, which is machine-dependent, also has a comment. it assigns an index of complexity to a sub-expression, which is usually
expressed as the number of registers needed to compute it (and good luck with that on an x86), but it's also a good place to look
for machine-dependent aspects of machine addressing, such as indexing, double-indexing, scaling, etc and it sets values in each
tree node to mark that.

"balancing", "casting", "coercion" and other terms connected with languages, types and compilation were often introduced by Algol-68.
"Balancing" for instance is making  operand types match each other (eg, int+long => long+long).