💾 Archived View for soviet.circumlunar.space › zwatotem › snapshot › solution › definition.gmi captured on 2024-12-17 at 10:31:42. Gemini links have been rewritten to link to archived content

View Raw

More Information

⬅️ Previous capture (2024-09-29)

-=-=-=-=-=-=-

Definition of Expression Problem

Basic definition

Definition

Suppose our program models an external domain and defines a set of data types describing a set of objects from the external domain. Those data types may differ by structure, but all conform to a common interface. We then want to define a set of operations that all should handle all aforementioned types, possibly in different ways, specific to individual types. Expression Problem seeks for a procedure of defining these data types and functions in such a way, that an addition of a new type does not modify any existing code and addition of a new operation also does not modify any existing code.

Example

The basic, and historically first example of Expression Problem, and source of the name is a scenario of incrementally changing requirements, where the implementer is tasked with creating a system for processing syntax trees for expressions in some hypothetical programing language. Initially the requirements only demand two types of expression – a number literal and binary plus operator and a single operation to conduct on them – expression evaluation. A typical solution in an object-oriented style (see Listing 1.2.1) involves creating two classes: Literal and BinaryPlus, containing all the data necessary for each case, along with the Execute method for execution of the expression. Common sense, as well as the need for polymorphism, when it comes to the type of left and right subexpression of the binary plus will result in creation of the abstraction over these classes in the form of an abstract class, or an interface Expression, which includes the specification for the execute method for all the specific classes.

abstract class Expression {
    public abstract int Evaluate();
}

class Literal(int value) : Expression {
    public int Value = value;
    public override int Evaluate() {
         return value;
    }
}

class Addition(Expression left, Expression right) : Expression {
    public Expression Left = left;
    public Expression Right = right;
    public override int Evaluate() {
        return Left.Evaluate() + Right.Evaluate();
    }
}

A solution in a functional style (see Listing 1.2.2) would see a definition of the Expression type as a sum type (also known as tagged union) of two cases: Literal and BinaryPlus. It would also define a single function execute, which takes an expression and returns its value. The function would have two separate implementations and one would be selected with a pattern matching technique. These pattern matches would be resolved at compile time on a call-by-call basis.

type Expression =
    | Literal of int
    | Addition of Expression * Expression
 
let rec evaluate = function
    | Literal value -> value
    | Addition(left, right) -> evaluate left + evaluate right

After the required domain is modelled, specification is extended by one additional data type: a negation operator and one new operation: getting a string representation of the expression. In the object-oriented approach (in Listing 1.2.3) adding the new data type requires no modification of existing code, only an addition of a new class. Adding a new operation, however, requires changes to the abstraction on all the existing classes, as they now have to include the Stringify method.

interface Expression {
    public int Evaluate();
    public string Stringify();
}
 
class Literal(int value) : Expression {
    public int Value = value;
    public int Evaluate() {
        return value;
    }
    public string Stringify() {
        return value.ToString();
    }
}
 
class Addition(Expression left, Expression right) : Expression {
    public Expression Left = left;
    public Expression Right = right;
    public int Evaluate() {
        return Left.Evaluate() + Right.Evaluate();
    }
    public string Stringify() {
        return $"({Left.Stringify()}) + ({Right.Stringify()})";
    }
}
 
class Negation(Expression operand) : Expression {
    public Expression Operand = operand;
    public int Evaluate() {
        return -Operand.Evaluate();
    }
    public string Stringify() {
        return $"-({Operand.Stringify()})";
    }
}

In the functional approach (in Listing 1.2.4) adding new function stringify requires no modification to the existing definitions. Adding the Negation data type requires modifying the definition of expression, as well as modification of all the existing functions, by addition of a pattern to match and an implementation.

type Expression =
    | Literal of int
    | Addition of Expression * Expression
    | Negation of Expression
 
let rec evaluate = function
    | Literal value -> value
    | Addition(left, right) -> evaluate left + evaluate right
    | Negation(operand) -> -evaluate operand
 
let rec stringify = function
    | Literal value -> value.ToString()
    | Addition(left, right) -> $"({stringify left}) + ({stringify right})"
    | Negation(operand) -> $"-({stringify operand})"

This example illustrates, how a solution to Expression Problem is not straightforward, and why it has been a point of research for the past three decades. In the following part of the chapter, we will try to specify the exact meaning of this definition.

Extensions

Available literature mostly agrees on the definition quoted at the beginning of this chapter, but different sources interpret it in a variety of ways.

Syntactical code organization

Some sources focus on an organizational aspect of code – namely: how different definitions are being organized in syntactic structures (Tarr et al., 1999). In this interpretation our main point of concern is the fact that certain implementations are being bound syntactically into a single structure, and upon addition of new type/operation this structure needs to be extended, and therefore modified (from a syntactic standpoint). We take note of the phenomena of scattering and tangling. Scattering is a placement of several pieces of code, which touch on a single concern, in separate units of code. In our example scattering can be observed in two ways: in object-oriented solu-tion methods that share a concern of the same operation are scattered across all the classes. In functional solution operations concerning one type of ob-ject is scattered between different functions. Tangling occurs when different units of code in a grouping touch on different concerns. In our example tan-gling is also visible in two ways. In object-oriented solution several methods are tangled in the same class, despite representing different operations. In functional solution implementations for different types are tangled in each function. This perspective, focused on code organization mostly applies to situations, where one compilation unit is being developed, and in the bound of that unit changes occur over time, extending the data and operations han-dled by the program.

External extensibility

A different branch of research is focused on the external extensibility (Szyperski, 1996). This perspective assumes an externally defined module with a set of data types and operations, which are being used in in-house developed program. In this line of reasoning the external module (dependency) and internal module (consumer) are separate compilation units. This brings a consequence of true inability of change of dependency code, as it is shipped in a compiled state with its source code potentially unavailable. Therefore, a solution requires a special structure of compilation targets, which allows for extension at link time (Pirkelbauer et al., 2007).

namespace BasicExpressions;

public interface Expression
{
  public abstract int Evaluate();
}

public class Literal(int value) : Expression
{
  public int Value      { get; set; } = value;
  public int Evaluate() => Value;
}

public class Addition(Expression left, Expression right) : Expression
{
  public Expression Left       { get; set; } = left;
  public Expression Right      { get; set; } = right;
  public int        Evaluate() => Left.Evaluate() + Right.Evaluate();
}

For an example, let us consider the same problem domain as before. Assume the external module consists of functionality from initial specification, i.e. two types of expression and a single operation defined on them. The implementation of this module is equal to that presented in Listing 1.2.1 and Listing 1.2.2, however now we will need to account for module related syn-tax (Listing 1.2.5 and Listing 1.2.6).

module BasicExpressions
type Expression =
    | Literal of int
    | Addition of Expression * Expression
 
let rec evaluate = function
    | Literal value -> value
    | Addition(left, right) -> evaluate left + evaluate right

The dependency module may be distributed and used separately for different clients in a form of compilation unit. The problem is then understood as writ-ing a consumer module, that uses definitions of data types and operations given by dependency, extends it with definitions of new data types and new operations and makes this extended functionality available for further con-sumers. Let us illustrate this with object-oriented and functional approach to this problem.

namespace ExtendedExpressions;
using Basic = BasicExpressions;

public interface Expression : Basic.Expression
{
  public string Stringify();
}

public class Literal(int n) : Basic.Literal(n), Expression
{
  public string Stringify() => Value.ToString();
}

public class Addition(Expression left, Expression right) :
		Basic.Addition(left, right), Expression
{
  public string Stringify()
		=> $"({Left.Stringify()}) + ({Right.Stringify()})";
}

public class Negation(Expression negated) : Expression
{
  public Expression Negated     { get; set; } = negated;
  public int        Evaluate()  => -Negated.Evaluate();
  public string     Stringify() => $"- ({Negated.Stringify()})";
}

Naïve object-oriented approach (Listing 1.2.7) uses two strategies to extend the dependency, one for adding new data variants, and a second for adding new operations. For adding new data variants, it is once again enough to cre-ate a new class, which implements all the methods defined in dependency. For the addition of new methods, a new Expression interface is created, that extends the original Expression interface by all the newly introduced method declarations. New implementation for each existing data variant is then cre-ated, reusing all the old behaviour through inheritance, and adding new be-haviour directly.

module ExtendedExpressions
open BasicExpressions; module Basic = BasicExpressions

type Expression =
    | BasicExpression of Basic.Expression
    | Negation of (Expression)

let rec evaluate = function
    | BasicExpression b -> Basic.evaluate b
    | Negation e -> - evaluate e
let rec stringify = function
    | BasicExpression (Literal n) -> string n
    | BasicExpression (Addition (left,right)) -> 
$"({stringify (BasicExpression left)}) + ({stringify (BasicExpression right)})"
     | Negation e -> $"-({stringify e})"

Naïve functional approach (Listing 1.2.8) again, uses two strategies – one for extending data variants and the other for extending operations. If it is only required to add a new operation, it can be done by simply adding a new function. However, if at least one new data variant needs to be added it is necessary to create a new union type which enumerates the original Expression as one of its variants and enlists newly introduced variants separately. All functions, including those existing in the original module need to be redefined in terms of this new data type, delegating implementation to the original functions wherever possible.

Note, that both naïve approaches do not achieve their goals. In particular, the point of failure is wherever any of the data variants composes polymorphically of the base abstraction.

When we look at the redefinition of the Addition class, it inherits from original Addition, which composes two members of type BasicExpressions.Expression, but Addition would need them to be of type ExtendedExpressions.Expression to call the Strigify method recursively.

Similarly, in the functional approach a BasicExpression(Addition(left,right)) variant of Extended.Expression composes left and right as Basic.Expressions, so a data structure representing expression where negation is within addition is impossible to construct.

External composability

While considering an Expression Problem, with extension of external module, some references consider the influence of their extension on other existing clients of the dependency module and a possible diamond problem (see Figure 1.1 below) that can occur between modules (Zenger & Odersky, 2005). However, not all papers on the matter consider this interaction.

Figure 1.1: Illustration of a possible diamond problem. Module B defines new data while module C defines new operations. Module D is a unifying module, because it unifies definitions from unrelated modules. Depending on the solution, module D may be impossible to create, or objects constructed in module B may not be used with operations define in module C.

Polymorphism

Another aspect of Expression Problem which is not universally agreed on, is whether the data types should be unified by an abstraction (Carette et al., 2009; Zenger & Odersky, 2005). Adding this requirement usually is associat-ed with enabling a certain form of polymorphism on all the data forms registered under that abstraction. If the requirement of abstraction is imposed, then the question arises about the form of this polymorphism. Static poly-morphism allows operations to have polymorphic definitions, but with the restriction that all calls of these operations must have argument types resolvable statically by the compiler. Dynamic polymorphism allows a compilation of code that includes operation calls with argument types dependent on runtime, and narrowed only to the broadest abstraction that defines this op-eration.

In general, sources agree that this problem does not apply to dynamic and dynamically typed languages such as Python or JavaScript, which allow runtime modification of data and operation definitions (Carette et al., 2009; Haeri & Keir, 2019; Oliveira, 2009; Oliveira et al., 2013; Oliveira & Cook, 2012; Zenger & Odersky, 2005). In such languages extensibility is granted even for dynamically dispatched operations. However, these lan-guages lack static guarantees that are granted by static type systems and as a result they allow for a whole class of runtime errors not present in statically typed languages.