Failure of Implicit Conventions: Determining Function Destructiveness

Published on July 18, 2007. Tags: functional, language-design

All functions are either destructive or non-destructive. What is the difference? How do we distinguish between them? Why do we care? These are important questions: whether you are developing a language, a library, a program, or a single function, failing to consider the ramifications of destructiveness leads to inconsistent code and unintuitive errors.

What Are Destructive Functions

Let's see if we can figure out what destructive functions look like:

def doSomething()
  "str"
end

Non-Destructive.

def doSomething(a, b)
   a * b
end

Non-Destructive.

def doSomething(a)
  a << "this"
end

Destructive.

def doSomething(a)
  b = a.clone()
  b << "this"
end

Non-Destructive.

irb(main):033:0> y = [1,2,5,1,2]
=> [1, 2, 5, 1, 2]
irb(main):034:0> y.sort()
=> [1, 1, 2, 2, 5]
irb(main):035:0> y
=> [1, 2, 5, 1, 2]

Non-Destructive.

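def sort2(a)
  a.sort!()
end

Destructive. (Writing a = a.sort() here instead would only rebind the local variable a and leave the caller's array untouched.)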

class Something
  def initialize(name)
    @name = name
  end

  def doSomething()
    @name = @name << " is a silly name"
  end
end

Destructive.

Okay, so we have probably picked up on the difference at this point: destructive functions cause some change outside of the lexical scope of their function, whereas non-destructive functions do not cause changes outside of their lexical scope.

More simply, non-destructive functions don't change values they didn't create. Incoming parameters may be examined but not altered; if one does need to be altered, a non-destructive function creates a copy and modifies that instead of acting on the original.
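One way to make that rule tangible is to hand a function a frozen object: a non-destructive function still works, while a destructive one raises. The following is a minimal sketch; append_destructive and append_non_destructive are hypothetical names invented for illustration. It uses dup rather than clone because dup returns an unfrozen copy, whereas clone preserves the frozen state by default.

```ruby
# Hypothetical helpers for illustration; neither name comes from a library.
def append_destructive(s)
  s << "this"            # modifies the caller's string in place
end

def append_non_destructive(s)
  copy = s.dup           # dup yields an unfrozen copy, even of a frozen string
  copy << "this"         # only the copy changes
end

original = "abc".freeze

# The non-destructive version works on frozen input.
result = append_non_destructive(original)   # => "abcthis"

# The destructive version cannot touch a frozen string.
begin
  append_destructive(original)
  raised = false
rescue StandardError     # FrozenError on recent Rubies
  raised = true
end
```

Freezing only guards the object itself, not anything it refers to, so this is a quick check rather than a guarantee.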

Telling Them Apart

Figuring out whether a function is destructive or non-destructive can be difficult. If you have access to the source code, you can read through it and look for points where external variables or passed-in parameters are modified. When the source is unavailable, however, things get more difficult.

Without the source, you have two tools remaining: experimentation, and convention.

Experimentation comes down to evaluating any variables a function might change before and after its execution, and monitoring if they have been altered. In particular, parameters ought to be closely examined for changes. Does this seem unreliable? Well, it is. It is easy to create simple test cases where a destructive function appears to be non-destructive. A common case is something like this:

irb(main):037:0> def link(a,b)
irb(main):038:1>   [a, b]
irb(main):039:1> end
=> nil
irb(main):040:0> a = [1,2,3]
=> [1, 2, 3]
irb(main):041:0> b = [4,5,6]
=> [4, 5, 6]
irb(main):042:0> c = link(a,b)
=> [[1, 2, 3], [4, 5, 6]]
irb(main):043:0> a
=> [1, 2, 3]
irb(main):044:0> b
=> [4, 5, 6]
irb(main):045:0> c[0] << 10
=> [1, 2, 3, 10]
irb(main):046:0> a
=> [1, 2, 3, 10]

Now a and b initially appear to be unaltered, but we later realize that they have been recycled to help form the value returned from link. In some cases even if the function explicitly copies the incoming parameters, you may still run into this problem (because a shallow copy occurs instead of a deep copy).
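The sketch below reproduces the situation in a script rather than in irb. It assumes the same link function, takes a before/after snapshot as the "experiment", and then uses equal? (object identity) to expose the sharing that the snapshot misses. It also shows the shallow-versus-deep distinction: dup copies only the outer array, while a round trip through Marshal is one common way to get a deep copy of plain data.

```ruby
def link(a, b)
  [a, b]
end

a = [1, 2, 3]
b = [4, 5, 6]

# Experiment: snapshot the arguments, call the function, compare afterwards.
before = Marshal.load(Marshal.dump([a, b]))
c = link(a, b)
looks_untouched = ([a, b] == before)   # true: nothing has changed yet

# Identity check: the result holds the very same objects we passed in.
aliased = c[0].equal?(a)               # true

# A shallow copy duplicates the outer array but shares the inner ones.
nested  = [[1, 2], [3, 4]]
shallow = nested.dup
shallow[0] << 99                       # also changes nested[0]

# A deep copy shares nothing with the original.
deep = Marshal.load(Marshal.dump(nested))
deep[1] << 100                         # nested[1] is unaffected
```

Marshal only handles objects that can be serialized, so this deep-copy trick is a convenience for plain data, not a general solution.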

This leaves us with our final tool: convention.

Programming languages have their own preferred paradigms, their chosen idioms, their good practices, and their built-in philosophies about what constitutes good programming style. Typically these unwritten rules are gradually absorbed by reading code written by language experts. However, teaching convention implicitly in this way is a tragedy.

It is tragic because unwritten rules are unreliable, and gravitate from short-lived trend to short-lived trend. We cannot rely on library writers to fully implement an evolving array of best practices when those best practices are defined ambiguously: implicit rules have all the permanency of a drunken one-night stand.

However, not all languages rely on unwritten convention for distinguishing destructive functions. Languages which encourage functional programming tend to be very explicit: Scheme appends an exclamation mark to the names of its destructive procedures. This shows where a proactive language designer (or language design committee) can enhance the usability of a language without modifying syntax or defining standard libraries: simple conventions (like Ruby's use of a question mark to denote boolean queries, or Scheme's exclamation mark to denote a destructive function) can convey important information quickly to others reading your source code.
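Ruby method names may end in ? or !, so a Scheme-style convention can be sketched directly in Ruby. The TagList class below is hypothetical and exists only to show the naming pattern: a ? suffix for a boolean query, a plain name for the non-destructive version, and a ! suffix for the version that modifies the receiver.

```ruby
# Hypothetical class, invented to illustrate the naming convention.
class TagList
  attr_reader :tags

  def initialize(tags)
    @tags = tags.dup     # keep our own array so the caller's is never touched
  end

  # Boolean query: name ends in "?".
  def empty?
    @tags.empty?
  end

  # Non-destructive: returns a new TagList, leaves the receiver alone.
  def normalized
    TagList.new(@tags.map { |t| t.strip.downcase })
  end

  # Destructive: name ends in "!", modifies the receiver in place.
  def normalize!
    @tags.map! { |t| t.strip.downcase }
    self
  end
end

list  = TagList.new(["  Ruby ", "SCHEME"])
clean = list.normalized  # list.tags is still ["  Ruby ", "SCHEME"]
kept  = list.tags.dup    # remember the state before the destructive call
list.normalize!          # list.tags is now ["ruby", "scheme"]
```

With names like these, a reader can tell from the call site alone which call needs a defensive copy and which does not.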

Why We Should Care

Okay, so in many languages it can be difficult to distinguish a destructive function from a non-destructive function. Does this really matter?

If you like to write clean, efficient, and non-redundant code, then yes, this matters. It also matters if you like code that works, or value your time.

If you can't easily distinguish between destructive and non-destructive functions, then you will have to err on the side of caution. This means you will encounter situations where you add needless redundancy:

def doSomething(x)
  y = x.clone()
  y << "some text"
end
z = "this is my precious string"
z2 = doSomething(z.clone())

This is inefficient code (not to mention ugly), but if you can't quickly determine whether doSomething is non-destructive, then these situations can and will arise fairly frequently. The key phrase here is "quickly determine": tracing through a chain of library function calls is a bad use of your time, and you know that, so the quick fix is frequently adopted instead of verifying how a function truly operates. Over time these quick fixes accumulate, and your code ends up longer, more complex, and less efficient than it would have been if distinguishing between the two types of functions were trivial.

If we can distinguish quickly, we know where to clone our data for safety, and where to pass it in unmodified. We can make informed decisions about destroying our data for efficiency purposes, or keeping it pristine for later usage.

It boils down to this: we can write more efficient and less redundant code--more quickly and with less effort--if we can distinguish between destructive and non-destructive functions.

What To Do

We have a few resources available to us in our war against destructive ambiguity. We can start using languages that actually make this distinction (what, no one wants to start doing all their coding in Scheme?), or we can learn from the success of Scheme (in this narrow scope) and create explicit conventions that exist on pieces of paper, and not just in our communities' collective memory.

Let's consider the Python convention for distinguishing between destructive and non-destructive functions (not that I am aware of anywhere this is actually written down):

1. If a function returns None, then it is destructive.
2. If a function returns a value other than None, then it is not destructive.
3. Rules #1 and #2 are optionally void at the discretion of the programmer.

You can see evidence of this convention when you read documentation which says something to the effect of "This function returns None to emphasize its destructive nature."

Then you see a method like this:

def increment(self, inc_by):
    self.count = self.count + inc_by
    return self.count

It returns a value other than None. And it is destructive. And no one would consider it bad form.

Fundamentally the Python convention is also undesirable because it requires the programmer to actually run the method (or read its source code) to determine what it returns. Simply seeing the name of the method is insufficient to determine destructiveness.

This is by no means an exclusively Pythonic problem, though: almost no languages comprehensively address this important distinction. Each programming community needs to sit down and find a way to distinguish between destructive and non-destructive functions that fits with its language philosophy. The Scheme method of appending an exclamation mark to destructive functions won't work for all languages, but finding a similarly explicit convention is well worth a few moments of community discussion.