This file is part of the Perl 6 Archive

Note: these documents may be out of date. Do not use as reference!

To see what is currently happening visit http://www.perl6.org/

TITLE

Subroutines: Extend subroutine contexts to include name parameters and lazy arguments

VERSION

  Maintainer: Damian Conway <damian@conway.org>
  Date: 17 Aug 2000
  Last Modified: 25 Sep 2000
  Mailing List: perl6-language-subs@perl.org
  Number: 128
  Version: 4
  Status: Frozen

ABSTRACT

This RFC proposes that subroutine argument context specifiers be extended in several ways, including allowing parameters to be typed and named, and that a syntax be provided for binding arguments to named parameters.

CHANGES

Added section describing named parameter interaction with named higher-order function placeholders.

DESCRIPTION

It is proposed that the existing subroutine "prototype" mechanism be replaced by optional formal parameter lists that allow parameters to be named and their contexts specified.

The syntax for this would be:

        sub subname ( type context(s) parameter_name : parameter_attributes ,
                      type context(s) parameter_name : parameter_attributes ,
                      type context(s) parameter_name : parameter_attributes ;
                      # end of required parameters
                      type context(s) parameter_name : parameter_attributes ,
                      # etc.
                    ) : subroutine_attributes
        { body }

Each of the four components of a parameter specification -- type, context, name, and attributes -- would be optional.

Contexts

The context specifiers would be:

        $       parameter is scalar
        @       parameter is array (eats remaining args)
        %       parameter is hash (eats remaining args)
        /       parameter is qr'd string
        &       parameter is subroutine reference or block
        *       parameter is typeglob (assuming they still exist)
        ""      parameter is bareword or character string
        ()      parameter is an explicitly parenthesized list

Note that any of these specifiers may appear in any position in a parameter list (especially &, which would no longer be constrained to the first position).

The following prefix context modifier would be available:

        \             parameter must be a reference,
                      argument is magically en-referenced if necessary

The following context attributes would be available:

        :lazy         argument is lazily evaluated

        :uncurried    (& only) terminate curry propagation on argument

        :noautoviv    argument that is a (possibly nested) hash element or
                      array element is not autovivified

        :repeat{m,n}  argument is variadic within the specified range

The following grouping operator would also be available:

        (...)   specifies that the argument(s) are to be
                treated collectively (i.e. by modifiers and attributes)

The following subsections describe each of these in detail.

Automagically en-referenced arguments

The \ modifier causes the modified parameter to automagically convert its corresponding argument to a reference without list flattening. The most common usage is in passing hashes and arrays as a single argument.

Note that the semantics of the \ modifier would be altered slightly from those of Perl 5, so that a reference is always passed for that parameter. It would, of course, retain its magical en-referencing coercion:

        \$         argument must be scalar ref or start with $
                   scalar var magically en-referenced

        \@         argument must be array ref or start with @,
                   array var magically en-referenced

        \%         argument must be hash ref or start with %,
                   hash var magically en-referenced

        \/         argument must be qr'd string or /.../ or m/.../
                   /.../ or m/.../ magically qr'd to en-reference

        \&         arg must be sub reference, curried function, or block
                   block converted to anonymous sub ref

        \*         argument must be typeglob ref or start with *,
                   typeglob magically en-referenced

        \""        argument must be a string reference or a bareword,
                   bareword magically stringified and en-referenced

        \()        argument must be a parenthesized list or an anonymous
                   list constructor
                   parenthesized list is magically en-referenced
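
For comparison, Perl 5 prototypes already provide a restricted form of this en-referencing. The following is a minimal Perl 5 sketch only (the name count_both is purely illustrative). Note that Perl 5 rejects an explicit reference such as \@x for a \@ prototype at compile time, which is the restriction the altered semantics would lift.

```perl
use strict;
use warnings;

# Perl 5 prototype: each \@ slot receives a reference to the array
# argument, so the two arrays are not flattened into one list.
sub count_both(\@\@) {
    my ($first, $second) = @_;    # two array references
    return (scalar @$first, scalar @$second);
}

my @x = (1, 2, 3);
my @y = (4, 5);

my @counts = count_both(@x, @y);
print "@counts\n";    # 3 2
```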

Lazy evaluation

If the lazy attribute is used for a particular parameter, that parameter is lazily evaluated. This means that it is only evaluated when the corresponding named parameter (see below) -- or the corresponding element of @_ -- is first accessed in some way, after which the evaluated value is stored in the element in the usual way. Passing the parameter to another subroutine or returning it as an lvalue does not count as an access. Evaluating it in an eval block always counts.

If the lazy attribute is applied to a @ parameter (which eats the remaining arguments), those remaining arguments are not evaluated until the corresponding element of the array is accessed. Iteration through such an array (i.e. in a for or foreach) only evaluates one element per iteration.

If the lazy attribute is applied to a % parameter (which eats the remaining arguments), the odd arguments (that are mapped to keys) are immediately evaluated, but the even arguments (that map to values) are not evaluated until the corresponding entry of the hash is accessed. Iteration through such a hash (i.e. via each or values) only evaluates one element per iteration.

For example:

        sub firstdef(@:lazy) { defined($_) && return $_ for (@_); }

        sub enervate($:lazy) { return $_[0] }

        sub Klingon::OP_TERNARY ($,$:lazy,$:lazy) 
        {
                if ( $_[0]->debaseToTerran() ) { return eval{$_[1]} }
                return eval{$_[2]};
        }

Note the use of explicit eval's in the last example, to force the lazy arguments to evaluate before being returned.
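
Lazy parameters can be approximated in Perl 5 by passing closures that are evaluated at most once. The sketch below is only an emulation under that assumption: lazy is a hypothetical helper, not a proposed builtin, and the caller has to wrap each argument explicitly, which the :lazy attribute would make unnecessary.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a code ref in a thunk that runs it at most
# once and caches the (scalar) result.
sub lazy {
    my ($code) = @_;
    my ($done, $value);
    return sub {
        unless ($done) {
            $value = $code->();
            $done  = 1;
        }
        return $value;
    };
}

# Emulation of firstdef(@:lazy): force thunks one at a time and stop at
# the first defined value.
sub firstdef {
    for my $thunk (@_) {
        my $v = $thunk->();
        return $v if defined $v;
    }
    return;
}

my @log;    # records which arguments were actually evaluated
my $result = firstdef(
    lazy( sub { push @log, 'a'; undef } ),
    lazy( sub { push @log, 'b'; 42 } ),
    lazy( sub { push @log, 'c'; 99 } ),
);

print "$result (@log)\n";    # 42 (a b)
```

The third argument is never evaluated, which is the behaviour described above for a lazy @ parameter.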

Controlling curry propagation

RFC 23 proposes the addition of higher order functions, via argument/operand placeholders. However, when a subroutine call includes a curried argument, there is an ambiguity as to how far "outwards" the currying should propagate. For example:

        $num_nodes = traverse( $root, $sum += ^_ );

might mean:

        $num_nodes = sub{ traverse( $root, $sum += $_[0] ) };

if currying continued to the outermost subroutine, or:

        $num_nodes = traverse( $root, sub{$sum += $_[0]} );

if it were restricted to the second argument.

As the former interpretation is the proposed default behaviour, some syntactic means of requesting the latter interpretation is required.

It is proposed that a parameter context attribute -- uncurried -- be added to handle this. Any parameter with the uncurried attribute would prevent curry propagation to the surrounding subroutine call. Thus, with the declaration:

        sub traverse ($,$:uncurried);

the call:

        $num_nodes = traverse( $root, $sum += ^_ );

would be equivalent to:

        $num_nodes = traverse( $root, sub{$sum += $_[0]} );

whereas the declaration:

        sub traverse ($,$);

would allow the curried argument to "infect" the entire surrounding call:

        $num_nodes = sub{ traverse( $root, $sum += $_[0] ) };

Note that the curry control only applies to the argument whose parameter has the uncurried attribute. So:

        sub traverse ($,$:uncurried);
        $num_nodes = traverse( ^_ , $sum += ^_ );

means:

        $num_nodes = sub { traverse( $_[0], sub{$sum += $_[0]} ) };

The currying of the second argument is restricted to its argument slot, whilst the currying of the first argument propagates outwards to encompass the entire call to traverse.
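
The practical difference between the two readings can be seen by writing both out by hand in Perl 5. This is an illustrative sketch only: traverse here is a hypothetical stand-in that walks a flat array rather than a tree.

```perl
use strict;
use warnings;

# Hypothetical traverse: calls $visit on each node value and returns the
# node count.
sub traverse {
    my ($nodes, $visit) = @_;
    $visit->($_) for @$nodes;
    return scalar @$nodes;
}

my $root = [1, 2, 3];    # stand-in for a real tree
my $sum  = 0;

# Reading selected by :uncurried -- only the second argument becomes a
# subroutine, and traverse is called immediately.
my $num_nodes = traverse( $root, sub { $sum += $_[0] } );
print "$num_nodes nodes, sum $sum\n";    # 3 nodes, sum 6

# Default reading -- the whole call becomes a subroutine, so nothing is
# traversed until $deferred is itself called.
my $deferred = sub { traverse( $root, $sum += $_[0] ) };
print ref($deferred), "\n";    # CODE
```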

Variadic parameter lists

It would be possible to specify parameter lists consisting of an arbitrary number of specified parameters, using the variadic attribute repeat{m,n}.

A parameter specification such as:

	sub max($:repeat{2,20}) { ... }

is equivalent to:

	sub max($,$;$,$,$,$,$,$,$,$,$,$,$,$,$,$,$,$,$,$) { ... }

That is, the :repeat attribute specifies the range of arguments that the specified (scalar) parameter may represent.

If m is omitted it is zero; if n is omitted it is ~0 (maximum unsigned integer).

For example, to specify a subroutine named most that takes two or more magically enreferenced arrays and returns the one with the most elements:

        sub most ( \@:repeat{2,} ) {
                my $max = shift;
                for (@_) {
                        $max = $_ if @$max < @$_;
                }
                return @$max;
        }

        my @most = most @x, @y, @z;

Or consider a subroutine that takes an alternating sequence of pairs of:

        * a (lazily evaluated) scalar expression, and

        * a bareword,

and which then returns the stringification of the first bareword following any expression that evaluates to true:

        sub first ( ($:lazy uncurried, ""):repeat{,} ) {
                while (my ($true, $str) = splice @_, 0, 2) {
                        return $str if $true;
                }
        }

        my $first = first
                        $x < 10 => little,
                        $x < 20 => middle,
                        $x < 30 => large;

Note the use of grouping parentheses to cause the alternating scalar/bareword sequence to be repeated.
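
The range check implied by :repeat{m,n} could be approximated in Perl 5 at run time. The sketch below assumes a hypothetical helper, check_repeat, and applies it to the max example; unlike the proposed attribute, it cannot reject a bad call at compile time.

```perl
use strict;
use warnings;

# Hypothetical helper emulating :repeat{m,n}. An undefined $max means
# "no upper limit", mirroring an omitted n.
sub check_repeat {
    my ($name, $min, $max, $count) = @_;
    die "$name: expected at least $min arguments, got $count\n"
        if $count < $min;
    die "$name: expected at most $max arguments, got $count\n"
        if defined $max && $count > $max;
    return 1;
}

# Emulation of: sub max($:repeat{2,20}) { ... }
sub max {
    check_repeat('max', 2, 20, scalar @_);
    my $best = shift;
    for (@_) { $best = $_ if $_ > $best }
    return $best;
}

print max(3, 9, 4), "\n";    # 9

my $ok = eval { max(1); 1 };
print "rejected: $@" unless $ok;
```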

Preventing argument autovivification

When entries of nested hashes are passed to a subroutine:

        func( $hash{key}{subkey}{subsubkey} );

the intermediate entries in the nested hash (i.e. $hash{key} and $hash{key}{subkey} in the above example) are autovivified, whether or not the argument value itself is ever accessed within the subroutine. This is particularly galling if one or more of the nested hashes is undefined, since it means the higher-level entries will have keys created unnecessarily.

Specifying the :noautoviv attribute on a subroutine parameter would cause the corresponding argument to be evaluated in a special "non-autovivifying" context, unless it is used as an lvalue.

In such a non-autovivifying context, the non-existence of any intermediate nested hash would cause the entire nested hash access to immediately evaluate to undef, without any autovivification.

For example:

        sub func1 ( $:noautoviv ) { ... }
        sub func2 ( $ )           { ... }

        my %hash;
        print keys %hash;                       # prints ""

        func1( $hash{key}{subkey} );
        print keys %hash;                       # prints ""

        func2( $hash{key}{subkey} );
        print keys %hash;                       # prints "key"

If the parameter is used in an lvalue manner within the subroutine, then autovivification is still applied (at the point where the argument is used as an lvalue). For example:

        sub func3 ( $:noautoviv ) {
                if (rand > 0.5) { $_[0] = 0 }   # autovivifies argument
                else            { print $_[0] } # does not autovivify argument
        }

        sub func4 ( \$:noautoviv ) {    # always autovivifies (compiler warning)
                ...
        }

Note that this implies that :noautoviv parameters are automatically :lazy.
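
The Perl 5 behaviour that motivates this attribute, and one way of working around it today, can be sketched as follows. The helper fetch_noautoviv is hypothetical and only illustrates the "evaluate to undef without creating anything" rule; it forces the caller to pass a path of keys instead of an ordinary nested-hash expression.

```perl
use strict;
use warnings;

sub func2 { return }    # ignores its argument entirely

my %hash;
func2( $hash{key}{subkey} );            # the intermediate entry is created anyway
print join(',', keys %hash), "\n";      # key

# Hypothetical helper: follow a path of keys, returning undef as soon as
# an intermediate entry is missing, without creating it.
sub fetch_noautoviv {
    my ($node, @path) = @_;
    for my $k (@path) {
        return undef unless ref $node eq 'HASH' && exists $node->{$k};
        $node = $node->{$k};
    }
    return $node;
}

my %other;
my $v = fetch_noautoviv( \%other, qw(key subkey) );
print scalar(keys %other), "\n";        # 0
```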

Block parameters and arguments

As noted above, & parameters could appear in any position in the parameter list, allowing raw blocks as arguments anywhere in the argument list.

It is proposed that raw blocks that are subroutine arguments need not be separated by commas from adjacent arguments (on either side):

        sub on ( "", & ) {
                $handler{$_[0]} = $_[1];
        }

        # and later...

        on Error::Numeric { die $@; };
        on Error::Range   { $_[0]--; };
        on Error          { ref($_[0])->handle(); };

Furthermore, it is proposed that if a subroutine's parameter list ends in a & and the subroutine is called in a void context, the trailing semicolon be optional:

        on Error::Numeric {
                die $@;
        }

        on Error::Range {
                $_[0]--;
        }

        on Error {
                ref($_[0])->handle();
        }

Context classes

The revised syntax would also allow context classes to be specified. A context class aggregates two or more alternative contexts, allowing any one of them to be the context for the corresponding argument.

For example:

        sub mymap ([\/&$], @) {...}
        

Here, the first argument must be either a /.../ pattern (or qr), or a block (or sub ref), or a scalar. In parsing that argument, the various possible contexts are considered left-to-right and the first context that allows the argument to be parsed is used.

Note that context classes may also have attributes:

        sub mymap ([\/&$]:lazy uncurried, @) {...}

In this example, no matter what the first argument is, it is lazily evaluated and does not propagate currying.

A context class may only contain context specifiers that yield scalar parameters. Hence, a context class may contain any of the following specifiers (any of which may also have lazy or uncurried attributes):

        $       /       \$      \/
        &       *       \&      \*      
        ""              \""     \()
                        \@      \%

but not:

        @       %       ()

A context class always yields a scalar parameter.

Parameter names

Each parameter may optionally (and independently) be given a name. This name is specified after the parameter's context specifier. The declaration of a parameter name creates a lexical variable of the same name in the scope of the subroutine body. Named @ and % parameters create a lexical array or hash respectively. All other named parameters create a lexical scalar.

For example:

        sub doublemap (&mapsub, @args) {        # creates my($mapsub,@args)
                my @mapped;
                push @mapped, $mapsub->(splice @args, 0, 2) while @args;
                return @mapped;
        }

Note that the context specifier can still be any valid specifier:

        sub lazymap ([&\/$]mapper : lazy uncurried, $max, @args:lazy) {
                my @mapped;
                switch (ref $mapper) {
                        case 'CODE'  { push @mapped, $mapper->(shift)
                                                while @args && $max--; }
                        case 'REGEX' { push @mapped, shift() =~ m/$mapper/
                                                while @args && $max--; }
                        case ''      { push @mapped, $mapper
                                                while @args && $max--; }
                }
                return @mapped;
        }

Named arguments

It is further proposed that arguments may be passed by name, and that named arguments may be passed in any order.

An argument would be associated with a named parameter by prefixing it with a standard Perl label (i.e. an identifier-colon sequence). For example:

        @mapped = doublemap(args: @list, mapsub: ^a+^b);

On encountering labelled arguments in a subroutine call, the interpreter would examine the named parameters to determine their contexts, and then evaluate the labelled arguments (in left-to-right sequence) in the context specified by the corresponding named parameters (or not evaluate them for lazy contexts!). The resulting values would then be assigned to the corresponding named parameters.

Any unlabelled arguments would then be evaluated and assigned (again in left-to-right sequence) to any remaining parameters. Those nameless evaluations would be carried out in the respective contexts specified by the remaining parameters.

It would be an error to:

        * Define two named parameters with the same name, unless they
          can be distinguished by context. 

        * Label two arguments with the same name, unless there are 
          two context-distinguishable named parameters of that name.

If a subroutine was called with a labelled argument for which there was no named parameter, the label would be ignored and the argument treated as unlabelled, unless the subroutine had been declared with a strict_args attribute.
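
For comparison, the usual Perl 5 idiom for order-independent arguments is to pass a list of key/value pairs. The sketch below re-expresses the doublemap call in that idiom. It is an approximation only: the array has to be passed by reference, and nothing checks the names against a parameter list.

```perl
use strict;
use warnings;

# Perl 5 emulation of doublemap(&mapsub, @args) called with named arguments.
sub doublemap {
    my %arg    = @_;
    my $mapsub = $arg{mapsub};
    my @args   = @{ $arg{args} };    # copy, so the caller's array is untouched
    my @mapped;
    push @mapped, $mapsub->(splice @args, 0, 2) while @args;
    return @mapped;
}

my @list = (1, 2, 3, 4);

# The named arguments may be given in either order.
my @mapped = doublemap( args => \@list, mapsub => sub { $_[0] + $_[1] } );
print "@mapped\n";    # 3 7
```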

Interaction with named placeholders

It is further proposed that when named placeholders are used to curry a function, the resulting subroutine would have named parameters. If the curried function mixed named, ordinal, and anonymous placeholders, the resulting subroutine would have a mixture of named and unnamed parameters.

For example:

        my $selector = ^condition ? ^2 : ^_;

would be equivalent to:

        my $selector = sub ($condition,$,$) { $condition ? $_[2] : $_[1] };

This would make currying out the condition clearer:

        my $select_on_val = $selector->(condition: $val);

Types

It is proposed that parameters may be given types: either the name of a class, or the name of a builtin type (such as 'ARRAY', 'HASH', 'CODE', etc.)

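If a parameter has a type (T), then the corresponding argument must be of type T (for a class, an object of class T or of a class derived from T), and the corresponding lexical variable, if any, is declared with that type.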

For example:

        sub traverse (Tree $root, $subref:uncurried) {...}

This specifies that the first argument must be a Tree object, or an object of a class derived from Tree. The corresponding lexical variable would be equivalent to:

        my Tree $root;

Using builtin type names

The ability to specify the names of builtin types as parameter types offers additional flexibility in controlling argument interpretation. For example, the specification:

        sub demo(ARRAY $a, @b) {...}    # version 1

constrains the first argument to be an array reference, but does not invoke a magical en-referencing context, the way this would:

        sub demo(\@a, @b) {...}         # version 2

Thus, a call like:

        demo(@LOL);

will succeed under version 1 (binding $LOL[0] to $a, and the rest of @LOL to @b), provided $LOL[0] is an array reference.

Under version 2, the call to demo would fail, since \@LOL will be bound to $a and there will be nothing left to bind to @b.
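
The two bindings can be compared in Perl 5 as a rough sketch. Here demo_v1 emulates the ARRAY type constraint with an explicit ref check, and demo_v2 uses the existing \@ prototype; both names are illustrative. One difference from the proposal is assumed throughout: in Perl 5 the trailing @ may be empty, so the second call binds nothing to it and does not fail.

```perl
use strict;
use warnings;

# Version 1 emulated: the first argument must already be an array reference.
sub demo_v1 {
    my ($first, @rest) = @_;
    die "demo_v1: first argument must be an ARRAY reference\n"
        unless ref $first eq 'ARRAY';
    return (scalar @$first, scalar @rest);
}

# Version 2 as a Perl 5 prototype: the whole array is en-referenced.
sub demo_v2(\@@) {
    my ($first, @rest) = @_;
    return (scalar @$first, scalar @rest);
}

my @LOL = ( [1, 2], [3], [4, 5, 6] );

my @v1 = demo_v1(@LOL);    # $LOL[0] is bound first, two elements remain
my @v2 = demo_v2(@LOL);    # \@LOL is bound first, nothing remains
print "@v1 | @v2\n";       # 2 2 | 3 0
```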

Banishment of the term "prototype"

It is further proposed that parameter lists never be referred to as "prototypes", and that use of the term be a flameworthy offence. The preferred nomenclature would be "parameter list", or perhaps "signature".

MIGRATION ISSUES

This proposal has the potential to break a small number of cases where a backslashed context specifier would now match a reference argument that it previously complained about.

Also, the suggested regularization of semantics for backslash means that a \$ argument is passed as a reference, not a value.

IMPLEMENTATION

Definitely S.E.P.

REFERENCES

RFC 21 (v1): Replace wantarray with a generic want function

RFC 22 (v1): Builtin switch statement

RFC 23 (v2): Higher order functions

RFC 57 (v1): Subroutine prototypes and parameters

RFC 84 (v1): Replace => (stringifying comma) with => (pair constructor)

RFC 97 (v1): prototype-based method overloading

[Numerous other RFC's make use of, or reference to, this mechanism]