Note: these documents may be out of date. Do not use as reference! |

To see what is currently happening visit http://www.perl6.org/

Arrays: Efficient Array Loops

Maintainer: Buddha Buck <bmbuck@14850.com> Date: 8 Sep 2000 Last Modified: 30 Sep 2000 Mailing List: perl6-language-data@perl.org Number: 207 Version: 3 Status: Frozen

This RFC proposes a notation for creating efficient implicit loops over multidimensional arrays. It introduces the notation |i for an index iterator for arrays, allowing temporary multidimensional arrays to be created on the fly.

Version 3 freezes this RFC. Except for some requests for clarification from Jeremy Howard, there was no additional discussion since Version 2.

Version 3 clarifies the use of looping indices in "list context", and has a rewritten section concerning looping indices and RFC205 Cartesian products. It also introduces |@ and |@foo notation for flexibly dealing with looping indices for arrays of unknown dimension.

Version 2 is an almost complete rewrite, based on concerns and enhancements that arose in discussion. I was also unhappy with the language in the original. The primary semantic changes are:

- The syntax formerly known as "iterators" is now known as "looping indices"
- |i is interpreted as "scalar-like", rather than "list-like". The examples are now (to the best of my knowledge) consistant with that interpretation.
- The scoping mechanism is enhanced. It is now clearer what the iterators mean in the various contexts within perl.

I believe the current version is much more understandable.

Consider the problem of multiplying together two 2-dimensional tensors. In standard notation, this would be symbolized by

Cijkl = Aij * Bkl

where the letters i, j, k and l are written as subscripts and represent the indices of their respective tensors. To accomplish that same multiplication in Perl, one needs to write (using RFC 204 notation):

for my $i = (0..2) { #assuming 3x3 tensors for my $j = (0..2) { for my $k = (0..2) { for my $l = (0..2) { $c[[$i,$j,$k,$l]] = $a[[$i,$j]] * $b[[$k,$l]] ; } } } }

While this is not particularly difficult, it is clumsy, and slow. It could be hidden in a subroutine, but then one lacks the flexibility to write similar equations on the fly. For instance, transposition is easily written as Tij = Aji. Such notation is assumed to be true for all legal values of i and j -- in effect, it loops over all legal values of i and j to compute the results. Furthermore, such notation along with appropriate restrictions on use could allow Perl to create optimised loops.

This RFC proposes a similar notation, using |name as a notation for looping indices like i and j above.

This RFC proposes that the entire The prefix | signifies that |i is an "looping index" within the statement |i appears in. More than one looping index can appear within one statement.

The "scope" of a set of looping indices is the smallest subexpression in the statement that contains all the looping indices in the statement. If the scope is not in list or void context, the scope is expanded until it is in list or void context. The scope of a set of looping indices will never exceed one statement.

A set of looping indices generates an implicit loop (or nested loops) surrounding the scope of the indices.

If used in list context, the implied loop creates a temporary array whose dimensions and bounds are determined by the number and range of the looping indices.

# In list context $dotproduct = reduce ^_+^_, 0, $a[|i] * $b[|i]; @tensorproduct = $a[[|i,|j]] * $b[[|k,|l]];

When there are multiple looping indices in an expression in list context, the order of the indices in the temporary array is determined by scanning the expression left-to-right.

As a specific example, the expression

@tensorproduct = $a[[|i,|j]] * $b[[|k,|l]];

would generate code somewhat equivilant to:

{ my sub loop { my @result :bounds(@#a,@#b); for my $i (0..$#a[0]) { for my $j (0..$#a[1]) { for my $k (0..$#b[0]) { for my $l (0..$#b[1] { $result[[$i,$j,$k,$l]] = $a[[$i,$j]]*$b[[$k,$l]]; } } } } } @tensorproduct = loop(); }

If @a is a 2x3 array and @b is a 3x4 array, then @tensorproduct would
become a 2x3x3x4 array. If the expression had been
`$b[[|i,|j]]*$c[[|k,|l]]`

instead, then @tensorproduct would have
become a 3x4x2x3 array.

If used in a void context, the implied loop does not create a temporary array, but rather expects the side effects (if any) to do the real work.

# In void context $c[[|i,|j,|k,|l]] = $a[[|i,|j]]*$b[[|k,|l]]; # compare with loop above $t[[|i,|j]] = $a[[|j,|i]]; $product[[|i, |j]] = $a[[|i,|k]] * $b [[|k,|j]]; $dotproduct += $a[|i] * $b[|i]; # tmtowdi $stack = $a[|i]; # $stack->STORE is TIEd to $stack->push($)

As the two example shows, assignment to a scalar is assumed to be in scalar context, not list, and so the scope of the looping indices is expanded to include the LHS of the assignment.

Looping indices can also have a user-defined range, using the syntax "|i=@list". This would create a bounding loop of the form "for |i (@list) {...}" around the expression. When no list explicitly given, |i acts as if (0..) was the specified list. The range for a looping index can be specified anywhere in the statement, but only one range can be given per looping index.

# take the upper-triangle of a square array $uppertri[[|i,|j]] = 0; # clear array to begin with $uppertri[[|i,|j]] = $a[[|i,|j=(0..|i)]];

Strictly speaking, |i does not take on all the values in @list (or (0..)), but rather solely those values which lead to valid (in bounds) array indices.

# "unriffle" two arrays. There are probably better ways to do this ($a[|i], $b[|i]) = ($c[ 2*|i ], $c[ 2*|i + 1 ]); $average[|i] = ($a[|i-1] + $a[|i] + $a[|i+1])/3;

In the first example, |i will never take on values that would cause 2*|i+1 to be out of bounds for $c.

As mentioned above, using multiple looping indices will cause a nested loop. The order of nesting the loops is not specified here, but any interdependencies among the indices must be satisfied.

In most of the examples above, the loops caused by the multiple iterators are independent. However, in the "upper triangle" example, since the range of |j depends on the current value of |i, |i must be the "outer loop".

RFC 205 proposes the use of ';' as an operator to generate Cartesian products of list as a useful notation for generating complex array slices. An obvious question is how this notation works with iterators.

If an array index contains a Cartesian product that uses one or more looping indices, the Cartesian product is converted into an array reference containing a looping index for every non-singleton list in the Cartesian product before the loop scope is determined.

Example: The expression

$a[|i;(0..5)]

is initially converted to the form:

$a[[|i,|_j=(0..5)]]

where `|_j`

is a newly created anonymous looping index.

This conversion is based on the "canonical" form of the RFC205 notation (i.e., without empty lists or leading or trailing "*" lists:

$a[|i;]

is canonically

$a[|i;(0..)]

which is converted to

$a[[|i,|_j=(0..)]]

which is simplified to

$a[[|i,|_j]]

The use of a leading or trailing * proceeds similarly:

# Generalized tensor multiplication: @product = $a[|a0;*] * $b[|b0;*];

is canonically (assuming 4-dimensional tensors on the RHS):

#generalized tensor multiplication (4-dim tensors) @product = $a[|a0;(0..);(0..);(0..)] * $b[|b0;(0..);(0..);(0..)];

which is converted to

# Generalized tensor multiplication (4-dim example) @product = $a[[|a0,|_a1,|_a2,|_a3]] * $b[[|b0;|b1;|b2;|b3];

The use of ; also makes it easier to express long lists of looping indices. $array[|i;|j;|k] is equivilant to $array[[|i,|j,|k]], but doesn't use as much punctuation.

The RFC205 notation above is useful, but it does have some limitations. For instance, it is impossible to use it (or at least, it requires more clever bits than I possess) to sum along the first dimension of an arbitrary dimensioned array:

# Do something completely bizzare $reduced[|j;*] += $a[|i;|j;*]; # which might be equivilant to: # # $reduced[|j;|_k;|_l;|_m] += $a[|i;|j;|_n];

In order to do this properly, there needs to be a way to state that the iterators on the LHS are the same as the iterators on the RHS. This is possible if the dimensions are known in advance:

# sum along first dimension of 3-dim array, yielding 2-dim array $reduced[|j;|k] += $a[|i;|j;|k];

The notation |@ and |@foo can be used to get around that problem. |@ is equivilant to a leading or trailing * in RFC205 notation, in that it generates a group of anonymous looping indices. |@foo names the group of generated anonymous looping so that they can (as a group) be used in multiple places:

# sum along first dimension of n dim array, yielding n-1 dim array $reduced[|@f] += $a[|i;|@f]; # perform tensor addition @sum = $a[|@a] + $b[|@a]; # perform tensor multiplication @product = $a[|@a] * $b[|@b]; # or @product = $a[|@] * $b[|@];

Each use of |@ is independent, so each factor in `$a[|@] * $b[|@]`

is
using a different set of anonymous looping indices.

|@ and |@foo are allowed only within array indices.
Because |@ and |@foo are of variable length, it must be possible to
determine from the expressions how large they are, knowing the shape
of the arrays. Practically, this means that |@ can be used only once
in an index, because it is impossible to tell which dimension `|i`

is
in an expression like `$a[|@;|i;|@]`

. An index may use multiple
named index groups, as long as other parts of the statement provide
enough context to provide the bounds of the index groups.

As yet another way to take a tensor product:

$product[|@a;|@b] = $factor1[|@a] * $factor2[|@b];

demonstrates this. Because the shape of @factor1 and @factor2 determined the number and bounds of the looping index groups |@a and |@b, both can be used within the index for @product.

Looping indices aren't restricted to being used solely as array indices, as the "unriffle" example showed. But each looping index has to be used in an array index for at least one array.

# find $nth triangular number my $triangle = 0; $triangle += |i=(0..$n); # compile-time error: |i not used as index # Fill a multiplication table my @multtable : shape(12,12); $multtable[|i;|j] = |i*|j; # OK

=head Lazy Evaluation

Assuming that lazy evaluation is used in other parts of Perl6, it would be nice if these loops could also be evaluated lazily.

In list context, this could be done by creating an anonymous function to evaluate the looped expression at the desired indices:

$a[|i]*$b[|j] # in list context # becomes sub { my ($i,$j) = @_; $a[$i]*$b[$j]; }

This anonymous function can be TIEd to the resulting anonymous array, so all array lookups would invoke this function. Since TIEing is supposed to be improved in Perl6, this would be a reasonable way to do it.

If other lazy evaluation mechanisms work in Perl6, they could be used instead.

I am uncertain if lazy evaluation makes sense in void context.

$t[[|i,|j]] = $a[[|j,|i]]; # transpose 2-d @a

would be equivilant to:

{ my $i; my $j for $i (0..) { # last if out-of-bounds for $j (0..) { # last if out-of-bounds $t[[$i,$j]] = $a[[$j,$i]]; } } }

This notation also allows (as a specific use) an alternative notation to the RFC 82 element-wise syntax.

#compute pairwise sum, pairwise product, pairwise difference... @sum = @a[[|i,|j,|k,|l]] + @b[[|i;|j;|k;|l]]; # RFC82: @sum = @a + @b @prod= @a[[|i,|j,|k,|l]] * @b[[|i;|j;|k;|l]]; # @prod = @a * @b @diff= @a[[|i,|j,|k,|l]] - @b[[|i;|j;|k;|l]]; # @diff = @a - @b

RFC 82 syntax is simpler, but this is perl, so There Is More Than One Way To Do It, as tensor multiplication demonstrates.

Note that if the "Lazy Evaluation" schema mentioned above is adopted, then these sums, products, and differences could be automagically lazy as well.

The simplest implementation would be to convert at compile-time (or parse time) void-context looped iterator scopes to loops analogous to the above examples, and convert list-context looped iterator scopes to valued do-blocks or invoked anonymous subroutines:

$dotproduct = reduce {^_+^_},0,$a[|i]*$b[|i]; # would be transformed into $dotproduct = reduce {^_+^_},0, sub { my $i; my @r; for $i (0..min($#a,$#b)) { $r[$i] = $a[$i] * $b[$i]; } return @r; }->();

A more sophisticated, preferred, implementation would take advantage of the static, known nature of the data to create a highly optimized version of the loop.

Possible optimizations include: Common sub-expression elimintation, encoding internally to some non-interpreted looping construct, etc. If special 'numeric functions' are provided in Perl, then expressions with just unoverloaded operators and numeric functions could be optimised into tight compiled loops, as occurs for example with fromfunction() and ufuncs in Numeric Python:

starship.python.net#SEC8 starship.python.net#SEC13

For lazy evaluation, the value of the expression at any given set of indices is easy to calculate. However the lazy evaluation mechanism works, it can use this property to calculate the appropriate values.

RFC 203: Notation for declaring and creating arrays

RFC 204: Notation for indexing arrays with an LOL as an index

RFC 205: New operator ';' for creating array slices