TITLE

Exegesis 7: Formats

AUTHOR

Damian Conway <damian@conway.org>

VERSION

  Maintainer: Larry Wall <larry@wall.org>
  Date: 26 Feb 2004
  Last Modified: 29 May 2006
  Number: 7
  Version: 2

[Update: Please note that this was written a couple of years ago, and a number of things have changed since then. Rather than changing the original document, we'll be inserting "Update" notes like this one to tell you where the design has since evolved. (For the better, we hope). In any event, for the latest Perl 6 design (or to figure out any cryptic remarks below) you should read the Synopses, which are kept very much more up-to-date than either the Apocalypses or Exegeses.]

What a piece of work is Perl 6!

How noble in reason!

How infinite in faculty!

In form how express and admirable!

– W. Shakespeare, "Hamlet" (Perl 6 revision)

Formats are Perl 5's mechanism for creating text templates with fixed-width fields. Those fields are then filled in using values from prespecified package variables. They're a useful tool for generating many types of plaintext reports – the r in Perl, if you will.

Unlike Perl 5, Perl 6 doesn't have a format keyword. Or the associated built-in formatting mechanism. Instead it has a Form.pm module. And a form function.

Like a Perl 5 format statement, the form function takes a series of format (or "picture") strings, each of which is immediately followed by a suitable set of replacement values. It interpolates those values into the placeholders specified within each picture string, and returns the result.

The general idea is the same as for Perl's two other built-in string formatting functions: sprintf and pack. The first argument represents a template with N placeholders to be filled in, and the next N arguments are the data that is to be formatted and interpolated into those placeholders:

    $text = sprintf $format_s, $datum1, $datum2, $datum3;
    $text =    pack $format_p, $datum1, $datum2, $datum3;
    $text =    form $format_f, $datum1, $datum2, $datum3;

Of course, these three functions use quite different mini-languages to specify the templates they fill in, and all three fill in those templates in quite distinct ways.

Apart from those differences in semantics, form has a syntactic difference too. With form, after the first N data arguments we're allowed to put a second format string and its corresponding data, then a third format and data, and so on:

    $text = form $format_f1, $datum1, $datum2, $datum3, $format_f2, $datum4, $format_f3, $datum5;

And if we prettify that function call a little, it becomes obvious that it has the same basic structure as a Perl 5 format:

    form
         $format_f1,
             $datum1, $datum2, $datum3,
         $format_f2,
             $datum4,
         $format_f3,
             $datum5;

But the Perl 6 version is implemented as a vanilla Perl 6 subroutine, rather than hard-coded into the language with a special keyword and declaration syntax. In this respect it's rather like Perl 5's little-known formline function – only much, much better.

So, whereas in Perl 5 we might write:

    # Perl 5 code...

    our ($name, $age, $ID, $comments); 

    format STDOUT
     =================================== 
    | NAME     |    AGE     | ID NUMBER |
    |----------+------------+-----------|
    | @<<<<<<< | @||||||||| | @>>>>>>>> |
      $name,     $age,        $ID,
    |===================================|
    | COMMENTS                          |
    |-----------------------------------|
    | ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< |~~
      $comments,
     ===================================
    .

    write STDOUT;

in Perl 6 we could write:

    print form
        " =================================== ",
        "| NAME     |    AGE     | ID NUMBER |",
        "|----------+------------+-----------|",
        "| {<<<<<<} | {||||||||} | {>>>>>>>} |",
           $name,     $age,        $ID,
        "|===================================|",
        "| COMMENTS                          |",
        "|-----------------------------------|",
        "| {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[} |",
           $comments,
        " =================================== ";

And both of them would print something like:

     ===================================
    | NAME     |    AGE     | ID NUMBER |
    |----------+------------+-----------|
    | Richard  |     33     |    000003 |
    |===================================|
    | COMMENTS                          |
    |-----------------------------------|
    | Talks to self. Seems to be        |
    | overcompensating for inferiority  |
    | complex rooted in post-natal      |
    | maternal rejection due to         |
    | physical handicap (congenital or  |
    | perhaps the result of premature   |
    | birth). Shows numerous            |
    | indications of psychotic (esp.    |
    | nepocidal) tendencies. Naturally, |
    | subject gravitated to career in   |
    | politics.                         |
     ===================================

At first glance the Perl 6 version may seem like something of a backwards step – all those extra quotation marks and commas that the Perl 5 format didn't require. But the new formatting interface does have several distinct advantages:

Of course, this is Perl, not Puritanism. So those folks who happen to like package variables, global accumulators, and mysterious writes, can still have them. And, if they're particularly nostalgic, they can also get rid of all the quotation marks and commas, and even retain the dot as a format terminator. For example:

    sub myster_rite {                           
        our ($name, $age, $ID, $comments);     
        print form :interleave, <<'.'               
             =================================== 
            | NAME     |    AGE     | ID NUMBER |
            |----------+------------+-----------|
            | {<<<<<<} | {||||||||} | {>>>>>>>} |
            |===================================|
            | COMMENTS                          |
            |-----------------------------------|
            | {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[} |
             =================================== 
            .
              $name,     $age,        $ID,
              $comments;
    }

    # and elsewhere in the same package...

    ($name, $age, $ID, $comments) = get_data();
    myster_rite();

    ($name, $age, $ID, $comments) = get_more_data();
    myster_rite();

Let's take a look...

What's in a name?

But before we do, here's a quick run-down of some of the highly arcane technical jargon we'll be using as we talk about formatting:

Format
A string that is used as a template for the creation of text. It will contain zero or more fields, usually with some literal characters and whitespace between them.
Text
A string that is created by replacing the fields of a format with specific data values. For example, the string that a call to form returns.
Field
A fixed-width slot within a format string, into which data will be formatted.
Data
A string or numeric value (or an array of such values) that is interpolated into a format, in order to fill in a particular field.
Single-line field
A field that interpolates only as much of its corresponding data value as will fit inside it within a single line of text.
Block field
A field that interpolates all of its corresponding data value, over a series of text lines – as many as necessary – producing a text block.
Text block
A column of newline-separated text lines. A text block is produced when data is formatted into a block field that is too small to contain the data in a single line.
Column
The amount of space on an output device required to display one single-width character. One character will occupy one column in most cases, the most obvious exceptions being CJK double-width characters.

Why, how now, ho! From whence ariseth this?

Unlike sprintf and pack, the form subroutine isn't built into Perl 6. It's just a regular subroutine, defined in the Form.pm module:

    module Form
    {
        type FormArgs ::= Str|Array|Pair;

        sub form (FormArgs *@args is context(Scalar)) returns Str
            is exported
        {
            ...
        }

        ...
    }

That means that if we want to use form we need to be sure we:

    use Form;

first.

Note that the above definition of form specifies that the subroutine takes a list of arguments (*@args), each of which must be a string, array or pair (type FormArgs ::= Str|Array|Pair). And the is context(Scalar) trait specifies that each of those arguments will be evaluated in a scalar context.

That last bit is important, because normally a "slurpy" array parameter like *@args would impose a list context on the corresponding arguments. We don't want that here, mainly because we're going to want to be able to pass arrays to form without having them flattened.

How called you...?

Like all Perl subroutines, form can be called in a variety of contexts.

When called in a scalar or list context, form returns a string containing the complete formatted text:

    my $formatted_text = form $format, *@data;

    @texts = ( form($format, *@data1), form($format, *@data2) );  # 2 elems

When called in a void context, form waxes lyrical about human frailty, betrayal of trust, and the pointlessness of calling out when nobody's there to heed the reply, before dying in a highly theatrical manner.

He doth fill fields...

The format strings passed to form determine what the resulting formatted text looks like. Each format consists of a series of field specifiers, which are usually separated by literal characters.

form understands a far larger number of field specifiers than format did, but they're easy to remember because they obey a small number of conventions:

The fields are fragrant...

That may still seem like quite a lot to remember, but the rules have been chosen so that the resulting fields are visually mnemonic. In other words, they're supposed to look like what they do. The intention is that we simply draw a (stylized) picture of how we want the finished text to look, using fields that look something like the finished product – left or right brackets showing horizontal alignment, a middlish = or bottomed-out _ indicating middled or bottomed vertical alignment, etc., etc. Then form fits our data into the fields so it looks right.

The typical field specifications used in a form format look like this:

                                      Field specifier
    Field type                 One-line             Block
    ==========                ==========          ==========

    left justified            {<<<<<<<<}          {[[[[[[[[}
    right justified           {>>>>>>>>}          {]]]]]]]]}
    centred                   {>>>><<<<}          {]]]][[[[}
    centred (alternative)     {||||||||}          {IIIIIIII}
    fully justified           {<<<<>>>>}          {[[[[]]]]}
    verbatim                  {''''''''}          {""""""""}

    numeric                   {>>>>>.<<}          {]]]]].[[}
    euronumeric               {>>>>>,<<}          {]]]]],[[}
    comma'd                   {>,>>>,>>>.<<}      {],]]],]]].[[}
    space'd                   {> >>> >>>.<<}      {] ]]] ]]].[[}
    eurocomma'd               {>.>>>.>>>,<<}      {].]]].]]],[[}
    Swiss Army comma'd        {>'>>>'>>>,<<}      {]']]]']]],[[}
    subcontinental            {>>,>>,>>>.<<}      {]],]],]]].[[}

    signed numeric            {->>>.<<<}          {-]]].[[[}
    post-signed numeric       {>>>>.<<-}          {]]]].[[-}
    paren-signed numeric      {(>>>.<<)}          {(]]].[[)}

    prefix currency           {$>>>.<<<}          {$]]].[[[}
    postfix currency          {>>>.<<<DM}         {]]].[[[DM}
    infix currency            {>>>$<< Esc}        {]]]$[[ Esc}

    left/middled              {=<<<<<<=}          {=[[[[[[=}
    right/middled             {=>>>>>>=}          {=]]]]]]=}
    infix currency/middled    {=>>$<< Esc}        {=]]$[[ Esc}
    eurocomma'd/middled       {>.>>>.>>>,<<=}     {].]]].]]],[[=}
    etc.

    left/bottomed             {_<<<<<<_}          {_[[[[[[_}
    right/bottomed            {_>>>>>>_}          {_]]]]]]_}
    etc.

What a block art thou...

When data is interpolated into a line field, the field grabs as much of the data as will fit on a single line, formats that data appropriately, and interpolates it into the format.

That means that if we use a one-line field, it only shows as much of the data as will fit on one line. For example:

    my $data1 = 'By the pricking of my thumbs, something wicked this way comes';
    my $data2 = 'A horse! A horse! My kingdom for a horse!';

    print form
        "...{<<<<<<<<<<<<<<<<<}...{>>>>>>>}...",
            $data1,               $data2;

prints:

    ...By the pricking of ... A horse!...

On the other hand, if our format string used block fields instead, the fields would extract one line of data at a time, repeating that process as many times as necessary to display all the available data. So:

    print form
        "...{[[[[[[[[[[[[[[[[[}...{]]]]]]]}...",
            $data1,               $data2;

would produce:

    ...By the pricking of ... A horse!...
    ...my thumbs,         ... A horse!...
    ...something wicked   ...       My...
    ...this way comes     ...  kingdom...
    ...                   ...    for a...
    ...                   ...   horse!...

We can mix line fields and block fields in the same format and form will extract and interpolate only as much data as each field requires. For example:

    print form
        "...{<<<<<<<<<<<<<<<<<}...{]]]]]]]}...",
            $data1,               $data2;

which produces:

    ...By the pricking of ... A horse!...
    ...                   ... A horse!...
    ...                   ...       My...
    ...                   ...  kingdom...
    ...                   ...    for a...
    ...                   ...   horse!...

Notice that, after the first line, the single-line {<<<<<<} field is simply replaced by the appropriate number of space characters, to keep the columns correctly aligned.
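For readers who'd like to see that line-versus-block behaviour spelled out, here is a rough model of it in Python (not Perl 6, and certainly not the code inside Form.pm). The names field_lines and render are invented for this sketch, the greedy word-wrapping is borrowed from Python's textwrap module, and only left and right alignment are handled:

```python
import textwrap
from itertools import zip_longest

def field_lines(data, width, block):
    """Split data into the chunks a field of this width would show."""
    chunks = textwrap.wrap(data, width) or [""]
    # A block field keeps every chunk; a line field keeps only the first.
    return chunks if block else chunks[:1]

def render(sep, fields):
    """fields is a list of (data, width, align, block); align is '<' or '>'."""
    columns = [field_lines(data, width, block) for data, width, _, block in fields]
    rows = []
    # Fields that run out of data contribute empty (space-filled) cells.
    for chunks in zip_longest(*columns, fillvalue=""):
        cells = [format(chunk, f"{align}{width}")
                 for chunk, (_, width, align, _) in zip(chunks, fields)]
        rows.append(sep + sep.join(cells) + sep)
    return rows

data1 = "By the pricking of my thumbs, something wicked this way comes"
data2 = "A horse! A horse! My kingdom for a horse!"

# One line field (19 columns, left) beside one block field (9 columns, right).
print("\n".join(render("...", [(data1, 19, "<", False), (data2, 9, ">", True)])))
```

Passing True as the last element of both tuples models the all-block format instead. The exhausted line field is padded with spaces simply because an empty chunk is formatted to the full field width.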

The usual reason for mixing line and block fields in this way is to allow numbered or bulleted points:

    print "I couldn't do my English Lit homework because...\n\n";

    for @reasons.kv -> $index, $reason {
        my $n = @reasons - $index ~ '.';
        print form "   {>}  {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}",
                       $n,  $reason,
                   "";
    }

which might produce:

    I couldn't do my English Lit homework because...

         10. Three witches told me I was going to be    
             king.                                      

          9. I was busy explaining wherefore am I Romeo.

          8. I was busy scrubbing the blood off my      
             hands.                                     

          7. Some dear friends had to charge once more  
             unto the breach.                           

          6. My so-called best friend tricked me into   
             killing my wife.                           

          5. My so-called best friend tricked me into   
             killing Caesar.                            

          4. My so-called best friend tricked me into   
             taming a shrew.                            

          3. My uncle killed my father and married my   
             mother.                                    

          2. I fell in love with my manservant, who was
             actually the disguised twin sister of the
             man that my former love secretly married,
             having mistaken him for my manservant who
             was wooing her on my behalf whilst secretly
             in love with me.

          1. I was abducted by fairies.                 

And mark what way I make...

Obviously, as a call to form builds up each line of its output – extracting data from one or more data arguments and formatting it into the corresponding fields – it needs to keep track of where it's up to in each datum. It does this by progressively updating the .pos of each datum, in exactly the same way as a pattern match does.

And as with a pattern match, by default that updated .pos is only used internally and not preserved after the call to form is finished. So passing a string to form doesn't interfere with any other pattern matching or text formatting that we might subsequently do with that data.

However, sometimes we do want to know how much of our data a call to form managed to extract and format. Or we may want to split a formatting task into several stages, with separate calls to form for each stage. So we need a way of telling form to preserve the .pos information in our data.

But, if we want to apply a series of form calls to the same data we also need to be able to tell form to respect the .pos information of that data – to start extracting from the previously preserved .pos position, rather than from the start of the string.

To achieve both those goals, we use a follow-on field. That is, we use an ordinary field but mark it as .pos-sensitive with a special notation: Unicode ellipses or ASCII colons at either end. So instead of {<<<<>>>>}, we'd write {…<<<>>>…} or {:<<<>>>:}.

Note that each ellipsis is a single, one-column wide Unicode HORIZONTAL ELLIPSIS character (\c[2026]), not three separate dots. The connotation of the ellipses is "...then keep on formatting from where you previously left off, remembering there's probably still more to come...". And the colons are the ASCII symbol most like a single character ellipsis (try tilting your head and squinting).

Follow-on fields are most useful when we want to split a formatting task into distinct stages – or iterations – but still allow the contents of the follow-on field to flow uninterrupted from line to line. For example:

    print "The best Shakespearean roles are:\n\n";

    for @roles -> $role {
        print form "   * {<<<<<<<<<<<<<<<<<<<<<<<<<<<<}   *{…<<<<<<<>>>>>>>…}*",
                         $role,                            $disclaimer;
    }

which produces:

    The best Shakespearean roles are:

       * Macbeth                          *WARNING:          *
       * King Lear                        *This list of roles*
       * Juliet                           *constitutes      a*
       * Othello                          *personal   opinion*
       * Hippolyta                        *only and is in  no*
       * Don John                         *way  endorsed   by*
       * Katerina                         *Shakespeare'R'Us. *
       * Richard                          *It   may   contain*
       * Malvolio                         *nuts.             *
       * Bottom                           *                  *

The multiple calls to form manage to produce a coherent disclaimer because the ellipses in the second field tell each call to start extracting data from $disclaimer at the offset indicated by $disclaimer.pos, and then to update $disclaimer.pos with the final position at which the field extracted data. So the next time form is called, the follow-on field starts extracting from where it left off in the previous call.

Follow-on fields are similar to ^<<<<< fields in a Perl 5 format, except they don't destroy the contents of a data source; they merely change that data source's .pos marker.
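The bookkeeping behind a follow-on field can be sketched in a few lines of Python. This is only a model under stated assumptions: the Source class, take_line and follow_on are names made up for the illustration, the remembered offset stands in for Perl 6's .pos, and the field is simply left-justified rather than fully justified:

```python
class Source:
    """A datum that remembers how far formatting has got (a stand-in for .pos)."""
    def __init__(self, text):
        self.text = text
        self.pos = 0

def take_line(text, pos, width):
    """Return (chunk, new_pos): the longest run of whole words from pos that fits."""
    while pos < len(text) and text[pos] == " ":
        pos += 1                          # skip the gap left by the previous chunk
    end = min(pos + width, len(text))
    if end < len(text) and text[end] != " ":
        cut = text.rfind(" ", pos, end)   # we stopped mid-word, so back up
        if cut > pos:
            end = cut
    return text[pos:end].rstrip(), end

def follow_on(src, width):
    """Extract one line starting at src.pos, then record where we stopped."""
    chunk, src.pos = take_line(src.text, src.pos, width)
    return chunk.ljust(width)

disclaimer = Source("WARNING: This list of roles constitutes a personal opinion")
for role in ["Macbeth", "King Lear", "Juliet", "Othello"]:
    print(f"   * {role:<12} *{follow_on(disclaimer, 18)}*")
```

Each pass through the loop picks up the disclaimer where the previous pass stopped, and the text itself is never modified; only the stored offset moves.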

Therefore, put you in your best array...

Data, especially numeric data, is often stored in arrays. So form also accepts arrays as data arguments. It can do so because its parameter list is defined as:

        sub form (Str|Array|Pair *@args is context(Scalar)) {...}

which means that although its arguments may include one or more arrays, each such array argument is nevertheless evaluated in a scalar context. Which, in Perl 6, produces an array reference.

In other words, array arguments don't get flattened automatically, so form doesn't lose track of where in the argument list one array finishes and the next begins.

Once inside form, each array that was specified as the data source for a field is internally converted to a single string by joining it together with a newline between each element.

The upshot is that, instead of:

    print "The best Shakespearean roles are:\n\n";

    for @roles -> $role {
        print form "   * {<<<<<<<<<<<<<<<<<<<<<<<<<<<<}   *{…<<<<<<<>>>>>>>…}*",
                         $role,                            $disclaimer;
    }

we could just write:

    print "The best Shakespearean roles are:\n\n";

    print form "   * {[[[[[[[[[[[[[[[[[[[[[[[[[[[[}   *{[[[[[[[[]]]]]]]]}*",
                     @roles,                           $disclaimer;

And the array of roles would be internally converted to a single string, with one role per line. Note that we also changed the disclaimer field to a regular block field, so that the entire disclaimer would be formatted. And there was no longer any need for the disclaimer field to be a follow-on field, since the block field would extract and format the entire disclaimer anyway.

Note, however, that this block-based approach wouldn't work so well if one of the elements of @roles was too big to fit on a single line. In that case we might end up with something like the following:

   The best Shakespearean roles are:

      * Either of the 'two foolish             *WARNING:          *
      * officers': Dogberry and Verges         *This list of roles*
      * That dour Scot, the Laird              *constitutes      a*
      * Macbeth                                *personal   opinion*
      * The tragic Moor of Venice,             *only and is in  no*
      * Othello                                *way  endorsed   by*
      * Rosencrantz's good buddy               *Shakespeare'R'Us. *
      * Guildenstern                           *It   may   contain*
      * The hideous and malevolent             *nuts.             *
      * Richard III                            *                  *

rather than:

   The best Shakespearean roles are:

      * Either of the 'two foolish             *WARNING:          *
        officers': Dogberry and Verges         *This list of roles*
      * That dour Scot, the Laird              *constitutes      a*
        Macbeth                                *personal   opinion*
      * The tragic Moor of Venice,             *only and is in  no*
        Othello                                *way  endorsed   by*
      * Rosencrantz's good buddy               *Shakespeare'R'Us. *
        Guildenstern                           *It   may   contain*
      * The hideous and malevolent             *nuts.             *
        Richard III                            *                  *

That's because the "*" that's being used as a bullet for the first column is a literal (i.e. mere decoration), and so it will be repeated on every line that is formatted, regardless of whether that line is the start of a new element of @roles or merely the broken-and-wrapped remains of the previous element. Happily, as we shall see later, this particular problem has a simple solution.

Despite these minor complications, array data sources are particularly useful when formatting, especially if the data is known to fit within the specified width. For example:

    print form
        '-------------------------------------------',   
        'Name             Score   Time  | Normalized',   
        '-------------------------------------------',   
        '{[[[[[[[[[[[[}   {III}   {II}  |  {]]].[[} ',
         @name,           @score, @time,   [@score »/« @time];

is a very easy way to produce the table:

    -------------------------------------------
    Name             Score   Time  | Normalized
    -------------------------------------------
    Thomas Mowbray    88      15   |     5.867
    Richard Scroop    54      13   |     4.154
    Harry Percy       99      18   |     5.5  

Note the use of the Perl6-ish listwise division (»/«) to produce the array of data for the "Normalized" column.

More particulars must justify my knowledge...

The most commonly used fields are those that justify their contents: to the left, to the right, to the left and right, or towards the centre.

Left-justified and right-justified fields extract from their data source the largest substring that will fit inside them, push that string to the left or right as appropriate, and then pad the string out to the required field width with spaces (or the nominated fill character).

Centred fields ({>>>><<<<} and {]]]][[[[}) likewise extract as much data as possible, and then pad both sides of it with (near) equal numbers of spaces. If the amount of padding required is not evenly divisible by 2, the one extra space is added after the data.

There is a second syntax for centred fields – a tip-o'-the-hat to Perl 5 formats: {|||||||||} and {IIIIIIII}. This variant also makes it easier to specify centering fields that are only three columns wide: {|} and {I}.

Note, however, that the behaviour of centering fields specified this way is exactly the same in every respect as the bracket-based versions, so we're free to use whichever we prefer.
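The padding rules for these three kinds of field are simple enough to model directly. The following Python sketch assumes the data has already been cut down to fit the field; the function name justify and its parameters are inventions for the illustration, not part of Form.pm:

```python
def justify(text, width, how, fill=" "):
    """Pad text out to width; how is 'left', 'right' or 'centre'."""
    pad = width - len(text)
    if pad < 0:
        raise ValueError("text is wider than the field")
    if how == "left":
        return text + fill * pad
    if how == "right":
        return fill * pad + text
    if how == "centre":
        before = pad // 2                 # when pad is odd, the extra space...
        return fill * before + text + fill * (pad - before)   # ...goes after the data
    raise ValueError(f"unknown alignment: {how}")

print("[" + justify("Iago", 9, "centre") + "]")
```

Python's own str.center was deliberately avoided here because it does not always put the odd space after the data, which is what the rule above requires.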

Fully justified fields ({<<<<>>>>} and {[[[[]]]]}) extract a maximal substring and then distribute any padding as evenly as possible into the existing whitespace gaps in that data. For example:

    print form '({<<<<<<<<<>>>>>>>>>>>})',
               "A fellow of infinite jest, of most excellent fancy";

would print:

    (A fellow  of  infinite)

A fully-justified block field ({[[[[]]]]}) does the same across multiple lines, except that the very last line is always left-justified. Hence, this:

    print form '({[[[[[[[[]]]]]]]})',
               "All the world's a stage, And all the men and women merely players.";

would print:

    (All the world's a)
    (stage,  And   all)
    (the men and women)
    (merely players.  )

By the way, with both centred fields ({>>>><<<}) and fully justified fields ({<<<>>>>}), the actual number of left vs right arrows is irrelevant, so long as there is at least one of each.
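Here is one way full justification might be modelled in Python. It is a sketch rather than Form.pm's actual algorithm: justify_line and justify_block are hypothetical names, the line breaking is delegated to textwrap, and the rule that any leftover spaces go into the rightmost gaps is an assumption inferred from the two sample outputs above:

```python
import textwrap

def justify_line(line, width):
    """Spread the padding over the existing gaps between words."""
    words = line.split()
    gaps = len(words) - 1
    if gaps < 1:
        return line.ljust(width)          # nothing to stretch
    spare = width - sum(len(word) for word in words)
    base, extra = divmod(spare, gaps)
    out = words[0]
    for i, word in enumerate(words[1:]):
        # Assumption: the last `extra` gaps each receive one additional space.
        out += " " * (base + (1 if i >= gaps - extra else 0)) + word
    return out

def justify_block(text, width):
    """Fully justify every line except the last, which is left-justified."""
    lines = textwrap.wrap(text, width)
    return ([justify_line(line, width) for line in lines[:-1]]
            + [line.ljust(width) for line in lines[-1:]])

quote = "All the world's a stage, And all the men and women merely players."
for row in justify_block(quote, 17):
    print("(" + row + ")")
```

Under that assumption the sketch reproduces both examples in this section, though other distributions of the spare spaces would be equally defensible.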

What, is't too short?

One special case we need to consider is an empty set of field delimiters:

    form 'ID number: {}'

This specification is treated as a two-column-wide, left-justified block field (since that seems to be the type of two-column-wide field most often required).

Other kinds of two-column (and single-column) fields can also be created using imperative field widths and user-defined fields.

Command our present numbers be muster'd...

A field specifier of the form {>>>>.<<} or {]]]].[[} represents a decimal-aligned numeric field. The decimal marker always appears in exactly the position indicated and the rest of the number is aligned around it. The decimal places are rounded to the specific number of places indicated, but only "significant" digits are shown. For example:

    @nums = (1, 1.2, 1.23, 11.234, 111.235, 1.0001);

    print form "Thy score be: {]]]].[[}",
                              @nums; 

prints:

    Thy score be:     1.0
    Thy score be:     1.2
    Thy score be:     1.23
    Thy score be:    11.234
    Thy score be:   111.235
    Thy score be:     1.000

The points are all aligned, the minimal number of decimal places are shown, and the decimals are rounded (using the same rounding protocol that printf employs). Note in particular that, even though both 1 and 1.0001 would normally convert to the same 3-decimal-place value (1.000), a form call only shows all three zeros in the second case since only in the second case are they "significant".

In other words, unless we tell it otherwise, form tries to avoid displaying a number with more accuracy than it actually possesses (within the constraint that it must always show at least one decimal place).

Here are only numbers ratified.

You're probably wondering what happens if we try to format a number that's too large for the available places (as 123456.78 would be in the above format). Whereas sprintf would extend a numeric field to accommodate the number, form insists on preserving the specified layout; in particular, the position of the decimal point. But it obviously can't just cut off the extra high-order digits; that would change the value:

    Thy score be: 23456.78 

So, instead, it indicates that the number doesn't fit by filling the field with octothorpes (the way many spreadsheets do):

    Thy score be: #####.###

Note, however, that it is possible to change this behaviour should we need to.

It's also possible that someone (not you, of course!) might attempt to pass a numeric field some data that isn't numeric at all:

    my @mixed_data = (1, 2, "three", {4=>5}, "6", "7-Up");

    print form 'Thy score be: {]]]].[[}', 
                              @mixed_data;

Unlike Perl itself, form doesn't autoconvert non-numeric values. Instead it marks them with another special string, by filling the field with question-marks:

    Thy score be:     1.0  
    Thy score be:     2.0  
    Thy score be: ?????.???
    Thy score be: ?????.???
    Thy score be:     6.0  
    Thy score be: ?????.???

Note that strings per se aren't a problem – form will happily convert strings that contain valid numbers, such as "6" in the above example. But it does reject strings that contain anything else besides a number (even when Perl itself would successfully convert the number – as it would for "7-Up" above).
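To pin down the numeric rules described so far (point alignment, "significant" decimals, octothorpes for overflow and question marks for non-numbers), here is a small Python model. It is a sketch under assumptions, not Form.pm itself: numeric_field and its parameters are invented names, "significance" is approximated by counting the decimal digits in the datum's own string form, and rounding uses Python's Decimal default (half-even), which may differ from printf in edge cases:

```python
from decimal import Decimal, InvalidOperation

def numeric_field(value, int_width=5, frac_width=3, point="."):
    """Model a {]]]].[[}-style field: int_width columns, the point, frac_width columns."""
    try:
        number = Decimal(str(value).strip())
    except InvalidOperation:
        number = None
    if number is None or not number.is_finite():
        return "?" * int_width + point + "?" * frac_width   # not a number
    # Show no more decimals than the datum itself has, but always at least one.
    own_places = max(0, -number.as_tuple().exponent)
    places = max(1, min(frac_width, own_places))
    whole, frac = format(number, f".{places}f").split(".")
    if len(whole) > int_width:
        return "#" * int_width + point + "#" * frac_width   # too big to fit
    return whole.rjust(int_width) + point + frac.ljust(frac_width)

for datum in (1, 1.2, 11.234, 1.0001, 123456.78, "6", "7-Up"):
    print("Thy score be: " + numeric_field(datum))
```

Note that a minus sign counts as part of the integer portion in this model, so a negative number overflows one digit sooner than a positive one.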

Those who'd prefer Perl's usual, more laissez-faire attitude to numerical conversion can just pre-numerify the values themselves using the unary numerification operator (shown here in its list form – +« – since we have an array of values to be numerified):

    print form 'Thy score be: {]]]].[[}',
                              +« @mixed_data;

This version would print:

    Thy score be:     1.0  
    Thy score be:     2.0  
    Thy score be:     0.0  
    Thy score be:     1.0  
    Thy score be:     6.0  
    Thy score be:     7.0  

(The 1.0 on the fourth line appears because Perl 6 hashes numerify to the number of entries they contain).

See how the giddy multitude do point...

Of course, not everyone uses a dot for their decimal point. The other main contender is the comma, and naturally form supports that as well. If we specify a numeric field with a comma between the brackets:

    @les_nums = (1, 1.2, 1.23, 11.234, 111.235, 1.0001);

    print form 'Votre score est: {]]]],[[}',
                                 @les_nums; 

the call prints:

    Votre score est:     1,0
    Votre score est:     1,2
    Votre score est:     1,23
    Votre score est:    11,234
    Votre score est:   111,235
    Votre score est:     1,000

In fact, form is extremely flexible about the characters we're allowed to use as a decimal marker: anything except an angle- or square bracket or a plus sign is acceptable.

As a bonus, form allows us to use the specified decimal marker in the data as well as in the format. So this works too:

    @les_nums = ("1", "1,2", "1,23", "11,234", "111,235", "1,0001");

    print form 'Votre score est: {]]]],[[}',
                                 @les_nums;

Or else be impudently negative...

Negative numbers work as expected, with the minus sign taking up one column of the field's allotted span:

    @nums = ( 1, -1.2,  1.23, -11.234,  111.235, -12345.67);

    print form 'Thy score be: {]]]].[[}',
                              @nums;

This would print:

    Thy score be:     1.0  
    Thy score be:    -1.2  
    Thy score be:     1.23 
    Thy score be:   -11.234
    Thy score be:   111.235
    Thy score be: #####.###

However, form can also format numbers so that the minus sign trails the number. To do that we simply put an explicit minus sign inside the field specification, at the end:

    print form 'Thy score be: {]]]].[[-}',
                              @nums;

which would then print:

    Thy score be:     1.0   
    Thy score be:     1.2-  
    Thy score be:     1.23  
    Thy score be:    11.234-
    Thy score be:   111.235 
    Thy score be: 12345.67- 

form also understands the common financial usage where negative numbers are represented as positive numbers in parentheses. Once again, we draw an abstract picture of what we want (by putting parens at either end of the field specification):

    print form 'Thy dividend be: {(]]]].[[)}',
                                 @nums;

and form obliges:

    Thy dividend be:      1.0   
    Thy dividend be:     (1.2)  
    Thy dividend be:      1.23  
    Thy dividend be:    (11.234)
    Thy dividend be:    111.235 
    Thy dividend be: (12345.67) 
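
The three negation styles shown so far (leading minus, trailing minus, and parentheses) can be sketched in Python as follows. This is a simplified model rather than form's implementation: the function name `show_negative` and its parameters are hypothetical, and it always prints a fixed number of decimal places, whereas form only prints the digits that are present.

```python
def show_negative(value, style="leading", places=2):
    """Render `value` using one of three negation styles.

    style: "leading"  -> -1.20
           "trailing" -> 1.20-
           "parens"   -> (1.20)
    Non-negative values are padded with spaces where the negation
    markers would go, so that the digits stay aligned in a column.
    Hypothetical helper for illustration only.
    """
    digits = f"{abs(value):.{places}f}"
    negative = value < 0
    if style == "leading":
        return ("-" if negative else " ") + digits
    if style == "trailing":
        return digits + ("-" if negative else " ")
    if style == "parens":
        return f"({digits})" if negative else f" {digits} "
    raise ValueError(f"unknown style: {style}")
```

The padding for non-negative values mirrors the way the output above keeps every decimal point in the same column, whichever negation style is chosen.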

Note that the parens have to go inside the field's braces. Otherwise, they're just literal parts of the format string:

    print form 'Thy dividend be: ({]]]].[[})',
                                  @nums;

and we'd get:

    Thy dividend be: (    1.0  ) 
    Thy dividend be: (   -1.2  )  
    Thy dividend be: (    1.23 )
    Thy dividend be: (  -11.234)
    Thy dividend be: (  111.235)
    Thy dividend be: (#####.###) 

And stand a comma 'tween their amities...

If we add so-called "thousands separators" inside a numeric field at the usual places, form includes them appropriately in its output. It can handle the five major formatting conventions:

    my @nums = (0, 1, 1.1, 1.23, 4567.89, 34567.89, 234567.89, 1234567.89);

    print form
        "Britannic       Continental     Subcontinental   Tyrolean        Asiatic",
        "_____________   _____________   ______________   _____________   _____________",
        "{],]]],]]].[}   {].]]].]]],[}    {]],]],]]].[}   {]']]]']]],[}   {]]]],]]]].[}",
         @nums,          @nums,          @nums,           @nums,          @nums;

to produce:

    Britannic       Continental     Subcontinental   Tyrolean        Asiatic
    _____________   _____________   ______________   _____________   _____________
             0.0             0,0              0.0             0,0             0.0 
             1.0             1,0              1.0             1,0             1.0 
             1.1             1,1              1.1             1,1             1.1 
             1.23            1,23             1.23            1,23            1.23
         4,567.89        4.567,89         4,567.89        4'567,89         4567.89
        34,567.89       34.567,89        34,567.89       34'567,89       3,4567.89
       234,567.89      234.567,89      2,34,567.89      234'567,89      23,4567.89
     1,234,567.89    1.234.567,89     12,34,567.89    1'234'567,89     123,4567.89

It also accepts a space character as a "thousands separator" (with, of course, any decimal marker we might like):

    print form
        "Hyperspatial",
        "_____________",
        "{] ]]] ]]]:[}",
         @nums;

to produce:

    Hyperspatial
    _____________
             0:0 
             1:0 
             1:1 
             1:23
         4 567:89
        34 567:89
       234 567:89
     1 234 567:89
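
The grouping conventions above differ only in the separator character and in the sizes of the digit groups. A minimal Python sketch of that rule follows; the helper name `group_digits` and its `first` and `rest` parameters are assumptions introduced for illustration and do not correspond to anything in Form.pm.

```python
def group_digits(integer_text, sep=",", first=3, rest=3):
    """Insert `sep` into a string of digits, working from the right.

    `first` is the size of the rightmost group and `rest` the size of
    every group to its left: 3/3 gives "1,234,567", 3/2 gives
    "12,34,567", and 4/4 gives "123,4567".
    Hypothetical helper for illustration only.
    """
    groups = [integer_text[-first:]]
    remaining = integer_text[:-first]
    while remaining:
        groups.append(remaining[-rest:])
        remaining = remaining[:-rest]
    return sep.join(reversed(groups))
```

Called on the integer part of the last sample value, `group_digits("1234567")` gives the first column's grouping, `group_digits("1234567", first=3, rest=2)` the Subcontinental one, and `group_digits("1234567", sep=" ")` the space-separated variant.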


And gives to airy nothing a local habitation and a name

Of course, sometimes we don't know ahead of time just where in the world our formatted numbers will be displayed. Locales were invented to address that very problem, and form supports them.

If we use the :locale option, form detects the current locale and converts any numerical formats it finds to the appropriate layout. For example, if we wrote:

    @nums = ( 1, -1.2,  1.23, -11.234,  111.235, -12345.67);

    print form 
            "{],]]],]]].[[}",
            @nums; 

then we'd get:

          1.0
         -1.2
          1.23
        -11.234
        111.235
    -12,345.67

wherever the program was run. But if we had written:

    print form
            :locale,
            "{],]]],]]].[[}",
            @nums; 

then we'd get:

          1.0
         -1.2
          1.23
        -11.234
        111.235
    -12,345.67

or:

          1,0
          1,2-
          1,23
         11,234-
        111,235
     12.345,67-

or:

          1,0
         (1,2)
          1,23
        (11,234)
        111,235
    (12'345,67)

or whatever else the current locale indicated was the correct local layout for numbers.

That is, when the :locale option is specified, form ignores the actual decimal point, thousands separator, and negation sign we specified in the call, and instead uses the values for these markers that are returned by the POSIX localeconv function. That means that we can specify our numerical formatting in a style that seems natural to us, and at the same time allow the numbers to be formatted in a style that seems natural to the user.
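
To make the role of localeconv concrete, here is a small Python sketch that reads the same three markers through Python's standard `locale.localeconv()` and substitutes them into an already formatted number. It is an illustration under stated assumptions, not a description of how form is implemented: the helper names `locale_markers` and `relabel` are hypothetical, and the sketch ignores the position of the negation sign, which a real implementation would also take from the locale.

```python
import locale


def locale_markers():
    """Return the markers the current locale reports via localeconv()."""
    conv = locale.localeconv()
    return {
        "decimal_point": conv["decimal_point"],
        "thousands_sep": conv["thousands_sep"],
        "negative_sign": conv["negative_sign"],
    }


def relabel(number_text, markers, decimal=".", thousands=","):
    """Swap the markers written in `number_text` for the locale's own.

    `decimal` and `thousands` are the markers used by the programmer;
    `markers` is a dictionary like the one locale_markers() returns.
    Hypothetical helper for illustration only.
    """
    out = []
    for ch in number_text:
        if ch == decimal:
            out.append(markers["decimal_point"])
        elif ch == thousands:
            out.append(markers["thousands_sep"])
        else:
            out.append(ch)
    return "".join(out)
```

Note that a Python program starts in the "C" locale for numeric formatting, so it would need to call `locale.setlocale(locale.LC_ALL, "")` before `locale_markers()` reflects the user's settings.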

Thou shalt have my best gown to make thee a pair...

Wait a minute...

Where exactly did we conjure that :locale syntax from? And what, exactly, did it create? What is an "option"?

Well, we're passing :locale as an argument to form, and form's signature guarantees us that it can only accept a Str, or an Array, or a Pair as an argument. So an "option" must be one of those three types, and that funky :identifier syntax must be a constructor for the equivalent data structure.

And indeed, that's the case. An "option" is just a pair, and the funky :identifier syntax is just another way of writing a pair constructor.

The standard "option" syntax is:

    :key( "value" )

which is identical in effect to:

    key => "value"

Both specify an autoquoted key; both associate that key with a value; both evaluate to a pair object that contains the key and value. So why have a second syntax for pairs?

Because it allows us to optimize the pair constructor syntax in two different ways. The now-familiar "fat arrow" pair constructor takes a key and a value, each of which can be of any type. In contrast, the key of an "option" pair constructor can only be an identifier, which is always autoquoted...at compile-time. So, if we use the "option" syntax we're guaranteed that the key of the resulting pair is a string, that the string contains a valid identifier, and that the compiler can check that validity before the program starts.

Moreover, whereas the "fat arrow" has only one syntax, "options" have several highly useful syntactic variations. "Fat arrow" pairs can be especially annoying when we want to use them to pass named boolean arguments to a subroutine. For example:

    duel( $person1, $person2, to_death=>1, no_quarter=>1, left_handed=>1, bonetti=>1, capoferro=>1 );

In contrast, "options" have a special default behaviour. If we leave off their parenthesized value entirely, the implied value is 1. So we could rewrite the preceding function call as:

    duel( $person1, $person2, :to_death, :no_quarter, :left_handed, :bonetti, :capoferro );

Better still, when we have a series of options, we don't have to put commas between them:

    duel( $person1, $person2, :to_death :no_quarter :left_handed :bonetti :capoferro );

That makes them even more concise and uncluttered, especially in use statements:

    use POSIX :errno_h :fcntl_h :time_h;

There are other handy "option" variants as well, all of which simply substitute the parentheses following their key for some other kind of bracket (and hence some other kind of value). The full list of "option"...err...options is:

      Option syntax              Is equivalent to
    ==================     =============================

    :key("some value")     key => "some value"

    :key                   key => 1

    :key{ a=>1, b=>2 }     key => { a=>1, b=>2 }

    :key{ $^arg * 2; }     key => { $^arg * 2; }

    :key[ 1, 2, 3, 4 ]     key => [ 1, 2, 3, 4 ]

    :key«eat at Joe's»     key => ["eat", "at", "Joe's"]

Despite the deliberate differences in conciseness and flexibility, we can use "options" and "fat arrows" interchangeably in almost every situation where we need to construct a pair (except, of course, where the key needs to be something other than an identifier string, in which case the "fat arrow" is the only alternative). To illustrate that interchangeability, we'll use the "option" syntax throughout most of the rest of this discussion, except where using a "fat arrow" is clearly preferable for code readability.

Meanwhile, back in the fields...

Some tender money to me...

Formatting numbers gets even trickier when those numbers represent money. But form simply lets us specify how the local currency looks – including leading, trailing, or infix currency markers; leading, trailing, or circumfix negation markers; thousands separators; etc. – and then it formats it that way. For example:

    my @amounts = (0, 1, 1.2345, 1234.56, -1234.56, 1234567.89);

    my %format = (
        "Canadian (English)"    => q/   {-$],]]],]]].[}/,
        "Canadian (French)"     => q/    {-] ]]] ]]],[ $}/,
        "Dutch"                 => q/     {],]]],]]].[-EUR}/,
        "German (pre-euro)"     => q/    {-].]]].]]],[DM}/,
        "Indian"                => q/    {-]],]],]]].[ Rs}/,
        "Norwegian"             => q/ {kr -].]]].]]],[}/,
        "Portuguese (pre-euro)" => q/    {-].]]].]]]$[ Esc}/,
        "Swiss"                 => q/{Sfr -]']]]']]].[}/,
    );

    for %format.kv -> $nationality, $layout {
        print form "$nationality:",
                   "    $layout",
                        @amounts,
                   "\n";
    }

produces:

    Swiss:
                  Sfr 0.0 
                  Sfr 1.0 
                  Sfr 1.23
              Sfr 1'234.56
             Sfr -1'234.56
          Sfr 1'234'567.89

    Canadian (French):
                      0,0 $ 
                      1,0 $ 
                      1,23 $
                  1 234,56 $
                 -1 234,56 $
              1 234 567,89 $

    Dutch:
                      0.0EUR  
                      1.0EUR  
                      1.23EUR 
                  1,234.56EUR 
                  1,234.56-EUR
              1,234,567.89EUR 

    Norwegian:
                   kr 0,0 
                   kr 1,0 
                   kr 1,23
               kr 1.234,56
              kr -1.234,56
           kr 1.234.567,89

    German (pre-euro):
                      0,0DM 
                      1,0DM 
                      1,23DM
                  1.234,56DM
                 -1.234,56DM
              1.234.567,89DM

    Indian:
                      0.0 Rs 
                      1.0 Rs 
                      1.23 Rs
                  1,234.56 Rs
                 -1,234.56 Rs
              12,34,567.89 Rs

    Portuguese (pre-euro):
                      0$0 Esc 
                      1$0 Esc 
                      1$23 Esc
                  1.234$56 Esc
                 -1.234$56 Esc
              1.234.567$89 Esc

    Canadian (English):
                     $0.0 
                     $1.0 
                     $1.23
                 $1,234.56
                -$1,234.56
             $1,234,567.89

Nice, eh?

Able verbatim to rehearse...

But sometimes too nice. Sometimes all we want is an existing block of data laid out into columns – without any fancy reformatting or rejustification. For example, suppose we have an interesting string like this:

    $diagram = <<EODNA;
       G==C
         A==T
           T=A
           A=T
         T==A
       G===C
      T==A
     C=G
    TA
    AT
     A=T
      T==A
        G===C
          T==A
    EODNA

and we'd like to put it beside some other text. Because it's already carefully formatted, we really don't want to interpolate it into a left-justified field:

    print form
        '{[[[[[[[[[[[[[[[[[[[]]]]]]]]]]]]]]]]]]]]]]}       {[[[[[[[[[[[[[[[}',
         $diatribe,                                        $diagram;

Because that would squash our lovely helix:

    Men at  some  time  are  masters  of  their       G==C             
    fates: / the fault, dear Brutus, is not  in       A==T             
    our genes, / but in ourselves, that we  are       T=A              
    underlings.  /  Brutus  and  Caesar:   what       A=T              
    should be in that 'Caesar'?  /  Why  should       T==A             
    that DNA be sequenced more  than  yours?  /       G===C            
    Extract them together, yours is as  fair  a       T==A             
    genome; / transcribe them, it  doth  become       C=G              
    mRNA as well; / recombine them,  it  is  as       TA               
    long; clone with 'em, / Brutus will start a       AT               
    twin as soon as Caesar. / Now, in the names       A=T              
    of all  the  gods  at  once,  /  upon  what       T==A             
    proteins doth our Caesar feed, / that he is       G===C            
    grown so great?                                   T==A             

Nor would right-, full-, centre- or numeric- justification help in this instance. What we really need is "leave-it-the-hell-alone" justification – a field specifier that lays out the data exactly as it is, leading whitespace included.

And that's the purpose of a verbatim field. A verbatim single-line field ({'''''''''}) grabs the next line of data it's offered and inserts as much of it as will fit in the field's width, preserving whitespace "as is". Likewise a verbatim block field ({"""""""""}) grabs every line of the data it's offered and interpolates it into the text without any reformatting or justification.

And that's precisely what we needed for our diagram:

    print form
        '{[[[[[[[[[[[[[[[[[[[]]]]]]]]]]]]]]]]]]]]]]}       {"""""""""""""""}',
         $diatribe,                                        $diagram;

to produce:

    Men at  some  time  are  masters  of  their          G==C          
    fates: / the fault, dear Brutus, is not  in            A==T        
    our genes, / but in ourselves, that we  are              T=A       
    underlings.  /  Brutus  and  Caesar:   what              A=T       
    should be in that 'Caesar'?  /  Why  should            T==A        
    that DNA be sequenced more  than  yours?  /          G===C         
    Extract them together, yours is as  fair  a         T==A           
    genome; / transcribe them, it  doth  become        C=G             
    mRNA as well; / recombine them,  it  is  as       TA               
    long; clone with 'em, / Brutus will start a       AT               
    twin as soon as Caesar. / Now, in the names        A=T             
    of all  the  gods  at  once,  /  upon  what         T==A           
    proteins doth our Caesar feed, / that he is           G===C        
    grown so great?                                         T==A       

Note that, unlike other types of fields, verbatim fields don't break and wrap their data if that data doesn't fit on a single line. Instead, they truncate each line to the appropriate field width. So a too-short verbatim field:

    print form
        '{[[[[[[[[[[[[[[[[[[[]]]]]]]]]]]]]]]]]]]]]]}       {""""""}',
         $diatribe,                                        $diagram;

results in gene slicing:

    Men at  some  time  are  masters  of  their          G==C 
    fates: / the fault, dear Brutus, is not  in            A==
    our genes, / but in ourselves, that we  are              T
    underlings.  /  Brutus  and  Caesar:   what              A
    should be in that 'Caesar'?  /  Why  should            T==
    that DNA be sequenced more  than  yours?  /          G===C
    Extract them together, yours is as  fair  a         T==A  
    genome; / transcribe them, it  doth  become        C=G    
    mRNA as well; / recombine them,  it  is  as       TA      
    long; clone with 'em, / Brutus will start a       AT      
    twin as soon as Caesar. / Now, in the names        A=T    
    of all  the  gods  at  once,  /  upon  what         T==A  
    proteins doth our Caesar feed, / that he is           G===
    grown so great?                                         T=

rather than teratogenesis:

    Men at  some  time  are  masters  of  their          G==C 
    fates: / the fault, dear Brutus, is not  in            A=-
    our genes, / but in ourselves, that we  are       =T      
    underlings.  /  Brutus  and  Caesar:   what              -
    should be in that 'Caesar'?  /  Why  should       T=A     
    that DNA be sequenced more  than  yours?  /              -
    Extract them together, yours is as  fair  a       A=T     
    genome; / transcribe them, it  doth  become            T=-
    mRNA as well; / recombine them,  it  is  as       =A      
    long; clone with 'em, / Brutus will start a          G===C
    twin as soon as Caesar. / Now, in the names         T==A  
    of all  the  gods  at  once,  /  upon  what        C=G    
    proteins doth our Caesar feed, / that he is       TA      
    grown so great?                                  AT      
                                                   A=T    
                                                    T==A  
                                                      G==-
                                                  =C      
                                                        T-
                                                  ==A     
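
The truncating behaviour of a verbatim block field can be modelled in a few lines of Python. This is a sketch of the rule described above, not of Form.pm itself; the function name `verbatim_block` is an assumption made up for this example.

```python
def verbatim_block(text, width):
    """Lay out `text` line by line in a column `width` characters wide.

    Each line is truncated to `width` and padded on the right with
    spaces. Leading whitespace is preserved and nothing is wrapped
    or rejustified.
    Hypothetical helper for illustration only.
    """
    return [line[:width].ljust(width) for line in text.splitlines()]


diagram = "   G==C\n     A==T\n       T=A"
column = verbatim_block(diagram, 8)
```

With a width of 8, the second and third lines lose their right-hand ends instead of wrapping onto new lines, which is the "slicing" effect shown in the first of the two outputs above.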

And now at length they overflow their banks.

It's not uncommon for a report to need a series of data fields in one column and then a second column with only a single field, perhaps containing a summary or discussion of the other data. For example, we might want to produce recipes of the form:

    =================[  Hecate's Broth of Ambition  ]=================

      Preparation time:             Method:                           
         66.6 minutes                  Remove the legs from the       
                                       lizard, the wings from the     
      Serves:                          owlet, and the tongue of the   
         2 doomed souls                adder. Set them aside.         
                                       Refrigerate the remains (they  
      Ingredients:                     can be used to make a lovely   
         2 snakes (1 fenny, 1          white-meat stock). Drain the   
         adder)                        newts' eyes if using pickled.  
         2 lizards (1 legless,         Wrap the toad toes in the      
         1 regular)                    bat's wool and immerse in half 
         3 eyes of newt (fresh         a pint of vegan stock in       
         or pickled)                   bottom of a preheated          
         2 toad toes (canned           cauldron. (If you can't get a  
         are fine)                     fresh vegan for the stock, a   
         2 cups of bat's wool          cup of boiling water poured    
         1 dog tongue                  over a vegetarian holding a    
         1 common or spotted           sprouted onion will do). Toss  
         owlet                         in the fenny snake, then the   
                                       legless lizard. Puree the      
                                       tongues together and fold      
                                       gradually into the mixture,    
                                       stirring widdershins at all   
                                       times.  Allow to bubble for 45 
                                       minutes then decant into two   
                                       tarnished copper chalices.         
                                       Garnish each with an owlet     
                                       wing, and serve immediately.   

There are several ways to achieve that effect. The most obvious is to format each column separately and then lay them out side-by-side with a pair of verbatim fields:

    my $prep = form 'Preparation time:        ',
                    '   {<<<<<<<<<<<<<<<<<<<<}', $prep_time,
                    '                         ',
                    'Serves:                  ',
                    '   {<<<<<<<<<<<<<<<<<<<<}', $serves,
                    '                         ',
                    'Ingredients:             ',
                    '   {[[[[[[[[[[[[[[[[[[[[}', $ingredients;

    my $make = form 'Method:                          ',
                    '   {[[[[[[[[[[[[[[[[[[[[[[[[[[[[}',
                        $method;

    print form 
        '=================[ {||||||||||||||||||||||||||} ]=================',
                                      $recipe,
        '                                                                  ',
        '  {"""""""""""""""""""""""}     {"""""""""""""""""""""""""""""""} ',
           $prep,                        $make;

We could even chain the calls to form to eliminate the interim variables:

    print form 
        '=================[ {||||||||||||||||||||||||||} ]=================',
                                      $recipe,
        '                                                                  ',
        '  {"""""""""""""""""""""""}     {"""""""""""""""""""""""""""""""} ',
           form('Preparation time:        ',
                '   {<<<<<<<<<<<<<<<<<<<<}', $prep_time,
                '                         ',
                'Serves:                  ',
                '   {<<<<<<<<<<<<<<<<<<<<}', $serves,
                '                         ',
                'Ingredients:             ',
                '   {[[[[[[[[[[[[[[[[[[[[}', $ingredients,
               ),
           form('Method:                          ',
                '   {[[[[[[[[[[[[[[[[[[[[[[[[[[[[}',
                    $method,
               );

While it's impressive to be able to do that kind of nested formatting (and highly useful in extreme formatting scenarios), it's also far too ungainly for regular use. A cleaner, more maintainable solution is to use a single format and just build the method column up piecemeal, like so:

    print form 
        '=================[ {||||||||||||||||||||||||||} ]=================',
                                      $recipe,
        '                                                                  ',
        'Preparation time:               Method:                           ',
        '   {<<<<<<<<<<<<<<<<<<<<}          {<<<<<<<<<<<<<<<<<<<<<<<<<<<…} ',
            $prep_time,                     $method,
        '                                   {…<<<<<<<<<<<<<<<<<<<<<<<<<<…} ',
                                            $method,
        'Serves:                            {…<<<<<<<<<<<<<<<<<<<<<<<<<<…} ',
                                            $method,
        '   {<<<<<<<<<<<<<<<<<<<<}          {…<<<<<<<<<<<<<<<<<<<<<<<<<<…} ',
            $serves,                        $method,
        '                                   {…<<<<<<<<<<<<<<<<<<<<<<<<<<…} ',
                                            $method,
        'Ingredients:                       {…<<<<<<<<<<<<<<<<<<<<<<<<<<…} ',
                                            $method,
        '   {[[[[[[[[[[[[[[[[[[[[}          {…[[[[[[[[[[[[[[[[[[[[[[[[[[[} ',
            $ingredients,                   $method;

That produces exactly the same result as the previous versions, because each follow-on {…<<<<<<<…} field in the "Method" column grabs one extra line from $method, and then the final follow-on {…[[[[[[} field grabs as many more as are required to lay out the rest of the contents of the variable. The only down-side is that the resulting code is still downright ugly. With all those tedious repetitions of the same variable, there's far too much $method in our madness.

Having a series of follow-on fields like this – vertically continuing a single column across subsequent format lines – is so common that form provides a special shortcut: the {VVVVVVVVV} overflow field.

An overflow field automagically duplicates the field specification immediately above it. The important point being that, because that duplication includes copying the preceding field's data source, overflow fields don't require a separate data source of their own.

Using overflow fields, we could rewrite our recipe generator like this:

    print form 
        '=================[ {||||||||||||||||||||||||||} ]=================',
                                      $recipe,
        '                                                                  ',
        'Preparation time:               Method:                           ',
        '   {<<<<<<<<<<<<<<<<<<<<}          {<<<<<<<<<<<<<<<<<<<<<<<<<<<<} ',
            $prep_time,                     $method,
        '                                   {VVVVVVVVVVVVVVVVVVVVVVVVVVVV} ',
        'Serves:                            {VVVVVVVVVVVVVVVVVVVVVVVVVVVV} ',
        '   {<<<<<<<<<<<<<<<<<<<<}          {VVVVVVVVVVVVVVVVVVVVVVVVVVVV} ',
            $serves,                        
        '                                   {VVVVVVVVVVVVVVVVVVVVVVVVVVVV} ',
        'Ingredients:                       {VVVVVVVVVVVVVVVVVVVVVVVVVVVV} ',
        '   {[[[[[[[[[[[[[[[[[[[[}          {VVVVVVVVVVVVVVVVVVVVVVVVVVVV} ',
            $ingredients,
        '                                   {VVVVVVVVVVVVVVVVVVVVVVVVVVVV} ';

Which would once again produce the recipe shown earlier.

Note that the overflow fields interact equally well in formats with single-line and block fields. That's because block overflow fields have one other special feature: they're non-greedy. Unless we specify otherwise, all other types of block fields will consume their entire data source. For example, if we wrote:

    print form :layout«across»,
         '{<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>…}',
                                  $speech,
         '{…<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>…}',
                                  $speech,
         '{…[[[[[]]]]]…}   {="""""""""""""""""""=}   {…[[[[[]]]]]]…}',
             $speech,             $advert,              $speech,
         '{…[[[[[[[[[[[[[[[[[[[[[[[[[]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]}',
                                  $speech;

we'd get:

    Now is the winter of our discontent / Made glorious summer
    by this sun of York; / And all the clouds that lour'd upon
    our house / In                             the deep  bosom
    of  the  ocean                             buried.  /  Now
    are our  brows                             bound      with
    victorious                                 wreaths; /  Our
    bruised   arms                             hung   up   for
    monuments;   /                             Our       stern
    alarums          +---------------------+   changed      to
    merry            |                     |   meetings, / Our
    dreadful         | Eat at Mrs Miggins! |   marches      to
    delightful       |                     |   measures. Grim-
    visaged    war   +---------------------+   hath   smooth'd
    his   wrinkled                             front;  /   And
    now,   instead                             of     mounting
    barded  steeds                             / To fright the
    souls       of                             fearful        
    adversaries, /                             He       capers
    nimbly  in   a                             lady's chamber.

That's because the two {…[[[[[]]]]]…} block fields on either side of the verbatim advertisement field will eat all the data in $speech, leaving nothing for the final format. Then the advertisement will be centred on the two resulting columns of text.

But block overflow fields are different. They only take as many lines as are required to fill the lines generated by the non-overflow fields in their format. So, if we changed our code to use overflows:

    print form :layout«across»,
         '{<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>}', $speech,
         '{VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}',
         '{VVVVVVVVVVVV}   {="""""""""""""""""""=}   {VVVVVVVVVVVVV}', $advert,
         '{VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}';

we get both a cleaner specification and a more elegant result:

    Now is the winter of our discontent / Made glorious summer
    by this sun of York; / And all the clouds that lour'd upon
    our house / In                             the deep  bosom
    of  the  ocean   +---------------------+   buried.  /  Now
    are our  brows   |                     |   bound      with
    victorious       | Eat at Mrs Miggins! |   wreaths; /  Our
    bruised   arms   |                     |   hung   up   for
    monuments;   /   +---------------------+   Our       stern
    alarums                                    changed      to
    merry meetings,  /  Our  dreadful  marches  to  delightful
    measures. Grim-visaged  war  hath  smooth'd  his  wrinkled
    front; / And now, instead of mounting barded steeds  /  To
    fright the souls  of  fearful  adversaries,  /  He  capers
    nimbly in a lady's chamber.

Notice that, in the third format line of the previous example, the two overflow fields on either side of the advertisement are each overflowing from the single field that's above both of them. This kind of multiple overflow is fine, but it does require that we specify how the various fields overflow (i.e. as two separate columns of text, or – as in this case – as a single, broken column across the page). That's the purpose of the :layout«across» option on the first line. This option is explained in detail below.

The {VVVVVVVV} fields only consumed as much data from $speech as was required to sandwich the output lines created by the verbatim advertisement. This feature is important, because it means we can lay out a series of block fields in one column and a single overflowed field in another column without introducing ugly gaps. For example, because the {VVVVVVVVV} fields in:

    print form
        "Name:                                                  ",
        "  {[[[[[[[[[[[[}                                       ", $name,
        "                  Biography:                           ",
        "Status:             {<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<}", $bio,
        "  {[[[[[[[[[[[[}    {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}", $status,
        "                    {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}", 
        "Comments:           {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",
        "  {[[[[[[[[[[[}     {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}", $comments;

only consume as much of the overflowing $bio field as necessary, the result is something like:

    Name:                                                  
      William                                             
      Shakespeare                                         
                      Biography:                          
    Status:             William Shakespeare was born on    
      Deceased (1564    April 23, 1564 in Strathford-upon- 
      -1616)            Avon, England; he was third of     
                        eight children from Father John    
    Comments:           Shakespeare and Mother Mary Arden. 
      Theories          Shakespeare began his education at 
      abound as to      the age of seven when he probably  
      the true          attended the Strathford grammar    
      author of his     school. The school provided        
      plays. The        Shakespeare with his formal        
      prime             education. The students chiefly    
      alternative       studied Latin rhetoric, logic, and 
      candidates        literature. His knowledge and      
      being Sir         imagination may have come from his 
      Francis           reading of ancient authors and     
      Bacon,            poetry. In November 1582,          
      Christopher       Shakespeare received a license to  
      Marlowe, or       marry Anne Hathaway. At the time of
      Edward de         their marriage, Shakespeare was 18 
      Vere              years old and Anne was 26. They had
                        three children, the oldest Susanna,
                        and twins- a boy, Hamneth, and a   
                        girl, Judith. Before his death on    
                        April 23 1616, William Shakespeare 
                        had written thirty-seven plays. He 
                        is generally considered the        
                        greatest playwright the world has  
                        ever known and has always been the 
                        world's most popular author.       

If {VVVVVVVVVVV} fields ate their entire data – the way {[[[[[[[[[} or {IIIIIIIIII} fields do – then the output would be much less satisfactory. The first block overflow field for $bio would have to consume the entire biography, before the comments field was even reached. So our output would be something like:

    Name:                                                                
      William                                                
      Shakespeare                                            
                      Biography:                          
    Status:             William Shakespeare was born on    
      Deceased (1564    April 23, 1564 in Strathford-upon- 
      -1616)            Avon, England; he was third of     
                        eight children from Father John    
                        Shakespeare and Mother Mary Arden. 
                        Shakespeare began his education at 
                        the age of seven when he probably  
                        attended the Strathford grammar    
                        school. The school provided        
                        Shakespeare with his formal        
                        education. The students chiefly    
                        studied Latin rhetoric, logic, and 
                        literature. His knowledge and      
                        imagination may have come from his 
                        reading of ancient authors and     
                        poetry. In November 1582,          
                        Shakespeare received a license to  
                        marry Anne Hathaway. At the time of
                        their marriage, Shakespeare was 18 
                        years old and Anne was 26. They had
                        three children, the oldest Susanna,
                        and twins- a boy, Hamneth, and a   
                        girl, Judith. Before his death on  
                        April 23 1616, William Shakespeare 
                        had written thirty-seven plays. He 
                        is generally considered the        
                        greatest playwright the world has  
                        ever known and has always been the 
                        world's most popular author.       

    Comments:                                               
      Theories                                               
      abound as to                                           
      the true                                               
      author of his                                          
      plays. The                                             
      prime                                                  
      alternative                                            
      candidates                                             
      being Sir                                              
      Francis                                                
      Bacon,                                                 
      Christopher                                            
      Marlowe, or                                            
      Edward de                                              
      Vere                                                   

Which is precisely why {VVVVVVVVVVV} fields don't work that way.

Great floods have flown from simple sources...

When it comes to specifying the data source for each field in a format, form offers several alternatives as to where that data is placed, several alternatives as to the order in which that data is extracted, and an option that lets us control how the data is fitted into each field.

A man may break a word with you, sir...

Whenever a field is passed more data than it can accommodate in a single line, form is forced to "break" that data somewhere.

If the field in question is W columns wide, form first squeezes any whitespace (as specified by the user's :ws option) and then looks at the next W columns of the string. (Of course, that might actually correspond to fewer than W characters if the string contains wide characters. However, for the sake of exposition we'll pretend that all characters are one column wide here.)

form's breaking algorithm then searches for a newline, a carriage return, any other whitespace character, or a hyphen. If it finds a newline or carriage return within the first W columns, it immediately breaks the data string at that point. Otherwise it locates the last whitespace or hyphen in the first W columns and breaks the string immediately after that space or hyphen. If it can't find anywhere suitable to break the string, it breaks it at the (W-1)th column and appends a hyphen.

So, for example:

    $data = "You can play no part but Pyramus;\nfor Pyramus is a sweet-faced man";

    print form "|{[[[[[}|",
                 $data;

prints:

    |You can|
    |play no|
    |part   |
    |but    |
    |Pyramu-|
    |s;     |
    |for    |
    |Pyramus|
    |is a   |
    |sweet- |
    |faced  |
    |man    |

Note the line-breaks after can (at a whitespace), part (after a whitespace), sweet- (after a hyphen), and s; (at a newline). Note too that Pyramus; doesn't fit in the field, so it has to be chopped in two and a hyphen inserted.

Of course, this particular style of line-breaking may not be suitable to all applications, and we might prefer that form use some other algorithm. For example, if form used the TeX breaking algorithm it would have broken Pyramus; less clumsily, yielding:

    |You can|
    |play no|
    |part   |
    |but    |
    |Pyra-  |
    |mus;   |
    |for    |
    |Pyramus|
    |is a   |
    |sweet- |
    |faced  |
    |man    |

To support different line-breaking strategies form provides the :break option. The :break option's value must be a closure/subroutine, which will then be called whenever a data string needs to be broken to fit a particular field width.

That subroutine is passed three arguments: the data string itself, an integer specifying how wide the field is, and a regex indicating which (if any) characters are to be squeezed. It is expected to return a list of two values: a string which is taken as the "broken" text for the field, and a boolean value indicating whether or not any data remains after the break (so form knows when to stop breaking the data string). The subroutine is also expected to update the .pos of the data string to point immediately after the break it has imposed.

For example, if we always wanted to break at the exact width of the field (with no hyphens), we could do that with:

    sub break_width ($data is rw, $width, $ws) {
        given $data {
            # Treat any squeezed or vertical whitespace as a single character
            # (since they'll subsequently be squeezed to a single space)
            my rule single_char { <$ws> | \v+ | . }

            # Give up if there are no more characters to grab...
            return ("", 0) unless m:cont/ (<single_char><1,$width>) /;

            # Squeeze the resultant substring...
            (my $result = $1) ~~ s:each/ <$ws> | \v+ /\c[SPACE]/;

            # Check for any more data still to come...
            my bool $more = m:cont/ <before: .* \S> /;

            # Return the squeezed substring and the "more" indicator...
            return ($result, $more);
        }
    }

    print form
        :break(&break_width),
        "|{[[[[[}|",
          $data;

producing:

    |You can|
    |play no|
    |part bu|
    |t Pyram|
    |us; for|
    |Pyramus|
    |is a sw|
    |eet-fac|
    |ed man |

Or we might prefer to break on every single whitespace-separated word:

    sub break_word ($data is rw, $width, $ws) {
        given $data {
            # Locate the next word (no longer than $width cols)
            my $found = m:cont/ \s* $?word:=(\S<1,$width>) /;

            # Fail if no more words...
            return ("", 0) unless $found{word};

            # Check for any more data still to come...
            my bool $more = m:cont/ <before: .* \S> /;

            # Otherwise, return broken text and "more" flag... 
            return ($found{word}, $more);
        }
    }

    print form
        :break(&break_word),
        "|{[[[[[}|",
          $data;

producing:

    |You    |
    |can    |
    |play   |
    |no     |
    |part   |
    |but    |
    |Pyramus|
    |;      |
    |for    |
    |Pyramus|
    |is     |
    |a      |
    |sweet-f|
    |aced   |
    |man    |
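The contract between form and a :break subroutine (hand over the data and a width, get back the broken text and a "more" flag, and advance the position) can also be modelled outside Perl. The following Python sketch is an illustration under stated assumptions, not the module's code. Python strings have no .pos, so the position is passed in and returned explicitly; the $ws argument is omitted; and fill_field is a hypothetical stand-in for the loop that form runs internally.

```python
import re

def break_width(data, pos, width):
    """Take the next `width` characters, ignoring word boundaries."""
    while pos < len(data) and data[pos].isspace():
        pos += 1                       # skip whitespace left by the last break
    chunk = data[pos:pos + width]
    pos += len(chunk)
    text = re.sub(r"\s+", " ", chunk)  # squeeze, much as the :ws option would
    return text, bool(data[pos:].strip()), pos

def break_word(data, pos, width):
    """Take the next whitespace-separated word (at most `width` columns)."""
    found = re.compile(r"\s*(\S{1,%d})" % width).match(data, pos)
    if not found:
        return "", False, pos
    pos = found.end()
    return found.group(1), bool(data[pos:].strip()), pos

def fill_field(data, width, breaker):
    """Call `breaker` until it reports that no data remains."""
    lines, pos, more = [], 0, True
    while more:
        text, more, pos = breaker(data, pos, width)
        lines.append("|" + text.ljust(width) + "|")
    return lines

data = "You can play no part but Pyramus;\nfor Pyramus is a sweet-faced man"
print("\n".join(fill_field(data, 7, break_width)))
print("\n".join(fill_field(data, 7, break_word)))
```

Unlike the Perl break_width, this simplified version squeezes whitespace only after grabbing its characters, so a chunk containing a run of spaces would come out shorter than the field. That makes no difference for the Pyramus string, which has no such runs.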

We'll see yet another application of user-defined breaking when we discuss user-defined fields.

He, being in the vaward, placed behind...

There are (at least) three schools of thought when it comes to setting out a call to form that uses more than one format. The "traditional" way (i.e. the way Perl 5 formats do it) is to interleave each format string with a line containing the data it is to interpolate, with each datum aligned directly under the field into which it is to be fitted. Like so:

    print form
        "Name:                                                  ",
        "  {[[[[[[[[[[[[}                                       ",
           $name,
        "                  Biography:                           ",
        "Status:             {<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<}",
                             $bio,
        "  {[[[[[[[[[[[[}    {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",
           $status,
        "                    {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}", 
        "Comments:           {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",
        "  {[[[[[[[[[[[}     {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",
           $comments;

This approach has the advantage that it self-documents: to know what a particular field is supposed to contain, we merely need to look down one line.

It does, however, break up the "abstract picture" that the formats portray, which can make it more difficult to envisage what the final formatted text will look like. So some people prefer to put all the data to the right of the formats:

    print form
        "Name:                                                  ",
        "  {[[[[[[[[[[[[}                                       ", $name,
        "                  Biography:                           ",
        "Status:             {<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<}", $bio,
        "  {[[[[[[[[[[[[}    {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}", $status,
        "                    {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}", 
        "Comments:           {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",
        "  {[[[[[[[[[[[}     {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}", $comments;

And that's perfectly acceptable too.

Sometimes, however, the data to be interpolated doesn't come neatly pre-packaged in separate variables that are easy to intersperse between the formats. For example, the data might be a list returned by a subroutine call (get_info($next_person)) or might be stored in a hash ( %person{« name biog stat comm »} ). In such cases it's a nuisance to have to tease that data out into separate variables (or hash accesses) and then sprinkle them through the formats:

    print form
        "Name:                                                  ",
        "  {[[[[[[[[[[[[}                                       ",%person{name},
        "                  Biography:                           ",
        "Status:             {<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<}",%person{biog},
        "  {[[[[[[[[[[[[}    {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",%person{stat},
        "                    {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}", 
        "Comments:           {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",
        "  {[[[[[[[[[[[}     {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}",%person{comm};

So form has an option that lets us put a single, multi-line format at the start of the argument list, place all the data together after it, and have that data automatically interleaved as necessary. Not surprisingly, that option is: :interleave. It's normally used in conjunction with a heredoc, since that's the easiest way to specify a multi-line string in Perl:

    print form :interleave, <<'EOFORMAT',
           Name:                                                 
             {[[[[[[[[[[[[}                                      
                             Biography:                          
           Status:             {<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<}
             {[[[[[[[[[[[[}    {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}
                               {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}
           Comments:           {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}
             {[[[[[[[[[[[}     {VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV}
           EOFORMAT
         %person{« name biog stat comm »};

When :interleave is in effect, form grabs the first string argument it's passed and breaks that argument up into individual lines. It treats those individual lines as a series of distinct formats and grabs as many of the remaining arguments as are required to provide data for each format.

Of course, in this example we're also taking advantage of the new indenting behaviour of heredocs. The "Name:", "Status:", and "Comments:" titles are actually at the very beginning of their respective lines, because the start of a Perl 6 heredoc terminator marks the left margin of the entire heredoc string.

Would they were multitudes...

It's important to point out that, even when we're using form's default non-interleaving behaviour, it's still okay to use a format that spans multiple lines. There is, however, a significant (and useful) difference in behaviour between the two alternatives.

The normal behaviour of form is to take each format string, fill in each field in the format with a substring from the corresponding data source, and then repeat that process until all the data sources have been exhausted. Which means that a multi-line format like this:

    print form
         <<'EOFORMAT',
            Name:    {[[[[[[[[[[[[[[[}   Role: {[[[[[[[[[[}
            Address: {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}
            _______________________________________________
            EOFORMAT
         @names, @roles, @addresses;

would normally produce this:

    Name:    King Lear           Role: Protagonist 
    Address: The Cliffs, Dover                     
    _______________________________________________
    Name:    The Three Witches   Role: Plot devices
    Address: Dismal Forest, Scotland               
    _______________________________________________
    Name:    Iago                Role: Villain     
    Address: Casa d'Otello, Venezia               
    _______________________________________________

because the entire three-line format is repeatedly filled in as a single unit, line-by-line and datum-by-datum.

On the other hand, if we tell form that it's supposed to automatically interleave the data coming after the format, like so:

    print form :interleave,
         <<'EOFORMAT',
            Name:    {[[[[[[[[[[[[[[[}   Role: {[[[[[[[[[[}
            Address: {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}
            _______________________________________________
            EOFORMAT
         @names, @roles, @addresses;

then the call produces:

    Name:    King Lear           Role: Protagonist 
    Name:    The Three Witches   Role: Plot devices
    Name:    Iago                Role: Villain     
    Address: The Cliffs, Dover                     
    Address: Dismal Forest, Scotland               
    Address: Casa d'Otello, Venezia               
    _______________________________________________

because that second version is really equivalent to:

    print form
         "Name:    {[[[[[[[[[[[[[[[}   Role: {[[[[[[[[[[}",
                   @names,                   @roles,
         "Address: {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}",
                   @addresses,
         "_______________________________________________";

That's not much use in this particular example, but it was exactly what was needed for the biography example earlier. It's just a matter of choosing the right type of data placement to achieve the particular effect we want.
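The bookkeeping behind :interleave, in which each line of the picture claims as many data arguments as it has fields, can be sketched in Python. This is an assumption-laden model rather than the module's own logic: the interleave function is hypothetical, and a "field" is taken to be any brace-delimited run, which ignores fields that contain nested braces.

```python
import re

# Simplification: any {...} run without nested braces counts as one field.
FIELD = re.compile(r"\{[^{}]*\}")

def interleave(picture, *data):
    """Pair each picture line with as many data arguments as it has fields."""
    remaining = list(data)
    pairs = []
    for line in picture.splitlines():
        count = len(FIELD.findall(line))
        pairs.append((line, remaining[:count]))
        del remaining[:count]
    return pairs

picture = (
    "Name:    {[[[[[[[[[[[[[[[}   Role: {[[[[[[[[[[}\n"
    "Address: {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}\n"
    "_______________________________________________\n"
)
for line, args in interleave(picture, "@names", "@roles", "@addresses"):
    print(line, args)
```

The first line claims two data sources, the second claims one, and the rule line claims none, which mirrors the explicitly interleaved call shown above.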

Lay out. Lay out.

As we saw earlier, with follow-on fields and overflow fields, form is perfectly happy to have several fields in a single format that are all fed by the same data source. For example:

    print form
        "{[[[[[[[[]]]]]]]]]]…}   {…[[[[[[[]]]]]]]]]]…}   {…[[[[[[[[]]]]]]]]]]}",
             $soliloquy,              $soliloquy,              $soliloquy;

In fact, that kind of format is particularly useful for creating multi-column outputs (like newspaper columns, for example).

But a small quandary arises. In what order should form fill in these fields? Should the data be formatted down the page, filling each column completely before starting the next (and therefore potentially leaving the last column "short"):

    Now is the winter  of   torious  wreaths;   /   front; / And now, in-
    our discontent / Made   Our bruised arms hung   stead of mounting ba-
    glorious  summer   by   up for  monuments;  /   rded steeds / To fri-
    this sun of  York;  /   Our stern alarums ch-   ght the souls of fea-
    And  all  the  clouds   anged to merry meeti-   rful  adversaries,  /
    that lour'd upon  our   ngs, /  Our  dreadful   He capers nimbly in a
    house / In  the  deep   marches to delightful   lady's chamber.
    bosom  of  the  ocean   measures.   /   Grim-   
    buried. / Now are our   visaged war hath smo-   
    brows bound with vic-   oth'd  his   wrinkled   

Or should the data be run line-by-line across all three columns (the way a Perl 5 format does it), filling one line completely before starting the next:

    Now is the winter  of   our discontent / Made   glorious  summer   by
    this sun of  York;  /   And  all  the  clouds   that lour'd upon  our
    house / In  the  deep   bosom  of  the  ocean   buried. / Now are our
    brows bound with vic-   torious  wreaths;   /   Our bruised arms hung
    up for  monuments;  /   Our stern alarums ch-   anged to merry meeti-
    ngs, /  Our  dreadful   marches to delightful   measures.   /   Grim-
    visaged war hath smo-   oth'd  his   wrinkled   front; / And now, in-
    stead of mounting ba-   rded steeds / To fri-   ght the souls of fea-
    rful  adversaries,  /   He capers nimbly in a   lady's chamber.

Or should the text run down the columns, but in such a way as to leave those columns as evenly balanced in length as possible:

    Now is the winter  of   brows bound with vic-   visaged war hath smo-
    our discontent / Made   torious  wreaths;   /   oth'd  his   wrinkled
    glorious  summer   by   Our bruised arms hung   front; / And now, in-
    this sun of  York;  /   up for  monuments;  /   stead of mounting ba-
    And  all  the  clouds   Our stern alarums ch-   rded steeds / To fri-
    that lour'd upon  our   anged to merry meeti-   ght the souls of fea-
    house / In  the  deep   ngs, /  Our  dreadful   rful  adversaries,  /
    bosom  of  the  ocean   marches to delightful   He capers nimbly in a
    buried. / Now are our   measures.   /   Grim-   lady's chamber.

Well, of course, there's no "right" answer to that; it depends entirely on what kind of effect we're trying to achieve.

The first approach (i.e. lay out the text down each column first) works well if we're formatting a news-column, or a report, or a description of some kind. The second (i.e. lay out the text across each line first), is excellent for putting diagrams or call-outs in the middle of a piece of text (as we did for Mrs Miggins). The third approach (i.e. lay out the data downwards but balance the columns) is best for presenting a single list of data in multiple columns – like ls does.

So we need an option with which to tell form which of these useful alternatives we want for a particular format. That option is named :layout and can take one of three string values: "down", "across", or "balanced". So, for example, to produce three versions of Richard III's famous monologue in the order shown above, we'd use:

    print form :layout«down»,
        "{[[[[[[[[]]]]]]]]]]…}   {…[[[[[[[]]]]]]]]]]…}   {…[[[[[[[[]]]]]]]]]]}",
             $soliloquy,              $soliloquy,              $soliloquy;

then:

    print form :layout«across»,
        "{[[[[[[[[]]]]]]]]]]…}   {…[[[[[[[]]]]]]]]]]…}   {…[[[[[[[[]]]]]]]]]]}",
             $soliloquy,              $soliloquy,              $soliloquy;

then:

    print form :layout«balanced»,
        "{[[[[[[[[]]]]]]]]]]…}   {…[[[[[[[]]]]]]]]]]…}   {…[[[[[[[[]]]]]]]]]]}",
             $soliloquy,              $soliloquy,              $soliloquy;

By the way, the default value for the :layout option is "balanced" since formatting regular columns of data is more common than formatting news or advertising inserts.
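To make the three orderings concrete, here is a Python sketch that arranges a list of already-broken lines into rows. It is an illustration only. The function lay_out is hypothetical, all columns are assumed to be the same width (so the text breaks identically whichever column it lands in), "down" is assumed to fill a single page of the given height, and paging is ignored altogether.

```python
from math import ceil

def lay_out(lines, columns, layout="balanced", height=10):
    """Arrange pre-broken lines into rows of `columns` cells each."""
    if layout == "across":
        # Fill one row completely before starting the next.
        rows = [lines[i:i + columns] for i in range(0, len(lines), columns)]
        return [row + [""] * (columns - len(row)) for row in rows]
    # "down" fills each column to the page height; "balanced" evens them out.
    per_column = height if layout == "down" else ceil(len(lines) / columns)
    cols = [lines[i * per_column:(i + 1) * per_column] for i in range(columns)]
    depth = max(len(col) for col in cols)
    return [[col[r] if r < len(col) else "" for col in cols]
            for r in range(depth)]

numbered = list(range(1, 28))   # stand-ins for the 27 lines of the soliloquy
for layout in ("down", "across", "balanced"):
    print(layout, lay_out(numbered, 3, layout)[0])
```

With 27 lines and three columns, "down" leaves the last column short (10, 10 and 7 lines), while "across" and "balanced" both produce nine full rows but in different reading orders, just as in the three versions of the soliloquy above.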

For the table, sir, it shall be served...

The :layout option controls one other form of inter-column formatting: tabular layout.

So far, all the examples of tables we've created (for example, our normalized scores) lined up nicely. But that was only because each item in each row happened to take the same number of lines (typically just one). So, a table generator like this:

    my @play = map {"$_\r"}  ( "Othello", "Richard III", "Hamlet"   );
    my @name = map {"$_\r"}  ( "Iago",    "Henry",       "Claudius" );

    print form 
         "Character       Appears in  ",
         "____________    ____________",
         "{[[[[[[[[[[}    {[[[[[[[[[[}",
          @name,          @play;

correctly produces:

    Character       Appears in
    ____________    ____________
    Iago            Othello     

    Henry           Richard III

    Claudius        Hamlet      

Note that we appended "\r" to each element to add an extra newline after each entry in the table. We can't use "\n" to specify a line-break within an array element, because form uses "\n" as an "end-of-element" marker. So, to allow line breaks within a single element of an array datum, form treats "\r" as "end-of-line-but-not-end-of-element" (somewhat like Perl 5's format does).

However, if we were to use the full titles for each character and each play:

    my @play = map {"$_\r"}  ( "Othello, The Moor of Venice",
                               "The Life and Death of King Richard III",
                               "Hamlet, Prince of Denmark",
                             );

    my @name = map {"$_\r"}  ( "Iago",
                               "Henry,\rEarl of Richmond",
                               "Claudius,\rKing of Denmark",
                             );

the same formatter would produce:

    Character       Appears in
    ____________    ____________
    Iago            Othello, The
                    Moor of     
    Henry,          Venice      
    Earl of
    Richmond        The Life and         
                    Death of    
    Claudius,       King Richard
    King of         III         
    Denmark         
                    Hamlet,     
                    Prince of   
                    Denmark     

The problem is that the two block fields we're using just grab all the data from each array and format it independently into each column. Usually that's fine because the columns are independent (as we've previously seen).

But in a table, the data in each column specifically relates to data in other columns, so corresponding elements from the column's data arrays ought to remain vertically aligned. To achieve this, we simply tell form that the data in the various columns should be laid out like a table:

    print form :layout«tabular»,
         "Character       Appears in  ",
         "____________    ____________",
         "{[[[[[[[[[[}    {[[[[[[[[[[}",
          @name,          @play;

which then produces the desired result:

    Character       Appears in
    ____________    ____________
    Iago            Othello, The
                    Moor of     
                    Venice      

    Henry,          The Life and
    Earl of         Death of    
    Richmond        King Richard
                    III         

    Claudius,       Hamlet,     
    King of         Prince of   
    Denmark         Denmark     
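The essence of tabular layout is that every cell in a row is padded out to the height of the tallest cell in that row. A minimal Python sketch of that idea follows. It leans on the standard textwrap module as a stand-in for form's own line-breaking, treats "\r" as the in-element line break described earlier, and uses a hypothetical tabular function with a fixed four-space gutter, so it should be read as a model rather than as how form does it.

```python
import textwrap
from itertools import zip_longest

def tabular(columns, widths):
    """Format parallel columns so corresponding elements stay aligned."""
    out = []
    for cells in zip(*columns):
        wrapped = []
        for cell, width in zip(cells, widths):
            lines = []
            for part in cell.split("\r"):
                # An empty part (e.g. after a trailing "\r") is a blank line.
                lines.extend(textwrap.wrap(part, width) or [""])
            wrapped.append(lines)
        # Pad shorter cells with blank lines up to the tallest cell's height.
        for row in zip_longest(*wrapped, fillvalue=""):
            text = "    ".join(t.ljust(w) for t, w in zip(row, widths))
            out.append(text.rstrip())
    return out

names = ["Iago\r", "Henry,\rEarl of Richmond\r", "Claudius,\rKing of Denmark\r"]
plays = ["Othello, The Moor of Venice\r",
         "The Life and Death of King Richard III\r",
         "Hamlet, Prince of Denmark\r"]
print("\n".join(tabular([names, plays], [12, 12])))
```

Because each row is padded before the next one starts, "Henry," lines up with "The Life and" no matter how many lines the Othello entry needed.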

Give him line and scope...

Sometimes we want to use a particular option or combination of options in every call we make to form. Or, more likely, in every call we make within a specific scope. For example, we might wish to default to a different line-breaking algorithm everywhere, or we might want to make repeated use of a new type of field specifier, or we might want to reset the standard page length from a printable 60 to a screenable 24.

Normally in Perl 6, if we wanted to preset a particular optional argument we'd simply make an assumption:

    my &down_form := &form.assuming(:layout«down»);

But, of course, form collects all of its arguments in a single slurpy array, so it doesn't actually have a $layout parameter that we can prebind.

Fortunately, the .assuming method is smart enough to recognize when it is being applied to a subroutine whose arguments are slurped. In such cases, it just prepends any prebound arguments to the resulting subroutine's argument list. That is, the binding of down_form shown above is equivalent to:

    my &down_form :=
        sub (FormArgs *@args is context(Scalar)) returns Str {
            return form( :layout«down», *@args );
        };
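The "prepend the prebound arguments" behaviour is easy to model in other languages too. The Python sketch below is an analogy only: assuming here is a hypothetical helper (not Perl 6's .assuming), and form_stub is a stand-in that merely reports the arguments it slurped, with the option represented as a plain tuple.

```python
def assuming(func, *preset):
    """Return a wrapper that prepends `preset` to every call's arguments."""
    def bound(*args):
        return func(*preset, *args)
    return bound

def form_stub(*args):
    """Hypothetical stand-in for form: report the slurped argument list."""
    return list(args)

down_form = assuming(form_stub, ("layout", "down"))
print(down_form("{[[[[}", "some data"))
```

Each call to down_form sees the layout option first, followed by whatever format and data the caller supplied.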

This was your default...

form provides one other mechanism by which options can be prebound. To use it, we (re-)load the Form module with an explicit argument list:

    use Form :layout«down», :locale, :interleave;

This causes the module to export a modified version of form in which the specified options are prebound. That modified version of form is exported lexically, and so form only has the specified defaults preset for the scope in which the use Form statement appears.

These default options are handy if we have a series of calls to form that all need some consistent non-standard behaviour. For example:

    use Form :layout«across»,
             :interleave,
             :page{ :header("Draft $(localtime)\n\n") };

    print form $introduction_format, *@introduction_data;

    for @sections -> $format, @data {
        print form $format, *@data;
    }

    print form $conclusion_format, *@conclusion_data;

Another use is to set up a fixed formatting string into which different data is to be interpolated (much in the way Perl 5 formats are typically used). For example, we might want a standard format for errors in a CATCH block:

    CATCH {
        use Form :interleave, <<EOFORMAT;
                     Error {<<<<<<<}: {[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[}
                     ___________________________________________________
                     EOFORMAT

        when /Missing datum/ { warn form "EMISSDAT", $_.msg }
        when /too large/     { warn form "ETOOBIG",  $_.msg }
        when .core           { warn form "EINTERN",  "Internal error" }
        default              { warn form "EUNKNOWN", "Seek help" }
    }

And welcome to the wide fields...

All the fields we've seen so far have been exactly as wide as their specifications. That's the whole point of having fields – they allow us to lay out formats "by eye".

But form also allows us to specify field widths in other ways. And better yet, to avoid specifying them at all and let form work out how big they should be.

The measure then of one is easily told.

When specific field widths are required (perhaps by some design document or data formatting protocol) laying out wide fields can be error-prone. For example, most people can't visually distinguish between a 52-column field and a 53-column field and are therefore forced to manually verify the width of the corresponding field specifier in some way.

When such fields are part of a larger format, errors like that can easily result in a call to form producing, say, 81-column lines. That would merely be messy if the extra characters wrapped, but could be disastrous if they happened to be chopped instead. Suppose, for example, that the last 4 columns of output contain nuclear reactor core temperatures and then consider the difference between an apparently normal reading of 567 Celsius and what might actually be happening if the reading were in fact a truncated 5678 Celsius.

To catch mistakes of this kind, fields can be specified with an embedded integer in parentheses (with optional whitespace inside the parens). For example:

    print form '{[[[( 15 )[[[[} {<<<<<(17)<<<<<<}  {]]](14)]]].[[}',
               *@data;

The integer in the parentheses acts like a checksum. Its value must be identical to the actual width of the field (including the delimiting braces and the embedded integer itself). Otherwise an exception is thrown. For instance, running the above example produces the error message:

    Inconsistent width for field 3.
    Specified as '{]]](14)]]].[[}' but actual width is 15
    in call to &form at demo.pl line 1

Numeric fields can be given a decimal checksum, which then also specifies their number of decimal places.

    print form
        '{[[[( 15 )[[[[} {<<<<<(17)<<<<<<}  {]](14.2)]].[}',
         *@data;

Note that the digits before the decimal still indicate the total width of the field. So the {]](14.2)]].[} field in the above example means "must be 14 columns wide, including 2 decimal places", in exactly the same way as a "%14.2f" specifier would in a sprintf.
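The checksum rule itself is simple enough to sketch in Python. The snippet below is an illustration, not form's parser. The function check_width and its regex are hypothetical, one column per character is assumed, and the error text only loosely imitates the message shown above.

```python
import re

# An embedded integer (optionally with decimal places) in parentheses.
CHECK = re.compile(r"\(\s*(\d+)(?:\.(\d+))?\s*\)")

def check_width(field):
    """Return (width, decimal places), raising if the checksum disagrees."""
    found = CHECK.search(field)
    if found is None:
        return len(field), None      # no checksum: the picture is the width
    declared = int(found.group(1))
    if declared != len(field):
        raise ValueError(
            f"Specified as '{field}' but actual width is {len(field)}")
    places = int(found.group(2)) if found.group(2) else None
    return declared, places

for field in ("{[[[( 15 )[[[[}", "{<<<<<(17)<<<<<<}", "{]](14.2)]].[}"):
    print(field, check_width(field))
```

Passing the faulty {]]](14)]]].[[} field from the first example to this sketch raises the exception, because the field is 15 columns wide but claims to be 14.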

What you will command me will I do...

Of course, in some instances it would be much more convenient if we could simply tell form that we want a particular field to be a particular width, instead of having to explicitly show it.

So there's another type of integer field annotation that, instead of acting like a checksum, acts like an...err..."tellsum". That is, we can tell form to ignore a field's physical width and instead insist that it be magically expanded (or shrunk) to a nominated width. Such a field is said to have an imperative width. The integer specifying the imperative width is placed in curly braces instead of parens.

For example, the format in the previous example could be specified imperatively as:

    print form
        '{[{15}[} {<{17}<<}  {]]]]{14.2}]]]].[[}',
         *@data;

Note that the actual width of any field becomes irrelevant if it contains an imperative width. The field will be condensed or expanded to the specified width, with subsequent fields pushed left or right accordingly.

Imperative fields disrupt the WYSIWYG layout of a format, so they're generally only used when the format itself is being generated programmatically. For example, when we were counting down the top ten reasons not to do one's English Lit homework, we used a fixed-width {>} field to format each number:

    for @reasons.kv -> $index, $reason {
        my $n = @reasons - $index ~ '.';
        print form "   {>}  {[[[[[[[[[[[[[[[[[[[[[[[[[[[[}",
                       $n,  $reason,
                   "";
    }

But, of course, there's no reason (theoretically, at least) why we couldn't find more than 99 reasons not to do our homework, in which case we'd overflow the {>} field.

So instead of limiting ourselves that way, we could just tell form to make the first field wide enough to enumerate however many reasons we come up with, like so:

    my $width = length(+@reasons)+1;

    for @reasons.kv -> $index, $reason {
        my $n = @reasons - $index ~ '.';
        print form "   {>>{$width}>>}  {[[[[[[[[[[[[[[[[[[[[[[[[[[[[}",
                       $n,             $reason,
                   "";
    }

By evaluating @reasons in a numeric context (+@reasons) we determine the number of reasons we have, and hence the largest number that need ever fit into the first field. Taking the length of that number (length(+@reasons)) gives us the number of digits in that largest number and hence the width of a field that can format that number. We add one extra column (for the dot we're appending to each number) and that's our required width. Then we just tell form to make the first field that wide ({>>{$width}>>}).
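The arithmetic is easy to check. The following sketch (in Python, as an illustration only; number_field_width is a hypothetical helper, not part of any Form module) computes the same width and compares it against the widest label that the countdown would actually produce:

```python
def number_field_width(reason_count: int) -> int:
    """Digits in the largest number, plus one column for the trailing dot."""
    digits = len(str(reason_count))
    return digits + 1

for count in (10, 99, 100, 1234):
    # The labels the countdown would generate: "10.", "9.", ..., "1."
    labels = [f"{n}." for n in range(count, 0, -1)]
    widest = max(len(label) for label in labels)
    print(count, number_field_width(count), widest)
# prints:
# 10 3 3
# 99 3 3
# 100 4 4
# 1234 5 5
```

With up to 99 reasons the computed width is 3, which is exactly the width of the original {>} field; the hundredth reason is what pushes it to 4.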

And every one shall share...

A special form of imperative width field is the starred field. A starred field is one that contains an imperative width specification in which the number is replaced by a single asterisk.

The width of a starred field is not fixed, but rather is computed during formatting. That width is whatever is required to cause the entire format to fill the current page width of the format (by default, 78 columns). Consider, for example:

    print form
        '{]]]]]]]]]]]]]]} {]]].[[}  {[[{*}[[}  ',
         @names,          @scores,  @comments;

The width of the starred comment field in this case is 49 columns – the default page width of 78 columns minus the 29 columns consumed by the fixed-width portions of the format (including the other two fields).
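That subtraction can be reproduced mechanically. Here is a rough sketch in Python (an illustration of the arithmetic only, assuming a format with exactly one starred field; the FIELD pattern and the starred_width function are inventions of this sketch, not Form.pm internals):

```python
import re

PAGE_WIDTH = 78   # the default page width quoted in the text

# A field is a brace-delimited run that may contain one nested {...} annotation.
FIELD = re.compile(r"\{[^{}]*(?:\{[^{}]*\}[^{}]*)?\}")

def starred_width(fmt: str, page_width: int = PAGE_WIDTH) -> int:
    """Columns left for a single starred field once everything else is counted."""
    fixed = 0
    pos = 0
    for match in FIELD.finditer(fmt):
        fixed += match.start() - pos          # literal text before this field
        if "{*}" not in match.group():
            fixed += len(match.group())       # an ordinary fixed-width field
        pos = match.end()
    fixed += len(fmt) - pos                   # literal text after the last field
    return page_width - fixed

# The format from the example; the first field is built up so the count is explicit.
fmt = "{" + "]" * 14 + "} {]]].[[}  {[[{*}[[}  "
print(starred_width(fmt))                     # 49
```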

If a format contains two or more starred fields, the available space is shared equally between them. So, for example, to create two equal columns (say, to compare the contents of two files), we might use:

    print form 
         "{[[[[{*}[[[[}   {[[[[{*}[[[[}",
          slurp($file1),  slurp($file2);

And, yes, Perl 6 does have a built-in slurp function that takes a filename, opens the file, reads in the entire contents, and returns them as a single string. For more details see the Perl6::Slurp module (now on the CPAN).
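As for the sharing itself, a small sketch shows the arithmetic. It is in Python and is only a model: the text says the space is shared equally, but does not say what happens when the columns do not divide evenly, so handing the leftover columns to the leftmost fields is purely an assumption here, and share_columns is a hypothetical name:

```python
def share_columns(available: int, starred_fields: int) -> list[int]:
    """Split the available columns as evenly as possible between starred fields."""
    base, leftover = divmod(available, starred_fields)
    # Assumption: any leftover columns go to the leftmost fields, one each.
    return [base + (1 if i < leftover else 0) for i in range(starred_fields)]

# The two-column format above has 3 literal spaces between its fields,
# leaving 78 - 3 = 75 columns to share.
print(share_columns(78 - 3, 2))    # [38, 37]
print(share_columns(78 - 4, 2))    # [37, 37]
```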

There is one special case for starred fields, namely the starred verbatim field:

    {""""{*}""""}

It acts like any other starred field, growing according to the available space, except that it will never grow any wider than the widest line of the data it is formatting. For example, whereas a regular starred field:

    print form 
         '| {[[{*}[[} |',
            $monologue;

expands to the full page width:

    | Now is the winter of our discontent                           |
    | Made glorious summer by this sun of York;                     |
    | And all the clouds that lour'd upon our house                 |
    | In the deep bosom of the ocean buried.                        |
    | Now are our brows bound with victorious wreaths;              |
    | Our bruised arms hung up for monuments;                       |
    | Our stern alarums changed to merry meetings,                  |
    | Our dreadful marches to delightful measures.                  |
    | Grim-visaged war hath smooth'd his wrinkled front;            |
    | And now, instead of mounting barded steeds                    |  
    | To fright the souls of fearful adversaries,                   |  
    | He capers nimbly in a lady's chamber.                         |  

a starred verbatim field:

    print form 
         '| {""{*}""} |',
            $monologue;

only expands as much as is strictly necessary to accommodate the data:

    | Now is the winter of our discontent                |
    | Made glorious summer by this sun of York;          |
    | And all the clouds that lour'd upon our house      |
    | In the deep bosom of the ocean buried.             |
    | Now are our brows bound with victorious wreaths;   |  
    | Our bruised arms hung up for monuments;            |
    | Our stern alarums changed to merry meetings,       |
    | Our dreadful marches to delightful measures.       |
    | Grim-visaged war hath smooth'd his wrinkled front; |
    | And now, instead of mounting barded steeds         |  
    | To fright the souls of fearful adversaries,        |  
    | He capers nimbly in a lady's chamber.              |  
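The rule for a starred verbatim field amounts to taking a minimum. A minimal sketch, in Python and for illustration only (starred_verbatim_width is a hypothetical helper, and the monologue is abbreviated to three lines):

```python
def starred_verbatim_width(text: str, available: int) -> int:
    """Grow to fill the available space, but never beyond the widest data line."""
    widest_line = max((len(line) for line in text.splitlines()), default=0)
    return min(available, widest_line)

monologue = "\n".join([
    "Now is the winter of our discontent",
    "Grim-visaged war hath smooth'd his wrinkled front;",
    "He capers nimbly in a lady's chamber.",
])

# '| ' and ' |' consume 4 of the default 78 columns.
available = 78 - len("| ") - len(" |")

print(available)                                      # 74
print(starred_verbatim_width(monologue, available))   # 50
```

The 50 columns correspond to the "Grim-visaged war..." line, which is why that line exactly fills the box in the second output above.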

That we our largest bounty may extend...

By now you've probably noticed that there is quite a large overlap between the functionality of form and that of (s)printf. For example, the call:

    for @procs {
        print form
            "{>>>}  {<<<<<<<(20)<<<<<<<}  {>>>>>>}  {>>.}%",
            .{pid}, .{cmd},               .{time},  .{cpu};
    }

has approximately the same effect as the call:

    for @procs {
        printf "%5d  %-20s  %8s  %5.1f%%\n",
               .{pid}, .{cmd}, .{time}, .{cpu};
    }

One is more WYSIWYG, the other more concise, but both loops would print out lines like these:

     2461  vi -ii henry           0:55.83   11.6%
     2395  ex cathedra            0:06.59    3.5%
     2439  head anne.boleyn       0:00.18    0.1%
     2581  dig -short grave       0:01.04    0.0%

There is, however, a crucial difference between these two formatting facilities, one that only shows up when one of our processes runs for more than 99 hours. For example, suppose our browser has been running continuously for a few months (or, more precisely, for 1214:23.75 hours). Then the calls to printf would print:

     2461  vi -ii henry           0:55.83   11.6%
     2395  ex cathedra            0:06.59    3.5%
    27384  lynx www.divorce.com  1214:23.75    0.8%
     2439  head anne.boleyn       0:00.18    0.1%
     2581  dig -short grave       0:01.04    0.0%

whilst the calls to form would print:

     2461  vi -ii henry           0:55.83   11.6%
     2395  ex cathedra            0:06.59    3.5%
    27384  lynx www.divorce.com  1214:23-    0.8%
     2439  head anne.boleyn       0:00.18    0.1%
     2581  dig -short grave       0:01.04    0.0%

In other words, field widths in a printf represent minimal spacing (even if that throws off the overall layout), whereas field widths in a form represent guaranteed spacing (even if that truncates some of the data).
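The same contrast can be demonstrated outside Perl. In the sketch below (Python, as an illustration only), printf_style uses a real printf-style minimum width, while guaranteed_style is a hypothetical stand-in for a guaranteed width. Note that it simply chops the string, which is an assumption of the sketch: as the output above shows, form itself breaks the value more carefully (as "1214:23-").

```python
def printf_style(value: str, width: int) -> str:
    """Minimum width: pads short values but never truncates long ones."""
    return "%*s" % (width, value)

def guaranteed_style(value: str, width: int) -> str:
    """Guaranteed width: pads short values and cuts long ones to fit.
    (Plain slicing is an assumption; form's own break rule differs.)"""
    return value.rjust(width)[:width]

for time in ("0:55.83", "1214:23.75"):
    print("[" + printf_style(time, 8) + "]", "[" + guaranteed_style(time, 8) + "]")
# prints:
# [ 0:55.83] [ 0:55.83]
# [1214:23.75] [1214:23.]
```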

Of course, in a situation like this – where we knew that the data might not fit and we didn't want it truncated – we could use a block field instead:

    for @procs {
        print form
            "{>>>}  {<<<<<<<(20)<<<<<<<}  {]]]]]]}  {>>.%}",
            .{pid}, .{cmd},               .{time},  .{cpu};
    }

in which case we'd get:

     2461  vi -ii henry           0:55.83   11.6%
     2395  ex cathedra            0:06.59    3.5%
    27384  lynx www.divorce.com  1214:23-    0.8%
                                      .75
     2439  head anne.boleyn       0:00.18    0.1%
     2581  dig -short grave       0:01.04    0.0%

That preserves the data, but the results are still ugly, and it also requires some fancy footwork – making the percentage sign part of the field specification, as if it were