[% setvar title BiDirectional Support in PERL %]

This file is part of the Perl 6 Archive

Note: these documents may be out of date. Do not use as reference!

To see what is currently happening visit http://www.perl6.org/


BiDirectional Support in PERL


  Maintainer: Roman M. Parparov <romm@empire.tau.ac.il>
  Date: 6 Aug 2000
  Mailing List: perl6-language@perl.org
  Number: 50
  Version: 1
  Status: Developing


This RFC proposes bidirectional input and output support that would make it easier to handle languages written right-to-left (RTL), such as Hebrew and Arabic, correctly. Such support would be a breakthrough for developers preparing applications for users who need bidirectional text.


Concepts of Bidirectional Support

The general aspects of RTL support are outlined in the ECMA document on bidirectional languages (BDL) [1]. Because bidirectional input is much more complicated than output and requires deep support from applications other than Perl, this document deals only with string output and string processing, just as the ECMA document does. We assume that BDL data is always read exactly as it arrives from the regular input buffer.

Output of BDL strings in Perl6:

Based on the above, Perl 6 should implement an output routine like this:


FILEHANDLE and LIST are the same as the FILEHANDLE and LIST of 'print'. REPRESENTATION is a scalar value: either 'v' for VISUAL or 'l' for LOGICAL. DIRECTION is either 'rtl' or 'ltr'; it is optional and defaults to 'rtl'.

Alternatively, the existing print operator could be enhanced to:


In this case the default value of REPRESENTATION is LOGICAL.
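
The parameters described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the proposed Perl 6 syntax. The name bidi_print is hypothetical, and the sketch assumes that 'v' means "emit the text in visual order for a device that only draws left to right". The conversion shown is a naive whole-line reversal, which is only adequate for pure RTL text; it is not the ECMA TR/53 algorithm.

```python
import io
import sys

def bidi_print(items, representation="l", direction="rtl", file=sys.stdout):
    """Hypothetical stand-in for the proposed output routine.

    representation: 'l' writes the logical order unchanged,
                    'v' converts to visual order (assumption, see lead-in).
    direction:      'rtl' or 'ltr', the base direction of the line.
    """
    if representation not in ("l", "v"):
        raise ValueError("representation must be 'l' or 'v'")
    if direction not in ("rtl", "ltr"):
        raise ValueError("direction must be 'rtl' or 'ltr'")

    text = "".join(str(item) for item in items)
    if representation == "v" and direction == "rtl":
        # Naive conversion: reverse the whole line. Mixed text, digits and
        # combining marks need the full bidirectional rules instead.
        text = text[::-1]
    file.write(text)
    return text

# Usage: the Hebrew word "shalom", stored in logical order.
shalom = "\u05e9\u05dc\u05d5\u05dd"
buffer = io.StringIO()
bidi_print([shalom], representation="v", direction="rtl", file=buffer)
```

With representation 'l' the text is passed through untouched, which matches the stated default of LOGICAL for the enhanced print form.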

Output formats, if they remain in the language, should support RTL output.

Other output statements, such as "die", do not have to support RTL output.

Data processing of BDL strings in Perl6:

Regular expression processing of RTL strings is required. An 'r' character after the final slash would mark the string as RTL.

The substr routine should be able to return substrings cut from the right side of RTL strings, including mixed text.
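
One possible reading of "cut from the right side" can be sketched as follows, in Python for illustration only. The name rtl_substr and its argument order are hypothetical. The sketch assumes the string is stored in visual order, so that the right side of the displayed text is the end of the stored string; for a string stored in logical order an ordinary substr from offset 0 already starts at the visual right edge.

```python
def rtl_substr(text, offset, length=None):
    """Return a substring whose offset is counted from the right end.

    offset: number of characters to skip, starting at the right edge.
    length: number of characters to take, moving leftwards; None means
            everything up to the left edge.
    The result keeps the stored character order of `text`.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    end = max(len(text) - offset, 0)
    start = 0 if length is None else max(end - length, 0)
    return text[start:end]

# Usage: take the two rightmost characters, then three characters
# after skipping the rightmost one.
rightmost_two = rtl_substr("abcdef", 0, 2)
skip_one_take_three = rtl_substr("abcdef", 1, 3)
```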


The implementation of that function should be a thorough coding of the ECMA bidirectional languages specification mentioned above. A text may mix RTL and LTR languages, and the punctuation rules are often not obeyed in practice, but we should simply follow the ECMA standard.
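
To make the notion of mixed RTL and LTR text concrete, here is a small sketch that splits a string into directional runs. It is written in Python for illustration and relies on the Unicode bidirectional character classes exposed by the standard unicodedata module, not on the ECMA TR/53 rules. The function names are hypothetical, and the handling of neutral characters (they simply join the preceding run) is a deliberate simplification.

```python
import unicodedata

def strong_direction(ch):
    """Return 'rtl' or 'ltr' for strongly directional characters, else None."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in ("R", "AL"):   # Hebrew, Arabic and similar letters
        return "rtl"
    if bidi_class == "L":           # Latin and other left-to-right letters
        return "ltr"
    return None                     # digits, spaces, punctuation, ...

def directional_runs(text, base="rtl"):
    """Split text into (direction, substring) runs in logical order.

    Characters without a strong direction are attached to the run that
    precedes them; leading ones take the base direction.
    """
    runs = []
    current = base
    for ch in text:
        direction = strong_direction(ch)
        if direction is not None:
            current = direction
        if runs and runs[-1][0] == current:
            runs[-1][1].append(ch)
        else:
            runs.append((current, [ch]))
    return [(direction, "".join(chars)) for direction, chars in runs]

# Usage: Latin text with an embedded Hebrew word (alef, bet).
runs = directional_runs("abc \u05d0\u05d1 def", base="ltr")
```

A real implementation would resolve digits and punctuation according to the standard rather than attaching them to the previous run.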


[1] Handling of the Bi-Directional Texts - ECMA TR/53 - ftp.ecma.ch