[% setvar title Improved Module Versioning And Searching %]
Note: these documents may be out of date. Do not use as reference!
To see what is currently happening visit http://www.perl6.org/
Improved Module Versioning And Searching
Maintainer: Steve Simmons <email@example.com>
Date: 8 Aug 2000
Mailing List: firstname.lastname@example.org
Number: 78
Version: 1
Status: Developing
Modern production systems may have many versions of different modules in use simultaneously. Workarounds are possible, but they lead to a vast spaghetti of fragile installation webs. This proposal will attempt to redefine module versioning and its handling in a way that is fully upward compatible but solves the current problems.
An up-to-the-instant version of this RFC will be posted as HTML at www.nnaf.net as soon as I know the RFC number.
There are several classes of problem with the current module versioning and searching, which will be discussed separately. The solutions proposed overlap, and are discussed in IMPLEMENTATION below.
These problems are ones in which I would go so far as to say that the current (perl5) behavior is actually broken.
Currently a statement
use foo 1.2;
causes perl to search @INC until it finds a foo.pm file or exhausts the search path. When a foo.pm file is found, its VERSION is checked against the one requested in the use statement. If the foo.pm version is less than 1.2, perl gives an error message and halts the compilation. A satisfactory version may exist elsewhere in @INC, but it is not searched for.
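The difference between the behavior described above and a search that keeps looking can be sketched in a few lines. This is an illustrative model in Python, not perl's actual loader: the search path is a hypothetical list of (directory, {module: version}) pairs, and the function names and directory names are invented for the example.

```python
def levels(version):
    """Split a dotted version such as '1.2' into a list of ints."""
    return [int(part) for part in version.split(".")]


def first_found(search_path, module, minimum):
    """Model of the behavior described above: only the first copy is examined."""
    for directory, modules in search_path:
        if module in modules:
            if levels(modules[module]) < levels(minimum):
                raise RuntimeError(f"{module} version {minimum} required")
            return directory
    raise RuntimeError(f"can't locate {module}")


def keep_searching(search_path, module, minimum):
    """Alternative: skip copies that are too old and continue down the path."""
    for directory, modules in search_path:
        found = modules.get(module)
        if found is not None and levels(found) >= levels(minimum):
            return directory
    raise RuntimeError(f"no {module} of version {minimum} or later on the path")


# Hypothetical search path: an old copy shadows a satisfactory one.
SEARCH_PATH = [("/site/old", {"foo": "1.1"}), ("/site/new", {"foo": "1.3"})]
```

With this search path, a request for foo 1.2 fails under first_found because the 1.1 copy is seen first, while keep_searching reaches the 1.3 copy in the second directory.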
I believe that when a programmer writes
use module;
`Do What I Mean' should find the newest version of the module present on the @INC search path. Instead, the very first module.pm file found is taken, regardless of the presence of others on the path.
Deployment of perl modules in high-reliability or widely shared environments often requires multiple versions of modules to be installed side by side. (Comments of `but that's a bad idea' will be cheerfully ignored; if I could control what other departments need, I would.) This leads to an endless proliferation of use lib directories and ever-more-pervasive `silos of development.'
Part of the problem is the limitations of the current system in how modules are versioned and how perl decides which version to load. In the worst case, code such as
    use lib '/path/to/department/module/versionX';
    use module;    # To get version X for sure
    no lib '/path/to/department/module/versionX';
has been found in production equipment. Why does such bogosity occur? It's an attempt to solve both the above problems and the deployment issues which follow below.
Working tools persist. An application which does its job well will live as long as the problem it addresses. This means old code may continue running for a long time.
For perl itself, most sites solve this problem by having the perl invocation include versioning. The indicated version will likely remain installed and stable for as long as the script which uses it, and the platform on which that script runs, remain in service.
The proliferation and increasing use of modules is generally a good thing. However, installation of new modules can and sometimes does break existing scripts. Workarounds for this problem are cumbersome at best, and we have existence proofs in other languages that this can be handled better (notably tcl, but there are others).
Mission-critical scripts often need a final test pass in which experimental versions are released onto production systems alongside the production versions.
The inflexibility of perl module versioning also contributes to difficulties in releasing systems for test. A new script may require significant changes to internals of one or more supporting modules. The changes need not be visible to existing scripts; if bugs are introduced then previously working systems may change or break in obscure ways.
Ideally, there would be a mechanism by which newscript could be released simultaneously with an appropriate version of its supporting module while the previous version remains in place for older code. A more flexible mechanism for module version specification and searching can fix the problem.
I believe that relatively simple changes can be made to the version identification and module installation systems which will solve all the above problems. In addition, those changes should be largely upward compatible from current functioning; and if needed could be made 100% compatible.
Several changes, working together, should provide the flexibility needed to solve all the stated problems and deficiencies:
1. Clarification on how version numbers are formed (largely done).
2. Well-defined rules for version number comparison.
3. Extensions to the use module version syntax to support better specification of version numbers.
4. A modification to the module installation mechanism to make version numbers more immediately recognizable without requiring that each module file be opened and parsed.
5. Modifications to the @INC path searching rules to reflect the changes in numbers 1-4 above.
We believe that most if not all of these changes can be made without requiring a change either to older scripts, existing modules, or items already in CPAN. New scripts and new modules should be able to take advantage of the changes with relatively minimal changes.
In brief, I propose that the installation method for modules as provided by perl Makefile.PL be changed such that version numbers appear in the path of the module being installed. This would require that the Makefile.PL support functions open the module, extract the $VERSION (if any), and use that to build the pathnames to install the module.
This change has two huge wins:
Authors of modules would have to do literally nothing to use the new mechanism.
Having the version numbers embedded in the path means they could be reliably determined without having to actually open and parse each candidate module.
Programs which request versions in their use module statements would be compiled with the ``best fit'' commensurate with their request and with the request of other modules.
Note that it may not be possible to satisfy conflicting requests. If module A and module B demand two different versions of the same module C, the compiler should halt and state the module conflicts.
``Best fit'' cannot reliably be determined without examining all the secondary modules required as a consequence of using some lower-level module and without processing the use, use lib, etc. statements they contain. Thus the compiler might have to examine the internals of a number of versions of some modules before choosing which to use. But it would not have to do a full parse of those modules, and the section on Possible Optimization - Indexes suggests some further wins.
There are a variety of mechanisms which could embed the version number into the path name. This RFC does not strongly favor any one over any other. It does have some general suggestions, but is not imposing a particular solution.
Here are some guidelines for choosing a naming system:
The .pm filename extension should be preserved. Thus it is probably better to embed the version number into the file name or a directory name immediately above it on the search path.
If there is a mismatch between the version embedded in the path (e.g. foo-1.0.pm) and the setting in the module's $VERSION variable, perl6 should issue a compile-time error which includes the full path to the module and the internal VERSION number. No recovery should be done.
Here are some possible implementations:
Wherever a foo.pm file would currently be installed, replace it with a foo.pm directory. In that directory, versioned modules would be installed as foo-version.pm, and versionless modules as foo.pm.
Use a foo.pm directory as above, but populate it with subdirectories for each installed version. A foo.pm file containing the module code would reside in that directory. Versionless modules could be installed into the foo.pm directory rather than in a subdirectory. This mechanism has some possible wins should it be appropriate to support simultaneous load of multiple versions.
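As a rough illustration of the first layout above, an install helper could derive the target path from the module source. The sketch below is in Python and is not how Makefile.PL works today: the function name, the library directory and the deliberately simple $VERSION pattern are all assumptions made for the example.

```python
import re

# Deliberately simple pattern; real modules set $VERSION in many other ways.
VERSION_RE = re.compile(r"""\$VERSION\s*=\s*['"]?([0-9]+(?:\.[0-9]+)+)""")


def install_path(lib_dir, module, source):
    """Return where `module` would be installed under the foo.pm directory layout."""
    match = VERSION_RE.search(source)
    if match is None:
        # Versionless modules keep the plain name inside the foo.pm directory.
        return f"{lib_dir}/{module}.pm/{module}.pm"
    return f"{lib_dir}/{module}.pm/{module}-{match.group(1)}.pm"
```

The module author changes nothing; the version that already appears in the source is simply copied into the file name at install time.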
A detailed definition of a version number appears immediately below. It is my belief that this definition and usage is an upward extension of current perl behavior, and therefore simple (current) use of version numbers should work without requiring script code change under this proposal.
Note we are not requiring version numbers, just specifying format and comparison rules.
Version numbers consist of one or more version levels separated by dots.
Each version level must consist of a non-negative number expressed as a series of decimal digits.
The first version level and first dot are required.
There is no limit to how many levels a version may have.
If a version number ends in a dot, a final level of
0 is assumed.
Leading zeros are allowed in level numbers, but are ignored.
If a version level contains leading zeros, those zeros will be
stripped in all cases except for version(s)
Trailing zeros in version numbers, whether explicit or implied by a final dot, are trimmed from the version number internally when deriving paths. See above for pathname deriving.
The following example shows some valid and invalid version numbers:
    use foo 1.;       # Valid, means '1.0'
    use foo 01.;      # Valid, means '1.0'
    use foo 1.1;      # Valid, means '1.1'
    use foo 1.01;     # Valid, means '1.1'
    use foo 1.01.;    # Valid, means '1.1.0'
    use foo 1.1e;     # Invalid, has non-digit
    use foo .1;       # Invalid, must start with explicit level
    use foo 0.1;      # Valid, means '0.1'
    use foo 1;        # Invalid, must have at least one dot
    use foo 1.-1;     # Invalid, no negative numbers (and not a digit)
    use foo ;         # Valid, means no version specified
Invalid version numbers cause a compile-time error on the module.
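The format rules above are small enough to check mechanically. The following Python sketch is one possible reading of them, not a reference implementation; the name parse_version and the choice of a tuple of integers as the internal form are assumptions made for illustration.

```python
import re

# One or more digit levels separated by dots. The first level and the first
# dot are required; a trailing dot implies a final level of 0.
VERSION_RE = re.compile(r"[0-9]+\.(?:[0-9]+(?:\.[0-9]+)*\.?)?")


def parse_version(text):
    """Return the version as a tuple of ints, or raise ValueError if invalid."""
    if not VERSION_RE.fullmatch(text):
        raise ValueError(f"invalid version number: {text!r}")
    if text.endswith("."):
        text += "0"  # a final dot means a final level of 0
    # int() discards leading zeros, so '01' and '1' are the same level.
    return tuple(int(level) for level in text.split("."))
```

Because levels are compared as integers rather than as text, 1.01 and 1.1 parse to the same version, as the examples above require.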
The existing version request syntax is:
use module [ version ] [ qw(func1 func2 func3)] ;
Currently version is a single perl-style version number (whatever the heck that means). I propose we extend the allowable forms to allow ranges, lists, limits, and version limiting. Doing this properly requires some well-defined mechanisms for comparing disparate version numbers.
Version numbers may appear in the use statement of a script or module and in the VERSION statement of a module. They may either be quoted strings or barewords. Usage in any other circumstance is not treated as a version number, but rather as the appropriate perl construct for the circumstances. If a bareword, it is almost certainly an error. I believe that this usage is consistent with current perl practice.
A program can specify a list of versions in no-preference order by listing them separated by whitespace:
    use foo 1.0 1.1 1.3;
    use foo 1.3 1.1 1.0;
These two requests are effectively identical, with the compiler accepting any version of foo beginning with 1.0, 1.1 or 1.3.
A program can specify a list of versions in preference order by adding commas:
    use foo 1.0, 1.1, 1.3;
    use foo 1.3, 1.1, 1.0;
In these cases the compiler can proceed if any of the three versions is available. In the first case some version 1.0 is preferred; in the second, 1.3 is preferred.
The whitespace following the commas is optional.
In cases where there are requests for two different versions of foo, both of which were first in the request orders, the highest-level module (closest to the original script) shall win. If both requests are at the same level offset from the original script, the first requester shall win.
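For a single requester, the two list forms differ only in how a winner is chosen among the acceptable versions. The Python sketch below illustrates that difference under stated assumptions: the function names are invented, installed versions are given in the order they would be found on the search path, and a trailing dot on a list entry is not given any special treatment here.

```python
def levels(text):
    """Turn '1.0.5' into [1, 0, 5]; a trailing dot is ignored in this sketch."""
    return [int(part) for part in text.rstrip(".").split(".")]


def begins_with(installed, wanted):
    """True if the installed version starts with the requested levels."""
    want = levels(wanted)
    return levels(installed)[:len(want)] == want


def pick(request, installed):
    """request is the text after the module name, e.g. '1.0, 1.1' or '1.0 1.1'."""
    if "," in request:
        # Comma-separated: earlier entries are preferred over later ones.
        for wanted in (entry.strip() for entry in request.split(",")):
            for version in installed:
                if begins_with(version, wanted):
                    return version
        return None
    # Whitespace-separated: no preference, so the first acceptable one found wins.
    wanted_list = request.split()
    for version in installed:
        if any(begins_with(version, wanted) for wanted in wanted_list):
            return version
    return None
```

Choosing between several requesters, as described in the paragraph above, is a separate step that this sketch does not attempt.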
A program can indicate a minimum, maximum, exact, and super-exact version it will accept. The following syntax handles these requests:
use foo <1.2;
indicates that any version prior to 1.2 is acceptable. This would mean any version with 1.1 or less for its first two levels.
use foo <=1.2;
indicates that any version up to and including 1.2 is acceptable. This would mean any version with 1.2 or less for its first two levels.
use foo >1.2
indicates that any version greater than 1.2 is acceptable. This would mean any version with 1.3 or more for its first two levels.
use foo >=1.2
indicates that any version greater than or equal to 1.2 is acceptable. This would mean any version which has 1.2 or more as its first two levels.
use foo =1.2
indicates that only versions which begin with 1.2 are acceptable.
These may be further tightened by ending the version number in a period. The period forces the rest of the version levels to always be treated as zeros. Thus the form
use foo =1.2.
indicates that only version 1.2.0 is acceptable, not 1.2.0.0...1.
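The limit operators compare only as many levels as the request names, unless the request ends in a dot. The Python sketch below is one reading of those rules; the function name accepts is an assumption, and applying the trailing-dot rule to operators other than = is an extrapolation from the text rather than something it states.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "=": operator.eq}


def levels(text):
    """Turn '1.2.3' or '1.2.' into a list of ints, ignoring a trailing dot."""
    return [int(part) for part in text.rstrip(".").split(".")]


def accepts(request, installed):
    """request is e.g. '>=1.2' or '=1.2.'; installed is e.g. '1.2.3'."""
    op = request[:2] if request[:2] in OPS else request[:1]
    wanted_text = request[len(op):]
    wanted, have = levels(wanted_text), levels(installed)
    if wanted_text.endswith("."):
        # Trailing dot: all further levels count as zeros, so compare in full.
        width = max(len(wanted), len(have))
    else:
        # Otherwise compare only the levels that the request names.
        width = len(wanted)
    wanted = (wanted + [0] * width)[:width]
    have = (have + [0] * width)[:width]
    return OPS[op](have, wanted)
```

Under this reading, =1.2 accepts 1.2.0.1 while =1.2. does not, and >1.2 rejects 1.2.5 because its first two levels are not 1.3 or more.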
A program should be able to use two version numbers to indicate a range of acceptable version numbers. The separator between the two version numbers indicates preference order, with < meaning the right hand side is preferred, > meaning the left hand side is preferred, and - expressing no preference.
No whitespace is allowed between the separator and the version number; reasons for this will become apparent in the sections on complex lists.
Examples of ranges:
use foo 1.1-1.4
means that any version which begins with 1 and has 1 through 4 as the second level is acceptable, with no preference among them.
use foo 1.1<1.4
means that any version which begins with 1 and has 1 through 4 as the second level is acceptable, with preference given to the highest version in the range.
use foo 1.1>1.4
means that any version which begins with 1 and has 1 through 4 as the second level is acceptable, with preference given to the lowest version in the range.
A terminating dot may be used as well, so that
use foo 1.1-1.4.
means that any version 1.1 through 1.3 is acceptable, but the only acceptable 1.4 version is 1.4.0.
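The range forms can be modelled with a membership test plus a rule for picking a winner. The Python sketch below assumes the three separators behave as the examples describe (- for no preference, < preferring the highest, > preferring the lowest); the function names are invented, and "no preference" is taken to mean the first acceptable version found.

```python
import re

SPEC_RE = re.compile(r"([0-9.]+)([-<>])([0-9.]+)")


def levels(text):
    """Turn '1.4' or '1.4.' into a list of ints, ignoring a trailing dot."""
    return [int(part) for part in text.rstrip(".").split(".")]


def padded(values, width):
    """Pad with zeros or truncate so that exactly `width` levels remain."""
    return (values + [0] * width)[:width]


def in_range(spec, installed):
    low_text, _sep, high_text = SPEC_RE.fullmatch(spec).groups()
    low, high, have = levels(low_text), levels(high_text), levels(installed)
    if padded(have, len(low)) < low:
        return False
    if high_text.endswith("."):
        # Terminating dot: the upper bound is exactly high.0.0..., nothing above.
        width = max(len(have), len(high))
        return padded(have, width) <= padded(high, width)
    return padded(have, len(high)) <= high


def choose(spec, installed):
    """Pick one installed version for a range spec, or None if none fits."""
    separator = SPEC_RE.fullmatch(spec).group(2)
    candidates = [v for v in installed if in_range(spec, v)]
    if not candidates:
        return None
    if separator == "<":
        return max(candidates, key=levels)
    if separator == ">":
        return min(candidates, key=levels)
    return candidates[0]
```

Only the upper bound needs special handling for the terminating dot; on the lower bound, 1.1 and 1.1. admit the same versions.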
Lists and ranges may be combined in arbitrary ways to make complex preference sets. Thus
use foo 1.5 1.0-1.3;
means that any version 1.0, 1.1, 1.2, 1.3 or 1.5 is acceptable, without preference order. By contrast,
use foo 1.5, 1.0-1.3, 1.4;
means that 1.5 is preferred, then anything in the 1.0 to 1.3 range, then 1.4.
It has been suggested that
globs or even full-bore regular
expressions be allowed for version specification. It has not
been included for the following reasons:
use foo 1.
and indicate that there was some ordering to the sub-version. I am concerned that people would naively expect that the Do What I Mean principle would cause perl to assume that the following
use foo 1.
is equivalent to
use foo 1.2, 1.0, 1.3
on the naive theory that the two regexps look different so they should do something different.
Both regexps and globs have been suggested. Having more than one mechanism for this would cause intermediate perl programmers to assume that there was some subtle difference between the two.
Since regexps and globs bring little additional utility and introduce possible confusion, I have chosen to leave them out.
When we permit modules to request only certain versions of other modules, we will find cases where no version of foo is acceptable to all modules which wish to use it. In such a case, the compiler should give up with an error message stating that due to conflicting version requests, foo could not be loaded. This could become The Error Message From Hell if sufficient detail was included. A utility (perl module?!) should be provided which would recursively examine the use lines of a perl script and the system configuration and produce the appropriately voluminous output report.
Modules load modules load modules, ad nauseam. It is quite possible that two or more different modules will request some other module. If there is only one version which satisfies all the requests, we don't have a problem.
If there is more than one version acceptable to all callers, we choose which to use based on the following rules:
If no preference was expressed, the first acceptable version that was found is used.
If a preference was expressed, highest preference is given to the requests which come from the original script.
If no request came from the original script, highest preference is given to second-level requesters. If there is more than one second-level requester, the first requester's preferences are used. If there are no second-level requesters, the third level is used, and so on.
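The selection rules above can be sketched as a small resolver. This Python model rests on several assumptions: each request is a hypothetical (depth, versions, ordered) triple with depth 0 for the original script, the versions are already expanded to the installed versions that requester accepts, and when the shallowest requester expressed no preference the next requester that did is consulted.

```python
def resolve(installed, requests):
    """Choose one version acceptable to every requester.

    installed: versions in the order they are found on the search path.
    requests:  (depth, versions, ordered) triples in the order they were made;
               `versions` is in preference order when `ordered` is true.
    """
    acceptable = [version for version in installed
                  if all(version in wanted for _depth, wanted, _ordered in requests)]
    if not acceptable:
        raise RuntimeError("conflicting version requests; module could not be loaded")
    with_preference = [request for request in requests if request[2]]
    if not with_preference:
        return acceptable[0]  # no preference expressed: first acceptable found
    # min() returns the earliest entry among ties, i.e. the first requester
    # at the shallowest depth.
    _depth, wanted, _ordered = min(with_preference, key=lambda request: request[0])
    return next(version for version in wanted if version in acceptable)
```

A request from the script itself (depth 0) therefore overrides the preferences of any module it loads, while every requester still has a veto over versions it does not accept.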
There were four problems identified with the current system. Implementation of this proposal solves those problems as follows:
The statement
use foo 1.2;
causes perl to search @INC until it finds the first foo.pm file of version 1.2 or greater.
I believe that when a programmer writes
use module;
`Do What I Mean' should find the newest version of the module present on the @INC search path. The current proposal does not cause this to occur, and thereby permits backwards-compatible behavior. However, the programmer can now write
use module >=0.0;
and accept any module, but give preference to the highest.
This proposal does not prevent new modules from breaking existing scripts. It does, however, permit those scripts to be repaired by the simple change of locking the script to the acceptable version(s) of the module. This is often significantly easier than updating the script, and avoids the possibility of introducing new bugs either due to modifications of the script or from bugs in the newer module.
In mission-critical environments, production versions of scripts could always be released to a version range of a module, reflecting the ranges of the module it had been tested and known to work against:
use foo 2.0-2.1; # Accept any 2.0 or 2.1 version
When a new version of foo.pm needs to be rolled out with the new version of the script, the version of foo.pm could be set to 2.2 and the new script released with
use foo =2.2; # Accept any 2.2 version
Now new and old versions of script and module can be released with no impact on existing production software. When it is decided that the new versions should become the standard versions, the new script is copied over the old and the modules are not touched.
A possible problem introduced with this proposal is an even greater increase in the amount of searching of directories that must be done. This is an often-expensive process, and can have a serious impact when even small scripts are run tens of thousands of times per day (as ours do).
This could be resolved by adding some sort of simple index
files to the installation tree. The index files could simply
be a list of all the files (pathnames) found under this particular
branch of the file tree. Since those pathnames would contain
the version numbers, examining any index file would be sufficient
for determining what versions lay where. If later uses of
use lib chose a subset of that tree, the index data would
already be present. In an ideal situation, only one or two
index files might be all that is needed to find all versions
of all modules.
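To make the idea concrete, here is a Python sketch of reading such an index. It assumes a hypothetical index format of one pathname per line and the layout in which foo.pm is a directory holding foo-version.pm files; neither the format nor the function name comes from this proposal.

```python
import re
from collections import defaultdict

# Matches e.g. 'lib/foo.pm/foo-1.2.pm' and captures ('foo', '1.2').
ENTRY_RE = re.compile(
    r"(?:.*/)?([A-Za-z_][A-Za-z0-9_]*)\.pm/\1-([0-9]+(?:\.[0-9]+)+)\.pm")


def versions_from_index(index_lines):
    """Map each module name to its (version, pathname) pairs, in index order."""
    table = defaultdict(list)
    for line in index_lines:
        path = line.strip()
        match = ENTRY_RE.fullmatch(path)
        if match:
            table[match.group(1)].append((match.group(2), path))
    return dict(table)
```

One pass over the index lines yields every installed version of every module beneath that branch, without opening any module file or walking any directory.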
More complex indexes could be built which might include all the dependency information in a manner not dissimilar to the output of lorder(1).
Reliability would best be achieved by implementing index construction as an automatic part of module install. As above, this could be automated such that neither the module developer nor the system administrator would have to worry about it; the process would still be:
    perl6 Makefile.PL
    make
    make install
where make install updates the indexes. At this point it is not clear to me that such an index is required; in its absence perl6 should simply search the @INC path as it does now.
There is nothing in this proposal which could not be implemented in perl5.X, and it would probably be a Good Thing if such were done.
Some languages allow multiple versions of a module to be loaded simultaneously. It is my opinion that In This Way Lies Madness, but perl has done stranger things. Should we decide to allow this, incorporating the version number into the namespace would allow it.
Let us suppose that module foo requires module bar 1.0 and module baz requires module bar 2.0. Also assume that both versions of bar provide an op function. Then these two modules could do
    module foo.pm            module baz.pm
    use bar =1.0.;           use bar =2.0.;
    # Uses bar1.0::op        # Uses bar2.0::op
    bar::op();               bar::op();
and each time get the appropriate invocation of op. If foo and baz both create a bar object, the object should be blessed into the appropriate version of bar, so that
    my $handle1 = new foo;
    my $handle2 = new baz;
    my $var1 = $handle1->op();   # Always gets op 1.0
    my $var2 = $handle2->op();   # Always gets op 2.0
always invokes the appropriate version of op.
While I'm not seriously suggesting this dual loading be allowed, it should at least be considered by the folks who know more about objects than I do. Note, though, that this kind of feature might prove to be invaluable in testing new versions of modules. With appropriate aliasing added, a test script could do
    use foo <3.0 as foo_old;
    use foo =3.0. as foo_new;
    # do something with foo_old
    # do identical things with foo_new
    # compare results
Again, I'm not seriously suggesting this feature. But if it comes up, all the module versioning rules above need to be revisited.
lorder(1) - Optimising .o file orders for UNIX loader ld(1)