Perl Documentation

NAME

Unicode::Collate - Unicode Collation Algorithm

SYNOPSIS

use Unicode::Collate;

# construct a collator, optionally with tailoring parameters
$Collator = Unicode::Collate->new(%tailoring);

# sort a list of strings
@sorted = $Collator->sort(@not_sorted);

# compare two strings
$result = $Collator->cmp($a, $b); # returns 1, 0, or -1.

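Note: Strings in @not_sorted, $a and $b are interpreted according to Perl's Unicode support. See perlunicode, perluniintro, perlunitut, perlunifaq, utf8. If your strings are not already character strings (for example, if they are still encoded octets), either decode them beforehand or use the preprocess parameter to do so.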
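
The two options in the note can be sketched as follows. This is a minimal illustration, not part of the module's documentation: the sample data and the variable names (@octets, $decoding) are assumptions, and the input is assumed to be UTF-8 encoded octets.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Collate;

# Hypothetical input: UTF-8 encoded octets, e.g. read from a file
# that was opened without an encoding layer.
my @octets = ("zebra", "\xC3\xA9clair", "apple");

# Option 1: decode to Perl character strings first, then sort.
my @not_sorted = map { decode('UTF-8', $_) } @octets;
my $Collator   = Unicode::Collate->new;
my @sorted     = $Collator->sort(@not_sorted);

# Option 2: let the collator decode each string via preprocess.
# The strings returned by sort() are the original octet strings.
my $decoding = Unicode::Collate->new(
    preprocess => sub { my $s = shift; decode('UTF-8', $s) },
);
my @sorted_octets = $decoding->sort(@octets);

# Encode character strings as UTF-8 when printing.
binmode(STDOUT, ':encoding(UTF-8)');
print "$_\n" for @sorted;
```

With the default collation, the accented word sorts between "apple" and "zebra". A plain bytewise sort of @octets would place it last.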

DESCRIPTION

This module is an implementation of Unicode Technical Standard #10 (a.k.a. UTS #10) - Unicode Collation Algorithm (a.k.a. UCA).

Constructor and Tailoring

The new method returns a collator object. If new() is called with no parameters, the collator performs the default collation.

$Collator = Unicode::Collate->new(
   UCA_Version => $UCA_Version,
   alternate => $alternate, # alias for 'variable'
   backwards => $levelNumber, # or \@levelNumbers
   entry => $element,
   hangul_terminator => $term_primary_weight,
   highestFFFF => $bool,
   identical => $bool,
   ignoreName => qr/$ignoreName/,
   ignoreChar => qr/$ignoreChar/,
   ignore_level2 => $bool,
   katakana_before_hiragana => $bool,
   level => $collationLevel,
   long_contraction => $bool,
   minimalFFFE => $bool,
   normalization  => $normalization_form,
   overrideCJK => \&overrideCJK,
   overrideHangul => \&overrideHangul,
   preprocess => \&preprocess,
   rearrange => \@charList,
   rewrite => \&rewrite,
   suppress => \@charList,
   table => $filename,
   undefName => qr/$undefName/,
   undefChar => qr/$undefChar/,
   upper_before_lower => $bool,
   variable => $variable,
);
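
As a short illustration of how tailoring parameters change the result, the sketch below compares the same pair of strings with three collators. Only the level and upper_before_lower parameters from the list above are used. The variable names are illustrative, and the comments describe the behaviour expected with the default collation element table.

```perl
use strict;
use warnings;
use Unicode::Collate;

# Default collation: lowercase sorts before uppercase.
my $default = Unicode::Collate->new;

# Tailored: uppercase sorts before lowercase.
my $upper_first = Unicode::Collate->new(upper_before_lower => 1);

# Tailored: compare primary weights only, so case differences
# (and accents) are ignored.
my $primary_only = Unicode::Collate->new(level => 1);

my $r_default = $default->cmp("perl", "PERL");       # -1
my $r_upper   = $upper_first->cmp("perl", "PERL");   #  1
my $r_primary = $primary_only->cmp("perl", "PERL");  #  0

printf "default: %d, upper_before_lower: %d, level 1: %d\n",
    $r_default, $r_upper, $r_primary;
```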

Methods for Collation

Methods for Searching

The match, gmatch, subst, gsubst methods work like m//, m//g, s///, s///g, respectively, except that they match only a literal substring, not a pattern.

DISCLAIMER: If the preprocess or normalization parameter is true for $Collator, calling any of these methods (index, match, gmatch, subst, gsubst) will croak, because the position and the length might differ from those in the specified string.

The rearrange and hangul_terminator parameters are ignored. katakana_before_hiragana and upper_before_lower do not affect matching and searching, because these operations only test for equality, not for which string is greater or less.
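
A minimal sketch of the searching methods, using an ASCII-only sample string chosen for illustration. Normalization is switched off explicitly so that the methods do not croak, and level => 1 makes the search ignore case.

```perl
use strict;
use warnings;
use Unicode::Collate;

# normalization => undef is needed for the searching methods;
# level => 1 compares primary weights only (case-insensitive).
my $Collator = Unicode::Collate->new(normalization => undef, level => 1);

my $str = "Perl and PERL and perl";

# index: position and length of the first match (list context).
my ($pos, $len) = $Collator->index($str, "perl");
my $first = substr($str, $pos, $len);            # "Perl"

# gmatch: all matching parts, like m//g in list context.
my @all = $Collator->gmatch($str, "perl");       # Perl, PERL, perl

# gsubst: replace every match, like s///g; modifies $str and
# returns the number of replacements. The replacement may be a
# code reference that receives the matching part.
my $count = $Collator->gsubst($str, "perl", sub { "<$_[0]>" });

print "$first | @all | $count | $str\n";
```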

Other Methods

EXPORT

No method will be exported.

INSTALL

Although this module can be used without any table file, it is easier to use with one. It is recommended to install a table file in the UCA format by copying it under the directory <a place in @INC>/Unicode/Collate.

The preferred table is "The Default Unicode Collation Element Table" (aka DUCET), available from the Unicode Consortium's website:

http://www.unicode.org/Public/UCA/
http://www.unicode.org/Public/UCA/latest/allkeys.txt (latest version)

If DUCET is not installed, it is recommended to copy the file manually from http://www.unicode.org/Public/UCA/latest/allkeys.txt to <a place in @INC>/Unicode/Collate/allkeys.txt.
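
To check whether a table file is already present in one of the @INC directories, something like the following sketch can be used. It is only an illustration; the variable names are arbitrary, and it looks for nothing more than a file named Unicode/Collate/allkeys.txt.

```perl
use strict;
use warnings;
use File::Spec;

# Build the candidate path under every plain directory in @INC
# (code references in @INC are skipped) and keep existing files.
my @candidates = map  { File::Spec->catfile($_, 'Unicode', 'Collate', 'allkeys.txt') }
                 grep { !ref } @INC;
my @found = grep { -f } @candidates;

if (@found) {
    print "Table file found: $_\n" for @found;
}
else {
    print "No allkeys.txt found; candidate locations:\n";
    print "  $_\n" for @candidates;
}
```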

CAVEATS

AUTHOR, COPYRIGHT AND LICENSE

The Unicode::Collate module for perl was written by SADAHIRO Tomoyuki, <SADAHIRO@cpan.org>. This module is Copyright(C) 2001-2014, SADAHIRO Tomoyuki. Japan. All rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The file Unicode/Collate/allkeys.txt was copied verbatim from http://www.unicode.org/Public/UCA/6.3.0/allkeys.txt. For this file, Copyright (c) 2001-2012 Unicode, Inc. Distributed under the Terms of Use in http://www.unicode.org/copyright.html.

SEE ALSO