VMS Help  —  SORT  Specification File Qualifiers, /COLLATING_SEQUENCE
    Specifies the collating instructions for a sort or merge
    operation. With the /COLLATING_SEQUENCE qualifier, you can
    specify ASCII (the default), EBCDIC, or Multinational sequence;
    you can also define your own sequence.

    Format

      /COLLATING_SEQUENCE=

      (SEQUENCE=sequence_type

      [,MODIFICATION=(character operator character)]

      [,IGNORE=character or character-range,...]

      [,FOLD]

      [,[NO]TIE_BREAK])

1  –  Qualifier Values

 SEQUENCE=sequence_type

 ASCII

    Specifies ASCII collating sequence, which is the default
    sequence.

 EBCDIC

    Arranges characters according to EBCDIC sequence. The characters
    remain in ASCII representation; only the order is changed.

 MULTINATIONAL

    Arranges characters according to Multinational sequence, which
    collates the international character set. When you use the
    Multinational sequence, characters are ordered according to the
    following rules:

    o  All diacritical forms of a character are given the collating
       value of the character (A',A",A` collate as A).

    o  Lowercase characters are given the collating value of their
       uppercase equivalents (a collates as A, a" collates as A").

    o  If two strings compare as equal, tie-breaking is performed.
       The strings are compared to detect differences due to
       diacritical marks, ignored characters, or characters that
       collate as equal although they are actually different. If the
       strings still compare as equal, another comparison is done
       based on the numeric codes of the characters. In this final
       comparison, lowercase characters are ordered before uppercase.

    Take care when you use the Multinational sequence to sort or
    merge files for further processing. Sequence-checking procedures
    in most programming languages compare the numeric codes of
    characters. Because the Multinational sequence is based on the
    graphic characters themselves and not on the codes representing
    those characters, normal sequence checking does not work.
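
    The following Python sketch illustrates why a code-based sequence
    check rejects Multinational output. It is a rough model, not the
    Sort/Merge implementation: the function names are hypothetical,
    Unicode decomposition stands in for the Multinational table, and
    only the initial comparison (no tie-breaking) is modeled.

```python
import unicodedata

def primary_key(text):
    """Model of the initial comparison: drop diacritical marks, fold case."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.upper()

def in_code_order(records):
    """Typical sequence check: compare adjacent records by character code."""
    return all(a <= b for a, b in zip(records, records[1:]))

words = ["Zebra", "apple", "Äpfel", "banana"]

# Ordering by graphic character, as the Multinational rules describe.
ordered = sorted(words, key=primary_key)
print(ordered)
print(in_code_order(ordered))        # the code-based check rejects this order
print(in_code_order(sorted(words)))  # plain code order passes the check
```

    In this model the accented word sorts next to "apple", yet its
    first character has a higher numeric code than "a", so the
    code-based check reports the file as out of sequence.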

 user-defined-sequence

    Specifies a user-defined collating sequence. Define a collating
    sequence by specifying a string of single or double characters
    or ranges of single characters. (A double character is any set
    of two single characters collated as if they were one character.
    For example, "CH" can be defined to collate as "C".) This string
    should be enclosed in parentheses.

    You can also represent characters by their corresponding octal,
    decimal, or hexadecimal values using the radix operators: %O, %D,
    %X.

    You must observe the following rules when defining your collating
    sequence:

    o  Enclose characters in quotation marks (" ").

    o  Separate each character and character range with a comma, and
       enclose the entire list in parentheses.

    o  Give all the characters appearing in the character keys in
       the sort or merge operation a collating value. Any character
       not given a collating value will be ignored unless the FOLD or
       MODIFICATION options are specified.

    o  Do not define a character more than once.

    o  Do not specify the null character by using quotation marks
       (""). Instead, use a radix operator such as %X0.

    o  Specify quotation marks by enclosing them within another set
       of quotation marks ("" "") or by using a radix operator.
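
    As an illustration of the radix operators, the Python sketch
    below converts a radix specification to the character it names.
    The function name is hypothetical, and the sketch assumes the
    operator letter is uppercase and is followed directly by the
    digits.

```python
# Number base selected by each radix operator named in the text.
RADIX_BASES = {"O": 8, "D": 10, "X": 16}

def radix_to_char(spec):
    """Return the character named by a spec such as %X41, %D65, or %O101."""
    if len(spec) < 3 or spec[0] != "%" or spec[1] not in RADIX_BASES:
        raise ValueError(f"not a radix specification: {spec!r}")
    return chr(int(spec[2:], RADIX_BASES[spec[1]]))

print(repr(radix_to_char("%X0")))   # the null character
print(radix_to_char("%X41"), radix_to_char("%D65"), radix_to_char("%O101"))
```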

 MODIFICATION=(character operator character)

    Specifies a change to the collating sequence specified in the
    SEQUENCE option. You can modify the ASCII, EBCDIC, Multinational,
    or user-defined sequence. The sequence being modified must be
    specified with the SEQUENCE qualifier even if the sequence is the
    default (ASCII).

 character

    Specifies a character in the collating sequence. You can specify
    a single or double character. A double character is any set
    of two single characters collated as if they were a single
    character. Enclose the character in quotation marks.

 operator

    Specifies the operator used to compare the characters. You can
    specify greater than (>), less than (<), or equal to (=).

    These are the kinds of changes permitted in the MODIFICATION
    option:

    o  A single or double character can be equated to a single
       character that has already been assigned a collating value
       ("a"="A").

    o  A single or double character can collate after a single
       character that has already been assigned a collating value
       ("CH">"C").

    o  A single or double character can collate before a single
       character that has already been assigned a collating value
       ("D"<"A").

    o  A double character can be equated to a previously defined
       double character ("CH" = "SH").

    o  A single character can be equated to a double character
       sequence ("C" = "CH").
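
    The first three kinds of change can be pictured with the Python
    sketch below. It is a model only: the sequence is held as an
    ordered list of classes, each class holding the characters that
    share one collating value, and it assumes that ">" and "<" place
    the character immediately after or before the reference
    character. The function name is hypothetical, and equating a
    single character to a double character sequence is not modeled.

```python
def modify(classes, left, operator, right):
    """Return a new class list with `left` placed relative to `right`."""
    # Drop any earlier definition of `left`, then discard emptied classes.
    result = [[tok for tok in cls if tok != left] for cls in classes]
    result = [cls for cls in result if cls]
    for index, cls in enumerate(result):
        if right in cls:
            break
    else:
        raise ValueError(f"{right!r} has no collating value")
    if operator == "=":
        result[index] = cls + [left]      # same collating value
    elif operator == ">":
        result.insert(index + 1, [left])  # collates just after
    elif operator == "<":
        result.insert(index, [left])      # collates just before
    else:
        raise ValueError(f"unknown operator: {operator!r}")
    return result

# Start from uppercase A to Z, one character per collating value.
demo = [[chr(code)] for code in range(ord("A"), ord("Z") + 1)]
demo = modify(demo, "CH", ">", "C")   # ("CH">"C")
demo = modify(demo, "D", "<", "A")    # ("D"<"A")
demo = modify(demo, "a", "=", "A")    # ("a"="A")
print(demo[:6])
```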

 IGNORE

    Specifies that Sort/Merge ignore a character or character range
    in the collating sequence when making an initial comparison.
    Note that, when tie-breaking takes place, Sort/Merge considers
    the characters specified with the IGNORE qualifier. Tie-breaking
    takes place when two or more strings have compared as equal and
    the Multinational sequence is being used or when two or more
    strings have compared as equal and the TIE_BREAK qualifier has
    been specified.

 FOLD

    Specifies that all lowercase letters be given the collating value
    of their uppercase equivalents. For ASCII, EBCDIC, and user-
    defined sequences, the lowercase letters are a to z.

    Because the lowercase letters in the Multinational sequence
    already have the collating value of their uppercase equivalents,
    using FOLD is unnecessary.

 TIE_BREAK

    Specifies whether or not Sort/Merge should use numeric values to
    break any ties between characters that have equivalent values.
    By default, tie-breaking occurs with the Multinational sequence.
    Specifying NOTIE_BREAK overrides this default and ensures that no
    further comparisons are made after the initial comparison.

    A TIE_BREAK option must be specified for the ASCII, EBCDIC, and
    user-defined sequences in order for tie-breaking to occur.
    TIE_BREAK should be used when specifying FOLD or MODIFICATION
    for these sequences.

2  –  Full Description

    The MODIFICATION, IGNORE, FOLD, and [NO]TIE_BREAK options of
    the /COLLATING_SEQUENCE qualifier can also be used to modify the
    collating sequence. You can make more than one modification to
    the collating sequence. If you intend to modify any collating
    sequence, you must specify the sequence in the SEQUENCE option,
    even if it is the default sequence (ASCII).

    Because the FOLD, MODIFICATION, and IGNORE qualifiers are
    processed in the order in which they are specified, care should
    be taken when specifying the order of those qualifiers. Normally,
    FOLD should be specified after all MODIFICATION and IGNORE
    qualifiers to ensure that the effects of the MODIFICATION and
    IGNORE qualifiers apply to uppercase and lowercase characters.
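
    The effect of ordering can be sketched in Python as follows. The
    model assumes that each option updates a table of collating
    values in the order given, and that FOLD copies the current value
    of each uppercase letter to its lowercase equivalent. The names
    are hypothetical and the table covers 7-bit ASCII only.

```python
import string

def apply_options(options):
    """Build a collating table by applying (name, argument) pairs in order."""
    table = {chr(code): code for code in range(128)}  # plain ASCII order
    for name, argument in options:
        if name == "IGNORE":
            for ch in argument:
                table[ch] = None                      # None means ignored
        elif name == "FOLD":
            for lower in string.ascii_lowercase:
                table[lower] = table[lower.upper()]
        else:
            raise ValueError(f"unsupported option: {name}")
    return table

def key(text, table):
    """Collating values for text, skipping ignored characters."""
    return [table[ch] for ch in text if table[ch] is not None]

fold_last = apply_options([("IGNORE", "X"), ("FOLD", None)])
fold_first = apply_options([("FOLD", None), ("IGNORE", "X")])

# FOLD last: "x" inherits the ignored status of "X".
print(key("axb", fold_last) == key("AXB", fold_last))
# FOLD first: "x" was folded before "X" was ignored, so it still counts.
print(key("axb", fold_first) == key("AXB", fold_first))
```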

    You can request that Sort/Merge ignore a character or character
    range within the given collating sequence by using the IGNORE
    qualifier.

    By default, in the Multinational collating sequence, Sort/Merge
    folds lowercase letters into their uppercase equivalents. If
    you want this folding to occur in the other collating sequences,
    you must specify a FOLD qualifier with the instructions for the
    collating sequence.

    Also by default, in the Multinational collating sequence,
    Sort/Merge uses numeric comparisons to break any ties in the
    collating values. A tie occurs when two keys that are not
    identical receive the same collating value. If you do not want
    this default when using the Multinational collating sequence,
    specify the keyword NOTIE_BREAK. For tie-breaking in the other
    collating sequences, specify a TIE_BREAK qualifier.

3  –  Examples

    1.  /COLLATING_SEQUENCE=(SEQUENCE=ASCII,IGNORE=("-"," "))

      This /COLLATING_SEQUENCE qualifier with an IGNORE option
      specified results in the following fields being compared as
      equal before tie breaking:

         252-3412
         252 3412
         2523412
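
      A minimal Python model of this example follows. The function
      names are hypothetical. The tie-break key assumes that ties are
      resolved by comparing the original character codes, which is a
      simplification of the behavior described for TIE_BREAK.

```python
IGNORED = {"-", " "}

def primary_key(field):
    """Initial comparison: ignored characters take no part."""
    return "".join(ch for ch in field if ch not in IGNORED)

def tie_break_key(field):
    """With tie-breaking, fall back to the original characters."""
    return (primary_key(field), field)

fields = ["252-3412", "252 3412", "2523412"]

# All three fields reduce to the same key before tie-breaking.
print({primary_key(field) for field in fields})

# Under the tie-break model, the ignored characters decide the order.
print(sorted(fields, key=tie_break_key))
```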

    2.  /COLLATING_SEQUENCE=(SEQUENCE=("A"-"L","LL","M"-"R","RR","S"-"Z"))

      This /COLLATING_SEQUENCE qualifier defines a sequence in
      which the double character LL collates as a single character
      between L and M, and the double character RR collates as a
      single character between R and S. These double characters
      would otherwise appear in their usual alphabetical order. By
      default, this user-defined sequence does not define any other
      characters, such as lowercase a to z.
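
      The Python sketch below models this sequence. It assumes that a
      double character is matched before its first single character
      (a greedy match) and that undefined characters are ignored; the
      function names and sample words are hypothetical.

```python
def build_sequence(items):
    """Assign increasing collating values to characters, pairs, and ranges."""
    values = {}
    for item in items:
        if isinstance(item, tuple):           # a range such as ("A", "L")
            first, last = item
            members = [chr(c) for c in range(ord(first), ord(last) + 1)]
        else:                                 # a single or double character
            members = [item]
        for member in members:
            if member in values:
                raise ValueError(f"{member!r} defined more than once")
            values[member] = len(values)
    return values

def collate(text, values):
    """Translate text to collating values, preferring double characters."""
    result, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in values:
            result.append(values[pair])
            i += 2
        else:
            if text[i] in values:             # undefined characters are skipped
                result.append(values[text[i]])
            i += 1
    return result

sequence = build_sequence([("A", "L"), "LL", ("M", "R"), "RR", ("S", "Z")])
words = ["LUNA", "LLAMA", "LOMA", "MANO"]
print(sorted(words, key=lambda word: collate(word, sequence)))
```

      In this model LLAMA sorts after LUNA because LL collates as one
      character between L and M.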

    3.  /COLLATING_SEQUENCE=(SEQUENCE=
               ("AN","EB","AR","PR","AY","UN","UL",
                "UG","EP","CT","OV","EC","0"-"9"),
                MODIFICATION=("'"="19"),
                FOLD)

      This /COLLATING_SEQUENCE qualifier defines a user-defined
      sequence that gives each month a unique value in chronological
      order. For example, suppose you want to order a file called
      SEMINAR.DAT by date, and the file contains the following
      records:

        16 NOV 1983   Communication Skills
        05 APR 1984   Coping with Alcoholism
        11 Jan '84    How to Be Assertive
        12 OCT 1983   Improving Productivity
        15 MAR 1984   Living with Your Teenager
        08 FEB 1984   Single Parenting
        07 Dec '83    Stress --- Causes and Cures
        14 SEP 1983   Time Management

      The primary key is the year field; the secondary key is the
      month field. Because the month field is not numeric and you
      want the months ordered chronologically, you must define
      your own collating sequence. You can do this by listing the
      last two letters of each month, in chronological order, as
      double characters, which gives each month a unique key value.

      The MODIFICATION option specifies that the apostrophe (') be
      equated to 19, thereby allowing a comparison of '83 and 1984.
      The FOLD option specifies that uppercase and lowercase letters
      are treated as equal.

      The output from this sort operation appears as follows:

        14 SEP 1983   Time Management
        12 OCT 1983   Improving Productivity
        16 NOV 1983   Communication Skills
        07 Dec '83    Stress --- Causes and Cures
        11 Jan '84    How to Be Assertive
        08 FEB 1984   Single Parenting
        15 MAR 1984   Living with Your Teenager
        05 APR 1984   Coping with Alcoholism
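
      The ordering above can be reproduced with the Python sketch
      below. It is a model under stated assumptions, not the
      Sort/Merge implementation: fields are split on white space
      rather than taken from fixed positions, FOLD is modeled by
      uppercasing, the MODIFICATION option is modeled by replacing
      the apostrophe with 19, double characters are matched greedily,
      and undefined characters (such as the first letter of each
      month) are ignored. All names are hypothetical.

```python
MONTH_PAIRS = ["AN", "EB", "AR", "PR", "AY", "UN",
               "UL", "UG", "EP", "CT", "OV", "EC"]
# Month pairs collate first, in order, followed by the digits 0 to 9.
VALUES = {tok: rank for rank, tok in enumerate(MONTH_PAIRS + list("0123456789"))}

def collate(field):
    """Collating values for one key field."""
    text = field.upper().replace("'", "19")   # FOLD, then "'"="19"
    result, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in VALUES:
            result.append(VALUES[pair])
            i += 2
        else:
            if text[i] in VALUES:             # undefined characters are skipped
                result.append(VALUES[text[i]])
            i += 1
    return result

def sort_key(record):
    """Primary key is the year field; secondary key is the month field."""
    _day, month, year, _title = record.split(None, 3)
    return (collate(year), collate(month))

RECORDS = [
    "16 NOV 1983   Communication Skills",
    "05 APR 1984   Coping with Alcoholism",
    "11 Jan '84    How to Be Assertive",
    "12 OCT 1983   Improving Productivity",
    "15 MAR 1984   Living with Your Teenager",
    "08 FEB 1984   Single Parenting",
    "07 Dec '83    Stress --- Causes and Cures",
    "14 SEP 1983   Time Management",
]

for record in sorted(RECORDS, key=sort_key):
    print(record)
```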