Alphabetically

ISO 15924, "Codes for the representation of names of scripts", is the international standard that lists codes for writing systems. The Unicode Consortium acts as the registration authority and maintains the standard on behalf of ISO, which defines and approves it. However, ISO 15924 is not part of the Unicode standard (which is concerned solely with the distinctions between abstract characters).

Naming and organization of writing systems according to ISO 15924

The standard defines for each script:

  • a descriptive name in English;
  • a descriptive name in French;
  • a four-letter alphabetic code element (normative), for example:
    Arab: Arabic;
    Cyrl: Cyrillic;
    Egyp: Egyptian hieroglyphs;
    Latn: Latin;
    Laoo: Lao;
    Yiii: Yi;
  • a numeric code value (normative) between 000 and 999; and finally
  • a reference date for tracking changes (and corrections, if any) to each script entry in the standard itself.
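
The fields listed above map naturally onto a small record type. The following is a minimal sketch only: the class name `ScriptEntry`, its field names, and the sample French name and date are illustrative assumptions, not part of the standard or of any official data format.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScriptEntry:
    """Hypothetical record holding the fields the standard defines per script."""
    alpha_code: str        # four-letter alphabetic code element (normative)
    numeric_code: int      # numeric code value (normative), 0..999
    english_name: str      # descriptive name in English
    french_name: str       # descriptive name in French
    reference_date: date   # date used to track changes to the entry

    def __post_init__(self):
        # The alphabetic code is exactly four basic Latin letters.
        if not re.fullmatch(r"[A-Za-z]{4}", self.alpha_code):
            raise ValueError(f"not a four-letter code: {self.alpha_code!r}")
        # The numeric code lies between 000 and 999.
        if not 0 <= self.numeric_code <= 999:
            raise ValueError(f"numeric code out of range: {self.numeric_code}")

    @property
    def numeric_text(self) -> str:
        """Numeric code written with three digits, e.g. 50 -> '050'."""
        return f"{self.numeric_code:03d}"


# Sample values: the code pair Latn/215 comes from the text; the French name
# and the date are placeholders chosen for illustration.
latin = ScriptEntry("Latn", 215, "Latin", "latin", date(2004, 5, 1))
print(latin.alpha_code, latin.numeric_text)
```

Making the record immutable (`frozen=True`) reflects the fact that an entry is reference data; a change to the standard would be represented by a new record with a new reference date.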

For the complete and current list of codes and names, refer to the website of the ISO 15924 registration authority.

Nomenclature and numerical classification

The numeric code elements are grouped in series of one hundred according to the type and relative proximity of the writing systems (see examples below).

The codes and names are also defined to meet bibliographic needs for texts and whole documents, and are not reserved for individual characters. In addition, different writing styles that use the same abstract alphabet have their own code elements, classified close together in the same series and consecutively where possible. For this reason, the numeric code values are not simply allocated in increments of 1 (there are "holes" in the numbering).

The following series are currently used:

  • 000-099: hieroglyphic scripts (Egyptian or Mayan) and cuneiform scripts (including Ugaritic);
  • 100-199: right-to-left alphabetic scripts (including the Phoenician alphabet, Tifinagh, the Semitic abjads, Mongolian, N'Ko and Old Hungarian);
  • 200-299: left-to-right alphabetic scripts (including the European alphabets derived from ancient Greek, Hangul, Bopomofo, and invented literary alphabets);
  • 300-399: alphasyllabic scripts (including the many Brahmic abugidas of South and Southeast Asia);
  • 400-499: syllabic scripts (including Linear A and Linear B, Cypriot, Hiragana and Katakana, Ethiopic, Canadian Aboriginal syllabics, Cherokee, etc.);
  • 500-599: ideographic or symbolic scripts (including Braille);
  • 600-699: undeciphered scripts (classification still unknown, such as the Indus script and Rongorongo);
  • 700-799 and 800-899: series not yet used;
  • 900-999: private-use codes, aliases (currently none), and special codes.
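
Because the series are aligned on hundreds, the series of a numeric code can be found from its leading digit. The sketch below assumes nothing beyond the list above; the function name `classify_numeric_code` and the short labels are illustrative choices, not terms defined by the standard.

```python
# Series labels indexed by the hundreds digit of the numeric code.
# Labels paraphrase the list above; 700-899 carries no script type there.
SERIES_LABELS = {
    0: "hieroglyphic and cuneiform scripts",
    1: "right-to-left alphabetic scripts",
    2: "left-to-right alphabetic scripts",
    3: "alphasyllabic scripts",
    4: "syllabic scripts",
    5: "ideographic or symbolic scripts",
    6: "undeciphered scripts",
    7: "series 700-799 (no script type listed)",
    8: "series 800-899 (no script type listed)",
    9: "private-use, alias and special codes",
}


def classify_numeric_code(code: int) -> str:
    """Return the label of the hundred-series that contains `code`."""
    if not 0 <= code <= 999:
        raise ValueError(f"numeric code out of range: {code}")
    return SERIES_LABELS[code // 100]


for sample in (215, 371, 500, 997):
    print(f"{sample:03d}: {classify_numeric_code(sample)}")
```

Integer division by 100 is enough here because every series in the list starts on a multiple of 100; a table of explicit ranges would be needed only if a future revision split a series.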

Composition and allocation of alphabetic code values

The four-letter alphabetic code elements use the 26 letters of the basic Latin alphabet. The case of these code elements is not significant, but the recommended form is a capital letter followed by three lowercase letters. The alphabetic codes are derived from the names of the scripts for mnemonic reasons. However, variants of the same script differ, as far as possible, only in their fourth letter. These variants are also recognizable by their numeric code elements, which are close together in the same series. For example:

  • Latn = 215 = (en) "Latin";
  • Latf = 216 = (en) "Latin (Fraktur variant)";
  • Latg = 217 = (en) "Latin (Gaelic variant)".

And also:

  • Hani = 500 = (en) "Han (Hanzi, Kanji, Hanja)";
  • Hans = 501 = (en) "Han (Simplified variant)";
  • Hant = 502 = (en) "Han (Traditional variant)".

However, two alphabetic code elements that begin with the same three letters are not necessarily variants of the same writing system (a difference that may show in their numeric classification in separate series):

  • Hani = 500 = (en) "Han (Hanzi, Kanji, Hanja)";
  • Hano = 371 = (en) "Hanunoo".
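
The two rules above (case-insensitive codes with a recommended capitalization, and variants that share three letters and a numeric series) can be sketched as follows. This is a heuristic for illustration only: the function names are hypothetical, the table holds just the codes quoted in the examples, and the standard itself does not define such a test.

```python
import re

# Only the codes quoted in the examples above; not the full registry.
EXAMPLE_CODES = {
    "Latn": 215, "Latf": 216, "Latg": 217,
    "Hani": 500, "Hans": 501, "Hant": 502,
    "Hano": 371,
}


def normalize_script_code(code: str) -> str:
    """Return the recommended form: one capital, then three lowercase letters."""
    if not re.fullmatch(r"[A-Za-z]{4}", code):
        raise ValueError(f"not a four-letter code: {code!r}")
    return code.capitalize()


def may_be_variants(a: str, b: str, table=EXAMPLE_CODES) -> bool:
    """Heuristic: same first three letters AND same hundred-series."""
    a, b = normalize_script_code(a), normalize_script_code(b)
    same_prefix = a[:3] == b[:3]
    same_series = table[a] // 100 == table[b] // 100
    return same_prefix and same_series


print(normalize_script_code("LATN"))      # Latn
print(may_be_variants("latn", "latf"))    # True
print(may_be_variants("Hani", "Hano"))    # False: series 500 versus 300
```

The Hani/Hano pair shows why the prefix alone is not sufficient: the numeric series is what separates an ideographic script from an unrelated alphasyllabic one.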

Special code values

If the standard entries are not sufficient, 50 code elements are available for use at the users' discretion (the names shown are not normative and are subject to change):

  • Qaaa = 900 = (en) "Reserved for private use (start)";
  • Qaab = 901 = (en) "Reserved for private use (2nd)";
  • ...
  • Qaaz = 925 = (en) "Reserved for private use (26th)";
  • Qaba = 926 = (en) "Reserved for private use (27th)";
  • ...
  • Qabx = 949 = (en) "Reserved for private use (end)".
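
The private-use codes follow a regular pattern: the last two letters count upward in alphabetical order while the numeric value counts upward from 900. A minimal sketch of the conversion in both directions, with hypothetical function names and assuming only the endpoints listed above (Qaaa = 900 through Qabx = 949):

```python
import re

PRIVATE_USE_FIRST = 900   # numeric value of Qaaa
PRIVATE_USE_COUNT = 50    # Qaaa..Qabx


def private_use_numeric(alpha: str) -> int:
    """Map a private-use alphabetic code (Qaaa..Qabx) to its numeric value."""
    code = alpha.capitalize()  # case is not significant
    if not re.fullmatch(r"Qa[ab][a-z]", code):
        raise ValueError(f"not a private-use code: {alpha!r}")
    # Treat the last two letters as a two-digit base-26 number.
    offset = (ord(code[2]) - ord("a")) * 26 + (ord(code[3]) - ord("a"))
    if offset >= PRIVATE_USE_COUNT:
        raise ValueError(f"beyond the private-use range: {alpha!r}")
    return PRIVATE_USE_FIRST + offset


def private_use_alpha(numeric: int) -> str:
    """Map a numeric value 900..949 back to its alphabetic code."""
    offset = numeric - PRIVATE_USE_FIRST
    if not 0 <= offset < PRIVATE_USE_COUNT:
        raise ValueError(f"not a private-use value: {numeric}")
    third, fourth = divmod(offset, 26)
    return "Qa" + chr(ord("a") + third) + chr(ord("a") + fourth)


print(private_use_numeric("Qaaz"))  # 925
print(private_use_alpha(949))       # Qabx
```

The range check matters because `Qaby` and `Qabz` match the letter pattern but fall outside the 50 reserved values.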

There are also code elements for special cases: unwritten languages (for example, for classifying photographs, video recordings or sound recordings in the collections of media libraries and museums); scripts that cannot be reliably determined among several candidates (belonging to separate families, and for which the set has no more precise predefined code); and scripts that have not been coded but could possibly be identified more precisely with another code:

  • Zxxx = 997 = (en) "Code for unwritten languages";
  • Zyyy = 998 = (en) "Code for undetermined script";
  • Zzzz = 999 = (en) "Code for uncoded script".

History

This list of script codes and names was created and is maintained by Michael Everson, a member of the Unicode Technical Committee (UTC). The text of ISO 15924, which sets out the general principles for defining the codes, was first approved on 9 January 2004.

The first list of codes, already extensive, was published online on 1 May 2004 on the website of the Unicode Consortium. It included, among others, all the scripts used or defined in the Unicode 4.0 and ISO/IEC 10646 standards. A significant number of corrections followed within weeks, and the list was finalized on 29 May 2004.

Since then, a few new entries have been added regularly, either for scripts being standardized in ISO/IEC 10646 and Unicode or for bibliographic uses, along with entries for scripts that are not yet standardized and are still under study.

Relations with other standards and recommendations

Relation to language code elements of the standard ISO 639

In addition, the alphabetic code elements of ISO 15924 entries begin, as far as possible, with the same letters as the three-letter language code elements of ISO 639-2 or its extension ISO 639-3 (which covers a more extensive list of languages) when the names of the script and the language are homonyms. For example:

  • language name = (en) "Latin": ISO 639-2 alphabetic language code = lat;
  • script name = (en) "Latin", a homonym, so: ISO 15924 alphabetic script code = Latn.

The future standard ISO 639-6, in preparation, which should extend language code elements to four letters (to identify a larger number of language variants), incorporates this principle and, where possible, reuses the code elements already assigned in ISO 15924 for homonymous languages, in order to preserve compatibility with the current standard RFC 5646 (BCP 47):

  • script name = (en) "Latin": ISO 15924 alphabetic script code = Latn;
  • language name = (en) "Latin", a homonym, so: ISO/CD 639-6 alphabetic language code = LATN.

Naming of locales in RFC 5646, with ISO 639 and ISO 3166

In practice, the alphabetic code values are preferable in internationalized applications that localize data. It is the alphabetic code values that are used in locale codes, together with the alphabetic language code elements of ISO 639 and the alphabetic or numeric code elements for countries and regions of ISO 3166.

Locale codes are constructed in accordance with the current RFC 5646 (BCP 47), which takes into account the ISO 15924 script codes in addition to the ISO 639 language code elements and the ISO 3166 country and region code elements.
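
As an illustration of how the three kinds of code elements combine, the sketch below assembles a simple language-script-region tag. It is deliberately simplified: the function name `make_locale_tag` is hypothetical, only these three subtags are handled, and real BCP 47 tags allow further subtags that are not validated here. The casing applied (lowercase language, capitalized script, uppercase region) is the conventional presentation; the tags themselves are not case-sensitive.

```python
import re
from typing import Optional


def make_locale_tag(language: str,
                    script: Optional[str] = None,
                    region: Optional[str] = None) -> str:
    """Join language, optional script and optional region with hyphens."""
    # ISO 639 language code: two or three letters in this simplified sketch.
    if not re.fullmatch(r"[A-Za-z]{2,3}", language):
        raise ValueError(f"unexpected language code: {language!r}")
    parts = [language.lower()]

    if script is not None:
        # ISO 15924 alphabetic code: exactly four letters.
        if not re.fullmatch(r"[A-Za-z]{4}", script):
            raise ValueError(f"unexpected script code: {script!r}")
        parts.append(script.capitalize())

    if region is not None:
        # ISO 3166 alphabetic code (two letters) or a three-digit numeric code.
        if not re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", region):
            raise ValueError(f"unexpected region code: {region!r}")
        parts.append(region.upper())

    return "-".join(parts)


print(make_locale_tag("sr", "cyrl", "rs"))   # sr-Cyrl-RS
print(make_locale_tag("zh", "HANT"))         # zh-Hant
```

Because the script code is the only four-letter subtag in this simplified form, it can be recognized by length alone when such a tag is read back.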

Differences between the names and those of the standard ISO/IEC 10646

There is no exact bijection between the English and French script names defined in ISO 15924 and the English and French names used normatively for characters and character blocks in ISO/IEC 10646 (and therefore in Unicode).

However, future characters and character blocks standardized in ISO/IEC 10646 (and thus also in Unicode) will be named, where possible, in accordance with ISO 15924.

Differences between the alphabetic code values and those of the Unicode standard

Similarly, there is no exact bijection between the alphabetic script code elements of ISO 15924 and the script codes used in the Unicode character property tables. ISO 15924 contains additional elements, intended for bibliographic use, that distinguish scripts which were unified in the ISO/IEC 10646 and Unicode character encoding. It therefore contains distinct code elements and names for scripts that Unicode unifies into a single script (treating them as typographic variants, with no distinct encoding for their characters or for their normative or informative properties).

On the other hand, because ISO 15924 was created after the Unicode standard, the format of the normative ISO 15924 alphabetic code values may differ from the codes used in the Unicode property tables (which may be longer and contain underscores).

For information purposes only, ISO 15924 defines an alias (or "property value alias") for standardized scripts, giving the correspondence with the character properties defined in the Unicode standard where such a difference exists. Since ISO 15924 was published, the Unicode Consortium has committed not to create new codes that differ from those defined in ISO 15924, and to use the alphabetic code elements of ISO 15924 wherever possible. Consequently, not all synonyms of Unicode property values are listed in the ISO 15924 tables (see the codes actually used in the property files of the Unicode standard itself, and the synonyms that Unicode has added to its character property values, so that applications conforming to Unicode can now use ISO 15924 code elements alone).

