The partial listing of ISO 639 two-character codes supplied here supplements the shorter lists given in Martin Bryan (SGML: An Author's Guide to the Standard Generalized Markup Language, 92-93) and Eric van Herwijnen (Practical SGML, 67-68). The two-character language codes of ISO 639 are relevant to SGML encoding in two respects. First, the SGML standard (ISO 8879) itself specifies that declaration of public text language should be given using the language code(s) from ISO 639; see ISO 8879-1986(E) page 36, section 10.2.2.3. Second, the WSD (Writing System Declaration) implemented in the Text Encoding Initiative uses the [two-character] language code of ISO 639 (as amended) in the language.code attribute of the nat.language declaration, specifying the language in which the WSD is written.
The information on 2-character language codes summarized below has been taken from ISO 639, Code for the representation of the names of languages. First edition, 1988-04-01. Reference number: ISO 639:1988 (E/F). iii + 17 pages. ISO 639:1988 is a technical revision of ISO 639:1967, prepared by Technical Committee ISO/TC 37. The language codes are listed in ISO 639 in lowercase letters, but are given here in uppercase, as recommended for use as SGML tag names for "public text language." See ISO 8879 section 10.2.2.3: "the 'public text language' must be a two-character name, entered with upper-case letters."
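The case convention described above (lowercase in ISO 639, uppercase for the SGML "public text language") can be sketched as a small helper. This is only an illustration: the function name public_text_language is hypothetical, and the check covers the shape of the code (two letters), not whether the code is actually assigned in ISO 639.

```python
import re

# Shape check only: exactly two ASCII letters. Membership in ISO 639
# is NOT verified here; that would need the listing itself.
_TWO_LETTERS = re.compile(r"[A-Za-z]{2}")


def public_text_language(code: str) -> str:
    """Return a two-letter language code in the upper-case form used
    for the SGML 'public text language' (hypothetical helper)."""
    candidate = code.strip()
    if _TWO_LETTERS.fullmatch(candidate) is None:
        raise ValueError(f"not a two-character language code: {code!r}")
    return candidate.upper()


if __name__ == "__main__":
    print(public_text_language("fr"))
```

A caller would typically pass the lowercase form found in ISO 639 and store the uppercase result.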
ISO 639 contains much other information about the use of language symbols, registration of new symbols, etc. The language codes of ISO 639 are said to be "devised primarily for use in terminology, lexicography and linguistics, but they may be used for any application requiring the expression of languages in coded form." The registration authority for ISO 639 is given as Infoterm, Österreichisches Normungsinstitut (ON), Postfach 130, A-1021 Vienna, AUSTRIA.
The two-character language codes of ISO 639 are recognized as being inadequate for use as SGML language attributes when tagging text, viz., for use as global lang attributes attached to any element to identify the language of the text element or a language shift. On lang as a global attribute, see the TEI Guidelines, page 45, section 3.2.1. In principle, there should be nothing wrong with tagging language using SGML elements rather than attributes, if the encoder has principled reasons for not using attributes (e.g., indexing engines which read simple tags but not SGML attributes). But the two-character codes of ISO 639 are neither sufficiently mnemonic nor complete for the world's languages: whereas ISO 639 supplies codes for only about 136 languages, the Ethnologue published by the Summer Institute of Linguistics identifies over 6100 languages (see Ethnologue: Languages of the World, ed. Barbara Grimes. 11th edition. Dallas, TX: Summer Institute of Linguistics, 1988). A revision of ISO 639 completed in late 1990 is described as supplying 3-character language codes (following MARC 3-character language codes in part), based upon the code sequence of the American National Standard (ANSI Z39.53). This draft will be circulated for worldwide review in 1991. It remains to be seen whether these new ISO 639 3-character codes qualify mnemonically for use in SGML tagging and whether the set is complete. Provisionally, and as a convenience, the set of 3-character MARC language codes is supplied in this appendix. Where they are mnemonic, unique and adequately distinguish dialectal variants, it would seem permissible to use them for lang attribute values or as language tags.
ISO 639 CODES ALPHABETIC BY LANGUAGE NAME (ENGLISH SPELLING)

LANGUAGE NAME     CODE  LANGUAGE FAMILY
ABKHAZIAN         AB    IBERO-CAUCASIAN
AFAN (OROMO)      OM    HAMITIC
AFAR              AA    HAMITIC
AFRIKAANS         AF    GERMANIC
ALBANIAN          SQ    INDO-EUROPEAN (OTHER)
AMHARIC           AM    SEMITIC
ARABIC            AR    SEMITIC
ARMENIAN          HY    INDO-EUROPEAN (OTHER)
ASSAMESE          AS    INDIAN
AYMARA            AY    AMERINDIAN
AZERBAIJANI       AZ    TURKIC/ALTAIC
BASHKIR           BA    TURKIC/ALTAIC
BASQUE            EU    BASQUE
BENGALI;BANGLA    BN    INDIAN
BHUTANI           DZ    ASIAN
BIHARI            BH    INDIAN
BISLAMA           BI    [not given]
BRETON            BR    CELTIC
BULGARIAN         BG    SLAVIC
BURMESE           MY    ASIAN
BYELORUSSIAN      BE    SLAVIC
CAMBODIAN         KM    ASIAN
CATALAN           CA    ROMANCE
CHINESE           ZH    ASIAN
CORSICAN          CO    ROMANCE
CROATIAN          HR    SLAVIC
CZECH             CS    SLAVIC
DANISH            DA    GERMANIC
DUTCH             NL    GERMANIC
ENGLISH           EN    GERMANIC
ESPERANTO         EO    INTERNATIONAL AUX.
ESTONIAN          ET    FINNO-UGRIC
FAROESE           FO    GERMANIC
FIJI              FJ    OCEANIC/INDONESIAN
FINNISH           FI    FINNO-UGRIC
FRENCH            FR    ROMANCE
FRISIAN           FY    GERMANIC
GALICIAN          GL    ROMANCE
GEORGIAN          KA    IBERO-CAUCASIAN
GERMAN            DE    GERMANIC
GREEK             EL    LATIN/GREEK
GREENLANDIC       KL    ESKIMO
GUARANI           GN    AMERINDIAN
GUJARATI          GU    INDIAN
HAUSA             HA    NEGRO-AFRICAN
HEBREW            HE    SEMITIC  [*Changed 1989 from original ISO 639:1988, IW]
HINDI             HI    INDIAN
HUNGARIAN         HU    FINNO-UGRIC
ICELANDIC         IS    GERMANIC
INDONESIAN        ID    OCEANIC/INDONESIAN  [*Changed 1989 from original ISO 639:1988, IN]
INTERLINGUA       IA    INTERNATIONAL AUX.
INTERLINGUE       IE    INTERNATIONAL AUX.
INUKTITUT         IU    [ ]
INUPIAK           IK    ESKIMO
IRISH             GA    CELTIC
ITALIAN           IT    ROMANCE
JAPANESE          JA    ASIAN
JAVANESE          JV    OCEANIC/INDONESIAN
KANNADA           KN    DRAVIDIAN
KASHMIRI          KS    INDIAN
KAZAKH            KK    TURKIC/ALTAIC
KINYARWANDA       RW    NEGRO-AFRICAN
KIRGHIZ           KY    TURKIC/ALTAIC
KIRUNDI           RN    NEGRO-AFRICAN
KOREAN            KO    ASIAN
KURDISH           KU    IRANIAN
LAOTHIAN          LO    ASIAN
LATIN             LA    LATIN/GREEK
LATVIAN;LETTISH   LV    BALTIC
LINGALA           LN    NEGRO-AFRICAN
LITHUANIAN        LT    BALTIC
MACEDONIAN        MK    SLAVIC
MALAGASY          MG    OCEANIC/INDONESIAN
MALAY             MS    OCEANIC/INDONESIAN
MALAYALAM         ML    DRAVIDIAN
MALTESE           MT    SEMITIC
MAORI             MI    OCEANIC/INDONESIAN
MARATHI           MR    INDIAN
MOLDAVIAN         MO    ROMANCE
MONGOLIAN         MN    [not given]
NAURU             NA    [not given]
NEPALI            NE    INDIAN
NORWEGIAN         NO    GERMANIC
OCCITAN           OC    ROMANCE
ORIYA             OR    INDIAN
PASHTO;PUSHTO     PS    IRANIAN
PERSIAN (farsi)   FA    IRANIAN
POLISH            PL    SLAVIC
PORTUGUESE        PT    ROMANCE
PUNJABI           PA    INDIAN
QUECHUA           QU    AMERINDIAN
RHAETO-ROMANCE    RM    ROMANCE
ROMANIAN          RO    ROMANCE
RUSSIAN           RU    SLAVIC
SAMOAN            SM    OCEANIC/INDONESIAN
SANGHO            SG    NEGRO-AFRICAN
SANSKRIT          SA    INDIAN
SCOTS GAELIC      GD    CELTIC
SERBIAN           SR    SLAVIC
SERBO-CROATIAN    SH    SLAVIC
SESOTHO           ST    NEGRO-AFRICAN
SETSWANA          TN    NEGRO-AFRICAN
SHONA             SN    NEGRO-AFRICAN
SINDHI            SD    INDIAN
SINGHALESE        SI    INDIAN
SISWATI           SS    NEGRO-AFRICAN
SLOVAK            SK    SLAVIC
SLOVENIAN         SL    SLAVIC
SOMALI            SO    HAMITIC
SPANISH           ES    ROMANCE
SUNDANESE         SU    OCEANIC/INDONESIAN
SWAHILI           SW    NEGRO-AFRICAN
SWEDISH           SV    GERMANIC
TAGALOG           TL    OCEANIC/INDONESIAN
TAJIK             TG    IRANIAN
TAMIL             TA    DRAVIDIAN
TATAR             TT    TURKIC/ALTAIC
TELUGU            TE    DRAVIDIAN
THAI              TH    ASIAN
TIBETAN           BO    ASIAN
TIGRINYA          TI    SEMITIC
TONGA             TO    OCEANIC/INDONESIAN
TSONGA            TS    NEGRO-AFRICAN
TURKISH           TR    TURKIC/ALTAIC
TURKMEN           TK    TURKIC/ALTAIC
TWI               TW    NEGRO-AFRICAN
UIGUR             UG    [ ]
UKRAINIAN         UK    SLAVIC
URDU              UR    INDIAN
UZBEK             UZ    TURKIC/ALTAIC
VIETNAMESE        VI    ASIAN
VOLAPUK           VO    INTERNATIONAL AUX.
WELSH             CY    CELTIC
WOLOF             WO    NEGRO-AFRICAN
XHOSA             XH    NEGRO-AFRICAN
YIDDISH           YI    GERMANIC  [*Changed 1989 from original ISO 639:1988, JI]
YORUBA            YO    NEGRO-AFRICAN
ZHUANG            ZA    [ ]
ZULU              ZU    NEGRO-AFRICAN

ISO 639 CODES SORTED BY LANGUAGE CODE

LANGUAGE NAME     CODE  LANGUAGE FAMILY
AFAR              AA    HAMITIC
ABKHAZIAN         AB    IBERO-CAUCASIAN
AFRIKAANS         AF    GERMANIC
AMHARIC           AM    SEMITIC
ARABIC            AR    SEMITIC
ASSAMESE          AS    INDIAN
AYMARA            AY    AMERINDIAN
AZERBAIJANI       AZ    TURKIC/ALTAIC
BASHKIR           BA    TURKIC/ALTAIC
BYELORUSSIAN      BE    SLAVIC
BULGARIAN         BG    SLAVIC
BIHARI            BH    INDIAN
BISLAMA           BI    [not given]
BENGALI;BANGLA    BN    INDIAN
TIBETAN           BO    ASIAN
BRETON            BR    CELTIC
CATALAN           CA    ROMANCE
CORSICAN          CO    ROMANCE
CZECH             CS    SLAVIC
WELSH             CY    CELTIC
DANISH            DA    GERMANIC
GERMAN            DE    GERMANIC
BHUTANI           DZ    ASIAN
GREEK             EL    LATIN/GREEK
ENGLISH           EN    GERMANIC
ESPERANTO         EO    INTERNATIONAL AUX.
SPANISH           ES    ROMANCE
ESTONIAN          ET    FINNO-UGRIC
BASQUE            EU    BASQUE
PERSIAN (farsi)   FA    IRANIAN
FINNISH           FI    FINNO-UGRIC
FIJI              FJ    OCEANIC/INDONESIAN
FAROESE           FO    GERMANIC
FRENCH            FR    ROMANCE
FRISIAN           FY    GERMANIC
IRISH             GA    CELTIC
SCOTS GAELIC      GD    CELTIC
GALICIAN          GL    ROMANCE
GUARANI           GN    AMERINDIAN
GUJARATI          GU    INDIAN
HAUSA             HA    NEGRO-AFRICAN
HEBREW            HE    SEMITIC  [*Changed 1989 from original ISO 639:1988, IW]
HINDI             HI    INDIAN
CROATIAN          HR    SLAVIC
HUNGARIAN         HU    FINNO-UGRIC
ARMENIAN          HY    INDO-EUROPEAN (OTHER)
INTERLINGUA       IA    INTERNATIONAL AUX.
INTERLINGUE       IE    INTERNATIONAL AUX.
INUPIAK           IK    ESKIMO
INDONESIAN        ID    OCEANIC/INDONESIAN  [*Changed 1989 from original ISO 639:1988, IN]
ICELANDIC         IS    GERMANIC
ITALIAN           IT    ROMANCE
INUKTITUT         IU    [ ]
JAPANESE          JA    ASIAN
JAVANESE          JV    OCEANIC/INDONESIAN
GEORGIAN          KA    IBERO-CAUCASIAN
KAZAKH            KK    TURKIC/ALTAIC
GREENLANDIC       KL    ESKIMO
CAMBODIAN         KM    ASIAN
KANNADA           KN    DRAVIDIAN
KOREAN            KO    ASIAN
KASHMIRI          KS    INDIAN
KURDISH           KU    IRANIAN
KIRGHIZ           KY    TURKIC/ALTAIC
LATIN             LA    LATIN/GREEK
LINGALA           LN    NEGRO-AFRICAN
LAOTHIAN          LO    ASIAN
LITHUANIAN        LT    BALTIC
LATVIAN;LETTISH   LV    BALTIC
MALAGASY          MG    OCEANIC/INDONESIAN
MAORI             MI    OCEANIC/INDONESIAN
MACEDONIAN        MK    SLAVIC
MALAYALAM         ML    DRAVIDIAN
MONGOLIAN         MN    [not given]
MOLDAVIAN         MO    ROMANCE
MARATHI           MR    INDIAN
MALAY             MS    OCEANIC/INDONESIAN
MALTESE           MT    SEMITIC
BURMESE           MY    ASIAN
NAURU             NA    [not given]
NEPALI            NE    INDIAN
DUTCH             NL    GERMANIC
NORWEGIAN         NO    GERMANIC
OCCITAN           OC    ROMANCE
AFAN (OROMO)      OM    HAMITIC
ORIYA             OR    INDIAN
PUNJABI           PA    INDIAN
POLISH            PL    SLAVIC
PASHTO;PUSHTO     PS    IRANIAN
PORTUGUESE        PT    ROMANCE
QUECHUA           QU    AMERINDIAN
RHAETO-ROMANCE    RM    ROMANCE
KIRUNDI           RN    NEGRO-AFRICAN
ROMANIAN          RO    ROMANCE
RUSSIAN           RU    SLAVIC
KINYARWANDA       RW    NEGRO-AFRICAN
SANSKRIT          SA    INDIAN
SINDHI            SD    INDIAN
SANGHO            SG    NEGRO-AFRICAN
SERBO-CROATIAN    SH    SLAVIC
SINGHALESE        SI    INDIAN
SLOVAK            SK    SLAVIC
SLOVENIAN         SL    SLAVIC
SAMOAN            SM    OCEANIC/INDONESIAN
SHONA             SN    NEGRO-AFRICAN
SOMALI            SO    HAMITIC
ALBANIAN          SQ    INDO-EUROPEAN (OTHER)
SERBIAN           SR    SLAVIC
SISWATI           SS    NEGRO-AFRICAN
SESOTHO           ST    NEGRO-AFRICAN
SUNDANESE         SU    OCEANIC/INDONESIAN
SWEDISH           SV    GERMANIC
SWAHILI           SW    NEGRO-AFRICAN
TAMIL             TA    DRAVIDIAN
TELUGU            TE    DRAVIDIAN
TAJIK             TG    IRANIAN
THAI              TH    ASIAN
TIGRINYA          TI    SEMITIC
TURKMEN           TK    TURKIC/ALTAIC
TAGALOG           TL    OCEANIC/INDONESIAN
SETSWANA          TN    NEGRO-AFRICAN
TONGA             TO    OCEANIC/INDONESIAN
TURKISH           TR    TURKIC/ALTAIC
TSONGA            TS    NEGRO-AFRICAN
TATAR             TT    TURKIC/ALTAIC
TWI               TW    NEGRO-AFRICAN
UIGUR             UG    [ ]
UKRAINIAN         UK    SLAVIC
URDU              UR    INDIAN
UZBEK             UZ    TURKIC/ALTAIC
VIETNAMESE        VI    ASIAN
VOLAPUK           VO    INTERNATIONAL AUX.
WOLOF             WO    NEGRO-AFRICAN
XHOSA             XH    NEGRO-AFRICAN
YIDDISH           YI    GERMANIC  [*Changed 1989 from original ISO 639:1988, JI]
YORUBA            YO    NEGRO-AFRICAN
ZHUANG            ZA    [ ]
CHINESE           ZH    ASIAN
ZULU              ZU    NEGRO-AFRICAN

ISO 639 LANGUAGE CODES SORTED BY LANGUAGE GROUP

LANGUAGE NAME     CODE  LANGUAGE FAMILY
AYMARA            AY    AMERINDIAN
GUARANI           GN    AMERINDIAN
QUECHUA           QU    AMERINDIAN
BHUTANI           DZ    ASIAN
BURMESE           MY    ASIAN
CAMBODIAN         KM    ASIAN
CHINESE           ZH    ASIAN
JAPANESE          JA    ASIAN
KOREAN            KO    ASIAN
LAOTHIAN          LO    ASIAN
THAI              TH    ASIAN
TIBETAN           BO    ASIAN
VIETNAMESE        VI    ASIAN
LATVIAN;LETTISH   LV    BALTIC
LITHUANIAN        LT    BALTIC
BASQUE            EU    BASQUE
BRETON            BR    CELTIC
IRISH             GA    CELTIC
SCOTS GAELIC      GD    CELTIC
WELSH             CY    CELTIC
KANNADA           KN    DRAVIDIAN
MALAYALAM         ML    DRAVIDIAN
TAMIL             TA    DRAVIDIAN
TELUGU            TE    DRAVIDIAN
GREENLANDIC       KL    ESKIMO
INUPIAK           IK    ESKIMO
ESTONIAN          ET    FINNO-UGRIC
FINNISH           FI    FINNO-UGRIC
HUNGARIAN         HU    FINNO-UGRIC
AFRIKAANS         AF    GERMANIC
DANISH            DA    GERMANIC
DUTCH             NL    GERMANIC
ENGLISH           EN    GERMANIC
FAROESE           FO    GERMANIC
FRISIAN           FY    GERMANIC
GERMAN            DE    GERMANIC
ICELANDIC         IS    GERMANIC
NORWEGIAN         NO    GERMANIC
SWEDISH           SV    GERMANIC
YIDDISH           YI    GERMANIC  [*Changed 1989 from original ISO 639:1988, JI]
AFAN (OROMO)      OM    HAMITIC
AFAR              AA    HAMITIC
SOMALI            SO    HAMITIC
ABKHAZIAN         AB    IBERO-CAUCASIAN
GEORGIAN          KA    IBERO-CAUCASIAN
ASSAMESE          AS    INDIAN
BENGALI;BANGLA    BN    INDIAN
BIHARI            BH    INDIAN
GUJARATI          GU    INDIAN
HINDI             HI    INDIAN
KASHMIRI          KS    INDIAN
MARATHI           MR    INDIAN
NEPALI            NE    INDIAN
ORIYA             OR    INDIAN
PUNJABI           PA    INDIAN
SANSKRIT          SA    INDIAN
SINDHI            SD    INDIAN
SINGHALESE        SI    INDIAN
URDU              UR    INDIAN
ALBANIAN          SQ    INDO-EUROPEAN (OTHER)
ARMENIAN          HY    INDO-EUROPEAN (OTHER)
ESPERANTO         EO    INTERNATIONAL AUX.
INTERLINGUA       IA    INTERNATIONAL AUX.
INTERLINGUE       IE    INTERNATIONAL AUX.
VOLAPUK           VO    INTERNATIONAL AUX.
KURDISH           KU    IRANIAN
PASHTO;PUSHTO     PS    IRANIAN
PERSIAN (farsi)   FA    IRANIAN
TAJIK             TG    IRANIAN
GREEK             EL    LATIN/GREEK
LATIN             LA    LATIN/GREEK
HAUSA             HA    NEGRO-AFRICAN
KINYARWANDA       RW    NEGRO-AFRICAN
KIRUNDI           RN    NEGRO-AFRICAN
LINGALA           LN    NEGRO-AFRICAN
SANGHO            SG    NEGRO-AFRICAN
SESOTHO           ST    NEGRO-AFRICAN
SETSWANA          TN    NEGRO-AFRICAN
SHONA             SN    NEGRO-AFRICAN
SISWATI           SS    NEGRO-AFRICAN
SWAHILI           SW    NEGRO-AFRICAN
TSONGA            TS    NEGRO-AFRICAN
TWI               TW    NEGRO-AFRICAN
WOLOF             WO    NEGRO-AFRICAN
XHOSA             XH    NEGRO-AFRICAN
YORUBA            YO    NEGRO-AFRICAN
ZULU              ZU    NEGRO-AFRICAN
FIJI              FJ    OCEANIC/INDONESIAN
INDONESIAN        ID    OCEANIC/INDONESIAN  [*Changed 1989 from original ISO 639:1988, IN]
JAVANESE          JV    OCEANIC/INDONESIAN
MALAGASY          MG    OCEANIC/INDONESIAN
MALAY             MS    OCEANIC/INDONESIAN
MAORI             MI    OCEANIC/INDONESIAN
SAMOAN            SM    OCEANIC/INDONESIAN
SUNDANESE         SU    OCEANIC/INDONESIAN
TAGALOG           TL    OCEANIC/INDONESIAN
TONGA             TO    OCEANIC/INDONESIAN
CATALAN           CA    ROMANCE
CORSICAN          CO    ROMANCE
FRENCH            FR    ROMANCE
GALICIAN          GL    ROMANCE
ITALIAN           IT    ROMANCE
MOLDAVIAN         MO    ROMANCE
OCCITAN           OC    ROMANCE
PORTUGUESE        PT    ROMANCE
RHAETO-ROMANCE    RM    ROMANCE
ROMANIAN          RO    ROMANCE
SPANISH           ES    ROMANCE
AMHARIC           AM    SEMITIC
ARABIC            AR    SEMITIC
HEBREW            HE    SEMITIC  [*Changed 1989 from original ISO 639:1988, IW]
MALTESE           MT    SEMITIC
TIGRINYA          TI    SEMITIC
BULGARIAN         BG    SLAVIC
BYELORUSSIAN      BE    SLAVIC
CROATIAN          HR    SLAVIC
CZECH             CS    SLAVIC
MACEDONIAN        MK    SLAVIC
POLISH            PL    SLAVIC
RUSSIAN           RU    SLAVIC
SERBIAN           SR    SLAVIC
SERBO-CROATIAN    SH    SLAVIC
SLOVAK            SK    SLAVIC
SLOVENIAN         SL    SLAVIC
UKRAINIAN         UK    SLAVIC
AZERBAIJANI       AZ    TURKIC/ALTAIC
BASHKIR           BA    TURKIC/ALTAIC
KAZAKH            KK    TURKIC/ALTAIC
KIRGHIZ           KY    TURKIC/ALTAIC
TATAR             TT    TURKIC/ALTAIC
TURKISH           TR    TURKIC/ALTAIC
TURKMEN           TK    TURKIC/ALTAIC
UZBEK             UZ    TURKIC/ALTAIC
BISLAMA           BI    [not given]
MONGOLIAN         MN    [not given]
NAURU             NA    [not given]
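The three listings above present the same records in three sort orders. As a rough sketch of that idea, the Python fragment below holds a handful of rows copied from the listings and derives each ordering from the one list. The Language type, the SAMPLE subset and the lookup helper are illustrative names only, not part of ISO 639 or of any SGML tool.

```python
from typing import NamedTuple, Optional


class Language(NamedTuple):
    name: str
    code: str
    family: str


# A small subset of the listing, for illustration only.
SAMPLE = [
    Language("WELSH", "CY", "CELTIC"),
    Language("AFAR", "AA", "HAMITIC"),
    Language("TIBETAN", "BO", "ASIAN"),
    Language("GERMAN", "DE", "GERMANIC"),
    Language("BASQUE", "EU", "BASQUE"),
]

# The three orderings used by the listings: by name, by code,
# and by family (with name as the tie-breaker inside a family).
by_name = sorted(SAMPLE, key=lambda r: r.name)
by_code = sorted(SAMPLE, key=lambda r: r.code)
by_family = sorted(SAMPLE, key=lambda r: (r.family, r.name))

# Index for direct lookup by two-letter code.
_BY_CODE = {r.code: r for r in SAMPLE}


def lookup(code: str) -> Optional[Language]:
    """Return the record for a code, or None if it is not in SAMPLE."""
    return _BY_CODE.get(code.strip().upper())


if __name__ == "__main__":
    for row in by_code:
        print(f"{row.name:<18}{row.code:<6}{row.family}")
```

Keeping one list and sorting on demand avoids the three copies drifting apart when a code changes.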
Changes made December 20, 1997, based upon information in the following note from a member of the W3C HTML group:
"In 1989, the ISO 639 Registration Authority changed a number of codes as follows (the quote is taken from RFC 1766):

    The following codes have been added in 1989 (nothing later): ug (Uigur), iu (Inuktitut, also called Eskimo), za (Zhuang), he (Hebrew, replacing iw), yi (Yiddish, replacing ji), and id (Indonesian, replacing in)."

Hence these changes in the listings above (assignment of UIGUR, INUKTITUT and ZHUANG to a 'LANGUAGE FAMILY' to be determined):

HEBREW            HE    SEMITIC              (3 occurrences, replacing IW with HE)
YIDDISH           YI    GERMANIC             (3 occurrences, replacing JI with YI)
INDONESIAN        ID    OCEANIC/INDONESIAN   (3 occurrences, replacing IN with ID)
UIGUR             UG    [ ]                  (2 occurrences added)
INUKTITUT         IU    [ ]                  (2 occurrences added)
ZHUANG            ZA    [ ]                  (2 occurrences added)
Additional Note 2001-08-29

The provisional/draft (informative) "Annex B" in ISO 639-1:2001 (FDIS) offers these clarifications:

From: http://www.rtt.org/ISO/TC37/SC2/WG1/639/639-1-FDIS-x-2001-02-09.htm

Changes from ISO 639:1988 to ISO 639-1:2001

This annex lists all languages that have been added since the publication of ISO 639:1988. Modifications to the names of the languages are not included.

Three language identifiers were changed in 1989. The changes were publicised, but they have not been included in printed versions of ISO 639. These changes are:

The identifier for Hebrew was changed from "iw" to "he".
The identifier for Indonesian was changed from "in" to "id".
The identifier for Yiddish was changed from "ji" to "yi".

In addition, ISO 639:1988 contains one error. The identifier for Javanese is rendered as "jw" in table 1, while it is correctly given as "jv" in the other tables.
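
Software that meets older data may still encounter the pre-1989 identifiers, or the misprinted "jw". A minimal sketch of normalizing them is given below. The mapping contents come from the notes above; the names REPLACED_IN_1989, TABLE_1_ERRATA and current_identifier are hypothetical, and treating "jw" as an alias for "jv" is an assumption about how an application might want to handle the misprint, not something the standard prescribes.

```python
# Identifiers changed by the Registration Authority in 1989.
REPLACED_IN_1989 = {"iw": "he", "in": "id", "ji": "yi"}

# Misprint in table 1 of ISO 639:1988 (assumption: accept it as an alias).
TABLE_1_ERRATA = {"jw": "jv"}


def current_identifier(code: str) -> str:
    """Map a superseded or misprinted two-letter identifier to its
    current form; any other input is returned lowercased, unchanged."""
    key = code.strip().lower()
    key = REPLACED_IN_1989.get(key, key)
    return TABLE_1_ERRATA.get(key, key)


if __name__ == "__main__":
    for old in ("iw", "in", "ji", "jw", "fr"):
        print(old, "->", current_identifier(old))
```

The function lowercases its result because the annex quotes identifiers in lowercase; an SGML application using the uppercase convention would convert afterwards.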