From 3590c846d4c2febbc05b4ad6b14a06edc549e453 Mon Sep 17 00:00:00 2001
From: "Manuel A. Fernandez Montecelo"
Date: Fri, 27 May 2016 14:35:16 +0100
Subject: Imported Upstream version 0.9.6+really0.9.6

---
 doc/libunistring_14.html | 550 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 537 insertions(+), 13 deletions(-)

diff --git a/doc/libunistring_14.html b/doc/libunistring_14.html
index 1a7a338..5f261d6 100644
--- a/doc/libunistring_14.html
+++ b/doc/libunistring_14.html
@@ -1,6 +1,6 @@
-GNU libunistring: 14. Regular expressions <uniregex.h>
+GNU libunistring: 14. Case mappings <unicase.h>
@@ -43,7 +43,7 @@ ul.toc {list-style: none}
@@ -51,21 +51,545 @@ ul.toc {list-style: none}
[ << ][ >> ][ >> ]         [Top] [Contents][Index][Index] [ ? ]

- + -

14. Regular expressions <uniregex.h>

+

14. Case mappings <unicase.h>

-

This include file is not yet implemented. +

This include file defines functions for case mapping for Unicode strings and +case insensitive comparison of Unicode strings and C strings.

+

These string functions fix the problems that were mentioned in +‘char *’ strings, namely, they handle the Croatian +LETTER DZ WITH CARON, the German LATIN SMALL LETTER SHARP S, the +Greek sigma and the Lithuanian i correctly. +

+ +
+ + +

14.1 Case mappings of characters

+ +

The following functions implement case mappings on Unicode characters, +for those cases only where the result of the mapping is again a single +Unicode character. +

+

These mappings are locale and context independent. +

+
+

WARNING! These functions are not sufficient for languages such as +German, Greek and Lithuanian. Use the functions below instead, which treat an +entire string at once and are language aware. +

+ +
+
Function: ucs4_t uc_toupper (ucs4_t uc) + +
+

Returns the uppercase mapping of the Unicode character uc. +

+ +
+
Function: ucs4_t uc_tolower (ucs4_t uc) + +
+

Returns the lowercase mapping of the Unicode character uc. +

+ +
+
Function: ucs4_t uc_totitle (ucs4_t uc) + +
+

Returns the titlecase mapping of the Unicode character uc. +

+

The titlecase mapping of a character is to be used when the character should +look like upper case and the following characters are lower cased. +

+

For most characters, this is the same as the uppercase mapping. There are +only a few characters where the title case variant and the upper case variant +are different. These characters occur in the Latin script of the Croatian, +Bosnian, and Serbian languages. +

Lower case | Title case | Upper case
LATIN SMALL LETTER LJ | LATIN CAPITAL LETTER L WITH SMALL LETTER J | LATIN CAPITAL LETTER LJ
LATIN SMALL LETTER NJ | LATIN CAPITAL LETTER N WITH SMALL LETTER J | LATIN CAPITAL LETTER NJ
LATIN SMALL LETTER DZ | LATIN CAPITAL LETTER D WITH SMALL LETTER Z | LATIN CAPITAL LETTER DZ
LATIN SMALL LETTER DZ WITH CARON | LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON | LATIN CAPITAL LETTER DZ WITH CARON

+
+ +
+ + +

14.2 Case mappings of strings

+ +

Case mapping should always be performed on entire strings, not on individual +characters. The functions in this section do so. +

+

These functions can apply a normalization after the case mapping. The +reason is that if you want to treat ‘ä’ and ‘Ä’ the same, +you most often also want to treat the composed and decomposed forms of such +a character, U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS and +U+0041 LATIN CAPITAL LETTER A U+0308 COMBINING DIAERESIS, the same. +The nf argument designates the normalization. +

+ +

These functions are locale dependent. The iso639_language argument +identifies the language (e.g. "tr" for Turkish). NULL means to use +locale independent case mappings. +

+
+
Function: const char * uc_locale_language () + +
+

Returns the ISO 639 language code of the current locale. +Returns "" if it is unknown, or in the "C" locale. +

+ +
+
Function: uint8_t * u8_toupper (const uint8_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint8_t *resultbuf, size_t *lengthp) + +
+
Function: uint16_t * u16_toupper (const uint16_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint16_t *resultbuf, size_t *lengthp) + +
+
Function: uint32_t * u32_toupper (const uint32_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint32_t *resultbuf, size_t *lengthp) + +
+

Returns the uppercase mapping of a string. +

+

The nf argument identifies the normalization form to apply after the +case-mapping. It can also be NULL, for no normalization. +

+ +
+
Function: uint8_t * u8_tolower (const uint8_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint8_t *resultbuf, size_t *lengthp) + +
+
Function: uint16_t * u16_tolower (const uint16_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint16_t *resultbuf, size_t *lengthp) + +
+
Function: uint32_t * u32_tolower (const uint32_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint32_t *resultbuf, size_t *lengthp) + +
+

Returns the lowercase mapping of a string. +

+

The nf argument identifies the normalization form to apply after the +case-mapping. It can also be NULL, for no normalization. +

+ +
+
Function: uint8_t * u8_totitle (const uint8_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint8_t *resultbuf, size_t *lengthp) + +
+
Function: uint16_t * u16_totitle (const uint16_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint16_t *resultbuf, size_t *lengthp) + +
+
Function: uint32_t * u32_totitle (const uint32_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint32_t *resultbuf, size_t *lengthp) + +
+

Returns the titlecase mapping of a string. +

+

Mapping to title case means that, in each word, the first cased character +is mapped to title case and the remaining characters of the word +are mapped to lower case. +

+

The nf argument identifies the normalization form to apply after the +case-mapping. It can also be NULL, for no normalization. +

+ +
+ + +

14.3 Case mappings of substrings

+ +

Case mapping of a substring cannot simply be performed by extracting the +substring and then applying the case mapping function to it. This does not +work because case mapping requires some information about the surrounding +characters. The following functions apply case mappings to +substrings of a given string, while taking into account the characters that +precede it (the “prefix”) and the characters that follow it (the “suffix”). +

+
+
Type: casing_prefix_context_t + +
+

This data type denotes the case-mapping context that is given by a prefix +string. It is an immediate type that can be copied by simple assignment, +without involving memory allocation. It is not an array type. +

+ +
+
Constant: casing_prefix_context_t unicase_empty_prefix_context + +
+

This constant is the case-mapping context that corresponds to an empty prefix +string. +

+ +

The following functions return casing_prefix_context_t objects: +

+
+
Function: casing_prefix_context_t u8_casing_prefix_context (const uint8_t *s, size_t n) + +
+
Function: casing_prefix_context_t u16_casing_prefix_context (const uint16_t *s, size_t n) + +
+
Function: casing_prefix_context_t u32_casing_prefix_context (const uint32_t *s, size_t n) + +
+

Returns the case-mapping context of a given prefix string. +

+ +
+
Function: casing_prefix_context_t u8_casing_prefixes_context (const uint8_t *s, size_t n, casing_prefix_context_t a_context) + +
+
Function: casing_prefix_context_t u16_casing_prefixes_context (const uint16_t *s, size_t n, casing_prefix_context_t a_context) + +
+
Function: casing_prefix_context_t u32_casing_prefixes_context (const uint32_t *s, size_t n, casing_prefix_context_t a_context) + +
+

Returns the case-mapping context of the prefix concat(a, s), +given the case-mapping context of the prefix a. +

+ +
+
Type: casing_suffix_context_t + +
+

This data type denotes the case-mapping context that is given by a suffix +string. It is an immediate type that can be copied by simple assignment, +without involving memory allocation. It is not an array type. +

+ +
+
Constant: casing_suffix_context_t unicase_empty_suffix_context + +
+

This constant is the case-mapping context that corresponds to an empty suffix +string. +

+ +

The following functions return casing_suffix_context_t objects: +

+
+
Function: casing_suffix_context_t u8_casing_suffix_context (const uint8_t *s, size_t n) + +
+
Function: casing_suffix_context_t u16_casing_suffix_context (const uint16_t *s, size_t n) + +
+
Function: casing_suffix_context_t u32_casing_suffix_context (const uint32_t *s, size_t n) + +
+

Returns the case-mapping context of a given suffix string. +

+ +
+
Function: casing_suffix_context_t u8_casing_suffixes_context (const uint8_t *s, size_t n, casing_suffix_context_t a_context) + +
+
Function: casing_suffix_context_t u16_casing_suffixes_context (const uint16_t *s, size_t n, casing_suffix_context_t a_context) + +
+
Function: casing_suffix_context_t u32_casing_suffixes_context (const uint32_t *s, size_t n, casing_suffix_context_t a_context) + +
+

Returns the case-mapping context of the suffix concat(s, a), +given the case-mapping context of the suffix a. +

+ +

The following functions perform a case mapping, considering the +prefix context and the suffix context. +

+
+
Function: uint8_t * u8_ct_toupper (const uint8_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint8_t *resultbuf, size_t *lengthp) + +
+
Function: uint16_t * u16_ct_toupper (const uint16_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint16_t *resultbuf, size_t *lengthp) + +
+
Function: uint32_t * u32_ct_toupper (const uint32_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint32_t *resultbuf, size_t *lengthp) + +
+

Returns the uppercase mapping of a string that is surrounded by a prefix +and a suffix. +

+ +
+
Function: uint8_t * u8_ct_tolower (const uint8_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint8_t *resultbuf, size_t *lengthp) + +
+
Function: uint16_t * u16_ct_tolower (const uint16_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint16_t *resultbuf, size_t *lengthp) + +
+
Function: uint32_t * u32_ct_tolower (const uint32_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint32_t *resultbuf, size_t *lengthp) + +
+

Returns the lowercase mapping of a string that is surrounded by a prefix +and a suffix. +

+ +
+
Function: uint8_t * u8_ct_totitle (const uint8_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint8_t *resultbuf, size_t *lengthp) + +
+
Function: uint16_t * u16_ct_totitle (const uint16_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint16_t *resultbuf, size_t *lengthp) + +
+
Function: uint32_t * u32_ct_totitle (const uint32_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint32_t *resultbuf, size_t *lengthp) + +
+

Returns the titlecase mapping of a string that is surrounded by a prefix +and a suffix. +

+ +

For example, to uppercase the UTF-8 substring between s + start_index +and s + end_index of a string that extends from s to +s + u8_strlen (s), you can use the statements +

+
 
size_t result_length;
+uint8_t *result =
+  u8_ct_toupper (s + start_index, end_index - start_index,
+                 u8_casing_prefix_context (s, start_index),
+                 u8_casing_suffix_context (s + end_index,
+                                           u8_strlen (s) - end_index),
+                 iso639_language, NULL, NULL, &result_length);
+
+ +
+ + +

14.4 Case insensitive comparison

+ +

The following functions implement comparison that ignores differences in case +and normalization. +

+
+
Function: uint8_t * u8_casefold (const uint8_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint8_t *resultbuf, size_t *lengthp) + +
+
Function: uint16_t * u16_casefold (const uint16_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint16_t *resultbuf, size_t *lengthp) + +
+
Function: uint32_t * u32_casefold (const uint32_t *s, size_t n, const char *iso639_language, uninorm_t nf, uint32_t *resultbuf, size_t *lengthp) + +
+

Returns the case folded string. +

+

Comparing u8_casefold (s1) and u8_casefold (s2) +with the u8_cmp2 function is equivalent to comparing s1 and +s2 with u8_casecmp. +

+

The nf argument identifies the normalization form to apply after the +case-mapping. It can also be NULL, for no normalization. +

+ +
+
Function: uint8_t * u8_ct_casefold (const uint8_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint8_t *resultbuf, size_t *lengthp) + +
+
Function: uint16_t * u16_ct_casefold (const uint16_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint16_t *resultbuf, size_t *lengthp) + +
+
Function: uint32_t * u32_ct_casefold (const uint32_t *s, size_t n, casing_prefix_context_t prefix_context, casing_suffix_context_t suffix_context, const char *iso639_language, uninorm_t nf, uint32_t *resultbuf, size_t *lengthp) + +
+

Returns the case folded string. The case folding takes into account the +case mapping contexts of the prefix and suffix strings. +

+ +
+
Function: int u8_casecmp (const uint8_t *s1, size_t n1, const uint8_t *s2, size_t n2, const char *iso639_language, uninorm_t nf, int *resultp) + +
+
Function: int u16_casecmp (const uint16_t *s1, size_t n1, const uint16_t *s2, size_t n2, const char *iso639_language, uninorm_t nf, int *resultp) + +
+
Function: int u32_casecmp (const uint32_t *s1, size_t n1, const uint32_t *s2, size_t n2, const char *iso639_language, uninorm_t nf, int *resultp) + +
+
Function: int ulc_casecmp (const char *s1, size_t n1, const char *s2, size_t n2, const char *iso639_language, uninorm_t nf, int *resultp) + +
+

Compares s1 and s2, ignoring differences in case and normalization. +

+

The nf argument identifies the normalization form to apply after the +case-mapping. It can also be NULL, for no normalization. +

+

If successful, sets *resultp to -1 if s1 < s2, +0 if s1 = s2, 1 if s1 > s2, and returns 0. +Upon failure, returns -1 with errno set. +

+ + + + + +

The following functions additionally take into account the sorting rules of the +current locale. +

+
+
Function: char * u8_casexfrm (const uint8_t *s, size_t n, const char *iso639_language, uninorm_t nf, char *resultbuf, size_t *lengthp) + +
+
Function: char * u16_casexfrm (const uint16_t *s, size_t n, const char *iso639_language, uninorm_t nf, char *resultbuf, size_t *lengthp) + +
+
Function: char * u32_casexfrm (const uint32_t *s, size_t n, const char *iso639_language, uninorm_t nf, char *resultbuf, size_t *lengthp) + +
+
Function: char * ulc_casexfrm (const char *s, size_t n, const char *iso639_language, uninorm_t nf, char *resultbuf, size_t *lengthp) + +
+

Converts the string s of length n to a NUL-terminated byte +sequence, in such a way that comparing u8_casexfrm (s1) and +u8_casexfrm (s2) with the gnulib function memcmp2 is +equivalent to comparing s1 and s2 with u8_casecoll. +

+

nf must be either UNINORM_NFC, UNINORM_NFKC, or NULL for +no normalization. +

+ +
+
Function: int u8_casecoll (const uint8_t *s1, size_t n1, const uint8_t *s2, size_t n2, const char *iso639_language, uninorm_t nf, int *resultp) + +
+
Function: int u16_casecoll (const uint16_t *s1, size_t n1, const uint16_t *s2, size_t n2, const char *iso639_language, uninorm_t nf, int *resultp) + +
+
Function: int u32_casecoll (const uint32_t *s1, size_t n1, const uint32_t *s2, size_t n2, const char *iso639_language, uninorm_t nf, int *resultp) + +
+
Function: int ulc_casecoll (const char *s1, size_t n1, const char *s2, size_t n2, const char *iso639_language, uninorm_t nf, int *resultp) + +
+

Compares s1 and s2, ignoring differences in case and normalization, +using the collation rules of the current locale. +

+

The nf argument identifies the normalization form to apply after the +case-mapping. It must be either UNINORM_NFC, UNINORM_NFKC, or NULL +for no normalization. +

+

If successful, sets *resultp to -1 if s1 < s2, +0 if s1 = s2, 1 if s1 > s2, and returns 0. +Upon failure, returns -1 with errno set. +

+ +
+ + +

14.5 Case detection

+ +

The following functions determine whether a Unicode string is entirely in +upper case, or entirely in lower case, or entirely in title case, or already +case-folded. +

+
+
Function: int u8_is_uppercase (const uint8_t *s, size_t n, const char *iso639_language, bool *resultp) + +
+
Function: int u16_is_uppercase (const uint16_t *s, size_t n, const char *iso639_language, bool *resultp) + +
+
Function: int u32_is_uppercase (const uint32_t *s, size_t n, const char *iso639_language, bool *resultp) + +
+

Sets *resultp to true if mapping NFD(s) to upper case is +a no-op, or to false otherwise, and returns 0. Upon failure, returns -1 with +errno set. +

+ +
+
Function: int u8_is_lowercase (const uint8_t *s, size_t n, const char *iso639_language, bool *resultp) + +
+
Function: int u16_is_lowercase (const uint16_t *s, size_t n, const char *iso639_language, bool *resultp) + +
+
Function: int u32_is_lowercase (const uint32_t *s, size_t n, const char *iso639_language, bool *resultp) + +
+

Sets *resultp to true if mapping NFD(s) to lower case is +a no-op, or to false otherwise, and returns 0. Upon failure, returns -1 with +errno set. +

+ +
+
Function: int u8_is_titlecase (const uint8_t *s, size_t n, const char *iso639_language, bool *resultp) + +
+
Function: int u16_is_titlecase (const uint16_t *s, size_t n, const char *iso639_language, bool *resultp) + +
+
Function: int u32_is_titlecase (const uint32_t *s, size_t n, const char *iso639_language, bool *resultp) + +
+

Sets *resultp to true if mapping NFD(s) to title case is +a no-op, or to false otherwise, and returns 0. Upon failure, returns -1 with +errno set. +

+ +
+
Function: int u8_is_casefolded (const uint8_t *s, size_t n, const char *iso639_language, bool *resultp) + +
+
Function: int u16_is_casefolded (const uint16_t *s, size_t n, const char *iso639_language, bool *resultp) + +
+
Function: int u32_is_casefolded (const uint32_t *s, size_t n, const char *iso639_language, bool *resultp) + +
+

Sets *resultp to true if applying case folding to NFD(s) is +a no-op, or to false otherwise, and returns 0. Upon failure, returns -1 with +errno set. +

+ +

The following functions determine whether case mappings have any effect on a +Unicode string. +

+
+
Function: int u8_is_cased (const uint8_t *s, size_t n, const char *iso639_language, bool *resultp) + +
+
Function: int u16_is_cased (const uint16_t *s, size_t n, const char *iso639_language, bool *resultp) + +
+
Function: int u32_is_cased (const uint32_t *s, size_t n, const char *iso639_language, bool *resultp) + +
+

Sets *resultp to true if case matters for s, that is, if +mapping NFD(s) to either upper case or lower case or title case is not +a no-op. Sets *resultp to false if NFD(s) maps to itself +under the upper case mapping, under the lower case mapping, and under the title +case mapping; in other words, when NFD(s) consists entirely of caseless +characters. Returns 0 on success. Upon failure, returns -1 with errno set. +


@@ -73,12 +597,12 @@ ul.toc {list-style: none}

- This document was generated by Bruno Haible on March, 30 2010 using texi2html 1.78a.
+ This document was generated by Daiki Ueno on July, 8 2015 using texi2html 1.78a.