Diffstat (limited to 'doc/libunistring.info')
-rw-r--r-- | doc/libunistring.info | 128 |
1 files changed, 86 insertions, 42 deletions
diff --git a/doc/libunistring.info b/doc/libunistring.info
index 52882c2..d1fdfa2 100644
--- a/doc/libunistring.info
+++ b/doc/libunistring.info
@@ -1,4 +1,4 @@
-This is libunistring.info, produced by makeinfo version 6.1 from
+This is libunistring.info, produced by makeinfo version 6.3 from
 libunistring.texi.
 
 INFO-DIR-SECTION Software development
@@ -2866,6 +2866,10 @@ clusters in a string.
      if no grapheme cluster break is encountered before it. Returns
      NULL if and only if ‘S == END’.
 
+     Note that these functions do not handle the case when a character
+     outside of the range between S and END is needed to determine the
+     boundary. Use ‘_grapheme_breaks’ functions for such cases.
+
  -- Function: void u8_grapheme_prev (const uint8_t *S, const uint8_t
           *START)
  -- Function: void u16_grapheme_prev (const uint16_t *S, const uint16_t
@@ -2876,6 +2880,10 @@ clusters in a string.
      no grapheme cluster break is encountered before it. Returns NULL
      if and only if ‘S == START’.
 
+     Note that these functions do not handle the case when a character
+     outside of the range between START and S is needed to determine the
+     boundary. Use ‘_grapheme_breaks’ functions for such cases.
+
    The following functions determine all of the grapheme cluster
 boundaries in a string.
 
@@ -2887,8 +2895,10 @@ boundaries in a string.
           char *P)
  -- Function: void ulc_grapheme_breaks (const char *S, size_t N, char
           *P)
+ -- Function: void uc_grapheme_breaks (const ucs_t *S, size_t N, char
+          *P)
      Determines the grapheme cluster break points in S, an array of N
-     units, and stores the result at ‘P[0..N-1]’.
+     units, and stores the result at ‘P[0..NX-1]’.
 
      ‘P[i] = 1’ means that there is a grapheme cluster boundary between
      ‘S[i-1]’ and ‘S[i]’.
@@ -2898,6 +2908,14 @@ boundaries in a string.
      ‘P[0]’ is always set to 1, because there is always a grapheme
      cluster break at start of text.
 
+     In addition to the above variants for UTF-8, UTF-16, and UTF-32
+     strings, ‘<unigbrk.h>’ provides another variant:
+     ‘uc_grapheme_breaks’.
+
+     This is similar to ‘u32_grapheme_breaks’, but it accepts any
+     characters which may not be represented in UTF-32, such as control
+     characters.
+
 
 File: libunistring.info, Node: Grapheme cluster break property, Prev: Grapheme cluster breaks in a string, Up: unigbrk.h
 
@@ -2925,6 +2943,12 @@ property. More values may be added in the future.
  -- Constant: int GBP_T
  -- Constant: int GBP_LV
  -- Constant: int GBP_LVT
+ -- Constant: int GBP_RI
+ -- Constant: int GBP_ZWJ
+ -- Constant: int GBP_EB
+ -- Constant: int GBP_EM
+ -- Constant: int GBP_GAZ
+ -- Constant: int GBP_EBG
 
    The following function looks up the grapheme cluster break property
 of a character.
@@ -2948,6 +2972,10 @@ the higher-level functions in the previous section are directly based.
      described in the Unicode standard, because the standard says that
      they are preferred.
 
+     Note that this function do not handle the case when three ore more
+     consecutive characters are needed to determine the boundary. Use
+     ‘uc_grapheme_breaks’ for such cases.
+
 
 File: libunistring.info, Node: uniwbrk.h, Next: unilbrk.h, Prev: unigbrk.h, Up: Top
 
@@ -3016,6 +3044,15 @@ More values may be added in the future.
  -- Constant: int WBP_MIDNUM
  -- Constant: int WBP_NUMERIC
  -- Constant: int WBP_EXTENDNUMLET
+ -- Constant: int WBP_RI
+ -- Constant: int WBP_DQ
+ -- Constant: int WBP_SQ
+ -- Constant: int WBP_HL
+ -- Constant: int WBP_ZWJ
+ -- Constant: int WBP_EB
+ -- Constant: int WBP_EM
+ -- Constant: int WBP_GAZ
+ -- Constant: int WBP_EBG
 
    The following function looks up the word break property of a
 character.
@@ -3235,6 +3272,11 @@ single Unicode character.
      When a decomposition exists, ‘DECOMPOSITION[0..N-1]’ is filled and
      N is returned. Otherwise -1 is returned.
 
+     Note: This function returns the (simple) “canonical decomposition”
+     of UC. If you want the “full canonical decomposition” of UC, that
+     is, the recursive application of “canonical decomposition”, use the
+     function ‘u*_normalize’ with argument ‘UNINORM_NFD’ instead.
+
 
 File: libunistring.info, Node: Composition of characters, Next: Normalization of strings, Prev: Decomposition of characters, Up: uninorm.h
 
@@ -5642,11 +5684,11 @@ Index
 * u16_endswith: Elementary string functions on NUL terminated strings.
                                                             (line 259)
 * u16_grapheme_breaks: Grapheme cluster breaks in a string.
-                                                            (line 34)
+                                                            (line 42)
 * u16_grapheme_next: Grapheme cluster breaks in a string.
                                                             (line 11)
 * u16_grapheme_prev: Grapheme cluster breaks in a string.
-                                                            (line 21)
+                                                            (line 25)
 * u16_is_cased: Case detection. (line 55)
 * u16_is_casefolded: Case detection. (line 42)
 * u16_is_lowercase: Case detection. (line 22)
@@ -5801,11 +5843,11 @@ Index
 * u32_endswith: Elementary string functions on NUL terminated strings.
                                                             (line 261)
 * u32_grapheme_breaks: Grapheme cluster breaks in a string.
-                                                            (line 36)
+                                                            (line 44)
 * u32_grapheme_next: Grapheme cluster breaks in a string.
                                                             (line 13)
 * u32_grapheme_prev: Grapheme cluster breaks in a string.
-                                                            (line 23)
+                                                            (line 27)
 * u32_is_cased: Case detection. (line 57)
 * u32_is_casefolded: Case detection. (line 44)
 * u32_is_lowercase: Case detection. (line 24)
@@ -5960,11 +6002,11 @@ Index
 * u8_endswith: Elementary string functions on NUL terminated strings.
                                                             (line 257)
 * u8_grapheme_breaks: Grapheme cluster breaks in a string.
-                                                            (line 32)
+                                                            (line 40)
 * u8_grapheme_next: Grapheme cluster breaks in a string.
                                                             (line 9)
 * u8_grapheme_prev: Grapheme cluster breaks in a string.
-                                                            (line 19)
+                                                            (line 23)
 * u8_is_cased: Case detection. (line 53)
 * u8_is_casefolded: Case detection. (line 40)
 * u8_is_lowercase: Case detection. (line 20)
@@ -6117,7 +6159,9 @@ Index
 * uc_general_category_or: Object oriented API. (line 174)
 * uc_general_category_t: Object oriented API. (line 6)
 * uc_graphemeclusterbreak_property: Grapheme cluster break property.
-                                                            (line 31)
+                                                            (line 37)
+* uc_grapheme_breaks: Grapheme cluster breaks in a string.
+                                                            (line 48)
 * uc_is_alnum: Classifications like in ISO C.
                                                             (line 13)
 * uc_is_alpha: Classifications like in ISO C.
@@ -6138,7 +6182,7 @@ Index
 * uc_is_graph: Classifications like in ISO C.
                                                             (line 30)
 * uc_is_grapheme_break: Grapheme cluster break property.
-                                                            (line 38)
+                                                            (line 44)
 * uc_is_java_whitespace: ISO C and Java syntax.
                                                             (line 13)
 * uc_is_lower: Classifications like in ISO C.
@@ -6357,7 +6401,7 @@ Index
 * uc_toupper: Case mappings of characters.
                                                             (line 16)
 * uc_width: uniwidth.h. (line 22)
-* uc_wordbreak_property: Word break property. (line 31)
+* uc_wordbreak_property: Word break property. (line 40)
 * uint16_t: unitypes.h. (line 9)
 * uint32_t: unitypes.h. (line 10)
 * uint8_t: unitypes.h. (line 8)
@@ -6371,7 +6415,7 @@ Index
                                                             (line 77)
 * ulc_fprintf: unistdio.h. (line 184)
 * ulc_grapheme_breaks: Grapheme cluster breaks in a string.
-                                                            (line 38)
+                                                            (line 46)
 * ulc_possible_linebreaks: unilbrk.h. (line 48)
 * ulc_snprintf: unistdio.h. (line 44)
 * ulc_sprintf: unistdio.h. (line 42)
@@ -6496,36 +6540,36 @@ Node: Classifications like in ISO C112234
 Node: uniwidth.h115046
 Node: unigbrk.h117092
 Node: Grapheme cluster breaks in a string118586
-Node: Grapheme cluster break property120691
-Node: uniwbrk.h122592
-Node: Word breaks in a string123130
-Node: Word break property124222
-Node: unilbrk.h125321
-Node: uninorm.h129617
-Node: Decomposition of characters130254
-Node: Composition of characters133731
-Node: Normalization of strings134444
-Node: Normalizing comparisons136521
-Node: Normalization of streams138923
-Node: unicase.h141048
-Node: Case mappings of characters141737
-Node: Case mappings of strings143886
-Node: Case mappings of substrings147237
-Node: Case insensitive comparison154159
-Node: Case detection159564
-Node: uniregex.h162878
-Node: Using the library163105
-Node: Installation163516
-Node: Compiler options164003
-Node: Include files165643
-Node: Autoconf macro166896
-Node: Reporting problems168536
-Node: More functionality169354
-Node: Licenses169797
-Node: GNU GPL172228
-Node: GNU LGPL209973
-Node: GNU FDL218456
-Node: Index243765
+Node: Grapheme cluster break property121521
+Node: uniwbrk.h123765
+Node: Word breaks in a string124303
+Node: Word break property125395
+Node: unilbrk.h126722
+Node: uninorm.h131018
+Node: Decomposition of characters131655
+Node: Composition of characters135436
+Node: Normalization of strings136149
+Node: Normalizing comparisons138226
+Node: Normalization of streams140628
+Node: unicase.h142753
+Node: Case mappings of characters143442
+Node: Case mappings of strings145591
+Node: Case mappings of substrings148942
+Node: Case insensitive comparison155864
+Node: Case detection161269
+Node: uniregex.h164583
+Node: Using the library164810
+Node: Installation165221
+Node: Compiler options165708
+Node: Include files167348
+Node: Autoconf macro168601
+Node: Reporting problems170241
+Node: More functionality171059
+Node: Licenses171502
+Node: GNU GPL173933
+Node: GNU LGPL211678
+Node: GNU FDL220161
+Node: Index245470
 
 End Tag Table