Diffstat (limited to 'doc/libunistring.info')
-rw-r--r-- doc/libunistring.info | 128
1 files changed, 86 insertions, 42 deletions
diff --git a/doc/libunistring.info b/doc/libunistring.info
index 52882c2..d1fdfa2 100644
--- a/doc/libunistring.info
+++ b/doc/libunistring.info
@@ -1,4 +1,4 @@
-This is libunistring.info, produced by makeinfo version 6.1 from
+This is libunistring.info, produced by makeinfo version 6.3 from
libunistring.texi.
INFO-DIR-SECTION Software development
@@ -2866,6 +2866,10 @@ clusters in a string.
if no grapheme cluster break is encountered before it. Returns
NULL if and only if ‘S == END’.
+ Note that these functions do not handle the case when a character
+ outside of the range between S and END is needed to determine the
+ boundary. Use ‘_grapheme_breaks’ functions for such cases.
+
-- Function: void u8_grapheme_prev (const uint8_t *S, const uint8_t
*START)
-- Function: void u16_grapheme_prev (const uint16_t *S, const uint16_t
@@ -2876,6 +2880,10 @@ clusters in a string.
no grapheme cluster break is encountered before it. Returns NULL
if and only if ‘S == START’.
+ Note that these functions do not handle the case when a character
+ outside of the range between START and S is needed to determine the
+ boundary. Use ‘_grapheme_breaks’ functions for such cases.
+
The following functions determine all of the grapheme cluster
boundaries in a string.
@@ -2887,8 +2895,10 @@ boundaries in a string.
char *P)
-- Function: void ulc_grapheme_breaks (const char *S, size_t N, char
*P)
+ -- Function: void uc_grapheme_breaks (const ucs_t *S, size_t N, char
+ *P)
Determines the grapheme cluster break points in S, an array of N
- units, and stores the result at ‘P[0..N-1]’.
+ units, and stores the result at ‘P[0..NX-1]’.
‘P[i] = 1’
means that there is a grapheme cluster boundary between
‘S[i-1]’ and ‘S[i]’.
@@ -2898,6 +2908,14 @@ boundaries in a string.
‘P[0]’ is always set to 1, because there is always a grapheme
cluster break at start of text.
+ In addition to the above variants for UTF-8, UTF-16, and UTF-32
+ strings, ‘<unigbrk.h>’ provides another variant:
+ ‘uc_grapheme_breaks’.
+
+ This is similar to ‘u32_grapheme_breaks’, but it accepts any
+ characters which may not be represented in UTF-32, such as control
+ characters.
+

File: libunistring.info, Node: Grapheme cluster break property, Prev: Grapheme cluster breaks in a string, Up: unigbrk.h
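The hunk above documents the output convention of the ‘_grapheme_breaks’ functions: ‘P[i] = 1’ marks a boundary between ‘S[i-1]’ and ‘S[i]’, and ‘P[0]’ is always 1. The following is a minimal, self-contained sketch of that convention only. It does not call libunistring; the names `toy_is_extend`, `toy_grapheme_breaks`, and `toy_count_clusters` are hypothetical, and the single break rule used (combining diacritical marks U+0300..U+036F attach to the preceding character) is a simplifying assumption, not the full Unicode rule set that the library applies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real break rules: treat U+0300..U+036F
   (combining diacritical marks) as extending the preceding character.  */
static int
toy_is_extend (uint32_t uc)
{
  return uc >= 0x0300 && uc <= 0x036F;
}

/* Fills p[0..n-1] with the convention described in the documentation:
   p[i] = 1 means there is a boundary between s[i-1] and s[i],
   p[i] = 0 means there is none.  p[0] is always set to 1.  */
void
toy_grapheme_breaks (const uint32_t *s, size_t n, char *p)
{
  assert (s != NULL && p != NULL);
  for (size_t i = 0; i < n; i++)
    p[i] = (i == 0 || !toy_is_extend (s[i])) ? 1 : 0;
}

/* Counts clusters by counting the entries of p that are set to 1.  */
size_t
toy_count_clusters (const char *p, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    count += (p[i] == 1);
  return count;
}
```

With the input U+0065 U+0301 U+0061, this sketch marks boundaries before the first and third elements and none before the combining accent. A real caller would pass the same kind of `p` buffer to ‘u32_grapheme_breaks’ or ‘uc_grapheme_breaks’ from ‘<unigbrk.h>’ instead.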
@@ -2925,6 +2943,12 @@ property. More values may be added in the future.
-- Constant: int GBP_T
-- Constant: int GBP_LV
-- Constant: int GBP_LVT
+ -- Constant: int GBP_RI
+ -- Constant: int GBP_ZWJ
+ -- Constant: int GBP_EB
+ -- Constant: int GBP_EM
+ -- Constant: int GBP_GAZ
+ -- Constant: int GBP_EBG
The following function looks up the grapheme cluster break property
of a character.
@@ -2948,6 +2972,10 @@ the higher-level functions in the previous section are directly based.
described in the Unicode standard, because the standard says that
they are preferred.
+ Note that this function does not handle the case when three or more
+ consecutive characters are needed to determine the boundary. Use
+ ‘uc_grapheme_breaks’ for such cases.
+

File: libunistring.info, Node: uniwbrk.h, Next: unilbrk.h, Prev: unigbrk.h, Up: Top
@@ -3016,6 +3044,15 @@ More values may be added in the future.
-- Constant: int WBP_MIDNUM
-- Constant: int WBP_NUMERIC
-- Constant: int WBP_EXTENDNUMLET
+ -- Constant: int WBP_RI
+ -- Constant: int WBP_DQ
+ -- Constant: int WBP_SQ
+ -- Constant: int WBP_HL
+ -- Constant: int WBP_ZWJ
+ -- Constant: int WBP_EB
+ -- Constant: int WBP_EM
+ -- Constant: int WBP_GAZ
+ -- Constant: int WBP_EBG
The following function looks up the word break property of a
character.
@@ -3235,6 +3272,11 @@ single Unicode character.
When a decomposition exists, ‘DECOMPOSITION[0..N-1]’ is filled and
N is returned. Otherwise -1 is returned.
+ Note: This function returns the (simple) “canonical decomposition”
+ of UC. If you want the “full canonical decomposition” of UC, that
+ is, the recursive application of “canonical decomposition”, use the
+ function ‘u*_normalize’ with argument ‘UNINORM_NFD’ instead.
+

File: libunistring.info, Node: Composition of characters, Next: Normalization of strings, Prev: Decomposition of characters, Up: uninorm.h
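The note added in the hunk above distinguishes the simple canonical decomposition of a character from its full canonical decomposition, which applies the simple one recursively. The following self-contained sketch illustrates the difference. It does not call libunistring; the functions `toy_simple_decomposition` and `toy_full_decomposition` and the two-entry table are hypothetical stand-ins for the Unicode data. The two mappings themselves are real: U+01D5 decomposes to U+00DC U+0304, and U+00DC decomposes to U+0055 U+0308. The sketch omits the canonical reordering of combining marks that normalization also performs, and it assumes the caller supplies a large enough output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-entry table.  Returns the number of characters
   written to out[0..1], or -1 when uc has no decomposition here.  */
static int
toy_simple_decomposition (uint32_t uc, uint32_t *out)
{
  switch (uc)
    {
    case 0x01D5: out[0] = 0x00DC; out[1] = 0x0304; return 2;
    case 0x00DC: out[0] = 0x0055; out[1] = 0x0308; return 2;
    default:     return -1;
    }
}

/* Applies the simple decomposition recursively and writes the result
   to out.  Returns the number of characters written.  No canonical
   reordering is done; this is only an illustration of the recursion.  */
size_t
toy_full_decomposition (uint32_t uc, uint32_t *out)
{
  uint32_t parts[2];
  int n = toy_simple_decomposition (uc, parts);
  size_t len = 0;

  assert (out != NULL);
  if (n < 0)
    {
      /* No further decomposition: the character stands for itself.  */
      out[0] = uc;
      return 1;
    }
  for (int i = 0; i < n; i++)
    len += toy_full_decomposition (parts[i], out + len);
  return len;
}
```

For U+01D5 the simple step stops at two characters, while the recursive step yields three: U+0055 U+0308 U+0304. With the library itself, the documentation above recommends ‘u*_normalize’ with ‘UNINORM_NFD’ to obtain the full result.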
@@ -5642,11 +5684,11 @@ Index
* u16_endswith: Elementary string functions on NUL terminated strings.
(line 259)
* u16_grapheme_breaks: Grapheme cluster breaks in a string.
- (line 34)
+ (line 42)
* u16_grapheme_next: Grapheme cluster breaks in a string.
(line 11)
* u16_grapheme_prev: Grapheme cluster breaks in a string.
- (line 21)
+ (line 25)
* u16_is_cased: Case detection. (line 55)
* u16_is_casefolded: Case detection. (line 42)
* u16_is_lowercase: Case detection. (line 22)
@@ -5801,11 +5843,11 @@ Index
* u32_endswith: Elementary string functions on NUL terminated strings.
(line 261)
* u32_grapheme_breaks: Grapheme cluster breaks in a string.
- (line 36)
+ (line 44)
* u32_grapheme_next: Grapheme cluster breaks in a string.
(line 13)
* u32_grapheme_prev: Grapheme cluster breaks in a string.
- (line 23)
+ (line 27)
* u32_is_cased: Case detection. (line 57)
* u32_is_casefolded: Case detection. (line 44)
* u32_is_lowercase: Case detection. (line 24)
@@ -5960,11 +6002,11 @@ Index
* u8_endswith: Elementary string functions on NUL terminated strings.
(line 257)
* u8_grapheme_breaks: Grapheme cluster breaks in a string.
- (line 32)
+ (line 40)
* u8_grapheme_next: Grapheme cluster breaks in a string.
(line 9)
* u8_grapheme_prev: Grapheme cluster breaks in a string.
- (line 19)
+ (line 23)
* u8_is_cased: Case detection. (line 53)
* u8_is_casefolded: Case detection. (line 40)
* u8_is_lowercase: Case detection. (line 20)
@@ -6117,7 +6159,9 @@ Index
* uc_general_category_or: Object oriented API. (line 174)
* uc_general_category_t: Object oriented API. (line 6)
* uc_graphemeclusterbreak_property: Grapheme cluster break property.
- (line 31)
+ (line 37)
+* uc_grapheme_breaks: Grapheme cluster breaks in a string.
+ (line 48)
* uc_is_alnum: Classifications like in ISO C.
(line 13)
* uc_is_alpha: Classifications like in ISO C.
@@ -6138,7 +6182,7 @@ Index
* uc_is_graph: Classifications like in ISO C.
(line 30)
* uc_is_grapheme_break: Grapheme cluster break property.
- (line 38)
+ (line 44)
* uc_is_java_whitespace: ISO C and Java syntax.
(line 13)
* uc_is_lower: Classifications like in ISO C.
@@ -6357,7 +6401,7 @@ Index
* uc_toupper: Case mappings of characters.
(line 16)
* uc_width: uniwidth.h. (line 22)
-* uc_wordbreak_property: Word break property. (line 31)
+* uc_wordbreak_property: Word break property. (line 40)
* uint16_t: unitypes.h. (line 9)
* uint32_t: unitypes.h. (line 10)
* uint8_t: unitypes.h. (line 8)
@@ -6371,7 +6415,7 @@ Index
(line 77)
* ulc_fprintf: unistdio.h. (line 184)
* ulc_grapheme_breaks: Grapheme cluster breaks in a string.
- (line 38)
+ (line 46)
* ulc_possible_linebreaks: unilbrk.h. (line 48)
* ulc_snprintf: unistdio.h. (line 44)
* ulc_sprintf: unistdio.h. (line 42)
@@ -6496,36 +6540,36 @@ Node: Classifications like in ISO C112234
Node: uniwidth.h115046
Node: unigbrk.h117092
Node: Grapheme cluster breaks in a string118586
-Node: Grapheme cluster break property120691
-Node: uniwbrk.h122592
-Node: Word breaks in a string123130
-Node: Word break property124222
-Node: unilbrk.h125321
-Node: uninorm.h129617
-Node: Decomposition of characters130254
-Node: Composition of characters133731
-Node: Normalization of strings134444
-Node: Normalizing comparisons136521
-Node: Normalization of streams138923
-Node: unicase.h141048
-Node: Case mappings of characters141737
-Node: Case mappings of strings143886
-Node: Case mappings of substrings147237
-Node: Case insensitive comparison154159
-Node: Case detection159564
-Node: uniregex.h162878
-Node: Using the library163105
-Node: Installation163516
-Node: Compiler options164003
-Node: Include files165643
-Node: Autoconf macro166896
-Node: Reporting problems168536
-Node: More functionality169354
-Node: Licenses169797
-Node: GNU GPL172228
-Node: GNU LGPL209973
-Node: GNU FDL218456
-Node: Index243765
+Node: Grapheme cluster break property121521
+Node: uniwbrk.h123765
+Node: Word breaks in a string124303
+Node: Word break property125395
+Node: unilbrk.h126722
+Node: uninorm.h131018
+Node: Decomposition of characters131655
+Node: Composition of characters135436
+Node: Normalization of strings136149
+Node: Normalizing comparisons138226
+Node: Normalization of streams140628
+Node: unicase.h142753
+Node: Case mappings of characters143442
+Node: Case mappings of strings145591
+Node: Case mappings of substrings148942
+Node: Case insensitive comparison155864
+Node: Case detection161269
+Node: uniregex.h164583
+Node: Using the library164810
+Node: Installation165221
+Node: Compiler options165708
+Node: Include files167348
+Node: Autoconf macro168601
+Node: Reporting problems170241
+Node: More functionality171059
+Node: Licenses171502
+Node: GNU GPL173933
+Node: GNU LGPL211678
+Node: GNU FDL220161
+Node: Index245470

End Tag Table