From fa095a4504cbe668e4244547e2c141597bea4ecf Mon Sep 17 00:00:00 2001 From: Andreas Rottmann Date: Mon, 14 Sep 2009 12:32:44 +0200 Subject: Imported Upstream version 0.9.1 --- doc/libunistring_4.html | 864 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 864 insertions(+) create mode 100644 doc/libunistring_4.html (limited to 'doc/libunistring_4.html') diff --git a/doc/libunistring_4.html b/doc/libunistring_4.html new file mode 100644 index 0000000..60992cd --- /dev/null +++ b/doc/libunistring_4.html @@ -0,0 +1,864 @@ + + + + + +GNU libunistring: 4. Elementary Unicode string functions <unistr.h> + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + +

4. Elementary Unicode string functions <unistr.h>

+ +

This include file declares elementary functions for Unicode strings. It is +essentially the equivalent of what <string.h> is for C strings. +

+ +
+ + +

4.1 Elementary string checks

+ +

The following function is available to verify the integrity of a Unicode string. +

+
+
Function: const uint8_t * u8_check (const uint8_t *s, size_t n) + +
+
Function: const uint16_t * u16_check (const uint16_t *s, size_t n) + +
+
Function: const uint32_t * u32_check (const uint32_t *s, size_t n) + +
+

This function checks whether a Unicode string is well-formed. +It returns NULL if valid, or a pointer to the first invalid unit otherwise. +
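To make the return convention concrete, here is a minimal sketch of a UTF-8 well-formedness check with the same contract (NULL when valid, otherwise a pointer to the first offending unit). The name `sketch_u8_check` is hypothetical, and the body is an illustration written for this page, not the library's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libunistring.
   Returns NULL if the n units at s are well-formed UTF-8, otherwise a
   pointer to the first unit of the first malformed sequence. */
static const uint8_t *sketch_u8_check(const uint8_t *s, size_t n)
{
    const uint8_t *end = s + n;

    while (s < end) {
        uint8_t c = s[0];
        size_t len;
        uint32_t cp;

        if (c < 0x80) { s++; continue; }                 /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0)     { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return s;    /* stray continuation byte, 0xC0/0xC1, or > 0xF4 */

        if ((size_t)(end - s) < len)
            return s;                                    /* truncated sequence */
        for (size_t i = 1; i < len; i++) {
            if ((s[i] & 0xC0) != 0x80)
                return s;                                /* not a continuation byte */
            cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
        }
        /* Reject overlong encodings, surrogates, and values above U+10FFFF. */
        if (len == 3 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF)))
            return s;
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF))
            return s;
        s += len;
    }
    return NULL;
}
```

A caller would typically compare the result against NULL and, on failure, subtract the start pointer to obtain the offset of the bad unit.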

+ +
+ + +

4.2 Elementary string conversions

+ +

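The following functions perform conversions between the different forms of Unicode strings. Each follows the library's (resultbuf, lengthp) convention: if resultbuf is not NULL and the result fits into *lengthp units, the result is stored there; otherwise a freshly allocated string is returned. In both cases *lengthp is set to the length of the result. On failure, NULL is returned and errno is set. +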

+
+
Function: uint16_t * u8_to_u16 (const uint8_t *s, size_t n, uint16_t *resultbuf, size_t *lengthp) + +
+

Converts a UTF-8 string to a UTF-16 string. +

+ +
+
Function: uint32_t * u8_to_u32 (const uint8_t *s, size_t n, uint32_t *resultbuf, size_t *lengthp) + +
+

Converts a UTF-8 string to a UTF-32 string. +

+ +
+
Function: uint8_t * u16_to_u8 (const uint16_t *s, size_t n, uint8_t *resultbuf, size_t *lengthp) + +
+

Converts a UTF-16 string to a UTF-8 string. +

+ +
+
Function: uint32_t * u16_to_u32 (const uint16_t *s, size_t n, uint32_t *resultbuf, size_t *lengthp) + +
+

Converts a UTF-16 string to a UTF-32 string. +

+ +
+
Function: uint8_t * u32_to_u8 (const uint32_t *s, size_t n, uint8_t *resultbuf, size_t *lengthp) + +
+

Converts a UTF-32 string to a UTF-8 string. +

+ +
+
Function: uint16_t * u32_to_u16 (const uint32_t *s, size_t n, uint16_t *resultbuf, size_t *lengthp) + +
+

Converts a UTF-32 string to a UTF-16 string. +
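The per-character step behind a UTF-32 to UTF-16 conversion can be sketched as follows. Characters above U+FFFF need a surrogate pair, which is why the number of output units can differ from the number of input units. The helper `sketch_u16_encode` is a hypothetical name used only for this illustration; it is not a library function.

```c
#include <stdint.h>

/* Hypothetical helper, not part of libunistring.
   Writes the UTF-16 encoding of uc to out (room for 2 units assumed).
   Returns the number of units written, or -1 if uc is not a valid
   Unicode character (a surrogate code point or a value above U+10FFFF). */
static int sketch_u16_encode(uint16_t *out, uint32_t uc)
{
    if (uc < 0x10000) {
        if (uc >= 0xD800 && uc <= 0xDFFF)
            return -1;                      /* surrogates are not characters */
        out[0] = (uint16_t)uc;
        return 1;
    }
    if (uc > 0x10FFFF)
        return -1;
    uc -= 0x10000;                          /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (uc >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (uc & 0x3FF));  /* low surrogate */
    return 2;
}
```

For instance, U+20AC fits in one unit, while U+1F600 becomes the pair 0xD83D 0xDE00.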

+ +
+ + +

4.3 Elementary string functions

+ +

The following functions inspect and return details about the first character +in a Unicode string. +

+
+
Function: int u8_mblen (const uint8_t *s, size_t n) + +
+
Function: int u16_mblen (const uint16_t *s, size_t n) + +
+
Function: int u32_mblen (const uint32_t *s, size_t n) + +
+

Returns the length (number of units) of the first character in s, where s holds at most n units. Returns 0 if it is the NUL character. Returns -1 +upon failure. +

+

This function is similar to mblen, except that it operates on a +Unicode string and that s must not be NULL. +

+ +
+
Function: int u8_mbtouc_unsafe (ucs4_t *puc, const uint8_t *s, size_t n) + +
+
Function: int u16_mbtouc_unsafe (ucs4_t *puc, const uint16_t *s, size_t n) + +
+
Function: int u32_mbtouc_unsafe (ucs4_t *puc, const uint32_t *s, size_t n) + +
+

Returns the length (number of units) of the first character in s, +putting its ucs4_t representation in *puc. Upon failure, +*puc is set to 0xfffd, and an appropriate number of units +is returned. +

+

The number of available units, n, must be > 0. +

+

This function is similar to mbtowc, except that it operates on a +Unicode string, puc and s must not be NULL, n must be > 0, +and the NUL character is not treated specially. +

+ +
+
Function: int u8_mbtouc (ucs4_t *puc, const uint8_t *s, size_t n) + +
+
Function: int u16_mbtouc (ucs4_t *puc, const uint16_t *s, size_t n) + +
+
Function: int u32_mbtouc (ucs4_t *puc, const uint32_t *s, size_t n) + +
+

This function is like u8_mbtouc_unsafe, except that it will detect an +invalid UTF-8 character, even if the library is compiled without +‘--enable-safety’. +

+ +
+
Function: int u8_mbtoucr (ucs4_t *puc, const uint8_t *s, size_t n) + +
+
Function: int u16_mbtoucr (ucs4_t *puc, const uint16_t *s, size_t n) + +
+
Function: int u32_mbtoucr (ucs4_t *puc, const uint32_t *s, size_t n) + +
+

Returns the length (number of units) of the first character in s, +putting its ucs4_t representation in *puc. Upon failure, +*puc is set to 0xfffd, and -1 is returned for an invalid +sequence of units, -2 is returned for an incomplete sequence of units. +

+

The number of available units, n, must be > 0. +

+

This function is similar to u8_mbtouc, except that the return value +gives more details about the failure, similar to mbrtowc. +
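The difference between the two failure codes is easiest to see in code. The following sketch decodes the first character of a UTF-8 buffer and distinguishes an invalid sequence (-1) from one that is valid so far but cut off by n (-2). `sketch_u8_decode` is a hypothetical name, `uint32_t` stands in for `ucs4_t`, and the body is an illustration under those assumptions rather than the library's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libunistring.
   Decodes the first character of the n (> 0) UTF-8 units at s into *puc.
   Returns the number of units consumed. On failure stores 0xFFFD in *puc
   and returns -1 for an invalid sequence, -2 for an incomplete one. */
static int sketch_u8_decode(uint32_t *puc, const uint8_t *s, size_t n)
{
    uint8_t c = s[0];
    size_t len;
    uint32_t cp;

    if (c < 0x80) { *puc = c; return 1; }
    if (c >= 0xC2 && c <= 0xDF)      { len = 2; cp = c & 0x1F; }
    else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
    else { *puc = 0xFFFD; return -1; }      /* cannot start a character */

    for (size_t i = 1; i < len; i++) {
        uint8_t b;

        if (i == n) { *puc = 0xFFFD; return -2; }   /* valid prefix, ran out of units */
        b = s[i];
        if ((b & 0xC0) != 0x80) { *puc = 0xFFFD; return -1; }
        /* The second unit also rules out overlong forms, surrogates,
           and values above U+10FFFF. */
        if (i == 1 && ((c == 0xE0 && b < 0xA0) || (c == 0xED && b >= 0xA0)
                       || (c == 0xF0 && b < 0x90) || (c == 0xF4 && b >= 0x90))) {
            *puc = 0xFFFD;
            return -1;
        }
        cp = (cp << 6) | (uint32_t)(b & 0x3F);
    }
    *puc = cp;
    return (int)len;
}
```

A streaming reader could use the -2 result to wait for more input, while treating -1 as a hard error or substituting U+FFFD.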

+ +

The following function stores a Unicode character as a Unicode string in +memory. +

+
+
Function: int u8_uctomb (uint8_t *s, ucs4_t uc, int n) + +
+
Function: int u16_uctomb (uint16_t *s, ucs4_t uc, int n) + +
+
Function: int u32_uctomb (uint32_t *s, ucs4_t uc, int n) + +
+

Puts the multibyte character represented by uc in s, returning its +length. Returns -1 upon failure, -2 if the number of available units, n, +is too small. The latter case cannot occur if n >= 6/2/1, respectively. +

+

This function is similar to wctomb, except that it operates on +Unicode strings, s must not be NULL, and the argument n must be +specified. +
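The return-value contract described above (length on success, -1 for an unencodable character, -2 for a buffer that is too small) can be sketched for the UTF-8 case as follows. `sketch_u8_encode` is a hypothetical name and `uint32_t` stands in for `ucs4_t`; this is an illustration, not the library's implementation.

```c
#include <stdint.h>

/* Hypothetical helper, not part of libunistring.
   Stores the UTF-8 encoding of uc in s, which has room for n units.
   Returns the number of units written, -1 if uc is not a valid Unicode
   character, or -2 if n is too small. */
static int sketch_u8_encode(uint8_t *s, uint32_t uc, int n)
{
    int len;

    if (uc < 0x80)
        len = 1;
    else if (uc < 0x800)
        len = 2;
    else if (uc < 0x10000) {
        if (uc >= 0xD800 && uc <= 0xDFFF)
            return -1;                      /* surrogate code point */
        len = 3;
    } else if (uc <= 0x10FFFF)
        len = 4;
    else
        return -1;                          /* beyond the Unicode range */

    if (n < len)
        return -2;

    switch (len) {
    case 1:
        s[0] = (uint8_t)uc;
        break;
    case 2:
        s[0] = (uint8_t)(0xC0 | (uc >> 6));
        s[1] = (uint8_t)(0x80 | (uc & 0x3F));
        break;
    case 3:
        s[0] = (uint8_t)(0xE0 | (uc >> 12));
        s[1] = (uint8_t)(0x80 | ((uc >> 6) & 0x3F));
        s[2] = (uint8_t)(0x80 | (uc & 0x3F));
        break;
    default:
        s[0] = (uint8_t)(0xF0 | (uc >> 18));
        s[1] = (uint8_t)(0x80 | ((uc >> 12) & 0x3F));
        s[2] = (uint8_t)(0x80 | ((uc >> 6) & 0x3F));
        s[3] = (uint8_t)(0x80 | (uc & 0x3F));
        break;
    }
    return len;
}
```

Checking validity before the size test means an unencodable character reports -1 even when the buffer is also too small; that ordering is a choice made in this sketch.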

+ + +

The following functions copy Unicode strings in memory. +

+
+
Function: uint8_t * u8_cpy (uint8_t *dest, const uint8_t *src, size_t n) + +
+
Function: uint16_t * u16_cpy (uint16_t *dest, const uint16_t *src, size_t n) + +
+
Function: uint32_t * u32_cpy (uint32_t *dest, const uint32_t *src, size_t n) + +
+

Copies n units from src to dest. +

+

This function is similar to memcpy, except that it operates on +Unicode strings. +

+ +
+
Function: uint8_t * u8_move (uint8_t *dest, const uint8_t *src, size_t n) + +
+
Function: uint16_t * u16_move (uint16_t *dest, const uint16_t *src, size_t n) + +
+
Function: uint32_t * u32_move (uint32_t *dest, const uint32_t *src, size_t n) + +
+

Copies n units from src to dest, guaranteeing correct +behavior for overlapping memory areas. +

+

This function is similar to memmove, except that it operates on +Unicode strings. +

+ +

The following function fills a Unicode string. +

+
+
Function: uint8_t * u8_set (uint8_t *s, ucs4_t uc, size_t n) + +
+
Function: uint16_t * u16_set (uint16_t *s, ucs4_t uc, size_t n) + +
+
Function: uint32_t * u32_set (uint32_t *s, ucs4_t uc, size_t n) + +
+

Sets the first n characters of s to uc. uc should be +a character that occupies only 1 unit. +

+

This function is similar to memset, except that it operates on +Unicode strings. +

+ + +

The following function compares two Unicode strings of the same length. +

+
+
Function: int u8_cmp (const uint8_t *s1, const uint8_t *s2, size_t n) + +
+
Function: int u16_cmp (const uint16_t *s1, const uint16_t *s2, size_t n) + +
+
Function: int u32_cmp (const uint32_t *s1, const uint32_t *s2, size_t n) + +
+

Compares s1 and s2, each of length n, lexicographically. +Returns a negative value if s1 compares smaller than s2, +a positive value if s1 compares larger than s2, or 0 if +they compare equal. +

+

This function is similar to memcmp, except that it operates on +Unicode strings. +

+ +

The following function compares two Unicode strings of possibly different +lengths. +

+
+
Function: int u8_cmp2 (const uint8_t *s1, size_t n1, const uint8_t *s2, size_t n2) + +
+
Function: int u16_cmp2 (const uint16_t *s1, size_t n1, const uint16_t *s2, size_t n2) + +
+
Function: int u32_cmp2 (const uint32_t *s1, size_t n1, const uint32_t *s2, size_t n2) + +
+

Compares s1 and s2, lexicographically. +Returns a negative value if s1 compares smaller than s2, +a positive value if s1 compares larger than s2, or 0 if +they compare equal. +

+

This function is similar to the gnulib function memcmp2, except that it +operates on Unicode strings. +
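A comparison of two strings with different lengths can be sketched for UTF-32 as below: compare the common prefix unit by unit, and if it is identical, let the shorter string sort first. That tie-breaking rule is an assumption about the intended semantics, and `sketch_u32_cmp2` is a hypothetical name, not a library function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libunistring.
   Lexicographic comparison of s1 (n1 units) and s2 (n2 units).
   Returns a negative value, 0, or a positive value. */
static int sketch_u32_cmp2(const uint32_t *s1, size_t n1,
                           const uint32_t *s2, size_t n2)
{
    size_t common = n1 < n2 ? n1 : n2;

    for (size_t i = 0; i < common; i++) {
        if (s1[i] != s2[i])
            return s1[i] < s2[i] ? -1 : 1;
    }
    /* Equal on the common prefix: the shorter string compares smaller. */
    return (n1 > n2) - (n1 < n2);
}
```

Because UTF-32 units are whole code points, this unit order is also code point order; the same loop on UTF-16 units would not have that property for characters above U+FFFF.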

+ + +

The following function searches for a given Unicode character. +

+
+
Function: uint8_t * u8_chr (const uint8_t *s, size_t n, ucs4_t uc) + +
+
Function: uint16_t * u16_chr (const uint16_t *s, size_t n, ucs4_t uc) + +
+
Function: uint32_t * u32_chr (const uint32_t *s, size_t n, ucs4_t uc) + +
+

Searches the string at s for uc. Returns a pointer to the first +occurrence of uc in s, or NULL if uc does not occur in +s. +

+

This function is similar to memchr, except that it operates on +Unicode strings. +

+ + +

The following function counts the number of Unicode characters. +

+
+
Function: size_t u8_mbsnlen (const uint8_t *s, size_t n) + +
+
Function: size_t u16_mbsnlen (const uint16_t *s, size_t n) + +
+
Function: size_t u32_mbsnlen (const uint32_t *s, size_t n) + +
+

Counts and returns the number of Unicode characters in the n units +from s. +

+

This function is similar to the gnulib function mbsnlen, except that +it operates on Unicode strings. +
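The distinction between units and characters can be illustrated with a small UTF-8 sketch. It assumes the input is already well-formed, in which case every character contributes exactly one unit that is not a continuation byte. `sketch_u8_count_chars` is a hypothetical name, and the library's handling of malformed input is not modeled here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libunistring.
   Counts the characters in the n units at s, ASSUMING well-formed UTF-8:
   each character has exactly one unit that is not of the form 10xxxxxx. */
static size_t sketch_u8_count_chars(const uint8_t *s, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)   /* lead byte or ASCII */
            count++;
    }
    return count;
}
```

For the six units 0x61 0xC3 0xA9 0xE2 0x82 0xAC (the characters U+0061, U+00E9, U+20AC) the count is 3, while the length in units is 6.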

+ +
+ + +

4.4 Elementary string functions with memory allocation

+ +

The following function copies a Unicode string. +

+
+
Function: uint8_t * u8_cpy_alloc (const uint8_t *s, size_t n) + +
+
Function: uint16_t * u16_cpy_alloc (const uint16_t *s, size_t n) + +
+
Function: uint32_t * u32_cpy_alloc (const uint32_t *s, size_t n) + +
+

Makes a freshly allocated copy of s, of length n. +

+ +
+ + +

4.5 Elementary string functions on NUL terminated strings

+ +

The following functions inspect and return details about the first character +in a Unicode string. +

+
+
Function: int u8_strmblen (const uint8_t *s) + +
+
Function: int u16_strmblen (const uint16_t *s) + +
+
Function: int u32_strmblen (const uint32_t *s) + +
+

Returns the length (number of units) of the first character in s. +Returns 0 if it is the NUL character. Returns -1 upon failure. +

+ + +
+
Function: int u8_strmbtouc (ucs4_t *puc, const uint8_t *s) + +
+
Function: int u16_strmbtouc (ucs4_t *puc, const uint16_t *s) + +
+
Function: int u32_strmbtouc (ucs4_t *puc, const uint32_t *s) + +
+

Returns the length (number of units) of the first character in s, +putting its ucs4_t representation in *puc. Returns 0 +if it is the NUL character. Returns -1 upon failure. +

+ +
+
Function: const uint8_t * u8_next (ucs4_t *puc, const uint8_t *s) + +
+
Function: const uint16_t * u16_next (ucs4_t *puc, const uint16_t *s) + +
+
Function: const uint32_t * u32_next (ucs4_t *puc, const uint32_t *s) + +
+

Forward iteration step. Advances the pointer past the next character, +or returns NULL if the end of the string has been reached. Puts the +character's ucs4_t representation in *puc. +
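The iteration pattern this interface enables can be sketched for UTF-32, where every character is one unit. The names `sketch_u32_next` and `sketch_count_chars` are hypothetical, `uint32_t` stands in for `ucs4_t`, and storing 0 in `*puc` at the end of the string is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libunistring.
   Stores the character at s in *puc and returns a pointer just past it,
   or returns NULL (storing 0 in *puc) if s points at the terminating NUL. */
static const uint32_t *sketch_u32_next(uint32_t *puc, const uint32_t *s)
{
    if (*s == 0) {
        *puc = 0;
        return NULL;
    }
    *puc = *s;
    return s + 1;
}

/* Walks a NUL-terminated UTF-32 string with the step function above
   and returns the number of characters visited. */
static size_t sketch_count_chars(const uint32_t *s)
{
    uint32_t uc;
    size_t count = 0;

    while ((s = sketch_u32_next(&uc, s)) != NULL)
        count++;
    return count;
}
```

The loop condition assigns and tests in one step, so the body only ever sees a valid character in `uc`.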

+ +

The following function inspects and returns details about the previous +character in a Unicode string. +

+
+
Function: const uint8_t * u8_prev (ucs4_t *puc, const uint8_t *s, const uint8_t *start) + +
+
Function: const uint16_t * u16_prev (ucs4_t *puc, const uint16_t *s, const uint16_t *start) + +
+
Function: const uint32_t * u32_prev (ucs4_t *puc, const uint32_t *s, const uint32_t *start) + +
+

Backward iteration step. Moves the pointer back to the previous +character, or returns NULL if the beginning of the string has been reached. +Puts the character's ucs4_t representation in *puc. +

+ +

The following functions determine the length of a Unicode string. +

+
+
Function: size_t u8_strlen (const uint8_t *s) + +
+
Function: size_t u16_strlen (const uint16_t *s) + +
+
Function: size_t u32_strlen (const uint32_t *s) + +
+

Returns the number of units in s. +

+

This function is similar to strlen and wcslen, except +that it operates on Unicode strings. +

+ +
+
Function: size_t u8_strnlen (const uint8_t *s, size_t maxlen) + +
+
Function: size_t u16_strnlen (const uint16_t *s, size_t maxlen) + +
+
Function: size_t u32_strnlen (const uint32_t *s, size_t maxlen) + +
+

Returns the number of units in s, but at most maxlen. +

+

This function is similar to strnlen and wcsnlen, except +that it operates on Unicode strings. +

+ + +

The following functions copy portions of Unicode strings in memory. +

+
+
Function: uint8_t * u8_strcpy (uint8_t *dest, const uint8_t *src) + +
+
Function: uint16_t * u16_strcpy (uint16_t *dest, const uint16_t *src) + +
+
Function: uint32_t * u32_strcpy (uint32_t *dest, const uint32_t *src) + +
+

Copies src to dest. +

+

This function is similar to strcpy and wcscpy, except +that it operates on Unicode strings. +

+ +
+
Function: uint8_t * u8_stpcpy (uint8_t *dest, const uint8_t *src) + +
+
Function: uint16_t * u16_stpcpy (uint16_t *dest, const uint16_t *src) + +
+
Function: uint32_t * u32_stpcpy (uint32_t *dest, const uint32_t *src) + +
+

Copies src to dest, returning the address of the terminating NUL +in dest. +

+

This function is similar to stpcpy, except that it operates on +Unicode strings. +

+ +
+
Function: uint8_t * u8_strncpy (uint8_t *dest, const uint8_t *src, size_t n) + +
+
Function: uint16_t * u16_strncpy (uint16_t *dest, const uint16_t *src, size_t n) + +
+
Function: uint32_t * u32_strncpy (uint32_t *dest, const uint32_t *src, size_t n) + +
+

Copies no more than n units of src to dest. +

+

This function is similar to strncpy and wcsncpy, except +that it operates on Unicode strings. +

+ +
+
Function: uint8_t * u8_stpncpy (uint8_t *dest, const uint8_t *src, size_t n) + +
+
Function: uint16_t * u16_stpncpy (uint16_t *dest, const uint16_t *src, size_t n) + +
+
Function: uint32_t * u32_stpncpy (uint32_t *dest, const uint32_t *src, size_t n) + +
+

Copies no more than n units of src to dest, returning the +address of the last unit written into dest. +

+

This function is similar to stpncpy, except that it operates on +Unicode strings. +

+ +
+
Function: uint8_t * u8_strcat (uint8_t *dest, const uint8_t *src) + +
+
Function: uint16_t * u16_strcat (uint16_t *dest, const uint16_t *src) + +
+
Function: uint32_t * u32_strcat (uint32_t *dest, const uint32_t *src) + +
+

Appends src onto dest. +

+

This function is similar to strcat and wcscat, except +that it operates on Unicode strings. +

+ +
+
Function: uint8_t * u8_strncat (uint8_t *dest, const uint8_t *src, size_t n) + +
+
Function: uint16_t * u16_strncat (uint16_t *dest, const uint16_t *src, size_t n) + +
+
Function: uint32_t * u32_strncat (uint32_t *dest, const uint32_t *src, size_t n) + +
+

Appends no more than n units of src onto dest. +

+

This function is similar to strncat and wcsncat, except +that it operates on Unicode strings. +

+ + +

The following functions compare two Unicode strings. +

+
+
Function: int u8_strcmp (const uint8_t *s1, const uint8_t *s2) + +
+
Function: int u16_strcmp (const uint16_t *s1, const uint16_t *s2) + +
+
Function: int u32_strcmp (const uint32_t *s1, const uint32_t *s2) + +
+

Compares s1 and s2, lexicographically. +Returns a negative value if s1 compares smaller than s2, +a positive value if s1 compares larger than s2, or 0 if +they compare equal. +

+

This function is similar to strcmp and wcscmp, except +that it operates on Unicode strings. +

+ + +
+
Function: int u8_strcoll (const uint8_t *s1, const uint8_t *s2) + +
+
Function: int u16_strcoll (const uint16_t *s1, const uint16_t *s2) + +
+
Function: int u32_strcoll (const uint32_t *s1, const uint32_t *s2) + +
+

Compares s1 and s2 using the collation rules of the current +locale. +Returns -1 if s1 < s2, 0 if s1 = s2, 1 if +s1 > s2. Upon failure, sets errno and returns any value. +

+

This function is similar to strcoll and wcscoll, except +that it operates on Unicode strings. +

+

Note that this function may consider different canonical normalizations +of the same string as having a large distance. It is therefore better to +use the function u8_normcoll instead of this one; see Normalization forms (composition and decomposition) <uninorm.h>. +

+ +
+
Function: int u8_strncmp (const uint8_t *s1, const uint8_t *s2, size_t n) + +
+
Function: int u16_strncmp (const uint16_t *s1, const uint16_t *s2, size_t n) + +
+
Function: int u32_strncmp (const uint32_t *s1, const uint32_t *s2, size_t n) + +
+

Compares no more than n units of s1 and s2. +

+

This function is similar to strncmp and wcsncmp, except +that it operates on Unicode strings. +

+ + +

The following function allocates a duplicate of a Unicode string. +

+
+
Function: uint8_t * u8_strdup (const uint8_t *s) + +
+
Function: uint16_t * u16_strdup (const uint16_t *s) + +
+
Function: uint32_t * u32_strdup (const uint32_t *s) + +
+

Duplicates s, returning an identical malloc'd string. +

+

This function is similar to strdup and wcsdup, except +that it operates on Unicode strings. +

+ + +

The following functions search for a given Unicode character. +

+
+
Function: uint8_t * u8_strchr (const uint8_t *str, ucs4_t uc) + +
+
Function: uint16_t * u16_strchr (const uint16_t *str, ucs4_t uc) + +
+
Function: uint32_t * u32_strchr (const uint32_t *str, ucs4_t uc) + +
+

Finds the first occurrence of uc in str. +

+

This function is similar to strchr and wcschr, except +that it operates on Unicode strings. +

+ +
+
Function: uint8_t * u8_strrchr (const uint8_t *str, ucs4_t uc) + +
+
Function: uint16_t * u16_strrchr (const uint16_t *str, ucs4_t uc) + +
+
Function: uint32_t * u32_strrchr (const uint32_t *str, ucs4_t uc) + +
+

Finds the last occurrence of uc in str. +

+

This function is similar to strrchr and wcsrchr, except +that it operates on Unicode strings. +

+ +

The following functions search for the first occurrence of some Unicode +character in or outside a given set of Unicode characters. +

+
+
Function: size_t u8_strcspn (const uint8_t *str, const uint8_t *reject) + +
+
Function: size_t u16_strcspn (const uint16_t *str, const uint16_t *reject) + +
+
Function: size_t u32_strcspn (const uint32_t *str, const uint32_t *reject) + +
+

Returns the length of the initial segment of str which consists entirely +of Unicode characters not in reject. +

+

This function is similar to strcspn and wcscspn, except +that it operates on Unicode strings. +

+ +
+
Function: size_t u8_strspn (const uint8_t *str, const uint8_t *accept) + +
+
Function: size_t u16_strspn (const uint16_t *str, const uint16_t *accept) + +
+
Function: size_t u32_strspn (const uint32_t *str, const uint32_t *accept) + +
+

Returns the length of the initial segment of str which consists entirely +of Unicode characters in accept. +

+

This function is similar to strspn and wcsspn, except +that it operates on Unicode strings. +

+ +
+
Function: uint8_t * u8_strpbrk (const uint8_t *str, const uint8_t *accept) + +
+
Function: uint16_t * u16_strpbrk (const uint16_t *str, const uint16_t *accept) + +
+
Function: uint32_t * u32_strpbrk (const uint32_t *str, const uint32_t *accept) + +
+

Finds the first occurrence in str of any character in accept. +

+

This function is similar to strpbrk and wcspbrk, except +that it operates on Unicode strings. +

+ + +

The following functions search whether a given Unicode string is a substring +of another Unicode string. +

+
+
Function: uint8_t * u8_strstr (const uint8_t *haystack, const uint8_t *needle) + +
+
Function: uint16_t * u16_strstr (const uint16_t *haystack, const uint16_t *needle) + +
+
Function: uint32_t * u32_strstr (const uint32_t *haystack, const uint32_t *needle) + +
+

Finds the first occurrence of needle in haystack. +

+

This function is similar to strstr and wcsstr, except +that it operates on Unicode strings. +

+ +
+
Function: bool u8_startswith (const uint8_t *str, const uint8_t *prefix) + +
+
Function: bool u16_startswith (const uint16_t *str, const uint16_t *prefix) + +
+
Function: bool u32_startswith (const uint32_t *str, const uint32_t *prefix) + +
+

Tests whether str starts with prefix. +

+ +
+
Function: bool u8_endswith (const uint8_t *str, const uint8_t *suffix) + +
+
Function: bool u16_endswith (const uint16_t *str, const uint16_t *suffix) + +
+
Function: bool u32_endswith (const uint32_t *str, const uint32_t *suffix) + +
+

Tests whether str ends with suffix. +
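Both tests can be sketched on NUL-terminated UTF-8 strings by comparing units. For well-formed UTF-8 a unit-wise match cannot begin or end in the middle of a character, because lead bytes and continuation bytes occupy disjoint value ranges. The names below are hypothetical and the bodies are illustrations, not the library's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of units before the terminating NUL. */
static size_t sketch_u8_strlen(const uint8_t *s)
{
    size_t n = 0;

    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: true if str begins with all units of prefix. */
static bool sketch_u8_startswith(const uint8_t *str, const uint8_t *prefix)
{
    for (;; str++, prefix++) {
        if (*prefix == 0)
            return true;     /* consumed the whole prefix */
        if (*str != *prefix)
            return false;    /* mismatch, or str ended first */
    }
}

/* Hypothetical helper: true if the last units of str equal suffix. */
static bool sketch_u8_endswith(const uint8_t *str, const uint8_t *suffix)
{
    size_t len = sketch_u8_strlen(str);
    size_t slen = sketch_u8_strlen(suffix);

    return len >= slen && memcmp(str + len - slen, suffix, slen) == 0;
}
```

An empty prefix or suffix matches every string in this sketch, mirroring the usual convention for such predicates.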

+ +

The following function does one step in tokenizing a Unicode string. +

+
+
Function: uint8_t * u8_strtok (uint8_t *str, const uint8_t *delim, uint8_t **ptr) + +
+
Function: uint16_t * u16_strtok (uint16_t *str, const uint16_t *delim, uint16_t **ptr) + +
+
Function: uint32_t * u32_strtok (uint32_t *str, const uint32_t *delim, uint32_t **ptr) + +
+

Divides str into tokens separated by characters in delim. +

+

This function is similar to strtok_r and wcstok, except +that it operates on Unicode strings. Its interface is actually more similar to +wcstok than to strtok. +

+
+ + + + + + + + + + + + +
+

+ + This document was generated by Bruno Haible on July, 1 2009 using texi2html 1.78a. + +
+ +
