<uniwbrk.h>
This include file declares functions for determining where in a string “words” start and end. Here “words” are not necessarily the same as entities that can be looked up in dictionaries, but rather groups of consecutive characters that should not be split by text processing operations.
The following functions determine the word breaks in a string.
Determines the word break points in s, an array of n units, and stores the result at p[0..n-1]. p[i] = 1 means that there is a word boundary between s[i-1] and s[i]. p[i] = 0 means that s[i-1] and s[i] must not be separated. p[0] is always set to 0. If an application wants to consider a word break to be present at the beginning of the string (before s[0]) or at the end of the string (after s[0..n-1]), it has to treat these cases explicitly.
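As an illustration of the convention for p described above, here is a minimal sketch of how an application might walk a break array and treat the start and end of the string explicitly. The helper names count_segments, segment_bounds and print_segments are hypothetical and not part of this include file. The break array in the usage note is filled in by hand as a stand-in for what a word break function would store, so the sketch compiles without the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: counts the segments of s[0..n-1] delimited by the
   break array p, where p[i] = 1 marks a boundary between s[i-1] and s[i].
   The start of the string is counted as a boundary explicitly, because
   p[0] is always 0.  */
size_t
count_segments (const char *p, size_t n)
{
  size_t count, i;

  if (n == 0)
    return 0;
  assert (p[0] == 0);   /* documented: p[0] is always set to 0 */
  count = 1;            /* the segment that starts at s[0] */
  for (i = 1; i < n; i++)
    if (p[i] == 1)
      count++;
  return count;
}

/* Hypothetical helper: finds the k-th segment (0-based) and stores its
   bounds as the half-open range [*start, *end).  Returns 1 if the segment
   exists, 0 otherwise.  The end of the string (i == n) is treated as a
   boundary explicitly, without reading p[n].  */
int
segment_bounds (const char *p, size_t n, size_t k,
                size_t *start, size_t *end)
{
  size_t i, begin = 0, index = 0;

  if (n == 0)
    return 0;
  for (i = 1; i <= n; i++)
    if (i == n || p[i] == 1)
      {
        if (index == k)
          {
            *start = begin;
            *end = i;
            return 1;
          }
        index++;
        begin = i;
      }
  return 0;
}

/* Hypothetical helper: prints every segment of s in brackets, one per
   line.  s is treated as an array of n bytes.  */
void
print_segments (const char *s, const char *p, size_t n)
{
  size_t k, start, end;

  for (k = 0; segment_bounds (p, n, k, &start, &end); k++)
    printf ("[%.*s]\n", (int) (end - start), s + start);
}
```

For the 9-byte string "two words" and the hand-filled array p = {0,0,0,1,1,0,0,0,0}, print_segments would visit the three ranges [0,3), [3,4) and [4,9). In real use, p would be filled by one of the word break functions declared in this include file rather than by hand.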
The following is a lower-level API. The word break property is defined in Unicode Standard Annex #29, section “Word Boundaries”; see https://www.unicode.org/reports/tr29/#Word_Boundaries. It is used for determining the word breaks in a string.
The following are the possible values of the word break property. More values may be added in the future.
The following function looks up the word break property of a character.
Returns the Word_Break property of a Unicode character.
This document was generated by Bruno Haible on January, 2 2022 using texi2html 1.78a.