From 5f2b09982312c98863eb9a8dfe2c608b81f58259 Mon Sep 17 00:00:00 2001
From: "Manuel A. Fernandez Montecelo"
Date: Thu, 26 May 2016 16:48:15 +0100
Subject: Imported Upstream version 0.9.6

---
 doc/libunistring_11.html | 204 +++++++++++++++++++++++------------------------
 1 file changed, 98 insertions(+), 106 deletions(-)

(limited to 'doc/libunistring_11.html')

diff --git a/doc/libunistring_11.html b/doc/libunistring_11.html
index 7fd2dc3..1e95b7a 100644
--- a/doc/libunistring_11.html
+++ b/doc/libunistring_11.html
@@ -1,6 +1,6 @@
-
+
-GNU libunistring: 11. Line breaking <unilbrk.h>
+GNU libunistring: 11. Word breaks in strings <uniwbrk.h>
-
-
+
+
@@ -42,8 +42,8 @@ ul.toc {list-style: none}
-
-
+
+
@@ -51,134 +51,126 @@ ul.toc {list-style: none}
-
+

-
-
-11. Line breaking <unilbrk.h>
+
+
+11. Word breaks in strings <uniwbrk.h>

 This include file declares functions for determining where in a string
-line breaks could or should be introduced, in order to make the displayed
-string fit into a column of given width.
+“words” start and end. Here “words” are not necessarily the same as
+entities that can be looked up in dictionaries, but rather groups of
+consecutive characters that should not be split by text processing
+operations.

-These functions are locale dependent. The encoding argument identifies
-the encoding (e.g. "ISO-8859-2" for Polish).
-
-The following enumerated values indicate whether, at a given position, a line
-break is possible or not. Given an string s as an array
-s[0..n-1] and a position i, the values have the
-following meanings:
+
+11.1 Word breaks in a string
+
+The following functions determine the word breaks in a string.

-Constant: int UC_BREAK_MANDATORY
+Function: void u8_wordbreaks (const uint8_t *s, size_t n, char *p)
-
-This value indicates that s[i] is a line break character.
-
-Constant: int UC_BREAK_POSSIBLE
+Function: void u16_wordbreaks (const uint16_t *s, size_t n, char *p)
-
-This value indicates that a line break may be inserted between
-s[i-1] and s[i].
-
-Constant: int UC_BREAK_HYPHENATION
+Function: void u32_wordbreaks (const uint32_t *s, size_t n, char *p)
-
-This value indicates that a hyphen and a line break may be inserted between
-s[i-1] and s[i]. But beware of language
-dependent hyphenation rules.
-
-Constant: int UC_BREAK_PROHIBITED
+Function: void ulc_wordbreaks (const char *s, size_t n, char *p)
-
-This value indicates that s[i-1] and s[i]
-must not be separated.
+
+Determines the word break points in s, an array of n units, and
+stores the result at p[0..n-1].

+
+p[i] = 1
+means that there is a word boundary between s[i-1] and
+s[i].
+
+p[i] = 0
+means that s[i-1] and s[i] must not be separated.
+
+p[0] is always set to 0. If an application wants to consider a
+word break to be present at the beginning of the string (before
+s[0]) or at the end of the string (after
+s[0..n-1]), it has to treat these cases explicitly.
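
The explicit treatment of the string start and end described above can be sketched in plain C. This is only an illustration of how a caller might consume the p[0..n-1] array; segment_starts is a hypothetical helper, not part of libunistring, and the flag array in the usage note is hand-written to match the documented format rather than produced by a call to u8_wordbreaks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libunistring.
   Given break flags p[0..n-1] in the format described above
   (p[i] == 1: word boundary between s[i-1] and s[i]; p[0] == 0),
   stores the start offset of each segment in starts[] and returns
   the number of segments.  starts[] must have room for n entries.  */
static size_t
segment_starts (const char *p, size_t n, size_t *starts)
{
  size_t count = 0;
  size_t i;

  assert (p != NULL && starts != NULL);
  if (n == 0)
    return 0;                   /* empty string: no segments */

  /* p[0] is always 0, so the boundary before s[0] is added explicitly.  */
  starts[count++] = 0;
  for (i = 1; i < n; i++)
    if (p[i])
      starts[count++] = i;

  /* The boundary after s[n-1] is implicit: the last segment ends at n.  */
  return count;
}
```

For the 5-unit string "ab cd" with the illustrative flags {0, 0, 1, 1, 0}, the helper reports three segments starting at offsets 0, 2 and 3, that is "ab", " " and "cd". Each segment ends where the next one starts, and the last one ends at n.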

-Constant: int UC_BREAK_UNDEFINED
-
-This value is not used as a return value; rather, in the overriding argument of
-the u*_width_linebreaks functions, it indicates the absence of an
-override.
+
+11.2 Word break property
+
-The following functions determine the positions at which line breaks are
-possible.
+This is a more low-level API. The word break property is a property defined
+in Unicode Standard Annex #29, section “Word Boundaries”, see
+http://www.unicode.org/reports/tr29/#Word_Boundaries. It is
+used for determining the word breaks in a string.
+
+The following are the possible values of the word break property. More values
+may be added in the future.

-Function: void u8_possible_linebreaks (const uint8_t *s, size_t n, const char *encoding, char *p)
+Constant: int WBP_OTHER
-Function: void u16_possible_linebreaks (const uint16_t *s, size_t n, const char *encoding, char *p)
+Constant: int WBP_CR
-Function: void u32_possible_linebreaks (const uint32_t *s, size_t n, const char *encoding, char *p)
+Constant: int WBP_LF
-Function: void ulc_possible_linebreaks (const char *s, size_t n, const char *encoding, char *p)
+Constant: int WBP_NEWLINE
-
-Determines the line break points in s, and stores the result at
-p[0..n-1]. Every p[i] is assigned one of
-the values UC_BREAK_MANDATORY, UC_BREAK_POSSIBLE,
-UC_BREAK_HYPHENATION, UC_BREAK_PROHIBITED.
-
-The following functions determine where line breaks should be inserted so that
-each line fits in a given width, when output to a device that uses
-non-proportional fonts.
-
-Function: int u8_width_linebreaks (const uint8_t *s, size_t n, int width, int start_column, int at_end_columns, const char *override, const char *encoding, char *p)
+Constant: int WBP_EXTEND
-Function: int u16_width_linebreaks (const uint16_t *s, size_t n, int width, int start_column, int at_end_columns, const char *override, const char *encoding, char *p)
+Constant: int WBP_FORMAT
-Function: int u32_width_linebreaks (const uint32_t *s, size_t n, int width, int start_column, int at_end_columns, const char *override, const char *encoding, char *p)
+Constant: int WBP_KATAKANA
-Function: int ulc_width_linebreaks (const char *s, size_t n, int width, int start_column, int at_end_columns, const char *override, const char *encoding, char *p)
+Constant: int WBP_ALETTER
-

Chooses the best line breaks, assuming that every character occupies a width -given by the uc_width function (see Display width <uniwidth.h>). -

-

The string is s[0..n-1]. -

-

The maximum number of columns per line is given as width. -The starting column of the string is given as start_column. -If the algorithm shall keep room after the last piece, this amount of room can -be given as at_end_columns. -

-

override is an optional override; if -override[i] != UC_BREAK_UNDEFINED, -override[i] takes precedence over p[i] -as returned by the u*_possible_linebreaks function. -

-

The given encoding is used for disambiguating widths in uc_width. +

Constant: int WBP_MIDNUMLET + +
+
Constant: int WBP_MIDLETTER + +
+
Constant: int WBP_MIDNUM + +
+
Constant: int WBP_NUMERIC + +
+
Constant: int WBP_EXTENDNUMLET + +
+
+ +

The following function looks up the word break property of a character.

-
-Returns the column after the end of the string, and stores the result at
-p[0..n-1]. Every p[i] is assigned one of
-the values UC_BREAK_MANDATORY, UC_BREAK_POSSIBLE,
-UC_BREAK_HYPHENATION, UC_BREAK_PROHIBITED. Here the value
-UC_BREAK_POSSIBLE indicates that a line break should be inserted.
+
+Function: int uc_wordbreak_property (ucs4_t uc)
+
+Returns the Word_Break property of a Unicode character.


-
-
+
+
@@ -186,12 +178,12 @@ the values UC_BREAK_MANDATORY, UC_BREAK_POSSIBLE,
-
+

- This document was generated by Bruno Haible on March, 30 2010 using texi2html 1.78a.
+ This document was generated by Daiki Ueno on July, 8 2015 using texi2html 1.78a.
-- cgit v1.2.3