Diffstat (limited to 'doc')
-rw-r--r--  doc/API                   25
-rw-r--r--  doc/API.ja                24
-rw-r--r--  doc/FAQ                   27
-rw-r--r--  doc/FAQ.ja               114
-rw-r--r--  doc/RE                    16
-rw-r--r--  doc/RE.ja                 16
-rw-r--r--  doc/UNICODE_PROPERTIES   698
7 files changed, 749 insertions, 171 deletions
diff --git a/doc/API b/doc/API
index f3b8875..9904a06 100644
--- a/doc/API
+++ b/doc/API
@@ -1,13 +1,19 @@
-Oniguruma API Version 5.9.2 2008/02/19
+Oniguruma API Version 6.0.0 2016/05/06
#include <oniguruma.h>
-# int onig_init(void)
+# int onig_initialize(OnigEncoding use_encodings[], int num_encodings)
Initialize library.
- You don't have to call it explicitly, because it is called in onig_new().
+ You have to call it explicitly.
+
+ * onig_init() is deprecated.
+
+ arguments
+ 1 use_encodings: array of encodings used in the application.
+ 2 num_encodings: number of encodings.
# int onig_error_code_to_str(UChar* err_buf, int err_code, ...)
@@ -585,6 +591,19 @@ Oniguruma API Version 5.9.2 2008/02/19
normal return: ONIG_NORMAL
+# int onig_unicode_define_user_property(const char* name, OnigCodePoint* ranges)
+
+ Define a new Unicode property.
+ (This function is not thread safe.)
+
+ arguments
+ 1 name: property name (ASCII only; the characters ' ', '-', and '_' are ignored.)
+ 2 ranges: property code point ranges
+ (the first element is the number of ranges.)
+
+ normal return: ONIG_NORMAL
+
+
# int onig_end(void)
Finish using this library.
diff --git a/doc/API.ja b/doc/API.ja
index f681fa5..ac8cc6a 100644
--- a/doc/API.ja
+++ b/doc/API.ja
@@ -1,13 +1,18 @@
-Oniguruma Interface Version 5.9.2 2008/02/19
+Oniguruma Interface Version 6.0.0 2016/05/06
#include <oniguruma.h>
-# int onig_init(void)
+# int onig_initialize(OnigEncoding use_encodings[], int num_encodings)
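Initialize the library.
+ It must be called first.
- It is called inside onig_new(), so you do not have to call this function explicitly.
+ * onig_init() is deprecated.
+
+ arguments
+ 1 use_encodings: array of character encodings to be used
+ 2 num_encodings: number of character encodings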
# int onig_error_code_to_str(UChar* err_buf, int err_code, ...)
@@ -593,6 +598,19 @@
normal return: ONIG_NORMAL
+# int onig_unicode_define_user_property(const char* name, OnigCodePoint* ranges)
+
+ Define a new Unicode property.
+ (This function is not thread safe.)
+
+ arguments
+ 1 name: property name (ASCII only; the characters ' ', '-', and '_' are ignored.)
+ 2 ranges: property code point ranges
+ (the first element is the number of ranges.)
+
+ normal return: ONIG_NORMAL
+
+
# int onig_end(void)
Finish using the library.
diff --git a/doc/FAQ b/doc/FAQ
index 46a3e0e..c00f030 100644
--- a/doc/FAQ
+++ b/doc/FAQ
@@ -5,32 +5,7 @@ FAQ 2006/11/14
You can perform longest matching by using the ONIG_OPTION_FIND_LONGEST
option in onig_new().
-
-2. Thread safe
-
- To make it thread safe, either (A) or (B) must be done.
-
- (A) Oniguruma Layer
-
- Define the macros below in oniguruma/regint.h.
-
- USE_MULTI_THREAD_SYSTEM
- THREAD_ATOMIC_START
- THREAD_ATOMIC_END
- THREAD_PASS
-
- THREAD_SYSTEM_INIT
- THREAD_SYSTEM_END
-
-
- (B) Application Layer
-
- Multiple threads must not simultaneously create new regexp objects,
- re-compile objects, or free objects, even if these objects are
- different.
-
-
-3. Mailing list
+2. Mailing list
There is no mailing list for Oniguruma.
diff --git a/doc/FAQ.ja b/doc/FAQ.ja
index 1d65f9f..b8f4aa9 100644
--- a/doc/FAQ.ja
+++ b/doc/FAQ.ja
@@ -1,4 +1,4 @@
-FAQ 2007/07/23
+FAQ 2016/04/06
1. Longest match
@@ -6,36 +6,7 @@ FAQ 2007/07/23
is used, the longest match is found.
-2. Thread safe
-
- To make it thread safe, it is sufficient to do either (A) or (B)
- below.
-
- (A) Oniguruma Layer
-
- Define the following macros in oniguruma/regint.h.
-
- USE_MULTI_THREAD_SYSTEM
- THREAD_ATOMIC_START
- THREAD_ATOMIC_END
- THREAD_PASS
-
- If some initialization/termination processing is needed, define it in the following macros.
- THREAD_SYSTEM_INIT
- THREAD_SYSTEM_END
-
-
- (B) Application Layer
-
- Multiple threads must not simultaneously create or free regular
- expression objects, even if those objects are completely different.
-
- A more detailed explanation is given in "Supplement on thread safety"
- in this document.
-
-
-3. CR + LF
+2. CR + LF
DOS line break (the sequence CR (0x0d) + LF (0x0a))
@@ -44,87 +15,8 @@ FAQ 2007/07/23
/* #define USE_CRNL_AS_LINE_TERMINATOR */
-4. Mailing list
+3. Mailing list
There is no mailing list for Oniguruma.
//END
-
-
-
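-Supplement on thread safety
-
-Thread safety can be handled either inside each individual application
-or inside the Oniguruma library.
-(In other words, it must be handled on exactly one side: either the code
-that uses Oniguruma deals with it, or Oniguruma is made to deal with it.)
-
-These two methods are described in (A) and (B) below.
-
-Because multithreading APIs differ from platform to platform, the
-explanation below cannot say concretely what to call. Specify whatever
-provides the corresponding functionality in the multithreading API you
-actually use.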
-
-(A) Handling it inside Oniguruma
-
-Define the following macros in oniguruma/regint.h and recompile.
-
-USE_MULTI_THREAD_SYSTEM
-
- Simply enable it.
-
-THREAD_ATOMIC_START
-THREAD_ATOMIC_END
-
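- Define these as something that guarantees that, while a thread is
- executing the code enclosed between THREAD_ATOMIC_START and
- THREAD_ATOMIC_END, execution does not switch to another thread.
- (As the names suggest, this makes the enclosed code atomic with
- respect to threads.)
-
-THREAD_PASS
-
- Define this as something that hands execution over from the calling
- thread to another thread (that is, invokes rescheduling).
- If no corresponding facility exists at all, define it as empty.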
-
-(Example)
-Taking Ruby as an example:
-Ruby implements its own thread mechanism.
-When that mechanism is used, the macros can be defined
-as follows.
-
-#define USE_MULTI_THREAD_SYSTEM
-#define THREAD_SYSTEM_INIT
-#define THREAD_SYSTEM_END
-#define THREAD_ATOMIC_START DEFER_INTS
-#define THREAD_ATOMIC_END ENABLE_INTS
-#define THREAD_PASS rb_thread_schedule()
-
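-Ruby switches threads using timer interrupts. DEFER_INTS is a macro
-that temporarily suspends execution of the interrupt handler, and the
-ENABLE_INTS macro allows the interrupt handler to run again.
-This ensures that execution does not switch to another thread while the
-part enclosed by THREAD_ATOMIC_START and THREAD_ATOMIC_END is
-running.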
-
-
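-(B) Handling it inside the application
-
-Control thread execution so that the following is guaranteed:
-
-Multiple threads must not simultaneously create or free regular expression
-objects, even if those objects are completely different.
-
-Avoid having multiple threads execute any of onig_new(), onig_new_deluxe(),
-or onig_free() at the same time. Calls that do not overlap are fine.
-
-This is necessary because creating a regular expression object involves
-an internally shared table. The restriction prevents data registration into
-this table from colliding between threads and leaving the table in an
-inconsistent state.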
-
-// END
diff --git a/doc/RE b/doc/RE
index 21efe53..b4bf536 100644
--- a/doc/RE
+++ b/doc/RE
@@ -1,4 +1,4 @@
-Oniguruma Regular Expressions Version 5.9.1 2007/09/05
+Oniguruma Regular Expressions Version 6.0.0 2016/05/02
syntax: ONIG_SYNTAX_RUBY (default)
@@ -86,19 +86,7 @@ syntax: ONIG_SYNTAX_RUBY (default)
Hiragana, Katakana
+ works on UTF8, UTF16, UTF32
- Any, Assigned, C, Cc, Cf, Cn, Co, Cs, L, Ll, Lm, Lo, Lt, Lu,
- M, Mc, Me, Mn, N, Nd, Nl, No, P, Pc, Pd, Pe, Pf, Pi, Po, Ps,
- S, Sc, Sk, Sm, So, Z, Zl, Zp, Zs,
- Arabic, Armenian, Bengali, Bopomofo, Braille, Buginese,
- Buhid, Canadian_Aboriginal, Cherokee, Common, Coptic,
- Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian,
- Glagolitic, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul,
- Hanunoo, Hebrew, Hiragana, Inherited, Kannada, Katakana,
- Kharoshthi, Khmer, Lao, Latin, Limbu, Linear_B, Malayalam,
- Mongolian, Myanmar, New_Tai_Lue, Ogham, Old_Italic, Old_Persian,
- Oriya, Osmanya, Runic, Shavian, Sinhala, Syloti_Nagri, Syriac,
- Tagalog, Tagbanwa, Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan,
- Tifinagh, Ugaritic, Yi
+ See doc/UNICODE_PROPERTIES.
diff --git a/doc/RE.ja b/doc/RE.ja
index abde849..bc877f2 100644
--- a/doc/RE.ja
+++ b/doc/RE.ja
@@ -1,4 +1,4 @@
-Oniguruma Regular Expressions Version 5.9.1 2007/09/05
+Oniguruma Regular Expressions Version 6.0.0 2016/05/02
syntax: ONIG_SYNTAX_RUBY (default)
@@ -86,19 +86,7 @@
Hiragana, Katakana
+ works on UTF8, UTF16, UTF32
- Any, Assigned, C, Cc, Cf, Cn, Co, Cs, L, Ll, Lm, Lo, Lt, Lu,
- M, Mc, Me, Mn, N, Nd, Nl, No, P, Pc, Pd, Pe, Pf, Pi, Po, Ps,
- S, Sc, Sk, Sm, So, Z, Zl, Zp, Zs,
- Arabic, Armenian, Bengali, Bopomofo, Braille, Buginese,
- Buhid, Canadian_Aboriginal, Cherokee, Common, Coptic,
- Cypriot, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian,
- Glagolitic, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul,
- Hanunoo, Hebrew, Hiragana, Inherited, Kannada, Katakana,
- Kharoshthi, Khmer, Lao, Latin, Limbu, Linear_B, Malayalam,
- Mongolian, Myanmar, New_Tai_Lue, Ogham, Old_Italic, Old_Persian,
- Oriya, Osmanya, Runic, Shavian, Sinhala, Syloti_Nagri, Syriac,
- Tagalog, Tagbanwa, Tai_Le, Tamil, Telugu, Thaana, Thai, Tibetan,
- Tifinagh, Ugaritic, Yi
+ See doc/UNICODE_PROPERTIES.
diff --git a/doc/UNICODE_PROPERTIES b/doc/UNICODE_PROPERTIES
new file mode 100644
index 0000000..dedc658
--- /dev/null
+++ b/doc/UNICODE_PROPERTIES
@@ -0,0 +1,698 @@
+Unicode Properties (from Unicode Version: 8.0.0)
+
+ 1: Any
+ 2: Assigned
+ 3: C
+ 4: Cc
+ 5: Cf
+ 6: Cn
+ 7: Co
+ 8: Cs
+ 9: L
+ 10: LC
+ 11: Ll
+ 12: Lm
+ 13: Lo
+ 14: Lt
+ 15: Lu
+ 16: M
+ 17: Mc
+ 18: Me
+ 19: Mn
+ 20: N
+ 21: Nd
+ 22: Nl
+ 23: No
+ 24: P
+ 25: Pc
+ 26: Pd
+ 27: Pe
+ 28: Pf
+ 29: Pi
+ 30: Po
+ 31: Ps
+ 32: S
+ 33: Sc
+ 34: Sk
+ 35: Sm
+ 36: So
+ 37: Z
+ 38: Zl
+ 39: Zp
+ 40: Zs
+ 41: Math
+ 42: Alphabetic
+ 43: Lowercase
+ 44: Uppercase
+ 45: Cased
+ 46: Case_Ignorable
+ 47: Changes_When_Lowercased
+ 48: Changes_When_Uppercased
+ 49: Changes_When_Titlecased
+ 50: Changes_When_Casefolded
+ 51: Changes_When_Casemapped
+ 52: ID_Start
+ 53: ID_Continue
+ 54: XID_Start
+ 55: XID_Continue
+ 56: Default_Ignorable_Code_Point
+ 57: Grapheme_Extend
+ 58: Grapheme_Base
+ 59: Grapheme_Link
+ 60: Common
+ 61: Latin
+ 62: Greek
+ 63: Cyrillic
+ 64: Armenian
+ 65: Hebrew
+ 66: Arabic
+ 67: Syriac
+ 68: Thaana
+ 69: Devanagari
+ 70: Bengali
+ 71: Gurmukhi
+ 72: Gujarati
+ 73: Oriya
+ 74: Tamil
+ 75: Telugu
+ 76: Kannada
+ 77: Malayalam
+ 78: Sinhala
+ 79: Thai
+ 80: Lao
+ 81: Tibetan
+ 82: Myanmar
+ 83: Georgian
+ 84: Hangul
+ 85: Ethiopic
+ 86: Cherokee
+ 87: Canadian_Aboriginal
+ 88: Ogham
+ 89: Runic
+ 90: Khmer
+ 91: Mongolian
+ 92: Hiragana
+ 93: Katakana
+ 94: Bopomofo
+ 95: Han
+ 96: Yi
+ 97: Old_Italic
+ 98: Gothic
+ 99: Deseret
+100: Inherited
+101: Tagalog
+102: Hanunoo
+103: Buhid
+104: Tagbanwa
+105: Limbu
+106: Tai_Le
+107: Linear_B
+108: Ugaritic
+109: Shavian
+110: Osmanya
+111: Cypriot
+112: Braille
+113: Buginese
+114: Coptic
+115: New_Tai_Lue
+116: Glagolitic
+117: Tifinagh
+118: Syloti_Nagri
+119: Old_Persian
+120: Kharoshthi
+121: Balinese
+122: Cuneiform
+123: Phoenician
+124: Phags_Pa
+125: Nko
+126: Sundanese
+127: Lepcha
+128: Ol_Chiki
+129: Vai
+130: Saurashtra
+131: Kayah_Li
+132: Rejang
+133: Lycian
+134: Carian
+135: Lydian
+136: Cham
+137: Tai_Tham
+138: Tai_Viet
+139: Avestan
+140: Egyptian_Hieroglyphs
+141: Samaritan
+142: Lisu
+143: Bamum
+144: Javanese
+145: Meetei_Mayek
+146: Imperial_Aramaic
+147: Old_South_Arabian
+148: Inscriptional_Parthian
+149: Inscriptional_Pahlavi
+150: Old_Turkic
+151: Kaithi
+152: Batak
+153: Brahmi
+154: Mandaic
+155: Chakma
+156: Meroitic_Cursive
+157: Meroitic_Hieroglyphs
+158: Miao
+159: Sharada
+160: Sora_Sompeng
+161: Takri
+162: Caucasian_Albanian
+163: Bassa_Vah
+164: Duployan
+165: Elbasan
+166: Grantha
+167: Pahawh_Hmong
+168: Khojki
+169: Linear_A
+170: Mahajani
+171: Manichaean
+172: Mende_Kikakui
+173: Modi
+174: Mro
+175: Old_North_Arabian
+176: Nabataean
+177: Palmyrene
+178: Pau_Cin_Hau
+179: Old_Permic
+180: Psalter_Pahlavi
+181: Siddham
+182: Khudawadi
+183: Tirhuta
+184: Warang_Citi
+185: Ahom
+186: Anatolian_Hieroglyphs
+187: Hatran
+188: Multani
+189: Old_Hungarian
+190: SignWriting
+191: White_Space
+192: Bidi_Control
+193: Join_Control
+194: Dash
+195: Hyphen
+196: Quotation_Mark
+197: Terminal_Punctuation
+198: Other_Math
+199: Hex_Digit
+200: ASCII_Hex_Digit
+201: Other_Alphabetic
+202: Ideographic
+203: Diacritic
+204: Extender
+205: Other_Lowercase
+206: Other_Uppercase
+207: Noncharacter_Code_Point
+208: Other_Grapheme_Extend
+209: IDS_Binary_Operator
+210: IDS_Trinary_Operator
+211: Radical
+212: Unified_Ideograph
+213: Other_Default_Ignorable_Code_Point
+214: Deprecated
+215: Soft_Dotted
+216: Logical_Order_Exception
+217: Other_ID_Start
+218: Other_ID_Continue
+219: STerm
+220: Variation_Selector
+221: Pattern_White_Space
+222: Pattern_Syntax
+223: Unknown
+224: Aghb
+225: AHex
+226: Arab
+227: Armi
+228: Armn
+229: Avst
+230: Bali
+231: Bamu
+232: Bass
+233: Batk
+234: Beng
+235: Bidi_C
+236: Bopo
+237: Brah
+238: Brai
+239: Bugi
+240: Buhd
+241: Cakm
+242: Cans
+243: Cari
+244: Cased_Letter
+245: Cher
+246: CI
+247: Close_Punctuation
+248: Combining_Mark
+249: Connector_Punctuation
+250: Control
+251: Copt
+252: Cprt
+253: Currency_Symbol
+254: CWCF
+255: CWCM
+256: CWL
+257: CWT
+258: CWU
+259: Cyrl
+260: Dash_Punctuation
+261: Decimal_Number
+262: Dep
+263: Deva
+264: DI
+265: Dia
+266: Dsrt
+267: Dupl
+268: Egyp
+269: Elba
+270: Enclosing_Mark
+271: Ethi
+272: Ext
+273: Final_Punctuation
+274: Format
+275: Geor
+276: Glag
+277: Goth
+278: Gran
+279: Gr_Base
+280: Grek
+281: Gr_Ext
+282: Gr_Link
+283: Gujr
+284: Guru
+285: Hang
+286: Hani
+287: Hano
+288: Hatr
+289: Hebr
+290: Hex
+291: Hira
+292: Hluw
+293: Hmng
+294: Hung
+295: IDC
+296: Ideo
+297: IDS
+298: IDSB
+299: IDST
+300: Initial_Punctuation
+301: Ital
+302: Java
+303: Join_C
+304: Kali
+305: Kana
+306: Khar
+307: Khmr
+308: Khoj
+309: Knda
+310: Kthi
+311: Lana
+312: Laoo
+313: Latn
+314: Lepc
+315: Letter
+316: Letter_Number
+317: Limb
+318: Lina
+319: Linb
+320: Line_Separator
+321: LOE
+322: Lowercase_Letter
+323: Lyci
+324: Lydi
+325: Mahj
+326: Mand
+327: Mani
+328: Mark
+329: Math_Symbol
+330: Mend
+331: Merc
+332: Mero
+333: Mlym
+334: Modifier_Letter
+335: Modifier_Symbol
+336: Mong
+337: Mroo
+338: Mtei
+339: Mult
+340: Mymr
+341: Narb
+342: Nbat
+343: NChar
+344: Nkoo
+345: Nonspacing_Mark
+346: Number
+347: OAlpha
+348: ODI
+349: Ogam
+350: OGr_Ext
+351: OIDC
+352: OIDS
+353: Olck
+354: OLower
+355: OMath
+356: Open_Punctuation
+357: Orkh
+358: Orya
+359: Osma
+360: Other
+361: Other_Letter
+362: Other_Number
+363: Other_Punctuation
+364: Other_Symbol
+365: OUpper
+366: Palm
+367: Paragraph_Separator
+368: Pat_Syn
+369: Pat_WS
+370: Pauc
+371: Perm
+372: Phag
+373: Phli
+374: Phlp
+375: Phnx
+376: Plrd
+377: Private_Use
+378: Prti
+379: Punctuation
+380: Qaac
+381: Qaai
+382: QMark
+383: Rjng
+384: Runr
+385: Samr
+386: Sarb
+387: Saur
+388: SD
+389: Separator
+390: Sgnw
+391: Shaw
+392: Shrd
+393: Sidd
+394: Sind
+395: Sinh
+396: Sora
+397: Space_Separator
+398: Spacing_Mark
+399: Sund
+400: Surrogate
+401: Sylo
+402: Symbol
+403: Syrc
+404: Tagb
+405: Takr
+406: Tale
+407: Talu
+408: Taml
+409: Tavt
+410: Telu
+411: Term
+412: Tfng
+413: Tglg
+414: Thaa
+415: Tibt
+416: Tirh
+417: Titlecase_Letter
+418: Ugar
+419: UIdeo
+420: Unassigned
+421: Uppercase_Letter
+422: Vaii
+423: VS
+424: Wara
+425: WSpace
+426: XIDC
+427: XIDS
+428: Xpeo
+429: Xsux
+430: Yiii
+431: Zinh
+432: Zyyy
+433: Zzzz
+434: In_Basic_Latin
+435: In_Latin_1_Supplement
+436: In_Latin_Extended_A
+437: In_Latin_Extended_B
+438: In_IPA_Extensions
+439: In_Spacing_Modifier_Letters
+440: In_Combining_Diacritical_Marks
+441: In_Greek_and_Coptic
+442: In_Cyrillic
+443: In_Cyrillic_Supplement
+444: In_Armenian
+445: In_Hebrew
+446: In_Arabic
+447: In_Syriac
+448: In_Arabic_Supplement
+449: In_Thaana
+450: In_NKo
+451: In_Samaritan
+452: In_Mandaic
+453: In_Arabic_Extended_A
+454: In_Devanagari
+455: In_Bengali
+456: In_Gurmukhi
+457: In_Gujarati
+458: In_Oriya
+459: In_Tamil
+460: In_Telugu
+461: In_Kannada
+462: In_Malayalam
+463: In_Sinhala
+464: In_Thai
+465: In_Lao
+466: In_Tibetan
+467: In_Myanmar
+468: In_Georgian
+469: In_Hangul_Jamo
+470: In_Ethiopic
+471: In_Ethiopic_Supplement
+472: In_Cherokee
+473: In_Unified_Canadian_Aboriginal_Syllabics
+474: In_Ogham
+475: In_Runic
+476: In_Tagalog
+477: In_Hanunoo
+478: In_Buhid
+479: In_Tagbanwa
+480: In_Khmer
+481: In_Mongolian
+482: In_Unified_Canadian_Aboriginal_Syllabics_Extended
+483: In_Limbu
+484: In_Tai_Le
+485: In_New_Tai_Lue
+486: In_Khmer_Symbols
+487: In_Buginese
+488: In_Tai_Tham
+489: In_Combining_Diacritical_Marks_Extended
+490: In_Balinese
+491: In_Sundanese
+492: In_Batak
+493: In_Lepcha
+494: In_Ol_Chiki
+495: In_Sundanese_Supplement
+496: In_Vedic_Extensions
+497: In_Phonetic_Extensions
+498: In_Phonetic_Extensions_Supplement
+499: In_Combining_Diacritical_Marks_Supplement
+500: In_Latin_Extended_Additional
+501: In_Greek_Extended
+502: In_General_Punctuation
+503: In_Superscripts_and_Subscripts
+504: In_Currency_Symbols
+505: In_Combining_Diacritical_Marks_for_Symbols
+506: In_Letterlike_Symbols
+507: In_Number_Forms
+508: In_Arrows
+509: In_Mathematical_Operators
+510: In_Miscellaneous_Technical
+511: In_Control_Pictures
+512: In_Optical_Character_Recognition
+513: In_Enclosed_Alphanumerics
+514: In_Box_Drawing
+515: In_Block_Elements
+516: In_Geometric_Shapes
+517: In_Miscellaneous_Symbols
+518: In_Dingbats
+519: In_Miscellaneous_Mathematical_Symbols_A
+520: In_Supplemental_Arrows_A
+521: In_Braille_Patterns
+522: In_Supplemental_Arrows_B
+523: In_Miscellaneous_Mathematical_Symbols_B
+524: In_Supplemental_Mathematical_Operators
+525: In_Miscellaneous_Symbols_and_Arrows
+526: In_Glagolitic
+527: In_Latin_Extended_C
+528: In_Coptic
+529: In_Georgian_Supplement
+530: In_Tifinagh
+531: In_Ethiopic_Extended
+532: In_Cyrillic_Extended_A
+533: In_Supplemental_Punctuation
+534: In_CJK_Radicals_Supplement
+535: In_Kangxi_Radicals
+536: In_Ideographic_Description_Characters
+537: In_CJK_Symbols_and_Punctuation
+538: In_Hiragana
+539: In_Katakana
+540: In_Bopomofo
+541: In_Hangul_Compatibility_Jamo
+542: In_Kanbun
+543: In_Bopomofo_Extended
+544: In_CJK_Strokes
+545: In_Katakana_Phonetic_Extensions
+546: In_Enclosed_CJK_Letters_and_Months
+547: In_CJK_Compatibility
+548: In_CJK_Unified_Ideographs_Extension_A
+549: In_Yijing_Hexagram_Symbols
+550: In_CJK_Unified_Ideographs
+551: In_Yi_Syllables
+552: In_Yi_Radicals
+553: In_Lisu
+554: In_Vai
+555: In_Cyrillic_Extended_B
+556: In_Bamum
+557: In_Modifier_Tone_Letters
+558: In_Latin_Extended_D
+559: In_Syloti_Nagri
+560: In_Common_Indic_Number_Forms
+561: In_Phags_pa
+562: In_Saurashtra
+563: In_Devanagari_Extended
+564: In_Kayah_Li
+565: In_Rejang
+566: In_Hangul_Jamo_Extended_A
+567: In_Javanese
+568: In_Myanmar_Extended_B
+569: In_Cham
+570: In_Myanmar_Extended_A
+571: In_Tai_Viet
+572: In_Meetei_Mayek_Extensions
+573: In_Ethiopic_Extended_A
+574: In_Latin_Extended_E
+575: In_Cherokee_Supplement
+576: In_Meetei_Mayek
+577: In_Hangul_Syllables
+578: In_Hangul_Jamo_Extended_B
+579: In_High_Surrogates
+580: In_High_Private_Use_Surrogates
+581: In_Low_Surrogates
+582: In_Private_Use_Area
+583: In_CJK_Compatibility_Ideographs
+584: In_Alphabetic_Presentation_Forms
+585: In_Arabic_Presentation_Forms_A
+586: In_Variation_Selectors
+587: In_Vertical_Forms
+588: In_Combining_Half_Marks
+589: In_CJK_Compatibility_Forms
+590: In_Small_Form_Variants
+591: In_Arabic_Presentation_Forms_B
+592: In_Halfwidth_and_Fullwidth_Forms
+593: In_Specials
+594: In_Linear_B_Syllabary
+595: In_Linear_B_Ideograms
+596: In_Aegean_Numbers
+597: In_Ancient_Greek_Numbers
+598: In_Ancient_Symbols
+599: In_Phaistos_Disc
+600: In_Lycian
+601: In_Carian
+602: In_Coptic_Epact_Numbers
+603: In_Old_Italic
+604: In_Gothic
+605: In_Old_Permic
+606: In_Ugaritic
+607: In_Old_Persian
+608: In_Deseret
+609: In_Shavian
+610: In_Osmanya
+611: In_Elbasan
+612: In_Caucasian_Albanian
+613: In_Linear_A
+614: In_Cypriot_Syllabary
+615: In_Imperial_Aramaic
+616: In_Palmyrene
+617: In_Nabataean
+618: In_Hatran
+619: In_Phoenician
+620: In_Lydian
+621: In_Meroitic_Hieroglyphs
+622: In_Meroitic_Cursive
+623: In_Kharoshthi
+624: In_Old_South_Arabian
+625: In_Old_North_Arabian
+626: In_Manichaean
+627: In_Avestan
+628: In_Inscriptional_Parthian
+629: In_Inscriptional_Pahlavi
+630: In_Psalter_Pahlavi
+631: In_Old_Turkic
+632: In_Old_Hungarian
+633: In_Rumi_Numeral_Symbols
+634: In_Brahmi
+635: In_Kaithi
+636: In_Sora_Sompeng
+637: In_Chakma
+638: In_Mahajani
+639: In_Sharada
+640: In_Sinhala_Archaic_Numbers
+641: In_Khojki
+642: In_Multani
+643: In_Khudawadi
+644: In_Grantha
+645: In_Tirhuta
+646: In_Siddham
+647: In_Modi
+648: In_Takri
+649: In_Ahom
+650: In_Warang_Citi
+651: In_Pau_Cin_Hau
+652: In_Cuneiform
+653: In_Cuneiform_Numbers_and_Punctuation
+654: In_Early_Dynastic_Cuneiform
+655: In_Egyptian_Hieroglyphs
+656: In_Anatolian_Hieroglyphs
+657: In_Bamum_Supplement
+658: In_Mro
+659: In_Bassa_Vah
+660: In_Pahawh_Hmong
+661: In_Miao
+662: In_Kana_Supplement
+663: In_Duployan
+664: In_Shorthand_Format_Controls
+665: In_Byzantine_Musical_Symbols
+666: In_Musical_Symbols
+667: In_Ancient_Greek_Musical_Notation
+668: In_Tai_Xuan_Jing_Symbols
+669: In_Counting_Rod_Numerals
+670: In_Mathematical_Alphanumeric_Symbols
+671: In_Sutton_SignWriting
+672: In_Mende_Kikakui
+673: In_Arabic_Mathematical_Alphabetic_Symbols
+674: In_Mahjong_Tiles
+675: In_Domino_Tiles
+676: In_Playing_Cards
+677: In_Enclosed_Alphanumeric_Supplement
+678: In_Enclosed_Ideographic_Supplement
+679: In_Miscellaneous_Symbols_and_Pictographs
+680: In_Emoticons
+681: In_Ornamental_Dingbats
+682: In_Transport_and_Map_Symbols
+683: In_Alchemical_Symbols
+684: In_Geometric_Shapes_Extended
+685: In_Supplemental_Arrows_C
+686: In_Supplemental_Symbols_and_Pictographs
+687: In_CJK_Unified_Ideographs_Extension_B
+688: In_CJK_Unified_Ideographs_Extension_C
+689: In_CJK_Unified_Ideographs_Extension_D
+690: In_CJK_Unified_Ideographs_Extension_E
+691: In_CJK_Compatibility_Ideographs_Supplement
+692: In_Tags
+693: In_Variation_Selectors_Supplement
+694: In_Supplementary_Private_Use_Area_A
+695: In_Supplementary_Private_Use_Area_B
+696: In_No_Block