gensprep reads filtered RFC 3454 files and compiles their
information into a binary form.
The resulting file,
<name>.icu, can then be read directly by ICU, or used by
pkgdata(8) for incorporation into a larger archive or library.
The files read by
gensprep are described in the
FILES section.
OPTIONS
-h, -?, --help
Print help about usage and exit.
-v, --verbose
Display extra informative messages during execution.
-c, --copyright
Include a copyright notice in the binary data.
-s, --sourcedir source
Set the source directory to
source. The default source directory is specified by the environment variable
ICU_DATA.
-d, --destdir destination
Set the destination directory to
destination. The default destination directory is specified by the environment variable
ICU_DATA.
ENVIRONMENT
ICU_DATA
Specifies the directory containing ICU data. Defaults to
/usr/share/icu/3.4/. Some tools in ICU depend on the presence of the trailing slash, so
make sure that it is present whenever
ICU_DATA is set.
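Because a missing trailing slash can confuse some tools, a wrapper script may want to normalize ICU_DATA before calling gensprep. The following is a minimal sketch in POSIX shell, not something shipped with ICU; the function name normalize_icu_data is hypothetical.

```shell
# Hypothetical helper (not part of ICU): print the given directory path,
# appending a trailing slash if it does not already end in one.
# Returns non-zero and prints nothing when the argument is empty.
normalize_icu_data() {
    [ -n "$1" ] || return 1
    case "$1" in
        */) printf '%s\n' "$1" ;;    # already ends in a slash: keep as-is
        *)  printf '%s/\n' "$1" ;;   # no trailing slash: add one
    esac
}

# Only touch ICU_DATA when it is already set to something non-empty.
if [ -n "${ICU_DATA:-}" ]; then
    ICU_DATA=$(normalize_icu_data "$ICU_DATA")
    export ICU_DATA
fi
```

Leaving an unset ICU_DATA alone is a deliberate choice in this sketch, so that the built-in default still applies.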
FILES
gensprep reads the following files. It looks for the
rfc3454_*.txt files in
source/misc and for NormalizationCorrections.txt in
source/unidata.
rfc3454_A_1.txt
Contains the list of unassigned code points in Unicode version 3.2.0....
rfc3454_B_1.txt
Contains the list of code points that are commonly mapped to nothing....
rfc3454_B_2.txt
Contains the list of mappings for casefolding of code points when Normalization form NFKC is specified....
rfc3454_C_X.txt
Contains the list of code points that are prohibited for IDNA.
NormalizationCorrections.txt
Contains the list of code points whose normalization has changed since Unicode Version 3.2.0.
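Before running gensprep it can help to confirm that the input files are where the tool expects them. The sketch below, in POSIX shell, checks the misc and unidata layout described in this section. The function name check_sprep_inputs is hypothetical, and treating the X in rfc3454_C_X.txt as a placeholder matched by rfc3454_C_*.txt is an assumption.

```shell
# Hypothetical helper (not part of ICU): report which gensprep input files
# are missing under the given source directory. Prints one path per missing
# item and returns non-zero if anything is missing.
check_sprep_inputs() {
    src=${1%/}      # drop one trailing slash so joined paths stay tidy
    missing=0
    for f in misc/rfc3454_A_1.txt misc/rfc3454_B_1.txt misc/rfc3454_B_2.txt \
             unidata/NormalizationCorrections.txt; do
        if [ ! -f "$src/$f" ]; then
            printf '%s\n' "$src/$f"
            missing=1
        fi
    done
    # Assumption: "X" is a placeholder, so any rfc3454_C_*.txt file counts.
    found_c=0
    for f in "$src"/misc/rfc3454_C_*.txt; do
        if [ -f "$f" ]; then
            found_c=1
        fi
    done
    if [ "$found_c" -eq 0 ]; then
        printf '%s\n' "$src/misc/rfc3454_C_*.txt"
        missing=1
    fi
    return "$missing"
}
```

A caller could run check_sprep_inputs "$ICU_DATA" and skip the gensprep step when it returns non-zero.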