gennorm reads some of the Unicode Character Database files and compiles their
normalization information into a binary form.
The resulting file, unorm.dat, can then be read directly by ICU, or used by
pkgdata(8) for incorporation into a larger archive or library.
The files read by gennorm are described in the FILES section. If suffix is
passed on the command line, gennorm looks for each of these files under a name
that has a dash followed by suffix appended to its basename. For example, the
file UnicodeData.txt would be looked for under the name UnicodeData-suffix.txt.
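The naming rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not gennorm's own code; the helper name suffixed_name is hypothetical.

```python
import os


def suffixed_name(filename: str, suffix: str = "") -> str:
    """Return the file name to look for when a suffix is given.

    Hypothetical helper: inserts "-<suffix>" between the basename
    and the extension, as described in the text above.
    """
    if not suffix:
        return filename
    base, ext = os.path.splitext(filename)
    return f"{base}-{suffix}{ext}"


if __name__ == "__main__":
    print(suffixed_name("UnicodeData.txt", "suffix"))
```

With no suffix, the name is returned unchanged.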
OPTIONS
-h, -?, --help
Print help about usage and exit.
-v, --verbose
Display extra informative messages during execution.
-u, --unicode version
Specify which version of Unicode the Unicode Character Database refers to.
Defaults to 3.0.0.
-c, --copyright
Include a copyright notice in the binary data.
-s, --sourcedir source
Set the source directory to source. The default source directory is specified
by the environment variable ICU_DATA.
-d, --destdir destination
Set the destination directory to destination. The default destination
directory is specified by the environment variable ICU_DATA.
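As a rough illustration of how the long options listed above combine, the following Python sketch assembles an argument list for a script that drives gennorm. The function gennorm_argv is hypothetical, and passing each option value as a separate argument is an assumption based on the option syntax shown above.

```python
from typing import List, Optional


def gennorm_argv(
    unicode_version: Optional[str] = None,
    sourcedir: Optional[str] = None,
    destdir: Optional[str] = None,
    copyright: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Build an argument list using only the options documented above."""
    argv = ["gennorm"]
    if verbose:
        argv.append("--verbose")
    if copyright:
        argv.append("--copyright")
    if unicode_version is not None:
        argv += ["--unicode", unicode_version]
    if sourcedir is not None:
        argv += ["--sourcedir", sourcedir]
    if destdir is not None:
        argv += ["--destdir", destdir]
    return argv


if __name__ == "__main__":
    # The directory names are placeholders, not real ICU paths.
    print(gennorm_argv(unicode_version="3.0.0", sourcedir="/src/", destdir="/out/"))
```

The resulting list could be handed to subprocess.run; options left unset fall back to the defaults described above.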
ENVIRONMENT
ICU_DATA
Specifies the directory containing ICU data. Defaults to /usr/lib/icu/2.8/.
Some tools in ICU depend on the presence of the trailing slash, so make sure
that it is present whenever ICU_DATA is set.
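A wrapper script can guard against a missing trailing slash before invoking any ICU tool. The Python sketch below shows one way to do that; the helper icu_data_dir is hypothetical, and treating an empty ICU_DATA as unset is an assumption.

```python
import os
from typing import Mapping

# Default stated in the text above.
DEFAULT_ICU_DATA = "/usr/lib/icu/2.8/"


def icu_data_dir(environ: Mapping[str, str] = os.environ) -> str:
    """Return the ICU data directory, always ending with a slash."""
    # Assumption: an empty ICU_DATA is treated the same as an unset one.
    path = environ.get("ICU_DATA") or DEFAULT_ICU_DATA
    return path if path.endswith("/") else path + "/"


if __name__ == "__main__":
    print(icu_data_dir())
```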
FILES
The following files are read by gennorm and are looked for in the source
directory.
UnicodeData.txt
The main file in the Unicode Character Database. Contains character
properties, combining class information, decompositions, names, and so on.
DerivedNormalizationProperties.txt
Derived properties useful in dealing with normalization forms.
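To give a feel for the normalization data in UnicodeData.txt, the Python sketch below pulls the combining class and decomposition out of one line. It relies on the file's semicolon-separated layout (field 0 is the code point, field 1 the name, field 3 the canonical combining class, field 5 the decomposition). The names NormRecord and parse_line are hypothetical, and this is not how gennorm itself is implemented.

```python
from typing import NamedTuple, Optional, Tuple


class NormRecord(NamedTuple):
    code_point: int
    name: str
    combining_class: int
    decomposition_tag: Optional[str]  # e.g. "<noBreak>"; None if canonical or absent
    decomposition: Tuple[int, ...]    # empty if the character does not decompose


def parse_line(line: str) -> NormRecord:
    """Extract the normalization-related fields from one UnicodeData.txt line."""
    fields = line.strip().split(";")
    mapping = fields[5].split()
    tag = None
    # A leading "<...>" token marks a compatibility decomposition.
    if mapping and mapping[0].startswith("<"):
        tag = mapping.pop(0)
    return NormRecord(
        code_point=int(fields[0], 16),
        name=fields[1],
        combining_class=int(fields[3]),
        decomposition_tag=tag,
        decomposition=tuple(int(cp, 16) for cp in mapping),
    )


if __name__ == "__main__":
    # Sample line in the UnicodeData.txt format.
    sample = "00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;LATIN CAPITAL LETTER A GRAVE;;;00E0;"
    print(parse_line(sample))
```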