ch.claudio.lib
Class UCD

java.lang.Object
  extended by ch.claudio.lib.UCD

public class UCD
extends java.lang.Object

This class gives access to data contained in the Unicode files UnicodeData.txt and Unihan.txt.

Currently it assumes that these files are contained in the zip files provided on the Unicode site at http://www.unicode.org/Public/zipped/5.0.0/. The two zip files are assumed to reside in /var/tmp.
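
The following is a minimal sketch of how a text file can be read directly out of such a zip archive with java.util.zip, without unpacking it first. It is not taken from this class. The class name ZipEntryLines, the method firstLine and the archive name UCD.zip are assumptions made for illustration; only the directory /var/tmp and the file name UnicodeData.txt come from the description above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of ch.claudio.lib.
public class ZipEntryLines {

    /** Returns the first line of the named entry, or null if the entry is empty. */
    public static String firstLine(Path zipPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IOException(entryName + " not found in " + zipPath);
            }
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(zip.getInputStream(entry), StandardCharsets.UTF_8))) {
                return in.readLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed archive name; pass another path as the first argument if needed.
        Path zip = Paths.get(args.length > 0 ? args[0] : "/var/tmp/UCD.zip");
        if (!Files.isReadable(zip)) {
            System.out.println("No readable archive at " + zip);
            return;
        }
        System.out.println(firstLine(zip, "UnicodeData.txt"));
    }
}
```

Reading through ZipFile.getInputStream keeps the archive compressed on disk, which fits the assumption that the zip files are left as downloaded.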

The files are read and cached on demand, so the first call of getName or getHanInfo with a large codePoint can take up to half a minute. The information for all code points found up to the requested one is cached, so a later request for a code point smaller than one already requested should be served directly from the cache.
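
The read-ahead caching described above can be sketched as follows. This is an illustration of the idea under stated assumptions, not the actual implementation of this class: the name LazyNameCache and its members are hypothetical, the input is assumed to be sorted by ascending code point as UnicodeData.txt is, and range entries in that file are not treated specially.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of on-demand reading with a cache.
public class LazyNameCache {
    private final BufferedReader in;
    private final Map<Integer, String> names = new HashMap<>();
    private int highestRead = -1;      // largest code point parsed so far
    private boolean exhausted = false; // true once the end of input was reached

    public LazyNameCache(BufferedReader in) {
        this.in = in;
    }

    public String getName(int codePoint) {
        // Read further only when the request lies beyond what is already cached.
        while (!exhausted && highestRead < codePoint) {
            String line;
            try {
                line = in.readLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            if (line == null) {
                exhausted = true;
                break;
            }
            // UnicodeData.txt fields are separated by ';': code point, name, ...
            String[] fields = line.split(";", 3);
            if (fields.length < 2) {
                continue;
            }
            highestRead = Integer.parseInt(fields[0], 16);
            names.put(highestRead, fields[1]);
        }
        return names.get(codePoint); // null when the code point is not listed
    }

    public int cachedCount() {
        return names.size();
    }

    public static void main(String[] args) {
        String sample = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;\n"
                      + "0042;LATIN CAPITAL LETTER B;Lu;0;L;;;;;N;;;;0062;\n"
                      + "0043;LATIN CAPITAL LETTER C;Lu;0;L;;;;;N;;;;0063;\n";
        LazyNameCache cache = new LazyNameCache(new BufferedReader(new StringReader(sample)));
        System.out.println(cache.getName(0x42)); // reads two lines
        System.out.println(cache.cachedCount()); // the third line is still unread
        System.out.println(cache.getName(0x41)); // answered from the cache
    }
}
```

With this scheme the cost of a call depends on how far the requested code point lies beyond the highest one read so far, which matches the slow first call and fast later calls noted above.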

Currently HanInfo consists entirely of String fields that directly hold the content found in the file. Some fields will be replaced in the future by an int or an array of ints.

Version:
$Id:$
Author:
Claudio Nieder

Copyright (C) 2007 Claudio Nieder <private@claudio.ch>, CH-8610 Uster

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA


Nested Class Summary
static class UCD.HanInfo
          Collect the information for a character in Unihan.txt into one object.
 
Field Summary
static java.util.Map<java.lang.Integer,java.util.List<java.lang.Integer>> byFrequency
          Index for finding all characters that have a given frequency.
static java.util.Map<java.lang.Integer,java.util.List<java.lang.Integer>> byGrade
          Index for finding all characters expected to be known at a given grade in Hong Kong primary school.
 
Constructor Summary
UCD()
           
 
Method Summary
static UCD.HanInfo getHanInfo(int codePoint)
          Return some information from Unihan.txt associated with the code point.
static java.lang.String getName(int codePoint)
          Return the name associated with a code point.
static void main(java.lang.String[] args)
          Print license.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

byFrequency

public static final java.util.Map<java.lang.Integer,java.util.List<java.lang.Integer>> byFrequency
Index for finding all characters that have a given frequency. The most frequent characters have frequency 1. The highest possible value (= least frequent) is 5 according to Unihan.txt of UCD version 5.

This data structure is filled only as information is read during a getHanInfo call. If you need it to be complete, first call getHanInfo with a high enough code point. Currently (UCD version 5) a value of 0x9fff should suffice.


byGrade

public static final java.util.Map<java.lang.Integer,java.util.List<java.lang.Integer>> byGrade
Index for finding all characters expected to be known at a given grade in Hong Kong primary school. Grades range from 1 to 6 according to Unihan.txt of UCD version 5.

This data structure is filled only as information is read during a getHanInfo call. If you need it to be complete, first call getHanInfo with a high enough code point. Currently (UCD version 5) a value of 0x9fff should suffice.
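
As an illustration of the shape of byFrequency and byGrade, the sketch below builds a value-to-code-points index from Unihan-style lines (code point, field name and value separated by tabs). It is an assumption about how such an index could be filled, not the code of this class. The name HanIndexSketch is hypothetical, and the sample lines only demonstrate the format; their values are made up and are not taken from Unihan.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of an index from a field value to code points.
public class HanIndexSketch {

    /** Builds value -> code points for one Unihan field, e.g. "kFrequency". */
    public static Map<Integer, List<Integer>> index(List<String> lines, String field) {
        Map<Integer, List<Integer>> result = new TreeMap<>();
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] parts = line.split("\t");
            if (parts.length != 3 || !parts[0].startsWith("U+") || !parts[1].equals(field)) {
                continue;
            }
            int codePoint = Integer.parseInt(parts[0].substring(2), 16); // drop "U+"
            int value = Integer.parseInt(parts[2].trim());
            result.computeIfAbsent(value, k -> new ArrayList<>()).add(codePoint);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustrating the line format only.
        List<String> sample = Arrays.asList(
                "# comment line",
                "U+4E00\tkFrequency\t1",
                "U+4E00\tkGradeLevel\t1",
                "U+4EBA\tkFrequency\t1",
                "U+4E11\tkFrequency\t4");
        System.out.println(index(sample, "kFrequency"));
        System.out.println(index(sample, "kGradeLevel"));
    }
}
```

Because an index built this way contains only the lines processed so far, it is complete only after the whole relevant range has been read, which is why a prior getHanInfo call with a high code point is recommended above.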

Constructor Detail

UCD

public UCD()
Method Detail

getHanInfo

public static UCD.HanInfo getHanInfo(int codePoint)
Return some information from Unihan.txt associated with the code point.

Parameters:
codePoint - the Unicode code point to look up
Returns:
HanInfo object

getName

public static java.lang.String getName(int codePoint)
Return the name associated with a code point.

Parameters:
codePoint - the Unicode code point to look up
Returns:
name
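
Both getName and getHanInfo take an int code point rather than a char. The sketch below shows one way to obtain code points from a String using standard JDK methods, so that characters outside the Basic Multilingual Plane are passed as a single value. The class name CodePointDemo and the method codePointsOf are hypothetical; the calls to this class are left out so that the example runs on its own.

```java
// Hypothetical helper showing how to derive int code points from text.
public class CodePointDemo {

    /** Returns the code points of the text, combining surrogate pairs. */
    public static int[] codePointsOf(String text) {
        return text.codePoints().toArray();
    }

    public static void main(String[] args) {
        // 'A' followed by U+20000, which is stored as the surrogate pair D840 DC00.
        String text = "A\uD840\uDC00";
        for (int cp : codePointsOf(text)) {
            // Each value here is what would be passed as the codePoint argument.
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```

Iterating over chars instead would hand each half of a surrogate pair to the lookup separately, and neither half is an assigned character.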

main

public static void main(java.lang.String[] args)
Print license.

Parameters:
args - ignored