OW_NAMESPACE::UTF8Utils Namespace Reference


Functions

size_t charCount (const char *utf8str)
 Count the number of UTF-8 chars in the string.
UInt16 UTF8toUCS2 (const char *utf8char)
 Convert one UTF-8 char (possibly multiple bytes) into a UCS2 16-bit char.
String UCS2toUTF8 (UInt16 ucs2char)
 Convert one UCS2 16-bit char into a UTF-8 char (possibly multiple bytes).
UInt32 UTF8toUCS4 (const char *utf8char)
 Convert one UTF-8 char (possibly multiple bytes) into a UCS4 32-bit char.
String UCS4toUTF8 (UInt32 ucs4char)
 Convert one UCS4 32-bit char into a UTF-8 char (possibly multiple bytes).
void UCS4toUTF8 (UInt32 ucs4char, StringBuffer &sb)
 Convert one UCS4 32-bit char into a UTF-8 char (possibly multiple bytes). This version is faster to use in a loop than the version which returns a String.
void UCS4toUTF8 (UInt32 ucs4char, char *p)
Array< UInt16 > StringToUCS2Common (const String &input, bool throwException)
Array< UInt16 > StringToUCS2ReplaceInvalid (const String &input)
 Convert a UTF-8 (or ASCII) string into a UCS2 string.
Array< UInt16 > StringToUCS2 (const String &input)
 Convert a UTF-8 (or ASCII) string into a UCS2 string.
String UCS2ToString (const void *input, size_t inputLength)
 Convert a UCS2 string into a UTF-8 (or ASCII) string.
String UCS2ToString (const Array< UInt16 > &input)
 Convert a UCS2 string into a UTF-8 (or ASCII) string.
String UCS2ToString (const Array< char > &input)
 Convert a UCS2 string into a UTF-8 (or ASCII) string.
int UTF8CharLen (UInt32 ucs4char)
template<typename TransformT>
bool transformInPlace (char *input, TransformT transformer)
template<typename TransformT>
String transform (const char *input, TransformT transformer)
bool toUpperCaseInPlace (char *input)
 Convert the UTF-8 string to upper case.
String toUpperCase (const char *input)
 Convert the UTF-8 string to upper case and return the result.
bool toLowerCaseInPlace (char *input)
 Convert the UTF-8 string to lower case.
String toLowerCase (const char *input)
 Convert the UTF-8 string to lower case and return the result.
int compareToIgnoreCase (const char *str1, const char *str2)
 Compares two UTF-8 strings, ignoring any case differences as defined by the Unicode CaseFolding.txt file.

Variables

UInt8 SequenceLengthTable [256]
const CaseMapping lowerMappings []
const CaseMapping upperMappings []
const CaseMapping *const lowerMappingsEnd
const CaseMapping *const upperMappingsEnd


Function Documentation

size_t OW_NAMESPACE::UTF8Utils::charCount (const char *utf8str)

Count the number of UTF-8 chars in the string.

This may differ from the number of bytes (as returned by strlen()). If utf8str is not a valid UTF-8 string, the result is undefined.

Parameters:
utf8str string in UTF-8 encoding.
Returns:
Number of chars in the string.

Definition at line 101 of file OW_UTF8Utils.cpp.

References OW_ASSERT.

Referenced by OWBI1::String::UTF8Length(), and OW_NAMESPACE::String::UTF8Length().
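The counting rule can be sketched as follows. This is a minimal illustration, not the code in OW_UTF8Utils.cpp: the name countUtf8Chars is hypothetical, and the sketch assumes valid UTF-8 input, counting every byte that is not a continuation byte (bit pattern 10xxxxxx).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: one UTF-8 char = one non-continuation byte.
// Like charCount(), the result is meaningless for invalid UTF-8.
std::size_t countUtf8Chars(const char* utf8str)
{
    assert(utf8str != nullptr);  // plays the role of OW_ASSERT
    std::size_t count = 0;
    for (const unsigned char* p = reinterpret_cast<const unsigned char*>(utf8str); *p; ++p)
    {
        if ((*p & 0xC0) != 0x80)  // skip continuation bytes 10xxxxxx
        {
            ++count;
        }
    }
    return count;
}
```

For example, countUtf8Chars("h\xC3\xA9llo") counts 5 characters, while strlen() reports 6 bytes.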

int OW_NAMESPACE::UTF8Utils::compareToIgnoreCase (const char *str1, const char *str2)

Compares two UTF-8 strings, ignoring any case differences as defined by the Unicode CaseFolding.txt file.

Parameters:
str1 first string
str2 second string
Returns:
a value less than, equal to, or greater than 0 if str1 is found to be less than, equal to, or greater than str2

Definition at line 42 of file OW_UTF8UtilscompareToIgnoreCase.cpp.

Referenced by OWBI1::String::compareToIgnoreCase(), OW_NAMESPACE::String::compareToIgnoreCase(), OWBI1::String::endsWith(), OW_NAMESPACE::String::endsWith(), OW_dbrecCompare(), and OW_NAMESPACE::registerString().
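A rough sketch of a case-insensitive comparison over code points is shown below. It is not the OpenWBEM implementation: compareIgnoreCaseSketch, nextCodePoint and foldSimple are hypothetical names, foldSimple covers only ASCII and the Latin-1 capitals U+00C0 to U+00DE (excluding U+00D7), which is a tiny subset of CaseFolding.txt, and ordering by folded code point value is an assumption.

```cpp
#include <cassert>
#include <cstdint>

namespace {

// Decodes one code point and advances p. Assumes valid UTF-8,
// but never reads past a terminating NUL.
std::uint32_t nextCodePoint(const unsigned char*& p)
{
    std::uint32_t c = *p++;
    int extra = c >= 0xF0 ? 3 : c >= 0xE0 ? 2 : c >= 0xC0 ? 1 : 0;
    if (extra > 0)
    {
        c &= 0x3Fu >> extra;  // keep the payload bits of the lead byte
    }
    while (extra-- > 0 && (*p & 0xC0) == 0x80)
    {
        c = (c << 6) | (*p++ & 0x3Fu);
    }
    return c;
}

// Hypothetical, tiny subset of CaseFolding.txt.
std::uint32_t foldSimple(std::uint32_t c)
{
    if (c >= 'A' && c <= 'Z') return c + 0x20;
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7) return c + 0x20;  // U+00D7 is the multiplication sign
    return c;
}

}  // namespace

int compareIgnoreCaseSketch(const char* str1, const char* str2)
{
    assert(str1 != nullptr && str2 != nullptr);
    const unsigned char* a = reinterpret_cast<const unsigned char*>(str1);
    const unsigned char* b = reinterpret_cast<const unsigned char*>(str2);
    while (*a && *b)
    {
        const std::uint32_t ca = foldSimple(nextCodePoint(a));
        const std::uint32_t cb = foldSimple(nextCodePoint(b));
        if (ca != cb) return ca < cb ? -1 : 1;
    }
    // At least one string ended; a proper prefix compares lower.
    if (*a) return 1;
    if (*b) return -1;
    return 0;
}
```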

Array< UInt16 > OW_NAMESPACE::UTF8Utils::StringToUCS2 (const String &input)

Convert a UTF-8 (or ASCII) string into a UCS2 string.

Parameters:
input The UTF-8 string
Returns:
An Array of UCS2 characters
Exceptions:
InvalidUTF8Exception if input contains invalid UTF-8 characters.

Definition at line 370 of file OW_UTF8Utils.cpp.

References StringToUCS2Common().

Array<UInt16> OW_NAMESPACE::UTF8Utils::@1::StringToUCS2Common (const String &input, bool throwException) [static]

Definition at line 262 of file OW_UTF8Utils.cpp.

References OW_NAMESPACE::String::c_str(), OW_NAMESPACE::String::length(), OW_ASSERT, OW_THROW, OW_NAMESPACE::Array< T >::push_back(), and SequenceLengthTable.

Referenced by StringToUCS2(), and StringToUCS2ReplaceInvalid().
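The throwException flag suggests one conversion loop shared by StringToUCS2() and StringToUCS2ReplaceInvalid(). The sketch below shows that shape with standard types. It is an assumption-based illustration: std::vector<std::uint16_t> stands in for Array<UInt16>, InvalidUtf8Error is a hypothetical stand-in for InvalidUTF8Exception, and treating every 4-byte sequence as invalid (it cannot fit in UCS2) is this sketch's choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for OpenWBEM's InvalidUTF8Exception.
struct InvalidUtf8Error : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

std::vector<std::uint16_t> stringToUcs2Common(const std::string& input, bool throwException)
{
    std::vector<std::uint16_t> out;
    std::size_t i = 0;
    while (i < input.size())
    {
        const unsigned char lead = static_cast<unsigned char>(input[i]);
        // Same lead-byte ranges as SequenceLengthTable; 0 = cannot start a char.
        const std::size_t len = lead < 0x80 ? 1
                              : (lead >= 0xC2 && lead <= 0xDF) ? 2
                              : (lead >= 0xE0 && lead <= 0xEF) ? 3
                              : (lead >= 0xF0 && lead <= 0xF4) ? 4 : 0;
        bool wellFormed = len != 0 && i + len <= input.size();
        std::uint32_t cp = wellFormed && len > 1 ? (lead & (0xFFu >> (len + 1))) : lead;
        for (std::size_t k = 1; wellFormed && k < len; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(input[i + k]);
            wellFormed = (c & 0xC0) == 0x80;
            cp = (cp << 6) | (c & 0x3Fu);
        }
        // UCS2 holds U+0000..U+FFFF minus surrogates; reject overlong 3-byte forms.
        const bool representable = wellFormed && len <= 3
            && !(len == 3 && cp < 0x800)
            && !(cp >= 0xD800 && cp <= 0xDFFF);
        if (!representable)
        {
            if (throwException)
            {
                throw InvalidUtf8Error("invalid UTF-8 at byte offset " + std::to_string(i));
            }
            out.push_back(0xFFFD);        // U+FFFD REPLACEMENT CHARACTER
            i += wellFormed ? len : 1;    // skip the whole sequence, or resync on the next byte
            continue;
        }
        out.push_back(static_cast<std::uint16_t>(cp));
        i += len;
    }
    return out;
}
```

Under these assumptions, StringToUCS2() would correspond to stringToUcs2Common(input, true) and StringToUCS2ReplaceInvalid() to stringToUcs2Common(input, false).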

Array< UInt16 > OW_NAMESPACE::UTF8Utils::StringToUCS2ReplaceInvalid (const String &input)

Convert a UTF-8 (or ASCII) string into a UCS2 string.

Invalid characters are replaced with U+FFFD (the Unicode replacement character).

Parameters:
input The UTF-8 string
Returns:
An Array of UCS2 characters

Definition at line 364 of file OW_UTF8Utils.cpp.

References StringToUCS2Common().

String OW_NAMESPACE::UTF8Utils::toLowerCase (const char *input)

Convert the UTF-8 string to lower case and return the result.

Definition at line 2081 of file OW_UTF8Utils.cpp.

References lowerMappings, lowerMappingsEnd, and transform().

Referenced by OW_NAMESPACE::registerString(), and OW_NAMESPACE::String::toLowerCase().

bool OW_NAMESPACE::UTF8Utils::toLowerCaseInPlace (char *input)

Convert the UTF-8 string to lower case.

The string is modified in place. If a character is encountered whose replacement occupies more bytes than the original, processing stops and false is returned. The current implementation does not handle any of the special cases defined in the Unicode SpecialCasing.txt file, so characters do not grow and false is currently never returned.

Returns:
true if successful. false if the lower-cased replacement would be larger than the original.

Definition at line 2075 of file OW_UTF8Utils.cpp.

References lowerMappings, lowerMappingsEnd, and transformInPlace().

Referenced by OW_NAMESPACE::String::toLowerCase().

String OW_NAMESPACE::UTF8Utils::toUpperCase (const char *input)

Convert the UTF-8 string to upper case and return the result.

Definition at line 2069 of file OW_UTF8Utils.cpp.

References transform(), upperMappings, and upperMappingsEnd.

Referenced by OW_NAMESPACE::registerString(), and OW_NAMESPACE::String::toUpperCase().

bool OW_NAMESPACE::UTF8Utils::toUpperCaseInPlace (char *input)

Convert the UTF-8 string to upper case.

The string is modified in place. If a character is encountered whose replacement occupies more bytes than the original, processing stops and false is returned. The current implementation does not handle any of the special cases defined in the Unicode SpecialCasing.txt file, so characters do not grow and false is currently never returned.

Returns:
true if successful. false if the upper-cased replacement would be larger than the original.

Definition at line 2063 of file OW_UTF8Utils.cpp.

References transformInPlace(), upperMappings, and upperMappingsEnd.

Referenced by OW_NAMESPACE::String::toUpperCase().

template<typename TransformT>
String OW_NAMESPACE::UTF8Utils::@2::transform (const char *input, TransformT transformer) [static]

Definition at line 466 of file OW_UTF8Utils.cpp.

References OW_NAMESPACE::StringBuffer::releaseString(), SequenceLengthTable, UCS4toUTF8(), and UTF8toUCS4().

Referenced by toLowerCase(), and toUpperCase().
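A copying transform can decode each character, pass it to the transformer, and append the re-encoded result, so a replacement that needs more bytes is never a problem. The following is a hedged sketch of that approach, not the actual template: transformSketch and appendUtf8 are hypothetical, std::string stands in for String and StringBuffer, the input is assumed to be valid UTF-8, and the transformer is assumed to map one code point to one code point.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

namespace {

// Appends the UTF-8 encoding of cp (assumed <= U+10FFFF) to out.
void appendUtf8(std::uint32_t cp, std::string& out)
{
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

}  // namespace

template <typename TransformT>
std::string transformSketch(const char* input, TransformT transformer)
{
    assert(input != nullptr);
    std::string out;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(input);
    while (*p)
    {
        // Decode one code point; stop early at a NUL if the sequence is truncated.
        std::uint32_t cp = *p;
        const int want = cp < 0x80 ? 1 : cp < 0xE0 ? 2 : cp < 0xF0 ? 3 : 4;
        if (want > 1) cp &= 0xFFu >> (want + 1);
        int len = 1;
        while (len < want && (p[len] & 0xC0) == 0x80)
        {
            cp = (cp << 6) | (p[len] & 0x3Fu);
            ++len;
        }
        p += len;
        // The output grows as needed, so a longer replacement cannot fail here.
        appendUtf8(transformer(cp), out);
    }
    return out;
}
```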

template<typename TransformT>
bool OW_NAMESPACE::UTF8Utils::@2::transformInPlace (char *input, TransformT transformer) [static]

Definition at line 426 of file OW_UTF8Utils.cpp.

References SequenceLengthTable, UCS4toUTF8(), UTF8CharLen(), and UTF8toUCS4().

Referenced by toLowerCaseInPlace(), and toUpperCaseInPlace().
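Working in place is harder because each replacement must fit in the bytes of the character it replaces, which is the false-return case documented for toLowerCaseInPlace() and toUpperCaseInPlace(). The sketch below is an assumption-based illustration, not the OpenWBEM code: transformInPlaceSketch, decodeOne, encodeOne and utf8Len are hypothetical, the input is assumed to be valid UTF-8, and compacting the string when a replacement is shorter is this sketch's choice.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

namespace {

int utf8Len(std::uint32_t cp) { return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4; }

// Decodes one code point; len receives the number of bytes consumed.
std::uint32_t decodeOne(const unsigned char* p, int& len)
{
    std::uint32_t cp = p[0];
    const int want = cp < 0x80 ? 1 : cp < 0xE0 ? 2 : cp < 0xF0 ? 3 : 4;
    if (want > 1) cp &= 0xFFu >> (want + 1);
    len = 1;
    while (len < want && (p[len] & 0xC0) == 0x80)
    {
        cp = (cp << 6) | (p[len] & 0x3Fu);
        ++len;
    }
    return cp;
}

// Writes utf8Len(cp) bytes of UTF-8 to out.
void encodeOne(std::uint32_t cp, unsigned char* out)
{
    static const unsigned char leadMark[] = {0x00, 0x00, 0xC0, 0xE0, 0xF0};
    const int n = utf8Len(cp);
    for (int k = n - 1; k > 0; --k)
    {
        out[k] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    out[0] = static_cast<unsigned char>(leadMark[n] | cp);
}

}  // namespace

template <typename TransformT>
bool transformInPlaceSketch(char* input, TransformT transformer)
{
    assert(input != nullptr);
    unsigned char* read = reinterpret_cast<unsigned char*>(input);
    unsigned char* write = read;  // never overtakes read while replacements do not grow
    while (*read)
    {
        int oldLen = 0;
        const std::uint32_t cp = transformer(decodeOne(read, oldLen));
        const int newLen = utf8Len(cp);
        if (newLen > oldLen)
        {
            // Replacement would not fit: close any gap left by shrinking, then stop.
            std::memmove(write, read, std::strlen(reinterpret_cast<char*>(read)) + 1);
            return false;
        }
        encodeOne(cp, write);
        read += oldLen;
        write += newLen;
    }
    *write = '\0';
    return true;
}
```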

String OW_NAMESPACE::UTF8Utils::UCS2ToString (const Array< char > &input)

Convert a UCS2 string into a UTF-8 (or ASCII) string.

Parameters:
input An Array of bytes holding UCS2 characters
Returns:
The UTF-8 string

Definition at line 396 of file OW_UTF8Utils.cpp.

References OW_NAMESPACE::Array< T >::size(), and UCS2ToString().

String OW_NAMESPACE::UTF8Utils::UCS2ToString (const Array< UInt16 > &input)

Convert a UCS2 string into a UTF-8 (or ASCII) string.

Parameters:
input An Array of UCS2 characters
Returns:
The UTF-8 string

Definition at line 391 of file OW_UTF8Utils.cpp.

References OW_NAMESPACE::Array< T >::size(), and UCS2ToString().

String OW_NAMESPACE::UTF8Utils::UCS2ToString (const void *input, size_t inputLength)

Convert a UCS2 string into a UTF-8 (or ASCII) string.

Parameters:
input Pointer to a buffer of UCS2 characters
inputLength The size (in bytes) of input.
Returns:
The UTF-8 string

Definition at line 376 of file OW_UTF8Utils.cpp.

References UCS4toUTF8().

Referenced by UCS2ToString().
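Because inputLength is a byte count, a conversion has to step through the buffer two bytes at a time. The sketch below assumes the UCS2 units are stored in host byte order and ignores a trailing odd byte; neither detail is documented on this page, and ucs2BufferToUtf8 is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

std::string ucs2BufferToUtf8(const void* input, std::size_t inputLength)
{
    assert(input != nullptr || inputLength == 0);
    const unsigned char* bytes = static_cast<const unsigned char*>(input);
    std::string out;
    // Two bytes per UCS2 unit; a trailing odd byte is ignored in this sketch.
    for (std::size_t i = 0; i + 1 < inputLength; i += 2)
    {
        std::uint16_t unit;
        std::memcpy(&unit, bytes + i, sizeof unit);  // host byte order (assumption), alignment-safe
        if (unit < 0x80)
        {
            out += static_cast<char>(unit);
        }
        else if (unit < 0x800)
        {
            out += static_cast<char>(0xC0 | (unit >> 6));
            out += static_cast<char>(0x80 | (unit & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xE0 | (unit >> 12));
            out += static_cast<char>(0x80 | ((unit >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (unit & 0x3F));
        }
    }
    return out;
}
```

Under the same assumptions, the Array< UInt16 > overload would pass the element count multiplied by two, and the Array< char > overload would pass its size directly.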

String OW_NAMESPACE::UTF8Utils::UCS2toUTF8 (UInt16 ucs2char)

Convert one UCS2 16-bit char into a UTF-8 char (possibly multiple bytes).

Parameters:
ucs2char UCS2 char to convert.
Returns:
The corresponding UTF-8 char.

Definition at line 132 of file OW_UTF8Utils.cpp.

References UCS4toUTF8().

Referenced by OWBI1::Char16::toString(), OW_NAMESPACE::Char16::toString(), and OW_NAMESPACE::Char16::toUTF8().

void OW_NAMESPACE::UTF8Utils::@1::UCS4toUTF8 (UInt32 ucs4char, char *p) [static]

Definition at line 234 of file OW_UTF8Utils.cpp.

void OW_NAMESPACE::UTF8Utils::UCS4toUTF8 (UInt32 ucs4char, StringBuffer &sb)

Convert one UCS4 32-bit char into a UTF-8 char (possibly multiple bytes). This version is faster to use in a loop than the version which returns a String.

Parameters:
ucs4char UCS4 char to convert.
sb The corresponding UTF-8 char will be appended to the end of sb.

Definition at line 204 of file OW_UTF8Utils.cpp.

String OW_NAMESPACE::UTF8Utils::UCS4toUTF8 (UInt32 ucs4char)

Convert one UCS4 32-bit char into a UTF-8 char (possibly multiple bytes).

Parameters:
ucs4char UCS4 char to convert.
Returns:
The corresponding UTF-8 char.

Definition at line 196 of file OW_UTF8Utils.cpp.

References OW_NAMESPACE::StringBuffer::releaseString().

Referenced by main(), transform(), transformInPlace(), UCS2ToString(), and UCS2toUTF8().
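The encoding rule behind both public UCS4toUTF8() overloads can be sketched as below, with std::string standing in for StringBuffer and String; the name ucs4ToUtf8 and the U+10FFFF precondition are assumptions of this sketch. The appending overload shows why it is cheaper in a loop: each call extends one existing buffer instead of building and returning a new string per character.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends the UTF-8 form of ucs4char to sb (std::string stands in for StringBuffer).
void ucs4ToUtf8(std::uint32_t ucs4char, std::string& sb)
{
    assert(ucs4char <= 0x10FFFF);  // precondition assumed by this sketch
    if (ucs4char < 0x80)
    {
        sb += static_cast<char>(ucs4char);
    }
    else if (ucs4char < 0x800)
    {
        sb += static_cast<char>(0xC0 | (ucs4char >> 6));
        sb += static_cast<char>(0x80 | (ucs4char & 0x3F));
    }
    else if (ucs4char < 0x10000)
    {
        sb += static_cast<char>(0xE0 | (ucs4char >> 12));
        sb += static_cast<char>(0x80 | ((ucs4char >> 6) & 0x3F));
        sb += static_cast<char>(0x80 | (ucs4char & 0x3F));
    }
    else
    {
        sb += static_cast<char>(0xF0 | (ucs4char >> 18));
        sb += static_cast<char>(0x80 | ((ucs4char >> 12) & 0x3F));
        sb += static_cast<char>(0x80 | ((ucs4char >> 6) & 0x3F));
        sb += static_cast<char>(0x80 | (ucs4char & 0x3F));
    }
}

// Convenience overload returning a fresh string, like UCS4toUTF8(UInt32).
std::string ucs4ToUtf8(std::uint32_t ucs4char)
{
    std::string s;
    ucs4ToUtf8(ucs4char, s);
    return s;
}
```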

int OW_NAMESPACE::UTF8Utils::@2::UTF8CharLen (UInt32 ucs4char) [static]

Definition at line 405 of file OW_UTF8Utils.cpp.

Referenced by transformInPlace().
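UTF8CharLen() is undocumented here. Judging by its use in transformInPlace(), it presumably returns the number of bytes needed to encode a code point in UTF-8, which lets the caller check whether a replacement fits. The sketch below implements that assumed behavior under the hypothetical name utf8CharLen.

```cpp
#include <cassert>
#include <cstdint>

// Assumed behavior: number of UTF-8 bytes needed for ucs4char.
int utf8CharLen(std::uint32_t ucs4char)
{
    assert(ucs4char <= 0x10FFFF);        // precondition assumed by this sketch
    if (ucs4char < 0x80) return 1;       // U+0000..U+007F
    if (ucs4char < 0x800) return 2;      // U+0080..U+07FF
    if (ucs4char < 0x10000) return 3;    // U+0800..U+FFFF
    return 4;                            // U+10000..U+10FFFF
}
```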

UInt16 OW_NAMESPACE::UTF8Utils::UTF8toUCS2 (const char *utf8char)

Convert one UTF-8 char (possibly multiple bytes) into a UCS2 16-bit char.

Parameters:
utf8char pointer to the UTF-8 char to convert
Returns:
The corresponding UCS2 char, or 0xFFFF if utf8char points to an invalid UTF-8 sequence. Not all UTF-8 chars are handled: UTF-8 chars outside the range of a UCS2 char also produce 0xFFFF.

Definition at line 119 of file OW_UTF8Utils.cpp.

References UTF8toUCS4().

Referenced by OWBI1::Char16::Char16(), and OW_NAMESPACE::Char16::Char16().

UInt32 OW_NAMESPACE::UTF8Utils::UTF8toUCS4 (const char *utf8char)

Convert one UTF-8 char (possibly multiple bytes) into a UCS4 32-bit char.

Parameters:
utf8char pointer to the UTF-8 char to convert
Returns:
The corresponding UCS4 char. 0xFFFFFFFF if utf8char points to an invalid UTF-8 sequence.

Definition at line 138 of file OW_UTF8Utils.cpp.

References OW_ASSERT, and SequenceLengthTable.

Referenced by transform(), transformInPlace(), and UTF8toUCS2().
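A decoder with the documented 0xFFFFFFFF and 0xFFFF error values might look like the sketch below. It is not the code in OW_UTF8Utils.cpp: the names utf8ToUcs4 and utf8ToUcs2 are hypothetical, and rejecting overlong forms, surrogates and values above U+10FFFF are this sketch's validation choices. The UCS2 wrapper follows the rule documented for UTF8toUCS2(): anything that does not fit in 16 bits becomes 0xFFFF.

```cpp
#include <cassert>
#include <cstdint>

// Decodes the single UTF-8 char at utf8char; 0xFFFFFFFF means "invalid".
std::uint32_t utf8ToUcs4(const char* utf8char)
{
    assert(utf8char != nullptr);
    const std::uint32_t invalid = 0xFFFFFFFF;
    const unsigned char* p = reinterpret_cast<const unsigned char*>(utf8char);
    const unsigned char lead = p[0];
    // Same lead-byte ranges as SequenceLengthTable (0 = cannot start a char).
    const int len = lead < 0x80 ? 1
                  : (lead >= 0xC2 && lead <= 0xDF) ? 2
                  : (lead >= 0xE0 && lead <= 0xEF) ? 3
                  : (lead >= 0xF0 && lead <= 0xF4) ? 4 : 0;
    if (len == 0) return invalid;
    if (len == 1) return lead;
    std::uint32_t cp = lead & (0xFFu >> (len + 1));
    for (int k = 1; k < len; ++k)
    {
        if ((p[k] & 0xC0) != 0x80) return invalid;  // bad or missing continuation (also catches NUL)
        cp = (cp << 6) | (p[k] & 0x3Fu);
    }
    // Reject overlong forms, UTF-16 surrogates and values beyond U+10FFFF.
    static const std::uint32_t minForLen[] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < minForLen[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return invalid;
    return cp;
}

// UCS2 wrapper: invalid input or anything above U+FFFF becomes 0xFFFF.
std::uint16_t utf8ToUcs2(const char* utf8char)
{
    const std::uint32_t cp = utf8ToUcs4(utf8char);
    return static_cast<std::uint16_t>(cp > 0xFFFF ? 0xFFFFu : cp);
}
```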


Variable Documentation

const CaseMapping OW_NAMESPACE::UTF8Utils::lowerMappings[] [static]

Definition at line 496 of file OW_UTF8Utils.cpp.

Referenced by toLowerCase(), and toLowerCaseInPlace().

const CaseMapping* const OW_NAMESPACE::UTF8Utils::lowerMappingsEnd [static]

Initial value:

 lowerMappings +
   (sizeof(lowerMappings)/sizeof(lowerMappings[0]))

Definition at line 2021 of file OW_UTF8Utils.cpp.

Referenced by toLowerCase(), and toLowerCaseInPlace().
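The ...End pointers use the usual sizeof idiom for a one-past-the-end pointer, which makes the tables usable with standard algorithms. The layout of CaseMapping is not documented on this page, so the two-field struct below, the handful of sample entries, and the binary search (which requires the table to be sorted by code point) are assumptions for illustration; toLowerCodePoint is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical layout; the real CaseMapping struct is not shown here.
struct CaseMapping
{
    std::uint32_t codePoint;
    std::uint32_t mapping;
};

// A few sample entries, sorted by codePoint so a binary search works.
const CaseMapping lowerMappings[] = {
    {0x0041, 0x0061}, {0x0042, 0x0062}, {0x00C9, 0x00E9}, {0x0391, 0x03B1},
};

// Same idiom as the documented initial value: one past the last element.
const CaseMapping* const lowerMappingsEnd =
    lowerMappings + (sizeof(lowerMappings) / sizeof(lowerMappings[0]));

// Returns the mapped code point, or c unchanged if it has no entry.
std::uint32_t toLowerCodePoint(std::uint32_t c)
{
    const CaseMapping* it = std::lower_bound(lowerMappings, lowerMappingsEnd, c,
        [](const CaseMapping& m, std::uint32_t value) { return m.codePoint < value; });
    return (it != lowerMappingsEnd && it->codePoint == c) ? it->mapping : c;
}
```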

UInt8 OW_NAMESPACE::UTF8Utils::SequenceLengthTable[256] [static]

Initial value:

{
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
    3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
    4, 4, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0  
}

Definition at line 81 of file OW_UTF8Utils.cpp.

Referenced by StringToUCS2Common(), transform(), transformInPlace(), and UTF8toUCS4().
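Each entry gives the total length of the UTF-8 sequence that a lead byte starts, with 0 marking bytes that cannot start a character: continuation bytes 0x80 to 0xBF, 0xC0 and 0xC1 (which could only begin overlong forms), and 0xF5 to 0xFF (which would exceed U+10FFFF). The following sketch, with the hypothetical name makeSequenceLengthTable, rebuilds the same 256 values from those ranges.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rebuilds the contents of SequenceLengthTable from the lead-byte ranges it encodes.
std::array<std::uint8_t, 256> makeSequenceLengthTable()
{
    std::array<std::uint8_t, 256> table{};  // value-initialized: every entry starts at 0
    for (std::size_t b = 0; b < table.size(); ++b)
    {
        if (b <= 0x7F) table[b] = 1;                     // ASCII
        else if (b >= 0xC2 && b <= 0xDF) table[b] = 2;   // 0xC0/0xC1 stay 0 (overlong only)
        else if (b >= 0xE0 && b <= 0xEF) table[b] = 3;
        else if (b >= 0xF0 && b <= 0xF4) table[b] = 4;   // 0xF5+ stay 0 (beyond U+10FFFF)
        // 0x80..0xBF are continuation bytes and stay 0
    }
    return table;
}
```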

const CaseMapping OW_NAMESPACE::UTF8Utils::upperMappings[] [static]

Definition at line 1254 of file OW_UTF8Utils.cpp.

Referenced by toUpperCase(), and toUpperCaseInPlace().

const CaseMapping* const OW_NAMESPACE::UTF8Utils::upperMappingsEnd [static]

Initial value:

 upperMappings +
   (sizeof(upperMappings)/sizeof(upperMappings[0]))

Definition at line 2024 of file OW_UTF8Utils.cpp.

Referenced by toUpperCase(), and toUpperCaseInPlace().


Generated on Thu Feb 9 09:18:00 2006 for openwbem by doxygen 1.4.6