Objective Systems, Inc.  

UTF-8 String Functions

The UTF-8 string functions handle string operations on UTF-8 encoded strings.

Functions

long rtxUTF8ToUnicode (OSCTXT *pctxt, const OSUTF8CHAR *inbuf, OSUNICHAR *outbuf, size_t outbufsiz)
 This function converts a UTF-8 string to a Unicode string (UTF-16).
int rtxValidateUTF8 (OSCTXT *pctxt, const OSUTF8CHAR *inbuf)
 This function will validate a UTF-8 encoded string to ensure that it is encoded correctly.
size_t rtxUTF8Len (const OSUTF8CHAR *inbuf)
 This function will return the length (in characters) of a null-terminated UTF-8 encoded string.
size_t rtxUTF8LenBytes (const OSUTF8CHAR *inbuf)
 This function will return the length (in bytes) of a null-terminated UTF-8 encoded string.
int rtxUTF8CharSize (OS32BITCHAR wc)
 This function will return the number of bytes needed to encode the given 32-bit universal character value as a UTF-8 character.
int rtxUTF8EncodeChar (OS32BITCHAR wc, OSOCTET *buf, size_t bufsiz)
 This function will convert a wide character into an encoded UTF-8 character byte string.
int rtxUTF8DecodeChar (OSCTXT *pctxt, const OSUTF8CHAR *pinbuf, int *pInsize)
 This function will convert an encoded UTF-8 character byte string into a wide character value.
OS32BITCHAR rtxUTF8CharToWC (const OSUTF8CHAR *buf, OSUINT32 *len)
 This function will convert a UTF-8 encoded character value into a wide character.
OSUTF8CHAR * rtxUTF8StrChr (OSUTF8CHAR *utf8str, OS32BITCHAR utf8char)
 This function finds a character in the given UTF-8 character string.
OSUTF8CHAR * rtxUTF8Strdup (OSCTXT *pctxt, const OSUTF8CHAR *utf8str)
 This function creates a duplicate copy of the given UTF-8 character string.
OSUTF8CHAR * rtxUTF8Strndup (OSCTXT *pctxt, const OSUTF8CHAR *utf8str, size_t nbytes)
 This function creates a duplicate copy of the given UTF-8 character string.
OSBOOL rtxUTF8StrEqual (const OSUTF8CHAR *utf8str1, const OSUTF8CHAR *utf8str2)
 This function compares two UTF-8 string values for equality.
OSBOOL rtxUTF8StrnEqual (const OSUTF8CHAR *utf8str1, const OSUTF8CHAR *utf8str2, size_t count)
 This function compares two UTF-8 string values for equality.
int rtxUTF8Strcmp (const OSUTF8CHAR *utf8str1, const OSUTF8CHAR *utf8str2)
 This function compares two UTF-8 character strings and returns a trinary result (equal, less than, greater than).
int rtxUTF8Strncmp (const OSUTF8CHAR *utf8str1, const OSUTF8CHAR *utf8str2, size_t count)
 This function compares two UTF-8 character strings and returns a trinary result (equal, less than, greater than).
int rtxUTF8StrToInt (const OSUTF8CHAR *utf8str, OSINT32 *pvalue)
 This function converts the given null-terminated UTF-8 string to an integer value.
int rtxUTF8StrnToInt (const OSUTF8CHAR *utf8str, size_t nbytes, OSINT32 *pvalue)
 This function converts the given part of UTF-8 string to an integer value.

Detailed Description

The UTF-8 string functions handle string operations on UTF-8 encoded strings.

This is the default character string data type used for encoded XML data. UTF-8 strings are represented in C as strings of unsigned characters (bytes) to cover the full range of possible single character encodings.


Function Documentation

int rtxUTF8CharSize (OS32BITCHAR wc)

This function will return the number of bytes needed to encode the given 32-bit universal character value as a UTF-8 character.

Parameters:
wc 32-bit wide character value.
Returns:
Number of bytes needed to encode as UTF-8.
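
The size rule can be illustrated without the library. The following is a minimal sketch, assuming the RFC 3629 limit of U+10FFFF; the name sketch_utf8_char_size and the -1 error value are hypothetical and are not part of this API, whose behavior for larger values is not documented here.

```c
#include <stdint.h>

/* Hypothetical stand-in, not the library routine: returns the number of
 * bytes UTF-8 needs for code point wc, or -1 if wc is above U+10FFFF
 * (the RFC 3629 limit, assumed here). */
static int sketch_utf8_char_size(uint32_t wc)
{
    if (wc < 0x80u)     return 1;  /* up to 7 bits: ASCII */
    if (wc < 0x800u)    return 2;  /* up to 11 bits */
    if (wc < 0x10000u)  return 3;  /* up to 16 bits */
    if (wc <= 0x10FFFFu) return 4; /* up to 21 bits */
    return -1;
}
```

For example, U+0041 needs one byte, U+00E9 two, U+20AC three, and U+1F600 four.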

OS32BITCHAR rtxUTF8CharToWC (const OSUTF8CHAR *buf, OSUINT32 *len)

This function will convert a UTF-8 encoded character value into a wide character.

Parameters:
buf Pointer to UTF-8 character value.
len Pointer to integer to receive decoded size (in bytes) of the UTF-8 character value sequence.
Returns:
Converted wide character value.
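
As an illustration of what decoding one character involves, here is a minimal standalone sketch. The function sketch_utf8_to_wc, its 0xFFFFFFFF error value, and the choice to set the length to 0 on error are assumptions made for this example; the library's own error behavior is not described above. The sketch checks lead and continuation bytes but does not reject overlong forms.

```c
#include <stdint.h>

/* Hypothetical stand-in: decode the UTF-8 sequence starting at buf.
 * On success, stores the number of bytes consumed in *len and returns
 * the code point. On a malformed sequence, stores 0 in *len and
 * returns 0xFFFFFFFF (an assumed error convention). */
static uint32_t sketch_utf8_to_wc(const unsigned char *buf, unsigned *len)
{
    uint32_t wc;
    unsigned n, i;

    /* The lead byte gives the sequence length and the top payload bits. */
    if (buf[0] < 0x80u)                 { wc = buf[0];         n = 1; }
    else if ((buf[0] & 0xE0u) == 0xC0u) { wc = buf[0] & 0x1Fu; n = 2; }
    else if ((buf[0] & 0xF0u) == 0xE0u) { wc = buf[0] & 0x0Fu; n = 3; }
    else if ((buf[0] & 0xF8u) == 0xF0u) { wc = buf[0] & 0x07u; n = 4; }
    else { *len = 0; return 0xFFFFFFFFu; }

    /* Each continuation byte must look like 10xxxxxx and adds 6 bits.
     * A terminating null fails this check, so the loop never reads
     * past the end of a null-terminated string. */
    for (i = 1; i < n; i++) {
        if ((buf[i] & 0xC0u) != 0x80u) { *len = 0; return 0xFFFFFFFFu; }
        wc = (wc << 6) | (buf[i] & 0x3Fu);
    }

    *len = n;
    return wc;
}
```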

int rtxUTF8DecodeChar (OSCTXT *pctxt, const OSUTF8CHAR *pinbuf, int *pInsize)

This function will convert an encoded UTF-8 character byte string into a wide character value.

Parameters:
pctxt A pointer to a context structure.
pinbuf Pointer to UTF-8 byte sequence to be decoded.
pInsize Pointer to integer to receive the number of bytes that were consumed (i.e. the size of the character).
Returns:
32-bit wide character value.

int rtxUTF8EncodeChar (OS32BITCHAR wc, OSOCTET *buf, size_t bufsiz)

This function will convert a wide character into an encoded UTF-8 character byte string.

Parameters:
wc 32-bit wide character value.
buf Buffer to receive encoded UTF-8 character value.
bufsiz Size of the buffer to receive the encoded value.
Returns:
Completion status of operation:
  • 0 = success,
  • negative return value is error.
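
The encoding step itself can be sketched in standalone C. The function sketch_utf8_encode below is hypothetical; unlike the status convention documented above, it returns the number of bytes written so that the example is self-contained, and it assumes the RFC 3629 limit of U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: encode code point wc as UTF-8 into buf.
 * Returns the number of bytes written (1 to 4), or -1 if wc is above
 * U+10FFFF or buf is too small. No null terminator is written. */
static int sketch_utf8_encode(uint32_t wc, unsigned char *buf, size_t bufsiz)
{
    /* Lead-byte prefixes indexed by sequence length. */
    static const unsigned char lead[5] = { 0x00, 0x00, 0xC0, 0xE0, 0xF0 };
    int n, i;

    if (wc < 0x80u)           n = 1;
    else if (wc < 0x800u)     n = 2;
    else if (wc < 0x10000u)   n = 3;
    else if (wc <= 0x10FFFFu) n = 4;
    else return -1;

    if ((size_t)n > bufsiz) return -1;

    /* Fill continuation bytes from the end, 6 payload bits each. */
    for (i = n - 1; i > 0; i--) {
        buf[i] = (unsigned char)(0x80u | (wc & 0x3Fu));
        wc >>= 6;
    }
    /* The remaining high bits go into the lead byte. */
    buf[0] = (unsigned char)(lead[n] | wc);
    return n;
}
```

With this sketch, U+20AC (the euro sign) encodes to the three bytes E2 82 AC, and a two-byte buffer is rejected for that character.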

size_t rtxUTF8Len (const OSUTF8CHAR *inbuf)

This function will return the length (in characters) of a null-terminated UTF-8 encoded string.

Parameters:
inbuf A pointer to the null-terminated UTF-8 encoded string.
Returns:
Number of characters in the string. Note that this may differ from the number of bytes, because a UTF-8 character can span multiple bytes.
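
The difference between character count and byte count can be shown with a short standalone sketch. The function sketch_utf8_len is hypothetical and assumes the input is valid UTF-8; it counts every byte that is not a continuation byte (10xxxxxx), since each such byte starts a character.

```c
#include <stddef.h>

/* Hypothetical stand-in: count characters in a null-terminated,
 * valid UTF-8 string by counting bytes that start a character. */
static size_t sketch_utf8_len(const unsigned char *s)
{
    size_t n = 0;

    for (; *s != 0; s++) {
        if ((*s & 0xC0u) != 0x80u) {
            n++;  /* not a continuation byte, so a new character begins */
        }
    }
    return n;
}
```

For the string "na" followed by C3 AF and "ve" (the word "naïve"), this yields 5 characters although the string occupies 6 bytes.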

size_t rtxUTF8LenBytes (const OSUTF8CHAR *inbuf)

This function will return the length (in bytes) of a null-terminated UTF-8 encoded string.

Parameters:
inbuf A pointer to the null-terminated UTF-8 encoded string.
Returns:
Number of bytes in the string.

OSUTF8CHAR* rtxUTF8StrChr (OSUTF8CHAR *utf8str, OS32BITCHAR utf8char)

This function finds a character in the given UTF-8 character string.

It is similar to the C strchr function.

Parameters:
utf8str Null-terminated UTF-8 string to be searched.
utf8char 32-bit Unicode character to find.
Returns:
Pointer to the first occurrence of the character in the string, or NULL if the character is not found.
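
One way such a search can work is sketched below; it is not claimed to be the library's method. The hypothetical function sketch_utf8_strchr encodes the code point as UTF-8 and then searches for that byte sequence with the standard strstr function. In valid UTF-8, lead bytes and continuation bytes never overlap, so a byte-level match cannot start in the middle of another character. Returning NULL for a zero code point is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: find code point wc in the null-terminated
 * UTF-8 string s. Returns a pointer to its first byte, or NULL. */
static const unsigned char *sketch_utf8_strchr(const unsigned char *s,
                                               uint32_t wc)
{
    unsigned char needle[5];
    size_t n = 0;

    if (wc == 0 || wc > 0x10FFFFu) return NULL;

    /* Encode wc as a 1- to 4-byte UTF-8 sequence. */
    if (wc < 0x80u) {
        needle[n++] = (unsigned char)wc;
    } else if (wc < 0x800u) {
        needle[n++] = (unsigned char)(0xC0u | (wc >> 6));
        needle[n++] = (unsigned char)(0x80u | (wc & 0x3Fu));
    } else if (wc < 0x10000u) {
        needle[n++] = (unsigned char)(0xE0u | (wc >> 12));
        needle[n++] = (unsigned char)(0x80u | ((wc >> 6) & 0x3Fu));
        needle[n++] = (unsigned char)(0x80u | (wc & 0x3Fu));
    } else {
        needle[n++] = (unsigned char)(0xF0u | (wc >> 18));
        needle[n++] = (unsigned char)(0x80u | ((wc >> 12) & 0x3Fu));
        needle[n++] = (unsigned char)(0x80u | ((wc >> 6) & 0x3Fu));
        needle[n++] = (unsigned char)(0x80u | (wc & 0x3Fu));
    }
    needle[n] = '\0';

    /* Byte-level substring search for the encoded character. */
    return (const unsigned char *)strstr((const char *)s,
                                         (const char *)needle);
}
```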

int rtxUTF8Strcmp (const OSUTF8CHAR *utf8str1, const OSUTF8CHAR *utf8str2)

This function compares two UTF-8 character strings and returns a trinary result (equal, less than, greater than).

It is similar to the C strcmp function.

Parameters:
utf8str1 UTF-8 string to be compared.
utf8str2 UTF-8 string to be compared.
Returns:
-1 if utf8str1 is less than utf8str2, 0 if the two strings are equal, and +1 if utf8str1 is greater than utf8str2.
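
A trinary result of exactly -1, 0, or +1 can be obtained from the standard strcmp function, which only guarantees the sign of its result. The sketch below uses a hypothetical function, sketch_utf8_strcmp, and is not claimed to match the library's implementation. Because strcmp compares bytes as unsigned values, the ordering of two valid UTF-8 strings matches their ordering by code point.

```c
#include <string.h>

/* Hypothetical stand-in: compare two null-terminated UTF-8 strings
 * and normalize the result to -1, 0, or +1. */
static int sketch_utf8_strcmp(const unsigned char *a, const unsigned char *b)
{
    int r = strcmp((const char *)a, (const char *)b);

    return (r > 0) - (r < 0);  /* sign of r */
}
```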

OSUTF8CHAR* rtxUTF8Strdup (OSCTXT *pctxt, const OSUTF8CHAR *utf8str)

This function creates a duplicate copy of the given UTF-8 character string.

It is similar to the C strdup function. Memory for the duplicated string is allocated using the rtxMemAlloc function.

Parameters:
pctxt A pointer to a context structure.
utf8str Null-terminated UTF-8 string to be duplicated.
Returns:
Pointer to duplicated string value.

OSBOOL rtxUTF8StrEqual (const OSUTF8CHAR *utf8str1, const OSUTF8CHAR *utf8str2)

This function compares two UTF-8 string values for equality.

Parameters:
utf8str1 UTF-8 string to be compared.
utf8str2 UTF-8 string to be compared.
Returns:
TRUE if equal, FALSE if not.

int rtxUTF8Strncmp (const OSUTF8CHAR *utf8str1, const OSUTF8CHAR *utf8str2, size_t count)

This function compares two UTF-8 character strings and returns a trinary result (equal, less than, greater than).

In this case, a maximum count of the number of bytes to compare can be specified. It is similar to the C strncmp function.

Parameters:
utf8str1 UTF-8 string to be compared.
utf8str2 UTF-8 string to be compared.
count Number of bytes to compare.
Returns:
-1 if utf8str1 is less than utf8str2, 0 if the two strings are equal, and +1 if utf8str1 is greater than utf8str2.

OSUTF8CHAR* rtxUTF8Strndup (OSCTXT *pctxt, const OSUTF8CHAR *utf8str, size_t nbytes)

This function creates a duplicate copy of the given UTF-8 character string.

It is similar to the rtxUTF8Strdup function except that it allows the number of bytes to duplicate to be specified. Memory for the duplicated string is allocated using the rtxMemAlloc function.

Parameters:
pctxt A pointer to a context structure.
utf8str UTF-8 string to be duplicated.
nbytes Number of bytes from utf8str to duplicate.
Returns:
Pointer to duplicated string value.

OSBOOL rtxUTF8StrnEqual (const OSUTF8CHAR *utf8str1, const OSUTF8CHAR *utf8str2, size_t count)

This function compares two UTF-8 string values for equality.

It is similar to the rtxUTF8StrEqual function except that it allows the number of bytes to compare to be specified.

Parameters:
utf8str1 UTF-8 string to be compared.
utf8str2 UTF-8 string to be compared.
count Number of bytes to compare.
Returns:
TRUE if equal, FALSE if not.

int rtxUTF8StrnToInt (const OSUTF8CHAR *utf8str, size_t nbytes, OSINT32 *pvalue)

This function converts the given part of UTF-8 string to an integer value.

It is assumed the string contains only numeric digits and whitespace. It is similar to the C atoi function except that the result is returned as a separate argument and an error status value is returned if the conversion cannot be performed successfully.

Parameters:
utf8str UTF-8 string to convert. It need not be null-terminated.
nbytes Size in bytes of utf8str.
pvalue Pointer to integer to receive result.
Returns:
Status: 0 = OK, negative value = error
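
The "result through a pointer, status as return value" pattern described above can be sketched as follows. The function sketch_strn_to_int is hypothetical. Accepting an optional leading sign, rejecting input with no digits, and treating overflow of a 32-bit integer as an error are assumptions of this sketch, not documented behavior of the library.

```c
#include <stddef.h>
#include <stdint.h>

static int sketch_is_space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

/* Hypothetical stand-in: parse the first nbytes bytes of s as a
 * decimal integer. Returns 0 and stores the value in *pvalue on
 * success; returns -1 on error and leaves *pvalue untouched. */
static int sketch_strn_to_int(const unsigned char *s, size_t nbytes,
                              int32_t *pvalue)
{
    size_t i = 0, ndigits = 0;
    int neg = 0;
    int64_t acc = 0;

    while (i < nbytes && sketch_is_space(s[i])) i++;   /* leading space */

    if (i < nbytes && (s[i] == '+' || s[i] == '-')) {  /* optional sign */
        neg = (s[i] == '-');
        i++;
    }

    for (; i < nbytes && s[i] >= '0' && s[i] <= '9'; i++) {
        acc = acc * 10 + (s[i] - '0');
        if (acc > (int64_t)INT32_MAX + 1) return -1;   /* overflow */
        ndigits++;
    }

    while (i < nbytes && sketch_is_space(s[i])) i++;   /* trailing space */

    /* Fail if there were no digits or if unexpected bytes remain. */
    if (ndigits == 0 || i != nbytes) return -1;

    if (neg) acc = -acc;
    if (acc > INT32_MAX || acc < INT32_MIN) return -1;

    *pvalue = (int32_t)acc;
    return 0;
}
```

Because the length is passed explicitly, the sketch can convert a field in the middle of a larger buffer: the first four bytes of " 42 xyz" convert to 42 even though the buffer continues with non-numeric text.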

int rtxUTF8StrToInt (const OSUTF8CHAR *utf8str, OSINT32 *pvalue)

This function converts the given null-terminated UTF-8 string to an integer value.

It is assumed the string contains only numeric digits and whitespace. It is similar to the C atoi function except that the result is returned as a separate argument and an error status value is returned if the conversion cannot be performed successfully.

Parameters:
utf8str Null-terminated UTF-8 string to convert.
pvalue Pointer to integer to receive result.
Returns:
Status: 0 = OK, negative value = error

long rtxUTF8ToUnicode (OSCTXT *pctxt, const OSUTF8CHAR *inbuf, OSUNICHAR *outbuf, size_t outbufsiz)

This function converts a UTF-8 string to a Unicode string (UTF-16).

The Unicode string is stored as an array of 16-bit characters (unsigned short integers).

Parameters:
pctxt A pointer to a context structure.
inbuf UTF-8 string to convert.
outbuf Output buffer to receive converted Unicode data.
outbufsiz Size of the output buffer in bytes.
Returns:
Completion status of operation:
  • number of octets put in the output buffer,
  • negative return value is error.
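
The conversion from UTF-8 to 16-bit units can be sketched in standalone C as below. The function sketch_utf8_to_utf16 is hypothetical: it takes the output buffer size in bytes, as documented above, but returns the number of 16-bit units written, which is a choice made for this example. Encoding characters above U+FFFF as surrogate pairs follows the UTF-16 definition; whether the library does the same is not stated above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: convert the null-terminated UTF-8 string in
 * to UTF-16 in out. outbufsiz is the size of out in bytes. Returns the
 * number of 16-bit units written, or -1 on malformed input or if the
 * buffer is too small. No terminator is written. */
static long sketch_utf8_to_utf16(const unsigned char *in, uint16_t *out,
                                 size_t outbufsiz)
{
    size_t cap = outbufsiz / sizeof(uint16_t);
    size_t n = 0;

    while (*in != 0) {
        uint32_t wc;
        int extra, i;

        /* Decode one UTF-8 sequence. */
        if (*in < 0x80u)                 { wc = *in;         extra = 0; }
        else if ((*in & 0xE0u) == 0xC0u) { wc = *in & 0x1Fu; extra = 1; }
        else if ((*in & 0xF0u) == 0xE0u) { wc = *in & 0x0Fu; extra = 2; }
        else if ((*in & 0xF8u) == 0xF0u) { wc = *in & 0x07u; extra = 3; }
        else return -1;
        in++;

        for (i = 0; i < extra; i++, in++) {
            if ((*in & 0xC0u) != 0x80u) return -1;
            wc = (wc << 6) | (*in & 0x3Fu);
        }
        if (wc > 0x10FFFFu) return -1;

        if (wc >= 0x10000u) {
            /* Outside the 16-bit range: emit a surrogate pair. */
            if (n + 2 > cap) return -1;
            wc -= 0x10000u;
            out[n++] = (uint16_t)(0xD800u | (wc >> 10));
            out[n++] = (uint16_t)(0xDC00u | (wc & 0x3FFu));
        } else {
            if (n + 1 > cap) return -1;
            out[n++] = (uint16_t)wc;
        }
    }
    return (long)n;
}
```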

int rtxValidateUTF8 (OSCTXT *pctxt, const OSUTF8CHAR *inbuf)

This function will validate a UTF-8 encoded string to ensure that it is encoded correctly.

Parameters:
pctxt A pointer to a context structure.
inbuf A pointer to the null-terminated UTF-8 encoded string.
Returns:
Completion status of operation:
  • 0 = success,
  • negative return value is error.
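
To make "encoded correctly" concrete, the sketch below checks the rules of RFC 3629: well-formed lead and continuation bytes, no overlong forms, no surrogate code points, and nothing above U+10FFFF. The function sketch_validate_utf8 is hypothetical, and which of these checks the library performs is not stated above.

```c
#include <stdint.h>

/* Hypothetical stand-in: returns 0 if the null-terminated string s is
 * valid UTF-8 under RFC 3629, or -1 otherwise. */
static int sketch_validate_utf8(const unsigned char *s)
{
    while (*s != 0) {
        uint32_t wc, min;
        int extra, i;

        if (*s < 0x80u) { s++; continue; }  /* ASCII byte */

        /* min is the smallest code point that needs this length. */
        if ((*s & 0xE0u) == 0xC0u)      { wc = *s & 0x1Fu; extra = 1; min = 0x80u; }
        else if ((*s & 0xF0u) == 0xE0u) { wc = *s & 0x0Fu; extra = 2; min = 0x800u; }
        else if ((*s & 0xF8u) == 0xF0u) { wc = *s & 0x07u; extra = 3; min = 0x10000u; }
        else return -1;                     /* stray continuation or bad lead */
        s++;

        for (i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0u) != 0x80u) return -1;  /* truncated sequence */
            wc = (wc << 6) | (*s & 0x3Fu);
        }

        if (wc < min) return -1;                         /* overlong form */
        if (wc > 0x10FFFFu) return -1;                   /* out of range */
        if (wc >= 0xD800u && wc <= 0xDFFFu) return -1;   /* surrogate */
    }
    return 0;
}
```

Under these rules the bytes C0 80 (an overlong encoding of zero) and ED A0 80 (the surrogate U+D800) are both rejected, while "caf" followed by C3 A9 is accepted.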


This file was last modified on 8 Jan 2007.
XBinder, Version 1.1.9