Character String Types

All 8-bit character string types are derived from the C character pointer (const char*) base type. This pointer holds a null-terminated C string for encoding and decoding. For encoding, the string can be either static (i.e., a string literal or the address of a static buffer) or dynamic. The decoder allocates dynamic memory from within its context to hold the decoded string. This memory is released when the rtxMemFree function is called.
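The static and dynamic cases for encoding can be sketched as follows. This is only an illustration: the EmployeeName typedef and both helper functions are hypothetical, and plain malloc stands in for the run-time's context-based allocation so that the fragment compiles without the ASN1C headers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a typedef that might be generated for an
   8-bit character string type; not taken from real generated code. */
typedef const char* EmployeeName;

/* Static case: point at a string literal. Nothing is copied and
   nothing needs to be freed. */
EmployeeName nameFromLiteral(void)
{
   return "John Smith";
}

/* Dynamic case: copy the source into heap memory. malloc is an
   assumption standing in for the run-time allocator; the caller
   must release the copy with free(). Returns NULL on failure. */
char* nameFromDynamic(const char* src)
{
   size_t len;
   char* copy;

   assert(src != NULL);
   len = strlen(src);
   copy = (char*)malloc(len + 1);
   if (copy != NULL) {
      memcpy(copy, src, len + 1);   /* includes the null terminator */
   }
   return copy;
}
```

In both cases the value assigned to the string variable is an ordinary pointer to a null-terminated string; only the ownership of the underlying memory differs.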

The useful character string types in ASN.1 are as follows:

   UTF8String        ::=  [UNIVERSAL 12]  IMPLICIT OCTET STRING
   NumericString     ::=  [UNIVERSAL 18]  IMPLICIT IA5String
   PrintableString   ::=  [UNIVERSAL 19]  IMPLICIT IA5String
   T61String         ::=  [UNIVERSAL 20]  IMPLICIT OCTET STRING
   VideotexString    ::=  [UNIVERSAL 21]  IMPLICIT OCTET STRING
   IA5String         ::=  [UNIVERSAL 22]  IMPLICIT OCTET STRING
   UTCTime           ::=  [UNIVERSAL 23]  IMPLICIT GeneralizedTime
   GeneralizedTime   ::=  [UNIVERSAL 24]  IMPLICIT IA5String
   GraphicString     ::=  [UNIVERSAL 25]  IMPLICIT OCTET STRING
   VisibleString     ::=  [UNIVERSAL 26]  IMPLICIT OCTET STRING
   GeneralString     ::=  [UNIVERSAL 27]  IMPLICIT OCTET STRING
   UniversalString   ::=  [UNIVERSAL 28]  IMPLICIT OCTET STRING
   BMPString         ::=  [UNIVERSAL 30]  IMPLICIT OCTET STRING
   ObjectDescriptor  ::=  [UNIVERSAL 7]   IMPLICIT GraphicString

Of these, all are represented by const char* pointers except for the BMPString, UniversalString, and UTF8String types.

The BMPString type is a 16-bit character string for which the following structure is used:

   typedef struct {
      OSUINT32 nchars;
      OSUNICHAR* data;
   } Asn116BitCharString;

The OSUNICHAR type used in this definition represents a Unicode character (UTF-16) and is defined to be a C unsigned short type.

See the rtBMPToCString, rtBMPToNewCString, and rtCToBMPString run-time function descriptions for information on utilities that convert standard C strings to and from BMP string format.
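To show how the two fields of the structure relate, the following sketch fills an Asn116BitCharString by hand from a C string. The fillBMPFromCString helper is hypothetical and is not one of the run-time functions named above; the typedefs are local stand-ins so the fragment compiles on its own, and the OSUINT32 definition in particular is an assumption. It assumes the source characters are ASCII or Latin-1, where each 8-bit value equals its Unicode code point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins so the sketch compiles without the ASN1C headers. */
typedef unsigned int   OSUINT32;    /* assumed definition */
typedef unsigned short OSUNICHAR;

typedef struct {
   OSUINT32 nchars;
   OSUNICHAR* data;
} Asn116BitCharString;

/* Hypothetical helper (not a run-time library function): widen each
   8-bit character of cstr into buf and point bmp at the result.
   Returns 0 on success or -1 if buf holds fewer than strlen(cstr)
   elements. */
int fillBMPFromCString(Asn116BitCharString* bmp, const char* cstr,
                       OSUNICHAR* buf, size_t bufsize)
{
   size_t i, len;

   assert(bmp != NULL && cstr != NULL && buf != NULL);
   len = strlen(cstr);
   if (len > bufsize) return -1;

   for (i = 0; i < len; i++) {
      /* Go through unsigned char so that bytes >= 0x80 are not
         sign-extended on platforms where char is signed. */
      buf[i] = (OSUNICHAR)(unsigned char)cstr[i];
   }
   bmp->nchars = (OSUINT32)len;   /* length is carried here, so no */
   bmp->data = buf;               /* terminator is stored in data  */
   return 0;
}
```

Note that the string is not null-terminated: the nchars field alone determines how many elements of data are valid.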

The UniversalString type is a 32-bit character string for which the following structure is used:

   typedef struct {
      OSUINT32 nchars;
      OS32BITCHAR* data;
   } Asn132BitCharString;

The OS32BITCHAR type used in this definition is defined to be a C unsigned int type.

See the rtUCSToCString, rtUCSToNewCString, and rtCToUCSString run-time function descriptions for information on utilities that convert standard C strings to and from Universal Character Set (UCS-4) string format. See also the rtUCSToWCSString and rtWCSToUCSString function descriptions for information on utilities that convert standard wide character strings to and from the UniversalString type.
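Going in the other direction, a 32-bit string can be narrowed to a C string only when its characters fit in 8 bits. The sketch below illustrates the idea with a hypothetical ucsToAscii helper that substitutes '?' for anything outside the ASCII range. It is not the library's rtUCSToCString, whose actual signature and behavior are given in the run-time reference; the typedefs are local stand-ins, and the OSUINT32 definition is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins so the sketch compiles without the ASN1C headers. */
typedef unsigned int OSUINT32;      /* assumed definition */
typedef unsigned int OS32BITCHAR;

typedef struct {
   OSUINT32 nchars;
   OS32BITCHAR* data;
} Asn132BitCharString;

/* Hypothetical helper (not a run-time library function): copy a
   32-bit character string into a null-terminated C string, replacing
   any code point above 0x7F with '?'. Returns the number of
   characters written (excluding the terminator), or -1 if out is
   too small. */
int ucsToAscii(const Asn132BitCharString* ucs, char* out, size_t outsize)
{
   OSUINT32 i;

   assert(ucs != NULL && out != NULL);
   /* Need room for nchars characters plus the null terminator. */
   if (outsize == 0 || ucs->nchars > outsize - 1) return -1;

   for (i = 0; i < ucs->nchars; i++) {
      OS32BITCHAR c = ucs->data[i];
      out[i] = (c <= 0x7F) ? (char)c : '?';
   }
   out[ucs->nchars] = '\0';
   return (int)ucs->nchars;
}
```

The substitution character is a design choice made for this sketch; an application that must not lose data would report an error instead.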

The UTF8String type is represented as a string of unsigned characters using the OSUTF8CHAR data type, which is defined to be unsigned char. Because the type is unsigned, byte values in the upper range (0x80 through 0xFF), which occur in multibyte UTF-8 sequences, are handled as positive numbers. The contents of the string are assumed to be the UTF-8 encoding of a character string. For the most part, standard C character string functions such as strcpy and strcat can be used with these strings, provided the pointers are cast to and from char*.
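The type casting mentioned above might look like the following sketch. The three helper functions are hypothetical wrappers written for illustration, and OSUTF8CHAR is redefined locally so the fragment compiles on its own.

```c
#include <assert.h>
#include <string.h>

/* Local stand-in so the sketch compiles without the ASN1C headers. */
typedef unsigned char OSUTF8CHAR;

/* Hypothetical helper: length in bytes (not characters), using
   strlen with a cast. */
size_t utf8ByteLen(const OSUTF8CHAR* s)
{
   assert(s != NULL);
   return strlen((const char*)s);
}

/* Hypothetical helper: copy using strcpy with casts. The caller
   must ensure dst has room for the bytes plus the terminator. */
OSUTF8CHAR* utf8Copy(OSUTF8CHAR* dst, const OSUTF8CHAR* src)
{
   assert(dst != NULL && src != NULL);
   return (OSUTF8CHAR*)strcpy((char*)dst, (const char*)src);
}

/* Hypothetical helper: count characters by skipping continuation
   bytes, which have the bit pattern 10xxxxxx. Assumes the input is
   already valid UTF-8. */
size_t utf8CharCount(const OSUTF8CHAR* s)
{
   size_t n = 0;

   assert(s != NULL);
   for (; *s != 0; s++) {
      if ((*s & 0xC0) != 0x80) n++;
   }
   return n;
}
```

For a string containing non-ASCII characters the byte length and the character count differ, which is why strlen alone cannot be used to count characters.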

Utility functions are provided for working with UTF-8 string data. The UTF-8 encoding for a standard ASCII string is simply the string itself. For Unicode strings represented in C/C++ using the wide character type (wchar_t), the run-time functions rtxUTF8ToWCS and rtxWCSToUTF8 can be used for converting to and from UTF-8 format. The function rtxValidateUTF8 can be used to ensure that a given UTF-8 encoding is valid. See the C/C++ Run-Time Library Reference Manual for a complete description of these functions.
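To give a sense of what validating a UTF-8 encoding involves, here is a minimal stand-alone sketch. The isValidUTF8 function is hypothetical and is not the library's rtxValidateUTF8, whose signature and exact checks are described in the reference manual; this version simply applies the well-formedness rules of UTF-8 (correct continuation bytes, no overlong forms, no surrogates, nothing above U+10FFFF).

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in so the sketch compiles without the ASN1C headers. */
typedef unsigned char OSUTF8CHAR;

/* Hypothetical helper (not rtxValidateUTF8): returns 1 if the
   null-terminated string s is well-formed UTF-8, otherwise 0. */
int isValidUTF8(const OSUTF8CHAR* s)
{
   assert(s != NULL);
   while (*s != 0) {
      unsigned long cp;    /* decoded code point                  */
      unsigned long min;   /* smallest value this length may hold */
      int ntrail, i;       /* number of continuation bytes        */

      if (*s < 0x80)                { cp = *s;        ntrail = 0; min = 0; }
      else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; ntrail = 1; min = 0x80; }
      else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; ntrail = 2; min = 0x800; }
      else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; ntrail = 3; min = 0x10000; }
      else return 0;   /* stray continuation byte or invalid lead byte */
      s++;

      for (i = 0; i < ntrail; i++, s++) {
         /* Also rejects a sequence cut short by the terminator. */
         if ((*s & 0xC0) != 0x80) return 0;
         cp = (cp << 6) | (unsigned long)(*s & 0x3F);
      }

      if (cp < min) return 0;                       /* overlong form   */
      if (cp >= 0xD800 && cp <= 0xDFFF) return 0;   /* surrogate range */
      if (cp > 0x10FFFF) return 0;                  /* beyond Unicode  */
   }
   return 1;
}
```

A pure ASCII string passes trivially, since every byte is below 0x80 and is its own encoding.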