Thoughts on the current state of ASN.1 and XML technologies.

Archive for category ASN1C

Improved TBCD and BCD Support for Java/C#

Our next release of ASN1C (v7.3) will include improved support for TBCD and BCD strings for Java and C#.  In short, for these types, the toString/ToString function will return the TBCD/BCD interpretation of the octets rather than merely their hexadecimal representation.  This also improves the output of the print functionality.

TBCD stands for Telephony Binary-Coded Decimal, and BCD stands for Binary-Coded Decimal.  So, what are TBCD and BCD strings?  They are OCTET STRINGs in which a series of digits or telephony digits is encoded using one nibble (4 bits) per digit.  There isn’t an authoritative definition, but a few standards provide definitions of TBCD or BCD, and these are what we’ve followed, as described below.

As you will see in the following descriptions, TBCD and BCD strings are similar.  The differences are 1) the set of digit characters and 2) the ordering of the nibbles within the bytes.

TBCD Strings

For TBCD, we follow 3GPP 29.002.  This is also the document that happens to be referenced in the section on TBCD in the Wikipedia entry for BCD.  Here is how 29.002 defines TBCD:

TBCD-STRING ::= OCTET STRING
-- This type (Telephony Binary Coded Decimal String) is used to
-- represent several digits from 0 through 9, *, #, a, b, c, two
-- digits per octet, each digit encoded 0000 to 1001 (0 to 9),
-- 1010 (*), 1011 (#), 1100 (a), 1101 (b) or 1110 (c); 1111 used
-- as filler when there is an odd number of digits.
-- bits 8765 of octet n encoding digit 2n
-- bits 4321 of octet n encoding digit 2(n-1) +1

To summarize the characteristics of 3GPP 29.002 TBCD-STRING:
• Uses characters 0-9,*,#,a-c.
• F nibble is used as filler when there is an odd number of digits.
• The low nibble contains the first digit.
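To make the nibble ordering concrete, here is a minimal sketch of decoding a 29.002 TBCD-STRING into a C string. The function tbcd_to_string is a hypothetical helper written for this illustration, not the ASN1C implementation. It assumes a strict reading of the definition in which the F filler may appear only in the high nibble of the last octet.

```c
#include <stddef.h>

/* Characters for nibble values 0x0-0xE per 3GPP 29.002; 0xF is filler. */
static const char TBCD_DIGITS[] = "0123456789*#abc";

/*
 * Hypothetical helper: decodes a TBCD-STRING into a null-terminated string.
 * Within each octet the low nibble holds the first digit and the high
 * nibble the second.  Returns the number of digits written, or -1 if
 * out is too small (it needs 2 * len + 1 bytes) or a filler nibble
 * appears anywhere other than the high nibble of the last octet.
 */
int tbcd_to_string(const unsigned char *octets, size_t len,
                   char *out, size_t outsize)
{
    size_t i, n = 0;

    if (out == NULL || outsize < 2 * len + 1)
        return -1;

    for (i = 0; i < len; i++) {
        unsigned lo = octets[i] & 0x0Fu;
        unsigned hi = (octets[i] >> 4) & 0x0Fu;

        if (lo == 0x0Fu)
            return -1;              /* filler never occupies the low nibble */
        out[n++] = TBCD_DIGITS[lo];

        if (hi == 0x0Fu) {
            if (i != len - 1)
                return -1;          /* filler is only valid in the last octet */
            break;
        }
        out[n++] = TBCD_DIGITS[hi];
    }
    out[n] = '\0';
    return (int) n;
}
```

For example, the digits 12345 are encoded as the octets 0x21 0x43 0xF5; the function returns 5 and drops the trailing filler nibble.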

Note that 3GPP 24.008 §10.5.4.7 “Called party BCD number” specifies the same encoding, though it simply refers to it as “BCD”.  24.008 also calls for BCD in §10.5.1.4 “Mobile Identity” (for the IMSI, IMEI and IMEISV), (presumably) meaning BCD as defined in §10.5.4.7, i.e., TBCD.

BCD Strings

For BCD, we have followed the TAP3 (GSM TD.57) specification of BCD.  Here is how they define BCD:

-- The BCDString data type (Binary Coded Decimal String) is used to represent 
-- several digits from 0 through 9, a, b, c, d, e. 
-- Two digits are encoded per octet. The four leftmost bits of the octet represent 
-- the first digit while the four remaining bits represent the following digit. 
-- A single f must be used as a filler when the total number of digits to be 
-- encoded is odd. 
-- No other filler is allowed.

To summarize the characteristics of TAP3 BCDString:

• Uses characters 0-9,a-e.
• F nibble is used as filler when there is an odd number of digits.
• The high nibble contains the first digit.
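The TAP3 ordering can be sketched the same way. bcd_to_string is a hypothetical helper written for this illustration, not ASN1C's implementation. Compared with a TBCD decoder, only the digit table and the nibble that is read first change, and it assumes that the single f filler may appear only in the low nibble of the last octet.

```c
#include <stddef.h>

/* Characters for nibble values 0x0-0xE per TAP3 BCDString; 0xF is filler. */
static const char BCD_DIGITS[] = "0123456789abcde";

/*
 * Hypothetical helper: decodes a TAP3 BCDString into a null-terminated
 * string.  Within each octet the high nibble holds the first digit and
 * the low nibble the second.  Returns the number of digits written, or
 * -1 if out is too small (it needs 2 * len + 1 bytes) or a filler
 * nibble appears anywhere other than the low nibble of the last octet.
 */
int bcd_to_string(const unsigned char *octets, size_t len,
                  char *out, size_t outsize)
{
    size_t i, n = 0;

    if (out == NULL || outsize < 2 * len + 1)
        return -1;

    for (i = 0; i < len; i++) {
        unsigned hi = (octets[i] >> 4) & 0x0Fu;
        unsigned lo = octets[i] & 0x0Fu;

        if (hi == 0x0Fu)
            return -1;              /* filler never occupies the high nibble */
        out[n++] = BCD_DIGITS[hi];

        if (lo == 0x0Fu) {
            if (i != len - 1)
                return -1;          /* "A single f must be used as a filler" */
            break;
        }
        out[n++] = BCD_DIGITS[lo];
    }
    out[n] = '\0';
    return (int) n;
}
```

With this ordering, the digits 12345 are encoded as 0x12 0x34 0x5F, whereas 29.002 TBCD encodes the same digits as 0x21 0x43 0xF5.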

ITU-T Q.825 TBCD-STRING

Q.825 was another candidate for a definition of TBCD strings.  At this point, we haven’t added special support for it.  This section merely points out the differences between Q.825 TBCD-STRING and 3GPP 29.002 TBCD-STRING.  The differences are:

  • In Q.825, TBCD-STRING is defined as part of an OCTET STRING, not as a standalone type.  Prior to the TBCD-STRING content, the OCTET STRING contains an odd/even indicator octet, and, in some cases, another octet.
  • Q.825 orders the nibbles differently.
  • Q.825 uses the F nibble to mark the end of the TBCD string (“end of pulsing signal-ST”)
  • Q.825 uses the 0 nibble as filler when there is an odd number of digits.  So, in some cases, a zero nibble is merely filler, but in other cases it is a ‘0’ digit.  [Note: we’re not sure why Q.825 specifies that filler is required when there is an odd number of digits; it seems it should be required when there is an even number of digits.  By our reading, “123” would map to 0x123F (no filler), while “1234” would map to 0x12340F (filler).]


Using -prttostrm (print-to-stream) vs -prttostr (print-to-string) in generated C code

The capability to print the contents of binary-encoded data in human-readable form has always been a frequently used feature of our code generation products.  From the beginning, we have had the standard -print code generation option, which generates code that prints the contents of generated data structures to standard output.  These functions are straightforward and simple to use.  But it was not long before users wanted a way to print to other media, most commonly to a text buffer (i.e., a character array), so that the printed data could be sent elsewhere, for example, to a display window within a GUI.

For this, we added the “print-to-string” capability, enabled with the -prttostr command-line option.  The generated functions accept a text buffer and its size to receive the printed data.

This worked well enough for printing small amounts of data.  But we found that users frequently wanted to print very large data structures this way, and this led to very slow performance.  The primary reason was that, in order to append to the buffer that was passed, the end of the string in the buffer had to be found, and lacking any other state information, the only way to do this was to call the string length (strlen) run-time function.  Because strlen scans from the start of the string every time, doing this over and over on a buffer containing a very large string quickly became compute-intensive: the total work grows roughly quadratically with the amount of printed output.

To remedy this, we introduced the “print-to-stream” capability.  This provides more flexibility in printing, because a user-defined callback function can be declared that is invoked to handle the printing of each individual data item.  Users can pass their own data structures to the callback function, making it possible to maintain state between calls.  One example is keeping track of where the end of a string buffer is after each print operation, which makes appending additional data much faster.
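The cost difference between the two approaches can be sketched in plain C. append_rescan and append_tracked are hypothetical helpers for illustration, not ASN1C run-time functions. The first finds the end of the text with strlen on every call, as a print-to-string function must. The second keeps the end offset in a caller-supplied structure, as a print-to-stream callback can.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Print-to-string style: the end of the text is rediscovered with
   strlen() on every call, so each append costs time proportional to
   everything printed so far. */
void append_rescan(char *buf, size_t bufsize, const char *fmt, ...)
{
    size_t end = strlen(buf);
    va_list args;

    if (end + 1 >= bufsize)
        return;                              /* buffer already full */
    va_start(args, fmt);
    vsnprintf(buf + end, bufsize - end, fmt, args);
    va_end(args);
}

/* Print-to-stream style: the end offset is kept in caller-owned state. */
typedef struct {
    char  *buf;
    size_t bufsize;
    size_t end;                              /* index of the terminating null */
} TrackedBuf;

void append_tracked(TrackedBuf *tb, const char *fmt, ...)
{
    va_list args;

    if (tb->end + 1 >= tb->bufsize)
        return;                              /* buffer already full */
    va_start(args, fmt);
    vsnprintf(tb->buf + tb->end, tb->bufsize - tb->end, fmt, args);
    va_end(args);
    tb->end += strlen(tb->buf + tb->end);    /* scans only the new text */
}
```

Both produce the same text, but append_rescan rescans everything printed so far on each call, whereas append_tracked only scans the characters it has just written.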

We declared “print-to-string” deprecated in favor of the new “print-to-stream” capability, but 15 years down the road we find that print-to-string is still being used.  Possible reasons include 1) that is the way some users started doing it and they did not want to change, and 2) these functions are simpler to use, since all that is necessary is to pass a string buffer directly to the function instead of having to design a callback function.

To address the latter point, in our next ASN1C release (due next year), we will provide a built-in callback function that can be used to model print-to-string using print-to-stream.  If users don’t want to wait that long, the code for the new callback is shown below:

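#include <stdarg.h>  /* va_list */
#include <stdio.h>   /* vsnprintf */
#include <string.h>  /* strlen */

/* Appends each formatted print item to the OSRTStrBuf passed as pPrntStrmInfo */
void rtxPrintStreamToStringCB
(void* pPrntStrmInfo, const char* fmtspec, va_list arglist)
{
   OSRTStrBuf* bufp = (OSRTStrBuf*) pPrntStrmInfo;
   if (bufp->bufsize > bufp->endx) {  /* room left after the current text? */
      /* vsnprintf writes at most the remaining space (including the null),
         so output that does not fit is truncated, never overrun */
      vsnprintf (&bufp->strbuf[bufp->endx], bufp->bufsize - bufp->endx,
                 fmtspec, arglist);
      /* Advance endx past only the text just written */
      bufp->endx += strlen (&bufp->strbuf[bufp->endx]);
   }
}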

The OSRTStrBuf structure is defined as follows:

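typedef struct {
   char* strbuf;    /* character buffer that receives the printed text */
   OSSIZE bufsize;  /* total size of strbuf in bytes */
   OSSIZE endx;     /* index of the terminating null (end of the text so far) */
} OSRTStrBuf;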

The code can then be inserted into a C program to print a populated type structure:

OSRTStrBuf strBufDescr;
char strbuf[10240];
...
/* Set up print-to-stream callback to write to character string buffer */
strBufDescr.strbuf = strbuf;
strBufDescr.bufsize = sizeof(strbuf);
strBufDescr.endx = 0;
strbuf[0] = '\0';   /* start with an empty string in case nothing is printed */

rtxSetPrintStream (&ctxt, &rtxPrintStreamToStringCB, (void*)&strBufDescr);

asn1PrtToStrm_<type> (&ctxt, "Data", &data);

In this code snippet, <type> would be replaced with the name of the type to be printed. After the call, the printed data will be in the strbuf character array. Be sure to create an array large enough to hold all of the printed data; if it is not large enough, the data will be truncated.
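The callback mechanism, and the truncation behavior, can be tried without the ASN1C run-time. In this sketch, PrintStreamCB, StrBuf and demo_print are hypothetical stand-ins for the run-time's callback type, OSRTStrBuf and the generated print functions; only the body of to_string_cb mirrors rtxPrintStreamToStringCB.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the run-time's print-stream callback type. */
typedef void (*PrintStreamCB)(void *info, const char *fmtspec, va_list arglist);

/* Hypothetical stand-in for OSRTStrBuf. */
typedef struct {
    char  *strbuf;
    size_t bufsize;
    size_t endx;
} StrBuf;

/* Same logic as rtxPrintStreamToStringCB: append at endx, never overrun. */
static void to_string_cb(void *info, const char *fmtspec, va_list arglist)
{
    StrBuf *bufp = (StrBuf *) info;
    if (bufp->bufsize > bufp->endx) {
        vsnprintf(&bufp->strbuf[bufp->endx], bufp->bufsize - bufp->endx,
                  fmtspec, arglist);
        bufp->endx += strlen(&bufp->strbuf[bufp->endx]);
    }
}

/* Hypothetical dispatcher playing the role of the generated print code:
   it packages its variable arguments into a va_list for the callback. */
void demo_print(PrintStreamCB cb, void *info, const char *fmtspec, ...)
{
    va_list arglist;
    va_start(arglist, fmtspec);
    cb(info, fmtspec, arglist);
    va_end(arglist);
}
```

With an 8-byte buffer, printing "Data = {" followed by " 42 }" leaves only "Data = " (7 characters plus the terminating null) in the buffer; the remaining output is silently dropped.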


Re-using Decoded Items in a Subsequent Encoding

This blog post attempts to provide advice about re-using items from a decoded message in a subsequent encoding of a different message.

Let’s look at the employee sample in the c/sample_ber/employee directory of the ASN1C SDK. The ASN.1 specification for this sample is fairly simple and looks like this:

<code>
Employee DEFINITIONS ::= BEGIN
EXPORTS;

PersonnelRecord ::= [APPLICATION 0] IMPLICIT SET {
   Name,
   title [0] IA5String,
   number EmployeeNumber,
   dateOfHire [1] Date,
   nameOfSpouse [2] Name,
   children [3] IMPLICIT SEQUENCE OF ChildInformation
}

ChildInformation ::= SET {
   Name,
   dateOfBirth [0] Date
}

Name ::= [APPLICATION 1] IMPLICIT SEQUENCE {
   givenName IA5String,
   initial IA5String,
   familyName IA5String
}

EmployeeNumber ::= [APPLICATION 2] IMPLICIT INTEGER

Date ::= IA5String

END
</code>

Now, suppose the ChildInformation piece and the Name piece need to be used in a different message, called a FamilyRecord, that is going to be encoded after a PersonnelRecord message is decoded. We can change the ASN.1 specification that defines the PersonnelRecord so it looks like this:

<code>
Employee DEFINITIONS AUTOMATIC TAGS ::= BEGIN

EXPORTS;

IMPORTS Name, ChildInformation FROM Common;

PersonnelRecord ::= SET {
   employeeName Name,
   title IA5String,
   number EmployeeNumber,
   dateOfHire Date,
   nameOfSpouse Name,
   children SEQUENCE OF ChildInformation
}

EmployeeNumber ::= INTEGER

Date ::= IA5String

END
</code>

Now, instead of defining ChildInformation and Name itself, the specification imports them from a module named Common. The other changes are that we give the employee’s Name component an explicit element name, employeeName, just to make things neater, and we use AUTOMATIC TAGS in place of the hand-assigned tags for sanity preservation.

The Common module would look like this:

<code>
Common DEFINITIONS AUTOMATIC TAGS ::= BEGIN

EXPORTS ChildInformation, Name;

ChildInformation ::= SET {
   childName Name,
   dateOfBirth Date
}

Name ::= SEQUENCE {
   givenName IA5String,
   initial IA5String,
   familyName IA5String
}

Date ::= IA5String

END
</code>

And we also need to create a specification that defines FamilyRecord:

<code>
Family DEFINITIONS AUTOMATIC TAGS ::= BEGIN

EXPORTS;

IMPORTS Name, ChildInformation FROM Common;

FamilyRecord ::= SET {
   nameOfSpouse Name,
   ageOfSpouse INTEGER (18..MAX),
   children SEQUENCE OF ChildInformation
}

END
</code>

So we have a module named Common that defines ChildInformation and Name. And we have two other modules, named Employee (which defines the PersonnelRecord PDU) and Family (which defines the FamilyRecord PDU) that both make use of these two definitions in the Common module.

Now, suppose we need to write some C code that decodes a PersonnelRecord and uses the name of the employee’s spouse and the information about the employee’s children in a new encoding of a FamilyRecord. Below is a complete C program that can accomplish this. The sections that I’m going to talk about in a little more detail are indicated with numbers in square brackets, e.g., [1], [2], [3], etc.

<code>
/*
This program does the following:
Reads and decodes an already-encoded PersonnelRecord message.
Uses pieces of that decoded PersonnelRecord to populate the structures for a new FamilyRecord message.
Encodes that FamilyRecord message.
*/

#include <stdio.h>
#include "Employee.h"
#include "Family.h"
#include "rtxsrc/rtxDiag.h"
#include "rtxsrc/rtxFile.h"

#define MAXMSGLEN (1024)

int main()
{
   PersonnelRecord tEmployee;
   FamilyRecord tFamily;
   OSCTXT tDecodeContext, tEncodeContext;

   /* Receives the encoded (i.e., not yet decoded) PersonnelRecord message from the EmployeeMessage.dat file */
   OSOCTET* pachEmployeeMessage;

   /* Receives the encoded FamilyRecord message from this program's encode call */
   OSOCTET achFamilyMessage[MAXMSGLEN];

   /* Receives a pointer to the encoded FamilyRecord message in order to print it and then write it to a file */
   OSOCTET *pachFamilyMessage;

   OSSIZE iLength;
   int iStatus;
   FILE* ptOutputFile;
   const char szInputFileName[] = "EmployeeMessage.dat";
   const char szOutputFileName[] = "FamilyMessage.dat";
   OSBOOL bTrace = TRUE, bVerbose = FALSE;

   /* Initialize the context structure for the decoding. */
   if (rtInitContext (&tDecodeContext) != 0) { /* [1] */
      printf ("Error initializing decode context\n");
      return -1;
   }
   rtxSetDiag (&tDecodeContext, bVerbose);

   /* Read the input file into a memory buffer. */
   iStatus = rtxFileReadBinary (&tDecodeContext, szInputFileName, &pachEmployeeMessage, &iLength); /* [2] */
   if (0 != iStatus) {
      printf ("Error opening %s for read access\n", szInputFileName);
      return -1;
   }
   iStatus = xd_setp64 (&tDecodeContext, pachEmployeeMessage, iLength, 0, 0, 0);
   if (0 != iStatus) {
      rtxErrPrint (&tDecodeContext);
      return iStatus;
   }

   /* Clear the structures that will receive the decoded message. */
   asn1Init_PersonnelRecord (&tEmployee);

   /* Decode the PersonnelRecord message. */
   iStatus = asn1D_PersonnelRecord (&tDecodeContext, &tEmployee, ASN1EXPL, 0); /* [3] */
   if (0 == iStatus) {
      if (bTrace) {
         printf ("Decode of PersonnelRecord was successful\n");
         printf ("Decoded record:\n");
         asn1Print_PersonnelRecord ("Employee", &tEmployee);
      }
   }
   else {
      printf ("decode of PersonnelRecord failed\n");
      rtxErrPrint (&tDecodeContext);
      return -1;
   }

   /* Now use the spouse's name and the children's names in a new FamilyRecord message. */

   /* Initialize the context structure for the encoding. */
   iStatus = rtInitContext (&tEncodeContext); /* [4] */
   if (0 != iStatus) {
      printf ("encoding context initialization failed\n");
      rtxErrPrint (&tEncodeContext);
      return iStatus;
   }
   rtxSetDiag (&tEncodeContext, bVerbose);

   /* Populate the structures for the FamilyRecord message. */ /* [5] */
   tFamily.nameOfSpouse = tEmployee.nameOfSpouse;
   tFamily.ageOfSpouse = 30;
   tFamily.children = tEmployee.children;

   /* Encode the FamilyRecord message. */
   xe_setp (&tEncodeContext, achFamilyMessage, sizeof(achFamilyMessage));
   iStatus = asn1E_FamilyRecord (&tEncodeContext, &tFamily, ASN1EXPL); /* [6] */
   if (iStatus > 0) {
      /* A positive return value is the encoded length; a negative one is an
         error status, so it is checked as an int before storing it in OSSIZE. */
      iLength = (OSSIZE) iStatus;
      pachFamilyMessage = xe_getp (&tEncodeContext);
      if (bTrace) {
         if (XU_DUMP (pachFamilyMessage) != 0)
            printf ("dump of ASN.1 message failed.\n");
      }
   }
   else {
      rtxErrPrint (&tEncodeContext);
      return iStatus;
   }

   /* Write the encoded message out to the output file */ /* [7] */

   if (0 != (ptOutputFile = fopen (szOutputFileName, "wb"))) {
      fwrite (pachFamilyMessage, 1, iLength, ptOutputFile);
      fclose (ptOutputFile);
   }
   else {
      printf ("Error opening %s for write access\n", szOutputFileName);
      return -1;
   }

   /* Now free up our contexts. */ /* [8] */
   rtFreeContext (&tDecodeContext);
   rtFreeContext (&tEncodeContext);

   return 0;
}
</code>

In part [1] we’re initializing a context structure for decoding.

In part [2] we’re reading a file that contains an encoded PersonnelRecord into memory. After the call, pachEmployeeMessage points to the bytes of the encoded message and iLength holds its length.

In part [3] we’re decoding the PersonnelRecord into the tEmployee structure.

In part [4] we’re initializing a context structure for encoding. Note that we’re using different context structures for decoding and encoding.

Part [5] is the crucial part. Here we’re populating the members of the tFamily structure before we use it to encode a FamilyRecord message. For two of those members we’re using members of the tEmployee structure, which contains the decoded information from the PersonnelRecord message. In both cases the members are structures in the generated C code, so the assignment results in a shallow copy of the structure from tEmployee to tFamily: any pointers inside the copied structures are copied as-is, so tFamily points to the same memory that tEmployee does. The crucial thing to remember here is that the memory used for decoding the PersonnelRecord message (i.e., the memory the tEmployee structure points into) must remain intact until we’re completely done with the tFamily structure, since the tFamily structure now has pointers to that memory.
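The effect of the assignment in part [5] can be reproduced with ordinary C structures. DemoName, DemoEmployee, DemoFamily, fill_family and deep_copy_name are hypothetical simplifications written for this illustration; the actual ASN1C-generated types differ, but they likewise hold pointers into memory allocated during decoding. The deep copy is shown only for contrast, as one way a structure could be made independent of the decode memory.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for ASN1C-generated structures. */
typedef struct {
    char *givenName;
    char *initial;
    char *familyName;
} DemoName;

typedef struct { DemoName nameOfSpouse; } DemoEmployee;
typedef struct { DemoName nameOfSpouse; int ageOfSpouse; } DemoFamily;

/* Shallow copy, as in part [5]: struct assignment copies the pointer
   values, so both structures refer to the same strings. */
void fill_family(DemoFamily *fam, const DemoEmployee *emp)
{
    fam->nameOfSpouse = emp->nameOfSpouse;
    fam->ageOfSpouse = 30;
}

/* Allocates a private copy of a null-terminated string. */
static char *dup_str(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Deep copy for contrast: dst owns new strings that the caller must free.
   Returns 0 on success or -1 if an allocation fails. */
int deep_copy_name(DemoName *dst, const DemoName *src)
{
    dst->givenName  = dup_str(src->givenName);
    dst->initial    = dup_str(src->initial);
    dst->familyName = dup_str(src->familyName);
    if (dst->givenName == NULL || dst->initial == NULL || dst->familyName == NULL) {
        free(dst->givenName);
        free(dst->initial);
        free(dst->familyName);
        return -1;
    }
    return 0;
}
```

After fill_family, fam.nameOfSpouse.givenName is the very same pointer as the one in the employee structure, so freeing or reusing that memory would leave the family structure dangling; the strings produced by deep_copy_name have no such dependency but must be freed separately.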

In part [6] we’re encoding a FamilyRecord message using the tFamily structure that we just populated in part [5].

In part [7] we’re writing the encoded FamilyRecord message out to a file.

In part [8] we’re freeing the two contexts that we used, one for decoding and one for encoding. As pointed out in part [5], it’s crucial that the context, and hence the memory, used for the decoding remain intact until we’re completely done with the encoding, since the structure used for the encoding has pointers to the memory used for the decoding.


Compact code generation in ASN1C

ASN.1 is used in many different areas, and a newer one is the Internet of Things (IoT).  In particular, Narrowband IoT (NB-IoT) uses ASN.1 UPER-based messaging.

One characteristic of these devices is that they are small, so code size is critical.  We have been working on ways to make our ASN1C-generated code and run-time libraries as compact as possible for applications such as these.  In our latest ASN1C v7.2.1 patch release, we now include a new set of compact libraries for Linux.  These can be found in the c/lib_compact directories.  They are built with gcc using maximum space-optimization settings and with a lot of non-critical code stripped out.  The compact libraries are roughly 25% smaller than the standard libraries.

In addition to using the compact libraries, additional steps can be taken to reduce the size of the generated code.  We touched on some of these in a past blog post entitled “Optimizing PER Encoding and Code Footprint”.  We would also recommend using the following command-line options (the equivalent GUI option is in parentheses):

  • -compact  (Generate compact code)
  • -noinit  (uncheck the Generate Initialization Functions checkbox)
  • -noenumconvert (do not generate enum-to-string conversion functions – should only be enabled if print functions are generated)

Other options that you may or may not be able to use:

  • -lax (Do not generate constraint checks)
  • -strict-size (Interpret size constraints strictly)

If all of these measures are employed, users could potentially see the size of their application reduced by one half or more.


ASN1C 7.2 Improved Comment Handling

In version 7.2, we improved our handling of ASN.1 comments, as follows.

  • When using the “Pretty-print ASN.1” (-asn1) option, comments from type assignments and elements (SEQUENCE/SET/CHOICE components) are now included in the output.  Previously, pretty-printed ASN.1 did not include any ASN.1 comments.
  • When generating C/C++ code, we previously carried only the ASN.1 comments for types over into the C/C++ comments.  We now include ASN.1 comments from elements as well.
  • When writing comments, we now try to preserve the position of the comment as it appeared in the ASN.1.  We formerly printed all comments before the type assignment with which we associated the comment, even if the comment actually appeared after the type assignment.

When we output ASN.1 comments and ASN.1 syntax, whether the context is pretty-printed ASN.1 or C/C++ comments, we are not simply writing out everything as it appeared in the input.  This means we have to associate comments with syntax.  Since ASN.1 comments don’t have a syntactic relationship to other parts of the ASN.1 syntax, such associations involve a heuristic.  In the example below, the comment is potentially associated with either BigNumber or SmallNumber, though common practice suggests it’s most likely related to SmallNumber.

BigNumber ::= INTEGER (500..1000)

-- This is the type to use for speeds

SmallNumber ::= INTEGER (0..30)

Here’s a rough description of the heuristic rules we use:

  • If the start of a comment comes after some other ASN.1 syntax appearing on the same line, the comment is considered related to that syntax.
  • If the start of a comment is preceded, on the same line, only by whitespace, the comment may be related either to syntax that precedes or succeeds the comment.
    • If the comment is followed by a type assignment or an element, the first such comment that is not indented, relative to the type assignment or element, is associated with that type assignment or element.  Successive comments are also associated with the same item, regardless of indentation.
    • Any comments that preceded the first non-indented comment (all of which are indented) are associated with something which precedes those comments.  If these comments are immediately preceded by a type assignment or an element, they are associated with that type assignment or element.  In any case, they will not be associated with an element or type assignment that follows those comments.
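The two rules can be sketched in C as follows. is_trailing_comment and first_comment_for_next_item are hypothetical helpers, not ASN1C's code, and they simplify considerably: a tab counts as a single column, "--" inside a string is not recognized, and blank lines are assumed to have been removed from a run of comments before it is examined.

```c
#include <stddef.h>
#include <string.h>

/* Columns of leading whitespace; a tab counts as one column (simplification). */
static size_t indent_of(const char *line)
{
    size_t n = 0;
    while (line[n] == ' ' || line[n] == '\t')
        n++;
    return n;
}

/* First rule: returns 1 if the line's comment starts after other ASN.1
   syntax on the same line (a trailing comment), 0 otherwise. */
int is_trailing_comment(const char *line)
{
    const char *dash = strstr(line, "--");
    return dash != NULL && (size_t) (dash - line) > indent_of(line);
}

/* Second rule: given a run of whole-line comments immediately followed by
   a type assignment or element (next_item), returns the index of the first
   comment that is not indented relative to next_item.  Comments at and
   after that index go with next_item, regardless of their indentation;
   earlier ones go with whatever precedes the run.  Returns n if every
   comment in the run is more indented than next_item. */
size_t first_comment_for_next_item(const char *const *comments, size_t n,
                                   const char *next_item)
{
    size_t i, item_indent = indent_of(next_item);

    for (i = 0; i < n; i++) {
        if (indent_of(comments[i]) <= item_indent)
            return i;
    }
    return n;
}
```

Applied to the examples below, the run “yet another comment for age” / “comment for name” splits at index 1, so the first comment stays with age and the second goes to name; the same split assigns “yet another comment for Person” to Person and “comment for Winnings” to Winnings.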

Some examples:

-- comment for Person
Person ::= SEQUENCE {
   -- comment for age
   age INTEGER, -- another comment for age
      -- yet another comment for age
   -- comment for name
   name UTF8String
} -- another comment for Person
   -- yet another comment for Person

-- comment for Winnings
Winnings::= INTEGER (500..1000)

It is possible that these heuristics will associate a comment differently than a human reader would have.  Consider this example:

BigNumber ::= INTEGER (500..1000)

   -- SmallNumber is used for speeds

SmallNumber ::= INTEGER (0..30)

Because of the indentation, the comment will be associated with BigNumber, but it obviously relates to SmallNumber.  Since we try to preserve location when printing, we’ll print the comment after the definition of BigNumber, which can give the reader a hint that the comment might actually relate to something else (if the content of the comment were different, this might not be so obvious to the reader).
