Synopsis
char *step_STEPToUTF8(const char *instring)
Purpose
Read a string token and perform STEP to UTF-8 conversion
Description
Given an input string, return a string after doing international language conversion. This routine converts to the UTF-8 encoding. This function uses the unl subsystem; unl_Initialize() must be called before using this function.
This routine handles STRINGS according to section 7.3.3 of Part 21 of STEP, which defines the encodings for STRINGS in the Part 21 exchange file. Some of that information is repeated here. There are four different methods:
1 - Standard ASCII encoding (basic alphabet) (Section 7.3.3) - The characters with decimal values 32 through 126 inclusive, which are identical in ASCII and ISO 8859-1. Note that control characters (newline, tab) cannot be represented using this method.
2 - Encoding the full alphabet of ISO 8859 (Section 7.3.3.1) - This encoding uses "\S\" to indicate that the next character should be interpreted with the 8th bit on (its decimal value ORed with 128). This encoding is not supported yet.
3 - UNICODE or ISO 10646 (Section 7.3.3.2) - This is how STEP handles international characters. Once the row/column for a given character has been found in the UNICODE tables, the character is encoded as a sequence of 4 hexadecimal characters. An ANNOUNCER marks the start of a series of such encodings, and a terminating directive marks its end. The ANNOUNCER is the special sequence "\X2\", and the terminating directive is "\X0\". This works for all 16-bit characters. When this was written, UNICODE had mappings only for 16-bit characters; it allows for extensions into the 32-bit world, but no characters had been assigned in those planes. Note: input that incorrectly uses "\X2\" in the way "\X\" is meant to be used is also handled here.
For a character at row 0/column 0 we would get: "\X2\0000\X0\"
For a character at row 255/column 255 we would get: "\X2\FFFF\X0\"
For a character at row 0/column 255 we would get: "\X2\00FF\X0\"
For 2 characters at row 0/column 255 we would get: "\X2\00FF00FF\X0\"
4 - ARBITRARY HEX (Section 7.3.3.3) - This allows a hexadecimal representation for characters. This is how control characters (carriage returns, tabs) can be sent in a string. NL (newline) has the integer value 10 in the ASCII table, so it is encoded as the hexadecimal "0A". The sequence "\X\" ANNOUNCES that the next two characters use the ARBITRARY method of encoding: "\X\0A"
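The four methods above can be sketched as a small decoder. This is only an illustration, not the code behind step_STEPToUTF8: the helper names (hex_val, read_hex, put_utf8, sketch_step_to_utf8) and the caller-supplied output buffer are assumptions made for the example. It assumes the directives are written "\X2\...\X0\", "\X\hh" and "\S\c", handles 16-bit values only (no surrogate pairs), and includes "\S\" only to show the OR with 128, although the routine documented here does not support that method yet.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Read n hex digits from s; return the value, or -1 on a bad digit.
   A NUL byte is not a hex digit, so this never reads past the end. */
static long read_hex(const char *s, int n)
{
    long v = 0;
    for (int i = 0; i < n; i++) {
        int d = hex_val(s[i]);
        if (d < 0) return -1;
        v = v * 16 + d;
    }
    return v;
}

/* Write code point cp (0..0xFFFF) to out as UTF-8; return bytes written. */
static size_t put_utf8(unsigned long cp, char *out)
{
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (char)(0xE0 | (cp >> 12));
    out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Hypothetical helper: decode in into out (capacity cap, NUL included).
   Returns 0 on success, -1 on malformed input or insufficient space. */
static int sketch_step_to_utf8(const char *in, char *out, size_t cap)
{
    size_t n = 0;
    long cp;
    if (cap == 0) return -1;
    while (*in) {
        if (n + 4 > cap) return -1;                 /* room for 3 bytes + NUL */
        if (strncmp(in, "\\X2\\", 4) == 0) {        /* method 3: \X2\hhhh...\X0\ */
            in += 4;
            while (strncmp(in, "\\X0\\", 4) != 0) {
                if (n + 4 > cap || (cp = read_hex(in, 4)) < 0) return -1;
                n += put_utf8((unsigned long)cp, out + n);
                in += 4;
            }
            in += 4;                                /* skip the terminator */
        } else if (strncmp(in, "\\X\\", 3) == 0) {  /* method 4: \X\hh */
            if ((cp = read_hex(in + 3, 2)) < 0) return -1;
            n += put_utf8((unsigned long)cp, out + n);
            in += 5;
        } else if (strncmp(in, "\\S\\", 3) == 0 && in[3]) { /* method 2: \S\c */
            n += put_utf8((unsigned long)((unsigned char)in[3] | 0x80), out + n);
            in += 4;
        } else {
            out[n++] = *in++;                       /* method 1: basic alphabet */
        }
    }
    out[n] = '\0';
    return 0;
}
```

With this sketch, the row 0/column 255 example decodes to the two UTF-8 bytes 0xC3 0xBF, and "\X\0A" decodes to a single newline byte. Other Part 21 string escapes are deliberately left out to keep the example short.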
Input
instring
the input string to decode
Return
If successful, returns the input string itself when no conversion was needed, or a dynamically allocated string when the text changed; on failure, returns NULL.
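Because the result may be the input pointer itself, a caller has to compare pointers before releasing anything. The wrapper below is a sketch of one way to normalize that, so the caller always owns the result. The names convert_fn and convert_owned are hypothetical, the three stub functions are stand-ins so the example runs without the library, and the use of free() on the converted string is an assumption, since this page does not say how the allocated string is released.

```c
#include <stdlib.h>
#include <string.h>

/* Same shape as step_STEPToUTF8; lets the sketch run against stand-ins. */
typedef char *(*convert_fn)(const char *);

/* Hypothetical wrapper: always returns a heap string the caller owns,
   or NULL on failure. Assumes converted strings are released with free(). */
static char *convert_owned(convert_fn convert, const char *in)
{
    char *res = convert(in);
    if (res == NULL) return NULL;        /* conversion failed */
    if (res != in) return res;           /* already a fresh allocation */
    size_t len = strlen(in) + 1;         /* unchanged: result aliases the input */
    char *copy = malloc(len);
    if (copy != NULL) memcpy(copy, in, len);
    return copy;
}

/* Stand-ins for the three documented outcomes (not part of the library). */
static char *stub_same(const char *s) { return (char *)s; }
static char *stub_fail(const char *s) { (void)s; return NULL; }
static char *stub_alloc(const char *s)
{
    (void)s;
    char *p = malloc(2);
    if (p != NULL) { p[0] = 'X'; p[1] = '\0'; }
    return p;
}
```

In real use, the caller would pass step_STEPToUTF8 as the convert argument after unl_Initialize() has been called, then release the result once it is no longer needed.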