Hello,
since I had problems with german umlauts when receiving (and probably also when sending) SMS, I looked today in the gsm translating part in libosmocore.
I first noticed a problem, when sending "NETZ Number" to "4636" in O2 network, you get a response SMS saying in which network the number is located. Using osmocombb, I get:
[SMS from 66399] Sehr geehrter Kunde, die angefragte Nummer ist im Netz von o2 aktiv (Angabe ohne Gew§hr).
So the umlaut "ae" is converted to "§".
The problem seems to be in the convert table in src/shared/libosmocore/src/gsm/gsm_utils.c, which is not bijective. "Ae" in gsm7 is "0x7b" and must be converted to latin1 "0xe4".
Looking in gsm_7bit_alphabet at place "0xe4", the is "0x7b". But there is also an "0x7b" at place "0xa7", which comes first and is indeed "§".
So the question is, are there mistakes in the table or is there a reason for the double entries?
Thanks
Tim
On Sat, 14 Jan 2012, Tim Ehlers wrote:
Hello again,
src/shared/libosmocore/src/gsm/gsm_utils.c, which is not bijective. "Ae" in gsm7 is "0x7b" and must be converted to latin1 "0xe4".
Looking in gsm_7bit_alphabet at place "0xe4", the is "0x7b". But there is also an "0x7b" at place "0xa7", which comes first and is indeed "§".
maybe its a good idea not to use one table for encoding and decoding. I introduced a reverse table for decoding gsm7 to latin1 like this:
--- osmocom-bb.120111.ori/src/shared/libosmocore/src/gsm/gsm_utils.c 2012-01-10 17:06:40.000000000 +0100 +++ osmocom-bb.120111/src/shared/libosmocore/src/gsm/gsm_utils.c2012-01-14 19:03:25.000000000 +0100 @@ -83,34 +83,65 @@ * more complex */ static unsigned char gsm_7bit_alphabet[] = { - 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x0a, 0xff, 0xff, 0x0d, 0xff, - 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, - 0xff, 0xff, 0x20, 0x21, 0x22, 0x23, 0x02, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x2b, 0x2c, - 0x2d, 0x2e, 0x2f, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3a, 0x3b, - 0x3c, 0x3d, 0x3e, 0x3f, 0x00, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48, 0x49, 0x4a, - 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59, - 0x5a, 0x3c, 0x2f, 0x3e, 0x14, 0x11, 0xff, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, - 0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f, 0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77, - 0x78, 0x79, 0x7a, 0x28, 0x40, 0x29, 0x3d, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, - 0xff, 0xff, 0x0c, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x5e, 0xff, 0xff, - 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x40, 0xff, 0x01, 0xff, - 0x03, 0xff, 0x7b, 0x7d, 0xff, 0xff, 0xff, 0xff, 0xff, 0x5c, 0xff, 0xff, 0xff, 0xff, 0xff, - 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x5b, 0x7e, 0x5d, 0xff, 0x7c, 0xff, 0xff, 0xff, - 0xff, 0x5b, 0x0e, 0x1c, 0x09, 0xff, 0x1f, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x5d, - 0xff, 0xff, 0xff, 0xff, 0x5c, 0xff, 0x0b, 0xff, 0xff, 0xff, 0x5e, 0xff, 0xff, 0x1e, 0x7f, - 0xff, 0xff, 0xff, 0x7b, 0x0f, 0x1d, 0xff, 0x04, 0x05, 0xff, 0xff, 0x07, 0xff, 0xff, 0xff, - 0xff, 0x7d, 0x08, 0xff, 0xff, 0xff, 0x7c, 0xff, 0x0c, 0x06, 0xff, 0xff, 0x7e, 0xff, 0xff + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x0a, 0xff, 0xff, 0x0d, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0x20, 0x21, 0x22, 0x23, 0x02, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f, + 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f, + 0x00, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, + 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59, 0x5a, 0xfe, 0xfe, 0xfe, 0xfe, 0x11, + 0x27, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f, + 0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78, 0x79, 0x7a, 0xfe, 0xfe, 0xfe, 0xfe, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0x40, 0xff, 0x01, 0x24, 0x03, 0xff, 0x5f, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x60, + 0xff, 0xff, 0xff, 0xff, 0x5b, 0x0e, 0x1c, 0x09, 0xff, 0x1f, 0xff, 0xff, 0xff, 0xff, 0xff, 0x60, + 0xff, 0x5d, 0xff, 0xff, 0xff, 0xff, 0x5c, 0xff, 0x0b, 0xff, 0xff, 0xff, 0x5e, 0xff, 0xff, 0x1e, + 0x7f, 0xff, 0xff, 0xff, 0x7b, 0x0f, 0x1d, 0xff, 0x04, 0x05, 0xff, 0xff, 0x07, 0xff, 0xff, 0xff, + 0xff, 0x7d, 0x08, 0xff, 0xff, 0xff, 0x7c, 0xff, 0x0c, 0x06, 0xff, 0xff, 0x7e, 0xff, 0xff, 0xff +}; + +static unsigned char gsm_7bit_alphabet_rev[] = { + 0x40, 0xa3, 0x24, 0xa5, 0xe8, 0xe9, 0xf9, 0xec, + 0xf2, 0xc7, 0x0a, 0xd8, 0xf8, 0x0d, 0xc5, 0xe5, + 0xff, 0x5f, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0x20, 0xc6, 0xe6, 0xdf, 0xc9, + 0x20, 0x21, 0x22, 0x23, 0xa4, 0x25, 0x26, 0x27, + 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f, + 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, + 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f, + 0xa1, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, + 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, + 0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, + 0x58, 0x59, 0x5a, 0xc4, 0xd6, 0xd1, 0xdc, 0xa7, + 0xbf, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, + 0x68, 0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f, + 0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77, + 0x78, 0x79, 0x7a, 0xe4, 0xf6, 0xf1, 0xfc, 0xe0, + + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0x5e, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0x20, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7b, + 0x7d, 0xff, 0xff, 0xff, 0xff, 0xff, 0x5c, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0x5b, 0x7e, 0x5d, 0xff, 0x7c, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0x65, 0xff, 0xff, 0xff };
/* GSM 03.38 6.2.1 Character lookup for decoding */ static int gsm_septet_lookup(uint8_t ch) { - int i = 0; - for (; i < sizeof(gsm_7bit_alphabet); i++) { - if (gsm_7bit_alphabet[i] == ch) - return i; - } - return -1; + char c = gsm_7bit_alphabet_rev[ch]; + if (c) { + return c; + } else { + return -1; } }
/* Compute the number of octets from the number of septets, for instance: 47 septets needs 41,125 = 42 octets */ @@ -152,7 +183,7 @@ /* this is an extension character */ if(rtext[i] == 0x1b && i + 1 < septet_l){ tmp = rtext[i+1]; - *(text++) = gsm_7bit_alphabet[0x7f + tmp]; + *(text++) = gsm_7bit_alphabet_rev[0x7f + tmp]; i++; continue; }
Maybe someone could have a look on the code, if I missed something? Since we are using latin1, I don't know how to convert characters like € and others not in latin1. The € sign is converted to e in the above code.
cheers
Tim
baseband-devel@lists.osmocom.org