[PATCH] Decoding UCS2 USSD

List overview All Threads
Download

newer

older

[PATCH] gsm_utils.c: in 7bit...

Sept 19, 8pm / Osmocom Berlin User...

Pavel Baturko

19 Sep 2012 19 Sep '12

6:44 a.m.

Hi,

I create patch with added decoding of UCS2 USSD messages. Text in UCS2 is printed on 'mobile' console and status is sent to vty since there is no support of wide characters in vty. Added constants to decode GSM 7bit text when DCS is specified as 0x00, 0x01. I tested it with Russian language with several providers and all works ok.

Thanks, Pavel

Attachments:

attachment.html (text/html — 452 bytes)
ussd-ucs2.patch (application/octet-stream — 11.0 KB)

Show replies by date

Holger Hans Peter Freyther

19 Sep 19 Sep

11:02 a.m.

On Tue, Sep 18, 2012 at 08:44:27PM +0000, Pavel Baturko wrote:

...

Hi,

I create patch with added decoding of UCS2 USSD messages. Text in UCS2 is printed on 'mobile' console and status is sent to vty since there is no support of wide characters in vty. Added constants to decode GSM 7bit text when DCS is specified as 0x00, 0x01. I tested it with Russian language with several providers and all works ok.

Dear Pavel,

thanks a lot for your patch. I think a library should not call setlocale as it will impact the entire application. I think our only other realistic option is to use iconv (and stub it out for osmocomBB). Do you think you could try to create such a patch?

thanks holger

Peter Stuge

12:12 p.m.

Holger Hans Peter Freyther wrote:

...

I think a library should not call setlocale as it will impact the entire application. I think our only other realistic option is to use iconv (and stub it out for osmocomBB). Do you think you could try to create such a patch?

I'd suggest http://www.lemoda.net/c/ucs2-to-utf8/ and simply output UTF-8.

Maybe output an error if setlocal(LC_CTYPE,NULL) does not report that a UTF-8 locale is being used.

//Peter

Pavel Baturko

20 Sep 20 Sep

7:49 a.m.

Thanks for advises! New version: if received text in UCS2, convert it to UTF-8, then check if console supports UTF-8, if yes - print message in vty as usual, if not - print message in hex.

~Pavel

On Wed, Sep 19, 2012 at 2:12 AM, Peter Stuge peter@stuge.se wrote:

...

Holger Hans Peter Freyther wrote:

...
I think a library should not call setlocale as it will impact the entire application. I think our only other realistic option is to use iconv (and stub it out for osmocomBB). Do you think you could try to create such a patch?

I'd suggest http://www.lemoda.net/c/ucs2-to-utf8/ and simply output UTF-8.

Maybe output an error if setlocal(LC_CTYPE,NULL) does not report that a UTF-8 locale is being used.

//Peter

Peter Stuge

8 p.m.

Thanks Harald for checking the license thing out, although I think the code is so trivial that it's difficult to claim copyright. Let's see.

Thanks Pavel for the patch! I have some comments:

Pavel Baturko wrote:

...

From 4fcf1f7afc29053833e524cc35c15dd57ebbbf64 Mon Sep 17 00:00:00 2001 From: pab pab@pab

Please set the git config user.name and user.email options correctly:

git config --global user.name 'Pavel Baturko' git config --global user.email your@email.here

...

Date: Thu, 20 Sep 2012 01:36:28 +0000 Subject: [PATCH] Add USSD UCS2 support

There is no commit message. I don't know why this would be OK. Please write a few relevant sentences about the change *in* the commit message, rather than in the email you send where you attach the commit message. The commit message is the correct place for the information, since that is the canonical storage facility for the code. The mailing list is not.

...

+++ b/src/host/layer23/src/mobile/gsm480_ss.c @@ -24,6 +24,9 @@ #include <stdio.h> #include <string.h> #include <stdlib.h> +#include <wchar.h> +#include <wctype.h> +#include <locale.h>

Are these files always guaranteed to be available, or are some configure.ac checks required in this change?

...

@@ -439,7 +442,7 @@ static int gsm480_tx_invoke(struct gsm_trans *trans, struct msgb *msg, /* Pre-pend the invoke ID */ msgb_push_TLV1(msg, GSM0480_COMPIDTAG_INVOKE_ID, trans->ss.invoke_id);

/* Wrap this up as invoke vomponent */

/* Wrap this up as invoke component */ if (msg_type == GSM0480_MTYPE_FACILITY) msgb_wrap_with_TL_asn(msg, GSM0480_CTYPE_RETURN_RESULT); else

@@ -592,7 +595,7 @@ int ss_send(struct osmocom_ms *ms, const char *code, int new_trans) LOGP(DSS, LOGL_INFO, "Existing transaction.\n"); return gsm480_tx_ussd(trans, GSM0480_MTYPE_FACILITY, code); }

Please remove all unrelated changes from the commit. I am strongly in favor of cleaning up code, but it needs to be done in separate commits, not mixed in with other changes. Mixing changes makes life very difficult for anyone who might need to bisect the code. It is completely unneccessary. It's also really ugly.

...

@@ -751,14 +755,27 @@ static int gsm480_rx_ussd(struct gsm_trans *trans, const uint8_t *data, LOGP(DSS, LOGL_NOTICE, "DSC tag too short\n"); return -EINVAL; }

if (data[0] != ASN1_OCTET_STRING_TAG || tag_len != 1) { LOGP(DSS, LOGL_NOTICE, "Expecting DSC tag\n"); return -EINVAL; }
if (tag_data[0] != 0x0f) {
LOGP(DSS, LOGL_NOTICE, "DSC not 0x0f\n");
return -EINVAL;
switch (tag_data[0])

{
case DCS_GSM7DEAFULT:
case DCS_GSM7DEAFULT2:
case DCS_0f:
	dcs_charset = CGB_GEN_CHARSET7;
	break;

I really like the kernel style for switch:

switch (foo) { case BAR: case BAZ: quux(); break; }

Ie. moving the opening brace, and using one less tab inside.

...

default:
	LOGP(DSS, LOGL_NOTICE, "DSC <%d> unknown\n", tag_data[0]);
	return -EINVAL;
}
len -= tag_data - data + tag_len; data = tag_data + tag_len;

The added newline seems an unrelated change.

...

@@ -767,25 +784,81 @@ static int gsm480_rx_ussd(struct gsm_trans *trans, const uint8_t *data, LOGP(DSS, LOGL_NOTICE, "Text tag too short\n"); return -EINVAL; }

if (data[0] != ASN1_OCTET_STRING_TAG) { LOGP(DSS, LOGL_NOTICE, "Expecting text tag\n"); return -EINVAL; }

The one above too.

...

num_chars = tag_len * 8 / 7;

/* Prevent a mobile-originated buffer-overrun! */

if (num_chars > sizeof(text) - 1)
num_chars = sizeof(text) - 1;
text[sizeof(text) - 1] = '\0';

gsm_7bit_decode(text, tag_data, num_chars);

for (i = 0; text[i]; i++) {
if (text[i] == '\r')
	text[i] = '\n';
if (dcs_charset == CGB_GEN_CHARSET7)

{

Please pay attention to the existing style in code that you work on, and make sure that you follow it exactly. You put the opening brace on the following line, while the rest of the code does not do this.

In any case, please change this if (x == foo) else if (x == bar) code to use switch ().

...

else if (dcs_charset == CGB_GEN_CHARSET16)
{
```
// USC2 charset, see ISO/IEC 10646
```
```
const char *curr_locale;
```

uint8_t hex_or_str; // 1 - print string in utf-8, 0 -print raw hex

The hex_or_str variable name is absolutely horrible. Please change it to something that also conveys the meaning of the value; something self-descriptive. One suggestion would be have_utf8_locale.

...

```
int j = 0;
```

num_chars = tag_len / 2; // since 1 char = 2 bytes

This discards the last surplus byte if there is one. Good!

...

// determine if console locale is UTF-8 and we can print message

```
// if not - print raw hex UTF-8 data
```
```
curr_locale = setlocale(LC_ALL, "");
```

Passing an empty string as locale is an invasive operation, which is why I suggested to pass NULL. Why do you use the empty string instead of NULL?

...

hex_or_str = (strstr(curr_locale, "UTF-8")) ? 1 : 0;

I'd suggest variable = strstr() != NULL;

...

if (!hex_or_str) printf("Received USSD (UFT-8, hex):\n");

Three comments for a single line of code:

First the style, I don't think that conditions and their code is on the same line anywhere else in the code. Following existing style.

Second, I don't believe other parts of the code write directly to any file descriptors, but rather they all use logging macros. Follow existing style.

Third, in the patch there are several occurences of the "UFT-8" typo.

...

```
for (i = 0; i < num_chars; i++) {
```
```
	unsigned char utf8[4];
```
```
	int ucs2, res;
```

	ucs2 = ((tag_data[2*i]<<8)|tag_data[2*i+1]);

Look at how other code uses whitespace and follow existing style.

...

```
	res = ucs2_to_utf8(ucs2, utf8);
```

I would either just include the code here, or write a new function which suits our purposes which converts the full string.

...

```
	if (res > 0) {
```
```
		if (hex_or_str) {
```
```
			memcpy(text+j, utf8, res);
```
```
			j += res;
```
```
		}
```
```
		else {
```
```
			int j;
```

			for (j=0; j<res; j++) printf("%02x", utf8[j]);

```
			printf(" ");
```
```
		}
```
```
	}
```

I still think printf() is not used anywhere else in the code.

And the code makes absolutely no sense. In one case there is a memcpy(), in the other case there is printf().

Also, why is there a memcpy at all - please just write the data to the correct location in the first place. Always strive for zero copy. Thank you.

...

```
if (hex_or_str) {
```
```
	text[j] = '\0';
```
```
}
```
```
else {
```
```
	printf("\n");
```

	sprintf(text, "USSD not printed(locale not UFT-8)");

```
}
```
```
gsm480_ss_result(trans->ms, text, 0);
```

Did you investigate if this function will handle UTF-8 data correctly? I sure don't know that.

...

@@ -935,6 +1008,7 @@ static int gsm480_rx_result(struct gsm_trans *trans, const uint8_t *data, LOGP(DSS, LOGL_NOTICE, "Invoke ID too short\n"); return -EINVAL; }

if (data[0] != GSM0480_COMPIDTAG_INVOKE_ID || tag_len != 1) { LOGP(DSS, LOGL_NOTICE, "Expecting invoke ID\n"); return -EINVAL;

This seems to be an unrelated change.

...

@@ -960,6 +1034,7 @@ static int gsm480_rx_result(struct gsm_trans *trans, const uint8_t *data, LOGP(DSS, LOGL_NOTICE, "Sequence tag too short\n"); return -EINVAL; }

if (data[0] != GSM_0480_SEQUENCE_TAG) { LOGP(DSS, LOGL_NOTICE, "Expecting Sequence Tag, trying " "Operation Tag\n");

And this.

...

@@ -974,6 +1049,7 @@ operation: LOGP(DSS, LOGL_NOTICE, "Operation too short\n"); return -EINVAL; }

if (data[0] != GSM0480_OPERATION_CODE || tag_len != 1) { LOGP(DSS, LOGL_NOTICE, "Expecting Operation Code\n"); return -EINVAL;

And this.

...

@@ -1223,8 +1299,8 @@ static int gsm480_mmss_ind(int mmss_msg, struct gsm_trans *trans, /* pull the MMSS header */ msgb_pull(msg, sizeof(struct gsm48_mmxx_hdr));
LOGP(DSS, LOGL_INFO, "(ms %s) Received est/data '%u'\n", ms->name,
msg_type);
LOGP(DSS, LOGL_INFO, "(ms %s) Received est/data '%u'='%X'\n", ms->name,
msg_type, msg_type);
switch (msg_type) { case GSM0480_MTYPE_RELEASE_COMPLETE:

And this.

...

+++ b/src/shared/libosmocore/include/osmocom/gsm/gsm_utils.h

...

@@ -143,9 +145,51 @@ enum gsm_chan_t { GSM_LCHAN_TCH_H, GSM_LCHAN_UNKNOWN, GSM_LCHAN_CCCH,

GSM_LCHAN_PDTCH, _GSM_LCHAN_MAX

};

Seems unrelated.

...

+/* Data Coding Schemes (DCS), 23.038, 4 */ +#define CGB_GEN 0x00 /* 00xxxxxxb, General Data Coding indication */ +#define CGB_GEN_COM 0x20 /* xx1xxxxxb, text is uncompressed */ +#define CGB_GEN_UNCOM 0x00 /* xx0xxxxxb, text is compressed (23.042) */ +#define CGB_GEN_CLASS 0x10 /* xxx1xxxxb, bits 1 to 0 have a message class meaning */ +#define CGB_GEN_NOCLASS 0x00 /* xxx0xxxxb, bits 1 to 0 are reserved and have no message class */ +#define CGB_GEN_CLASS0 0x00 /* xxxxxx00b, Class 0 */ +#define CGB_GEN_CLASS1 0x01 /* xxxxxx01b, Class 1, Default meaning: ME-specific */ +#define CGB_GEN_CLASS2 0x02 /* xxxxxx10b, Class 2, (U)SIM specific message */ +#define CGB_GEN_CLASS3 0x03 /* xxxxxx11b, Class 3, Default meaning: TE specific (27.005) */ +#define CGB_GEN_CHARSET7 0x00 /* xxxx00xxb, GSM 7 bit default alphabet */ +#define CGB_GEN_CHARSET8 0x04 /* xxxx01xxb, 8 bit data */ +#define CGB_GEN_CHARSET16 0x08 /* xxxx10xxb, UCS2 (16bit) */ +#define CGB_GEN_CHARSETR 0x0C /* xxxx11xxb, Reserved */

+#define CGB_AUTODEL 0x40 /* 01xxxxxxb, Message Marked for Automatic Deletion Group */ +#define CGB_RESERVED 0x8B /* 10001011b, Reserved */

+#define CGB_MWI_DISC 0xC0 /* 1100xxxxb, Message Waiting Indication Group: Discard Message */ +#define CGB_MWI_STOR 0xD0 /* 1101xxxxb, Message Waiting Indication Group: Store Message */ +#define CGB_MWI_INACT 0x00 /* xxxx0xxxb, Set Indication Inactive */ +#define CGB_MWI_ACT 0x08 /* xxxx1xxxb, Set Indication Active */ +#define CGB_MWI_RESERVED 0x00 /* xxxxx0xxb, Reserved */ +#define CGB_MWI_VMAIL 0x00 /* xxxxxx00b, Voicemail Message Waiting */ +#define CGB_MWI_FAX 0x01 /* xxxxxx01b, Fax Message Waiting */ +#define CGB_MWI_EMAIL 0x02 /* xxxxxx10b, Electronic Mail Message Waiting */ +#define CGB_MWI_OTHER 0x03 /* xxxxxx11b, Other Message Waiting */ +#define CGB_MWI_STOR2 0xE0 /* 1110xxxxb, as CGB_MWI_STOR, but text in UCS2 */

+#define CGB_DCMC 0xF0 /* 1111xxxxb, Data coding/message class */ +#define CGB_DCMC_RESERVED 0x08 /* xxxx0xxxb, Reserved */ +#define CGB_DCMC_ALPH7 0x00 /* xxxxx0xxb, GSM 7 bit default alphabet */ +#define CGB_DCMC_ALPH8 0x04 /* xxxxx0xxb, 8 bit alphabet */

+/* Default 7 bit GSM */ +#define DCS_GSM7DEAFULT (CGB_GEN | CGB_GEN_UNCOM | CGB_GEN_NOCLASS | CGB_GEN_CHARSET7) +/* As default, but last bit is 1.

It makes no sense but some providers sends this type of DCS */

+#define DCS_GSM7DEAFULT2 (CGB_GEN | CGB_GEN_UNCOM | CGB_GEN_NOCLASS | CGB_GEN_CHARSET7 | CGB_GEN_CLASS1) +/* This constant was initialy in code */ +#define DCS_0f (CGB_GEN | CGB_GEN_UNCOM | CGB_GEN_NOCLASS | CGB_GEN_CHARSETR | CGB_GEN_CLASS3)

I hope a better name than DSC_0f can be found for it! Please talk to the developer who added the constant, to understand what it is about. Use git blame to find out who it is.

...

+/* Autodel 16 bit UCS2 */ +#define DCS_ADEL_USC2 (CGB_AUTODEL| CGB_GEN_UNCOM | CGB_GEN_NOCLASS | CGB_GEN_CHARSET16)

#endif

Lots of new defines. They're not all required, but I do like that they are all there now.

...

+++ b/src/shared/libosmocore/src/gsm/gsm_utils.c @@ -102,6 +102,30 @@ static unsigned char gsm_7bit_alphabet[] = { 0xff, 0x7d, 0x08, 0xff, 0xff, 0xff, 0x7c, 0xff, 0x0c, 0x06, 0xff, 0xff, 0x7e, 0xff, 0xff };

+/* Convert UCS2 char to UTF8 char */ +int ucs2_to_utf8 (int ucs2, unsigned char * utf8) +{

If you do end up keeping this function then even if it is too trivial to be copyrightable and/or the author doesn't care I think it is polite to at the very least reference it's origin.

Thanks for your effort - I hope you'll spin a new improved patch!

//Peter

Holger Hans Peter Freyther

22 Sep 22 Sep

7:57 p.m.

On Wed, Sep 19, 2012 at 09:49:29PM +0000, Pavel Baturko wrote:

...

Thanks for advises! New version: if received text in UCS2, convert it to UTF-8, then check if console supports UTF-8, if yes - print message in vty as usual, if not - print message in hex.

Hi Pavel,

thanks for updating the patch. Could you remove the calls to setlocale and move the actual ucs->utf8 folding into a new subroutine.

thanks

holger

Pavel Baturko

29 Sep 29 Sep

9:06 p.m.

Hi All,

I fixed issues. Some comments:

...

Are these files always guaranteed to be available, or are some

configure.ac checks required in this change? I'm not sure that checking is required because 'locale.h' is in standard C library. Please correct me if I mistaken.

...

Did you investigate if this function will handle UTF-8 data correctly? I

sure don't know that. UTF-8 text is now not going from gsm480_rx_ussd function since terminal with telnet which is connected to mobile app can not support UTF-8 and there is no way to check it.

...

I hope a better name than DSC_0f can be found

Author of gsm480_ss.c unfortunately does not remember why 0x0f constant is used but if it's works and text is decoded as GSM 7bit - I changed name to default3, hope it's ok.

And now I have 3 variants of patch:

ussd-ucs2.setlocale.patch - with setlocale function. locale is set to "Environment's default locale" with setlocale(LC_ALL, "") and then restored to previous. Calling of setlocale(LC_ALL, NULL) returns "C" (Minimal "C" locale) and this is not what I want, so I have to manually set locale to Env default and check if it's UTF-8. As I understand this variant is not desirable.

ussd-ucs2.pipe.patch - locale is obtained from system shell with 'locale' command. I can't think another method to check locale except calling 'locale' or function 'setlocale'. If someone knows - please let me know.

ussd-ucs2.hex.patch - compromise variant (no external programs calls, no functions that affects whole app). Just convert text from UCS2 to UTF-8 and print hex representation of the string.

Thanks, Pavel

On Sat, Sep 22, 2012 at 1:57 PM, Holger Hans Peter Freyther < holger@freyther.de> wrote:

...

On Wed, Sep 19, 2012 at 09:49:29PM +0000, Pavel Baturko wrote:

...
Thanks for advises! New version: if received text in UCS2, convert it to UTF-8, then check if console supports UTF-8, if yes - print message in

vty

...
as usual, if not - print message in hex.

Hi Pavel,

thanks for updating the patch. Could you remove the calls to setlocale and move the actual ucs->utf8 folding into a new subroutine.

thanks
    holger

Pavel Baturko

21 Jan 21 Jan

7:16 a.m.

Hi list,

Some time ago I proposed patch for supporting UCS2 USSD decoding, there was a discussion, but then unfortunately I have not received a response. Now I'm back to this subject and made small changes to decode UCS2 SMS by reusing code from USSD. But before sending new patch for SMS (and resend updated patch for USSD) I'd like to re-ask my last question in this thread: what to do with UTF-8 text after converting from UCS2? I sent patches with 3 variants: 1) Use setlocale to get available locales and set utf if possible and then print message. 2) Call external *nix command "locale" to check if utf is available and then print message. 3) Just print hex (with osmo_hexdump e.g.) I prefer #2, #1 may be also ok, #3 is bad because local (non-English) GSM providers often send service SMS/USSD messages in UCS2 and in that case reading hex instead of text is awful, also reading usual SMS from other MSs in hex uncomfortable too. Another problem is that using all of these methods makes impossible to read received SMS/USSD in VTY because AFAIU there is no way to check if console with VTY will be ok with UTF. Maybe it's possible to add flag to indicate that VTY can receive text in UTF? This flag may also cancel auto-detection of terminal's locale and shift responsibility to the user of this flag. But I'm not sure where this flag should be.. In my tests I just send decoded UTF text to VTY and all works fine. Another variant is inform user in VTY that message is printed in mobile log.

So, what variant to choose when checking if terminal supports UTF? And how to interact with VTY in case of UTF strings?

Thanks, Pavel

Harald Welte

20 Sep 20 Sep

6:23 p.m.

On Wed, Sep 19, 2012 at 04:12:57AM +0200, Peter Stuge wrote:

...

Holger Hans Peter Freyther wrote:

...
I think a library should not call setlocale as it will impact the entire application. I think our only other realistic option is to use iconv (and stub it out for osmocomBB). Do you think you could try to create such a patch?

I'd suggest http://www.lemoda.net/c/ucs2-to-utf8/ and simply output UTF-8.

interesting, but unfortunately comes without any indication of license. I will contact the author.

-- - Harald Welte laforge@gnumonks.org http://laforge.gnumonks.org/ ============================================================================ "Privacy in residential applications is a desirable marketing option." (ETSI EN 300 175-7 Ch. A6)

Harald Welte

8:55 p.m.

Hi all,

On Thu, Sep 20, 2012 at 10:23:51AM +0200, Harald Welte wrote:

...

...
I'd suggest http://www.lemoda.net/c/ucs2-to-utf8/ and simply output UTF-8.

interesting, but unfortunately comes without any indication of license. I will contact the author.

He just responded to me that we can use the function under the terms of GNU GPL v2 or later.

Regards, Harald

4911

Age (days ago)

4913

Last active (days ago)

baseband-devel@lists.osmocom.org

9 comments

4 participants

tags (0)

participants (4)

Harald Welte
Holger Hans Peter Freyther
Pavel Baturko
Peter Stuge