SMS: UCS-2 concatenation with UTF16 emoticons

Keith keith at rhizomatica.org
Mon Nov 12 11:21:56 UTC 2018


Hi all,

I noticed that SMS with emoticons on the boundary of the concatenated
segments are not displaying correctly on the destination handset.

* - imagine the disaster when that kiss-blowing smiley face thing at the
end of your SMS turns out as a diamond with a ? for the recipient! OMG,
the potential butterfly effect is too much to think about... :)) 

So... This is my analysis:

We have 140 bytes for the message, less the 6 bytes for the user data
header (UDH) in the case of concatenated SMS, leaving 134 bytes.

It's well known that this means 67 characters per segment for an SMS
using UCS-2 encoding.

But if we fill the message with emoticons that are using 4 bytes per
'character', then we have space for 134/4 = 33 and a half. Ooops.
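
A quick sketch of that arithmetic in Python, in case it helps. The constant
and function names are made up for illustration (nothing here is taken from
the osmo code), and the cut is a deliberately naive byte-offset split:

```python
# Sketch only: names are illustrative, not taken from any osmo source.
SM_MAX_BYTES = 140                    # user data bytes in a single SM
UDH_BYTES = 6                         # user data header for concatenation
PAYLOAD = SM_MAX_BYTES - UDH_BYTES    # 134 bytes left for text

EMOJI = "\U0001F631"                  # an emoji outside the BMP


def utf16_len(text):
    """Number of bytes the text occupies as UTF-16 (big endian, no BOM)."""
    return len(text.encode("utf-16-be"))


ucs2_chars_per_segment = PAYLOAD // 2            # 2 bytes per BMP character
emoji_per_segment = PAYLOAD / utf16_len(EMOJI)   # 4 bytes per emoji

# Fill a message with emoji and cut it naively at the payload size.
data = (EMOJI * 34).encode("utf-16-be")
first, rest = data[:PAYLOAD], data[PAYLOAD:]

print(ucs2_chars_per_segment)             # 67
print(emoji_per_segment)                  # 33.5
print(first[-2:].hex(), rest[:2].hex())   # d83d de31
```

The last line shows the two halves of the surrogate pair ending up on
opposite sides of the cut.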

Still, the destination handset should reassemble the message and stick
the two "halves" of the emoticon together, right? - except I'm not
observing this.

To rule out us doing something wrong in osmo, could somebody (who has
an unlimited SMS package with a commercial provider) try sending a
crafted SM from one emoji-enabled phone to another, something like
this:

😱abcdefghijklmnopqrstuvwxyz123456789abcdefghijklmnopqrstuvwxyz123😱56789

That's likely going to get mangled by the mailing list or your MUA, so it is:

[4 byte emoji]abcdefghijklmnopqrstuvwxyz123456789abcdefghijklmnopqrstuvwxyz123[4 byte emoji]56789
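
For anyone who wants to check where the second emoji lands, here is a small
Python sketch that counts UTF-16 code units in the crafted message. It
assumes the emoji is U+1F631 and that segments are cut at 67 code units;
the variable names are only illustrative:

```python
# Sketch only: assumes the emoji is U+1F631 and 67 code units per segment.
EMOJI = "\U0001F631"
msg = (EMOJI
       + "abcdefghijklmnopqrstuvwxyz123456789"
       + "abcdefghijklmnopqrstuvwxyz123"
       + EMOJI
       + "56789")

UNITS_PER_SEGMENT = (140 - 6) // 2    # 134 bytes / 2 bytes per code unit

# Everything before the second emoji, measured in 16-bit code units.
prefix = msg[:msg.rindex(EMOJI)]
high_index = len(prefix.encode("utf-16-be")) // 2   # high surrogate
low_index = high_index + 1                          # low surrogate
total_units = len(msg.encode("utf-16-be")) // 2

print(total_units)                      # 73
print(high_index // UNITS_PER_SEGMENT)  # 0: high surrogate closes segment 1
print(low_index // UNITS_PER_SEGMENT)   # 1: low surrogate opens segment 2
```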

You could also try various messages filled with emojis. Of course, if
you bring up your osmo network with SMPP-mirror, you can watch the trace
in wireshark and see when the last emoji gets chopped in two. You
could try the same message on osmo and your commercial network.

If it's actually a problem of the phones, you should get

[4 byte emoji]abcdefghijklmnopqrstuvwxyz123456789abcdefghijklmnopqrstuvwxyz123[two diamonds with ?]56789
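
To illustrate the two possible outcomes, here is a Python sketch that splits
the crafted message at 134 bytes and then decodes it in two ways. Whether any
real handset behaves like the second strategy is only a guess; the code just
shows that decoding each segment on its own would produce exactly the two
replacement characters described above:

```python
# Sketch only: models two hypothetical receiver behaviours, not a real handset.
PAYLOAD = 140 - 6                     # bytes of text per concatenated segment
EMOJI = "\U0001F631"
msg = (EMOJI
       + "abcdefghijklmnopqrstuvwxyz123456789"
       + "abcdefghijklmnopqrstuvwxyz123"
       + EMOJI
       + "56789")

# Naive sender: cut the UTF-16 byte stream every PAYLOAD bytes.
data = msg.encode("utf-16-be")
segments = [data[i:i + PAYLOAD] for i in range(0, len(data), PAYLOAD)]

# Strategy A: join the raw bytes of all segments, then decode once.
joined = b"".join(segments).decode("utf-16-be")

# Strategy B: decode each segment separately, then join the text.
# errors="replace" turns each unpaired surrogate into U+FFFD.
per_segment = "".join(
    s.decode("utf-16-be", errors="replace") for s in segments
)

print(joined == msg)                  # True
print(per_segment.count("\ufffd"))    # 2
```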


Thanks!

Keith.




