Bug #5808
Closed
AdiIRC $dll return not stopping on NUL
Description
Additionally, I can't even post the error on here.
Updated by JD Byrnes about 1 month ago
I've got this weird issue...
Background:
There's a particular IRCX client by Microsoft that has a weird encoding issue where it encodes UTF16 as UTF8 (in this case, *** gets erroneously encoded as "í ¼í¼Ÿ")
It's probably a bit much to write the encode/decode in mSL, so I figured I'd write a DLL in Rust.
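For illustration, here is a minimal Rust sketch of the decode side. It assumes the broken client encodes each UTF-16 code unit as its own UTF-8 sequence, so a surrogate pair arrives as two 3-byte sequences. The function name `decode_utf16_as_utf8` is hypothetical and is not taken from the actual DLL. The sketch does not reject overlong forms.

```rust
/// Decode bytes in which every UTF-16 code unit was encoded as a separate
/// UTF-8 sequence (assumption about the client's behaviour, see above).
/// Returns None on malformed input or on unpaired surrogates.
fn decode_utf16_as_utf8(bytes: &[u8]) -> Option<String> {
    let mut units: Vec<u16> = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let b0 = bytes[i];
        // Work out the sequence length and the payload bits of the lead byte.
        let (lead_bits, len): (u32, usize) = if b0 < 0x80 {
            (b0 as u32, 1)
        } else if (b0 & 0xE0) == 0xC0 {
            ((b0 & 0x1F) as u32, 2)
        } else if (b0 & 0xF0) == 0xE0 {
            ((b0 & 0x0F) as u32, 3)
        } else {
            // A 16-bit code unit never needs a 4-byte sequence in this scheme.
            return None;
        };
        if i + len > bytes.len() {
            return None;
        }
        let mut value = lead_bits;
        for &b in &bytes[i + 1..i + len] {
            if (b & 0xC0) != 0x80 {
                return None; // not a continuation byte
            }
            value = (value << 6) | (b & 0x3F) as u32;
        }
        // At most 4 + 6 + 6 = 16 payload bits, so this always fits in a u16.
        units.push(value as u16);
        i += len;
    }
    // Re-pair the surrogates and build a normal Rust string.
    String::from_utf16(&units).ok()
}
```

With this sketch, the bytes `ED A0 BC ED BC B8` (the surrogates U+D83C and U+DF38, each encoded separately) decode to the single character U+1F338.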
The issue is, when I'm writing to data, the \0 doesn't seem to work as I'd expect and end the string. I thought it was an error in my code, but then I got AdiIRC to display the $asc for the 20th byte (my return string is 19 chars) and sure enough, $asc = 0. Am I missing something, or am I expected to null all of the memory passed in with AdiIRC? I don't use mIRC anymore, but I'm pretty sure a null byte would end the return.
alias testdll {
  var %fn = "C:\Users\jd\Projects\test\target\debug\test.dll"
  var %result = $dll(%fn, procname, Jí ¼í¼ŸDThis is a test.)
  echo -a Result: %result - Len $len(%result) ASC $asc($mid(%result, 20, 1))
}
Result: J***DThis is a test.[]st. - Len 23 ASC 0
Result: J***DThis is a test. - Len 19 ASC
Result: J***DThis is a test.[]t. - Len 23 ASC 0
Note [] = \0 (replaced it since it wasn't usable in the browser, but shows up on AdiIRC as a square)
AdiIRC v4.4 on Win11 (x64) with mUnicode = true.
Updated by JD Byrnes about 1 month ago
The four occurrences of "***" are me trying to insert a Cherry Blossom Emoji.
It turns out that this issue tracker has an internal error when it's used.
Updated by Per Amundsen about 1 month ago
I believe both AdiIRC and mIRC allow \0 in some cases when mUnicode is true. I don't remember the details; I think it has something to do with some 4-byte Unicode characters containing \0 in some cases.
If you could share a simple source for the dll call in rust, I can do some tests and compare with mirc.
Redmine was updated recently and it broke some things. It was a real pain making it work at all; I will look into the posting bug soon.
Updated by JD Byrnes 30 days ago · Edited
Per Amundsen wrote in #note-3:
If you could share a simple source for the dll call in rust, I can do some tests and compare with mirc.
https://github.com/ircclient-dev/adiirc-test (hello world, binaries: https://github.com/ircclient-dev/adiirc-test/releases/tag/v0.1.0)
Data from AdiIRC: The quick brown fox jumped over the lazy dog
Data from DLL: Hello, World!\0
Expected Result: Hello, World!
Actual Result: Hello, World!\0n fox jumped over the lazy dog
I expect the Actual Result to be the actual memory, but I'd expect AdiIRC to stop at \0.
Update: Tested on mIRC and the result is "Hello, World!" only.
AFAIK, a Unicode (UTF-16) string can't contain \0. If you were reading a UTF-16 string (native) as UTF-8 or something else, that might be a different situation.
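The expected behaviour can be sketched in Rust as follows, assuming the shared buffer is a sequence of 16-bit code units. Both function names are hypothetical; `overwrite_with_reply` only simulates what the DLL does to the buffer and is not AdiIRC's or mIRC's actual code.

```rust
/// Return the text before the first 0x0000 code unit
/// (or the whole buffer if there is no terminator).
fn read_until_nul(buf: &[u16]) -> String {
    let end = buf.iter().position(|&u| u == 0).unwrap_or(buf.len());
    String::from_utf16_lossy(&buf[..end])
}

/// Simulate the scenario above: the buffer starts out holding `input`,
/// then `reply` plus one terminator is written over its beginning.
fn overwrite_with_reply(input: &str, reply: &str) -> Vec<u16> {
    let mut buf: Vec<u16> = input.encode_utf16().collect();
    let mut written: Vec<u16> = reply.encode_utf16().collect();
    written.push(0); // the \0 the DLL appends
    if buf.len() < written.len() {
        buf.resize(written.len(), 0);
    }
    buf[..written.len()].copy_from_slice(&written);
    buf
}
```

Overwriting "The quick brown fox jumped over the lazy dog" with "Hello, World!" leaves "n fox jumped over the lazy dog" after the terminator, and `read_until_nul` returns only "Hello, World!".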
Updated by Per Amundsen 29 days ago
Thanks a lot, I have compiled it for 32 bit and tested it with both mirc/adiirc. mirc does indeed stop at null byte in this example, but I believe there is a pattern where mirc does not stop at null bytes, related to some 2 or 4 byte unicode or utf16 characters which can contain null bytes.
I used to have an example with this type of character, but I haven't been able to find it yet.
I can make a new build with a temporary $udll() or something, if you need this behavior right away.
Updated by JD Byrnes 28 days ago · Edited
Per Amundsen wrote:
Thanks a lot, I have compiled it for 32 bit and tested it with both mirc/adiirc. mirc does indeed stop at null byte in this example, but I believe there is a pattern where mirc does not stop at null bytes, related to some 2 or 4 byte unicode or utf16 characters which can contain null bytes.
There are 65535 possible non-zero values for a single 16-bit UTF-16 code unit (not counting surrogate pairs), so I tried all of them in a while loop, each followed by a \0, and mIRC stopped at the \0 for every single one.
I think you've mixed things up a little. You can have a null byte in a UTF-16 char, i.e. the upper 9 bits of an ASCII char will be 0s, resulting in a complete null byte (8 bits).
Windows stores data natively in UTF-16, so naturally you'll have null bytes in strings - but never a null character (both high and low bytes are 0) as it's the string terminator.
You might find a combination of two bytes with something like (bytes) [1, 0, 0, 1] but never [0, 0, 1, 1] or [1, 1, 0, 0] inside strings - that's why it's important to read the UTF-16 string in blocks of 16 bits.
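A small Rust sketch of that difference, assuming little-endian UTF-16 as used on Windows. The function names are hypothetical. One scan reads aligned 16-bit blocks, the other looks for any two adjacent zero bytes.

```rust
/// Index (in code units) of the first 0x0000 unit, reading the bytes
/// as aligned little-endian 16-bit blocks.
fn first_nul_unit(bytes: &[u8]) -> Option<usize> {
    bytes
        .chunks_exact(2)
        .position(|pair| u16::from_le_bytes([pair[0], pair[1]]) == 0)
}

/// Byte offset of the first two adjacent zero bytes, ignoring alignment.
/// This is the scan that gives false positives on valid UTF-16 text.
fn first_zero_byte_pair(bytes: &[u8]) -> Option<usize> {
    bytes.windows(2).position(|w| w[0] == 0 && w[1] == 0)
}
```

For the bytes [1, 0, 0, 1] the unaligned scan reports a match at offset 1, while the aligned scan correctly finds no terminator, because the two code units are 0x0001 and 0x0100.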
For now, I'm writing null bytes over any data AdiIRC has left behind (see https://github.com/ircclient-dev/irc-dll-template-rs/blob/5a689e96815f85c09dd44d35cea49c2033889e4d/src/lib.rs#L37), but this isn't good practice.
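The general shape of that workaround can be sketched as below. This is a standalone illustration and not the code behind the link. `write_and_clear` is a hypothetical name, and the buffer is assumed to be a slice of 16-bit code units whose length is known.

```rust
/// Copy `reply` into `buf` as UTF-16, then zero every remaining unit so
/// that no stale input survives after the terminator.
/// Returns false and leaves `buf` untouched if the text plus one
/// terminating 0x0000 unit does not fit.
fn write_and_clear(buf: &mut [u16], reply: &str) -> bool {
    let units: Vec<u16> = reply.encode_utf16().collect();
    if units.len() >= buf.len() {
        return false; // no room for the terminator
    }
    buf[..units.len()].copy_from_slice(&units);
    buf[units.len()..].fill(0);
    true
}
```

Clearing the tail costs a pass over the whole buffer on every call, which is part of why it is a workaround and not a fix.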
Per Amundsen wrote:
I can make a new build with a temporary $udll() or something, if you need this behavior right away.
Thank you, but there's really no need; it'd make things more hacky than the already ugly code I've added above.
Update: I just checked, and for 4-byte UTF-16 using surrogates, the high surrogate area is U+D800 - U+DBFF and the low surrogate area is U+DC00 - U+DFFF. Neither can be 0000.
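This can be checked with Rust's standard `char::encode_utf16`. The two helper names below are hypothetical and exist only for this sketch.

```rust
/// Split a supplementary-plane character into its (high, low) surrogates.
/// Returns None for characters that fit in a single code unit.
fn surrogate_pair(c: char) -> Option<(u16, u16)> {
    let mut tmp = [0u16; 2];
    match c.encode_utf16(&mut tmp) {
        [hi, lo] => Some((*hi, *lo)),
        _ => None,
    }
}

/// True if `c` encodes to UTF-16 without producing a 0x0000 code unit.
fn has_no_zero_unit(c: char) -> bool {
    let mut tmp = [0u16; 2];
    c.encode_utf16(&mut tmp).iter().all(|&u| u != 0)
}
```

For example, U+1F338 splits into the pair (0xD83C, 0xDF38), and the only character for which `has_no_zero_unit` returns false is U+0000 itself.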
Updated by Per Amundsen 28 days ago
- Category set to Scripting
- Status changed from New to Resolved
- Assignee set to Per Amundsen
I was just not explaining the problem very well, but I think I found a solution that should work. It will be in the next version. It might be a very long time until the next version, so let me know if you need the fix before then.
Thanks for the testing, much appreciated.