Bug #5808: AdiIRC $dll return not stopping on NUL

Added by JD Byrnes 10 months ago. Updated 10 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: Scripting
Target version:
Start date: 01/22/2025
Due date:
% Done: 0%
Estimated time:
Operative System: All
Regression: No

Description

Additionally, I can't even post the error on here.

Updated by JD Byrnes 10 months ago #1

I've got this weird issue...

Background:
There's a particular IRCX client by Microsoft with an odd encoding bug: it encodes UTF-16 as UTF-8 (in this case, *** gets erroneously encoded as "🌟").
It's probably a bit much to write the encode/decode in mSL, so I figured I'd write a DLL in Rust.

The issue is that when I write my reply to data, the \0 doesn't end the string as I'd expect. I thought it was an error in my code, but then I got AdiIRC to display the $asc for the 20th character (my return string is 19 chars) and sure enough, $asc = 0. Am I missing something, or am I expected to null all of the memory that AdiIRC passes in? I don't use mIRC anymore, but I'm pretty sure a null byte would end the return there.

alias testdll {
  var %fn = "C:\Users\jd\Projects\test\target\debug\test.dll" 
  var %result = $dll(%fn, procname, J🌟DThis is a test.)
  echo -a Result: %result - Len $len(%result) ASC $asc($mid(%result, 20, 1))
}
Result: J***DThis is a test.[]st. - Len 23 ASC 0
Result: J***DThis is a test. - Len 19 ASC
Result: J***DThis is a test.[]฀t. - Len 23 ASC 0

Note [] = \0 (replaced it since it wasn't usable in the browser, but shows up on AdiIRC as a square)
AdiIRC v4.4 on Win11 (x64) with mUnicode = true.
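
For reference, here is a minimal sketch of how a DLL might write its reply into the caller's buffer with a terminating \0. It assumes the data buffer holds 16-bit UTF-16 units (as with mUnicode = true). `write_reply` is a hypothetical helper, not part of any AdiIRC or mIRC API, and a plain slice stands in for the raw pointer the host would pass.

```rust
/// Hypothetical helper: copy `reply` into `data` as UTF-16 and terminate it
/// with a single 0 code unit. Returns the number of units written (excluding
/// the terminator), or None if `data` is too small.
fn write_reply(data: &mut [u16], reply: &str) -> Option<usize> {
    let units: Vec<u16> = reply.encode_utf16().collect();
    if units.len() + 1 > data.len() {
        return None;
    }
    data[..units.len()].copy_from_slice(&units);
    // Terminator. Whatever the host left after this position is untouched,
    // which is why the tail of the original input can still be seen.
    data[units.len()] = 0;
    Some(units.len())
}
```

With a buffer that initially holds the input text, everything after the terminator is still the host's original data, which matches the leftover characters in the results above.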

Updated by JD Byrnes 10 months ago #2

The four occurrences of "***" are me trying to insert a Cherry Blossom Emoji.

It turns out that this issue tracker has an internal error when it's used.

Updated by Per Amundsen 10 months ago #3

I believe both AdiIRC and mIRC allow \0 in some cases when mUnicode is true. I don't remember the details; I think it has something to do with some 4-byte Unicode characters containing \0 in some cases.

If you could share a simple source for the dll call in rust, I can do some tests and compare with mirc.

Redmine was updated recently and it broke some things; it was a real pain making it work at all. I will look into the posting bug soon.

Updated by JD Byrnes 10 months ago · Edited #4

Per Amundsen wrote in #note-3:

If you could share a simple source for the dll call in rust, I can do some tests and compare with mirc.

https://github.com/ircclient-dev/adiirc-test (hello world, binaries: https://github.com/ircclient-dev/adiirc-test/releases/tag/v0.1.0)

Data from AdiIRC: The quick brown fox jumped over the lazy dog
Data from DLL: Hello, World!\0

Expected Result: Hello, World!
Actual Result: Hello, World!\0n fox jumped over the lazy dog

I expect the Actual Result to be the actual memory, but I'd expect AdiIRC to stop at \0.

Update: Tested on mIRC and the result is "Hello, World!" only.

AFAIK, a Unicode (UTF-16) string can't contain \0. If you were reading a UTF-16 string (native) as UTF-8 or something else, that might be a different situation.
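
To illustrate the behaviour I'd expect from the host, here is a rough sketch that reads the returned buffer only up to the first 0 code unit. `read_until_nul` is a made-up name for illustration and is not how AdiIRC or mIRC are actually implemented; it assumes the buffer is a slice of UTF-16 units.

```rust
/// Hypothetical host-side read: keep everything before the first 0 code
/// unit and ignore whatever is left in the buffer after it.
fn read_until_nul(data: &[u16]) -> String {
    let end = data.iter().position(|&u| u == 0).unwrap_or(data.len());
    String::from_utf16_lossy(&data[..end])
}
```

Applied to the buffer from the example above, this would return only the text before the \0, even though the rest of the original input is still in memory.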

Updated by Per Amundsen 10 months ago #5

Thanks a lot, I have compiled it for 32 bit and tested it with both mirc/adiirc. mirc does indeed stop at null byte in this example, but I believe there is a pattern where mirc does not stop at null bytes, related to some 2 or 4 byte unicode or utf16 characters which can contain null bytes.

I used to have an example with this type of character, but I haven't been able to find it yet.

I can make a new build with a temporary $udll() or something, if you need this behavior right away.

Updated by JD Byrnes 10 months ago · Edited #6

Per Amundsen wrote in #note-5:

Thanks a lot, I have compiled it for 32 bit and tested it with both mirc/adiirc. mirc does indeed stop at null byte in this example, but I believe there is a pattern where mirc does not stop at null bytes, related to some 2 or 4 byte unicode or utf16 characters which can contain null bytes.

There are 65535 possible non-zero values for a single 16-bit UTF-16 character (not including surrogate pairs), so I tried each of them in a while loop, followed by a \0, and mIRC stopped at the \0 for every single one.

I think you've mixed things up a little. You can have a null byte in a UTF-16 char, i.e. the upper 9 bits of an ASCII char will be 0s, so its high byte (8 bits) is a complete null byte.
Windows stores data natively in UTF-16, so naturally you'll have null bytes in strings - but never a null character (both high and low bytes are 0) as it's the string terminator.

You might find two adjacent null bytes that straddle two characters, something like (bytes) [1, 0, 0, 1], but never an aligned pair such as [0, 0, 1, 1] or [1, 1, 0, 0] inside a string. That's why it's important to read the UTF-16 string in blocks of 16 bits.
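
A small sketch of what reading in 16-bit blocks means in practice, using the byte patterns above. The function name is made up for illustration; it only treats a pair of zero bytes as the terminator when the pair starts on an even byte offset.

```rust
/// Returns the byte offset of the first aligned 16-bit zero unit, if any.
/// Zero bytes that merely sit next to each other across two units are ignored.
fn find_utf16_terminator(bytes: &[u8]) -> Option<usize> {
    bytes
        .chunks_exact(2)
        .position(|pair| pair[0] == 0 && pair[1] == 0)
        .map(|unit_index| unit_index * 2)
}
```

For [1, 0, 0, 1] this finds no terminator, while [0, 0, 1, 1] and [1, 1, 0, 0] terminate at byte offsets 0 and 2. A naive scan for two consecutive zero bytes would wrongly stop at offset 1 in the first case.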

For now, I'm writing null bytes over any data AdiIRC has left behind (see https://github.com/ircclient-dev/irc-dll-template-rs/blob/5a689e96815f85c09dd44d35cea49c2033889e4d/src/lib.rs#L37), but this isn't good practice.
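
Roughly, the workaround looks like the sketch below. This is not the code from the linked template, just a simplified illustration with a hypothetical function name and a slice standing in for the buffer the host passes in.

```rust
/// Hypothetical workaround: write the reply, then overwrite every remaining
/// unit in the buffer with 0 so no leftover host data follows the reply.
/// Returns false if the reply plus terminator does not fit.
fn write_reply_zero_filled(data: &mut [u16], reply: &str) -> bool {
    let units: Vec<u16> = reply.encode_utf16().collect();
    if units.len() + 1 > data.len() {
        return false;
    }
    data[..units.len()].copy_from_slice(&units);
    // Clears the terminator position and everything after it.
    data[units.len()..].fill(0);
    true
}
```

It costs a pass over the whole buffer on every call, which is part of why it doesn't feel like good practice.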

Per Amundsen wrote in #note-5:

I can make a new build with a temporary $udll() or something, if you need this behavior right away.

Thank you, but there's really no need; it'd make things more hacky than the already ugly code I've added above.

Update: I just checked, and for 4-byte UTF-16 using surrogates, the high surrogate area is U+D800 - U+DBFF and the low surrogate area is U+DC00 - U+DFFF. Neither can be 0000.
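
A quick way to check this, sketched in Rust with only the standard library. The helper names are made up for illustration; the second function walks every Unicode scalar value except U+0000 and looks for a zero code unit in its UTF-16 encoding.

```rust
/// Encode one character as UTF-16 and return its one or two code units.
fn utf16_units(c: char) -> Vec<u16> {
    let mut buf = [0u16; 2];
    c.encode_utf16(&mut buf).to_vec()
}

/// True if any character other than U+0000 encodes to a 0 code unit.
fn any_char_encodes_to_zero_unit() -> bool {
    ('\u{1}'..=char::MAX).any(|c| utf16_units(c).iter().any(|&u| u == 0))
}
```

The Cherry Blossom emoji (U+1F338), for example, encodes to the surrogate pair D83C DF38, and the exhaustive check finds no character that produces a 0 unit.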

Updated by Per Amundsen 10 months ago #7

  • Category set to Scripting
  • Status changed from New to Resolved
  • Assignee set to Per Amundsen

I was just not explaining the problem very well, but I think I found a solution that should work; it will be in the next version. It might be a very long time until the next version, so let me know if you need the fix before then.

Thanks for the testing, much appreciated.

Updated by Per Amundsen 10 months ago #8

  • Status changed from Resolved to Closed