Bug #5808

closed

AdiIRC $dll return not stopping on NUL

Added by JD Byrnes about 1 month ago. Updated 28 days ago.

Status: Closed
Priority: Normal
Assignee:
Category: Scripting
Target version:
Start date: 01/22/2025
Due date:
% Done: 0%
Estimated time:
Operative System: All
Regression: No

Description

Additionally, I can't even post the error on here.

Actions #1

Updated by JD Byrnes about 1 month ago

I've got this weird issue...

Background:
There's a particular IRCX client by Microsoft that has a weird encoding issue where it encodes UTF-16 as UTF-8 (in this case, *** gets erroneously encoded as "🌟").
It's probably a bit much to write the encode/decode in mSL, so I figured I'd write a DLL in Rust.

The issue is, when I'm writing to data, the \0 doesn't seem to work as I'd expect and end the string. I thought it was an error in my code, but then I got AdiIRC to display the $asc for the 20th byte (my return string is 19 chars) and sure enough, $asc = 0. Am I missing something, or am I expected to null all of the memory passed in with AdiIRC? I don't use mIRC anymore, but I'm pretty sure a null byte would end the return.

alias testdll {
  var %fn = "C:\Users\jd\Projects\test\target\debug\test.dll" 
  var %result = $dll(%fn, procname, J🌟DThis is a test.)
  echo -a Result: %result - Len $len(%result) ASC $asc($mid(%result, 20, 1))
}
Result: J***DThis is a test.[]st. - Len 23 ASC 0
Result: J***DThis is a test. - Len 19 ASC
Result: J***DThis is a test.[]฀t. - Len 23 ASC 0

Note [] = \0 (replaced it since it wasn't usable in the browser, but shows up on AdiIRC as a square)
AdiIRC v4.4 on Win11 (x64) with mUnicode = true.

Actions #2

Updated by JD Byrnes about 1 month ago

The four occurrences of "***" are me trying to insert a Cherry Blossom Emoji.

It turns out that this issue tracker has an internal error when it's used.

Actions #3

Updated by Per Amundsen about 1 month ago

I believe both AdiIRC and mIRC allow \0 in some cases when mUnicode is true. I don't remember the details, but I think it has something to do with some 4 byte unicode characters containing \0 in some cases.

If you could share a simple source for the dll call in rust, I can do some tests and compare with mirc.

Redmine was updated recently and it broke some things. It was a real pain making it work at all; I will look into the posting bug soon.

Actions #4

Updated by JD Byrnes 30 days ago · Edited

Per Amundsen wrote in #note-3:

If you could share a simple source for the dll call in rust, I can do some tests and compare with mirc.

https://github.com/ircclient-dev/adiirc-test (hello world, binaries: https://github.com/ircclient-dev/adiirc-test/releases/tag/v0.1.0)

Data from AdiIRC: The quick brown fox jumped over the lazy dog
Data from DLL: Hello, World!\0

Expected Result: Hello, World!
Actual Result: Hello, World!\0n fox jumped over the lazy dog

I expect the Actual Result to reflect what is actually in memory, but I'd expect AdiIRC to stop reading at \0.

Update: Tested on mIRC and the result is "Hello, World!" only.

AFAIK, a Unicode (UTF-16) string can't contain \0 (a null character). If you were reading a native UTF-16 string as UTF-8 or something else, that might be a different situation.
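For illustration, here is a minimal Rust sketch of the behaviour I'd expect from the host. It assumes the return buffer is handed over as UTF-16 code units (the mUnicode = true case); `read_until_nul` and `simulated_buffer` are hypothetical names for this example only, not anything from AdiIRC, mIRC, or the test repository.

```rust
/// Decode a UTF-16 buffer up to (not including) the first NUL code unit.
/// If no terminator is present, the whole buffer is decoded.
fn read_until_nul(buf: &[u16]) -> String {
    let end = buf.iter().position(|&unit| unit == 0).unwrap_or(buf.len());
    String::from_utf16_lossy(&buf[..end])
}

/// Rebuild the situation from the report: the host's text is already in the
/// buffer, and the DLL writes its NUL-terminated reply over the start of it.
fn simulated_buffer() -> Vec<u16> {
    let mut buf: Vec<u16> = "The quick brown fox jumped over the lazy dog"
        .encode_utf16()
        .collect();
    let reply: Vec<u16> = "Hello, World!\0".encode_utf16().collect();
    buf[..reply.len()].copy_from_slice(&reply);
    buf
}
```

Decoding the whole simulated buffer reproduces the Actual Result above, while `read_until_nul` yields only the part before the terminator.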

Actions #5

Updated by Per Amundsen 29 days ago

Thanks a lot, I have compiled it for 32 bit and tested it with both mirc/adiirc. mirc does indeed stop at null byte in this example, but I believe there is a pattern where mirc does not stop at null bytes, related to some 2 or 4 byte unicode or utf16 characters which can contain null bytes.

I used to have an example with this type of character, but I haven't been able to find it yet.

I can make a new build with a temporary $udll() or something, if you need this behavior right away.

Actions #6

Updated by JD Byrnes 28 days ago · Edited

Thanks a lot, I have compiled it for 32 bit and tested it with both mirc/adiirc. mirc does indeed stop at null byte in this example, but I believe there is a pattern where mirc does not stop at null bytes, related to some 2 or 4 byte unicode or utf16 characters which can contain null bytes.

There are 65535 possible combinations for a UTF-16 character (not including surrogates), so I tried all of them in a while loop, each followed by a \0, and mIRC stopped for every single one.

I think you've mixed things up a little. You can have a null byte in a UTF-16 char - ie. the upper 9 bits of an ASCII char will be 0s, resulting in a complete null byte (8 bits).
Windows stores data natively in UTF-16, so naturally you'll have null bytes in strings - but never a null character (both high and low bytes are 0) as it's the string terminator.

You might find a combination of two bytes with something like (bytes) [1, 0, 0, 1] but never [0, 0, 1, 1] or [1, 1, 0, 0] inside strings - that's why it's important to read the UTF-16 string in blocks of 16 bits.
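A small Rust sketch of the difference, using the byte pattern above. It assumes the buffer is viewed as raw bytes; both function names are made up for this example.

```rust
/// Naive scan: index of the first zero *byte*, ignoring 16-bit alignment.
fn first_zero_byte(bytes: &[u8]) -> Option<usize> {
    bytes.iter().position(|&b| b == 0)
}

/// Aligned scan: byte offset of the first 16-bit unit whose two bytes are
/// both zero. A trailing odd byte, if any, is ignored.
fn first_nul_unit(bytes: &[u8]) -> Option<usize> {
    bytes
        .chunks_exact(2)
        .position(|pair| pair[0] == 0 && pair[1] == 0)
        .map(|unit_index| unit_index * 2)
}
```

For the bytes [1, 0, 0, 1, 0, 0], the naive scan stops at offset 1, inside the first character, while the aligned scan finds the real terminator at offset 4. The two adjacent zero bytes at offsets 1 and 2 belong to different 16-bit units, so they are not a terminator.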

For now, I'm writing null bytes over any data AdiIRC has left behind (see https://github.com/ircclient-dev/irc-dll-template-rs/blob/5a689e96815f85c09dd44d35cea49c2033889e4d/src/lib.rs#L37), but this isn't good practice.
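Roughly, the idea of the workaround looks like the sketch below. This is not the code from the linked template, just an illustration that assumes the buffer is available as a mutable slice of UTF-16 code units; `write_and_clear` is a hypothetical name.

```rust
/// Copy `text` into `buf` as UTF-16, then overwrite everything after it with
/// zeros so no stale host data remains. Returns the number of code units of
/// text written. Text that does not fit is truncated so that at least one
/// NUL always remains (truncation may split a surrogate pair; a real
/// implementation would want to handle that).
fn write_and_clear(buf: &mut [u16], text: &str) -> usize {
    if buf.is_empty() {
        return 0;
    }
    let capacity = buf.len() - 1; // reserve one unit for the terminator
    let mut written = 0;
    for unit in text.encode_utf16().take(capacity) {
        buf[written] = unit;
        written += 1;
    }
    // Zero the terminator position and all leftover data behind it.
    buf[written..].fill(0);
    written
}
```

With a host that stops at the first NUL, writing the single terminator would be enough and the final `fill` over the rest of the buffer would be unnecessary.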

I can make a new build with a temporary $udll() or something, if you need this behavior right away.

Thank you, but there's really no need; it'd make things more hacky than the already ugly code I've added above.

Update: I just checked 4 byte UTF-16 using surrogates: the high surrogate area is U+D800 - U+DBFF and the low surrogate area is U+DC00 - U+DFFF. Neither can be 0000.
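As a quick check of the surrogate ranges, here is a sketch using Rust's standard `char::encode_utf16`; the helper names are invented for this example. The Cherry Blossom emoji (U+1F338) is used as the sample character.

```rust
/// Return the UTF-16 code units for a single character (one or two units).
fn utf16_units(c: char) -> Vec<u16> {
    let mut tmp = [0u16; 2];
    c.encode_utf16(&mut tmp).to_vec()
}

/// True if `unit` lies in the high surrogate area U+D800 - U+DBFF.
fn is_high_surrogate(unit: u16) -> bool {
    (0xD800..=0xDBFF).contains(&unit)
}

/// True if `unit` lies in the low surrogate area U+DC00 - U+DFFF.
fn is_low_surrogate(unit: u16) -> bool {
    (0xDC00..=0xDFFF).contains(&unit)
}
```

U+1F338 encodes to the pair D83C DF38: one high and one low surrogate, neither of them 0000.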

Actions #7

Updated by Per Amundsen 28 days ago

  • Category set to Scripting
  • Status changed from New to Resolved
  • Assignee set to Per Amundsen

I was just not explaining the problem very well, but I think I found a solution that should work, it will be in the next version. It might be a very long time for the next version, so let me know if you need the fix before then.

Thanks for the testing, much appreciated.

Actions #8

Updated by Per Amundsen 28 days ago

  • Status changed from Resolved to Closed