Bug #1933

URL parser issues

Added by Dmytro Borys about 6 years ago. Updated about 5 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Interface
Target version:
Start date: 02/25/2015
Due date:
% Done: 0%
Estimated time:
Operative System: All
Regression: No

Description

Certain IRC in-channel search services often return URLs decorated with some kind of bracket on both sides of the URL string. Most of the time, AdiIRC matches the trailing character as part of the URL, so clicking the link tries to load a page with that symbol included, which usually ends in a 404 or 403 error.

I'm attaching a screenshot that shows this behavior. When a URL is surrounded with [], the closing bracket is treated as part of the URL in 2 of 3 cases.

Also, people who care about punctuation sometimes post something like "Hey guys, check this link: http://blah.blah. It's my new website!". The dot at the end of the URL is intended as a sentence separator, not as part of the URL, but the parser includes it anyway. In my opinion, matches of regex patterns such as "\.\s" or "\.$" should be excluded from the URL by default, since a sentence-ending dot is mistakenly treated as part of a URL far more often than a real URL actually ends in a dot.

This affects the latest version of the client for 64-bit Windows.
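The trailing-dot rule proposed above can be sketched in Python. This is a minimal illustration, not AdiIRC's code: `URL_RE` is a hypothetical, deliberately simple pattern (a scheme followed by any run of non-whitespace), and `find_urls` is an illustrative name. Because such a match always stops just before whitespace or at the end of the message, a dot at the end of the match corresponds exactly to the "\.\s" and "\.$" cases from the report.

```python
import re

# Hypothetical, deliberately simple URL matcher: a scheme followed by
# any run of non-whitespace characters. Not AdiIRC's actual pattern.
URL_RE = re.compile(r"https?://\S+")


def find_urls(text: str) -> list[str]:
    """Return URL candidates with sentence-ending dots removed.

    A match from URL_RE ends right before whitespace or at the end of
    the text, so a dot at the end of the match is the "\\.\\s" or
    "\\.$" case described in the report.
    """
    urls = []
    for match in URL_RE.finditer(text):
        # rstrip removes every trailing dot, so "..." is dropped too.
        urls.append(match.group(0).rstrip("."))
    return urls
```

With this sketch, `find_urls("Hey guys, check this link: http://blah.blah. It's my new website!")` returns `['http://blah.blah']`, while a dot inside a URL, as in `http://example.com/a.html`, is left alone.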


Files

ss+(2015-02-25+at+12.44.26).png (29 KB) Dmytro Borys, 02/25/2015 12:44 AM

#1

Updated by Per Amundsen about 6 years ago

  • Category set to Interface
  • Status changed from New to Assigned
  • Assignee set to Per Amundsen
  • Target version set to 1.9.6

I have this on my TODO list already, but it's a little complicated, since "]", ")" and "." are valid URL characters after a slash. I will get around to it.
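The complication shows up with a URL such as `https://en.wikipedia.org/wiki/Foo_(bar)`, where the closing parenthesis belongs to the address. One common heuristic (an assumption here, not necessarily what AdiIRC does) is to strip a trailing closing bracket only when the URL itself contains fewer matching opening brackets. A minimal Python sketch, where `trim_unbalanced_closer` and the `PAIRS` table are hypothetical names:

```python
# Closing -> opening bracket pairs considered by this sketch (an assumption).
PAIRS = {")": "(", "]": "[", "}": "{", ">": "<"}


def trim_unbalanced_closer(url: str) -> str:
    """Drop trailing closing brackets that have no opening partner in the URL."""
    while url and url[-1] in PAIRS:
        closer = url[-1]
        opener = PAIRS[closer]
        # If the URL contains at least as many openers as closers, the
        # final closer is balanced and most likely part of the address.
        if url.count(opener) >= url.count(closer):
            break
        url = url[:-1]
    return url
```

Under this heuristic the Wikipedia-style URL keeps its ")", while `http://www.google.com/search>` loses the unbalanced ">". It still misjudges URLs with deliberately unbalanced brackets, which is part of why the problem is not trivial.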

#2

Updated by Jonathan Kay about 5 years ago

To add to the examples already listed, I frequently see links (usually from bots) enclosed in angle brackets, so the URL is followed by a greater-than sign, such as:

<SomeBot> Results: Google Nothing <http://www.google.com/search>

This results in the same problem: the link includes the closing ">", and, as expected, navigating to it gives a 404.

#3

Updated by Per Amundsen about 5 years ago

  • Status changed from Assigned to Resolved
  • Target version changed from 1.9.6 to 2.3

I wrote a new parser, which will be in the next beta. It removes a trailing ")", ">", "]" or "}" if there is a matching leading one, and it also removes a trailing ".".
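The rule described in this comment might look roughly like the following Python sketch. It is not the actual AdiIRC parser: `URL_RE`, `CLOSER_FOR` and `extract_urls` are hypothetical names, "leading one" is interpreted as the character immediately before the URL in the message, and the trailing dot is removed before the bracket check so that `(http://example.com/foo).` is also handled. That ordering is a choice made for this sketch.

```python
import re

# Hypothetical, deliberately simple URL matcher (not AdiIRC's pattern).
URL_RE = re.compile(r"https?://\S+")

# Opening character before the URL -> closing character to strip after it.
CLOSER_FOR = {"(": ")", "<": ">", "[": "]", "{": "}"}


def extract_urls(text: str) -> list[str]:
    """Find URLs, dropping a trailing '.' and a trailing closer that pairs
    with the character immediately before the URL."""
    urls = []
    for match in URL_RE.finditer(text):
        url = match.group(0)
        # A trailing "." is treated as punctuation, not part of the URL.
        if url.endswith("."):
            url = url[:-1]
        # Character directly before the URL, or "" at the start of the text.
        leading = text[match.start() - 1] if match.start() > 0 else ""
        closer = CLOSER_FOR.get(leading)
        if closer is not None and url.endswith(closer):
            url = url[:-1]
        urls.append(url)
    return urls
```

For the bot line from comment #2, `extract_urls("<SomeBot> Results: Google Nothing <http://www.google.com/search>")` returns `['http://www.google.com/search']`, and an unwrapped `https://en.wikipedia.org/wiki/Foo_(bar)` keeps its ")" because no "(" precedes it.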

#4

Updated by Per Amundsen about 5 years ago

  • Status changed from Resolved to Closed
