Project

General

Profile

Demo of $calcint() identifier valid as of Build 2018/11/18

Added by Paul Janson about 6 years ago

/*
{
  FNV1a non-cryptographic hash. By maroon. Demonstrating the big integer math of $calcint()
  and the extension of $base to work with larger numbers. $base and $calcint can now accurately
  use input strings much longer than most people would reasonably ever expect to use.

  $calcint() requires ALL terms MUST be an integer, and the output value is always an integer too.
  If any term has a non-zero fraction, that entire term is handled as if it's zero.
  //echo -a $calcint(4 + 1.1 ) is 4 because the fraction causes 1.1 to be handled as a zero.
  $calcint(4 / 1.1) is same as if $calcint(4 / 0)
  $calcint(4.0) does not have a non-zero fraction, so is handled as 4 not 0

  {
    $base now accepts extemely large input numbers without rounding error or returning $null/zero.
    $base now accepts base 1-64 instead of only 1-36

    Note: just as $base(number,10,32) is not the same base32 as $encode(number,a),
    using $base(number,10,64) is not the same base64 as $encode(number,m)
    $base uses the same 64 characters as in the mime/base64 alphabet does, but in a different order.
    For $base 2-36, $base has always used 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
    For $base 37-64, the alphabet extends to appending the lower-case alphabet then +/
    That means any usage of bases 37-64 handles A-Z and a-z as case-sensitive
    $base alphabet 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+/
    mime/base64 =  ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

    $base now lets you store hashes as shorter strings, and is able to handle longer strings without rounding errors:
    //echo -a $base($sha512(abc),16,64)
  }

  For calculation of the 1024-bit FNV1a hash, it needs to be able to:
  (1) XOR a 1024-bit number by an 8 bit number.
  (2) MULTIPLY a 1024-bit number by a 681-bit number
  (3) calculate 1705-bit result of (#2) MOD 2^1024
  for flag h:
  (4) Translate a 1024-bit number from Base10 to Base16
  ( $base now supports these large integers too )
  for flag x:
  (5) Divide a 1024-bit number by various 2^N exponents as large as 2^1023
  (6) Calculate 2^1023 then subtract 1 from it
  (7) XOR a 1024-bit number by a 511-bit number
  (8) Perform LOGICAL AND between 1024-bit number and (item#6)
}
*/
alias fnv1a-1024 { return $fnv1a-generic($1,$2,$3,$4,1024,$0) }
alias fnv1a-512  { return $fnv1a-generic($1,$2,$3,$4, 512,$0) }
alias fnv1a-256  { return $fnv1a-generic($1,$2,$3,$4, 256,$0) }
alias fnv1a-128  { return $fnv1a-generic($1,$2,$3,$4, 128,$0) }
alias fnv1a-64   { return $fnv1a-generic($1,$2,$3,$4,  64,$0) }
alias fnv1a-32   { return $fnv1a-generic($1,$2,$3,$4,  32,$0) }
/*
{
  Syntax: $fnv1a-X([text|binvar|filename],[bits1-X],[mode 0-2],[flags hmx])
  ... where X is any of 32/64/128/256/512/1024 i.e. $fnv1a-32() $fnv1a-1024()
  Each hash uses different constants and prime numbers, so $fnv1a-32 is NOT the 1st 32 bits of $fnv1a-64

  $1 = input string to be hashed
  $2 = bits of hash returned. valid 1-32 for fnv1a-32, 1-64 for fnv1a-64 etc
  .    default/$null is full bits for that hashsize. Fewer bits has less value. i.e. 1 = always 0 or 1
  $3 mode: (default = 0)
  if 0 then $1 is a text string (or if blank or not used)
  if 1 then $1 is name of &binvar
  if 2 then $1 is filename
  $4 flags (default = $null)
  h = returns answer in hex instead of decimal
  m = Brett Mulvey's modification (valid for 32-bit only, figure fnv4 at papa.bretmulvey.com)
  x = use xor-folding when $2 < full hash's bits (See $xorfold() alias below)

  For more info on the FNV1a hash itself:
  https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function
  http://www.isthe.com/chongo/tech/comp/fnv/

  FNV1a is *NOT* intended as a cryptographic hash. It is intended as a hash for use by hash tables for
  widely distributing items (keys) into different 'buckets', especially those having similar strings.
  While in machine code $FNV1a-512 is much faster than $sha512(), the overhead of scripting the hash
  makes this alias much slower than $sha512()

  32-bit and 64-bit test vectors at https://tools.ietf.org/html/draft-eastlake-fnv-06

  Examples:
  //echo -a $fnv1a-32(foobar,,0,h) test vector should match 0xbf9cf968
  //bset -t &v 1 foobar | bset &v -1 0 | echo -a $fnv1a-32(&v,,1,h) test vector should match 0x0c1c9eb8
  //echo -a $fnv1a-64(foobar,,0,h) test vector should match 0x85944171f73967e8
  //bset -t &v 1 foobar | bset &v -1 0 | echo -a $fnv1a-64(&v,,1,h) test vector should match 0x34531ca7168b8f38

  Note: I haven't found any official test vectors for the longer FNV hashes. The closest is an implementation at
  https://github.com/jslicer/FNV-1a/tree/master/Fnv1aTests which posts a few input/output pairs. These aliases
  agree with all their test vectors, except for a couple where their vector has the highest nibble wrong, possibly
  due to their handling numbers as if signed. In cases where this alias disagrees with that github project, I
  have calculated manually using Googol+ at http://www.atelierweb.com/products/googol/ and verified that AdiIRC's
  calculations were the correct ones.

  Above included examples of using mode=1 where a binary variable is the string being hashed.

  mode=2 reads a disk file into a binary variable to calculate the hash from there. Note that even though FNV1a is
  a fast hash, it's still slow when performed by a high-level scripting language. This next command can freeze
  your client a couple minutes while hashing more than a megabyte.
  //var %ticks $ticks | echo -a the FNV1a-32 hash of your client is $fnv1a-32($mircexe,,2,h) ticks: $calc($ticks - %ticks)

  The syntax is the same for all 6 hash lengths, except that the 'm' flag is ignored except at the 32 bit length.
  //echo -a $fnv1a-128(test,,,h) is the same as $fnv1a-128(test,,,hm) because m is ignored if non-32

  The m flag is valid only for 32-bit, and uses the mixing steps recommented by Brett Mulvey to improve avalanche
  //echo -a $fnv1a-32(foobar,,0,hm) returns 950A6281
  For reasons explained at his site, I have not yet included the mixing step for the other hashes. The longer hashes
  have primes whose bit-length is longer compared to the hash's length, so strings must be fairly long until the
  hash no longer contains a long string of zero bits. In the next example, there must be 47 a's before the last
  remnants of the long string of zero bits is gone. If the initialization offset-basis were zero, it would need 77 a's.
  //echo -a $fnv1a-1024($str(a,16),,,h)

  Note how little variation there is in the top 5 hex digits when 'm' isn't used, and how the lowest bit alternates odd/even:
  //var %a $asc(a) | while (%a isnum $asc(a) $+ - $+ $asc(z)) { echo -a hash of $chr(%a) is $fnv1a-32($chr(%a),,,h) with mod is $fnv1a-32($chr(%a),,,hm) | inc %a }

  When reducing the bits in the output, xor-folding ensures the 'thrown away' bits affect the remaining bits
  //echo -a $fnv1a-32(foobar,32,,h) vs $fnv1a-32(foobar,24,,h) vs $fnv1a-32(foobar,24,,hx)

  Another way to have a 32-bit hash is to use the 64-bit variant then xor-fold bits 1-32 against bits 33-64
  //echo -a $fnv1a-64(foobar,64,,h) vs $fnv1a-64(foobar,32,,h) vs $fnv1a-64(foobar,32,,hx)

  If output is in hex, the string is zero-padded to the requested hash length
  //echo -a $fnv1a-32(abcde,,,hm) vs $fnv1a-32(abcde,28,,hm) vs $fnv1a-32(abcde,20,,hm)

  $fnv1a-32(string,32) can be used interchangeably with the same syntax used by $hash(string,32)
  $hash only has the strength of 24 bits, and was a very poor hash as explained at https://en.wikichip.org/wiki/mirc/identifiers/$hash
  These examples use hex output to demonstrate bit lengths, but is not required to use that.
  //echo -a $fnv1a-32(test,,,h) is the equivalent of $base( $fnv1a-32(test) ,10,16,8)
}

This alias uses the method of XORFOLD recommended at the FNV website
{
  if x bits needed < half total bits:  hash = (((hash>>x) ^ hash) & TINY_MASK(x));
  else                                 hash =   (hash>>x) ^ (hash &  BIG_MASK(x));
  mask is binary string of x 1's
}
; $1 is the hash string as base10 integer, should be in the range 0 through 2^$3 -1
. $2 is # bits of the hash to return. should be > 1 and < $3
. $3 is bits of full hash, ie 32/64/128
*/

alias -l xorfold {
  if ( $2 < $calc($3 / 2)) { var %a $and($xor( $calcint($1 / 2^$2) , $1 )  , $calcint(2^$2 -1) ) }
  else { var %a $xor( $calcint($1 / 2^$2) , $and($1,$calcint(2^$2 -1)) ) }
  return %a
}
/*
{
  The fnv1a-generic alias is not intended to be used directly, but is instead sharing similar
  code for all the various bit lengths.

  $1 = input string to be hashed
  $2 = bits of hash to return (valid: 2-hash_length)
  $3 = mode: 0 (default) $1 is text, 1: $1 is &binvar 2: $1 is filename
  $4 = if contains h, then return hash as hex string instead of base10 integer
  .    if contains m, and $5 is 32, then use the Brett Mulvey modification
  .    other bit lengths would need to use different unknown m-shifts
  .    if contains x, then if $2 is >= 1 and $2 < $5, use xor-folding instead of shortening via bit-mask
  $5 = hash lengths valid: 32 64 128 256 512 1024
  $6 = original fnv1a-X's count of parameters used. invalid if 0 or >6
}
*/

alias -l fnv1a-generic {
  if ($builddate < 2018/11/18 UTC) { var %err requires build $v2 or greater | goto syntax }
  ; syntax msg if not used as /command or used as $identifier instead of $identifier()
  if (!$isid || !$6) goto syntax | if ($6 > 4) { var %err unexpected parameters | goto syntax }
  var %err , %bits , %token $findtok(32 64 128 256 512 1024,$5,1,32) , %strlen $5 / 4 , %binvar &maroon.fnv1a
  if (!%token) { var %err invalid hash length $5 | goto syntax }
  var %offset_basis $gettok(2166136261 14695981039346656037 144066263297769815596495629667062367629 100029257958052580907070968620625704837092796014241193945225284501741471925557 9659303129496669498009435400716310466090418745672637896108374329434462657994582932197716438449813051892206539805784495328239340083876191928701583869517785 14197795064947621068722070641403218320880622795441933960878474914617582723252296732303717722150864096521202355549365628174669108571814760471015076148029755969804077320157692458563003215304957150157403644460363550505412711285966361610267868082893823963790439336411086884584107735010676915,%token,32)
  var %prime $gettok(16777619 1099511628211 309485009821345068724781371 374144419156711147060143317175368453031918731002211 35835915874844867368919076489095108449946327955754392558399825615420669938882575126094039892345713852759 5016456510113118655434598811035278955030765345404790744303017523831112055108147451509157692220295382716162651878526895249385292291816524375083746691371804094271873160484737966720260389217684476157468082573,%token,32)
  var %mod $calcint(2^$5)

  if ($2 != $null) {
    if (0 $+ $2 !isnum 2- $+ $5) { var %err invalid $2 vs $v2 bits | goto syntax }
    var %strlen $ceil($calc($2 /4))
  }
  var %mode 0 $+ $regsubex($3,/[^0-9]/g,) | if (!$istok(0 00 01 02,0 $+ $3,32)) { var %err invalid mode $3 | goto syntax }

  ; next line can be removed when bug fixed: //if (00 != 0) echo -a this is a bug
  var %mode $calc(%mode)

  if (%mode == 0) {
  noop $regsubex(temp,$1,,, [ %binvar ] ) }
  elseif (%mode == 1) {
    if (!$bvar($1)) { var %err missing $1 binvar | goto syntax }
    var %binvar $1
  }
  elseif (%mode == 2) {
    if (!$isfile($1)) { var %err missing $1 file | goto syntax }
    bread $qt($1) 0 $file($1).size %binvar
    if ($file($1).size != $bvar(%binvar,0)) { var %err unable to read $1 file | goto syntax }
  }
  var %len $bvar(%binvar,0) , %i 0 , %hash %offset_basis

  while (%i < %len) {
    !inc %i
    !var %hash $xor(%hash,$bvar(%binvar,%i))
    !var %hash $calcint( (%hash * %prime) % %mod )
  }
  if (($5 == 32) && (m isin $4)) {
    /*
    hash += hash << 13; (same as hash * 8193)
    hash ^= hash >> 7;  (same as hash / 128. $xor ignores fraction so don't need $int even if $calcint created them)
    hash += hash << 3;  (same as hash * 9)
    hash ^= hash >> 17; (same as hash / 131072)
    hash += hash << 5;  (same as hash * 33)
    */
    var %hash $calcint((%hash * 8193) % 4294967296)
    var %hash $calcint(($xor(%hash,$calcint(%hash /128))    *    9) % 4294967296)
    var %hash $calcint(($xor(%hash,$calcint(%hash /131072)) *   33) % 4294967296)
  }

  if ($2 isnum) { if (x isin $4) var %hash $xorfold(%hash,$2,$5) | else var %hash $and(%hash,$base($str(1,$2),2,10)) }
  if (h isin $4) return $base(%hash,10,16,%strlen) | return %hash
  :syntax
  echo -sc info * $ $+ fnv1a- $+ $5 $+ (text|binvar|filename,[2- $+ $5 $+ ],[0-2],[h $+ $iif($5 == 32,mx,x) $+ ]) %err
} ; by maroon 2018