Working out the THPS2 PS1 filenames

iamgreaser
Posts: 2
Joined: Sat Aug 05, 2017 7:59 pm

Postby iamgreaser » Sat Aug 05, 2017 8:56 pm

First post, so it would be unwise of me to post a link right now. I'm brute-forcing the one filename I don't have (it's a file from THPS1), and this will take a long time. You can find the list on my GitHub Gist as "THPS2 PS1 (almost) complete filename list".

So for those who don't know, most of the actual data is in cd.wad, and the metadata that tells you where everything is lives in cd.hed, which consists of a series of tuples of three uint32_t values: the filename_hash, the offset_into_the_wad, and the length. The upside is that it's nice and fast to look things up. The downside is that it's a nuisance to rip, as you lack the filenames.
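A minimal sketch of reading that layout, for anyone who wants to poke at cd.hed themselves. The field order is the one described above; the little-endian byte order is an assumption (it is the PS1's native order), and the names `hed_entry`, `hed_parse` and `hed_find` are made up for illustration. Any end-of-table marker the real file might contain is not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* One cd.hed record: three 32-bit fields, in the order given in the post. */
typedef struct {
    uint32_t name_hash; /* hash of the filename */
    uint32_t offset;    /* byte offset into cd.wad */
    uint32_t length;    /* length of the file in bytes */
} hed_entry;

/* Assumption: fields are stored little-endian. */
static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode up to max_entries whole 12-byte records from buf.
 * Returns the number of records written to out; a trailing
 * partial record is ignored. */
size_t hed_parse(const uint8_t *buf, size_t len,
                 hed_entry *out, size_t max_entries)
{
    size_t n = len / 12;
    if (n > max_entries) {
        n = max_entries;
    }
    for (size_t i = 0; i < n; i++) {
        const uint8_t *p = buf + i * 12;
        out[i].name_hash = read_u32le(p);
        out[i].offset    = read_u32le(p + 4);
        out[i].length    = read_u32le(p + 8);
    }
    return n;
}

/* Linear search by hash; returns NULL when the hash is not present. */
const hed_entry *hed_find(const hed_entry *e, size_t n, uint32_t hash)
{
    for (size_t i = 0; i < n; i++) {
        if (e[i].name_hash == hash) {
            return &e[i];
        }
    }
    return NULL;
}
```

With the table decoded, extracting a file is just a matter of seeking to `offset` in cd.wad and copying `length` bytes; the hard part is knowing which name each hash stands for.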

Here's the hash algorithm. It's a broken version of CRC-32 which may have been "obfuscated"... but for some reason they still kept the usual 0xEDB88320 polynomial so it sticks out like a sore thumb anyway.

Code:

#include <stdint.h>

uint32_t namehash(const char *s)
{
   uint32_t csum = (uint32_t)(int32_t)-1;
   for(int i = 0; s[i] != '\x00'; i++) {
      uint32_t a1 = (uint32_t)(uint8_t)s[i];

      // Convert to lower case
      if(a1 >= 'A' && a1 <= 'Z') {
         a1 += 32;
      }

      // Broken CRC-32 implementation
      //
      // Normal CRC-32:
      // * Shift right, if carry then XOR by 0xEDB88320
      //
      // THPS2 Special! Wow! edition:
      // * Rotate left, shift original XOR'd val right, if carry then XOR by 0xEDB88320

      uint32_t v0 = (a1 ^ csum);
      uint32_t bits = (v0 & 0xFF);
      // Reference implementation
      for(int j = 0; j < 8; j++)
      {
         // Rotate left
         csum = ((csum>>31)&1)|(csum<<1);

         // Then apply the polynomial
         if((bits & 1) != 0) {
            csum ^= 0xEDB88320;
         }

         // Shift along
         bits >>= 1;
      }
   }

   return csum;
}
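
Since brute-forcing calls the hash a very large number of times, a table-driven variant may help. This is a sketch, not anything taken from the game: it relies on the observation that rotation and XOR are both linear, so the eight inner steps reduce to "rotate the checksum left by 8, then XOR in a value that depends only on the low byte". The names `namehash_ref`, `namehash_table`, `rotl32` and `build_table` are made up here; `namehash_ref` is just a compact copy of the routine above, kept so the two can be compared.

```c
#include <stdint.h>

/* Rotate left by n bits; n must be in 1..31. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Compact copy of the bit-at-a-time routine, for cross-checking. */
uint32_t namehash_ref(const char *s)
{
    uint32_t csum = 0xFFFFFFFFu;
    for (; *s != '\0'; s++) {
        uint32_t c = (uint8_t)*s;
        if (c >= 'A' && c <= 'Z') {
            c += 32; /* convert to lower case */
        }
        uint32_t bits = (c ^ csum) & 0xFF;
        for (int j = 0; j < 8; j++) {
            csum = rotl32(csum, 1);
            if (bits & 1) {
                csum ^= 0xEDB88320u;
            }
            bits >>= 1;
        }
    }
    return csum;
}

static uint32_t table[256];
static int table_ready = 0;

/* table[b] is what the eight inner steps XOR into a zero checksum
 * when the low byte of (char ^ csum) equals b. */
static void build_table(void)
{
    for (uint32_t b = 0; b < 256; b++) {
        uint32_t t = 0, bits = b;
        for (int j = 0; j < 8; j++) {
            t = rotl32(t, 1);
            if (bits & 1) {
                t ^= 0xEDB88320u;
            }
            bits >>= 1;
        }
        table[b] = t;
    }
    table_ready = 1;
}

/* Same result as namehash_ref, one table lookup per character. */
uint32_t namehash_table(const char *s)
{
    if (!table_ready) {
        build_table();
    }
    uint32_t csum = 0xFFFFFFFFu;
    for (; *s != '\0'; s++) {
        uint32_t c = (uint8_t)*s;
        if (c >= 'A' && c <= 'Z') {
            c += 32; /* convert to lower case */
        }
        csum = rotl32(csum, 8) ^ table[(c ^ csum) & 0xFF];
    }
    return csum;
}
```

Because the checksum is updated one character at a time, a search can also hash a fixed prefix once and reuse that intermediate value for every candidate suffix.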


Also, huge shoutouts to Graymatter LTI for including a whole bunch of juicy stuff in the PC version that really shouldn't be there, like, ooh, the build logs, the complete symbol table for the PS1 demo version, and all the definitions used in the TRG files, which makes reverse-engineering the maps that much easier. On a more humble note, they at least kept the filenames fully intact in the PC version, which was invaluable for a few sets of filenames.

EDIT: Oops, the comment said "rotate right" - it's a left rotate.
Last edited by iamgreaser on Mon Aug 07, 2017 2:51 am, edited 1 time in total.

Morten1337
Site Admin
Posts: 308
Joined: Mon Mar 01, 2010 2:23 pm
Location: Norway

Re: Working out the THPS2 PS1 filenames

Postby Morten1337 » Sun Aug 06, 2017 7:54 am

I think this hash function is called crc32-b.
http://www.hackersdelight.org/hdcodetxt/crc.c.txt

iamgreaser
Posts: 2
Joined: Sat Aug 05, 2017 7:59 pm

Re: Working out the THPS2 PS1 filenames

Postby iamgreaser » Mon Aug 07, 2017 3:04 am

crc32b in that file is just an implementation of the usual zlib crc32.

It looks suspiciously like they took a working version of CRC-32, then decided to rotate the output to the left instead of shifting it to the right. The math is completely ruined but at least it keeps a somewhat decent level of entropy.
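
To make the difference concrete, here is a side-by-side sketch of the per-byte step of the usual zlib CRC-32 and of the THPS2 variant described in the first post. The function names `crc32_step`, `thps2_step` and `crc32_zlib` are made up for this comparison. Note that the variant also skips the final inversion that zlib applies to the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Usual reflected CRC-32 (zlib/PNG): fold the byte into the register,
 * then shift right eight times, XORing the polynomial on each carry. */
uint32_t crc32_step(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int j = 0; j < 8; j++) {
        uint32_t carry = crc & 1;
        crc >>= 1;
        if (carry) {
            crc ^= 0xEDB88320u;
        }
    }
    return crc;
}

/* THPS2 variant: the carry bits come from a separate copy of the low
 * byte, while the register itself is rotated LEFT rather than shifted
 * right, so the polynomial division no longer works out. */
uint32_t thps2_step(uint32_t csum, uint8_t byte)
{
    uint32_t bits = (byte ^ csum) & 0xFF;
    for (int j = 0; j < 8; j++) {
        csum = (csum << 1) | (csum >> 31);
        if (bits & 1) {
            csum ^= 0xEDB88320u;
        }
        bits >>= 1;
    }
    return csum;
}

/* Full zlib-style CRC-32 over a buffer, including the final inversion. */
uint32_t crc32_zlib(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc = crc32_step(crc, p[i]);
    }
    return crc ^ 0xFFFFFFFFu;
}
```

As a sanity check on the standard half, `crc32_zlib("123456789", 9)` gives the well-known CRC-32 check value 0xCBF43926.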

Oh, and now that you know I'm not a filthy spambot, here's the filename list. Have fun.

https://gist.github.com/iamgreaser/ee48 ... 8494703730
