OpenGL MD5 cracker

Brute-force MD5 hashes on a GPU by using OpenGL vertex shaders for parallelism. The keyspace can be salt-prefixed and restricted to a certain character range.

While working on a related project, the idea of experimenting with OpenGL vertex shaders for GPGPU computing came up. As an easy example, I chose exhaustive MD5 keyspace search – brute-forcing hashes seems a natural fit, as it consists of many simple tasks that are virtually independent of each other.

Computing an MD5 checksum of a short string and comparing the output requires little data to be transferred but a large number of (unfortunately, discrete) calculations. On the other hand, the individual runs are independent and can execute in parallel, and the general MD5 algorithm can be implemented with little effort.

To maximize parallelism and minimize memory transfers, the search space is split into chunks as large as the number of available shader instances, and all vertex shaders run on the same input data, each with its own unique offset. Hashes are computed in bulk, so the CPU side checks for a successful match only from time to time and, if one is signalled, backtracks to the actual result.
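The chunking and backtracking scheme described above can be sketched on the CPU alone. This is a minimal simulation under stated assumptions, not the project's actual code: `bulk_search`, its parameters, and the `candidate` callback are hypothetical names, and the inner loop over `shader` stands in for what the GPU would do in parallel. In the sample run shown later, the numbers fit this layout (677 rounds times 35978 vertices gives the reported offset 24357106), but that relationship is inferred from the log only.

```python
import hashlib

def bulk_search(target_hex, candidate, n_shaders, rounds_per_check, max_rounds):
    """Simulated bulk search; returns (round, shader, plaintext) or None.

    candidate(index) maps a global index to the bytes to be hashed.
    """
    target = bytes.fromhex(target_hex)
    for first_round in range(0, max_rounds, rounds_per_check):
        batch = range(first_round, min(first_round + rounds_per_check, max_rounds))

        # Bulk phase: every "shader" hashes the candidate at offset + shader
        # and merely raises a shared flag on a match.
        feedback = False
        for rnd in batch:
            offset = rnd * n_shaders
            for shader in range(n_shaders):
                if hashlib.md5(candidate(offset + shader)).digest() == target:
                    feedback = True

        # The CPU looks at the flag only once per batch.
        if not feedback:
            continue

        # Backtracking phase: replay the batch to locate the exact hit.
        for rnd in batch:
            offset = rnd * n_shaders
            for shader in range(n_shaders):
                plaintext = candidate(offset + shader)
                if hashlib.md5(plaintext).digest() == target:
                    return rnd, shader, plaintext
    return None

if __name__ == "__main__":
    # Toy keyspace: decimal strings "0", "1", "2", ...
    hit = bulk_search("81dc9bdb52d04dc20036dbd8313ed055",
                      lambda i: str(i).encode(), 100, 5, 20)
    print(hit)  # (12, 34, b'1234')
```

Checking the flag only once per batch keeps CPU/GPU synchronisation rare; the price is that a hit has to be located afterwards by replaying the batch.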

Cracking MD5 hashes

Build with make; this requires the headers and libraries for the -lGLEW, -lglfw, and -lGL linker flags (installable e.g. via apt-get install libglew-dev libglfw3-dev). Optionally, -lcrypto is needed for a built-in startup self-test.

Run the resulting binary with the hash as its argument, given as 32 hexadecimal characters. Progress is shown periodically.

./ogl_md5 "81dc9bdb52d04dc20036dbd8313ed055"
looking for db9bdc81 c24dd052 d8db3600 55d03e31...
will use 35978 vertices
will use 75 chars, 32068201/2405115075 aaaaa/_____ min/max hi/lo
performing self-test...
starting...
checking 'yGDYe' (179890000,0+179890000)...
FOUND: will re-draw 5000 rounds
FOUND: will check result
FOUND: feedback 2147492596, shader 8948, round 677, offset 24357106,0+24357106+8948
FOUND: result '1234'
took 214msec
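The "looking for" line in the output above shows the target digest as four 32-bit words, each with its bytes reversed compared to the hex argument. This matches reading the 16 digest bytes as little-endian words, which is how MD5 holds its internal state. A small sketch of that conversion follows; `digest_words` is a hypothetical helper name, not a function taken from the project.

```python
import hashlib
import struct

def digest_words(hex_digest):
    """Split a 32-character hex MD5 digest into four little-endian uint32 words."""
    raw = bytes.fromhex(hex_digest)
    if len(raw) != 16:
        raise ValueError("an MD5 digest must be exactly 16 bytes (32 hex characters)")
    return struct.unpack("<4I", raw)

if __name__ == "__main__":
    digest = hashlib.md5(b"1234").hexdigest()
    print(digest)  # 81dc9bdb52d04dc20036dbd8313ed055
    print(" ".join(f"{w:08x}" for w in digest_words(digest)))
    # db9bdc81 c24dd052 d8db3600 55d03e31
```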

Please note that salting (a static plaintext prefix) and custom character ranges are supported in principle but not yet exposed on the command line. Hard-coding them should be easy, though.
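To illustrate how a salt prefix and a custom character range could enter the candidate generation, here is a sketch that maps a running index to a plaintext. It is an assumption, not the project's implementation: the `ALPHABET` ordering and both function names are hypothetical. The numbering scheme (strings ordered by length, with no "zero" character) was chosen because it reproduces the two numbers printed in the sample run, 32068201 for aaaaa and 2405115075 for _____ with 75 characters.

```python
import string

# Hypothetical 75-character alphabet starting with 'a' and ending with '_';
# the ordering used by the real tool is not documented here.
ALPHABET = (string.ascii_lowercase + string.ascii_uppercase
            + string.digits + "!@#$%^&*()-+_")

def index_to_candidate(index, alphabet=ALPHABET, salt=""):
    """Map a non-negative index to salt + plaintext (index 0 is the empty string)."""
    base = len(alphabet)
    chars = []
    while index > 0:
        # Bijective numeration: the digits run from 1 to base, so shift by one.
        index, digit = divmod(index - 1, base)
        chars.append(alphabet[digit])
    return salt + "".join(reversed(chars))

def candidate_to_index(text, alphabet=ALPHABET):
    """Inverse of index_to_candidate for an unsalted plaintext."""
    base = len(alphabet)
    index = 0
    for ch in text:
        index = index * base + alphabet.index(ch) + 1
    return index

if __name__ == "__main__":
    print(len(ALPHABET))                          # 75
    print(candidate_to_index("aaaaa"))            # 32068201
    print(candidate_to_index("_____"))            # 2405115075
    print(index_to_candidate(76, salt="pepper:")) # pepper:aa
```

Restricting the search to a smaller range then only means passing a shorter `alphabet`; the salt never takes part in the numbering, it is simply prepended before hashing.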

Example results

Typically, about 10^9 hashes per second (1 G/s) can be achieved on a machine equipped with a Core i5-2400 and a GTX 1050 Ti. For example, with 75 different characters allowed, all candidates up to 6 characters long take a few minutes at most, while those up to 8 characters already take a week or two.
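The runtime figures above follow from simple arithmetic on the keyspace size. A short sketch, assuming a constant rate of 1 G/s and counting every string from length 1 up to the maximum length (the function names are illustrative only):

```python
def keyspace(charset_size, max_len):
    """Number of strings of length 1..max_len over the given character set."""
    return sum(charset_size ** n for n in range(1, max_len + 1))

def seconds_to_exhaust(charset_size, max_len, rate=1e9):
    """Worst-case time to try every candidate at `rate` hashes per second."""
    return keyspace(charset_size, max_len) / rate

if __name__ == "__main__":
    for max_len in (6, 8):
        secs = seconds_to_exhaust(75, max_len)
        print(f"up to {max_len} chars: {secs / 60:.1f} min = {secs / 86400:.2f} days")
    # up to 6 chars: about 3 minutes
    # up to 8 chars: roughly 11.7 days
```

Each additional character multiplies the work by the size of the character set, which is why shrinking that set pays off so quickly.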

This is not a big leap over plain CPU-only implementations in the long run, but it could surely be optimized further, given that e.g. a GTX 560 DCII runs similarly fast. Further restricting the character search range using prior knowledge is thus advisable.

Code & Download