20096 total geeks with 3178 solutions
Recent challengers:
 Welcome, you are an anonymous user! [register] [login] Get a yourname@osix.net email address 

Articles

GEEK

User's box
Username:
Password:

Forgot password?
New account

Shoutbox
Domuk
No, not an issue with the PHP - I was responding to "AJAX not being cross site is annoying"
MaxMouse
Really? i thought that would only be important if the user had some kind of control over where the XML came from, if you hard code it (As in a PHP file) wouldn't that eliminate XSS attacks?
Domuk
Yes, but very, very necessary. AJAX requests run in the context of the browser, there'd be no security if it was cross-domain .
MaxMouse
AJAX not being cross site is annoying, all other scripts can be used in that way, having to resort to PHP to patch it is a shame.
SAJChurchey
thx MaxMouse

Donate
Donate and help us fund new challenges
Donate!
Due Date: Nov 30
November Goal: $40.00
Gross: $0.00
Net Balance: $0.00
Left to go: $40.00
Contributors


News Feeds
The Register
MySpace makes peace
with Indies
Nvidia previews
next-gen Fermi GPUs
Potty-mouths
charged for Comcast
hijack
Microsoft
Silverlight - now
with hidden Windows
bias
Apple cult leader
emails outside
world
Sony demos monster
3D TV
Wrecking CRU:
hackers cause
massive climate
data breach
Skinny Acer
notebook delivers
six-day battery
life
VTOL gyro-copter
flying car mates
with killer robot
Oracle begs EC for
more time
Slashdot
iPhone Owners
Demand To See Apple
Source Code
Proton Beams Sent
Around the LHC
Microsoft"s Lack of
Nightly Builds For
IE
Some Claim Android
App Store Worse
Than iPhone"s
Climatic Research
Unit Hacked, Files
Leaked
Aging Nuclear
Stockpile Good For
Decades To Come
Netbooks Have
Higher Failure Rate
Than Laptops
Xbox Live Class
Action Being
Investigated
Patent Issued For
Podcasting
Linus Torvalds For
Nobel Peace Prize?
Article viewer

The mysterious MD5 algorithm

Written by:paranoia
Published by:paranoia
Published on:2004-03-28 22:55:15
Topic:Security
Search OSI about Security.More articles by paranoia.
 viewed 19673 times send this article printer friendly

Digg this!
    Rate this article :
The MD5 algorithm is quite possibly the most widely used digest algorithm out there. So of course, being the geek you are, you want to know how it works. Read on.

The MD5 algorithm is quite possibly the most widely used digest algorithm out there. So of course, being the geek you are, you want to know how it works. Read on.

First you must think of your message, not as a sequence of bytes, but as a sequence of bits (but only for a short while). The MD5 algorithm will accept messages that are any arbitrary number of bits long. However, so that the algorithm can process the data, it begins by padding the message to a length it can handle. This length just happens to be any number such that length mod 512 is equal to 448, or 64 bits short of being a multiple of 512. The message is padded by first appending a 1 bit to the message, and then enough 0 bits to make the message the proper length. The 1 bit is always added, so even if the message is already the proper length, it will be padded (a message can be padded with anywhere from 1 to 512 bits).

The next step is to calculate the length of the message (before padding). This number is then appended as the last 64 bits of the message, making the message length a multiple of 512. If the message happens to be greater than 2^64 bits long, then the least significant 64 bits of the message are used (length mod 2^64).

The MD5 algorithm uses 4 state variables, each of which is a 32 bit integer (an unsigned long on most systems). These variables are sliced and diced and are (eventually) the message digest. The variables are initialized as follows:

A = 0x67452301
B = 0xEFCDAB89
C = 0x98BADCFE
D = 0x10325476


Now on to the actual meat of the algorithm: the main part of the algorithm uses four functions to thoroughly goober the above state variables. Those functions are as follows:

F(X,Y,Z) = (X & Y) | (~(X) & Z)
G(X,Y,Z) = (X & Z) | (Y & ~(Z))
H(X,Y,Z) = X ^ Y ^ Z
I(X,Y,Z) = Y ^ (X | ~(Z))


Where &, |, ^, and ~ are the bit-wise AND, OR, XOR, and NOT operators (respectively) that all C programmers should be familiar with.

These functions, using the state variables and the message as input, are used to transform the state variables from their initial state into what will become the message digest. For each 512 bits of the message, the following is performed (this is only pseudo-code, don’t try to compile it):

/* Group the 512 bit message into 16 different 32 bit chunks */
For j = 0 to 15 do
  Set X[j] to MessageBits[j*32] through MessageBits[j*32 + 31]
end

/* Store the digest variables out of harms way for the time being */
AA = A
BB = B
CC = C
DD = D

/* Round 1. */
/* Let [abcd k s i] denote the operation:
   a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s). */
/* Do the following 16 operations. */
 [ABCD 0 7 1] [DABC 1 12 2] [CDAB 2 17 3] [BCDA 3 22 4]
 [ABCD 4 7 5] [DABC 5 12 6] [CDAB 6 17 7] [BCDA 7 22 8]
 [ABCD 8 7 9] [DABC 9 12 10] [CDAB 10 17 11] [BCDA 11 22 12]
 [ABCD 12 7 13] [DABC 13 12 14] [CDAB 14 17 15] [BCDA 15 22 16]

/* Round 2. */
/* Let [abcd k s i] denote the operation:
   a = b + ((a + G(b,c,d) + X[k] + T[i]) <<< s). */
/* Do the following 16 operations. */
 [ABCD 1 5 17] [DABC 6 9 18] [CDAB 11 14 19] [BCDA 0 20 20]
 [ABCD 5 5 21] [DABC 10 9 22] [CDAB 15 14 23] [BCDA 4 20 24]
 [ABCD 9 5 25] [DABC 14 9 26] [CDAB 3 14 27] [BCDA 8 20 28]
 [ABCD 13 5 29] [DABC 2 9 30] [CDAB 7 14 31] [BCDA 12 20 32]

/* Round 3. */
/* Let [abcd k s t] denote the operation
   a = b + ((a + H(b,c,d) + X[k] + T[i]) <<< s). */
/* Do the following 16 operations. */
 [ABCD 5 4 33] [DABC 8 11 34] [CDAB 11 16 35] [BCDA 14 23 36]
 [ABCD 1 4 37] [DABC 4 11 38] [CDAB 7 16 39] [BCDA 10 23 40]
 [ABCD 13 4 41] [DABC 0 11 42] [CDAB 3 16 43] [BCDA 6 23 44]
 [ABCD 9 4 45] [DABC 12 11 46] [CDAB 15 16 47] [BCDA 2 23 48]

/* Round 4. */
/* Let [abcd k s t] denote the operation
   a = b + ((a + I(b,c,d) + X[k] + T[i]) <<< s). */
/* Do the following 16 operations. */
 [ABCD 0 6 49] [DABC 7 10 50] [CDAB 14 15 51] [BCDA 5 21 52]
 [ABCD 12 6 53] [DABC 3 10 54] [CDAB 10 15 55] [BCDA 1 21 56]
 [ABCD 8 6 57] [DABC 15 10 58] [CDAB 6 15 59] [BCDA 13 21 60]
 [ABCD 4 6 61] [DABC 11 10 62] [CDAB 2 15 63] [BCDA 9 21 64]

/* And finally update the state variables */
A+=AA
B+=BB
C+=CC
D+=DD


(If you are confused on what ‘<<<’ does, let it be known that it represents a bit-wise rotation to the left, e.g. 110011 <<< 2 = 001111)

After this step, the message digest is stored in the state variables (A, B, C, and D). To get it into the hexadecimal form you are used to seeing, output the hex values of each the state variables, least significant byte first. For example, if after the digest:

A = 0x01234567;
B = 0x89ABCDEF;
C = 0x1337D00D
D = 0xA5510101


Then the message digest would be:

67452301EFCDAB890DD03713010151A5


And that’s how you calculate MD5 digests of, well, of whatever you want to calculate MD5 digests of.

Additional information on the MD5 algorithm can be found in the MD5 RFC (#1321), which I used as my one and only reference in writing this article. In fact, my article’s sole purpose seems to be in slightly simplifying the RFC (that and to serve as a way to get my MD5 class out to the world...). Anyway, RFC #1321 can be found at http://www.faqs.org/rfcs/rfc1321.html.

Accompanying code for my article can be obtained by sending me an e-mail or pm asking for it (nicely). Although if it is possible, it would probably be easier on everyone if the source was hosted on the site somewhere. This code is also largely copied from the RFC on MD5, although I encapsulated it in a class and cleaned it up slightly for readability reasons. I also added a few comments on some of the stuff in the header, just to clarify the purpose. The class has been tested and is RFC compatible. I also included a small example program (code only) to demonstrate how to use the class. I hold no responsibility for anything you do with this program or for anything this program does to you. The code was created and compiled successfully using MSVC++ 6, for which I have included a project file and workspace. For the rest of you, some tweaking may be necessary for it to work.

Did you like this article? There are hundreds more.

Comments:
Obscurity
2004-03-29 00:08:50
Lengthy, abit hard to understand for a novice to anything related to encryption (myself), but I thought it was a bloody good read. Good job.
paranoia
2004-03-29 01:35:24
yeah...md5 can be a bit complicated...not so much for the algorithm (which imho is relatively simple) but just because of the large number of steps...and now i just realized something is wrong with my description...

for everyone wondering what the T[] array is, it is an array of length 64, where T[i] is equal to 4294967296 times abs(sin(i)), where i is in radians...that should clear that up (these values are actually hardcoded into the md5 algo in the code).

Also, if the algorithm still seems confusing, the code might make more sense. that code can be found at: http://xbox.lenaxia.net/paranoia/MD5Class.zip although if the admins would be so kind it might someday be hosted here
Obscurity
2004-03-29 02:47:07
Ah thanks paranoia, greatly appreciated ;)
bb
2004-03-29 17:10:22
i'll host the code when i get onto a damn connection which allows me to ssh. or some other friendly admin could of course ;-) in fact, i'll organise cvs code access for you so you can simply add files!
cyben
2004-03-30 08:38:33
Damn good read.
I've often wondered how MD5 worked. I had a hard time following it though. What step is it that makes it undecypherable? Why can't you just do the steps backwards?
paranoia
2004-03-30 19:47:15
the big long ugly step makes it undecipherable...mostly because it isn't a cipher.

you don't actually hide the message anywhere, you just use the message bits as one of several parameters to the multiple state variable hashing functions

(technically you could do the steps backwards, but that would require knowing the message text, making the whole process sorta pointless)
oscinis
2004-03-30 22:20:00
great article. good introduction to hashing for me..
but how are md5's cracked?
paranoia
2004-03-30 22:48:09
since there is no known way to reverse the md5 hash, the only way to obtain the original text is by finding the md5 digests of potential solutions and comparing those to the known digest (i.e: search until you find a text such that md5(text)="known digest")
aidan
2004-03-31 19:04:31
Where did you get QH/QuickHash.h from, and what does "using namespace QuickHash;" do?
paranoia
2004-03-31 20:49:55
QuickHash is a md5 hash library that i got when i searched on google. QH/ is the directory i put it in. QuickHash is the namespace used by the library (so i don't have to type 'QuickHash::' in front of every function name)
niki
2004-04-01 00:35:57
the 9th and 10th character in the sample output are backwards(EF vs FE)

ps. I like your use of the term goober
paranoia
2004-04-01 04:22:36
i don't know what you're talking about niki...it looks fine to me...just kidding...i fixed it...nice catch. and thank you for noticing my extensive use of some very technical vocabulary
caesar4
2004-04-07 23:32:17
since md5 can be optimized to be quite fast, wouldn it be practical to md5 a string array and when searching, md5 the search string, and compare it to the hashes in the array?
fugjostle
2004-05-21 12:09:01
I think its important to add to the article that MD5 is becoming increasing insecure because its prone to collisions and should be replaced with SHA-1 when you next get the chance. Other than that, great article :)
Nissemand
2005-05-01 16:57:31
how would i make the "bit-wise rotation" in C?
aidan
2005-05-02 15:29:42
<<<
sefo
2005-05-03 13:31:09
<<< is shift left isn't it? (SHL)

For a rotation (right or left):
http://www.osix.net/modules/article/?id=320
aidan
2005-05-07 11:26:25
In c/c++ at least, << and >> are shift left and right, but i think some compilers support <<< and >>> as rotate left and right. I couldn't find any references to >>> adnd <<< on the web just now but i'm sure i've used them in that way!
sefo
2005-05-09 15:30:42
That would be nice if you could remember the compiler that supports those <<< >>> operators.
Because I never managed to make the code in the above article work.
Geek_Freek
2005-06-07 07:26:17
I recommend "Applied cryptography" by Bruce Schneier 4 anyone interested in cryptography.. Damn good book..
jimv
2005-06-14 16:06:58
Nice, basic intro.

<Shameless plug>
Check out this URL also if you'd like an additional, slightly more descriptive article on MD5 (portable C++ code) - which isn't really that mysterious!

<http://www.codeguru.com/Cpp/Cpp/algorithms/general/article.php/c7399>

</shameless plug>
bb
2005-06-14 18:18:04
better still wrap it in bbcode for that nice clickability
paranoia
2005-06-28 11:13:47
i just realized that the hosting for that MD5 class got toasted, so i put a similar (slightly changed) version in my OSIDrive. and yes, the example usage code i wrote does contain a function that takes an old c-style string as a parameter and returns a new string form, making me totally inconsistent. but it's just an example, the other code should be more awesome.

http://www.osix.net/modules/folder/index.php?tid=7559&action=vf
anilg
2006-01-30 19:45:13
MD5 attacked.. collision attack have been successful in MD5.. 90 mins or so.

DO NOT USE THIS FOR YOUR HASHING NEEDS. Go gor SHA-160 (256 recommended)
Domuk
2006-01-31 01:44:20
My understanding of the collision attack is that everyone's overreacting, and that it can only be used where you can alter the original message to add 'garbage', and it's not a big deal for 'normal' messages, especially things like <20 char passwords. Isn't this the case?
anilg
2006-01-31 03:02:50
Yes.. the applicatons and protocols currenly using MD5 remain fairly secure.... but why walk on a wet swamp when a clean path is available.
Anonymous
2006-04-23 13:34:02
Excellent read. I've been trying to write my own MD5 function in VB6 for a while (don't laugh) and this has really taken it down into something I can *almost* understand.
Anonymous
2007-03-20 02:12:01
Kevin Dawson (kevin@ezrs.com)
------------------------------------------------
Well, there is no rotate left (<<<) operator in my compiler. so i defined my own MACRO to implement it. it's like:

#define ROTATE_LEFT(x, n) ( ( x << n ) | ( x >> pow(2, (sizeof(x)) - n ) )

What we basically are doing is shift the bits of x by n places. due to this however, the result loses the first n bits (MSBs) of x and the last n bits (LSBs) are just 0s. What we need to do is replace these last n 0s in the result with the first n bits of x. So, we also right shift x to move the first n bits into the right MSB position. Then we do a bitwise OR of the two shifted results. e.g.

0011 0101 << 3 = 1010 1000
0011 0101 >> 5 = 0000 0001 [because 8-3=5]
---------------------------
Logical OR = 1010 1001
---------------------------

See, the result is is equal to
0011 0101 <<< 3 = 1010 1001

Note: pow() is a library function used to calculate number of bits in x. pow() requires you to #include <math>
Anonymous
2007-03-20 02:16:04
Kevin Dawson (kevin@ezrs.com)
------------------------------------------------

PS: It's better to replace x and n with (x) and (n) while defining the macro. It's a good practice.

#define ROTATE_LEFT(x, n) ( ( (x) << (n) ) | ( (x) >> pow(2, (sizeof(x)) - (n) ) )
Anonymous
2007-03-20 02:23:54
I want to correct something here. We don't need to use pow here at all. What we need is to replace pow(2, sizeof(x)) with 8*sizeof(x).

So the macro becomes:
#define ROTATE_LEFT(x, n) ( ( (x) << (n) ) | ( (x) >> 8*sizeof(x) - (n) ) )

I beg your pardon for the mistake.
Anonymous
2007-03-27 11:02:09
How can I get the Binary Code for a Hex number
and ow to know what long (incrypt) is it
Anonymous
2007-04-24 23:25:04
Kevin,
If x is signed and negative, the right shift will drag ones into the space where you're ORing the left shifted data. You will not get what you're expecting.
- Mike <http://pnmx.com/>
Anonymously add a comment: (or register here)
(registration is really fast and we send you no spam)
BB Code is enabled.
Captcha Number:


Blogs: (People who have posted blogs on this subject..)
MaxMouse
PSP on Mon 7th Sep 10am
I was going to write an article on PSP NIDS, but when i started doing it, it felt as if it dropped a little short of what i wanted it to be, and wasn't particularly long (or interesting to people not associated with the PSP Scene). I did write about it
halsten
Backdoor.W32.Small.PF Analysis on Mon 7th Jan 3am
A long time has passed since my first analysis paper, but here is another one. This time it’s short and small. The package contain all the necessary files to get you started on understanding the malware. I hope it’s better than my last paper. You can chec
halsten
Malware Analysis on Sun 5th Aug 3am
Hello all, in here (http://iamhalsten.thecoderblogs.com/200 7/07/23/malware-analysis/) you can find my latest analysis paper for a malware I've analyzed. The paper is extensively and comprehensively documented. Have fun reading it. -- halsten http://i
sefo
AVG's Restore File As... on Wed 30th Aug 1pm
It is possible to restore infected files from the vault to the 'computer' using the option 'Restore File As'. So I restored as 'blah.xyz' the wmf file AVG found the other day and I put it on the desktop. My surprise was to discover that AVG restored

Test Yourself: (why not try testing your skill on this subject? Clicking the link will start the test.)
Cryptography by TroPe

This test will cover Symmetric cryptography, public keys, key management, and some questions on cryptanalysis. If you know a little something about Crypt stuff, give this test a shot!

Related Links:
I Love You Virüsünün Ayr?nt?l? Analizi
Virüsler Ve Trojanlar Hakk?da Bilgiler
IBM highlights new threats
Directory Traversal - Fundamentals
[Deutsch] Backdooring PAM
Backdooring PAM
[Français] Sec Administration: How to Sanitize Media
Cross Site Scripting Fundamentals
[Français] Form Vulnerability 101
[Français] Exploits 101
En Tehlikeli Virüsler Hakk?nda
DOS
[Bahasa Indonesia] Symantec SimWorms: Apakah itu?
MyDoom Mutates Again!
Hacking a bank (dexia.be) , not as difficult as you thought it was.
QNX Password Recovery From The Hashes
Port Listesi
[Español] Remote File Inclusion o RFI
[Deutsch] Verketten von Proxys
The Power of 802.11
Configuring Cisco for IP-enabled Frame Relay
Autopsy of a Hack: NetBios Hacking
From Zero to Root
IIS Exploitation
Intrusion Detection
IP Firewall - ipfw
Cryptography 101
[Español] Algoritmos de encriptado mas comunes
Common Encryption Algorithms
Polyalphabetic Ciphers
Digital Paranoia: Let"s encrypt everything as a simple white space


     
Your Ad Here
 
Copyright Open Source Institute, 2006