27176 total geeks with 3531 solutions
Recent challengers:
 Welcome, you are an anonymous user! [register] [login] Get a yourname@osix.net email address 

Articles

GEEK

User's box
Username:
Password:

Forgot password?
New account

Shoutbox
MaxMouse
It's Friday... That's good enough for me!
CodeX
non stop lolz here but thats soon to end thanks to uni, surely the rest of the world is going good?
stabat
how things are going guys? Here... boring...
CodeX
I must be going wrong on the password lengths then, as long as it was done on ECB
MaxMouse
lol... the key is in hex (MD5: of the string "doit" without the "'s) and is in lower case. Maybe i should have submitted this as a challenge!

Donate
Donate and help us fund new challenges
Donate!
Due Date: Nov 30
November Goal: $40.00
Gross: $0.00
Net Balance: $0.00
Left to go: $40.00
Contributors


News Feeds
The Register
Sony cuff-puter to
do one thing
smartwatches can"t:
Give you DAYS of
hot wrist action
Rosetta science
team thinks Philae
might come to life
in the spring
Google"s whois
results say it"s a
lousy smut searcher
Bitcoin laid bare:
boffins beat
anonymity
The gender
imbalance in IT is
real, ongoing and
ridiculous
NSF goes cloudy
with US$16 million
super funding
Global Commission
on Internet
Governance wobbles
into IANA debate
A DRONE OF THEIR
OWN: GoPro tabbed
for cameracopter
launch
Adobe Reader
sandbox popped says
Google researcher
Hacker dodges FOUR
HUNDRED YEARS in
cooler for SCANNING
sites
Slashdot
UK Announces Hybrid
Work/Study
Undergraduate
Program To Fill
Digital Gap
BT Blocking Private
Torrent Sites?
Consortium Roadmap
Shows 100TB Hard
Drives Possible By
2025
Health Advisor:
Ebola Still
Spreading, Worst
Outbreak We"ve Ever
Seen
Ask Slashdot: Best
Biometric
Authentication
System?
Voting Machines
Malfunction: 5,000
Votes Not Counted
In Kansas County
Bitcoin Is Not
Anonymous After All
BlackBerry Will Buy
Your iPhone For
$550
Jackie Chan Discs
Help Boost Solar
Panel Efficiency
Fly With the
Brooklyn Aerodrome
(Video)
Article viewer

Sorting in "linear" time with Radix Sort



Written by:sfabriz
Published by:sefo
Published on:2005-06-17 11:36:50
Topic:Dot.Net
Search OSI about Dot.Net.More articles by sfabriz.
 viewed 28924 times send this article printer friendly

Digg this!
    Rate this article :
This time is about radix sorting and the fact that we can sort in linear time if we have to order a set of keys of fixed max length.

The purpose of this article is to trace the guidelines of a brilliant sorting algorithm: the Radix Sort. This algorithm needs to use a sorting algorithm inside itself and we'll use counting sort so let's begin with a short explanation of how counting sort works.
Let's say you have an int array declared like this:

int[] c=new int[10];


filled with numbers from 0 to 3 and say they are like this: {3,2,0,2,0,1,2,3,1,2} How can we sort this with just one pass through the array? Very easy, we count how many 0,1,2 and 3 there are in the array and store the number in another array and then we refill it with the elements put in order. Like this:

int[] count=new int[4]; // count[i] is set = 0 foreach i when declaring
int[] pref=new int[4]; // this is a support array that calcs the prefixes
int[] t=new int[c.Length]; // target array

for (int i=0; i<c.Length; i++) {
    count[c[i]]++;
}

pref[0]=0;
for (int i=1; i<count.Length; i++) {
    pref[i]=pref[i-1]+count[i-1];
}

for (int i=0; i<t.Length; i++) {
    t[pref[c[i]]++]=c[i];
}


After the scanning, the count array will be like this: {2,2,4,2} (2 0's, 2 1's, 4 2's and 2 3's) and the pref array will be like {0,2,4,8} which means that we must put the 0's stating from 0 (and then incrementing it to 1), the 1's starting from 2, the 2's starting from 4 and the 3's starting from 8. The target array t will be then filled like this: {0,0,1,1,2,2,2,2,3,3} resulting to be sorted.
Why can we do this so fast? because we know the numbers we are ordering are in this gap [0,3].

So, we can order an array filled with small values very quickly using counting sort. Let's see what we can do if numbers are not so small.
Radix sort comes helping us. Consider a number with 20 digits as the maximum number our keys can be. Sorting with couting sort will require a huge computation so let's divide this number in 5 groups of four digits each and sort them separately.
Some of you will be thinking already to start from the most significan group of digits and proceed toward the less significant one but we'll do the opposite. We'll start from the bottom! We'll order the lowest group first, then the second one, and so on till we reach the highest value digits. Incredibly (it's counterintuitive) at each step, because we use a stable sorting like counting sort, our numbers will be ordered correctly.

Let see an example before digging into the code: our number here from 20 digits divided in 5 groups of 4 digits each becomes a 3 digit number divide in 3 groups of 1 digit each. (It's easy to see it working)

c.Length = 8
    **_ *_* _**
329 720 720 329
457 355 329 355
675 436 436 436
839 457 839 457
436 657 355 657
720 329 457 720
355 839 657 839


As you can see we order first the last digit, then the second one and so on till the first one.
If you don't understand this algorithm at this point don't worry, it's not you're stupid. It's just that this works in a very uncommon way and some people need a little bit more thinking to get used to it (me included).

Ok now let's see the pseudo code for radix sort:

// d is the number of digits
// l is total digits of maximum key
// hence l/d is the number of our groups

RadixSort(c, d) {
    for i<-1 to Ceiling(l/d)
    do use a stable sort to sort group i of array c numbers
}


Which means, start from the bottom and group-by-group, sort the array sorting on the groups one at a time, until you reach the most significan group. By then the array will be sorted.

And now let's see what this translates into using C#:


// array a is to be sorted and it's implicitly passed by reference to the RadixSort method

private void RadixSort(int[] a) {
    
    // our helper array
    int[] t=new int[a.Length];
    
    // number of bits our group will be long
    int r=4; // try to set this also to 2, 8 or 16 to see if it is quicker or not
    
    // number of bits of a C# int (sizeof works with unsafe code so I did this way...)
    int b=(int)Math.Ceiling(Math.Log(int.MaxValue,2))+1;
    
    // counting and prefix arrays (note dimensions 2^r which is the number of all possible values of a r-bit number)
    int[] count=new int[1<<r];
    int[] pref=new int[1<<r];
    
    // number of groups
    int groups=(int)Math.Ceiling((double)b/(double)r);
    
    // the mask to identify groups
    int mask = (1<<r)-1;
    
    // the algorithm:
    for (int c=0, shift=0; c<groups; c++, shift+=r) {
        // reset count array
        for (int j=0; j<count.Length; j++) {
            count[j]=0;
        }
        // counting elements of the c-th group
        for (int i=0; i<a.Length; i++) {
            count[(a[i]>>shift)&mask]++;
        }
        // calculating prefixes
        pref[0]=0;
        for (int i=1; i<count.Length; i++) {
            pref[i]=pref[i-1]+count[i-1];
        }
        // from a[] to t[] elements ordered by c-th group
        for (int i=0; i<a.Length; i++) {
            t[pref[(a[i]>>shift)&mask]++]=a[i];
        }
        // a[]=t[] and start again until the last group
        a=(int[])t.Clone();
    }
    // a is sorted
}


With a clever choice of the number of digits per group this algorithm is very very fast. In fact its running time is O(n). Running time of quicksort varies from O(n) (best case) to O(n^2) worst case with an average of O(n*logn). Mergesort is O(n*logn) too.

For a deeper understanding of how all this works check "Introduction to Algorithms" (Cormen, Leiserson, Rivest).

Did you like this article? There are hundreds more.

Comments:
cyph1e
2005-06-18 11:49:45
No-one commented this yet? I think it was a good article, the first time I met radix sorting.
muzzy
2005-06-20 07:26:00
I think it's quite optimistic to say O(n) as the running time, since it greatly depends on the data to be sorted. O(n log m) is more correct, where m stands for the size of value space. If you only have to sort small range of numbers, you won't need to make many passes through the algorithm. However, in the generic case, you'll have to do many passes through the data which can end up costing more than quicksort.

It's a good algorithm to know and works very well in some special cases, sorting short text strings being one rather interesting and often met case.
sfabriz
2005-06-20 10:58:14
I'm sorry you're wrong ;-). When the keys have a fixed max length m, log(m) becomes a constant hidden in O(n) notation. It's by the definition of O(n):
f(x)=O(n) if exists a function g(x) for it is true that f(x)<=c*g(x) for x>=x0. In this case log(m) is the c constant so you can't say this is O(nlogm).
Also, considering that numbers in a computer are usually stored with a number of bits not over 64 (rarely more that 64) you can always apply this algorithm on int or long on a common pc on most languages.
sfabriz
2005-06-20 10:59:35
Oh, obviously g(x)=a*x+b with a,b belonging to Z.
muzzy
2005-06-22 09:01:37
Well, yes, considering it a constant makes the algorithm O(n). However, that's like saying that if you know the amount of items beforehand, the algorithm is O(1). Well, yes, sure it is, but what's the point?

The algorithm can be applied to different situations, with different parameters. I think the range of values isn't any less of a variable than amount of items, when it comes to observing the algorithm from an abstract perspective. Any time that you choose to apply it to a specific case, the complexity is reduced by the details imposed by the case, and thus you get O(n) for 32bit integer sorting, for example. Please consider the case of sorting text strings for a counter example.
sfabriz
2005-06-22 09:34:20
It's the same again. I understand you point man and I have to say you're smart, but you should have noticed that the word linear in the title is quoted. The reason it's quoted is that obviously, if you have non fixed length keys or keys too big this algo is not so fast, but there is no need to remark this when I am the first to say it in the article. This algo works surprisingly well when sorting common (...up to 80 bits e.g.) numbers and short strings. For bigger numbers or longer strings you obviously use quicksort. Also, knowing the amount of items before won't reduce the complexity to O(1). The complexity it's always O(n) because with a 1-cpu machine you have to see all the values at least once. It's two different things hiding constants in the O() notation and finding complexities using it. You can't say it's O(nlogm) or O(1). Once stated that the keys have a fixed max length, the complexity is O(n). I'm not the one who says that. It's prof. Rivest who says that. Prof. Rivest who? The "R" of RSA! ;-)
muzzy
2005-06-22 10:56:33
Damnit, I checked the article again and I totally missed the comment about fixed maximum length. And if you have fixed amount of items it's O(1) and not O(n) because n is no longer a variable. I didn't read the article very carefully and thought you were talking about complexity of radixsort generally, in which case I consider O(n) to be optimistic since it disregards the varying amount of possible values :)

Ohwell, looks like I'm wasting both of ours time here :p
sfabriz
2005-06-22 11:14:42
Hehe, yes. Probably is better to read completely an article before making a comment. Anyway, it's good to discuss things if you disagree with something. I leave you with this question: you insist saying that if n is fixed then complexity is O(1). But if m is fixed and n also is fixed, why would you calculate complexity? Complexity is to be calculated when there is at least one var changing right? Cya!
muzzy
2005-06-22 14:41:57
Well, I thought I read the article. Anyway, the actual content of the data is still variable if even both n and m are fixed, so O(1) is to say that it's constant-time, independent of the values to sort.
paranoia
2005-06-23 09:59:19
This is quite a good introduction to radix sort, and also a nice comment war where, as far as the ground each of you set up for yourself, you're both right. The radix sort algorithm *is* O(n) for the domain of integers where m is known and log(m) reduces to a constant. But the algorithm is also O(n log m) as a general sorting algorithm, with the domain of variable length strings being an excellent example.

You are also both correct on the O(1) debate: there is no need to calculate complexity when all the variables are fixed (as sfabriz says) *because* the runtime is O(1) (no matter what the values are in an array of fixed length and maximum bit size, it runs in the same amount of time -- as muzzy says). The only thing relevant about having a complexity of O(1) in this case is in comparison to other sorting algorithms, which can have varying runtimes based on the data in the array. The O(1) time just means that radix sort doesn't have a best or worst case within that domain (and actually all domains, if I've understood correctly), everything is just average.

To finish this whole post off (it got longer than I expected), I must point out to sfabriz that muzzy probably based his comments on the tone of your article and the very end part after the code, both of which *do* give the reader a feeling that radix sort will run in O(n) time for all sorting domains. But yes, the article does deal with integer sorting alone, and as far as it goes is absolutely correct. Maybe somebody should just put a disclaimer at the end (or beginning) to alert readers that using the radix sorting algorithm in a more general domain *will* lead to a runtime of O(n log m), with the special deal of O(n) existing only for domains where the maximum size is fixed. ...Or maybe all these comments already did that...

But again, excellent article, and actually a very interesting debate on runtime complexity. It's nice to see that people around here have an understanding of these things.
Anonymous
2007-04-12 10:13:57
Radix sort is efficient for most input ranges. Its main drawback is the additional O(n) (or O(n*log(m))) space required to do the counting sort. It also fails to beat an average-case n*log(n) algorithm like quicksort for cases like full names, where 20-character strings can be expected.
Anonymous
2008-10-08 23:26:51
If I wrote that algorithm, I would give credit to WikiBook "Algorithm Implementation". Also, if I don't want to credit I make sure my implementation differs from that in the book, instead here author just copy pasted the pseudo code from there without even removing the mistakes like:
// number of bits our group will be long

Anyways this was a good read.
sfabriz
2008-11-09 10:42:49
Check here http://en.wikibooks.org/w/index.php?title=Algorithm_implementation/Sorting/Radix_sort&action=history to see the publish date of the wiki article and compare it to my article's one. You'll see who has copied and who didn't.
SAJChurchey
2008-11-09 14:18:55
Indeed, this article existed years before this article did. sfabriz, you should bug them about citing this article, if you're so inclined. You should to clear up problems like these and to receive credit where credit is due.
sfabriz
2008-11-14 09:14:40
I would have told them something if it was an algorithm they copied from me, but this is just an implementation of an algorithm so I don't bother too much. I didn't even know until I saw the comment left by that user who was thinking that I copied. It's finally happened though... Some code of mine is on wikipedia, and that means wikipedia is no longer a good source for code :-P
Anonymous
2008-12-03 11:01:10
The big omega for an algorithm actually takes into account the size of the input, not just the number of elements in the input. Thus, its the sum of the number of bits for every input that matters.
This is important in theory. NP-complete problems such as subset sum can be done in polynomial time otherwise. If you don't remember this, and you will make an ass of yourself when you go to claim your millenium prize and didn't know its input size not number of elements.
Anonymous
2010-03-05 22:19:36
I have made little modifications (1. some of the variables can be extracted to constants. 2. I gave it the same signature as OrderBy<T> in LINQ so that it can be used in lambda expressions)
Tested it and it works faster than .NET quicksort starting from 64 elements in collection.
The only limitation is that it does not sort correctly when array contains both positive and negative values. This is due to bitwise counting sort implementation. I'm not sure what is the best way to support negative values in such algorithm. A simple workaround is to split an array to two arrays in one pass - negative and positive , sort each of them and then append positive to negative and return the result.
Anonymously add a comment: (or register here)
(registration is really fast and we send you no spam)
BB Code is enabled.
Captcha Number:


Blogs: (People who have posted blogs on this subject..)
bb
ASP.NET RadioButton GroupName when inside a Repeater on Sun 10th Jun 8am
I was thankful on finding this nugget of code, which makes the groupname work out when slamming in radiobuttons in an asp.net repeater. http://www.codeguru.com/csharp/csharp/cs _controls/custom/article.php/c12371/


     
Your Ad Here
 
Copyright Open Source Institute, 2006