 |
 |
 |
 |
| | No-one commented this yet? I think it was a good article, the first time I met radix sorting. |
|
 |
| I think it's quite optimistic to say O(n) as the running time, since it greatly depends on the data to be sorted. O(n log m) is more correct, where m stands for the size of value space. If you only have to sort small range of numbers, you won't need to make many passes through the algorithm. However, in the generic case, you'll have to do many passes through the data which can end up costing more than quicksort.
It's a good algorithm to know and works very well in some special cases, sorting short text strings being one rather interesting and often met case. |
|
 |
| I'm sorry you're wrong ;-). When the keys have a fixed max length m, log(m) becomes a constant hidden in O(n) notation. It's by the definition of O(n):
f(x)=O(n) if exists a function g(x) for it is true that f(x)<=c*g(x) for x>=x0. In this case log(m) is the c constant so you can't say this is O(nlogm).
Also, considering that numbers in a computer are usually stored with a number of bits not over 64 (rarely more that 64) you can always apply this algorithm on int or long on a common pc on most languages. |
|
 |
| | Oh, obviously g(x)=a*x+b with a,b belonging to Z. |
|
 |
| Well, yes, considering it a constant makes the algorithm O(n). However, that's like saying that if you know the amount of items beforehand, the algorithm is O(1). Well, yes, sure it is, but what's the point?
The algorithm can be applied to different situations, with different parameters. I think the range of values isn't any less of a variable than amount of items, when it comes to observing the algorithm from an abstract perspective. Any time that you choose to apply it to a specific case, the complexity is reduced by the details imposed by the case, and thus you get O(n) for 32bit integer sorting, for example. Please consider the case of sorting text strings for a counter example. |
|
 |
| |
 |
| | It's the same again. I understand you point man and I have to say you're smart, but you should have noticed that the word linear in the title is quoted. The reason it's quoted is that obviously, if you have non fixed length keys or keys too big this algo is not so fast, but there is no need to remark this when I am the first to say it in the article. This algo works surprisingly well when sorting common (...up to 80 bits e.g.) numbers and short strings. For bigger numbers or longer strings you obviously use quicksort. Also, knowing the amount of items before won't reduce the complexity to O(1). The complexity it's always O(n) because with a 1-cpu machine you have to see all the values at least once. It's two different things hiding constants in the O() notation and finding complexities using it. You can't say it's O(nlogm) or O(1). Once stated that the keys have a fixed max length, the complexity is O(n). I'm not the one who says that. It's prof. Rivest who says that. Prof. Rivest who? The "R" of RSA! ;-) |
|
 |
| Damnit, I checked the article again and I totally missed the comment about fixed maximum length. And if you have fixed amount of items it's O(1) and not O(n) because n is no longer a variable. I didn't read the article very carefully and thought you were talking about complexity of radixsort generally, in which case I consider O(n) to be optimistic since it disregards the varying amount of possible values :)
Ohwell, looks like I'm wasting both of ours time here :p |
|
 |
| | Hehe, yes. Probably is better to read completely an article before making a comment. Anyway, it's good to discuss things if you disagree with something. I leave you with this question: you insist saying that if n is fixed then complexity is O(1). But if m is fixed and n also is fixed, why would you calculate complexity? Complexity is to be calculated when there is at least one var changing right? Cya! |
|
 |
| | Well, I thought I read the article. Anyway, the actual content of the data is still variable if even both n and m are fixed, so O(1) is to say that it's constant-time, independent of the values to sort. |
|
 |
| This is quite a good introduction to radix sort, and also a nice comment war where, as far as the ground each of you set up for yourself, you're both right. The radix sort algorithm *is* O(n) for the domain of integers where m is known and log(m) reduces to a constant. But the algorithm is also O(n log m) as a general sorting algorithm, with the domain of variable length strings being an excellent example.
You are also both correct on the O(1) debate: there is no need to calculate complexity when all the variables are fixed (as sfabriz says) *because* the runtime is O(1) (no matter what the values are in an array of fixed length and maximum bit size, it runs in the same amount of time -- as muzzy says). The only thing relevant about having a complexity of O(1) in this case is in comparison to other sorting algorithms, which can have varying runtimes based on the data in the array. The O(1) time just means that radix sort doesn't have a best or worst case within that domain (and actually all domains, if I've understood correctly), everything is just average.
To finish this whole post off (it got longer than I expected), I must point out to sfabriz that muzzy probably based his comments on the tone of your article and the very end part after the code, both of which *do* give the reader a feeling that radix sort will run in O(n) time for all sorting domains. But yes, the article does deal with integer sorting alone, and as far as it goes is absolutely correct. Maybe somebody should just put a disclaimer at the end (or beginning) to alert readers that using the radix sorting algorithm in a more general domain *will* lead to a runtime of O(n log m), with the special deal of O(n) existing only for domains where the maximum size is fixed. ...Or maybe all these comments already did that...
But again, excellent article, and actually a very interesting debate on runtime complexity. It's nice to see that people around here have an understanding of these things. |
|
 |
| Radix sort is efficient for most input ranges. Its main drawback is the additional O(n) (or O(n*log(m))) space required to do the counting sort. It also fails to beat an average-case n*log(n) algorithm like quicksort for cases like full names, where 20-character strings can be expected.
|
|
 |
| If I wrote that algorithm, I would give credit to WikiBook "Algorithm Implementation". Also, if I don't want to credit I make sure my implementation differs from that in the book, instead here author just copy pasted the pseudo code from there without even removing the mistakes like:
// number of bits our group will be long
Anyways this was a good read. |
|
 |
| |
 |
| | Indeed, this article existed years before this article did. sfabriz, you should bug them about citing this article, if you're so inclined. You should to clear up problems like these and to receive credit where credit is due. |
|
 |
| | I would have told them something if it was an algorithm they copied from me, but this is just an implementation of an algorithm so I don't bother too much. I didn't even know until I saw the comment left by that user who was thinking that I copied. It's finally happened though... Some code of mine is on wikipedia, and that means wikipedia is no longer a good source for code :-P |
|
 |
| The big omega for an algorithm actually takes into account the size of the input, not just the number of elements in the input. Thus, its the sum of the number of bits for every input that matters.
This is important in theory. NP-complete problems such as subset sum can be done in polynomial time otherwise. If you don't remember this, and you will make an ass of yourself when you go to claim your millenium prize and didn't know its input size not number of elements. |
|
 |
| I have made little modifications (1. some of the variables can be extracted to constants. 2. I gave it the same signature as OrderBy<T> in LINQ so that it can be used in lambda expressions)
Tested it and it works faster than .NET quicksort starting from 64 elements in collection.
The only limitation is that it does not sort correctly when array contains both positive and negative values. This is due to bitwise counting sort implementation. I'm not sure what is the best way to support negative values in such algorithm. A simple workaround is to split an array to two arrays in one pass - negative and positive , sort each of them and then append positive to negative and return the result. |
|
 |
| |
 |
 |
 |
 |
Anonymously add a comment: (or register