Jump to content

Talk:Disjoint-set data structure

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

disjoint set

[edit]

This is also known as MF-Set (Merge-Find-Set), isn't it?--216.244.232.1 03:37, 18 Oct 2004 (UTC)

I was not aware of this. Thanks. I've added this information and the appropriate redirects (merge-find set, merge-find-set, merge-find data structure). Derrick Coetzee 05:45, 18 Oct 2004 (UTC)

Some references

[edit]

The first description of the disjoint-sets forest idea is found in:

Bernard A. Galler and Michael J. Fisher. "An improved equivalence algorithm", Communications of the ACM, Volume 7, Issue 5 (May 1964), pages 301-303.

Further reference:

Zvi Galil and Giuseppe F. Italiano. "Data structures and algorithms for disjoint set union problems", ACM Computing Surveys, Volume 23, Issue 3 (September 1991), pages 319-344.

Thanks for the references. I'll add them. Deco 04:10, 24 December 2005 (UTC)[reply]

Does the source code belong here?

[edit]

While the source code is no doubt useful, it's not really commented on that much in the rest of the entry. Might it belong in Wikisource? --Jacj 14:21, 8 May 2006 (UTC)[reply]

some notes

[edit]

wouldn't the Fredman and Saks result need a reference? (at the end of the "Disjoint-set forests" section)

also, isn't this the same algorithm that physicists know as Hoshen-Kopelman? I will try to add that in some appropriate way later.

Disjoint-set forests algorithm complexity

[edit]

On the page http://en.wikipedia.org/wiki/Big_O_notation#Common_orders_of_functions it says that the complexity of the The find algorithm of Hopcroft and Ullman on a disjoint set is iterated logarithmic. Iterated logarithmic is the inverse of Tetration. On this page at the end of the chapter Disjoint-set forests algorithm complexity however it says that the complexity is the inverse of the Ackerman function.

It also says on this page that the inverse of the Ackerman function is less than 5 for all practical n, while on the Iterated_logarithm page, it says the inverse of Tetration (= the iterated logarithm) is less than 5 for all practical n.

Tetration and Ackerman are related but not the same.

What is correct and what isn't?

Lodev 15:10, 10 September 2007 (UTC)[reply]

Both are correct, in that both are proven upper bounds for the runtime of these operations. The inverse Ackermann function grows more slowly than the iterated logarithm, so it's a better bound. This is already described briefly in the History section of this article. Dcoetzee 18:24, 10 September 2007 (UTC)[reply]

I am not sure how to parse this:

Using both path compression, splitting, or halving and union by rank or size (...)

Should it be as follows?

Using both [path compression, splitting or halving], and [union by rank or size] (...)

If so, I think that using a bulleted list would help. Alternatively, we could do the following:

  1. Introduce a new name (say, “path shortening strategy”) to refer to any of path compression, path splitting or path halving.
  2. Introduce another new name (say, “non-naïve root selection strategy”) to refer to either union by rank or union by size.
  3. Say “using both a path shortening strategy and a non-naïve root selection strategy ensures that the amortized time per operation is ”.

pyon (talk) 05:10, 22 January 2020 (UTC)[reply]

Honestly there are so many time complexities related to Union Find depending on which optimizations you use that we need a chart that directly state them all together. The article doesn't state the time complexity for Path splitting/halving (it just says they "retain the same worst-case complexity but are more efficient in practice").
In the reference it does state this:
> We shall prove that the algorithm with either linking by rank or linking by size and either compression, splitting, or halving runs in O(n + ma(m + n, n)) time for arbitrary m and n. Thus these six methods are all asymptotically optimal.
Also lower on this page, someone asked about the complexity of Union Find with Path Compression but without Union By Rank/Size. The wiki doesn't answer it either, and I also think there is an answer because I also sorta recall it being logn from somewhere (YouTube videos, textbook).
Overall my point is we should organize the complexities better (into a table). 136.56.82.189 (talk) 01:34, 11 September 2024 (UTC)[reply]

Hantao Zhang says that the time is linear

[edit]

This paper of Hantao Zhang [1], a CS professor at the University of Iowa, claims that the bound given in the Wikipedia article is not the best possible, and that a more careful analysis shows the algorithm to have linear worst-case time. I am following up with Zhang about the provenance of the paper, and whether it was published. -- Dominus (talk) 23:58, 21 January 2008 (UTC)[reply]

I was going to say that there's a proven lower bound, but the paper addresses this. However, I'm not confident that this paper, which is still very new, has been thoroughly peer-reviewed. I'd like to see it published in a conference or journal or at least a workshop first. Dcoetzee 03:28, 22 January 2008 (UTC)[reply]
It has not yet been published or peer-reviewed. It is under submission to STOC 2008. Zhang had not been aware of the Fredman and Saks thing, although he had been aware of similar work that proves the same result. I will post an update if I learn anything new. -- Dominus (talk) 23:51, 28 January 2008 (UTC)[reply]
The paper was rejected from STOC 2008. -- Dominus (talk) 15:15, 27 February 2008 (UTC)[reply]
Is there any indication of why the paper was rejected -- did the reviewers find mistakes, or did they simply not think it was important enough? 24.80.10.198 (talk) 03:28, 14 April 2008 (UTC)[reply]
I suppose there were mistakes in it, since Zhang has removed it from his website. Does someone remember the title of the paper? Adrianwn (talk) 09:36, 18 April 2008 (UTC)[reply]
The title was The Union-Find Problem Is Linear. I still have a copy. -- Dominus (talk) 13:56, 18 August 2008 (UTC)[reply]
Isn't the non-linear lower bound only valid under a separation assumption (i.e. disjoint sets representations do not share memory)? The following paper solves some offline version of the union-find problem in linear time by using a lookup table for small sets: "A linear-time algorithm for a special case of disjoint set union" (1983) by Gabow and Tarjan. Aureooms (talk) 16:24, 18 September 2022 (UTC)[reply]
What about the paper mentioned in e-maxx.ru: Wu, Otoo "A Simpler Proof of the Average Case Complexity of Union-Find with Path Compression"? https://www.researchgate.net/profile/Kesheng-Wu/publication/235708752_A_simpler_proof_of_the_average_case_complexity_of_union-find_with_path_compression/links/540bd2160cf2df04e7509fe0/A-simpler-proof-of-the-average-case-complexity-of-union-find-with-path-compression.pdf?origin=publication_detail It is for average case, but the complexity analysis seems pretty short if the paper is valid. Wqwt (talk) 15:10, 25 December 2022 (UTC)[reply]

Shorten Applications section

[edit]

I will shorten the Applications section; in particular I will remove all subsections except the one related to undirected Graphs. Rationale: Wikipedia is not a textbook, it is also not a programming manual or guide. Books or websites about algorithms or programming are better suited for such descriptions; however, references to them would be appropriate. Adrianwn (talk) 09:50, 18 April 2008 (UTC)[reply]

I disagree with this edit and with your interpretation of "Wikipedia is not a textbook". The policy page goes on to say: "It is not appropriate to create or edit articles which read as textbooks, with leading questions and step-by-step problem solutions as examples." This seems to imply that this has more to do with style than content; applications of a data structure are certainly relevant to the data structure, just as carbon dioxide describes industrial uses of that chemical in great detail. Dcoetzee 20:57, 18 April 2008 (UTC)[reply]
You are right, the applications of this data structure should be listed. However, the Applications section contained a lot of those step-by-step instructions. Could you please rewrite that section in a shorter, more concise way? I would do it myself if I understood more about these specific subjects. Adrianwn (talk) 07:20, 19 April 2008 (UTC)[reply]
You're right, it ought to summarize the applications rather than detail the application algorithms, but the point is to describe the mapping of various application domain events onto disjoint-set operations. I'll take a look at it again, at least the ones that I wrote. Dcoetzee 06:06, 30 April 2008 (UTC)[reply]
Also, "weighted union heuristic" is a term taken directly from CLRS, and it's used in many publications; you would have found many more Google hits if you tried searching for it with the hyphen ("weighted-union heuristic"). For this reason I've added it back. Dcoetzee 21:02, 18 April 2008 (UTC)[reply]
Nope, still gives about 200 hits (Google doesn't care much about hyphens), and only 11 hits in Google Scholar. Also, as far as I remember, CLRS never calls it "the" weighted-union heuristic, but rather "a" weighted-union heuristic (i.e., a description of the method instead of a name for it). What do you think about: "When the length of each list is tracked, the time needed can be ameliorated by always appending the smaller list to the longer. Using this "weighted-union heuristic", a sequence of [...]" ? Adrianwn (talk) 06:59, 19 April 2008 (UTC)[reply]
Sure, sounds fine. Dcoetzee 06:08, 30 April 2008 (UTC)[reply]
If step by step instructions have been removed from the article, maybe they could be transferred to wikibooks and linked to from here. —Preceding unsigned comment added by 207.241.238.233 (talk) 00:08, 30 April 2008 (UTC)[reply]

molecular dynamics

[edit]

Why is this citation in the reference section? The article doesn't appear to actually refer to it.

J. A. Spirko and A. P. Hickman, Molecular-dynamics simulations of collisions of Ne with La@C82, Phys. Rev. A 57, 3674–3682 (1998). —Preceding unsigned comment added by 207.241.238.233 (talk) 00:06, 30 April 2008 (UTC)[reply]

Path compression algorithm

[edit]

The find algorithm listed under path compression is the same one as the normal find, it does not include path compression at all. I figure it should be more like the following:

 function Find(x)
     if x.parent == x
        return x
     else
        l = [ x ]
        x = x.parent
        while ( x != x.parent)
           l += x
           x = x.parent
        for y in l
           y.parent = x
        return x

Wppds (talk) 08:02, 20 October 2008 (UTC)[reply]

No, it isn't the same one and yes, it does perform path compression. The difference is in the else-part of the if-statement:
 x.parent := Find(x.parent)
Once Find() terminates, it returns the root node. This root node then becomes the new parent of each traversed node. Adrianwn (talk) 15:35, 20 October 2008 (UTC)[reply]
Right, strange that I missed that. I'll delete this comment in a few days then.Wppds (talk) 09:01, 5 January 2009 (UTC)[reply]
Why? It is neither necessary nor customary to delete comments from a talk page. Often this will be reverted anyway; sometimes it might even be considered vandalism. Don't worry, we all make a mistake sooner or later :-) Adrianwn (talk) 12:38, 5 January 2009 (UTC)[reply]
Heh I figured they were to be cleaned up at some point. Did read the guidelines now, archivable then, when this gets too long. Wppds (talk) 16:08, 7 February 2009 (UTC)[reply]

first bullet

[edit]

in the first section, it says # Find: Determine which set a particular element is in. Also useful for determining if two elements are in the same set. But if this a disjoint set, how can an element be in two sets? —Preceding unsigned comment added by 24.1.30.246 (talk) 16:18, 7 April 2009 (UTC)[reply]

You misread it - it says "determine if two elements are in the same set", not "determine if two sets both contain the same element". Dcoetzee 22:57, 7 April 2009 (UTC)[reply]

Rank Heuristic algorithm error?

[edit]

Oh gosh, I'm on a roll of stupid edits... Sorry for the noise ;-)

Complexity

[edit]

The quote from the paper is unclear: is the Ackerman complexity the least an algorithm must use (a lower bound), or is it an upper-bound (the truly optimal complexity is no more than the Ackerman complexity)? --Gwern (contribs) 01:36 11 January 2010 (GMT)

I think it means that the Ackermann function is an upper bound for all inputs, and a lower bound for some inputs (i.e., the worst case). — Adrianwn (talk) 07:26, 11 January 2010 (UTC)[reply]

How is Union defined ?

[edit]

Is it Union(x,y) and x,y are representatives ? Cormen writes they are not. I am confused. The difference is whether we have to find the roots or not. Webskipper (talk) 15:58, 21 May 2010 (UTC)[reply]

It depends on how Union(x,y) is defined; the pseudocode in this article does not assume that x and y are representatives. – Adrianwn (talk) 16:22, 21 May 2010 (UTC)[reply]


Error in the implementation

[edit]

The implementation listed as Implementation of Disjoint-set Forests in C++, by Bo Tian seems not to update the path (it dont do path compression) which is the hole point. — Preceding unsigned comment added by 85.164.124.173 (talk) 17:57, 20 July 2011 (UTC)[reply]

I've removed the C++ links, as the code was poor quality. Follow the pseudocode in the article instead. The Python implementation also looks reasonably okay. If you're looking to do this in C++ in a production setting you should be using the Boost implementation instead. Dcoetzee 04:46, 21 July 2011 (UTC)[reply]

Linked lists

[edit]

The article claims that concatenation of linked lists in constant time. This is not so, you have to append the second list to the tail of the first one, which requires O(|L1|) where L1 is the first list.

To get such a limit, we need more, such as a pointer to the end of our lists. These are not strictly "linked-lists" and I suggest that this is mentioned.

STL (for C++) provides "splice" on lists which is O(1). To support this, lists are actually doubly linked lists, and for instance in GNU C++ (libstdc++), they are even circular doubly-linked list with a direct access to the last element.

Akim Demaille (talk) 13:33, 23 October 2011 (UTC)[reply]

Keeping a tail pointer for a linked list is a trivial extension and extremely common in practice (I'd argue, more common than not keeping such a tail pointer). I really don't think this requires any further explanation. It is true that STL splice is useful for implementing list merging in C++, but that's an implementation detail and not terribly relevant here. Dcoetzee 20:10, 24 October 2011 (UTC)[reply]

I see another error, or at least a significant omission regarding linked lists. The article claims that by adding a pointer back to head on each node, find can be implemented in constant time, but doesn't explain how this is possible. Find on an linked list typically requires visiting every element, and I don't see how pointers back to head on each node would change this. 75.21.70.205 (talk) 23:02, 25 October 2011 (UTC)[reply]

You're confusing the linked list "find" operation with the disjoint-set data structure "find" operation. The role of the first is to find a list element with a particular value. The role of the second is, given an element, to find its set representative - which we have chosen to be the head of the list that element belongs to. I've edited to attempt to clarify this. Let me know if this can be improved further. Dcoetzee 01:41, 26 October 2011 (UTC)[reply]

Complexity of path compression

[edit]

If I use only path compression without height/rank comparing, what is the complexity of the algorithm? I remember that somebody told me that it would be O(log n) but I'm not sure. It would be nice if the article provide this questionable complexity. --Nullzero (talk) 05:51, 12 February 2013 (UTC)[reply]

A parent-pointe tree of n nodes would have at most `log(n)` layers, which is would be the complexity of the `find()` operation. Thus, even without path compression, the `find()` operation would have .
However, if you're using path compression with the `find()` operation, then optimization would happen as -layers would get compressed to two layers.
I refrained from using the word 'rank' as I'm still trying to figure out its definition. The definition of rank in this article differs from my understanding of rank.
Antmaker (talk) 15:21, 21 April 2022 (UTC)[reply]

Run time of MakeSet in Disjoint-set forests

[edit]

Maybe I'm missing something very simple, but why it said that the run time of MakeSet in the Disjoint-set forests implementation is O(log(n))? Shouldn't it supposed to be O(1)?
Thanks — Preceding unsigned comment added by 79.177.179.58 (talk) 11:00, 14 January 2016 (UTC)[reply]

Fixed. Glrx (talk) 02:54, 16 January 2016 (UTC)[reply]

Can we add a box with the asymptotic runtimes, like some of the other data structures have? — Preceding unsigned comment added by 45.49.18.32 (talkcontribs) 18:18, 23 January 2016

if Condition is incorrect in Find function

[edit]

Operations Section
Find Sub-Section
if condition x.parent != NULL is incorrect it should be x.parent != x, because at the return of function null will be return instead of root i.e when find() will be used in union(), find(x) will return NULL instead of Root — Preceding unsigned comment added by CY5 (talkcontribs) 15:48, 18 February 2017 (UTC)[reply]

Restructuring

[edit]

@Glrx:: I notice that you reverted due to a violation of a citations policy. Yet the reversion took with it much material and labour and resulted in an article that had no clear citation style (unless I'm reading it wrong). If you prefer in-line citations, I'm happy to move citations around if you prefer a different format, but I think the article in its current format (as of 2017-05-31) is significantly improved over the old version.

Similarly, if you think the material regarding linked lists should be retained (I do not), I feel this should occupy a less prominent place in the article. It is unclear to me why anyone would use an _O(n log n)_ algorithm with instead of a _O(α(n))_ algorithm having a smaller space requirement. It seems to me that the linked list alternative is more of a historical footnote than a subject worthy of several sections of discussion. Indeed, an interview I spoke with cited the Wiki article as proof that Disjoint-set was in _O(n log n)_: the old format was actively misleading people.

To all: I've made some major changes to the article as of this version. I note the article was at a C rating with High importance; hopefully, this restructuring will provide a good framework for elevating the article. Is anyone able to comment on further improvements needed?

Best regards, Finog (talk) 17:23, 31 May 2017 (UTC)[reply]

I've reverted you again. You need to follow WP:BRD.
You changed the citations from inline to out-of-line which violates WP:CITEVAR.
The goal of the article is to explain the algorithm. The original text goes from a naive implementation to more sophisticated versions and is a reasonable thing to do. It puts the sophistication in context. In your version, the Time complexity section has no context. Starting with a sophisticated version is OK if you already know the subject, but the readers probably do not know the subject. That's why they came here.
The algorithm is more important than its history. History should be down below.
Some algorithm descriptions were contorted. For example, in MakeSet(x), x is abused.
Glrx (talk) 18:19, 31 May 2017 (UTC)[reply]
@Glrx:: I've reverted your reversions and placed the citations inline, per your preference.
I think the history section should be placed above. This is the case for A* search algorithm, Dijkstra's algorithm, Floyd–Warshall algorithm, and Alpha–beta pruning. As such, it seems to be a common stylistic choice and using it here helps maintain consistency between articles.
Cormen is quoted as the source of the linked-list description. Looking at his Chapter 23 notes, I don't see an attribution for the linked list form - just mention of the forest form. Perhaps he made up the linked list as a pedagogical tool. Given the simplicity of the actual data structure, it seems better to focus on a lucid explanation of that. After all, Wikipedia is not a textbook. If you feel this section is important, it may be better to preface it with a statement that it describes a purposely simplified variant of the algorithm useful only for contextualizing what you feel is a more complex sophisticated variant.
I do not understand your comment about "x being abused". Perhaps you could edit that section of the article to reflect your thoughts on the matter.
I am not sure what WP:BRD is, but gather that it is a cycle in which, rather than collaboratively working on a product, I build something and you trash/revert it. The cycle ends when I build a thing you don't feel like trashing. It seems to me that this is the longest path to satisfaction for you and the most frustrating to me. I hope we can instead build good things together.
Best regards, Finog (talk) 07:16, 2 June 2017 (UTC)[reply]
WP:BRD means you discuss changes on this talk page and get WP:CONSENSUS before you reinsert them.
It makes little sense to discuss who came up with idea X when X has not yet been explained.
My objections are deeper than simple syntactic issues. That said, you broke many reference templates ({{cite book}} changed to {{citation book}}; I've edited the article to a consistent style), inserted spaces where they don't belong (spaces before references; multiple blank line), and made strange edits ("It is also used for implementing..." changed to "It is also a key used for implementing").
I'm not saying that the current exposition is great. The linked list explanation is contorted. There's mention of the "one-pass" Find algorithm, but the article never raises the constant space two-pass Find.
Glrx (talk) 19:08, 2 June 2017 (UTC)[reply]
@Glrx:: I've reverted your changes and updated the citation styles per your advice. Note that many of the citations you believe I changed were, in fact, copied verbatim from the original post: they were probably in an antiquated format. The rest of the citations I populated using these as templates. The citations should now all match the more appropriate format you suggest.
The article has an introduction which explains the gist of the idea. The user does not enter the history section entirely unprepared. Further, the current content of the History section does not presume a technical understanding of the algorithm. And, as I pointed out previously, this seems to be the standard ordering for algorithms articles. Why should this article have a different formatting? If you feel the history section should go elsewhere, why not try to edit the article?
None of the issues you mention require complete reversion. They can all be addressed through simple edits, as your most recent edit aptly demonstrates. If you feel the spacing is amiss, that is an easy thing to fix. I will go through the article and see if I can find/resolve such stylistic issues. Perhaps my unfamiliarity with Wiki's formatting system is at the root.
Best regards, Finog (talk) 19:28, 2 June 2017 (UTC)[reply]
@Glrx:: I've reduced the multiple blank line issue, the spacing before references, and the "a key" wording.
If your objections are deeper, they seem to be something of a moving target. Your stated reason for reverting was originally due to the citation organization. Now I'm not sure what the objections are.
I notice that you have used your administrative powers to put a warning on my account. I hope you appreciate the power differential that is driving this editing process. I am not an experienced editor, but am trying to improve the article and asking you to work constructively with the material I'm trying to build. Further, I've posted in both WikiProject Computing and WikiProject Computer Science looking for feedback on the article. You are rejecting my material wholesale for reasons that I see as easily resolvable via edits and using your knowledge of community arcana and, now, your ability to freeze my account to enforce your view.
We agree that the current exposition is not great, but it seems that, of the two of us, I am the one actively trying to improve that exposition whereas your reversions keep bringing us back to it. If I tire of this process, will you take on the responsibility of improving the exposition alone? If not, can we discuss here how I can improve my material and iterate on it, rather than repeatedly starting from scratch?
Best regards, Finog (talk) 19:54, 2 June 2017 (UTC)[reply]

MakeSet: if x is not already present

[edit]

How would this one line of pseudocode be implemented? I assume there wouldn't be a separate hash table that makes the operation O(1). It seems like if you had to go through the list of trees in the forest, it would potentially be O(n), but that also seems wrong. — Preceding unsigned comment added by Davidvandebunte (talkcontribs) 02:48, 21 December 2017 (UTC)[reply]

The statement makes little sense, and the statement does not exist in earlier versions of this article. See this version. Also see my comment above that states, "Some algorithm descriptions were contorted. For example, in MakeSet(x), x is abused." The statement confuses the notion of the elements existing in an array or being individually allocated from the heap. MakeSet is not allocating, it is only initializing. When an element x is initialized, then its parent and rank need to set. The presence of x is never in doubt. InitSet() might be a better name, but MakeSet() is traditional.
Subsequent edits to the old article introduced significant confusion, poor exposition, and other problems. I didn't get anywhere with Finog.
I think the article should go back to the earlier text.
Glrx (talk) 00:44, 31 December 2017 (UTC)[reply]
I came to mention the same thing. It seem non obvious how (magically)
> if x is not already in the forest then
and
> This operation has constant time complexity.
would make sense, together.
The version you link. is way clearer and to-the-point with great examples which gradually grow in complexity. Completely agree it should go back to the earlier text. 2bam (talk) 19:12, 27 January 2024 (UTC)[reply]

Error in Union pseudocode?

[edit]

The article says "If two sets are unioned and have the same rank, the resulting set's rank is one larger". However, in the pseudocode the rank of the smaller(!) set is incremented by one, while nothing happens if they are the same size. 135.84.132.56 (talk) 17:31, 14 February 2018 (UTC)[reply]

Perhaps the 'union by rank' pseudocode has been updated because I don't see an error in it.
The case when the two sets of the same rank is unioned, the final set has a rank incremented by one.
Antmaker (talk) 15:45, 21 April 2022 (UTC)[reply]

Proof

[edit]

Content merged from Talk:Proof of O(log*n) time complexity of union–find.--Salix alba (talk): 07:05, 23 April 2020 (UTC)[reply]

Formatting makeover

[edit]

The formatting of this page (especially of the mathematics) was pretty cruddy, so I went through and did what I could, fixing up some grammar while I was at it. I haven't actually read the article yet, so it's quite possible I goofed something up by misinterpreting it.Dfeuer (talk) 05:27, 6 July 2012 (UTC)[reply]

Lemma 3

[edit]

Lemma three isn't explicitly mentioned anywhere after its statement. Dfeuer (talk) 06:14, 6 July 2012 (UTC)[reply]

Lemma 2 wrong?

[edit]

The proof of Lemma 2 only considers "Makeset" and "Union", but not "Find"; and indeed path compression will cause a violation of the claim.

This raises (again) the following question: — Preceding unsigned comment added by Martin Ziegler (talkcontribs) 09:33, 4 April 2018 (UTC)[reply]

Original research?

[edit]

Whose proof is this? If it was previously published, is there a copyright problem? If not, is it prohibited as original research? Dfeuer (talk) 09:14, 7 July 2012 (UTC)[reply]

This is about the most lucid and simple proof I have seen of the (non-optimal) fact that Union-Find algs have amortized runtime bounded by O(log* n). I plan to use this in my teaching notes on the topic.

Chris Lacher — Preceding unsigned comment added by 69.254.177.155 (talk) 17:13, 15 December 2013 (UTC)[reply]

Missing steps in that proof?

[edit]

The proof is basically that of Hopcroft and Ullman (1973). However, this short version does not clearly make use of the hypothesis that each Find leads to a path compression, which is crucial for the result to be correct. An extra explanation showing where and how that hypothesis is used seems needed. — Preceding unsigned comment added by Bsalvy (talkcontribs) 21:26, 15 November 2018 (UTC)[reply]

It is used (implicitly) in the argument to bound T_3. I'll try to augment that argument in the article. Preciser (talk) 02:06, 19 January 2019 (UTC)[reply]

Gaps in the proof

[edit]

As I read the current proof, it reads as if the forest is unchanging. The original proof by Hopcroft and Ullman[1] is careful (when defining the ranks, and when proving the analogous lemmas), to carefully account for how the trees change as a result of find and union operations. This issue seems to be swept under the rug in the current proof here, making it very hard to understand the details. Nealeyoung (talk) 23:18, 22 March 2019 (UTC)[reply]

References

  1. ^ SIAM J. Comput. Vol. 2, No. 4, December 1973, Set Merging Algorithms, J. E. Hopcroft and J. D. Ullman

Rank and Height definition

[edit]

I am confused with the definition of rank as the way it is used in this article differs from my understanding.

My understanding is as follow:

  • Rank: Property of a Tree - The number of nodes from the leaf to root including the root itself; thus, a singleton would have a rank of one.
  • Height: Property of a Tree - The number of edge from the leaf to the root; thus a singleton would have a height of zero.

The article's definitions:

  • Rank: Rank is the upper bound of height, and a singleton's rank and height are both 0.

Antmaker (talk) 15:31, 21 April 2022 (UTC)[reply]

Rank as used in this data structure is defined by the algorithm. It is indeed an upper bound on height, where height of a node is the number of edges from the node to its deepest leaf. This notion of height is standard. The notion of rank is not standard but specific to the algorithm. There is another way to define the rank of a node - it is the height that this node would have if no compressions took place. AmirOnWiki (talk) 12:31, 11 September 2023 (UTC)[reply]