I don't know the name of this algorithm (if it has one). I came up with it while parallelizing some graph-processing code, and it is simple enough that I thought it might be helpful. To remove redundancy we need to do a lot of comparisons, but we cannot use fully independent parallelism because the data can overlap / be redundant. By pipelining the merge as a sequence of set subtractions, we can get a big speedup. Conceptually it is set subtraction, but implemented element by element.

To initialize, we partition the already-merged set (which may be empty) across the processors. New elements flow through the pipeline and are checked against each processor's partition in turn. Elements that have not been eliminated (are not redundant) get added to the set being checked against (at the last processor), so that subsequent elements will be checked against them too.

To keep performance from degrading as unique elements accumulate at the last processor, new elements are moved back to the sets of earlier processors. A processor can decide to move a new element back (to an earlier processor) using a probability based on the processor's position in the sequence, P = (p - 1)/p; so the first processor (p = 1) never moves an element back, while later processors are increasingly likely to. The procedure to move an element requires that the receiving processor add it right away, but the sender can only remove it after it has processed all the elements in its input buffer plus one; this way, elements already in flight are still checked against the moved element. Each processor also has to be able to stop the earlier processor from sending it more elements to check when its buffer is full. Note that all communication is one-dimensional: each processor talks only to its immediate neighbors.
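To make the flow concrete, here is a minimal sketch in Go (the post doesn't name a language; Go's bounded channels model the one-dimensional links, and a full buffer blocking the sender gives the backpressure for free). The element type (int), the names (stage, partitions), and the sample data are all my own; the probabilistic back-migration and the add-before-remove move protocol are left out to keep the core pipelined subtraction visible.

```go
package main

import "fmt"

// stage filters incoming elements against its partition of the merged set.
// If accumulate is set (the last processor), surviving elements are added to
// the partition so that later elements are checked against them too.
func stage(partition map[int]bool, accumulate bool, in <-chan int, out chan<- int) {
	for e := range in {
		if partition[e] {
			continue // redundant: eliminate it here
		}
		if accumulate {
			partition[e] = true // unique: record it for future checks
		}
		out <- e // blocks if the downstream buffer is full (backpressure)
	}
	close(out)
}

func main() {
	const p = 3       // number of processors (pipeline stages)
	const bufSize = 4 // input-buffer capacity per stage

	// Partition an already-merged set {1..9} across the stages.
	partitions := []map[int]bool{
		{1: true, 2: true, 3: true},
		{4: true, 5: true, 6: true},
		{7: true, 8: true, 9: true},
	}

	// One bounded channel per link; communication is strictly one-dimensional.
	chans := make([]chan int, p+1)
	for i := range chans {
		chans[i] = make(chan int, bufSize)
	}
	for i := 0; i < p; i++ {
		go stage(partitions[i], i == p-1, chans[i], chans[i+1])
	}

	// Feed new elements in at the head of the pipeline.
	go func() {
		for _, e := range []int{2, 5, 10, 7, 11, 10} {
			chans[0] <- e
		}
		close(chans[0])
	}()

	// Whatever survives every stage is unique.
	for e := range chans[p] {
		fmt.Println("unique:", e)
	}
}
```

With this input, 2 and 5 are eliminated at the first two stages, 10 and 11 come out as unique, and the second 10 is caught at the last stage because the first copy was added to its set when it passed through. In a full implementation, that last stage would also migrate its accumulated elements back upstream with probability P = (p - 1)/p, using the add-then-remove handoff described above.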