Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
625 views
in Technique[技术] by (71.8m points)

algorithm - Python - Removing overlapping lists

Say I have a list of lists that has indexes [[start, end], [start1, end1], [start2, end2]].

Like for example :

[[0, 133], [78, 100], [25, 30]].

How would I get check for overlap between among the lists and remove the list with the longer length each time? So:

>>> list = [[0, 133], [78, 100], [25, 30]]
>>> foo(list)
[[78, 100], [25, 30]]

This is what I tried to do so far:

def cleanup_list(list):
    i = 0
    c = 0
    x = list[:]
    end = len(x)
    while i < end-1:
        for n in range(x[i][0], x[i][1]):
            if n in range(x[i+1][0], x[i+1][1]):
                list.remove(max(x[i], x[i+1]))
        i +=1
    return list

But in addition to being kind of convoluted it's not working properly:

 >>>cleanup_list([[0,100],[9,10],[12,90]])
 [[0, 100], [12, 90]]

Any help would be appreciated!

EDIT:

Other examples would be:

>>>a = [[0, 100], [4, 20], [30, 35], [30, 78]]
>>>foo(a)
[[4, 20], [30, 35]]

>>>b = [[30, 70], [25, 40]]
>>>foo(b)
[[25, 40]]

I'm basically trying to remove all of the longest lists that overlap with another list. In this case I don't have to worry about the lists being of equal length.

Thanks!!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

To remove a minimal number of intervals from the list such that the intervals that are left do not overlap, O(n*log n) algorithm exists:

def maximize_nonoverlapping_count(intervals):
    # sort by the end-point
    L = sorted(intervals, key=lambda (start, end): (end, (end - start)),
               reverse=True) # O(n*logn)
    iv = build_interval_tree(intervals) # O(n*log n)
    result = []
    while L: # until there are intervals left to consider
        # pop the interval with the smallest end-point, keep it in the result
        result.append(L.pop()) # O(1)
        # remove intervals that overlap with the popped interval
        overlapping_intervals = iv.pop(result[-1]) # O(log n + m)
        remove(overlapping_intervals, from_=L) 
    return result

It should produce the following results:

f = maximize_nonoverlapping_count
assert f([[0, 133], [78, 100], [25, 30]]) == [[25, 30], [78, 100]]
assert f([[0,100],[9,10],[12,90]]) == [[9,10], [12, 90]]
assert f([[0, 100], [4, 20], [30, 35], [30, 78]]) == [[4, 20], [30, 35]]
assert f([[30, 70], [25, 40]]) == [[25, 40]]

It requires the data structure that can find in O(log n + m) time all intervals that overlap with the given interval e.g., IntervalTree. There are implementations that can be used from Python e.g., quicksect.py, see Fast interval intersection methodologies for the example usage.


Here's a quicksect-based O(n**2) implementation of the above algorithm:

from quicksect import IntervalNode

class Interval(object):
    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.removed = False

def maximize_nonoverlapping_count(intervals):
    intervals = [Interval(start, end) for start, end in intervals]
    # sort by the end-point
    intervals.sort(key=lambda x: (x.end, (x.end - x.start)))   # O(n*log n)
    tree = build_interval_tree(intervals) # O(n*log n)
    result = []
    for smallest in intervals: # O(n) (without the loop body)
        # pop the interval with the smallest end-point, keep it in the result
        if smallest.removed:
            continue # skip removed nodes
        smallest.removed = True
        result.append([smallest.start, smallest.end]) # O(1)

        # remove (mark) intervals that overlap with the popped interval
        tree.intersect(smallest.start, smallest.end, # O(log n + m)
                       lambda x: setattr(x.other, 'removed', True))
    return result

def build_interval_tree(intervals):
    root = IntervalNode(intervals[0].start, intervals[0].end,
                        other=intervals[0])
    return reduce(lambda tree, x: tree.insert(x.start, x.end, other=x),
                  intervals[1:], root)

Note: the time complexity in the worst case is O(n**2) for this implementation because the intervals are only marked as removed e.g., imagine such input intervals that len(result) == len(intervals) / 3 and there were len(intervals) / 2 intervals that span the whole range then tree.intersect() would be called n/3 times and each call would execute x.other.removed = True at least n/2 times i.e., n*n/6 operations in total:

n = 6
intervals = [[0, 100], [0, 100], [0, 100], [0, 10], [10, 20], [15, 40]])
result = [[0, 10], [10, 20]]

Here's a banyan-based O(n log n) implementation:

from banyan import SortedSet, OverlappingIntervalsUpdator # pip install banyan

def maximize_nonoverlapping_count(intervals):
    # sort by the end-point O(n log n)
    sorted_intervals = SortedSet(intervals,
                                 key=lambda (start, end): (end, (end - start)))
    # build "interval" tree O(n log n)
    tree = SortedSet(intervals, updator=OverlappingIntervalsUpdator)
    result = []
    while sorted_intervals: # until there are intervals left to consider
        # pop the interval with the smallest end-point, keep it in the result
        result.append(sorted_intervals.pop()) # O(log n)

        # remove intervals that overlap with the popped interval
        overlapping_intervals = tree.overlap(result[-1]) # O(m log n)
        tree -= overlapping_intervals # O(m log n)
        sorted_intervals -= overlapping_intervals # O(m log n)
    return result

Note: this implementation considers [0, 10] and [10, 20] intervals to be overlapping:

f = maximize_nonoverlapping_count
assert f([[0, 100], [0, 10], [11, 20], [15, 40]]) == [[0, 10] ,[11, 20]]
assert f([[0, 100], [0, 10], [10, 20], [15, 40]]) == [[0, 10] ,[15, 40]]

sorted_intervals and tree can be merged:

from banyan import SortedSet, OverlappingIntervalsUpdator # pip install banyan

def maximize_nonoverlapping_count(intervals):
    # build "interval" tree sorted by the end-point O(n log n)
    tree = SortedSet(intervals, key=lambda (start, end): (end, (end - start)),
                     updator=OverlappingIntervalsUpdator)
    result = []
    while tree: # until there are intervals left to consider
        # pop the interval with the smallest end-point, keep it in the result
        result.append(tree.pop()) # O(log n)

        # remove intervals that overlap with the popped interval
        overlapping_intervals = tree.overlap(result[-1]) # O(m log n)
        tree -= overlapping_intervals # O(m log n)
    return result

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...