Algoritmi

Different approaches for finding the subsets of a set in Python 3

The first time I thought at this problem was when I worked on testing a component on a work-related project. Back then, I’ve realised that for properly testing the component, I should generate what seemed to be \(2^n\) distinct cases (n being the number of element types). \(2^n\)…odd coincidence or what?

After some thought I realised this was a general answer, as these are the number of subset types that you can generate from a set with n element types. I said types so many times because I want to underline I’m looking at the possible test signature, regardless the content.

Let’s say we have a function, f, which needs three parameters (three is a must, we’ll account for that with an assert). How many possibilities we have if we allow None (or null depending on the programming language) as input? None( or null) is a generic type, usually assigned for objects which, in object oriented programming languages (like Python or Java) is something common to all standard types (that’s because the standard types usually inherit some primitive object like ‘Object’). The answer is 8 because we either put some content in some parameter or we don’t \(\Leftrightarrow\) 0 or 1 so \(2^{\text{# of parameters}}\).

I will be working with Python3, using generators (because of that exponential I want to save the memory). Fortunately, there are multiple ways to generate the powerset of a set (as it’s usually called). It’s quite interesting to see the differences. Overall we have:

  1. Recursive implementation I
  2. Recursive implementation II
  3. Iterative implementation
  4. Using chain and combinations from itertools

Recursive implementation I

I really like this one (probably because it’s the only one I drafted top to bottom, the others being, with a little retouch, gathered). On this one I thought in an induction like manner…This problem, of size n it’s the same as it’s smaller cousin, of size n-1, plus something. I think it’s a nice way of using backward recursion, building the solutions as we exit from the recursive call.

I want to emphasise a little on one point. This solution has, although ‘hidden’, the idea that’s used in all the recursive implementations – that is, the full solution S([H|T])=[H|S(T), S(T)] (H – being the head element of the initial list, T being the tail (the rest of the elements from the list)). For those with some Prolog background this should look very intuitive.

Looking at the code below you might ask, were do we concatenate these two solutions (the one with H and the one without it). The answer is in the for loop, taking into account the yield [] part.

def classical_recursive_one(elems):
    yield []  # first return the result we're sure about
    for i in range(len(elems)):
        for x in classical_recursive_one(elems[i+1:]):  # induction part
            yield [elems[i]] + x

sth = ['neigh', 'category', 'purpose']
for x in classical_recursive(sth):
    print(x)
[] 
['neigh']
['neigh', 'category']
['neigh', 'category', 'purpose']
['neigh', 'purpose']
['category']
['category', 'purpose']
['purpose']

Recursive implementation II

This one uses backward recursion again, and the idea is that the solution is constructed either by taking the element at the current position or not:

def classical_recursive_two(elems):
    """ Given a list of elements return a generator 
    that will generate all the subsets"""
    if  len(elems) <= 1:
        yield elems
        yield []
    else:
        for item in classical_recursive_two(elems[1:]):
            yield [elems[0]] + item
            yield item

sth = ['neigh', 'category', 'purpose']
for x in classical_recursive_two(sth):
    print(x)
['neigh', 'category', 'purpose'] 
['category', 'purpose']
['neigh', 'purpose']
['purpose']
['neigh', 'category']
['category']
['neigh']
[]

I will explain the else part. That’s were all the fun is. That part reflects the two decision branches I’ve spoken above when constructing solutions. So for all the possible solutions of the subproblem (this is why the for part is needed), construct the two NEW possible solutions via 2 yields. Easy peasy…

Iterative implementation

This is the old school way to do it. It’s elegant nonetheless. It uses the incredible property that all numbers from 1 up to \(2^n\) are distinct. If we would write in base 2…those 1 and 0 bits could be interpreted like: “Take the element from the list if 1, don’t take it if 0”. But hey, we can actually do this with some bit operation magic:

def classical_iterative(elems):
    powerset_size = 2**len(elems)
    counter = 0
    j = 0
    
    for counter in range(0, powerset_size):
        results = []
        for j in range(0, len(elems)):
            # take the element if on bit position j it says to take it (i.e. 1)
            if((counter & (1 << j)) > 0):
                results.append(elems[j])
        yield results
        
sth = ['neigh', 'category', 'purpose']
for x in classical_iterative(sth):
    print(x)
[] 
['neigh']
['category']
['neigh', 'category']
['purpose']
['neigh', 'purpose']
['category', 'purpose']
['neigh', 'category', 'purpose']

Using chain and combinations from itertools

This one is for the lazy of hand…but a smart use of itertools nonetheless. chain is used to treat multiple sequences as a single sequence and combinations to generate…all the possible combinations. What’s so special about this and how it’s related to our problem anyway. It just so happens that the sum of all the possible combinations up to n is actually \(2^n\). This one is a mouthful and some time to actually appreciate this property is needed…essentially, if you don’t understand this solution, start by trying to understand this property: \(\sum_1^n \binom{n}{j} = 2^n\).

from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

sth = ['neigh', 'category', 'purpose']
for x in powerset(sth):
    print(list(x))
[] 
['neigh']
['category']
['purpose']
['neigh', 'category']
['neigh', 'purpose']
['category', 'purpose']
['neigh', 'category', 'purpose']

Ok, now that we’ve sketched the solutions which one is actually faster. Let’s test it out:

from time import time

test_elems = list(range(1, 16))

def time_me(func, elems, title):
    start = time()
    # Because it's a generator we want to evaluate it
    for x in func(elems):
        x
    end = time()
    print(title, end-start)
    
time_me(classical_recursive_one, test_elems, "Classical Rec. I: ")
time_me(classical_recursive_two, test_elems, "Classical Rec. II: ")
time_me(classical_iterative, test_elems, "Classical Iterative: ")
time_me(powerset, test_elems, "Using itertools: ")
Classical Rec. I:  0.0767221450805664 
Classical Rec. II: 0.008540868759155273
Classical Iterative: 0.09819293022155762
Using itertools: 0.0015170574188232422

My method is almost the worst in terms of speed after the iterative one…Nevermind, now we know which one to use in production.

Write A Comment