I found an interesting problem in Ask Metafilter the other day about sorting people into work groups. The problem is this: suppose you have a group of 90 people and you want to sort them into 3 equal sized teams. To facilitate this you ask them to write down the 5 people they most want to work with. What method (algorithm) should you use to sort these people such that you accomodate the most preferences?

Before I dive in I need to build some model for how I am going to deal with this. Suppose each team is a set of people, and each person has a set of "friends" (the people they want to work with). A person is happy if the intersection between the set of their friends and the set of their team is 5 and the person is least happy when it is zero. We can generalize this to a team, a teams overall happiness is simply the sum of the happinesses of each member, and continuing on the happiness of the whole company is the sum of the happinesses of each team.

This is easy to do in Python using the built in `set`

data structure. Here I define a class `Person`

which is simply an object with a `friends`

property (why can't I make a set of sets? because Python sets cannot contain sets, for reasons that are not worth getting into here).

Now for one big grain of salt for the proceedings, I make a company (a larger set of people) just populating a set with people and randomly assigning each person friends. This is random network and doesn't have the structure you would expect from a real social network (e.g. popular people who everyone wants in their team, unpopular people who everyone shuns)

```
import random
class Person(object):
def __init__(self):
self.friends = set()
def makecomp(size):
company = { Person() for i in range(size)}
for person in company:
person.friends = set(random.sample(company,5))
return company
def teamhappiness(team):
return sum(map(lambda person: len(team & person.friends), team))
def happiness(*teams):
return sum(map(teamhappiness,teams))
```

## Random Partition

As a first kick at the cat, mostly for reference, lets just randomly assign people to each team. This is pretty straightforward using `random.sample(pool, k)`

which randomly pulls *k* samples from the *pool* without replacement.

I want to do this many times so I can get some statistics on it, so I made this an iterator that generates a new company and a new random partition for each iteration.

```
def randpart(compsize,numteams,numtrials):
for i in range(numtrials):
company = makecomp(compsize)
teamsize = compsize//numteams
teams = []
for team in range(numteams):
team = set(random.sample(company, teamsize))
teams.append(team)
company = company - team
yield happiness(*teams)
rscore = [ i for i in randpart(90,3,1000)]
```

As we can see the random partition generates, on average, an overall team happiness of about 150. For comparison the maximum theoretical value is 450 (i.e. 90*5, everyone is just blissed out), though that may not be achievable with any given network.

## Playground Algorithm

The next attempt is what I'm going to call the playground algorithm. First I start with 3 team captains and they pick the next available person their team most wants. Like how you divided into teams on the playground, except that the team captains are remarkably fair and poll their teammates to find out the overall team preference.

First I pick 3 people from random, they are the 1st players in each of their respective teams. Then I go through each team one by one picking the top pick, adding that pick's preferences to the group preference, until there are no more people left. Using a `Counter`

makes this very easy. Once there are very few people left it's possible that there is nobody left that the team wants but there are still people waiting to be assigned (the unpopular kids), in this case one is picked at random.

Again I wrap this whole algorithm in an iterator so I can easily generate many trials, for stats purposes.

```
from collections import Counter
def playpart(compsize,numteams,numtrials):
for i in range(numtrials):
company = makecomp(compsize)
seed = set(random.sample(company,numteams))
preferences = [ Counter(person.friends) for person in seed]
teams = [ {person} for person in seed]
company = company - seed
while len(company)>0:
for i in range(numteams):
try:
choices = preferences[i].most_common()
pick = next(choice[0] for choice in choices if choice[0] in company)
except StopIteration:
pick, = random.sample(company,1)
preferences[i].update(pick.friends)
teams[i].add(pick)
company = company - {pick}
yield happiness(*teams)
pscore = [i for i in playpart(90,3,1000)]
```

Compared to just randomly partitioning the teams this has worked much better. The playground algorithm has an average score around 225.

## Random With Sorting

This is the last method I'm going to try. The idea is to randomly divide the teams, much like in the random partition, then try randomly swapping team members, if the swap improves overall happiness keep it and if it doesn't keep the original team arrangement.

- Pick 2 teams at random
- Randomly pick 1 member from each team.
- Swap them
- If the overall happiness improves, keep the swap
- Repeat.

```
def sorting(teams, trials):
for i in range(trials):
team1, team2 = random.sample(teams,2)
pick1 = set(random.sample(team1, 1))
pick2 = set(random.sample(team2, 1))
newteam1 = (team1 - pick1) | pick2
newteam2 = (team2 - pick2) | pick1
if happiness(newteam1,newteam2) > happiness(team1,team2):
teams.remove(team1)
teams.append(newteam1)
teams.remove(team2)
teams.append(newteam2)
yield happiness(*teams)
```

Over time this does dramatically improve on the random partition. But when do we stop doing these trials? There can be long periods where nothing improves and simply running this for an obscene number of trials is computationally intensive.

I'm going to track *stagnation*, the number of timesteps where the sorting algorithm has failed to improve. Once the algorithm has run for *x* number of timesteps without improving any I shut it down. From the plot above these long stretches of time happen mostly at the end, when any future improvements are probably small anyways. Note that one there is an improvement *stagnation* resets to zero.

```
def sorting(teams, trials, staglimit):
stagnation = 0
for i in range(trials):
team1, team2 = random.sample(teams,2)
pick1 = set(random.sample(team1, 1))
pick2 = set(random.sample(team2, 1))
newteam1 = (team1 - pick1) | pick2
newteam2 = (team2 - pick2) | pick1
if happiness(newteam1,newteam2) > happiness(team1,team2):
stagnation = 0
teams.remove(team1)
teams.append(newteam1)
teams.remove(team2)
teams.append(newteam2)
else:
stagnation += 1
if stagnation>staglimit:
return i
def sortpart(compsize, numteams, numtrials, staglimit=100):
for i in range(numtrials):
company = makecomp(compsize)
teams = []
teamsize = compsize//numteams
for team in range(numteams):
team = set(random.sample(company,teamsize))
teams.append(team)
company = company-team
sorting(teams, 5000, staglimit)
yield happiness(*teams)
dscore = [i for i in sortpart(90,3,1000)]
```

This is an improvement over the playground algorithm, and I suppose if you really wanted to you could fuse the two together. One thing is that the *stagnation* limit really determines the performance of this algorithm. If I am impatient and fussy I can make the performance much worse (say a stagnation limit under 10) however if I have infinite patience I can push this towards whatever the actual limit is for the random network in question.

Above is the results for trying several different stagnation limits from 0 (i.e. just a random partition) up to 900. It seems like the practical limit of overall happiness and contentment for a random friend network is about 250. Again we are faced with the stopping problem of how optimal is "optimal enough" for our partitioning. But like most things in life there is a strong diminishing return.

As usual the ipython notebook is on github