Hedge funds work on a simple wisdom-of-the-crowd principle: Because they involve an extremely large number of transactions, which smooths out the vagaries of individual transactions, the movements of financial markets follow predictable patterns, which can be discerned from a study of their past behavior. Deviations from the patterns tend to be shortlived, and by making huge bets that the deviations will quickly return to the norm, you can make a whole lot of money. As we’ve seen recently, though, things aren’t quite as simple as the hedge fund operators assume. Sometimes, very weird things happen and the deviations become either larger or longer-lived than expected, at which point the big bets can unravel in very unpleasant ways.
Peer-to-peer networks, which also involve lots and lots of different actors doing lots and lots of different things for lots and lots of different reasons, work in a similar way, and a company like Skype, whose telephone network is designed to run on many thousands of computers spread across a big P2P network, has built its system on the assumption that usage patterns are predictable – even if the actions of any individual user are not. Last week, Skype ran head-on into the hedge fund problem. The network’s behavior deviated from the norm in a way that was greater than the Skype engineers had planned for, and the system crashed. The catalyst was the distribution of a routine patch from Microsoft, which led to a cascade of unanticipated effects, as Skype’s Villu Arak explains:
The Microsoft Update patches were merely a catalyst — a trigger — for a series of events that led to the disruption of Skype, not the root cause of it … The high number of post-update reboots affected Skype’s network resources. This caused a flood of log-in requests, which, combined with the lack of peer-to-peer network resources at the time, prompted a chain reaction that had a critical impact. The self-healing mechanisms of the P2P network upon which Skype’s software runs have worked well in the past … Unfortunately, this time, for the first time, Skype was unable to rise to the challenge and the reasons for this were exceptional. In this instance, the day’s Skype traffic patterns, combined with the large number of reboots, revealed a previously unseen fault in the P2P network resource allocation algorithm Skype used. Consequently, the P2P network’s self-healing function didn’t work quickly enough. Skype’s peer-to-peer core was not properly tuned to cope with the load and core size changes that occurred on August 16.
As our economy becomes ever more tightly and intricately networked, its continued operation will hinge on the assumptions that mathematicians and software engineers embed in the code that underpins it. Usually, the assumptions will hold. But usually isn’t always. Weird things happen, even in the largest of crowds.