Monday, June 7, 2010

More on Pareto Distributions

In 1896, Vilfredo Pareto studied how income and wealth were distributed in different countries and at different times. He found that, across the population of a country, both income and wealth follow a particular pattern - one described by the function log(N) = log(A) + m log(x) [where N is the number of people with wealth greater than x, and A and m are constants; m is negative, because fewer people hold wealth above a higher threshold]. After fitting the data to this function, Pareto found that about 20% of the people controlled roughly 80% of the wealth in a country - and that this relationship held across the different times and places he studied. This finding was the birth of the "80/20 Rule."
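
To make the function concrete, here is a minimal sketch of fitting it to data. It is an ordinary least-squares line fit in log-log space, not necessarily the procedure Pareto used, and the numbers are made up for illustration: they are generated from N = A * x^m with A = 1e10 and m = -1.5, so the fit should simply recover those two constants. The helper name `fit_log_log` is my own.

```python
import math

def fit_log_log(xs, ns):
    """Least-squares fit of log(N) = log(A) + m*log(x). Returns (A, m)."""
    lx = [math.log(x) for x in xs]
    ln = [math.log(n) for n in ns]
    mean_x = sum(lx) / len(lx)
    mean_n = sum(ln) / len(ln)
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_n) for a, b in zip(lx, ln))
    m = sxy / sxx                  # slope of the line in log-log space
    log_a = mean_n - m * mean_x    # intercept of the line
    return math.exp(log_a), m

# Made-up illustration data (not Pareto's): number of people with wealth > x
wealth_levels = [1_000, 10_000, 100_000, 1_000_000]
counts = [1e10 * x ** -1.5 for x in wealth_levels]

A, m = fit_log_log(wealth_levels, counts)
print(f"A = {A:.3g}, m = {m:.2f}")
```

On real data the points would not fall exactly on a line, and the slope m would measure how quickly the number of people thins out as the wealth threshold rises.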

Pareto’s equations didn’t stop there, though. They didn’t just predict the data at the 80/20 point; they predicted an entire probability distribution. If you follow the logic of the Pareto Principle and apply it again to the top group each time:
  • the bottom 80% of the population controls only 20% of a country's wealth = “the poor”
  • the top 20% of the population controls 80% of the wealth = “the wealthy”
  • the top 4% of the population (20% of 20%) controls 64% (80% of 80%) of the total wealth = “the very wealthy”
  • the top 0.8% of the population (20% of 4%) controls about 51% (80% of 64%) of the total wealth = “the super wealthy”
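
The arithmetic in the list above is just the 80/20 split applied repeatedly to the top group. A small sketch (the function name `pareto_levels` is my own) reproduces the numbers:

```python
def pareto_levels(top_share=0.2, wealth_share=0.8, depth=3):
    """Apply the 80/20 split repeatedly to the top group.

    Returns a list of (population fraction, wealth fraction) pairs.
    """
    return [(top_share ** k, wealth_share ** k) for k in range(1, depth + 1)]

for population, wealth in pareto_levels():
    print(f"top {population:.1%} of the population holds {wealth:.1%} of the wealth")

# top 20.0% of the population holds 80.0% of the wealth
# top 4.0% of the population holds 64.0% of the wealth
# top 0.8% of the population holds 51.2% of the wealth
```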

This type of distribution isn't limited to income and wealth. We see Pareto and power law distributions in the financial markets, geology, physics, politics, traffic patterns, the internet and biology - just to name a few. For example, you might see the Pareto distribution in the occurrence of forest fires in California. If the same 80/20 pattern applied here, you might find that:
  • the smallest 80% of the fires in California cause only 20% of the damage - most of these fires would not be reported or even noticed.
  • the largest 20% of the fires cause 80% of the damage
  • the top 4% of the fires cause 64% of the damage
  • the top 0.8% of the fires cause 51% of the damage - these are the ones that make the news.
As a disclaimer: I used the 80/20 rule here just to illustrate the basic idea of the Pareto distribution. In reality, "the Pareto distribution" is an infinite family of distributions, one for each choice of the constants in the function given above - but you can think of them as all having roughly the same shape as the 80/20 rule, with the split being more or less lopsided.
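
To see how the 80/20 rule is only one member of that family, here is a rough sketch. It uses the standard result that, for a Pareto distribution with index alpha greater than 1, the top fraction p of the population holds a share p^(1 - 1/alpha) of the total. The symbol alpha is not used in the text above; I am assuming it plays the role of -m in the function given earlier. The 80/20 split corresponds to alpha = log(5)/log(4), roughly 1.16.

```python
import math

def top_share(p, alpha):
    """Share of the total held by the top fraction p of the population,
    for a Pareto distribution with index alpha > 1."""
    return p ** (1 - 1 / alpha)

# The index that produces an exact 80/20 split
alpha_80_20 = math.log(5) / math.log(4)

for alpha in (alpha_80_20, 1.5, 2.0, 3.0):
    print(f"alpha = {alpha:.2f}: top 20% holds {top_share(0.2, alpha):.0%} of the total")
```

Larger values of alpha give a less lopsided split; values closer to 1 give an even more extreme one.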

One thing is clear - this new type of distribution is not normal. By that I mean it isn't the “normal” distribution or bell curve you might have studied in college. A "normal" distribution tends to be well behaved in a statistical sense. It describes environments where wild events happen very rarely, and when they do, they aren't so wild that they mess up the averages. Pareto distributions (and power law distributions), on the other hand, can describe environments where wild events are also rare, but when they do happen they can be very, very wild - far wilder than would be conceivable under a normal distribution. Out of the thousands of earthquakes that occur every year, most are not even felt, but then one occurs that destroys a city. Or after months and months of "normal" financial markets, we get a massive and rapid selloff. These events would be typical of power law distributions.
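
A rough numerical sketch of "far wilder": compare how quickly the chance of an extreme value shrinks under each distribution. The two scales are not directly comparable (standard deviations above the mean for the normal curve, multiples of the minimum value for the Pareto), so treat this only as an illustration of the rate of decay. The index alpha = log(5)/log(4) is the one matching an 80/20 split and is my own choice for the example.

```python
import math

def normal_tail(k):
    """P(Z > k) for a standard normal variable (k standard deviations above the mean)."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def pareto_tail(ratio, alpha):
    """P(X > ratio * x_min) for a Pareto variable with index alpha and minimum x_min."""
    return ratio ** -alpha

alpha = math.log(5) / math.log(4)  # assumed index, matches an 80/20 split

for k in (3, 5, 10):
    print(f"k = {k:>2}: normal {normal_tail(k):.2e}   Pareto {pareto_tail(k, alpha):.2e}")
```

The normal tail collapses toward zero almost immediately, while the Pareto tail shrinks only as a power of k, which is why very large events stay plausible.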

What causes these distributions to show up? We don’t know yet. But one theory is that they all come from environments that can be structured as networks. I believe that the Pareto/power law distribution is in fact a signature or fingerprint of a networked environment. Along the same line of thinking, if you are working in an environment that operates as a network, you should expect this type of distribution to show up. If this theory is true, it would explain why we see the 80/20 rule so often in business settings. Business environments are all about networks - supply chain networks, computer networks, networks of traders, all sorts of networks, but most importantly social networks.
