Eight years ago, I had finished my first year of graduate school in math, and I was at a loss as to what to research. My original focus, differential geometry, was a beautiful subject to learn about, but the open research questions were too abstract and technical to sustain my interest. I wanted something more relevant to the real world, something I could talk to people about.
Looking for new ideas, I took a course in complex systems, run by the New England Complex Systems Institute. The director, Yaneer Bar-Yam, had pioneered a new way of representing structure in a systems. I was fascinated by this idea but also puzzled. As a mathematician, I wanted to understand the basis of this idea. What assumptions does it rely on? How are its basic concepts defined?
My attempt to answer these questions turned into one of the longest and most demanding projects I’ve worked. After an eight-year collaboration Yaneer and my friend Blake Stacey, we finally have a preliminary manuscript up on the web. It is currently under review for publication. And to my pleasant surprise, we got a nice write-up in ScienceNews.
So what is this project all about? The idea is that we're using information theory (which I've written about previously) as a tool to represent and quantify the structure of a system.
Before I explain what any of this means, let's consider some motivating examples. Here's a system (call it system A):
You wouldn't really call this a complex system. It has only one component (a ball) that bounces around in a fairly simple way. Since there's not much to see here, let's turn to system B:
This system has many particles, which bounce around and bump into each other. In one sense, this system is quite complex: it is very difficult to describe or predict its exact state at any given time. But looking beyond the level of individual particles reveals a kind of simplicity: since the particles behave independently of each other, overall measures such as the average particle velocity or the rate of collisions are relatively stable. In other words, the individual complexity "averages out", so that on the whole, the system behaves quite simply.
Contrast that to the behavior of system C:
This is a murmuration of starlings. The starlings fly in a semi-coordinated, semi-independent way, creating intricate shapes and patterns that you would never observe in systems A and B. This is a prototypical "complex system"—the kind that has intrigued researchers since the 70's.
It is intuitively clear that systems A, B, and C have entirely different kinds of structure. But it is surprisingly difficult to capture this intuition mathematically. What is the essential mathematical property of system C that can allow us to differentiate it from A and B?
We try to answer this question using information theory. Information theory was first invented by mathematician Claude Shannon in 1948 to address problems of long-distance communication (e.g. by telegraph) when some signals may be lost along the way. Shannon's ideas are still used, for example, in the development of cell phone networks. But they also have found applications in physics, computer science, statistics, and complex systems.
To explain the concept of information, let's look at a system consisting of a single blinking light:
This is one of the simplest systems you could possibly imagine. In fact, we can quantify this simplicity. To describe the state of the system at any given time, you only have to answer one yes/no question: "Is the light on?"
The amount of information conveyed in one yes/no question is called one bit. "Bit" is short for "binary digit", and is the same unit used to quantify computer memory. In other words, the state of this light can be described in one binary digit, 0 for OFF and 1 for ON.
Now let's add another light:
Let's say these lights are statistically independent. This means that knowing the state of one doesn't tell you anything about the other. In this case, to identify the state of the system requires two bits of information—that is, two yes/no questions, one for the first light and one for the second. We can depict this situation with a diagram like this:
The circles are drawn separately, since information describing one of them tells us nothing about what the other is doing. We could say that each of these bits applies at "scale one", since each describes only a single light bulb.
Here are two lights that behave in a completely different fashion:
Note that the two light bulbs are always either both on or both off. Thus, even though there are two components, the system can still be described by a single bit of information—a single yes/no question. The answer to this question (e.g. "are they on?") applies to both bulbs at once. The "information diagram" for this system looks like two completely overlapping circles:
We could say that the one bit of information describing this system applies at "scale two", since it describes two light bulbs at once.
A more interesting case occurs between these two extremes:
It's hard to see it, but I've animated these bulbs to be in the same state 3/4 of the time, and the opposite state 1/4 of the time. If I told you the state of the first bulb, you wouldn't completely know the state of the second, but you could make an educated guess. Specifically, if I told you the first bulb is ON, you could guess that the second is ON too, and you'd be right 75% of the time. So there is information overlap: Information about the first bulb gives partial information about the second. In fact, we can use Shannon's formulas to actually calculate how much overlap there is: approximately 0.19 bits. So if you know the state of the first bulb (1 bit), then you also know 0.19 bits about the second bulb—not enough to know its state with certainty, but enough to make a guess that is 75% accurate. The overlapping information can be depicted like this:
As you can see, 0.19 bits of information apply to both light bulbs at once (scale two), while the remaining 0.81+0.81=1.62 bits apply only to a single bulb (scale one).
In principle, these "information diagrams" (we call them dependency diagrams) exist for any system. Highly ordered systems, like system A above, have lots of overlapping, large-scale information. Highly disordered systems like B have mostly small-scale, non-overlapping information. The systems that are most interesting to complex-systems researchers, like the starlings in example C, have lots of partial overlaps, with information distributed over a wide range of scales.
And that's the basic premise of our theory of structure. The structure of a system is captured in the overlaps of information describing different components, and the way information is distributed across scales. While we take these concepts quite a bit further in our paper, the central idea is right here in these blinking lights.
Thanks for reading!
Looking for new ideas, I took a course in complex systems, run by the New England Complex Systems Institute. The director, Yaneer Bar-Yam, had pioneered a new way of representing structure in a systems. I was fascinated by this idea but also puzzled. As a mathematician, I wanted to understand the basis of this idea. What assumptions does it rely on? How are its basic concepts defined?
My attempt to answer these questions turned into one of the longest and most demanding projects I’ve worked. After an eight-year collaboration Yaneer and my friend Blake Stacey, we finally have a preliminary manuscript up on the web. It is currently under review for publication. And to my pleasant surprise, we got a nice write-up in ScienceNews.
So what is this project all about? The idea is that we're using information theory (which I've written about previously) as a tool to represent and quantify the structure of a system.
Before I explain what any of this means, let's consider some motivating examples. Here's a system (call it system A):
You wouldn't really call this a complex system. It has only one component (a ball) that bounces around in a fairly simple way. Since there's not much to see here, let's turn to system B:
Source: Wikimedia Commons |
Contrast that to the behavior of system C:
Source: A Bird Ballet by Niels Castillon |
It is intuitively clear that systems A, B, and C have entirely different kinds of structure. But it is surprisingly difficult to capture this intuition mathematically. What is the essential mathematical property of system C that can allow us to differentiate it from A and B?
We try to answer this question using information theory. Information theory was first invented by mathematician Claude Shannon in 1948 to address problems of long-distance communication (e.g. by telegraph) when some signals may be lost along the way. Shannon's ideas are still used, for example, in the development of cell phone networks. But they also have found applications in physics, computer science, statistics, and complex systems.
To explain the concept of information, let's look at a system consisting of a single blinking light:
This is one of the simplest systems you could possibly imagine. In fact, we can quantify this simplicity. To describe the state of the system at any given time, you only have to answer one yes/no question: "Is the light on?"
The amount of information conveyed in one yes/no question is called one bit. "Bit" is short for "binary digit", and is the same unit used to quantify computer memory. In other words, the state of this light can be described in one binary digit, 0 for OFF and 1 for ON.
Now let's add another light:
Let's say these lights are statistically independent. This means that knowing the state of one doesn't tell you anything about the other. In this case, to identify the state of the system requires two bits of information—that is, two yes/no questions, one for the first light and one for the second. We can depict this situation with a diagram like this:
The circles are drawn separately, since information describing one of them tells us nothing about what the other is doing. We could say that each of these bits applies at "scale one", since each describes only a single light bulb.
Here are two lights that behave in a completely different fashion:
Note that the two light bulbs are always either both on or both off. Thus, even though there are two components, the system can still be described by a single bit of information—a single yes/no question. The answer to this question (e.g. "are they on?") applies to both bulbs at once. The "information diagram" for this system looks like two completely overlapping circles:
We could say that the one bit of information describing this system applies at "scale two", since it describes two light bulbs at once.
A more interesting case occurs between these two extremes:
It's hard to see it, but I've animated these bulbs to be in the same state 3/4 of the time, and the opposite state 1/4 of the time. If I told you the state of the first bulb, you wouldn't completely know the state of the second, but you could make an educated guess. Specifically, if I told you the first bulb is ON, you could guess that the second is ON too, and you'd be right 75% of the time. So there is information overlap: Information about the first bulb gives partial information about the second. In fact, we can use Shannon's formulas to actually calculate how much overlap there is: approximately 0.19 bits. So if you know the state of the first bulb (1 bit), then you also know 0.19 bits about the second bulb—not enough to know its state with certainty, but enough to make a guess that is 75% accurate. The overlapping information can be depicted like this:
As you can see, 0.19 bits of information apply to both light bulbs at once (scale two), while the remaining 0.81+0.81=1.62 bits apply only to a single bulb (scale one).
In principle, these "information diagrams" (we call them dependency diagrams) exist for any system. Highly ordered systems, like system A above, have lots of overlapping, large-scale information. Highly disordered systems like B have mostly small-scale, non-overlapping information. The systems that are most interesting to complex-systems researchers, like the starlings in example C, have lots of partial overlaps, with information distributed over a wide range of scales.
And that's the basic premise of our theory of structure. The structure of a system is captured in the overlaps of information describing different components, and the way information is distributed across scales. While we take these concepts quite a bit further in our paper, the central idea is right here in these blinking lights.
Thanks for reading!
In the geosciences many people use the autocorrelation function to describe the temporal structure of a time series. This is derived by computed the linear correlation for pairs of data points separated by certain temporal difference, for a large number of temporal differences.
ReplyDeleteTheoretically it is possible that the scatterplot of the pairs shows no linear correlation, but that there is a relationship. For example all points could lie on a circle. With your dependency measure (entropy) you could see the difference between an uncorrelated cloud of points and point lying, e.g. on a circle.
Do you know of anyone using an "auto-entropy function" to describe the temporal structure of a time series? Do you know of any applications where such a description of the temporal behaviour of a time series might be useful?
That's an excellent point, and I've never thought about it before! I think it could indeed be useful do define "entropic warning signals". I know that systems sometimes cycle as they approach a tipping point. Entropy measures could detect the difference between this cycling (which could mean trouble) and random noise (meaning everything is normal).
DeleteFantastic idea! As to whether it's been done before, I don't know offhand.
Nice intro into the Information Theory. Too bad it took that much space. I would very much love to read more about how all this related to the ideas presented in the newest paper "An Information-Theoretic Formalism for Multiscale Structure in Complex Systems" and in your project to characterize structure of complex systems in general.
ReplyDeleteThanks! What we do, beyond traditional information theory, is that we take the "scale" idea seriously. In our framework, all information applies at a particular scale. Large-scale information tells you a lot about the system, whereas small-scale information applies only to a isolated parts.
DeleteIn any system, there is a tradeoff between the total entropy (degree of freedom) and the scale at which it can act. For example, if ants want to move a large object, they need to behave in a highly coordinated fashion, reducing their collective degree of freedom. In return, they are able to perform a large-scale action.
Our paper gives a mathematical formalism for quantifying this tradeoff.
Great intro. Raises a few questions in my mind.
ReplyDeleteComplex system outcomes that conform to normal statistical distributions lend themselves to "information compression", i.e. one can describe the system using mean and variance. However, most complex systems either do not output stable distributions, or their distributions do not conform to normal. In that case, their "compressibility" is lower/Shannon entropy higher. So the challenge with complex systems is that self-organization implies some lower entropy/higher order, but it is tough to translate this into a lower Shanon entropy. To do so requires discovering a shorthand way to describe the more-or-less stable higher-order dynamic at work. The lightbulb example is one in which the system outputs a nice statistical relationship. I just wonder what happens when there is true novelty produced, one that is also subject to non-normal behavior like feedback loops and/or phase shifts? I suppose it will be in your paper, but anticipating that it may be quite technical, perhaps you could explain how the overlap concept works at higher levels of complexity?
Good question! I think the key to answering it is to think of information applying at different scales (see my reply to observingideas above).
DeleteOf systems A, B, and C in my post, system B has the highest entropy. To describe it at the smallest scale requires a maximal amount of information. But if you want to look at its large-scale behavior, then the central limit theorem applies and you can describe very simply it using summary statistics like mean and variance (as you say).
The starlings (example C) have reduced information at the smallest scale. This reduction occurs because information describing their individual movements is partly redundant: knowing the motions of one tells you something about what its neighbors are doing. But it is this very redundancy that enables their large-scale behavior, which does not "average out" when we look above the level of the individual.
In our formalism, we would say that system C has less total information that system B, but the information in system C applies at a larger scale.
The "true novelty" question is difficult for our formalism to answer, since it is purely descriptive. We address the "what" quantitatively, but the "why" and "how" must be left to future work.
I agree -- looking at the scale that is producing order (i.e. driving self-organization) helps to convert the system to a lower Shannon entropy. I am looking at doing this informally in finance, management and economics. My approach is to tag recurrent dynamics produced by complex systems and thereby compress information about the system. A good example is identifying a positive feedback loop and its associated resource constraint. Just doing this would have helped to predict the 2008 crisis (the feedback loop was in credit/housing, and the resource constraint was the population that qualified for a mortgage at the lower bound of mortgage underwriting standards). This alone would be an improvement over macroprudential analysis that tends to look only at mean and variance as system descriptors.
ReplyDeleteWhat I would like to see is a merging of informal and formal approaches, and an explicit "fine-tuning" of the trade-off between rigor and relevance to decision making. Analysis of simple systems (mean/variance) have just such a "mix", and complex systems will get there eventually.
Very interesting paper. The concepts suggest applications in computer science, and in particular to the area of software complexity metrics, an area that has long needed an objective, solid foundation.
ReplyDeleteCurrently, most software complexity metrics produce a single, scalar value. What we need is something like your complexity profile, which would give a series of metrics, one per scale. For a well-structured software system (with low coupling and high cohesion), the metric should give a result similar to your metric on a system of largely independent components. For a poorly-structured software system, the complexity profile should be relatively flat.
One problem I encounter in applying your idea is that software really only has low-level couplings: Software components don't refer to other components in their entirety. Rather, lots of individual lines of code in one reference individual lines of code in others. But if I account for dependencies at the lowest scale, then there is nothing to count at higher scales. Yet intuitively, software modules and sub-systems do depend on other ones.
Are you aware of anyone employing your ideas in this area? If so, how have they adapted it to quantifying software complexity?
Thank you very much.