A *complex dynamical system* is a network of dynamic components in which neighboring ones interact with each other. The interaction between components is often nonlinear, from which fascinating large-scale behavior emerges that are not apparent directly from the local dynamics and interactions. For instance, complete knowledge of how individual neurons in brain operate does not directly tell us how we perceive and think; the microscale description of human genome means less when it comes to the gene’s macroscale effect (e.g., phenotype). Hence the study of complex systems usually requires a holistic approach based on the right choice of mathematical model which captures the essential feature of the system, as well as ingenious ideas for its rigorous analysis.

In a general mathematical framework, a complex system can be described by specifying three different kinds of data structure: the way the components are connected to each other (*topology*), states each component assume (*state space*), and the way the components evolve their states in time (*coupling*). A fundamental question in the study of complex systems is the following: *how do these three constituents influence the emergent global dynamics?* My approach to this general question is the “combinatorialist’s strategy”: We first study some concrete discrete models, learn some fundamental principles and techniques, and then try to generalize them into a broader context.

In a series of three papers, I followed this line of approach to study “synchronization of pulse-coupled oscillators”. In the first two papers, I proposed and studied the -color firefly cellular automata (FCA), a discrete model for pulse-coupled oscillators, and showed that arbitrary initial configuration synchronizes on finite trees iff . In the third paper, I generalized this discrete model into a continuous one and adapted a similar technique to show an analogous result. An auxiliary dynamics was added to surpass degree constraint, and by composing with a distributed algorithm which constructs a spanning tree of a given network, this led to a fast universal clock synchronization algorithm with minimal memory and communication requirement.

Figure 1: Two examples of synchronizing 6-color FCA are shown in (a) and (b). In (c) and (d), the last configurations are rotation symmetric to the initial ones, so the networks do not synchronize.

Figure 2: Phase Response Curve (PRC) for the adaptive 4-coupling. Oscillators have auxiliary state variable *s *in {0,1,2} and adaptively changes their PRC accordingly.

One infinite one-dimensional lattice, randomly initialized discrete models often lead to some interesting connections with familiar random objects, such as random walks and random Young diagrams. The key insight I have learned is that on this restricted topology, information propagation is conducted as a simple but possibly correlated particle system, which is the bridge to related the long-term dynamics with a random object associated to the initial environment. For example, the synchrony of -color FCA on boils down to studying persistence of random walks with dependent increments. The box-ball system, also known as soliton cellular automata invented by Takahashi and Satsuma, relates to birth-death chains, Galton-Watson forests, and excursions of Brownian motion.

Figure 3: Simulation of 20-color (top) and 3-color (bottom) FCA on a path of 400 nodes for and iterations, respectively. The top rows are random -color initial colorings drawn from the uniform product measure, and iterations generates each successive row, from top to bottom. The color scales are shown to the right with and being the blinking colors in each case.

Figure 4: Construction of a Young diagram via hill flattening procedure from the Motzkin path associated with a box-ball configuration *X*. The bottom row is the box-ball configuration *X* and the black path is . Trapezoidal regions of label *i* are the hills of , each of which becomes a distinct cell in the *i* th row of . The resulting Young diagram is depicted in the upper left corner.

For some nice discrete models such as the 3-color cyclic cellular automata (CCA) and Greenberg-Hastings model (GHM), lifting the dynamics on a given graph onto its universal covering space reveals some hidden monotonicity, which resembles Lamport’s concensus algorithm in distributed systems. This led to a complete characterization of limiting behavior of these models on arbitrary topology in terms of contour integrals of discrete vecter field induced from the initial environment. On infinite trees, this connects the behavior of the system to some notions of speeds of tree-indexed random walks. Another nice model is called the parking process, for which recursion on the first moment and mass-transport principle yielded a sharp phase transition with respect to the density of cars.

Figure 5: (Top row) Snapshots of 3-color CCA on a uniform spanning tree of a 100 by 100 torus, each 100 iterations from left to right. (Second row) Dynamics after 12 random edges are added to the spanning tree. Orange =0, green=1, and yellow=2. Corresponding simulations for 3-color GHM are shown in the third and fourth rows. Dark blue=0, yellow=1, and red=2.

Still a lot of interesting questions concerning the above mentioned models are out of reach of current technologies: extending the clock synchronization algorithm for non-identical frequencies, autowave phenomenon of higher color FCA on lattices, mysterious clustering behavior of the 4-color FCA on , nucleation behavior of higher color CCA and GHM on infinite trees, higher dimensional analogue of soliton cellular automata, and fixation of parking process with coalescing or branching cars, just to name a few.

Figure 6: Clustering of the 4-color FCA on square lattice. Sites take one of the four colors from {0,1,2,3}. Colors 0 and 1 automatically advances to 1 and 2, respectively. Colors 2 and 3 advances to 3 and 0 if no neighbor has color 1, and otherwise they don’t update.

Figure 7: Two snapshots of 9-color CCA dynamics on a uniform spanning tree of a torus, at times about 3,000 and 17,000. By the former time, almost the entire square fixates, except for a single droplet that continues to grow slowly until it takes over the available space.

Counting discrete objects inside a continuous object is also useful in network and data analysis, especially in terms of clustering algorithms. In an ongoing project, we are developing a novel framework for analyzing hierarchical structures of large complex networks. An essential idea is that for a given network (e.g., a node and edge weighted graph), we construct its filtration with respect to a resolution parameter and then apply combinatorial or probabilistic functors to construct profiles of the network, which encode its structural information. Our approach streamlines and generalizes the method of persistent homology in topological data analysis, and has shown promising results and led us to new insights. This project will be accompanied with applications to real world data coming from social networks, epidemiology, and food web, for instance.

My strong desire of understanding structures of complex systems stems from and converges to understanding the brain. The cortex is a massively parallel complex system which is able to learn and predict *temporal* patterns. In a future research project, I will develop a cellular automata based model for the cortex. Main ideas were inspired by Jeff Hawkins and Numenta’s Hierarchical Temporal Memory. The cortex is represented as a stack of `layers’, where each layer is a rectangular slab of `cells’. The bottom layer is exposed to a stream of input data, and higher layers get input stream from layers right below. In each layer, each cell is connected to cells within certain radius. A proper update rule of the cells should be chosen so that (1) only about 2% of all columns are active at any given time, forming a sparse distributed representation of the input stream; and (2) each cell guesses the set of its active neighbors at next time iteration, and this prediction is compared with the actual active neighbors from the stream and the error must converge to zero. Each layer learns from the layer below in the same way, so the top layer represents most abstract and high-level modeling of the input stream.

A possible way to design such local update rule as well as various parameters is to adapt genetic algorithm. Namely, performance of each local rule may be evaluated by measuring how fast and correct the system learns a set of test functions of periodic data. A successful algorithm can easily be turned into a novel architecture for parallel chip design, which is scalable due to the cellular automaton construction. Such chip could be used to constantly learn and build models from sequential data, so it would be useful for industrial applications including anomaly detection, speech recognition, and market prediction. Lastly, this project could give us a better understanding of how we perceive, learn, and think. For instance, what would happen if the top layer could feed back on the bottom layer? Isn’t that how we make analogies from analogies and abstract from abstraction?