A *complex dynamical system* is a network of dynamic components in which neighboring components interact with each other. These interactions are often nonlinear, and from them emerges fascinating large-scale behavior that is not directly apparent from the local dynamics and interactions. For instance, complete knowledge of how individual neurons in the brain operate does not directly tell us how we perceive and think; likewise, a microscale description of the human genome says little about the macroscale effects of genes (e.g., phenotype). Hence the study of complex systems usually requires a holistic approach, based on the right choice of a mathematical model that captures the essential features of the system, as well as ingenious ideas for its rigorous analysis.

In a general mathematical framework, a complex system can be described by specifying three kinds of data: the way the components are connected to each other (*topology*), the states each component can assume (*state space*), and the way the components evolve their states in time (*coupling*). A fundamental question in the study of complex systems is the following: *how do these three constituents influence the emergent global dynamics?* My approach to this general question is the “combinatorialist’s strategy”: first study some concrete discrete models, learn fundamental principles and techniques from them, and then try to generalize these to a broader context.
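
As a minimal sketch of this three-part description, the Python snippet below stores the topology as an adjacency dictionary, the state space as integers, and the coupling as a local update function applied synchronously. The names `step` and `spread`, and the toy "spread the maximum" rule, are illustrative assumptions and not one of the models discussed in this statement.

```python
from typing import Callable, Dict, Hashable, List

Node = Hashable
Topology = Dict[Node, List[Node]]            # who is connected to whom
Configuration = Dict[Node, int]              # one state per component
Coupling = Callable[[int, List[int]], int]   # local update rule


def step(topology: Topology, config: Configuration, coupling: Coupling) -> Configuration:
    """Apply the coupling to every node at once (synchronous update)."""
    return {
        v: coupling(config[v], [config[u] for u in topology[v]])
        for v in topology
    }


def spread(own: int, neighbors: List[int]) -> int:
    """Toy coupling (an assumption for illustration): adopt the largest nearby state."""
    return max([own] + neighbors)


# A path on three nodes, with a single node initially in state 1.
path: Topology = {0: [1], 1: [0, 2], 2: [1]}
config: Configuration = {0: 1, 1: 0, 2: 0}

after_one = step(path, config, spread)
after_two = step(path, after_one, spread)
```

Changing any one of the three ingredients (the graph, the set of allowed states, or the update function) while holding the other two fixed is one way to probe how each constituent shapes the global dynamics.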

In a series of three papers, I followed this approach to study the synchronization of pulse-coupled oscillators. In the first two papers, I proposed and studied the -color firefly cellular automata (FCA), a discrete model for pulse-coupled oscillators, and showed that an arbitrary initial configuration synchronizes on finite trees iff . In the third paper, I generalized this discrete model to a continuous one and adapted a similar technique to show an analogous result. An auxiliary dynamics was added to overcome the degree constraint; composing the result with a distributed algorithm that constructs a spanning tree of a given network led to a fast, universal clock synchronization algorithm with minimal memory and communication requirements.

On the infinite one-dimensional lattice, randomly initialized discrete models often lead to interesting connections with familiar random objects, such as random walks and random Young diagrams. The key insight I have learned is that, on this restricted topology, information propagates as a simple but possibly correlated particle system, which is the bridge relating the long-term dynamics to a random object associated with the initial environment. For example, the synchrony of -color FCA on this lattice boils down to studying the persistence of random walks with dependent increments. The box-ball system, also known as the soliton cellular automaton and introduced by Takahashi and Satsuma, relates to birth-death chains, Galton-Watson forests, and excursions of Brownian motion.
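
For readers unfamiliar with the box-ball system, the sketch below implements one time step using the standard "carrier" description of the Takahashi-Satsuma rule, which is not spelled out in this statement: a carrier sweeps from left to right, picks up every ball it meets, and drops one ball into each empty box whenever it is carrying any. The function name `box_ball_step` and the finite-list representation (with boxes appended on the right as needed) are assumptions made for illustration.

```python
from typing import List


def box_ball_step(boxes: List[int]) -> List[int]:
    """One step of the box-ball system on a finite 0/1 configuration.

    1 marks a box holding a ball, 0 an empty box. All boxes to the right
    of the list are treated as empty, so the output may be longer.
    """
    carrier = 0          # number of balls currently in the carrier
    out: List[int] = []
    for b in boxes:
        if b == 1:       # pick up the ball, leaving the box empty
            carrier += 1
            out.append(0)
        elif carrier > 0:  # empty box and loaded carrier: drop one ball
            carrier -= 1
            out.append(1)
        else:            # empty box, empty carrier: nothing happens
            out.append(0)
    # Unload any remaining balls into the empty boxes to the right.
    out.extend([1] * carrier)
    return out


start = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
once = box_ball_step(start)
twice = box_ball_step(once)
```

In this example a block of three balls moves three boxes per step and a single ball moves one box per step, so the longer block catches up with the shorter one; the number of balls is conserved throughout.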

For some nice discrete models, such as the 3-color cyclic cellular automaton (CCA) and the Greenberg-Hastings model (GHM), lifting the dynamics on a given graph onto its universal covering space reveals a hidden monotonicity, which resembles Lamport’s consensus algorithm in distributed systems. This led to a complete characterization of the limiting behavior of these models on arbitrary topologies in terms of contour integrals of discrete vector fields induced by the initial environment. On infinite trees, this connects the behavior of the system to certain notions of speed of tree-indexed random walks. Another nice model is the parking process, for which a recursion on the first moment and the mass-transport principle yielded a sharp phase transition with respect to the density of cars.
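
The two 3-color models named above have short local rules, sketched here in their usual textbook form on an arbitrary graph given as an adjacency dictionary. In the GHM sketch, state 0 is "rest", 1 is "excited", and 2 is "refractory"; in the CCA sketch, a node advances to the next color modulo 3 only if some neighbor already has that color. The function names and the graph representation are assumptions for illustration.

```python
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]
Coloring = Dict[Hashable, int]


def ghm_step(adj: Graph, x: Coloring) -> Coloring:
    """3-color Greenberg-Hastings model, synchronous update."""
    new: Coloring = {}
    for v, nbrs in adj.items():
        if x[v] == 0:
            # A resting node becomes excited iff some neighbor is excited.
            new[v] = 1 if any(x[u] == 1 for u in nbrs) else 0
        else:
            # Excited -> refractory -> rest, regardless of neighbors.
            new[v] = (x[v] + 1) % 3
    return new


def cca_step(adj: Graph, x: Coloring) -> Coloring:
    """3-color cyclic cellular automaton, synchronous update."""
    new: Coloring = {}
    for v, nbrs in adj.items():
        nxt = (x[v] + 1) % 3
        # Advance only if some neighbor already shows the next color.
        new[v] = nxt if any(x[u] == nxt for u in nbrs) else x[v]
    return new


# A single excitation travelling along a path of four nodes (GHM).
path: Graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ghm_1 = ghm_step(path, {0: 1, 1: 0, 2: 0, 3: 0})
ghm_2 = ghm_step(path, ghm_1)

# Three distinct colors on a triangle keep rotating forever (CCA).
triangle: Graph = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
cca_1 = cca_step(triangle, {0: 0, 1: 1, 2: 2})
```

On the path the excitation eventually exits at the far end and the system comes to rest, whereas on the triangle the colors cycle indefinitely, a small instance of how topology affects the limiting behavior.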

Many interesting questions concerning the above-mentioned models remain out of reach of current techniques: extending the clock synchronization algorithm to non-identical frequencies, the autowave phenomenon of higher-color FCA on lattices, the mysterious clustering behavior of the 4-color FCA on , the nucleation behavior of higher-color CCA and GHM on infinite trees, higher-dimensional analogues of the soliton cellular automaton, and fixation of the parking process with coalescing or branching cars, to name just a few.

Counting discrete objects inside a continuous object is also useful in network and data analysis, especially for clustering algorithms. In an ongoing project, we are developing a novel framework for analyzing the hierarchical structure of large complex networks. The essential idea is that, for a given network (e.g., a node- and edge-weighted graph), we construct its filtration with respect to a resolution parameter and then apply combinatorial or probabilistic functors to construct profiles of the network, which encode its structural information. Our approach streamlines and generalizes the method of persistent homology in topological data analysis; it has shown promising results and led us to new insights. This project will be accompanied by applications to real-world data from social networks, epidemiology, and food webs, for instance.
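
To make the idea of a filtration-based profile concrete, here is a deliberately simple sketch: an edge-weighted graph is filtered by keeping only edges of weight at most a threshold, and the profile records the number of connected components at each threshold. This particular profile, and the names `component_profile` and `find`, are assumptions chosen for illustration; they stand in for, and are much cruder than, the functors used in the project.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (endpoint, endpoint, weight)


def find(parent: List[int], a: int) -> int:
    """Union-find root lookup with path halving."""
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a


def component_profile(n: int, edges: Sequence[Edge],
                      thresholds: Sequence[float]) -> List[int]:
    """Number of connected components of the subgraph on nodes 0..n-1
    that keeps only edges with weight <= t, for each threshold t."""
    profile: List[int] = []
    for t in thresholds:
        parent = list(range(n))
        count = n
        for u, v, w in edges:
            if w <= t:
                ru, rv = find(parent, u), find(parent, v)
                if ru != rv:       # the edge merges two components
                    parent[ru] = rv
                    count -= 1
        profile.append(count)
    return profile


edges: List[Edge] = [(0, 1, 0.1), (1, 2, 0.5), (2, 3, 0.9)]
profile = component_profile(4, edges, [0.0, 0.1, 0.5, 1.0])
```

The resulting sequence is non-increasing in the threshold, and the thresholds at which it drops indicate the scales at which clusters merge.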

My strong desire to understand the structure of complex systems stems from, and converges to, understanding the brain. The cortex is a massively parallel complex system that is able to learn and predict *temporal* patterns. In a future research project, I will develop a cellular-automaton-based model of the cortex. The main ideas are inspired by Jeff Hawkins and Numenta’s Hierarchical Temporal Memory. The cortex is represented as a stack of ‘layers’, where each layer is a rectangular slab of ‘cells’. The bottom layer is exposed to a stream of input data, and each higher layer receives its input stream from the layer directly below. In each layer, each cell is connected to the cells within a certain radius. A proper update rule for the cells should be chosen so that (1) only about 2% of all columns are active at any given time, forming a sparse distributed representation of the input stream; and (2) each cell guesses the set of its neighbors that will be active at the next time step, this prediction is compared with the neighbors that actually become active under the stream, and the error converges to zero. Each layer learns from the layer below in the same way, so the top layer represents the most abstract, high-level model of the input stream.
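
Requirement (1) can be illustrated with a k-winners-take-all selection, a common way of enforcing a fixed sparsity level. The sketch below is only an assumption about how such a constraint might be imposed, not the update rule of the proposed model; the function name `sparse_activation` and the notion of a per-unit input score are hypothetical.

```python
from typing import List, Sequence


def sparse_activation(scores: Sequence[float], sparsity: float = 0.02) -> List[int]:
    """Return a 0/1 vector in which only the top-scoring units are active.

    `scores` is a hypothetical per-unit match with the current input;
    `sparsity` is the target fraction of active units (about 2% here).
    """
    k = max(1, round(len(scores) * sparsity))   # number of winners
    # Indices of the k largest scores (ties broken by position).
    winners = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    active = [0] * len(scores)
    for i in winners:
        active[i] = 1
    return active


# With 100 units and 2% sparsity, exactly two units are active.
active = sparse_activation(list(range(100)))
```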

A possible way to design such a local update rule, as well as the various parameters, is to use a genetic algorithm. Namely, the performance of each local rule may be evaluated by measuring how quickly and accurately the system learns a set of test functions of periodic data. A successful algorithm could be turned into a novel architecture for parallel chip design, which is scalable thanks to the cellular automaton construction. Such a chip could be used to continually learn and build models from sequential data, so it would be useful for industrial applications including anomaly detection, speech recognition, and market prediction. Lastly, this project could give us a better understanding of how we perceive, learn, and think. For instance, what would happen if the top layer could feed back into the bottom layer? Isn’t that how we make analogies from analogies and abstractions from abstractions?
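
As a rough sketch of the genetic-algorithm idea, the snippet below evolves fixed-length bit strings under a user-supplied fitness function, using truncation selection, one-point crossover, and bit-flip mutation. Everything here is an assumption for illustration: the function name `evolve`, the encoding of a local rule as a bit string, and all parameter values. In the proposed project the fitness would measure how quickly and accurately the system learns periodic test data; the usage example substitutes a trivial placeholder (the number of 1 bits).

```python
import random
from typing import Callable, List

Genome = List[int]  # hypothetical bit-string encoding of a local update rule


def evolve(fitness: Callable[[Genome], float], genome_len: int,
           pop_size: int = 20, generations: int = 30,
           mutation_rate: float = 0.05, seed: int = 0) -> Genome:
    """Return the fittest genome found; requires pop_size >= 4, genome_len >= 2."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]          # survivors, kept unchanged
        children: List[Genome] = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)       # two distinct parents
            cut = rng.randrange(1, genome_len)
            child = a[:cut] + b[cut:]         # one-point crossover
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]        # bit-flip mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)


# Placeholder fitness: count of 1 bits (not a learning benchmark).
best = evolve(sum, genome_len=16)
```

Because the better half of each generation survives unchanged, the best fitness never decreases from one generation to the next.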