193.174.19.232Abstract: C. Witt (2015)

(2015)

Clustering Recurrence Plots

C. Witt

Dynamic systems are ubiquitous: we find them in astronomy, weather, the financial market and in the human body. Two key observations for investigating such systems are that “(1) similar situations often evolve in a similar way; (2) some situations occur over and over again.” [MCRTK07]. Using these properties of complex systems, we can for instance predict (sometimes correctly) that it is going to rain from cloud patterns or weather conditions we experienced before. Similarly, given a time series of observations of a system’s parameters, states – or situations – of the system can be reconstructed. By analysing patterns of recurrence, insights about the system can be gained.

Perhaps the most simple way to find recurrent states is to compare each state to each other state. States are usually described as numerical vectors and can be compared e.g. using euclidean distance. This results in a similarity matrix, to which usually a threshold Е is applied. A similarity matrix element having a dissimilarity Е is called a recurrence point. The visualization of the matrix is called a recurrence plot, as shown in the top right and the bottom of Figure 1. Typical structures formed by the recurrence points are diagonal and vertical lines. The former indicate that the system returns to a previous state and evolves in a similar manner again, the latter occur when the system stagnates. Recurrence quantification analysis (RQA) complements visual inspection with numerical measures that describe properties of the plot. For instance, the average length of the diagonal lines is a measure of how predictable a system is.

In real-world applications, there is often more than one signal present in the measure- ments of a time series. Wavelet analysis, more specifically wavelet packet decomposition, is one way to separate these signals. The process is sketched in Figure 1: The original time series is recursively decomposed into low and high frequency components, revealing different processes in it. When the resulting signals are still complex, advanced analysis tools like recurrence analysis are needed. The combination of both has been successfully applied to e.g. detect a type of heart disease [CY12]. It also makes recurrence analysis of long time series feasible, because analysing many short time series is far less expensive than analysing a very long time series. What is more, by separating signals, recurrence analysis becomes more informative. For example, in Figure 1, the recurrence plot of the original time series conveys little information. The recurrence plots of the separated signals reflect the underlying processes much better.

The downside is that this multiscale recurrence analysis produces many recurrence plots. Hundreds or thousands of plots might easily be generated, making visual inspection infeasible. To explore the different kinds of processes in the data, the proposed thesis suggests the use of clustering.

We are not aware of any previous work on clustering recurrence plots. There are, however, three works that propose distance measures for recurrence plots, namely [Bel11] and [SDSIB13, SSB14]. In [Bel11], recurrence plots are represented as bit strings by concatenating their rows. The bzip2 compression rate of the concatenation of the bit strings of two plots is interpreted as their distance. A major drawback of serializing the plots is that the spatial neighbourhood of the recurrence points is compromised. Silva et al. [SDSIB13] use the mpeg-1 video compression algorithm to solve that problem. This compressor accounts for the spatial nature of the data, but in contrast to [Bel11], Silva et al. and Souza et al. do not work with thresholded recurrence plots. This is not desirable, since in recurrence analysis, states are usually either similar or dissimilar. It is not clear whether the techniques applied in [SDSIB13, SSB14] work well on thresholded plots.

By choosing an appropriate representation for a recurrence plot, other similarity measures can be used, as summarized in Table 1. The measures have been selected with regard to two assumptions. First, the representation should preserve the spatial nature of the data, considering the superior performance of [SDSIB13] over [Bel11]. Second, robust similarity measures either require alignment or feature extraction, to compensate distortions (like shifting a plot by a small offset) in the data.

Image representations are closest to the related work and offer a lot of similarity measures. On the other hand, when working with thresholded recurrence plots, the depth of the image is reduced to only two levels, rendering e.g. standard histogram techniques useless. In the domain of dynamic systems analysis, RQA measures are natural quantifications of the recurrence plots image content which also work on thresholded recurrence plots. Forming vectors of RQA measures would be a straightforward approach

to reduce the problem to a classical feature vector clustering problem. A measure that fits the highly discrete nature of the data is inter-rater agreement. Variants of this measure can be used to compare more than two matrices, which could be interesting during the clustering step. Graph representations have the advantage of exploiting not only spatial neighbourhood but a transitive closure of that relationship. In [DSD+11] it has been shown that recurrence plots can be represented as graphs and the according network measures, like the clustering coefficient, convey useful information. Instead of comparing features of the graph, the entire graph can be aligned. This is done in contact map overlap, which is a similarity measure for two thresholded similarity matrices that originates in structural biology. Since this problem is NP-hard, solutions must be found heuristically, e.g. using linear programming [CCI+04].

Finally, if an equivalence between a time series similarity measure and a recurrence plot similarity measure can be shown, then it would be possible to reduce the problem to a time series clustering problem. It is possible to reconstruct a time series from a recurrence plot, which could help in making the time series comparable. Dynamic time warping is a popular choice and has been used for clustering before. However, the problem differs from time series clustering in that recurrence plots explicitly reveal patterns of recurrent behaviour that are not apparent from the raw time series.

There is a vast amount of literature covering clustering and clustering evaluation. Due to the generality of the topic, it is not reviewed here. This thesis will focus on partitional clustering, since it is simple, popular, and has been applied to a wide variety of scenarios. Hierarchical clustering, probably being the other most popular technique, has the downside that comparison of dendrograms is a difficult problem. For comparing partitions, there exist several tried and tested indices, like the adjusted Rand index [HA85] or the more recent variation of information [Mei07].



The goal of this thesis is to evaluate four similarity measures for recurrence plots from Table 1. An applied and empirical approach is proposed, centred around a synthetic data set that captures combinations of prototypical properties of recurrence plots in the analysis of dynamical systems domain. Typical systems in this domain include the Lorenz System, the Rössler System, autoregressive processes or noise.

Additionally, the performance of the similarity measures on real-world data sets shall be evaluated. The Bern-Barcelona EEG database [ASR12] contains two classes of time series, those recorded during an epileptic seizure and those recorded during normal brain activity. Andrzejak et al. have developed statistical tests based on recurrence properties that are correlated with the two classes. The outcomes of these tests might be taken as an additional structure on the data.

The UCR time series data set [KZH+11] can be used as another data set with known

categories. It is used in the two works closest related to the proposed thesis, [SDSIB13, SSB14]. Although these works focus on time series classification, comparing to their results could be a contribution to the current research in this area.

In a first step the chosen similarity measures sshall be compared to each other, based on these data sets. The following questions should be answered:

- Which similarity measures yield similar results?

- How do the similarity measures relate to RQA measures?

- How do the similarity measures behave when comparing noise with non-noise plots?

- How expensive are the different similarity measures? Can they be lower bounded?

Once a basic understanding of the different similarity measures has been developed, the interplay with the clustering algorithm shall be evaluated.

- Is there a clustering tendency for a given similarity measure at all?

- Which similarity measures are able to reveal known clustering structures?

- Do different similarity measures lead to different clusterings?

- How valid are the clusterings, according to internal, external and relative criteria?

The overall objective is to outline needs and solutions for recurrence plot clustering scenarios. More specifically, how different similarity measures capture domain relevant properties of recurrence plots.

back


Creative Commons License © 2024 SOME RIGHTS RESERVED
The content of this web site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Germany License.

Please note: The abstracts of the bibliography database may underly other copyrights.

Ihr Browser versucht gerade eine Seite aus dem sogenannten Internet auszudrucken. Das Internet ist ein weltweites Netzwerk von Computern, das den Menschen ganz neue Mglichkeiten der Kommunikation bietet.

Da Politiker im Regelfall von neuen Dingen nichts verstehen, halten wir es fr notwendig, sie davor zu schtzen. Dies ist im beidseitigen Interesse, da unntige Angstzustnde bei Ihnen verhindert werden, ebenso wie es uns vor profilierungs- und machtschtigen Politikern schtzt.

Sollten Sie der Meinung sein, dass Sie diese Internetseite dennoch sehen sollten, so knnen Sie jederzeit durch normalen Gebrauch eines Internetbrowsers darauf zugreifen. Dazu sind aber minimale Computerkenntnisse erforderlich. Sollten Sie diese nicht haben, vergessen Sie einfach dieses Internet und lassen uns in Ruhe.

Die Umgehung dieser Ausdrucksperre ist nach 95a UrhG verboten.

Mehr Informationen unter www.politiker-stopp.de.