## Permutation Entropy

Permutation entropy is a measure of complexity developed for application to time series data. I wasn’t familiar with this measure until Yoichi Ichikawa (peeq / nankotsuteacher, one of the Tokinogake crew) mentioned it on Twitter, noting that it could be applied to acoustics and therefore might be useful for making music. The tweet linked to a 2017 article written in Japanese, which I can’t read, so I found a copy of the article that first proposed the measure: Bandt & Pompe (2002) Permutation Entropy: A Natural Complexity Measure for Time Series.

I said I’d see if I could implement permutation entropy in Mathematica. It was a struggle to work it out from the original article, but this blog lays out the steps involved, and using that as reference I managed to write some code that does the job.

Here’s how the permutation entropy (PE) measure works. PE is an average of the frequencies of order patterns in a one-dimensional series of data, based on comparisons of neighbouring values. This is done by breaking down the data series into smaller chunks, working out the ordinal rank of data in each chunk, identifying which of the possible order permutations it matches, then summing the numbers of permutation patterns. The size of chunks, specified in terms of the number of data points they contain, is called the embedding dimension, (n). The distance between the starting point of each chunk, also specified in terms of the number of data points, is called the embedding time delay (d).

In Mathematica I used the same example provided by Bandt & Pompe and the blog that explains it, with parameters n = 3, d = 1, and the same list of data: (4, 7, 9, 10, 6, 11, 3). That way, I could check to see if the code was working. Below, input code is represented in bold text and output in plain text.

First assign the list:

``list = {4, 7, 9, 10, 6, 11, 3}``

Break the list into chunks of size n = 3 with delay d = 1. Because d < n, the chunks overlap:

``````With[{n = 3, d = 1}, Partition[list, n, d, {1, n}]

{{4, 7, 9}, {7, 9, 10}, {9, 10, 6}, {10, 6, 11}, {6, 11, 3}}``````

Calculate the ordinal rank of numbers in each chunk (the example in the blog counts from 0, whereas Mathematica ordering counts from 1, but it doesn’t make any difference as long as you identify the permutation patterns consistently). In Mathematica, the ‘Ordering’ function calculates the position of sorted list in list, but what we want is to calculate the position of list in sorted list. This can be achieved by applying the Ordering function twice (I am grateful to Yoichi Ichikawa (peeq / nankotsuteacher) for spotting the error in this calculation, which allowed me to correct it):

``````With[{n = 3, d = 1}, Map[Ordering[Ordering[#]]&, Partition[list, n, d, {1, n}]]

{{1, 2, 3}, {1, 2, 3}, {2, 3, 1}, {2, 1, 3}, {2, 3, 1}}``````

With n = 3, there are the 6 possible order permutation patterns:

``````Permutations[Range]

{{1, 2, 3}, {1, 3, 2}, {2, 1, 3}, {2, 3, 1}, {3, 1, 2}, {3, 2, 1}}``````

Count the occurrence of each permutation pattern:

``````With[{n = 3, d = 1},
Map[Count[
Map[Ordering[Ordering[#]]&, Partition[list, n, d, {1, n}]], #]&,
Permutations[Range[n]]]]

{2, 0, 1, 2, 0, 0}``````

Then divide these numbers by the number of chunks, which equals the number of elements in the list minus d × (n – 1) = 5:

``````With[{n = 3, d = 1},
Map[Count[Map[Ordering[Ordering[#]]&, Partition[list, n, 1, {1, n}]], #]&,
Permutations[Range[n]]]/(Length[list] - (n - 1) d)]

{2/5, 0, 1/5, 2/5, 0, 0}``````

Finally, PE is the sum of the logarithm (base 2) of those numbers (the probabilities of the distribution of permutation patterns), calculated with the following equation (where p(π) = the probability of each pattern):

H(n) = –Σ p(π) log p(π)

``````With[{n = 3, d = 1},
Total[Map[# *Log[2, (#)]&,
DeleteCases[
Map[Count[
Map[Ordering[Ordering[#]]&,
Partition[list, n, d, {1, n}]], #]&,
Permutations[Range[n]]]/(Length[list] - (n - 1) d),
0]]*-1]] // N

1.52193``````

That equation gives a value between 0 (for a uniformly increasing or decreasing list) and log2n! (for random lists), which for this example = 2.58. The measure can also be normalised to give a value between 0 and 1, by multiplying the result by 1/log2(n!):

``````With[{n = 3, d = 1},
Total[Map[# *Log[2, (#)]&,
DeleteCases[
Map[Count[
Map[Ordering[Ordering[#]]&,
Partition[list, n, d, {1, n}]], #]&,
Permutations[Range[n]]]/(Length[list] - (n - 1) d),
0]]*-1] * (1/Log[2, n!])] // N

0.588762``````

From this code, a function can be defined whose input is a list and the values for n and d:

``````permutationEntropyNormalised[list_, n_,
d_] := (1/Log[2, n!])*
Total[Map[# *Log[2, (#)]&,
DeleteCases[
Map[Count[
Map[Ordering[Ordering[#]] - 1&,
Partition[list, n, d, {1, n}]], #]&,
Permutations[Range[n] - 1]]/(Length[list] - (n - 1) d),
0]]*-1] // N``````

Which allows us to shorten the code to a simpler expression:

``````permutationEntropyNormalised[list, 3, 1]

0.588762``````

The result of this measure is a single number. Reducing a bunch of data down to a number may be useful for measurement, but less so for generative musical purposes. For actual time series data, which is much longer than the example above, PE can be calculated on a sliding window across the dataset, with the result being a series comprising a PE measure for each window position, which aligns with the original data (actually, it’s always a bit shorter – it starts from a point in the data equal to half the window size, and it ends the same distance before the last value). This is more useful for creative purposes. For example, a recording of an instrument could be analysed, and the PE measure output could be used to control video graphics in time with the original. Creating a sliding window of data can be achieved the same way that the data is split into chunks (but the window size must be greater than the chunk size n). This function, which incorporates the previous function for normalised PE, calculates PE using a sliding window:

``````peWindowAnalysis[list_, n_, d_, window_, delay_] :=
Map[permutationEntropyNormalised[#, n, d]&,
Partition[list, window, delay, {1, window}]]``````

I tried this with some short audio files made of test tones. The first was a 0.5 second 20 Hz sine wave at -6dBFS, whose waveform looks like this:

Using a sliding window size of 128 samples, this is the PE output:

The highest PE values (≈ 0.4) appear to coincide with the waveform peaks, where the audio data is more varied in terms of permutation patterns, in contrast with the up or down slopes of the waveform which comprise fully ordered data. Maybe this kind of data is not really the sort of thing that PE was designed to measure, but it’s useful to do these tests to understand how it works.

Then I tried a 0.5 second sine wave linear sweep from 20Hz to 20kHz, which produced the following:

This is quite interesting. The PE value goes up to 1 at around 15kHz, with a sharp drop down to 0.6 near the highest point. It’s surprising that this simple sound produces PE values up to the maximum, which should apply only to the most disordered data. It’s also odd how the PE measure varies with the frequency of the sine wave, increasing and then decreasing. I don’t know why.

Here’s the result for a logarithmic sweep (increasing slowly at the start, faster towards the end):

Compared with the linear sweep, the result here is the same but the left side of the chart is stretched and the right side is squeezed. The left part, where the frequency starts at 20Hz and slowly increases, looks similar to the first result with the steady 20Hz tone. But that regularity begins to break down above a certain frequency, then PE increases as the frequency increases.

## Conclusion

There’s probably a more elegant/efficient way to do program this algorithm, but I lack the mathematical and programming skills to achieve that. This one works well enough, though. Computer memory is the main limiting factor, due to the large number of permutations when n increases. My PC can cope with n values up to 9 before running out of memory. Bandt & Pompe recommend a value from 3 to 7. But different n values produce different results – in general, as n increases, PE decreases. For example, here’s the PE measures for a list of 100 random numbers between 0 and 10 when n ranges from 3 to 8:

``````With[{list = RandomInteger[10, 100]},
Table[permutationEntropyNormalised[list, n, 1], {n, 3, 8}]]

{0.979155, 0.94358, 0.845143, 0.675794, 0.531198, 0.426012}``````

In this example n = 3 gives the highest value and therefore the most accurate result, but intuitively it seems that larger n values should give better results. I’ll run more tests to see the effect of different n values and different window sizes, and I’ll report the results here if there’s anything interesting. It will also be interesting to see how it fares with longer and more complex audio, including music.

PE gives higher values for more random data, just as Shannon’s measure of information entropy does. Both differ from our intuitive sense of what ‘complexity’ means because they actually measure disorder (but that’s a whole separate subject, covered in my PhD thesis). There is one way in which PE and Shannon entropy differ greatly, however: A list of ascending numbers – such as the range of integers from 1 to 100 – has a minimal PE value of 0, but also has a maximum Shannon entropy (because the data cannot be compressed). In contrast, a list of identical numbers – such as ‘1’ repeated 100 times – has a minimal value for both PE and Shannon entropy. Nevertheless, this measure of complexity “has been applied to optical experiments, brain data, river flow data, control of rotating machines, and other problems” (Bandt, 2016). So it might also be interesting for applying to audio analysis, and maybe potentially useful for music creation.

This entry was posted in analysis, Audio, Complexity and tagged , , , , , . Bookmark the permalink.

### 2 Responses to Permutation Entropy

1. Udo Matthias says:

love it!!

This site uses Akismet to reduce spam. Learn how your comment data is processed.