Approximate Entropy and Sample Entropy - A Beginner's Guide
Audience: Readers with foundational knowledge of mathematics. Technicalities will be kept to a minimum.
Purpose: To provide a basic intuition for approximate and sample entropy through a breakdown of their respective calculations.
Say you have two signals:
Signal 1:

and Signal 2:
What is the difference between these two signals?
Some traditional metrics for distinguishing between signals would tell you that the signals are the same. For example, the mean of both signals is approximately 0. Likewise, the variance, which quantifies the spread of values in a dataset around the mean, is approximately ½ for both signals.
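If you want to see this for yourself, here is a small sketch, assuming (since the plots above aren't reproduced here) that the first signal is a sine wave and the second is random noise scaled to the same variance:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 8 * np.pi, 1000)

signal_1 = np.sin(t)                          # regular, periodic signal (like Signal 1)
signal_2 = rng.normal(0, np.sqrt(0.5), 1000)  # random noise with the same variance (like Signal 2)

for name, sig in [("Signal 1", signal_1), ("Signal 2", signal_2)]:
    print(name, "mean:", round(sig.mean(), 3), "variance:", round(sig.var(), 3))
# Both means come out near 0 and both variances near 0.5,
# even though the two signals look nothing alike.
```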
But obviously the two signals are not the same. So how do we quantify the very obvious difference between these two signals? Well, that’s exactly the topic of today’s article – buckle up for a truly whimsical and interesting mathematical journey!
__________________________________________________________________________________________________
Visually, we can tell the signals are different. We just need to find a way to quantify it.
First of all, what if I asked you to tell me how the two signals differ?
Probably, you would tell me that the first signal is more regular than the second, which looks a little more chaotic. You could reasonably predict the next data point of the first signal, while for the second you might be a little more uncertain.
You have, in essence, described entropy as defined in information theory. Information and uncertainty are linked concepts: an increase in information about a variable results in a decrease in uncertainty. Entropy in information theory, first defined in Claude Shannon’s seminal 1948 paper, is the amount of information, or uncertainty, associated with a variable. It has wide-ranging applications, from computer science to financial risk management, and entropy measures are especially popular in physiological signal processing, since physiological data such as ECG and EEG usually exhibit nonlinear behaviour. Note that higher entropy means more chaos and unpredictability, while lower entropy means more regularity and repeatability.
There are many versions of entropy calculations, but today we will focus on two of the most widely used: Approximate Entropy (ApEn) and a modified version of ApEn called Sample Entropy (SampEn). If those terms seem daunting, don’t worry, we’ll break them down step by step. We’ll start by explaining Approximate Entropy, then mention its limitations, and then talk about how Sample Entropy, a new-and-improved version of Approximate Entropy, addresses them.
To understand Approximate Entropy, we have to understand that a repeatable signal contains some part (which we will call the template) that repeats again and again throughout the signal (any sinusoidal signal, for example, repeats itself exactly every 2π radians). Going back to the first signal, you can see that the same initial pattern is repeated again and again and again, while this is much less true of the second signal.
That is the idea behind Approximate Entropy: find a template (a part of the signal), and see how often that template repeats in the signal.
Take a look at the signal below. For simplicity’s sake, let’s say the template (in orange) is 2 points long. This variable, the template length, is called m.
In essence, we want to compare this template of 2 points with every other possible template of 2 points, and see how many times, and how closely, the template repeats itself. How do we do this?
The Sliding Window Approach
What we do is slide this template across the signal and compare it with all other sets of two points; a small code sketch of this follows the notes below. In practice, this is done in a table, but since we’re trying to build an understanding of the concept, we’ll stick to the graph. This idea is closely related to time-delay embedding, where the template length m is the ‘embedding dimension’, introduced by Floris Takens in 1981 (link to a more mathematically rigorous explanation for those who are interested). This is visualized by the sliding box in the image below.
Two things to note:
Firstly, there is a box around the first two points. This means we are comparing the first template with itself. This is important; remember it for later.
Secondly, we can see that, for the data we have, the template repeats once with another potential template, indicated by the red box.
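To make the sliding window concrete, here is a rough sketch of how all the candidate templates could be generated; the `embed` helper and the toy signal are illustrative, not taken from the article:

```python
import numpy as np

def embed(signal, m):
    """Return every window (potential template) of m consecutive points."""
    return np.array([signal[i:i + m] for i in range(len(signal) - m + 1)])

signal = np.array([3, 5, 4, 3, 5, 4, 3, 6])
print(embed(signal, m=2))
# Each row is one candidate template of length m = 2 that the sliding box passes over.
```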
Now, someone might note that the template and the match aren’t exactly the same! In Approximate Entropy, we match the template to other potential templates that are arbitrarily similar, where ‘arbitrarily similar’ means within some tolerance factor, usually called r, defined as a fraction of the standard deviation of the signal (usually 0.2 times the standard deviation).
Next, we conduct a distance calculation. Since we have the position of each set of m (here, 2) points (sample table shown below), we can find the distance between them using the Euclidean distance formula in two dimensions, which measures the straight-line distance between two points.
Note that we could also use something called the Chebyshev distance, which takes the maximum absolute difference between the corresponding points. In this example, for the sake of simplicity, we’ll stick with the Euclidean distance.
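As a small illustration (the two templates below are made up), the two distance measures can be computed like this:

```python
import numpy as np

template_a = np.array([3.0, 5.0])   # made-up template of m = 2 points
template_b = np.array([4.0, 3.0])   # made-up candidate match

# Euclidean distance: straight-line distance between the two templates
euclidean = np.sqrt(np.sum((template_a - template_b) ** 2))

# Chebyshev distance: largest absolute difference between corresponding points
chebyshev = np.max(np.abs(template_a - template_b))

print(euclidean, chebyshev)   # ~2.236 and 2.0
```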
We can then create a table with each set’s distance from the other. This is known as a distance matrix. See below:
Note that since the matrix is symmetric about the diagonal of 0s, we only need to worry about the upper (or lower) triangle of the matrix.
From this, we can reshape the matrix into a vector (row-by-row flattening). This means that we store the values from the distance matrix above in a 1-dimensional array (a vector), reading the rows from top to bottom. Computationally, it is far easier to work with a vector than with a matrix.
The above matrix will be reshaped into:
[23,11,22,13,12,10,7,25,17,18,10,26,4,14,16]
Now we have a vector of distances between each template of m points and every other potential template.
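As a sketch of this step, here is how a distance matrix for six made-up templates could be built and flattened into such a vector; with six templates, the upper triangle gives exactly 15 distances, like the example vector above:

```python
import numpy as np
from itertools import combinations

# Six made-up templates of m = 2 points each
templates = np.array([[3, 5], [5, 4], [4, 3], [3, 5], [5, 4], [4, 6]], dtype=float)

n = len(templates)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        # Euclidean distance between template i and template j
        dist[i, j] = np.sqrt(np.sum((templates[i] - templates[j]) ** 2))

# Keep only the upper triangle (above the diagonal of zeros), read row by row
flattened = [dist[i, j] for i, j in combinations(range(n), 2)]
print(np.round(flattened, 2))   # 15 distances in total
```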
You might think we stop here, but we now have to repeat the process all over again for a template of size m + 1 (3, in our case).
Why? Think back to our perfectly repeatable signal, sin(x). Would it matter if the template size were any specific value? No, because the signal would be perfectly repeatable for any template length. We therefore test how robust the signal’s repeatability is by conducting a similar sliding-window test for another template size, m + 1.
Once we have repeated the process and have the vector for template size m + 1, we have two 1-dimensional distance vectors.
Remember the tolerance variable, r, that determines if the matches are arbitrarily similar? We now compare each distance in each vector with it.
This is a logical comparison, meaning there can only be two results, within tolerance or not; the amount by which the distance is above or below r does not matter. Since we are conducting a logical comparison, the only two outputs are 1 or 0. If the distance between two templates is ≤ r, the comparison returns 1 (true); if the distance is > r, the comparison returns 0 (false).
The vector of comparisons for template length m + 1 will be called A, while the vector for template length m will be called B.
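As a toy illustration of this comparison step (the distance values and the standard deviation below are made up):

```python
import numpy as np

r = 0.2 * 1.0   # tolerance: 0.2 times the standard deviation (assumed to be 1.0 here)

dist_m  = np.array([0.10, 0.35, 0.05, 0.50])   # made-up distances for templates of length m
dist_m1 = np.array([0.15, 0.60, 0.25, 0.55])   # made-up distances for templates of length m + 1

B = (dist_m <= r).astype(int)    # 1 where a length-m comparison is within tolerance
A = (dist_m1 <= r).astype(int)   # 1 where a length-(m + 1) comparison is within tolerance

print("A =", A)   # [1 0 0 0]
print("B =", B)   # [1 0 1 0]
```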
We can then use the formula:

ApEn = Σ over all i of −log( (1 + Ai) / (1 + Bi) )

The subscript i indexes the terms of the A and B vectors: for each pair Ai and Bi, compute −log((1 + Ai) / (1 + Bi)), and then add up the results over all values of i.
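Putting the steps together, here is a minimal, illustrative sketch of the whole procedure. It follows this article’s recipe rather than a textbook implementation: the match counts include the self-match (which plays the role of the “+1” above), the Euclidean distance is kept, and the terms are averaged rather than simply summed so the result doesn’t grow with the signal length:

```python
import numpy as np

def apen_sketch(signal, m=2, r_factor=0.2):
    """Rough Approximate Entropy following the recipe described in this article."""
    signal = np.asarray(signal, dtype=float)
    r = r_factor * signal.std()   # tolerance r as a fraction of the standard deviation

    def match_counts(length):
        # Every template of the given length
        templates = np.array([signal[i:i + length]
                              for i in range(len(signal) - length + 1)])
        counts = []
        for t in templates:
            # Euclidean distance from this template to every template (self-match included)
            dists = np.sqrt(np.sum((templates - t) ** 2, axis=1))
            counts.append(np.sum(dists <= r))
        return np.array(counts)

    B = match_counts(m)        # matches per template for length m
    A = match_counts(m + 1)    # matches per template for length m + 1
    k = len(A)                 # compare the same number of templates for both lengths
    return np.mean(-np.log(A[:k] / B[:k]))

print(apen_sketch(np.sin(np.linspace(0, 8 * np.pi, 200))))        # low: regular signal
print(apen_sketch(np.random.default_rng(0).normal(size=200)))     # higher: chaotic signal
```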
Congratulations! You have learnt to compute Approximate Entropy!
A Significant Flaw
Now, you might remember that earlier, when we were talking about window sliding, we compared the template to itself, which seems an odd thing to do. After all, the self-match will always be 100%, so what’s the point?
Consider the case of an extremely chaotic signal, with a template having no arbitrarily similar matches. All the logical comparisons would output 0, so both Ai and Bi would be 0, and each term −log((1 + Ai) / (1 + Bi)) would become −log(1/1) = 0. That is an entropy of 0, which describes a perfectly predictable signal! But our signal had no matches, so it is obviously not perfectly predictable. The algorithm must be modified.
With self-matching, you always have at least one match (the template matching itself), which avoids this problem.
Additionally, with a short time series (a small dataset), fewer genuine matches within the tolerance r are found, so the self-matches make up a larger share of the counts. This means that with a small dataset, regardless of whether the data is actually regular, the entropy is biased towards 0 compared with a larger dataset: the self-matches inflate the apparent similarity between data points, leading to a smaller entropy value and making the system seem more regular than it really is.
This is one of the fundamental flaws in Approximate Entropy: it is data length dependent; that is, you need a larger dataset to use it appropriately.
Well, how do we solve it?
Because of this, Sample Entropy (SampEn) was proposed as a modification. The formula for SampEn is:

SampEn = −log( ΣAi / ΣBi )

where the sums run over all i, and self-matches are not counted.
The formula for SampEn uses the same parameters A and B as ApEn. The key difference lies in how these parameters are handled: in SampEn, the values of A and B are summed before the logarithm is applied, whereas in ApEn, the logarithm is applied to each term separately.
This correction is crucial in overcoming the 0-bias of ApEn. By excluding self-matches and making the probability calculation less dependent on data length, SampEn avoids the cumulative effect inherent in ApEn: applying the logarithm term by term, as ApEn does, lets systematic errors accumulate into a bias. SampEn sidesteps this by first summing Ai and Bi and only then applying the logarithm, avoiding the cumulative bias and allowing for a more consistent estimate of entropy. This modification lets SampEn reflect the true complexity of the signal rather than the size of the dataset, making it a measure of entropy that is more robust to changes in data length.
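For comparison, here is a minimal, illustrative SampEn sketch under the same assumptions as the ApEn sketch above (Euclidean distance, made-up helper names): self-matches are excluded, and the counts are summed before the single logarithm is applied:

```python
import numpy as np

def sampen_sketch(signal, m=2, r_factor=0.2):
    """Rough Sample Entropy: sum the match counts first, then take a single logarithm."""
    signal = np.asarray(signal, dtype=float)
    r = r_factor * signal.std()

    def total_matches(length):
        templates = np.array([signal[i:i + length]
                              for i in range(len(signal) - length + 1)])
        total = 0
        for t in templates:
            dists = np.sqrt(np.sum((templates - t) ** 2, axis=1))   # Euclidean distances
            total += np.sum(dists <= r) - 1   # subtract 1 to exclude the self-match
        return total

    B = total_matches(m)        # summed matches for template length m
    A = total_matches(m + 1)    # summed matches for template length m + 1
    return -np.log(A / B)       # one logarithm, applied to the summed counts

print(sampen_sketch(np.sin(np.linspace(0, 8 * np.pi, 200))))      # low: regular signal
print(sampen_sketch(np.random.default_rng(0).normal(size=200)))   # higher: chaotic signal
```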
And that’s it for today! Congratulations, you just learned the basics of ApEn and SampEn, two of the most important and widely used nonlinear statistics in physiological signal processing!
If you have any comments, clarifications to note, mistakes to point out, or questions, feel free to contact me or drop a comment below!
Additional Reading
If you’re interested in diving deeper into Approximate Entropy, Sample Entropy, and their applications, here are some resources to explore:
A more thorough explanation on computing Sample and Approximate Entropy
An extremely good book on the various nonlinear methods in time series processing
Hello! I hope you’re having a great day. Good luck 🙂