← Back to Index

Energy and RMSE

The energy ([Wikipedia](https://en.wikipedia.org/wiki/Energy_(signal_processing%29); FMP, p. 66) of a signal corresponds to the total magnitude of the signal. For audio signals, this roughly corresponds to how loud the signal is. The energy of a signal is defined as

$$ \sum_n \left| x(n) \right|^2 $$

The root-mean-square energy (RMSE) of a signal with $N$ samples is defined as

$$ \sqrt{ \frac{1}{N} \sum_n \left| x(n) \right|^2 } $$
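As a quick check of the two definitions, here is a minimal sketch that evaluates them on a four-sample toy signal. The helper names `signal_energy` and `signal_rmse` and the array `x_toy` are made up for this illustration and are not part of librosa.

```python
import numpy

def signal_energy(x):
    """Total energy: the sum of squared magnitudes."""
    return numpy.sum(numpy.abs(x) ** 2)

def signal_rmse(x):
    """Root-mean-square energy: square root of the mean squared magnitude."""
    return numpy.sqrt(numpy.mean(numpy.abs(x) ** 2))

x_toy = numpy.array([3.0, -4.0, 0.0, 0.0])
print(signal_energy(x_toy))  # 9 + 16 = 25.0
print(signal_rmse(x_toy))    # sqrt(25 / 4) = 2.5
```

Unlike the energy, the RMSE does not grow with the length of the signal, which makes it easier to compare across frames or recordings of different lengths.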

Let's load a signal:

In [3]:

import librosa

x, sr = librosa.load('audio/simple_loop.wav')

In [4]:

sr

Out[4]:

22050

In [5]:

x.shape

Out[5]:

(49613,)

In [6]:

librosa.get_duration(y=x, sr=sr)

Out[6]:

2.2500226757369615

Listen to the signal:

Out[7]:

[Audio player: the loaded signal]

Plot the signal:

Out[8]:

<matplotlib.collections.PolyCollection at 0x10cd21cc0>

Compute the short-time energy using a list comprehension:

In [9]:

hop_length = 256
frame_length = 512

In [10]:

import numpy

energy = numpy.array([
    numpy.sum(numpy.abs(x[i:i+frame_length])**2)  # sum of squared magnitudes in one frame
    for i in range(0, len(x), hop_length)
])

In [11]:

energy.shape

Out[11]:

(194,)
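For longer signals, the list comprehension above can be replaced by a vectorized computation. The following is only a sketch: `short_time_energy` is a hypothetical helper, and it zero-pads the end of the signal so that the trailing, partially filled frames match the truncated slices used above.

```python
import numpy

def short_time_energy(x, frame_length, hop_length):
    """Energy of every frame starting at 0, hop_length, 2*hop_length, ..."""
    x = numpy.asarray(x, dtype=float)
    n_frames = -(-len(x) // hop_length)  # ceil(len(x) / hop_length)
    # Zero-pad so every frame has full length; zeros add nothing to the energy.
    padded = numpy.concatenate([x, numpy.zeros(frame_length)])
    # Row i of idx holds the sample indices that belong to frame i.
    idx = hop_length * numpy.arange(n_frames)[:, None] + numpy.arange(frame_length)[None, :]
    return numpy.sum(numpy.abs(padded[idx]) ** 2, axis=1)
```

Calling `short_time_energy(x, frame_length, hop_length)` on the signal loaded earlier should give the same 194 values as the list comprehension, up to floating-point rounding.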

Compute the RMSE using librosa.feature.rms (called librosa.feature.rmse in older librosa releases):

In [12]:

rmse = librosa.feature.rms(y=x, frame_length=frame_length, hop_length=hop_length, center=True)

In [13]:

rmse.shape

Out[13]:

(1, 194)

In [14]:

rmse = rmse[0]
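Both computations above produce 194 frames, but for different reasons. The list comprehension starts a frame at every multiple of the hop length, while a centered analysis pads the signal by half a frame on each side. The sketch below spells out the two frame counts; the function names are hypothetical, and the centered formula assumes an even `frame_length` and padding of `frame_length // 2` samples per side.

```python
def n_frames_uncentered(n_samples, hop_length):
    """Frames produced by range(0, n_samples, hop_length)."""
    return -(-n_samples // hop_length)  # ceil(n_samples / hop_length)

def n_frames_centered(n_samples, hop_length):
    """Frames produced when the signal is padded by frame_length // 2 per side."""
    return 1 + n_samples // hop_length

print(n_frames_uncentered(49613, 256))  # 194
print(n_frames_centered(49613, 256))    # 194
```

The two counts agree here only because 49613 is not a multiple of 256. For a signal of exactly 512 samples, the uncentered count is 2 and the centered count is 3, so the arrays should be checked for equal length before they are plotted against a shared time axis.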

Plot both the energy and RMSE along with the waveform:

In [15]:

frames = range(len(energy))
t = librosa.frames_to_time(frames, sr=sr, hop_length=hop_length)

Out[16]:

<matplotlib.legend.Legend at 0x10cd54cc0>
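The time axis used for the plot comes from a simple conversion: frame `i` starts `i * hop_length` samples into the signal, and dividing by the sampling rate gives seconds. A minimal sketch of that arithmetic, using the hypothetical name `frames_to_time_sketch` to avoid confusion with the librosa function:

```python
import numpy

def frames_to_time_sketch(frames, sr=22050, hop_length=512):
    """Convert frame indices to times in seconds: frame * hop_length / sr."""
    return numpy.asarray(frames) * hop_length / float(sr)

t_sketch = frames_to_time_sketch(range(194), sr=22050, hop_length=256)
print(t_sketch[1])   # 256 / 22050, roughly 0.0116 seconds per hop
print(t_sketch[-1])  # 193 * 256 / 22050, roughly 2.24 seconds
```

The last frame lands at about 2.24 seconds, just short of the 2.25-second duration reported earlier.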

Questions

Write a function, strip, that removes leading silence from a signal. Make sure it works for a variety of signals recorded in different environments and with different signal-to-noise ratios (SNR).

In [17]:

def strip(x, frame_length, hop_length, thresh=0.01):

    # Compute the RMSE of each frame.
    rmse = librosa.feature.rms(y=x, frame_length=frame_length, hop_length=hop_length, center=True)[0]

    # Identify the first frame index where RMSE reaches the threshold.
    # Stop at the last frame so that an all-silent signal does not raise an IndexError.
    frame_index = 0
    while frame_index < len(rmse) and rmse[frame_index] < thresh:
        frame_index += 1

    # Convert units of frames to samples.
    start_sample_index = librosa.frames_to_samples(frame_index, hop_length=hop_length)

    # Return the trimmed signal (empty if no frame reached the threshold).
    return x[start_sample_index:]

Let's see if it works.

In [18]:

y = strip(x, frame_length, hop_length)

Out[19]:

[Audio player: the trimmed signal]

Out[20]:

<matplotlib.collections.PolyCollection at 0x10ce20128>

It worked!
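A fixed threshold of 0.01 suits this recording, but it may fail on quieter recordings or on recordings with a higher noise floor. One possible way to address the question's requirement is to set the threshold relative to the loudest frame. The sketch below is one such variant, not the only answer: `frame_rmse`, `strip_relative`, and the 40 dB default for `top_db` are assumptions made for illustration, and the frames are uncentered for simplicity.

```python
import numpy

def frame_rmse(x, frame_length, hop_length):
    """Start index and RMSE of each full, uncentered frame (at least one frame)."""
    starts = numpy.arange(0, max(len(x) - frame_length, 0) + 1, hop_length)
    values = numpy.array([numpy.sqrt(numpy.mean(x[i:i + frame_length] ** 2)) for i in starts])
    return starts, values

def strip_relative(x, frame_length=512, hop_length=256, top_db=40.0):
    """Drop leading frames whose RMSE is more than top_db below the peak RMSE."""
    x = numpy.asarray(x, dtype=float)
    if len(x) == 0:
        return x
    starts, values = frame_rmse(x, frame_length, hop_length)
    peak = values.max()
    if peak == 0.0:
        return x[:0]  # all-zero signal: nothing but silence
    thresh = peak * 10.0 ** (-top_db / 20.0)
    first_loud = numpy.argmax(values >= thresh)  # index of the first frame at or above thresh
    return x[starts[first_loud]:]

# Synthetic test signal: 4096 samples of faint noise followed by a 440 Hz tone.
rng = numpy.random.default_rng(0)
noise = 1e-4 * rng.standard_normal(4096)
tone = 0.5 * numpy.sin(2 * numpy.pi * 440 * numpy.arange(4096) / 22050)
demo = numpy.concatenate([noise, tone])

print(len(strip_relative(demo)))         # trimmed length at full level
print(len(strip_relative(0.01 * demo)))  # same length when the recording is 100 times quieter
```

Because the threshold scales with the peak RMSE, the result does not depend on the overall recording level. It still assumes that the noise floor sits at least `top_db` below the loudest frame, so very noisy recordings need a smaller `top_db`. librosa also provides `librosa.effects.trim`, which trims both leading and trailing silence using a similar relative threshold.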

← Back to Index