istftLayer

Inverse short-time Fourier transform layer

Since R2024a

    Description

    An ISTFT layer computes the inverse short-time Fourier transform of the input. Use of this layer requires Deep Learning Toolbox™.

    Creation

    Description

    layer = istftLayer creates an inverse short-time Fourier transform (ISTFT) layer. The input to istftLayer must be a real-valued dlarray (Deep Learning Toolbox) object in "CBT" or "SCBT" format.

    • For "CBT" inputs, the size of the channel ("C") dimension must be even and divisible by floor(FFTLength/2)+1.

    • For "SCBT" inputs, the size of the spatial ("S") dimension must equal floor(FFTLength/2)+1.

    The output of istftLayer is a real-valued array in "CBT" format.
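
    The size constraints above can be checked with a little arithmetic. The following is a minimal sketch, assuming a hypothetical FFT length of 128 and a three-channel signal whose STFT stores real and imaginary parts along the channel dimension. The variable names are illustrative and are not part of the istftLayer interface.

```matlab
% Illustrative values (assumptions chosen for this sketch)
fftLength = 128;                     % number of DFT points
numSignalChannels = 3;               % channels in the time-domain signal

nfreq = floor(fftLength/2) + 1;      % one-sided frequency bins: 65

% "SCBT" input: frequency bins along "S", real and imaginary parts along "C"
szS = nfreq;                         % must equal floor(FFTLength/2)+1
szC_SCBT = 2*numSignalChannels;      % 6

% "CBT" input: bins and real/imaginary parts are flattened into "C"
szC_CBT = 2*nfreq*numSignalChannels; % 390

% The "CBT" channel size must be even and divisible by nfreq
isValidCBT = mod(szC_CBT,2) == 0 && mod(szC_CBT,nfreq) == 0;
```

    With these values, a "CBT" input needs 390 channels and an "SCBT" input needs 65 spatial elements and 6 channels.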

    For more information, see Layer Input and Output Formats.

    layer = istftLayer(Name=Value) creates an ISTFT layer with properties specified by one or more name-value arguments. You can specify the analysis window and the number of overlapped samples, among others.

    Properties

    ISTFT

    Window

    This property is read-only.

    Windowing function used to compute the ISTFT, specified as a vector with two or more elements. For perfect reconstruction, use the same window as in stftLayer.

    For a list of available windows, see Windows.

    You can set this property when you create an istftLayer object. After you create an istftLayer object, this property is read-only.

    Note

    istftLayer initializes the weights internally so that Window is in single precision. Initializing the weights directly is not recommended.

    Example: hann(N+1) and (1-cos(2*pi*(0:N)'/N))/2 both specify a Hann window of length N + 1.

    Data Types: double | single

    OverlapLength

    This property is read-only.

    Number of overlapped samples, specified as a nonnegative integer smaller than the window length. If you omit OverlapLength or specify it as empty, the layer sets it to the largest integer less than or equal to 75% of the window length, which is 96 samples for the default Hann window of length 128. Equivalently, the stride between adjoining segments is 32 samples.
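
    The default values quoted above follow from simple arithmetic. This sketch assumes that the default overlap is computed as floor(0.75*windowLength), which reproduces the 96-sample overlap and 32-sample stride stated for the default window. The variable names are illustrative.

```matlab
% Default overlap and stride for a window of length 128
windowLength  = 128;
overlapLength = floor(0.75*windowLength);   % 96 samples
hopSize       = windowLength - overlapLength;   % 32-sample stride
```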

    You can set this property when you create an istftLayer object. After you create an istftLayer object, this property is read-only.

    Data Types: double | single

    FFTLength

    This property is read-only.

    Number of discrete Fourier transform (DFT) points, specified as a positive integer greater than or equal to the window length. To achieve perfect time-domain reconstruction, set the number of DFT points to match that used in stftLayer.

    You can set this property when you create an istftLayer object. After you create an istftLayer object, this property is read-only.

    Data Types: double | single

    Method

    This property is read-only.

    Method of overlap-add, specified as one of these:

    • "wola" — Weighted overlap-add

    • "ola" — Overlap-add

    You can set this property when you create an istftLayer object. After you create an istftLayer object, this property is read-only.
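
    To illustrate the difference between the two methods, the following is a minimal textbook-style sketch of overlap-add and weighted overlap-add synthesis in base MATLAB. It is not the internal implementation of istftLayer. The window length, hop size, and test signal are hypothetical, and the segments are windowed directly in the time domain in place of an actual forward and inverse FFT.

```matlab
% Generic OLA and WOLA synthesis from windowed segments.
% All sizes are hypothetical; this is not the internal code of istftLayer.
nwin = 8;                                   % window length
hop  = 2;                                   % hop size (window length minus overlap)
win  = (1 - cos(2*pi*(0:nwin-1)'/nwin))/2;  % periodic Hann window

x    = sin(2*pi*0.05*(0:39)');              % short test signal
nseg = floor((numel(x) - nwin)/hop) + 1;    % number of complete segments

yOla  = zeros(size(x));  normOla  = zeros(size(x));
yWola = zeros(size(x));  normWola = zeros(size(x));

for k = 1:nseg
    idx = (k-1)*hop + (1:nwin)';
    seg = x(idx).*win;                      % analysis-windowed segment

    % OLA: add the segments, normalize by the summed window
    yOla(idx)    = yOla(idx) + seg;
    normOla(idx) = normOla(idx) + win;

    % WOLA: window again before adding, normalize by the summed squared window
    yWola(idx)    = yWola(idx) + seg.*win;
    normWola(idx) = normWola(idx) + win.^2;
end

% Compare on interior samples, where the normalization is nonzero
interior = (nwin:(nseg-1)*hop + 1)';
errOla  = max(abs(yOla(interior)./normOla(interior)   - x(interior)));
errWola = max(abs(yWola(interior)./normWola(interior) - x(interior)));
```

    Both errOla and errWola are at the level of floating-point rounding on the interior samples. The edge samples are excluded because the periodic Hann window is zero at its first sample, which makes the normalization vanish there.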

    ExpectedOutputSize

    This property is read-only.

    Expected number of channels and samples output by istftLayer, specified as a two-element vector of positive integers. The first element is the expected number of channels, and the second element is the expected number of time samples.

    By default, istftLayer does not check the output size of the ISTFT. If you specify ExpectedOutputSize, istftLayer errors if the inverse short-time Fourier transform for the given inputs does not match ExpectedOutputSize in the number of channels and samples.

    You can set this property when you create an istftLayer object. After you create an istftLayer object, this property is read-only.

    Data Types: single | double

    Layer

    WeightLearnRateFactor

    Multiplier for weight learning rate, specified as a nonnegative scalar. If you do not specify this property, it defaults to zero, resulting in weights that do not update with training. You can also set this property using the setLearnRateFactor (Deep Learning Toolbox) function.

    Data Types: double | single

    Name

    Layer name, specified as a character vector or a string scalar. For Layer array input, the trainnet (Deep Learning Toolbox) and dlnetwork (Deep Learning Toolbox) functions automatically assign names to layers with the name "".

    The istftLayer object stores this property as a character vector.

    Data Types: char | string

    NumInputs

    This property is read-only.

    Number of inputs to the layer, returned as 1. This layer accepts a single input only.

    Data Types: double

    InputNames

    This property is read-only.

    Input names, returned as {'in'}. This layer accepts a single input only.

    Data Types: cell

    NumOutputs

    This property is read-only.

    Number of outputs from the layer, returned as 1. This layer has a single output only.

    Data Types: double

    OutputNames

    This property is read-only.

    Output names, returned as {'out'}. This layer has a single output only.

    Data Types: cell

    Examples

    Create an inverse short-time Fourier transform layer. Specify a 64-sample Hamming window. Specify 63 overlapped samples between adjoining segments.

    layer = istftLayer(Window=hamming(64),OverlapLength=63)
    layer = 
      istftLayer with properties:
    
                         Name: ''
        WeightLearnRateFactor: 0
                       Window: [64x1 double]
                OverlapLength: 63
                    FFTLength: 64
                       Method: 'wola'
           ExpectedOutputSize: 'none'
    
       Learnable Parameters
                      Weights: [64x1 single]
    
       State Parameters
        No properties.
    
    Use properties method to see a list of all properties.
    
    

    Create an array of five layers, containing a sequence input layer, an STFT layer, an LSTM layer, a fully connected layer, and an ISTFT layer. There is one feature in the sequence input. Set the minimum signal length in the sequence input layer to 2048 samples. Use the default window of length 128 for both STFT and ISTFT layers.

    layers = [
        sequenceInputLayer(1,MinLength=2048)
        stftLayer(TransformMode="realimag")
        lstmLayer(130)
        fullyConnectedLayer(130)
        istftLayer];

    Create a random array containing a batch of 10 signals and 2048 samples. Save the signal as a dlarray in "CBT" format. Analyze the layers as a dlnetwork using the example network input.

    networkInput = dlarray(randn(1,10,2048,"single"),"CBT");
    analyzeNetwork(layers,networkInput,TargetUsage="dlnetwork")

    Create a deep learning network that demonstrates perfect reconstruction of the short-time Fourier transform (STFT) of a dlarray. To minimize edge effects, the network zero-pads the data before computing the STFT.

    Generate a 3-by-2000-by-5 array containing five batches of a three-channel sinusoidal signal sampled at 1 kHz for two seconds. Save the array as a dlarray, specifying the dimensions in order. dlarray permutes the array dimensions to the "CBT" shape expected by a deep learning network, so the resulting dlarray is 3-by-5-by-2000.

    Fs = 1e3;
    nchan = 3;
    nbtch = 5;
    nsamp = 2000;
    t = (0:nsamp-1)/Fs;
    
    x = zeros(nchan,nsamp,nbtch);
    for k=1:nbtch
        x(:,:,k) = sin(k*pi.*(1:nchan)'*t)+cos(k*pi.*(1:nchan)'*t);
    end
    
    xd = dlarray(x,"CTB");

    Design a periodic Hann window of length 100 and set the number of overlapped samples to 75. Check that the window and overlap length satisfy the constant overlap-add (COLA) constraint.

    nwin = 100;
    win = hann(nwin,"periodic");
    noverlap = 75;
    
    tf = iscola(win,noverlap)
    tf = logical
       1
    
    

    Create an STFT layer that uses the Hann window. Set the number of overlapped samples to 75 and the FFT length to 128. Set the layer transform mode to "realimag" to concatenate the real and imaginary parts of the layer output along the channel dimension. Create an ISTFT layer using the same FFT length, window, and overlap.

    fftlen = 128;
    
    ftl = stftLayer(Window=win,FFTLength=fftlen, ...
        OverlapLength=noverlap,TransformMode="realimag");
    
    iftl = istftLayer(Window=win,FFTLength=fftlen, ...
        OverlapLength=noverlap);

    Create a deep learning network appropriate for the data that demonstrates perfect reconstruction of the STFT. Use a function layer to zero-pad the data on both sides along the time dimension before computing the STFT. The length of the zero-pad is the window length. Use a function layer after the ISTFT layer to trim both sides of the ISTFT layer output by the same amount.

    layers = [
        sequenceInputLayer(nchan,MinLength=nsamp)
        functionLayer(@(X) paddata(X,nsamp+2*nwin,Dimension=3,Side="both"))
        ftl
        iftl
        functionLayer(@(X) trimdata(X,nsamp,Dimension=3,Side="both"))];
    dlnet = dlnetwork(layers);

    Analyze the network using the data. The number of channels of the STFT layer output is twice that of the layer input.

    analyzeNetwork(dlnet,xd)

    Run the data through the forward method of the network.

    dataout = forward(dlnet,xd);

    The output is a dlarray in "CBT" format. Convert the network output to a numeric array. Permute the dimensions so that each page is a batch.

    xrec = extractdata(dataout);
    xrec = permute(xrec,[1 3 2]);

    Choose a batch. Plot the original and reconstructed multichannel signal of that batch as a stacked plot.

    wb = 4;
    tiledlayout(2,1)
    nexttile
    stackedplot(x(:,:,wb)',DisplayLabels="Channel "+string(1:nchan))
    title("Batch "+num2str(wb)+": Original")
    nexttile
    stackedplot(xrec(:,:,wb)',DisplayLabels="Channel "+string(1:nchan))
    title("Batch "+num2str(wb)+": Reconstruction")

    Confirm perfect reconstruction of the data.

    max(abs(x(:)-xrec(:)))
    ans = single
        6.2170e-07
    

    More About

    Algorithms

    The size of the ISTFT depends on the dimensions and data format of the input STFT, the length of the windowing function, the number of overlapped samples and the number of DFT points.

    Define the hop size as hopSize = length(Window)-OverlapLength. The number of samples in the ISTFT is length(Window)+(nseg-1)*hopSize, where nseg is the size of the input in the time ("T") dimension.

    • If the input to istftLayer is a "SCBT" formatted dlarray, the number of channels is szC/2, where szC is the size of the input in the channel ("C") dimension.

    • If the input to istftLayer is a "CBT" formatted dlarray, the number of channels is szC/(2*nfreq), where nfreq = floor(FFTLength/2)+1.
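
    As a worked instance of these formulas, the sketch below uses the window length, overlap, FFT length, and zero-padded signal length from the reconstruction example on this page. It assumes that the number of STFT segments follows the usual floor((signalLength-OverlapLength)/hopSize) rule. The variable names are illustrative.

```matlab
% Values from the reconstruction example on this page
windowLength  = 100;
overlapLength = 75;
fftLength     = 128;
paddedLength  = 2000 + 2*windowLength;      % zero-padded signal length: 2200

hopSize = windowLength - overlapLength;     % 25

% Assumption: the STFT segment count follows the usual formula
nseg = floor((paddedLength - overlapLength)/hopSize);   % 85

% Number of samples in the ISTFT
numSamples = windowLength + (nseg-1)*hopSize;           % 2200

% Number of channels for each input format
nfreq = floor(fftLength/2) + 1;             % 65
szC_SCBT = 6;                               % "C" size of an "SCBT" input
szC_CBT  = 390;                             % "C" size of a "CBT" input
numChanFromSCBT = szC_SCBT/2;               % 3
numChanFromCBT  = szC_CBT/(2*nfreq);        % 3
```

    The ISTFT returns 2200 samples in 3 channels, which the example then trims to 2000 samples. A pair such as [3 2200] is the kind of value to supply as ExpectedOutputSize when you want the layer to verify its output size.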

    Version History

    Introduced in R2024a
