Open-source project name: Soumitro-Chakrabarty/Single-speaker-localization
Open-source project URL: https://github.com/Soumitro-Chakrabarty/Single-speaker-localization
Primary language: MATLAB 50.1%
Introduction:

Single-speaker-localization with CNNs

The repository provides trained models for the CNN-based single-source localization method presented in the paper

Title: Broadband DOA estimation with convolutional neural networks trained using noise signals

However, there are a few differences from the acoustic and array geometry setup described in the paper. The main differences to keep in mind before running the code are as follows:
A small test dataset (DOA_test.hdf5) is included, together with the corresponding output .mat file (DOA_test_OP.mat). The dataset contains the features (phase maps) and targets, created by convolving a 13 s long speech signal with measured RIRs from the Bar-Ilan Multi-Channel Impulse Response Database for 9 different angles, using the 4 middle microphones of the [8,8,8,8,8,8,8] ULA setup. Please note that the angle convention in the Bar-Ilan dataset differs from ours. To account for that, the original ground-truth angles from the dataset were translated to our convention. The figure below shows the Bar-Ilan convention, as given in their example code. In brackets are the corresponding angles from our convention. All angles are in degrees.
Running the code generates an output file called DOA_OP.mat, which should be identical to DOA_test_OP.mat. In addition, a MATLAB script to visualize the output is provided. The acoustic setup for the provided test data is as follows:
Usage

The Python dependencies can be installed using the requirements file

You can now run the script
Training data generation - Pseudo code

Generate RIRs

This pseudo-code explains the generation of RIRs for the different acoustic conditions. For the specific acoustic parameters used in this work, please refer to Table 1.

Select R rooms of different sizes
for nb_room in range(1,R)
    Randomly select P array positions
    Choose D source-array distances
    for nb_pos in range(1,P)
        for nb_dist in range(1,D)
            Generate RIRs corresponding to each of the 37 discrete DOAs and M microphones
Store the NR = R*P*D RIRs
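
The loop structure above can be sketched in plain Python as follows. This is only an illustration of how the NR = R*P*D setups and the 37 source positions per setup might be enumerated; it does not generate RIRs. The function names (`doa_grid`, `source_position`, `enumerate_setups`), the 2-D geometry, the room sizes, the `margin` parameter, and the assumption that the 37 DOAs are evenly spaced over 0 to 180 degrees are all hypothetical choices made for this example and are not taken from the repository. In practice, each entry in `sources` would be passed to an RIR simulator together with the M microphone positions.

```python
import math
import random

NUM_DOAS = 37  # number of discrete DOA classes stated in the README


def doa_grid(num_doas=NUM_DOAS, span_deg=180.0):
    """Candidate DOAs in degrees (assumed: evenly spaced over 0..span_deg)."""
    step = span_deg / (num_doas - 1)
    return [i * step for i in range(num_doas)]


def source_position(center, distance, doa_deg):
    """2-D source position at `distance` from the array centre for a given DOA."""
    theta = math.radians(doa_deg)
    return (center[0] + distance * math.cos(theta),
            center[1] + distance * math.sin(theta))


def enumerate_setups(rooms, num_positions, distances, margin=2.5, seed=0):
    """Return one dict per acoustic setup (room, array position, distance).

    `margin` keeps the array centre away from the walls; the default is an
    illustrative value chosen so that sources stay inside the example rooms.
    """
    rng = random.Random(seed)
    setups = []
    for length, width in rooms:                      # R rooms
        for _ in range(num_positions):               # P array positions
            center = (rng.uniform(margin, length - margin),
                      rng.uniform(margin, width - margin))
            for dist in distances:                   # D source-array distances
                sources = [source_position(center, dist, angle)
                           for angle in doa_grid()]  # 37 DOAs per setup
                setups.append({"room": (length, width),
                               "center": center,
                               "distance": dist,
                               "sources": sources})
    return setups


# Example values only: R = 2 rooms, P = 3 positions, D = 2 distances.
setups = enumerate_setups(rooms=[(6.0, 6.0), (8.0, 7.0)],
                          num_positions=3,
                          distances=[1.0, 2.0])
print(len(setups))  # NR = R * P * D
```

Fixing the random seed makes the selected array positions reproducible, which helps when the RIR files need to be regenerated later.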
NOTE: Each RIR file corresponds to a specific acoustic setup and contains 37 x M source-mic RIRs, one for each DOA and microphone in the array.

In the referenced paper:
Training data - Features and Target generation

for nb_rir in range(1,NR)
    for nb_ang in range(1,37)
        sig_anechoic = 2 s long white Gaussian noise # each iteration a different variance was used
        sig_spatial = sig_anechoic convolved with the M RIRs
        sig_noisy = sig_spatial + noise ## noise = spatially uncorrelated white noise with a randomly chosen SNR in the range of [0,20]dB
        sig_STFT = STFT(sig_noisy) ## size M (mics) x K (frequency bins) x N (time frames)
        phase_component = angle(sig_STFT)
        for nb_frame in range(1,N)
            phase_map(nb_frame) = phase_component(:,:,nb_frame) # matrix of size M x K taken from phase_component
            target(nb_frame) = one-hot encoded vector of size 37 x 1 with the true DOA label as 1, rest 0s

# Training pairs
X_train = phase_map tensor of size M x K x 1 x (N*NR*37) # resizing done for input to Conv2D in Keras
Y_train = target matrix of size 37 x (N*NR*37)
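
For a single RIR set and a single DOA, the feature and target steps above might look like the following NumPy sketch. It is not the code used to create the training data. The sampling rate (16 kHz), the STFT settings (Hann window, 512-sample frames, 50% overlap), the variance range, the helper names, and the randomly generated placeholder "RIRs" are all assumptions made for this example. The output is arranged batch-first as (N, M, K, 1), the channels-last layout that Keras `Conv2D` expects by default, rather than the M x K x 1 x N ordering written in the pseudo-code.

```python
import numpy as np


def add_noise_at_snr(signal, snr_db, rng):
    """Add spatially uncorrelated white noise at the requested SNR (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(scale=np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def stft(x, frame_len=512, hop=256):
    """Hann-windowed STFT of an (M, T) array; returns an (M, K, N) array."""
    window = np.hanning(frame_len)
    num_frames = 1 + (x.shape[1] - frame_len) // hop
    frames = np.stack([x[:, n * hop:n * hop + frame_len] * window
                       for n in range(num_frames)], axis=-1)
    return np.fft.rfft(frames, axis=1)  # K = frame_len // 2 + 1 bins


def make_training_pairs(rirs, doa_index, num_doas=37, fs=16000,
                        duration_s=2.0, rng=None):
    """Phase maps and one-hot targets for one RIR set and one DOA class."""
    rng = np.random.default_rng() if rng is None else rng
    num_samples = int(fs * duration_s)
    # White Gaussian noise with a randomly drawn standard deviation.
    sig_anechoic = rng.normal(scale=rng.uniform(0.1, 1.0), size=num_samples)
    # Convolve with each of the M RIRs and truncate to the input length.
    sig_spatial = np.stack([np.convolve(sig_anechoic, h)[:num_samples]
                            for h in rirs])
    # SNR drawn uniformly from [0, 20] dB, as in the pseudo-code.
    sig_noisy = add_noise_at_snr(sig_spatial, rng.uniform(0.0, 20.0), rng)
    phase = np.angle(stft(sig_noisy))                              # (M, K, N)
    phase_maps = np.transpose(phase, (2, 0, 1))[..., np.newaxis]   # (N, M, K, 1)
    targets = np.zeros((phase_maps.shape[0], num_doas))
    targets[:, doa_index] = 1.0                                    # one-hot rows
    return phase_maps, targets


rng = np.random.default_rng(0)
M = 4
# Placeholder impulse responses (decaying noise), NOT simulated or measured RIRs.
rirs = rng.normal(size=(M, 256)) * np.exp(-np.arange(256) / 50.0)
X, Y = make_training_pairs(rirs, doa_index=18, rng=rng)
print(X.shape, Y.shape)
```

Repeating this call over all RIR sets and all 37 DOA classes and concatenating the results along the first axis would give arrays corresponding to X_train and Y_train.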
NOTE: Since the SNR for each nb_ang and nb_rir is randomly chosen, the whole procedure was repeated several times to obtain a balanced dataset and avoid a bias towards a specific SNR. The size of the training data was influenced by memory constraints.

Citation

If you find the provided model useful in your research, please cite: