The dataset for this task is TAU Urban Acoustic Scenes 2020 Mobile. The dataset contains recordings from 12 European cities in 10 different acoustic scenes using 4 different devices. Additionally, synthetic data for 11 mobile devices was created based on the original recordings. Of the 12 cities, two are present only in the evaluation set.

Recordings were made using four devices that captured audio simultaneously. The main recording device, referred to as device A, consists of a Soundman OKM II Klassik/studio A3 electret binaural microphone and a Zoom F8 audio recorder using a 48 kHz sampling rate and 24-bit resolution.

Additionally, 11 mobile devices S1-S11 are simulated using the audio recorded with device A, impulse responses recorded with real devices, and additional dynamic range compression, in order to simulate realistic recordings. A recording from device A is processed through convolution with the selected Si impulse response, then processed with a device-specific set of parameters for dynamic range compression. The impulse responses are proprietary data and will not be published.

The development dataset comprises 40 hours of data from device A, and smaller amounts from the other devices. Audio is provided in a single-channel 44.1 kHz, 24-bit format.

The dataset was collected by Tampere University of Technology between 05/2018 - 11/2018. The data collection received funding from the European Research Council, grant agreement 637422 EVERYSOUND.

For complete details on the data recording and processing see:

A multi-device dataset for urban acoustic scene classification
Mesaros, Annamaria and Heittola, Toni and Virtanen, Tuomas
In: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018)

Abstract: This paper introduces the acoustic scene classification task of DCASE 2018 Challenge and the TUT Urban Acoustic Scenes 2018 dataset provided for the task, and evaluates the performance of a baseline system in the task. As in previous years of the challenge, the task is defined for classification of short audio samples into one of predefined acoustic scene classes, using a supervised, closed-set classification setup. The newly recorded TUT Urban Acoustic Scenes 2018 dataset consists of ten different acoustic scenes and was recorded in six large European cities; it therefore has a higher acoustic variability than the previous datasets used for this task, and in addition to high-quality binaural recordings, it also includes data recorded with mobile devices. We also present the baseline system consisting of a convolutional neural network and its performance in the subtasks using the recommended cross-validation setup.

Keywords: Acoustic scene classification, DCASE challenge, public datasets, multi-device data

Figure 1: Overview of acoustic scene classification system.
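The device simulation described above (convolution with a device impulse response, followed by device-specific dynamic range compression) can be sketched roughly as below. Since the actual impulse responses and compressor settings are proprietary and unpublished, the function name, the static-compressor design, and the threshold/ratio parameters here are illustrative assumptions, not the challenge's actual processing chain:

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_device(audio_a, ir, threshold_db=-20.0, ratio=4.0):
    """Illustrative mobile-device simulation: convolve a device-A
    recording with a device impulse response, then apply a simple
    static dynamic range compressor. Threshold and ratio are
    placeholder values; the real settings are device-specific
    and not published."""
    # 1. Convolve with the impulse response, trimmed to input length.
    y = fftconvolve(audio_a, ir, mode="full")[: len(audio_a)]
    # 2. Static compression: attenuate samples above the threshold.
    eps = 1e-12
    level_db = 20.0 * np.log10(np.abs(y) + eps)
    over_db = np.maximum(level_db - threshold_db, 0.0)
    gain_db = -over_db * (1.0 - 1.0 / ratio)
    return y * 10.0 ** (gain_db / 20.0)
```

With a unit-impulse response, quiet signals below the threshold pass through unchanged, while samples above the threshold are attenuated by the compressor.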
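Note that device A captures binaural audio at 48 kHz, while the distributed files are single-channel 44.1 kHz. The exact downmix and resampling chain used in dataset preparation is not specified here, but a minimal sketch of such a conversion, assuming SciPy and a simple channel average for the downmix, could look like:

```python
import numpy as np
from scipy.signal import resample_poly

def to_distribution_format(x_48k):
    """Illustrative conversion of a 48 kHz capture to single-channel
    44.1 kHz. The ratio 44100/48000 reduces to 147/160, so polyphase
    resampling hits the target rate exactly."""
    if x_48k.ndim == 2:
        # Binaural capture: average the two channels to mono
        # (an assumed downmix, not the documented procedure).
        x_48k = x_48k.mean(axis=1)
    return resample_poly(x_48k, up=147, down=160)
```

One second of input (48000 samples) yields exactly one second at the target rate (44100 samples).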