Smartphone Audio based Distress Detection

Anil Sharma, Sarthak Ahuja, Mayank Gautam, Sanjit Kaul
Indraprastha Institute of Information Technology, Delhi, India
This work was partially funded by DST-SERB, government of India, Grant: SB/S3/EECE/019/2013.

Motivated by the possibility of 24x7 monitoring of a person via their smartphone, we investigate detection of screaming and crying in urban environments, which we categorize into the contexts of indoors (home and office), outdoors, human conversations, large human gatherings, machinery, and audio from multimedia devices. On this webpage, we host our scripts and database to make it convenient for other authors to make a fair comparison and to reproduce the results of the paper "Two-Stage Supervised Learning-Based Method to Detect Screams and Cries in Urban Environments," accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing. In this paper, we proposed a technique for distress detection in the presence of environmental context. Stage 2 of the proposed technique reduces false alarms (~60% in the paper) caused by background context. Refer to the paper for more details.
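To illustrate the role of the two stages, the sketch below shows a generic two-stage detection pipeline: stage 1 flags candidate distress frames, and stage 2 suppresses alarms whose inferred environmental context is a known benign source. The energy threshold, context labels, and benign-context list here are illustrative placeholders, not the actual features or classifiers used in the paper.

```python
# Illustrative two-stage detection pipeline (NOT the paper's implementation).
# Stage 1: flag candidate distress frames from an acoustic score.
# Stage 2: drop candidates whose background context is a likely false-alarm
# source (e.g. audio from a multimedia device).

def stage1_detect(frame_score, threshold=0.8):
    """Flag a frame as candidate distress if its acoustic score is high."""
    return frame_score >= threshold

def stage2_filter(candidate, context_label,
                  benign_contexts=("multimedia", "machinery")):
    """Keep an alarm only if the inferred context is not a benign source."""
    return candidate and context_label not in benign_contexts

def detect(frames):
    """frames: list of (acoustic_score, context_label) tuples."""
    return [stage2_filter(stage1_detect(score), ctx) for score, ctx in frames]

alarms = detect([(0.9, "outdoor"), (0.95, "multimedia"), (0.3, "indoor")])
# The high-scoring multimedia frame is suppressed by stage 2.
```

The point of the example is only the structure: a context classifier running after (or alongside) the distress detector lets the system veto alarms that a single-stage detector would raise.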

Here we host our database, the IUEC (IIITD Urban Environment Context) database, which contains distress sounds (screams and cries) and sounds of 6 environmental contexts collected from mobile phones, movies, and the Internet. The IUEC database contains raw audio; normalization and other necessary operations have to be carried out before extracting features. This database can be used by authors who want to evaluate their algorithms in the presence of environmental context sounds, especially for speech and sound event recognition. Details regarding the recording setup and the kinds of sounds can be found in the paper.
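Since the database ships raw, unnormalized audio, users need to apply their own preprocessing before feature extraction. The sketch below shows peak normalization, one common choice; it is a minimal illustration, not the specific preprocessing pipeline used in the paper.

```python
# Minimal peak-normalization sketch for raw audio samples (illustrative;
# the paper may use a different normalization scheme).

def peak_normalize(samples, target_peak=1.0):
    """Scale PCM samples so the largest magnitude equals target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # all-silent input: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]

normalized = peak_normalize([0.1, -0.5, 0.25])
# max magnitude of `normalized` is now 1.0
```

After normalization, frame-level features (e.g. MFCCs, as mentioned in the paper's keywords) can be extracted with any standard audio toolkit.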

The IUECDB contains audio of distress and 6 environmental contexts (approx. 20 hours in length). It also contains more than 250 hours of audio portraying the daily routines of 17 volunteers. The volunteer data does not contain any distress sounds and is unlabeled for environmental context. Our paper uses the former set to create training and evaluation sets and the latter as the test set (the first 10 volunteers were used for evaluation). Interested authors can refer to our paper for more details.

Sample audios from each set are given below:

1. Context set:

Distress context     scream1.wav scream2.wav cry1.wav cry2.wav
Conversations     sample1.wav sample2.wav sample3.wav
Human Gathering sounds     talk_in_metro.wav people_cheering.wav people_talking.wav
Indoor sounds     kitchen.wav room_conversation.wav inside_room.wav
Outdoor sounds     metro.wav road_horn.wav market.wav
Machinery sounds     electric_shaver.wav exhaust.wav microwave.wav
Multimedia sounds     comedy.wav cricket_match.wav indian_song.wav

2. Volunteer set:

sample1_s20.wav  sample2_s20.wav  sample1_s3_day1.wav  sample2_s3_day1.wav  sample1_s3_day4.wav

Full database: The link will be provided only after signing this EULA.

In case of any direct or indirect use of the above database, the licensee must cite the following paper (available here):

Sharma, A.; Kaul, S., "Two-Stage Supervised Learning-Based Method to Detect Screams and Cries in Urban Environments," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. PP, no. 99, pp. 1-1, doi: 10.1109/TASLP.2015.2506264.

Scripts to reproduce our results:


Matlab application for real-time distress detection:

This page is maintained by Anil Sharma (