Rajiv Ratn SHAH

Assistant Professor,

Department of Computer Science and Engineering (joint appointment with the Department of Human-centered Design)

Indraprastha Institute of Information Technology, Delhi (IIIT-Delhi).

Rajiv Ratn Shah currently works as an Assistant Professor in the Department of Computer Science and Engineering (joint appointment with the Department of Human-centered Design) at IIIT-Delhi. He received his Ph.D. in computer science from the National University of Singapore, Singapore. Before joining IIIT-Delhi, he worked as a Research Fellow in the Living Analytics Research Center (LARC) at the Singapore Management University, Singapore. Prior to completing his Ph.D., he received his M.Tech. and M.C.A. degrees in Computer Applications from the Delhi Technological University, Delhi and Jawaharlal Nehru University, Delhi, respectively. He also received his B.Sc. in Mathematics (Honors) from the Banaras Hindu University, Varanasi. Dr. Shah is the recipient of several awards, including the prestigious Heidelberg Laureate Forum (HLF) and European Research Consortium for Informatics and Mathematics (ERCIM) fellowships. He also received the best paper award in the IWGS workshop at the ACM SIGSPATIAL conference 2016, San Francisco, USA and was runner-up in the Grand Challenge competition of the ACM International Conference on Multimedia 2015, Brisbane, Australia. He is involved in organizing and reviewing for many top-tier international conferences and journals. Recently, he organized a workshop on Multimodal Representation, Retrieval, and Analysis of Multimedia Content (MR2AMC) in conjunction with the first IEEE MIPR 2018 conference. His research interests include multimedia content processing, natural language processing, image processing, multimodal computing, data science, social media computing, and the internet of things.

Specifically, I am looking for motivated students and interns in the following areas at IIIT-Delhi:

  • Multimodal deep learning based healthcare solutions
  • Multimodal fake news detection using deep learning techniques
  • Multimodal semantic and sentiment analysis of user-generated social media content
  • Event detection and recommendation on social media
  • Multimodal multimedia search, retrieval, and recommendation
  • Deep learning based multimedia systems, etc.

Recent Publications

Debanjan Mahata, John Kuriakose, Rajiv Ratn Shah, Roger Zimmermann, "Key2Vec: Automatic Ranked Keyphrase Extraction from Scientific Articles using Phrase Embeddings," In the proceedings of NAACL, (Accepted). New Orleans, Louisiana, USA, 2018.

Hitkul Jangid, Shivangi Singhal, Rajiv Ratn Shah, Roger Zimmermann, "Aspect-Based Financial Sentiment Analysis using Deep Learning," In the proceedings of WWW Conference, (pages 1961-1966). Perth, Australia, 2018.

Yifang Yin, Rajiv Ratn Shah, Guanfeng Wang, Roger Zimmermann, "Feature-based Map Matching for Low-Sampling-Rate GPS Trajectories," In the proceedings of ACM Transactions on Spatial Algorithms and Systems, (Accepted). 2018.

Debanjan Mahata, Jasper Friedrichs, Rajiv Ratn Shah, Jing Jiang, "Did you take the #pill? - Detecting Personal Intake of Medicine from Twitter," In the proceedings of IEEE Intelligent Systems on Affective Computing and Sentiment Analysis, (Accepted). 2018.

Mayank Meghawat, Satyendra Yadav, Debanjan Mahata, Yifang Yin, Rajiv Ratn Shah, Roger Zimmermann, "A Multimodal Approach to Predict Social Media Popularity," In the proceedings of MR2AMC in IEEE MIPR, (In press). Miami, Florida, USA, 2018.

See the List of All Publications


Deep Learning based Healthcare Systems

Mining social media messages such as tweets, blogs, and Facebook posts for health and drug related information has received significant interest in pharmacovigilance research. Social media sites (e.g., Twitter) have been used for monitoring drug abuse and adverse reactions to drug usage, and for analyzing the expression of sentiments related to drugs. Most of these studies are based on aggregated results from a large population rather than specific sets of individuals. In order to conduct studies at the level of individuals or specific cohorts, it is necessary to identify posts in which the user mentions their own intake of medicine. Towards this objective, we develop a classifier for identifying mentions of personal intake of medicine in tweets. We train a stacked ensemble of shallow convolutional neural network (CNN) models on an annotated dataset. We use random search for tuning the hyper-parameters of the CNN models and present an ensemble of the best models for the prediction task. Our system produces state-of-the-art results, with a micro-averaged F-score of 0.693. We believe that the developed classifier has direct uses in the areas of psychology, health informatics, pharmacovigilance and affective computing for tracking moods, emotions and sentiments of patients expressing intake of medicine in social media.

  1. Debanjan Mahata, Jasper Friedrichs, Rajiv Ratn Shah, Jing Jiang, "Did you take the #pill? - Detecting Personal Intake of Medicine from Twitter," In the proceedings of IEEE Intelligent Systems on Affective Computing and Sentiment Analysis, (Accepted). 2018.
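The micro-averaged F-score reported above pools true positives, false positives, and false negatives across classes before computing a single F-score. The following is a minimal sketch of that metric only; the function name, the label names, and the choice of which labels to pool are illustrative assumptions, not details taken from the paper.

```python
def micro_f1(y_true, y_pred, labels):
    """Micro-averaged F1: pool TP/FP/FN over `labels`, then compute F1 once.

    `labels` is the (assumed) subset of classes included in the average.
    """
    tp = fp = fn = 0
    for label in labels:
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1  # correctly predicted this label
            elif pred == label and truth != label:
                fp += 1  # predicted this label, but it was wrong
            elif pred != label and truth == label:
                fn += 1  # missed an instance of this label
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for illustration only.
y_true = ["intake", "possible", "none", "intake"]
y_pred = ["intake", "none", "none", "possible"]
score = micro_f1(y_true, y_pred, labels=["intake", "possible"])
```

Pooling the counts first means frequent classes influence the score more than rare ones, which is the main difference from a macro average.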

Sentiment Analysis using Deep Learning

Aspect-based sentiment analysis aims to detect an aspect (i.e., a feature) in a given text and then perform sentiment analysis of the text with respect to that aspect. This paper presents a solution for subtask 1 of the FiQA 2018 challenge. We perform aspect-based sentiment analysis on microblogs and headlines from the financial domain. We use a multi-channel convolutional neural network for sentiment analysis and a recurrent neural network with bidirectional long short-term memory units to extract aspects from a given headline or microblog. Our proposed model produces a weighted average F1 score of 0.69 for the aspect extraction task and predicts sentiment intensity scores with a mean squared error of 0.112 on 10-fold cross validation. We believe that the developed system has direct applications in the financial domain.

  1. Hitkul Jangid, Shivangi Singhal, Rajiv Ratn Shah, Roger Zimmermann, "Aspect-Based Financial Sentiment Analysis using Deep Learning," In the proceedings of WWW Conference, (pages 1961-1966). Perth, Australia, 2018.
  2. Debanjan Mahata, Jasper Friedrichs, Rajiv Ratn Shah, Jing Jiang, "Did you take the #pill? - Detecting Personal Intake of Medicine from Twitter," In the proceedings of IEEE Intelligent Systems on Affective Computing and Sentiment Analysis, (Accepted). 2018.

Soundtrack-generation for Outdoor Videos

Capturing videos anytime and anywhere, and then instantly sharing them online, has become a very popular activity. However, many outdoor user-generated videos (UGVs) lack a certain appeal because their soundtracks consist mostly of ambient background noise. Aimed at making UGVs more attractive, we introduce ADVISOR, a personalized video soundtrack recommendation system. We propose a fast and effective heuristic ranking approach based on heterogeneous late fusion by jointly considering three aspects: venue categories, visual scene, and user listening history. Specifically, we combine confidence scores, produced by SVMhmm models constructed from geographic, visual, and audio features, to obtain different types of video characteristics. Our contributions are threefold. First, we predict scene moods from a real-world video dataset that was collected from users’ daily outdoor activities. Second, we perform heuristic rankings to fuse the predicted confidence scores of multiple models. Third, we customize the video soundtrack recommendation functionality to make it compatible with mobile devices. A series of extensive experiments confirm that our approach performs well and recommends appealing soundtracks for UGVs to enhance the viewing experience. This work results in the following publications.

  1. Rajiv Ratn Shah, Yi Yu, and Roger Zimmermann, "ADVISOR - Personalized Video Soundtrack Recommendation by Late Fusion with Heuristic Rankings," In ACM Multimedia, pages 607-616. Orlando, Florida, USA, 2014.
  2. Rajiv Ratn Shah, Yi Yu, and Roger Zimmermann, "User Preference-Aware Music Video Generation Based on Modelling Scene Moods," In ACM Multimedia Systems, pages 156-159. Singapore, 2014.
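Late fusion, as mentioned in the ADVISOR description, combines the confidence scores of separately trained models rather than their input features. The sketch below shows one simple form of it, a weighted sum followed by ranking. The model names, mood labels, scores, and weights are hypothetical, and the weighted sum is an assumed stand-in for the heuristic ranking described in the papers.

```python
def late_fusion_rank(scores_by_model, weights):
    """Fuse per-model confidence scores with a weighted sum and rank candidates.

    scores_by_model: {model_name: {candidate: confidence}}
    weights:         {model_name: weight}
    Returns candidates sorted from highest to lowest fused score.
    """
    fused = {}
    for model, scores in scores_by_model.items():
        weight = weights[model]
        for candidate, score in scores.items():
            fused[candidate] = fused.get(candidate, 0.0) + weight * score
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(fused, key=lambda c: (-fused[c], c))


# Hypothetical confidence scores for two scene moods from two models.
scores = {
    "geo": {"happy": 0.6, "calm": 0.4},
    "visual": {"happy": 0.3, "calm": 0.7},
}
ranking = late_fusion_rank(scores, weights={"geo": 0.5, "visual": 0.5})
```

Because fusion happens on scores, each model can be trained and replaced independently of the others.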

Tag Relevance Computation

Social media platforms such as Flickr allow users to annotate photos with descriptive keywords, called tags, with the goal of making multimedia content easily understandable, searchable, and discoverable. However, due to the manual, ambiguous, and personalized nature of user tagging, many tags of a photo are in a random order and even irrelevant to the visual content. Aiming to automatically compute tag relevance for a given photo, we propose a tag ranking scheme based on voting from photo neighbors derived from multimodal information. Specifically, we determine photo neighbors leveraging geo, visual, and semantic concepts derived from spatial information, visual content, and textual metadata, respectively. We leverage high-level features instead of traditional low-level features to compute tag relevance. Moreover, we explore the fusion of multimodal information to refine tag ranking leveraging recall-based weighting. Subsequently, we build a tag recommendation system because manual annotation is very time-consuming and cumbersome for most users, which makes it difficult to search for relevant photos. Moreover, predicted tags for a photo are not necessarily relevant to users’ interests. We aim to automatically annotate photos such that tags describe objective aspects of the photos considering user tagging behaviours. Our tag recommendation system, called PROMPT, recommends personalized tags for a given photo leveraging personal and social contexts. Specifically, first, we determine a group of users who have similar tagging behavior to the user of the photo, which is very useful in recommending personalized tags. Next, we find candidate tags from visual content, textual metadata, and tags of neighboring photos, and recommend the five most suitable tags. We initialize scores of the candidate tags using asymmetric tag co-occurrence probabilities and normalized scores of tags after neighbor voting, and later perform a random walk to promote the tags that have many close neighbors and weaken isolated tags. Finally, we recommend the top five user tags for the given photo. This work results in the following publications.

  1. Rajiv Ratn Shah, Anupam Samanta, Deepak Gupta, Yi Yu, Suhua Tang, Roger Zimmermann, "PROMPT: Personalized User Tag Recommendation for Social Media Photos Leveraging Personal and Social Contexts," In IEEE International Symposium on Multimedia, (Accepted). San Jose, California, USA, 2016.
  2. Rajiv Ratn Shah, Yi Yu, Suhua Tang, Shin'ichi Satoh, Akshay Verma, and Roger Zimmermann, "Concept-Level Multimodal Ranking of Flickr Photo Tags via Recall Based Weighting," In MMCommon's Workshop at ACM Multimedia, pages 19-26. Amsterdam, The Netherlands, 2016.
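The neighbor-voting idea described above can be illustrated in a few lines: each neighboring photo casts one vote for every tag it carries, and the tags of the target photo are reordered by vote count. This is a simplified sketch under stated assumptions. How neighbors are found (geo, visual, or semantic similarity), the recall-based weighting, and the random walk are all omitted, and the function name and example tags are hypothetical.

```python
from collections import Counter


def rank_tags_by_neighbor_votes(photo_tags, neighbor_tag_lists):
    """Reorder a photo's tags by how many neighboring photos share each tag.

    photo_tags:         tags of the target photo, in their original order
    neighbor_tag_lists: one tag list per neighboring photo (assumed given)
    """
    votes = Counter()
    for tags in neighbor_tag_lists:
        for tag in set(tags):  # each neighbor votes at most once per tag
            votes[tag] += 1
    # Higher vote count first; ties keep the original tag order.
    return sorted(photo_tags, key=lambda t: (-votes[t], photo_tags.index(t)))


# Hypothetical photo and neighbors.
photo_tags = ["2016", "beach", "sunset"]
neighbors = [["beach", "sea"], ["sunset", "beach"], ["sunset", "sky"], ["beach"]]
ranked = rank_tags_by_neighbor_votes(photo_tags, neighbors)
```

In this toy example the uninformative tag "2016" receives no votes and drops to the end of the list.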

Event Understanding

The rapid growth in the amount of user-generated content (UGCs) online makes it necessary for social media companies to automatically extract knowledge structures (concepts) from photos and videos to provide diverse multimedia-related services. However, real-world photos and videos are complex and noisy, and extracting semantics and sentics from the multimedia content alone is a very difficult task because suitable concepts may be exhibited in different representations. Hence, it is desirable to analyze UGCs from multiple modalities for a better understanding. To this end, we first present the EventBuilder system, which deals with semantics understanding and automatically generates a multimedia summary for a given event in real-time by leveraging different social media such as Wikipedia and Flickr. Subsequently, we present the EventSensor system, which aims to address sentics understanding and produces a multimedia summary for a given mood. It extracts concepts and mood tags from visual content and textual metadata of UGCs, and exploits them in supporting several significant multimedia-related services such as a musical multimedia summary. Moreover, EventSensor supports sentics-based event summarization by leveraging EventBuilder as its semantics engine component. Experimental results confirm that both EventBuilder and EventSensor outperform their baselines and efficiently summarize knowledge structures on the YFCC100M dataset. This work results in the following publications.

  1. Rajiv Ratn Shah, Anwar Dilawar Shaikh, Yi Yu, Wenjing Geng, Roger Zimmermann, and Gangshan Wu, "EventBuilder: Real-time Multimedia Event Summarization by Visualizing Social Media," In ACM Multimedia, pages 185-188. Brisbane, Australia, 2015.
  2. Rajiv Ratn Shah, Yi Yu, Akshay Verma, Suhua Tang, Anwar Dilawar Shaikh, and Roger Zimmermann, "Leveraging Multimodal Information for Event Summarization and Concept-level Sentiment Analysis," In Elsevier Knowledge Based Systems, pages 102-109. Volume 108, 2016.

Map Matching

Accurate map matching has been a fundamental but challenging problem that has drawn great research attention in recent years. It aims to reduce the uncertainty in a trajectory by matching the GPS points to the road network on a digital map. Most existing work has focused on estimating the likelihood of a candidate path based on the GPS observations, while neglecting to model the probability of a route choice from the perspective of drivers. Here we propose a novel feature-based map matching algorithm that estimates the cost of a candidate path based on both GPS observations and human factors. Taking human factors into consideration is especially important when dealing with low-sampling-rate data, where most of the movement details are lost. Additionally, we simultaneously analyze a subsequence of coherent GPS points by utilizing a new segment-based probabilistic map matching strategy, which is less susceptible to noise in the positioning data. We have evaluated the proposed approach on a public large-scale GPS dataset, which consists of 100 trajectories distributed all over the world. The experimental results show that our method is robust to sparse data with large sampling intervals (e.g., 60 s to 300 s) and challenging track features (e.g., u-turns and loops). Compared with two state-of-the-art map matching algorithms, our method reduces the route mismatch error by 6.4% to 32.3% and obtains the best map matching results in all the different combinations of sampling rates and challenging features. This work results in the following publication.

  1. Yifang Yin, Rajiv Ratn Shah, Roger Zimmermann, "A General Feature-based Map Matching Framework with Trajectory Simplification," In IWGS Workshop at ACM SIGSPATIAL, (pages 7(1-10)). San Francisco Bay Area, California, USA, 2016. (Won the best paper award)
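To make the basic map matching task concrete, the sketch below snaps a single GPS point to the nearest road segment by perpendicular projection. This is only the naive geometric baseline, not the feature-based algorithm described above, which additionally models route choice and analyzes subsequences of points. The road names and coordinates are hypothetical, and planar coordinates are assumed for simplicity.

```python
import math


def project_to_segment(p, a, b):
    """Return the point on segment a-b closest to p (planar coordinates)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return a  # degenerate segment: both endpoints coincide
    # Parameter t locates the projection along the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def nearest_segment(p, segments):
    """Return (segment_id, snapped_point, distance) for the closest segment."""
    best_id, best_point, best_dist = None, None, math.inf
    for seg_id, (a, b) in segments.items():
        q = project_to_segment(p, a, b)
        d = math.dist(p, q)
        if d < best_dist:
            best_id, best_point, best_dist = seg_id, q, d
    return best_id, best_point, best_dist


# Hypothetical two-segment road network.
roads = {"main_st": ((0, 0), (10, 0)), "side_st": ((0, 5), (0, 15))}
match = nearest_segment((4, 1), roads)
```

Matching each point independently like this tends to fail at low sampling rates, because consecutive points can snap to disconnected roads; that weakness is what path-level methods are designed to address.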

Lecture Video Segmentation

In multimedia-based e-learning systems, the accessibility and searchability of most lecture video content is still insufficient due to the unscripted and spontaneous speech of the speakers. Moreover, this problem becomes even more challenging when the quality of such lecture videos is not sufficiently high. To extract the structural knowledge of a multi-topic lecture video and thus make it easily accessible, it is very desirable to divide each video into shorter clips by performing an automatic topic-wise video segmentation. To this end, we first present the ATLAS system, which leverages the visual content and transcription of a lecture video to determine segment boundaries. Subsequently, we present the TRACE system, which leverages existing knowledge bases such as Wikipedia in addition to visual content and transcription to determine segment boundaries. TRACE has two main contributions: (i) the extraction of a novel linguistic-based Wikipedia feature to segment lecture videos efficiently, and (ii) the investigation of the late fusion of video segmentation results derived from state-of-the-art algorithms. Specifically, for the late fusion, we combine confidence scores produced by the models constructed from visual, transcriptional, and Wikipedia features. This work results in the following publications.

  1. Rajiv Ratn Shah, Yi Yu, Anwar Dilawar Shaikh, Roger Zimmermann, "TRACE: A Linguistic-based Approach for Automatic Lecture Video Segmentation using Wikipedia Text," In IEEE International Symposium on Multimedia, pages 217-220. Miami, Florida, USA, 2016.
  2. Rajiv Ratn Shah, Yi Yu, Anwar Dilawar Shaikh, Suhua Tang, and Roger Zimmermann, "ATLAS: Automatic Temporal Segmentation and Annotation of Lecture Videos Based on Modelling Transition Time," In ACM Multimedia, pages 209-212. Orlando, Florida, USA, 2014.

News Video Uploading

An interesting recent trend, enabled by the ubiquitous availability of mobile devices, is that regular citizens report events which news providers then disseminate, e.g., CNN iReport. Often such news is captured in places with very weak network infrastructures, and it is imperative that a citizen journalist can quickly and reliably upload videos in the face of slow, unstable, and intermittent Internet access. We envision that some middleboxes are deployed to collect these videos over energy-efficient short-range wireless networks. Multiple videos may need to be prioritized, and then optimally transcoded and scheduled. In this study we introduce an adaptive middlebox design, called NEWSMAN, to support citizen journalists. NEWSMAN jointly considers two aspects under varying network conditions: (i) choosing the optimal transcoding parameters, and (ii) determining the uploading schedule for news videos. We design, implement, and evaluate an efficient scheduling algorithm to maximize a user-specified objective function. We conduct a series of experiments using trace-driven simulations, which confirm that our approach is practical and performs well. For instance, NEWSMAN outperforms the existing algorithms (i) by 12 times in terms of system utility (i.e., sum of utilities of all uploaded videos), and (ii) by 4 times in terms of the number of videos uploaded before their deadline. This work results in the following publication.

  1. Rajiv Ratn Shah, Mohamed Hefeeda, Roger Zimmermann, Khaled Harras, Cheng-Hsin Hsu, Yi Yu, "NEWSMAN: Uploading Videos over Adaptive Middleboxes to News Servers In Weak Network Infrastructures," In Springer Multimedia Modeling, (pages 100-113). Miami, Florida, USA, 2016.
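The scheduling problem above (upload as much utility as possible before each video's deadline) can be illustrated with a simple greedy heuristic. The sketch below is not the NEWSMAN algorithm: it assumes a constant bandwidth, ignores transcoding, and orders videos by utility per megabyte, all of which are simplifications chosen for illustration. The field names and numbers are hypothetical.

```python
def greedy_upload_schedule(videos, bandwidth):
    """Greedy deadline-aware upload schedule (illustrative heuristic only).

    videos:    list of dicts with keys name, size (MB), deadline (s), utility
    bandwidth: assumed constant upload rate in MB/s
    Returns ([(name, finish_time), ...], total_utility).
    """
    # Prefer videos that deliver the most utility per megabyte uploaded.
    order = sorted(videos, key=lambda v: v["utility"] / v["size"], reverse=True)
    clock = 0.0
    schedule = []
    total_utility = 0.0
    for video in order:
        finish = clock + video["size"] / bandwidth
        if finish <= video["deadline"]:  # skip videos that would miss their deadline
            schedule.append((video["name"], finish))
            clock = finish
            total_utility += video["utility"]
    return schedule, total_utility


# Hypothetical queue of three news videos on a 2 MB/s link.
queue = [
    {"name": "A", "size": 20, "deadline": 15, "utility": 10},
    {"name": "B", "size": 10, "deadline": 8, "utility": 8},
    {"name": "C", "size": 40, "deadline": 20, "utility": 12},
]
plan, utility = greedy_upload_schedule(queue, bandwidth=2.0)
```

The total utility returned here corresponds to the system utility mentioned above, i.e., the sum of utilities of all videos uploaded before their deadlines.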

SMS-based FAQ Retrieval

We provide a solution for matching SMS queries and FAQs in the Malayalam, Hindi and English languages. In order to perform matching between SMS queries and the FAQ database, we introduce an enhanced similarity score, a proximity score, an enhanced length score and an answer matching system. We introduce the stemming of terms and consider the effects of joining adjacent terms in the SMS query and FAQ to improve the similarity score. We propose a novel method to normalize FAQ and SMS tokens to improve the accuracy for the Hindi language. Moreover, we suggest a few character substitutions to handle errors in the SMS query. We demonstrate the effectiveness of our approach on many real-life FAQ datasets provided by FIRE from a number of different domains such as Health, Telecom, Insurance and Railway booking. Experimental results confirm that our solution for the SMS-based FAQ Retrieval monolingual task is very encouraging and among the top submissions, performing very well for English, Hindi and Malayalam. The Mean Reciprocal Rank (MRR) scores for our approach are 0.971, 0.973 and 0.761 for English, Hindi and Malayalam, respectively, in the SMS-based FAQ Retrieval monolingual task in FIRE 2012. Furthermore, our solution topped the task for the Hindi language with an MRR score of 0.971 in FIRE 2013. Our approach also performs very well for the English language in FIRE 2013, even though transcripts of speech queries are included in the test dataset along with the normal SMS queries. This work results in the following publications.

  1. Anwar Shaikh, Rajiv Ratn Shah, and Rahis Shaikh, "SMS based FAQ Retrieval for Hindi, English and Malayalam," In Forum for Information Retrieval and Evaluation, pages 9-16. ACM, New Delhi, India, 2013.
  2. Anwar Shaikh, Mukul Jain, Mukul Rawat, Rajiv Ratn Shah, and Manoj Kumar, "Improving Accuracy of SMS Based FAQ Retrieval System," In Multilingual Information Access in South Asian Languages, pages 142-156. Springer Berlin Heidelberg, 2013.
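The Mean Reciprocal Rank (MRR) used to report the results above averages, over all queries, the reciprocal of the rank at which the correct FAQ first appears. A minimal sketch follows; the function name and FAQ identifiers are hypothetical, and a query whose correct FAQ is never returned is assumed to contribute zero.

```python
def mean_reciprocal_rank(ranked_results, correct_answers):
    """Compute MRR over a set of queries.

    ranked_results:  one ranked list of candidate FAQ ids per query
    correct_answers: the correct FAQ id for each query, in the same order
    """
    total = 0.0
    for ranking, correct in zip(ranked_results, correct_answers):
        for position, candidate in enumerate(ranking, start=1):
            if candidate == correct:
                total += 1.0 / position  # reciprocal of the first matching rank
                break
    return total / len(ranked_results)


# Hypothetical rankings for three SMS queries.
rankings = [["q7", "q2", "q9"], ["q4", "q1"], ["q3", "q5"]]
gold = ["q7", "q1", "q8"]
mrr = mean_reciprocal_rank(rankings, gold)
```

An MRR close to 1 therefore means that the correct FAQ is ranked first for almost every query.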