Location: Gates B01, Stanford University (see the workshop directions for map, parking, shuttle, lunch, and WiFi info)

Time: Friday, June 20, 10:00am - 5:00pm     Agenda     Reserve Now

Keynote Speaker - Multimedia and Big Data

Ramesh Jain

University of California, Irvine

The growth of social media, the Internet of Things, wearable devices, mobile phones, and planetary-scale sensing is creating an unprecedented need, and opportunity, to assimilate spatio-temporally distributed heterogeneous data streams into actionable information. Although this data is commonly called Big Data, its more important dimensions are its heterogeneous and spatio-temporal nature. Multimedia has always addressed issues of volume, as well as the analysis, processing, communication, and display of spatio-temporal data. We believe that concepts and techniques from multimedia can lead the utilization of big data. This presentation first motivates and then presents a systematic approach for combining multimodal real-time heterogeneous big data into actionable situations. Specifically, an approach for modeling and recognizing situations from available data streams is implemented in EventShop to model and detect environmental situations of interest. Similarly, Personal EventShop is applied at the personal level to determine evolving personal situations. By combining personal and environmental situations, it becomes possible to connect people's needs to appropriate resources efficiently, effectively, and promptly. We will discuss this framework using some early examples.


Ramesh Jain is an entrepreneur, researcher, and educator.

He is a Donald Bren Professor in Information & Computer Sciences at the University of California, Irvine, where his research focuses on the Event Web and experiential computing. Earlier he served on the faculty of Georgia Tech, the University of California, San Diego, the University of Michigan, Ann Arbor, Wayne State University, and the Indian Institute of Technology, Kharagpur. He is a Fellow of ACM, IEEE, AAAI, IAPR, and SPIE. His current research interests are in processing massive numbers of geo-spatial heterogeneous data streams for building Smart Social Systems. He is the recipient of several awards, including the 2010 ACM SIGMM Technical Achievement Award.

Ramesh co-founded several companies, managed them in initial stages, and then turned them over to professional management. These companies include PRAJA, Virage, and ImageWare. Currently he is involved in Stikco and SnapViz. He has also been advisor to several other companies including some of the largest companies in media and search space.

Facebook Graph Search - A Brief Introduction

Junfeng He


Graph Search is a personalized search service that traverses Facebook's social graph of billions of users and entities, as well as trillions of contents and connections, in a fraction of a second, to help people better understand and explore the world around them. In this talk, I will briefly introduce the nuts and bolts of Graph Search and the challenges around indexing, retrieval, and ranking at this scale. I will also share our recent study of interesting user behaviors on Graph Search, and how those behaviors relate to users' demographics, social status, and other attributes.
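The core idea of personalized retrieval over a social graph can be illustrated in miniature: restrict an ordinary inverted index's results to content connected to the searcher. The class below is a toy sketch under that assumption; all names and structures are illustrative, not Facebook's actual design.

```python
from collections import defaultdict

class PersonalizedIndex:
    """Toy inverted index with a friend-restricted (personalized) search.
    Illustrative only -- not the real Graph Search architecture."""

    def __init__(self):
        self.postings = defaultdict(set)   # term -> {post_id}
        self.author = {}                   # post_id -> user_id

    def add_post(self, post_id, user_id, text):
        self.author[post_id] = user_id
        for term in text.lower().split():
            self.postings[term].add(post_id)

    def search(self, query, friend_ids):
        # Intersect per-term posting lists, then keep only posts
        # authored by the searcher's friends: the personalized filter.
        terms = query.lower().split()
        if not terms:
            return set()
        hits = set.intersection(*(self.postings[t] for t in terms))
        return {p for p in hits if self.author[p] in friend_ids}
```

At real scale the friend filter cannot be a post-hoc set lookup over all hits; the talk's indexing and ranking challenges stem largely from pushing such per-user constraints into the retrieval step itself.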


Junfeng He is currently a senior research scientist in the search ranking group of the Facebook Graph Search team.

He has led photo ranking and machine learning projects for Facebook Graph Search.

Before joining Facebook, he received his PhD from the DVMM Lab at Columbia University, working on large-scale similarity search and machine learning, with applications to multimedia search and recognition. He obtained both his bachelor's and master's degrees from Tsinghua University in China.

He has been active in the research areas of computer vision, multimedia, and machine learning in recent years, and has published around 30 papers in first-tier conferences and journals in these areas.

Large-scale, high-precision topic modeling on Twitter

Shuang Yang


We aim to provide a topic-aware, multi-channel experience on Twitter to facilitate content creation, discovery, and consumption. This requires the ability to organize, in real time, a continuous stream of sparse and noisy texts (i.e., tweets) into hundreds of topics with measurable and stringently high precision. We present a spectrum of techniques that contribute to a deployed tweet topic modeling system, including taxonomy construction, non-topical tweet detection, automatic labeled-data acquisition, evaluation with human computation, diagnostic and corrective learning, and, most importantly, high-precision topic modeling. The latter comprises a novel two-stage training algorithm for tweet text classification and a closed-loop inference mechanism for combining texts with additional sources of information. The resulting system achieves 93% precision at substantial overall coverage.


Dr. Yang is a Research Scientist at Twitter, working on machine learning infrastructure and social targeting. He earned his Ph.D. in Computer Science from the Georgia Institute of Technology in 2012.

He has published actively at leading academic conferences and in journals. He is the recipient of the ACM SIGIR 2011 Best Student Paper award, a UAI 2010 Best Student Paper award nomination, and the PAKDD 2008 Best Student Paper award.


The Role of Emotional and Social Signals in Multimedia

Hayley Hung


In light of the new area of emotional and social signals in multimedia, in this talk I will present an overview of past and current work and speculate about potential future directions. Although the use of social media data in a multimedia context is now widely accepted, the use of spontaneous emotional and social signals is less broadly adopted. The aim of this talk is to provide definitions, explain why this might be, and offer some suggestions for points of synergy.


Hayley Hung has been an Assistant Professor and Delft Technology Fellow in the Pattern Recognition and Bioinformatics group at TU Delft, The Netherlands, since 2013. She is also a visiting researcher at the VU University of Amsterdam, The Netherlands. From 2010 to 2013, she held a Marie Curie Intra-European Fellowship at the Intelligent Systems Lab of the University of Amsterdam, devising models to estimate various aspects of human behaviour in large social gatherings. From 2007 to 2010, she was a post-doctoral researcher at the Idiap Research Institute in Switzerland, working on methods to automatically estimate human interactive behaviour in meetings, such as dominance, cohesion, and deception. She obtained her PhD in Computer Vision from Queen Mary University of London, UK, in 2007 and her first degree, in Electrical and Electronic Engineering, from Imperial College, UK. Her research interests are in social computing, social signal processing, machine learning, ubiquitous computing, and human-computer interaction. She is a founding member and area chair of the area on emotional and social signals at ACM MM, served as Doctoral Symposium co-chair of ACM MM 2013, and has organised workshops on human behaviour understanding (InterHUB at AmI 2011, Measuring Behaviour in open spaces at MB 2012, and HBU at ACM MM 2013). She is also a special-issue guest editor for ACM Transactions on Interactive Intelligent Systems. She received first prize in the 2009 IET Written Premium competition and was nominated for an outstanding paper award at ICMI 2011.

Sensing, Understanding, and Shaping Human Behavior

Speaker: Vivek Singh


Today there are more than a trillion multimodal data points observing human behavior. This allows us to understand real-world social behavior at a scale and resolution not possible before. Based on multimodal interaction data (calls, Bluetooth, SMS, surveys) from a 'living lab' of 100+ users observed for over a year, this talk discusses multiple results on understanding social behavior. The results demonstrate the value of such data for understanding human behavior in spending and emotional well-being settings. They also indicate that it is possible to automatically detect "trusted" ties in social networks, which in turn can be critical for effecting behavior change in health and wellness settings.


Dr. Vivek Singh is a post-doctoral researcher at the MIT Media Lab. He holds a Ph.D. in Computer Science from the University of California, Irvine. He obtained his bachelor's and master's degrees in Computer Science from the National University of Singapore and was a full-time Lecturer at the Institute of Technical Education, Singapore for 4 years. His work has been presented at multiple leading venues and received two best paper awards. He was selected as one of the 'Emerging Leaders in Multimedia Research' by IBM Research Labs in 2009, and he recently won the 'Big Data for Social Good' datathon organized by Telefónica, the Open Data Institute, and MIT. His research interests lie at the intersection of Big Data, Computational Social Science, and Multimedia Information Systems.

Vivek will join Rutgers University as a faculty member starting Fall 2014.

Dispel the Clouds and See the Sun

Xian-Sheng Hua


We will introduce an automatic approach to mine useful knowledge from the large-scale noisy data on the Internet in the context of multimedia content analysis. Both algorithms and big data processing techniques will be discussed, together with a live demo to show how it works step by step.


Dr Xian-Sheng Hua has been a senior researcher at Microsoft Research Redmond since 2013, working on Web-scale image and video understanding and search. Before that, he was a Principal Research and Development Lead in Multimedia Search for the Microsoft search engine Bing from 2011 to 2012, where he led a team that designed and delivered leading-edge media understanding, indexing, and searching features, and a researcher at Microsoft Research Asia from 2001 to 2010. Dr Hua received his BS in 1996 and his PhD in applied mathematics in 2001, both from Peking University, Beijing.

He serves or has served as an associate editor of IEEE Transactions on Multimedia and of ACM Transactions on Intelligent Systems and Technology, among others. He served as a program co-chair for IEEE ICME 2013, ACM Multimedia 2012, and IEEE ICME 2012. He was honored as one of the recipients of the prestigious 2008 MIT Technology Review TR35 Young Innovator Award for his outstanding contributions to video search. He won the Best Paper and Best Demonstration Awards at ACM Multimedia 2007, the Best Student Paper Award at the ACM Conference on Information and Knowledge Management 2009, the Best Paper Award at the International Conference on MultiMedia Modeling 2010, and the 2014 Best Paper Award of IEEE Transactions on Circuits and Systems for Video Technology.

When textual and visual information join forces for multimedia retrieval

Benoit Huet


Currently, popular search engines retrieve documents on the basis of text information. Integrating visual information into text-based search for video and image retrieval, however, remains a hot research topic. In this paper, we propose and evaluate a video search framework that uses visual information to enrich classic text-based video retrieval. The framework extends conventional text-based search by fusing text and visual scores, obtained from video subtitles (or automatic speech recognition) and visual concept detectors respectively. We attempt to overcome the so-called semantic gap problem by automatically mapping query text to semantic concepts. With the proposed framework, we endeavor to show experimentally, on a set of real-world scenarios, that visual cues can effectively contribute to improving the quality of video retrieval. Experimental results show that mapping text-based queries to visual concepts improves the performance of the search system.

Moreover, when the relevant visual concepts for a query are appropriately selected, a very significant improvement in the system's performance is achieved.
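The fusion of text and visual scores described above can be sketched in a few lines. The linear combination and the weight `alpha` below are illustrative assumptions, not the exact scheme used in the talk.

```python
def fused_score(text_score, visual_score, alpha=0.7):
    """Linear late fusion of a subtitle/ASR text score and a visual
    concept detector score, both assumed normalized to [0, 1].
    The linear form and alpha=0.7 are illustrative choices."""
    return alpha * text_score + (1 - alpha) * visual_score

def rank(videos, alpha=0.7):
    """Rank video ids by fused score, best first.
    videos: {video_id: (text_score, visual_score)}."""
    return sorted(videos,
                  key=lambda v: fused_score(*videos[v], alpha),
                  reverse=True)
```

Shifting `alpha` moves the system between pure text retrieval (`alpha=1`) and pure concept-based retrieval (`alpha=0`); the reported gains come from the middle ground, where visual evidence reorders results that text alone ranks poorly.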


Dr. Benoit Huet is an Assistant Professor in the multimedia information processing group at Eurecom (France). He received his BSc degree in computer science and engineering from the École Supérieure de Technologie Électrique (Groupe ESIEE, France) in 1992. In 1993, he was awarded the MSc degree in Artificial Intelligence from the University of Westminster (UK) with distinction, where he then spent two years working as a research and teaching assistant. He received his DPhil degree in Computer Science from the University of York (UK) for his research on object recognition from large databases, and was awarded the HDR (Habilitation to Direct Research) from the University of Nice Sophia Antipolis, France, in October 2012 on the topic of Multimedia Content Understanding: Bringing Context to Content. He is an associate editor for IEEE Multimedia, Multimedia Tools and Applications (Springer), and Multimedia Systems (Springer), and has been a guest editor for a number of special issues (EURASIP Journal on Image and Video Processing, IEEE Multimedia). He regularly serves on the technical program committees of the top conferences of the field (ACM MM/ICMR, IEEE ICME/ICIP). He chairs the IEEE MMTC Interest Group on Visual Analysis, Interaction and Content Management (VAIG) and is vice-chair of the IAPR Technical Committee 14 on Signal Analysis for Machine Intelligence. His research interests include computer vision, large-scale multimedia data mining and indexing (still and/or moving images), content-based retrieval, semantic labelling and annotation of multimedia content, multimodal fusion, socially-aware multimedia, and pattern recognition. He has co-authored over 120 papers in books, journals, and international conferences.

The Role of Knowledge in Large-Scale Multimedia Analysis

Tat-Seng Chua


The emergence of social networking sites has given rise to a huge amount of user-generated content. Much of this content is increasingly multimedia in nature, often with no relevant text annotations. The analysis of such multimedia collections by jointly modeling textual, visual, and social features has been attempted, but with limited success. This paper explores such joint analysis of large-scale multimedia contents supplemented with domain knowledge automatically acquired from the Web. We demonstrate the effectiveness of this framework on the tasks of attribute learning, video event detection, and modeling of curated images, by supplementing the analysis with the SemanticNet, FrameNet, and Wikipedia knowledge resources respectively.


Dr Chua is the KITHCT Chair Professor at the School of Computing, National University of Singapore. He was the Acting and Founding Dean of the School during 1998-2000. His main research interest is in multimedia information retrieval and social media analysis. In particular, his research focuses on the extraction, retrieval, and question-answering (QA) of text, video, and live media arising from the Web and social networks. He is the co-Director of a multi-million-dollar joint center between NUS and Tsinghua University developing technologies for a live social observatory.

Dr Chua is active in the international research community. He was conference co-chair of ACM Multimedia 2005, ACM CIVR 2005, and ACM SIGIR 2008, and serves on the editorial boards of several journals. He is a member of the steering committees of ICMR (International Conference on Multimedia Retrieval) and the Multimedia Modeling conference series. He is a director of several companies, including one publicly listed in Singapore.

Large-scale Video Classification with Convolutional Neural Networks

Speaker: Andrej Karpathy


Convolutional Neural Networks (CNNs) have been established as a powerful class of models for image recognition problems. Encouraged by these results, we provide an extensive empirical evaluation of CNNs on large-scale video classification using a new dataset of 1 million YouTube videos belonging to 487 classes. We study multiple approaches for extending the connectivity of a CNN in time domain to take advantage of local spatio-temporal information and suggest a multiresolution, foveated architecture as a promising way of speeding up the training. Our best spatio-temporal networks display significant performance improvements compared to strong feature-based baselines (55.3% to 63.9%), but only a surprisingly modest improvement compared to single-frame models (59.3% to 60.9%). We further study the generalization performance of our best model by retraining the top layers on the UCF-101 Action Recognition dataset and observe significant performance improvements compared to the UCF-101 baseline model (63.3% up from 43.9%).
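One of the time-connectivity variants such work compares is "early fusion": stacking several consecutive frames along the channel axis so a standard 2-D CNN's first layer sees the whole clip at once. The helper below sketches only that data-layout step (the tensor reshaping, not a full network), under the assumption of clips shaped `(T, H, W, 3)`.

```python
import numpy as np

def early_fusion(clip):
    """Stack T consecutive RGB frames along the channel axis, turning a
    (T, H, W, 3) clip into a single (H, W, 3*T) input for a 2-D CNN.
    Sketch of the early-fusion layout only; not the paper's full model."""
    t, h, w, c = clip.shape
    # Move time next to channels, then merge the two axes so that each
    # output pixel holds frame-0 channels first, then frame-1, and so on.
    return clip.transpose(1, 2, 0, 3).reshape(h, w, t * c)
```

The single-frame baseline in the abstract corresponds to `T = 1`; the modest gap between it and the spatio-temporal models suggests how much of the signal in this benchmark is carried by individual frames.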

Performance benefit of single assignment languages for parallel execution

Speaker: Carsten Griwodz


We examine the run-time performance advantages a compiler can draw from compiling a kernel-based high-level language that allows only single assignment of virtual memory, compared with a straightforward translation to TBB and OpenMP. The generated run-time may itself rely on pthreads, TBB, or OpenMP. This is promising early work.


Neuromedia: Brain-Computer Interactive Narratives

Marc Cavazza


There has been growing interest in developing interactive media that respond to the spectator's emotional response. This endeavour faces both engineering challenges and fundamental ones, namely the pursuit of unified models of user experience and affective filmic response. Recently, we have extended our work in 3D animated Interactive Storytelling to incorporate Brain-Computer Interfaces (BCI) as an input modality. Our objective was to explore a framework integrating contemporary theories of filmic emotions, recent neuroscience findings, and affective BCI. We have developed an Interactive Narrative whose plot is inspired by popular medical dramas. The story is designed to elicit empathetic responses towards the lead character, who faces considerable difficulties in her everyday work and is embroiled in professional and personal arguments. As her situation deteriorates, the user can support her mentally by expressing sympathy or positive thoughts, which alters the course of action and contributes to a positive outcome of the narrative. Our BCI is based on pre-frontal alpha EEG asymmetry, originally reported by Davidson as a marker of approach/withdrawal, hence a higher-order affective dimension than those routinely used in affective computing. We use alpha asymmetry under a Neurofeedback (NF) paradigm to support conspicuous interaction. In addition, the usability of alpha asymmetry for NF has been established by previous clinical studies.

The system has been fully implemented and has undergone a preliminary proof-of-concept analysis using fMRI, whose results were compatible with our initial intuitions. We recently carried out a first usability study with 36 subjects, which showed that, even with very minimal training, a significant fraction of users could successfully interact with the system. The study also shed light on the cognitive strategies users adopted for NF. The talk will describe the implementation of the system, report results from the various experiments, and be illustrated with footage from the interactive narrative. This research was carried out at Teesside University in partnership with the Functional Brain Center of the Tel Aviv Sourasky Medical Center.


Marc Cavazza is a Professor at Teesside University (UK), where he has conducted research on Interactive Storytelling since 2001. His group has introduced several innovations (immersive storytelling, character-centric approaches, affective interaction, narrative trajectory control, and more) and has produced some of the most cited publications on the topic. His recent work has explored affective multimodal interfaces, including subliminal interfaces and Brain-Computer Interfaces. He has coordinated the FP7 European Network of Excellence dedicated to Interactive Storytelling and is currently involved in two FP7 Future and Emerging Technologies (FET) projects. He has authored over 200 publications and supervised the development of numerous prototypes presented over the years at AAMAS, ACM Multimedia, ACM IUI, and ICAPS; these have received the Best Demonstration Award at AAMAS 2010 and the Best Application Award at ICAPS 2013. He holds an MD and a PhD, both from Université Paris-Diderot.

Virtual Kitchens for Teaching Multimedia Analysis

Speaker: Martha Larson


Strengthening the field of multimedia means effectively training each new generation of multimedia researchers. We believe in introducing students to multimedia analysis as early as possible in the computer science curriculum. Undergraduate students benefit from hands-on assignments where they can take the first steps in implementing multimedia analysis algorithms and understand how they work by making adaptations and comparisons. Multimedia analysis labs are a challenging form of instruction because they are time-consuming to develop, and it is difficult to compare students' implementations across different computing environments. For this reason, we look to an initiative of the speech technology community, the "Speech Recognition Virtual Kitchen." This initiative has introduced the idea that a virtual machine can provide a standard "kitchen" environment with all the necessary "ingredients" students need to create basic experiments. We describe our experiences using Virtual Kitchens in the first year of our new Multimedia Analysis course, and give an outlook on how the sharing of Virtual Kitchens among institutions can, in the future, represent an important teaching resource for the multimedia community.

What's BAMMF?

BAMMF is a Bay Area Multimedia Forum series. Experts from both academia and industry are invited to exchange ideas and information through talks, tutorials, panel discussions, and networking sessions. Forum topics include emerging areas in multimedia, advances in algorithms and development, demonstrations of new inventions, product innovation, business opportunities, and more. If you are interested in giving a talk at the forum, please contact us.

Subscribe to BAMMF