BAY AREA MULTIMEDIA FORUM (BAMMF)

Location: George E. Pake Auditorium, 3333 Coyote Hill Road, Palo Alto, CA 94304

Time: Nov. 20th, Thursday, 1:00pm - 5:00pm

The November meeting is hosted by Guest Organizer:

Jianchao Yang (Adobe)

Bio: Jianchao Yang is a research scientist in the Imagination Lab at Adobe Research, San Jose, California. He received his M.S. and Ph.D. degrees from the Electrical and Computer Engineering (ECE) Department of the University of Illinois at Urbana-Champaign (UIUC) in 2011, under the supervision of Professor Thomas S. Huang at the Beckman Institute. Before that, he received his Bachelor's degree from the EEIS Department of the University of Science and Technology of China (USTC) in 2006.

His research interests are in the broad area of computer vision, machine learning, and image processing. Specifically, he has extensive experience in the following research areas: image categorization, object recognition and detection, image retrieval; image and video super-resolution, denoising and deblurring; face recognition and soft biometrics; sparse coding and sparse representation; unsupervised learning, supervised learning, and deep learning.

Speakers for the November meeting are as follows; details of their talks are given below:

Li DENG (Deep Learning Technology Center, Microsoft Research, Redmond, USA)

Title: Industrial Impact of Deep Learning - From Speech Recognition to Language and Multimodal Processing

Bio: Li Deng received his Ph.D. from the University of Wisconsin-Madison. He was an assistant, associate, and full professor (1989-1999) at the University of Waterloo, Ontario, Canada, and then joined Microsoft Research, Redmond, where he is currently a Principal Research Manager of its Deep Learning Technology Center. Since 2000, he has also been an affiliate full professor at the University of Washington, Seattle. He has been granted over 60 US and international patents, and has received numerous awards and honors from IEEE, ISCA, ASA, and Microsoft, including the IEEE SPS Best Paper Award (2013) on deep neural nets for large-vocabulary speech recognition. He has authored or co-authored five books, including the two latest: Deep Learning: Methods and Applications (2014) and Automatic Speech Recognition: A Deep Learning Approach (Springer, 2014). He is a Fellow of the Acoustical Society of America, a Fellow of the IEEE, and a Fellow of the ISCA. He served as Editor-in-Chief of the IEEE Signal Processing Magazine (2009-2011) and is currently Editor-in-Chief of the IEEE Transactions on Audio, Speech and Language Processing. His recent research interests and activities have focused on deep learning and machine intelligence applied to large-scale text analysis and to speech/language/image multimodal processing, advancing his earlier work with collaborators on speech recognition using deep neural networks since 2009.

Abstract: Since 2010, deep neural networks have started making a real impact in the speech recognition industry, building upon earlier work on (shallow) neural nets and (deep) graphical models developed by both the speech and machine learning communities. This keynote will first reflect on the historical path to this transformative success. The role of well-timed academic-industrial collaboration will be highlighted, as will the advances in big data and big compute and the seamless integration between application-domain knowledge of speech and general principles of deep learning. Then, an overview will be given of the sweeping achievements of deep learning in speech recognition since its initial success in 2010 (as well as in image recognition since 2012). These achievements have resulted in across-the-board, industry-wide deployment of deep learning. The final part of the talk will focus on applications of deep learning to large-scale language/text and multimodal processing, a more challenging area where potentially much greater industrial impact than in speech and image recognition is emerging.

Ronan Collobert (Facebook)

Title: Applied Deep Learning

Bio: I joined Facebook in September 2014 as a Research Scientist in the AI Research Group. For the four years before that, I was a researcher at the Idiap Research Institute in Switzerland, leading the Applied Machine Learning group. Previously, I spent six years at NEC Labs in Princeton, NJ. My current interests are around deep learning in general, with applications in natural language, image, and speech processing. I also hack on Torch (a machine learning library) in my free time. I received my PhD from the University of Pierre & Marie Curie in Paris, completed under the supervision of the Bengio brothers (Idiap in Switzerland, and the University of Montreal).

Abstract: I am interested in machine learning algorithms which can be applied in real-life applications and which can be trained on "raw data". Specifically, I prefer to trade simple "shallow" algorithms with task-specific handcrafted features for more complex ("deeper") algorithms trained on raw features. In that respect, I will present several general deep learning architectures which excel on various Natural Language, Speech, and Image Processing tasks. I will look into specific issues related to each application domain, and will attempt to propose general solutions for each use case.
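As a loose illustration of the "raw data" idea (a sketch under assumptions, not code from the talk), the snippet below scores tags for the center word of a three-word window by looking up word embeddings and passing them through a small neural network, with no handcrafted linguistic features; all sizes, names, and the random parameters are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    vocab_size, emb_dim, window, hidden, n_tags = 100, 8, 3, 16, 5

    E  = rng.normal(scale=0.1, size=(vocab_size, emb_dim))       # word embedding table (learned in practice)
    W1 = rng.normal(scale=0.1, size=(hidden, window * emb_dim))  # hidden layer weights
    W2 = rng.normal(scale=0.1, size=(n_tags, hidden))            # output layer weights

    def score_window(word_ids):
        """Forward pass: raw word ids -> tag scores, no handcrafted features."""
        x = E[word_ids].reshape(-1)      # concatenate the window's embeddings
        h = np.tanh(W1 @ x)              # nonlinear hidden representation
        return W2 @ h                    # one score per tag for the center word

    print(score_window(np.array([3, 17, 42])))  # toy window of three word ids

In a real system the embeddings and weights are trained jointly from data, which is precisely what replaces the feature engineering step.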

Yangqing Jia (Google)

Title: Brewing a Deeper Understanding of Images

Bio: Yangqing Jia obtained his BS and MS degrees from Tsinghua University and his PhD degree from UC Berkeley, advised by Professor Trevor Darrell. He is currently a research scientist at Google. His main interests include large-scale and cognitive-science-inspired computer vision, efficient learning of state-of-the-art visual features, and parallel computation in vision applications.

Abstract: In this talk I will introduce recent developments in the image recognition field from two perspectives: as a researcher and as an engineer. In the first part I will describe our recent entry "GoogLeNet", which won the ImageNet 2014 challenge, including the motivation behind the model and the knowledge learned from its inception. In the second part, I will dive into the practical details of Caffe, an open-source deep learning library I created at UC Berkeley, and show how one could use the toolkit for a quick start in deep learning as well as for integration and deployment in real-world applications.
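For readers who want a head start with Caffe before the talk, here is a minimal forward-pass sketch using its Python interface. It assumes a working pycaffe installation and an already-trained model; the file names ('deploy.prototxt', 'weights.caffemodel', 'cat.jpg'), the input size, and the blob names ('data', 'prob') are placeholders that depend on the particular model, and real ImageNet models additionally expect mean subtraction and RGB-to-BGR channel swapping, which are omitted here.

    import numpy as np
    import caffe

    caffe.set_mode_cpu()                               # or caffe.set_mode_gpu()
    net = caffe.Net('deploy.prototxt',                 # network definition (placeholder path)
                    'weights.caffemodel',              # pretrained parameters (placeholder path)
                    caffe.TEST)

    # Load and resize an image; load_image returns HxWx3 floats in [0, 1].
    img = caffe.io.load_image('cat.jpg')
    img = caffe.io.resize_image(img, (224, 224))
    blob = img.transpose(2, 0, 1)[np.newaxis, ...]     # -> 1 x 3 x 224 x 224

    # Feed the image into the input blob and run a forward pass.
    net.blobs['data'].reshape(*blob.shape)
    net.blobs['data'].data[...] = blob
    out = net.forward()
    print(out['prob'].argmax())                        # index of the highest-scoring class

The talk itself will cover the library in much more detail, including training and deployment in real applications.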

Richard Socher (Stanford)

Title: Recursive Deep Learning for Modeling Compositional and Grounded Meaning

Bio: Richard Socher obtained his PhD from Stanford where he worked with Chris Manning and Andrew Ng. His research interests are machine learning for NLP and vision. He is interested in developing new deep learning models that learn useful features, capture compositional structure in multiple modalities and perform well across different tasks. He was awarded the 2011 Yahoo! Key Scientific Challenges Award, the Distinguished Application Paper Award at ICML 2011, a Microsoft Research PhD Fellowship in 2012 and a 2013 "Magic Grant" from the Brown Institute for Media Innovation.

Abstract: Great progress has been made in natural language processing thanks to many different algorithms, each often specific to one application. Most learning algorithms force language into simplified representations such as bag-of-words or fixed-size windows, or require human-designed features. I will introduce models based on recursive neural networks that can learn linguistically plausible representations of language and reason over knowledge bases. These methods jointly learn compositional features and grammatical sentence structure for parsing or phrase-level sentiment prediction. They can also be used to represent the visual meaning of a sentence, which can be used to find images based on query sentences or to describe images with a more complex description than single object names. Besides achieving state-of-the-art performance, the models capture interesting phenomena in language such as compositionality. For instance, people easily see that the "with" phrase in "eating spaghetti with a spoon" specifies a way of eating, whereas in "eating spaghetti with some pesto" it specifies the dish. I show that my model solves these prepositional attachment problems well thanks to its distributed representations. In sentiment analysis, a new tensor-based recursive model learns different types of high-level negation and how they can change the meaning of longer phrases with many positive words. The models also learn that when contrastive conjunctions such as "but" are used, the sentiment of the phrase following them usually dominates.
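The basic building block of these recursive models is a single composition function applied at every node of a parse tree: a parent vector is computed from its children's vectors. Below is a minimal numpy sketch of that step, using toy dimensions and random word vectors rather than a trained model, and the plain matrix form of the composition rather than the tensor variant mentioned in the abstract.

    import numpy as np

    d = 4                                          # toy embedding dimension
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(d, 2 * d))     # composition matrix (learned in practice)
    b = np.zeros(d)

    def compose(left, right):
        """Parent vector = tanh(W [left; right] + b)."""
        return np.tanh(W @ np.concatenate([left, right]) + b)

    # Toy binarized parse of "eating spaghetti with a spoon":
    # ((eating spaghetti) (with (a spoon)))
    vec = {w: rng.normal(scale=0.1, size=d)
           for w in ["eating", "spaghetti", "with", "a", "spoon"]}
    vp = compose(vec["eating"], vec["spaghetti"])
    pp = compose(vec["with"], compose(vec["a"], vec["spoon"]))
    sentence = compose(vp, pp)                     # one vector for the whole phrase
    print(sentence)

Because every phrase, from single words up to whole sentences, ends up as a vector in the same space, the same machinery can support parsing decisions, per-node sentiment labels, and grounding against image representations.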

Jia Li (Yahoo Research)

Bio: I am a research scientist at Yahoo! Research. My research interests are computer vision, social network analysis, machine learning, and multimedia analysis. I received my Ph.D. degree from the Computer Science Department at Stanford University. I was the leader of the OPTIMOL team, which won first prize in the Semantic Robotics Vision Challenge in 2007. I served as the volunteers chair for CVPR 2010 and on the travel funding committee for ACM Multimedia Systems 2014. I co-organized the 1st IEEE workshop on Visual Scene Understanding (ViSU) in conjunction with CVPR 2009, as well as the Fine-Grained Challenge at ICCV 2013.

What's BAMMF?

BAMMF is a Bay Area Multimedia Forum series. Experts from both academia and industry are invited to exchange ideas and information through talks, tutorials, panel discussions, and networking sessions. Topics of the forum include emerging areas in multimedia, advancements in algorithms and development, demonstrations of new inventions, product innovation, business opportunities, etc. If you are interested in giving a talk at the forum, please contact us.


Subscribe to BAMMF


http://www.computer.org/portal/web/tcmc
http://www.computer.org/portal/web/tandc/tcsem
http://www.sigmm.org/

http://www.fxpal.com
PARC, a Xerox Company