BAY AREA MULTIMEDIA FORUM (BAMMF)

Location: George E. Pake Auditorium, 3333 Coyote Hill Road, Palo Alto, CA 94304

Time: Nov. 20th, Thursday, 1:00pm - 5:00pm

Li Deng (Deep Learning Technology Center, Microsoft Research, Redmond, USA)

Title: Industrial Impact of Deep Learning - From Speech Recognition to Language and Multimodal Processing

Bio: Li Deng received his Ph.D. from the University of Wisconsin-Madison. He was an assistant, associate, and full professor (1989-1999) at the University of Waterloo, Ontario, Canada, and then joined Microsoft Research, Redmond, where he is currently Principal Research Manager of its Deep Learning Technology Center. Since 2000, he has also been an affiliate full professor at the University of Washington, Seattle. He has been granted over 60 US and international patents and has received numerous awards and honors from the IEEE, ISCA, ASA, and Microsoft, including the IEEE SPS Best Paper Award (2013) for work on deep neural networks for large-vocabulary speech recognition. He has authored or co-authored five books, including the two most recent: Deep Learning: Methods and Applications (2014) and Automatic Speech Recognition: A Deep Learning Approach (Springer, 2014). He is a Fellow of the Acoustical Society of America, a Fellow of the IEEE, and a Fellow of ISCA. He served as Editor-in-Chief of the IEEE Signal Processing Magazine (2009-2011) and currently serves as Editor-in-Chief of the IEEE Transactions on Audio, Speech and Language Processing. His recent research interests and activities focus on deep learning and machine intelligence applied to large-scale text analysis and to speech/language/image multimodal processing, advancing his earlier work with collaborators on speech recognition using deep neural networks since 2009.

Abstract: Since 2010, deep neural networks have started making a real impact in the speech recognition industry, building upon earlier work on (shallow) neural nets and (deep) graphical models developed by both the speech and machine learning communities. This keynote will first reflect on the historical path to this transformative success. The role of well-timed academic-industrial collaboration will be highlighted, as will the advances in big data, big compute, and the seamless integration of application-domain knowledge of speech with general principles of deep learning. An overview will then be given of the sweeping achievements of deep learning in speech recognition since its initial success in 2010 (and in image recognition since 2012), achievements that have resulted in across-the-board, industry-wide deployment of deep learning. The final part of the talk will focus on applications of deep learning to large-scale language/text and multimodal processing, a more challenging area where industrial impact potentially much greater than in speech and image recognition is emerging.

Yangqing Jia (Google)

Title: Brewing a Deeper Understanding of Images

Bio: Yangqing Jia obtained his BS and MS degrees from Tsinghua University and his PhD from UC Berkeley, advised by Professor Trevor Darrell. He is currently a research scientist at Google. His main interests include large-scale and cognitive-science-inspired computer vision, efficient learning of state-of-the-art visual features, and parallel computation in vision applications.

Abstract: In this talk I will introduce recent developments in the image recognition field from two perspectives: as a researcher and as an engineer. In the first part I will describe our recent entry "GoogLeNet", which won the ImageNet 2014 challenge, including the motivation behind the model and the knowledge learned from its inception. In the second part, I will dive into the practical details of Caffe, an open-source deep learning library I created at UC Berkeley, and show how one could use the toolkit for a quick start in deep learning as well as for integration and deployment in real-world applications.
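As a rough illustration of the kind of quick start the abstract refers to, the sketch below uses Caffe's Python interface to load a pretrained network and classify a single image. It is a minimal sketch, not material from the talk: the file names (deploy.prototxt, model.caffemodel, mean.npy, cat.jpg) and the output blob name 'prob' are placeholders that depend on the particular model you use.

```python
# Minimal pycaffe sketch: load a pretrained network and classify one image.
# All file names below are placeholders; substitute your own model
# definition, trained weights, mean file, and test image.
import numpy as np
import caffe

caffe.set_mode_cpu()                          # or caffe.set_mode_gpu()

net = caffe.Net('deploy.prototxt',            # model architecture
                'model.caffemodel',           # pretrained weights
                caffe.TEST)                   # inference mode

# Preprocess: HxWxC float image -> CxHxW, mean-subtracted, BGR, 0-255 scale.
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_mean('data', np.load('mean.npy').mean(1).mean(1))
transformer.set_raw_scale('data', 255)
transformer.set_channel_swap('data', (2, 1, 0))

image = caffe.io.load_image('cat.jpg')
net.blobs['data'].data[...] = transformer.preprocess('data', image)

out = net.forward()                           # run the network
# 'prob' assumes the model's softmax output blob is named that way.
print('Predicted class index:', out['prob'][0].argmax())
```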

Ronan Collobert (Facebook)

Title: Applied Deep Learning

Bio: I joined Facebook in September 2014 as a Research Scientist in the AI Research Group. For the four years before that I was a researcher at the Idiap Research Institute in Switzerland, leading the Applied Machine Learning group, and prior to that I spent six years at NEC Labs in Princeton, NJ. My current interests are around deep learning in general, with applications in natural language, image and speech processing. I also hack on Torch (a machine learning library) in my free time. I received my PhD from Pierre & Marie Curie University in Paris, under the supervision of the Bengio brothers (Idiap in Switzerland and the University of Montreal).

Abstract: I am interested in machine learning algorithms which can be applied in real-life applications and which can be trained on "raw data". Specifically, I prefer to trade simple "shallow" algorithms with task-specific handcrafted features for more complex ("deeper") algorithms trained on raw features. In that respect, I will present several general deep learning architectures which excel on various Natural Language, Speech and Image Processing tasks. I will look into specific issues related to each application domain and will attempt to propose general solutions for each use case.

Richard Socher (Stanford)

Title: Compositional Language and Visual Understanding

Bio: Richard Socher obtained his PhD from Stanford where he worked with Chris Manning and Andrew Ng. His research interests are machine learning for NLP and vision. He is interested in developing new deep learning models that learn useful features, capture compositional structure in multiple modalities and perform well across different tasks. He was awarded the 2011 Yahoo! Key Scientific Challenges Award, the Distinguished Application Paper Award at ICML 2011, a Microsoft Research PhD Fellowship in 2012 and a 2013 "Magic Grant" from the Brown Institute for Media Innovation.

Abstract: In this talk, I will describe deep learning algorithms that learn representations for language that are useful for solving a variety of complex language tasks. I will focus on 3 projects:

- Contextual sentiment analysis (e.g. having an algorithm that actually learns what's positive in this sentence: "The Android phone is better than the iPhone")

- Question answering to win trivia competitions (like IBM Watson's Jeopardy system but with one neural network)

- Multimodal sentence-image embeddings to find images that visualize sentences and vice versa (with a fun demo!)

All three tasks are solved with a similar type of recursive neural network algorithm.
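For readers unfamiliar with recursive neural networks, the toy sketch below shows the core idea these projects share: two child vectors are combined into a parent vector with one shared, learned weight matrix, and the same rule is applied bottom-up over a parse tree. It is an illustrative sketch, not code from the talk; the dimensionality, weights, and example phrase are made up.

```python
# Toy sketch of recursive composition: a parent vector is computed from its
# two children with one shared weight matrix, applied bottom-up over a
# (binary) parse tree.
import numpy as np

d = 4                                        # word-vector size (made up)
rng = np.random.default_rng(0)
W = rng.standard_normal((d, 2 * d)) * 0.1    # shared composition matrix
b = np.zeros(d)

def compose(left, right):
    """Combine two child vectors into a parent vector."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

# Hypothetical word vectors for "the Android phone"
vectors = {w: rng.standard_normal(d) for w in ["the", "Android", "phone"]}

# Bottom-up over the tree ((the Android) phone)
parent = compose(vectors["the"], vectors["Android"])
phrase = compose(parent, vectors["phone"])
print(phrase)   # phrase representation, reusable for sentiment, QA, etc.
```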

November meeting is hosted by Guest Organizers:

Jianchao Yang (Adobe)

Bio: Jianchao Yang is a research scientist in the Imagination Lab at Adobe Research, San Jose, California. He received his M.S. and Ph.D. degrees from the Electrical and Computer Engineering (ECE) Department of the University of Illinois at Urbana-Champaign (UIUC) in 2011, under the supervision of Professor Thomas S. Huang at the Beckman Institute. Before that, he received his Bachelor's degree from the EEIS Department of the University of Science and Technology of China (USTC) in 2006.

His research interests are in the broad area of computer vision, machine learning, and image processing. Specifically, he has extensive experience in the following research areas: image categorization, object recognition and detection, image retrieval; image and video super-resolution, denoising and deblurring; face recognition and soft biometrics; sparse coding and sparse representation; unsupervised learning, supervised learning, and deep learning.

Eugene Bart (PARC)

Bio: Evgeniy Bart is a member of the research staff at PARC, Palo Alto, California. He received his Ph.D. from the Weizmann Institute in 2004, under the supervision of Prof. Shimon Ullman. Prior to that, he received his B.Sc. in physics and computer science from Tel Aviv University. His research interests are in machine learning, computer vision, and biological vision.


What's BAMMF?

BAMMF is a Bay Area Multimedia Forum series. Experts from both academia and industry are invited to exchange ideas and information through talks, tutorials, posters, panel discussions and networking sessions. Topics of the forum include emerging areas in vision, audio, touch, speech, text, various sensors, human-computer interaction, natural language processing, machine learning, media-related signal processing, communication, and cross-media analysis. Talks at the forum may cover advances in algorithms and development, demonstrations of new inventions, product innovation, business opportunities, and more. If you are interested in giving a presentation at the forum, please contact us.



http://www.computer.org/portal/web/tcmc
http://www.computer.org/portal/web/tandc/tcsem
http://www.sigmm.org/

http://www.fxpal.com
PARC, a Xerox Company