Location: George E. Pake Auditorium, 3333 Coyote Hill Road, Palo Alto, CA 94304

Time: Friday, August 21, 2015, 1:30pm - 5:00pm

Zhengyou Zhang (Research Manager / Principal Researcher, Microsoft Research)

Bio: Zhengyou Zhang received the B.S. degree in electronic engineering from Zhejiang University, Hangzhou, China, in 1985, the M.S. degree in computer science from the University of Nancy, Nancy, France, in 1987, and the Ph.D. degree in computer science and the Doctorate of Science (Habilitation à diriger des recherches) from the University of Paris XI, Paris, France, in 1990 and 1994, respectively.

He is a Principal Researcher with Microsoft Research, Redmond, WA, USA, and the Research Manager of the “Multimedia, Interaction, and Experiences” group. Before joining Microsoft Research in March 1998, he was a Senior Research Scientist with INRIA (French National Institute for Research in Computer Science and Control), France. In 1996-1997, he spent a one-year sabbatical as an Invited Researcher with the Advanced Telecommunications Research Institute International (ATR), Kyoto, Japan. He has published over 200 papers in refereed international journals and conferences, and has coauthored the following books: 3-D Dynamic Scene Analysis: A Stereo Based Approach (Springer-Verlag, 1992); Epipolar Geometry in Stereo, Motion and Object Recognition (Kluwer, 1996); Computer Vision (Chinese Academy of Sciences, 1998, 2003, in Chinese); Face Detection and Adaptation (Morgan and Claypool, 2010); and Face Geometry and Appearance Modeling (Cambridge University Press, 2011). He is an inventor on more than 100 issued patents. He has given a number of keynotes at international conferences and invited talks at universities.

Dr. Zhang is an IEEE Fellow and an ACM Fellow. He is the Founding Editor-in-Chief of the IEEE Transactions on Autonomous Mental Development, and has served on the editorial boards of IEEE TPAMI, IEEE TCSVT, IEEE TMM, IJCV, IJPRAI, and MVA, among others. He has served as a program chair, a general chair, and a program committee member for numerous international conferences in the areas of computer vision, audio and speech signal processing, multimedia, human-computer interaction, and autonomous mental development. He is serving as a General Chair of the International Conference on Multimodal Interaction (ICMI) 2015 and a General Chair of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017. He received the IEEE Helmholtz Test of Time Award at ICCV 2013 for his 1999 paper on camera calibration, now known as Zhang’s method.

Title: Vision-enhanced Immersive Interaction and Collaboration with Large Touch Displays

Abstract: Large displays are becoming a commodity, and increasingly they are touch-enabled. In this talk, we describe a system called ViiBoard (Vision-enhanced Immersive Interaction with Touch Board). It consists of two parts.

The first part, called VTouch, augments touch input with visual understanding of the user to improve interaction with a large touch-sensitive display such as the Microsoft Surface Hub. A commodity color-plus-depth sensor such as the Microsoft Kinect adds the visual modality and enables new interactions beyond touch. Through visual analysis, the system understands where the user is, who the user is, and what the user is doing even before the user touches the display. Such information is used to enhance interaction in multiple ways. For example, a user can use simple gestures to bring up menu items such as a color palette or soft keyboard; menu items can be shown where the user is and can follow the user; hovering can show information to the user before the user commits to a touch; the user can perform different functions (for example, writing and erasing) with different hands; and each user’s preference profile can be maintained, distinct from those of other users. User studies show that users greatly appreciate the value of these and other enhanced interactions.

The second part, ImmerseBoard, is a system for remote collaboration through a digital whiteboard that gives participants a 3D immersive experience, enabled only by an RGBD camera mounted on the side of a large touch display. Using 3D processing of the depth images, life-sized rendering, and novel visualizations, ImmerseBoard emulates writing side-by-side on a physical whiteboard, or alternatively on a mirror. User studies involving three tasks show that, compared to standard video conferencing with a digital whiteboard, ImmerseBoard gives participants a quantitatively better ability to estimate their remote partners’ eye gaze direction, gesture direction, intention, and level of agreement. Moreover, these quantitative capabilities translate qualitatively into a heightened sense of being together and a more enjoyable experience. ImmerseBoard’s form factor is suitable for practical and easy installation in homes and offices.


Bo Begole (Vice President, Global Head, Huawei Technologies' Media Lab)

Bio: Dr. Bo Begole is VP and Global Head of Huawei Technologies’ Media Technologies Lab, spanning seven locations in China, the US, Europe, and Russia and focusing on the creation of next-generation networked media experiences. Previously, he was a Senior Director at Samsung Electronics’ User Experience Center America, where he directed a team developing new contextually intelligent services for wearable, mobile, and display devices. Prior to that, he was a Principal Scientist and Area Manager at Xerox PARC, where he directed the Ubiquitous Computing research program, creating user behavior-modeling technologies, responsive media, and intelligent mobile agents.

He is an inventor with more than 25 issued patents and dozens pending. He has spoken at several industry conferences, given many press interviews, co-authored dozens of peer-reviewed research publications, and is the author of Ubiquitous Computing for Business (FT Press, 2011). He is active in the organization of several conferences in the field of human-computer interaction and was General Chair of the 2015 ACM Conference on Human Factors in Computing Systems (CHI 2015) in Seoul, Korea. He has had the great fortune to work with fantastic colleagues, with whom he has developed products in unified communication, groupware, media interoperability, consumer electronics, and mobile context-aware recommenders.

Dr. Begole received a Ph.D. in computer science from Virginia Tech in 1998. Prior to his studies, he was enlisted in the US Army as an Arabic language interpreter.

Title: Full Field Communication: Challenges and Future Experiences with Real-Time Capture and Transport of Ultra-High-Resolution Light and Sound Fields

Abstract: 'Full field communication' will ultimately capture and transport full light and sound fields of data in real time across future IP networks, going far beyond today's conventional 'tele-presence' systems into a new realm of virtual 'tele-portation' systems. Although much attention is currently focused on head-mounted displays and immersive goggles, which are well suited to games, HMDs have not been proven effective in productivity or business applications. Recent announcements from HMD manufacturers have had a lot of sizzle but provided little evidence that the benefits will outweigh the costs of discomfort and social stigma, or the results of human factors studies indicating that HMDs are poorly suited to real-world tasks. In contrast, future large-scale video/light-field displays, combined with high-speed networks, will allow people to engage in activities most of us only dream of today: diagnose and treat patients with full visual detail, repair complex machinery, climb Mount Everest, visit the Taj Mahal, drive a Formula 1 car, attend World Cup football, jump from a plane, dive the Great Barrier Reef, shop exotic bazaars around the world, and more. This presentation will describe the theoretical upper bounds of data transmission needed to achieve the virtual remote reality of full field communication. The initial Huawei prototype toward that goal, called MirrorSys, is designed to meet the extreme limits of human visual and auditory perception in a full-size, high-definition, real-time sharing and communication system.
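The talk's premise, that transmission requirements can be bounded by the limits of human perception, can be illustrated with a rough back-of-envelope estimate. The perceptual constants below are common textbook approximations, not figures from the talk or from Huawei's MirrorSys prototype:

```python
# Back-of-envelope: raw data rate to saturate foveal-acuity vision across
# the full binocular field of view, for a single viewpoint.
# All constants are approximate and illustrative only.

acuity_ppd = 60                    # pixels per degree at ~20/20 acuity
fov_h_deg, fov_v_deg = 200, 135    # approximate binocular field of view
frame_rate = 120                   # Hz, above typical flicker-fusion thresholds
bits_per_pixel = 30                # 10 bits per channel, 3 color channels

pixels = (acuity_ppd * fov_h_deg) * (acuity_ppd * fov_v_deg)
raw_bps = pixels * frame_rate * bits_per_pixel

print(f"{pixels / 1e6:.0f} megapixels per frame")   # 97 megapixels per frame
print(f"{raw_bps / 1e9:.0f} Gbit/s uncompressed")   # 350 Gbit/s uncompressed
```

Even this single-viewpoint estimate runs to hundreds of gigabits per second before compression; a true light field must encode many viewpoints, multiplying the total further, which is why the talk frames the problem in terms of theoretical upper bounds for future IP networks.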


Max Mühlhäuser (Professor & Head of Telecooperation Lab, Technische Universität Darmstadt, Germany)

Bio: Max Mühlhäuser is a Full Professor of Computer Science at Technische Universität Darmstadt, Germany, and head of the Telecooperation Lab. He received his Doctorate from the University of Karlsruhe and founded a research center for Digital Equipment Corporation (DEC). Since 1989, he has worked as a professor or visiting professor at universities in Germany, Austria, France, Canada, and the US. Max has published more than 450 articles and has co-authored and edited books about UbiComp, E-learning, and distributed & multimedia software engineering. Max is deputy speaker of a nationally funded cooperative research center on the Future Internet and a directorate member of the Center for Advanced SEcurity research Darmstadt (CASED).

Title: Interaction and Collaboration in Space: From Co-Located to Distributed

Abstract: Telepresence research used to focus on augmenting the experience of remote presence: the goal was “being there without really being there, then”, as Gordon Bell put it twenty years ago. In other words, two research questions were (and still are) guiding many projects: (a) How best to represent a real person virtually, elsewhere? (b) How best to represent an “elsewhere” situation (e.g., a meeting room) to remote participants?

However, before “being there”, we first have to “get there”: participants have to be captured (in their context) first, and only *then* “beamed to & presented at” the remote site(s). In the same way, the “elsewhere” situation has to be captured first, and only *then* can it be “beamed & presented” to the remote participant. The user experience of this preliminary step was largely “sacrificed” in favor of the second step. In other words, the research focus on remote presentation led to many restrictions on the “capturing” side in past research. A recent example is telepresence robots: capable of conveying spatial presence at the remote site, but at the price of greatly restricting spatial freedom at the local site – the remote participant has to work in a quite constrained setting, devoting a lot of attention to her or his “spatial behavior” at the participating site.

The talk centers on the hypothesis that recent advancements in spatial interaction (at large) open the path to more natural and free telepresence on the “capturing” side, and even to new opportunities with respect to remote representations. A review of recent approaches to “interaction and cooperation in local smart spaces” will therefore dominate the talk. Only then will the impact of these advancements on future telepresence solutions be presented, designed around an emerging concept called federated smart spaces.

Susie Wee (VP and Chief Technology Officer of Networked Experiences, Cisco)

Bio: Susie Wee is the Vice President and Chief Technology Officer of Networked Experiences at Cisco Systems. She is developing technologies and architectures for software-defined networks that provide improved operational experiences, end user experiences, and developer experiences with the network. She is contributing to Cisco’s unified platform strategy by coupling applications to the converged network and compute infrastructure. She is also developing technologies and systems for augmented collaboration and co-creation. Prior to this, Susie was the Vice President and Chief Technology and Experience Officer of Cisco’s Collaboration Technology Group where she was responsible for driving innovation and experience design in Cisco's collaboration products and software services, including unified communications, telepresence, web and video conferencing, and cloud collaboration.

Before joining Cisco, Susie was at Hewlett Packard in the roles of founding Vice President of the Experience Software Business and Chief Technology Officer of Client Cloud Services in HP’s Personal Systems Group, and Lab Director of the HP Labs Mobile and Media Systems Lab. Susie was the co-editor of the JPSEC standard for the security of JPEG-2000 images and the editor of the JPSEC amendment on File Format Security. She was formerly an associate editor for the IEEE Transactions on Circuits and Systems for Video Technology and for the IEEE Transactions on Image Processing. While at HP Labs, Susie was a consulting assistant professor at Stanford University, where she co-taught a graduate-level course on digital video processing.

Susie received Technology Review’s Top 100 Young Innovators award, Computerworld's Top 40 Innovators under 40 award, the INCITS Technical Excellence award, and the Women In Technology International Hall of Fame award, and was on the Forbes Most Powerful Women list. She is an IEEE Fellow for her contributions in multimedia technology and has over 50 international publications and over 45 granted patents. Susie received her B.S., M.S., and Ph.D. degrees from the Massachusetts Institute of Technology.

Title: TBD

Abstract:  TBD


Our Sponsors:
PARC, a Xerox Company

Hewlett Packard

What's BAMMF?

BAMMF is a Bay Area Multimedia Forum series. Experts from both academia and industry are invited to exchange ideas and information through talks, tutorials, posters, panel discussions, and networking sessions. Topics of the forum include, but are not limited to, emerging areas in vision, audio, touch, speech, text, sensors, human-computer interaction, natural language processing, machine learning, media-related signal processing, communication, and cross-media analysis. Talks at the event may cover advances in algorithms and development, demonstrations of new inventions, product innovation, business opportunities, and more. If you are interested in giving a presentation at the forum, please contact us.