FULL Papers - Oral Presentation
1. Motion Estimation for Regions of Reflections Through Layer Separation - Mohamed Ahmed, Francois Pitie & Anil Kokaram (Trinity College Dublin)
Regions of reflections contain two semi-transparent layers moving over each other. This generates two motion vectors per pel. Current multiple motion estimators either extend the usual brightness consistency assumption to two motions or are based on the Fourier phase shift relationship. Both approaches assume constant motion over at least three frames. As a result, they cannot handle temporally active motion due to camera shake or acceleration. This paper proposes a new approach for multiple motion estimation by modeling the correct motions as the ones generating the best layer separation of the examined reflection. A Bayesian framework is proposed which then admits a solution using candidate motions generated from KLT trajectories and a layer separation technique. We use novel temporal priors and our results show handling of strong motion inconsistencies and improvements over previous work.
CVMP11_Motion-Estimation-for-Regions-of-Reflections-through-Layer-Separation.pdf (Adobe PDF - 1.93Mb)
2. Practical Image-Based Relighting and Editing with Spherical-Harmonics and Local Lights - Borom Tunwattanapong (University of Southern California), Abhijeet Ghosh (USC-ICT), Paul Debevec (USC-ICT)
We present a practical technique for image-based relighting under environmental illumination which greatly reduces the number of required photographs compared to traditional techniques, while still achieving high quality editable relighting results. The proposed method employs an optimization procedure to combine spherical harmonics, a global lighting basis, with a set of local lights. Our choice of lighting basis captures both low and high frequency components of typical surface reflectance functions while generating close approximations to the ground truth with an order of magnitude less data. This technique benefits the acquisition process by reducing the number of required photographs, while simplifying the modification of reflectance data and enabling artistic lighting edits for post-production effects. Here, we demonstrate two desirable lighting edits, modifying light intensity and angular width, employing the proposed lighting basis.
CVMP11_Practical-Image-Based-Relighting-and-editing.pdf (Adobe PDF - 4.24Mb)
CVMP11_Practical-ImageBased-Relighting-and-editing_Supplementary.mov (Quicktime Movie - 7.93Mb)
3. Space-time editing of 3D video sequences - Margara Tejera, Adrian Hilton (University of Surrey)
A shape constrained Laplacian mesh deformation approach is introduced for interactive editing of mesh sequences. This allows low-level constraints, such as foot or hand contact, to be imposed while preserving the natural dynamics of the captured surface. The approach also allows artistic manipulation of motion style to achieve effects such as squash-and-stretch. Interactive editing of key-frames is followed by automatic temporal propagation over a window of frames. User edits are seamlessly integrated into the captured mesh sequence. Three spatio-temporal interpolation methods are evaluated. Results on a variety of real and synthetic sequences demonstrate that the approach enables flexible manipulation of captured 3D video sequences.
CVMP11_Space-time-editing-of-3D-video-Sequences.pdf (Adobe PDF - 5.66Mb)
4. Head-mounted Photometric Stereo for Performance Capture - Andrew Jones, Graham Fyffe, Xueming Yu, Wan-Chun Ma, Jay Busch, Ryosuke Ichikari, Mark Bolas, Paul Debevec (USC-ICT)
Head-mounted cameras are an increasingly important tool for capturing facial performances to drive virtual characters. They provide a fixed, unoccluded view of the face, useful for observing motion capture dots or as input to video analysis. However, the 2D imagery captured with these systems is typically affected by ambient light and generally fails to record subtle 3D shape changes as the face performs. We have developed a system that augments a head-mounted camera with LED-based photometric stereo. The system allows observation of the face independent of the ambient light and generates per-pixel surface normals so that the performance is recorded dynamically in 3D. The resulting data can be used for facial relighting or as better input to machine learning algorithms for driving an animated face.
CVMP11_Head-mounted-Photometric-Stereo-for-performance-Capture.pdf (Adobe PDF - 1.71Mb)
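The per-pixel surface normals mentioned in the abstract above can be illustrated with classic Lambertian photometric stereo. The following is a minimal sketch, not the authors' system: it assumes known unit light directions, a Lambertian surface, no shadows, and that ambient light has already been subtracted from each image. The function name `photometric_stereo` and its parameters are hypothetical.

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """Estimate per-pixel normals and albedo under a Lambertian model.

    images:     (k, h, w) array, one intensity image per light (assumed input).
    light_dirs: (k, 3) array of unit light directions (assumed known).
    Returns normals of shape (h, w, 3) and albedo of shape (h, w).
    """
    images = np.asarray(images, dtype=float)
    light_dirs = np.asarray(light_dirs, dtype=float)
    k, h, w = images.shape
    intensities = images.reshape(k, -1)  # (k, h*w), one column per pixel

    # Lambertian model: intensity = light_dirs @ (albedo * normal).
    # Solve for g = albedo * normal at every pixel in the least-squares sense.
    g, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)  # (3, h*w)

    albedo = np.linalg.norm(g, axis=0)
    normals = g / np.maximum(albedo, 1e-12)  # avoid division by zero
    return normals.T.reshape(h, w, 3), albedo.reshape(h, w)
```

At least three non-coplanar light directions are needed for the system to be well determined; more lights make the least-squares estimate less sensitive to noise.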
5. Depth Estimation from Three Cameras Using Belief Propagation - Kensuke Ikeya, Kensuke Hisatomi, Miwa Katayama, Yuichi Iwadate (NHK)
We propose a method to estimate depth from three wide-baseline camera images using belief propagation. With this method, message propagation is restricted to reduce the effects of boundary overreach, and max and min values and kurtosis of message energy distribution are used to reduce errors caused by large occlusion and textureless areas. In experiments, we focused on scenes of the traditional Japanese sport of sumo and created 3D models from three HD images using our method. We displayed them on a 3D display using the principle of integral photography (IP). We confirmed from the experimental results that our method was effective for estimating depth.
CVMP11_Depth_Estimation_from_Three_Cameras_Using_Belief_Propagation.pdf (Adobe PDF - 1.06Mb)
6. Semantic Kernels Binarized -- A Feature Descriptor for Fast and Robust Matching - Frederik Zilly, Christian Riechert, Peter Eisert, Peter Kauff (Fraunhofer HHI)
This paper presents a new approach for feature description used in image processing and robust image recognition algorithms such as 3D camera tracking, view reconstruction or 3D scene analysis. State of the art feature detectors distinguish interest point detection and description. The former is commonly performed in scale space, while the latter is used to describe a normalized support region using histograms of gradients or similar derivatives of the grayscale image patch. This approach has proven to be very successful. However, the descriptors are usually of high dimensionality in order to achieve a high descriptiveness. Against this background, we propose a binarized descriptor which has a low memory usage and good matching performance. The descriptor is composed of binarized responses resulting from a set of folding operations applied on the normalized support region. We demonstrate the realtime capabilities of the feature descriptor in a stereo matching environment.
CVMP11_Semantic-Kernals-Binarized.pdf (Adobe PDF - 804Kb)
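To illustrate why a binarized descriptor is cheap to store and match, the sketch below packs thresholded responses into bytes and matches them by Hamming distance. It is a generic example, not the descriptor from the paper: the kernel responses are assumed to be given as real-valued arrays, and the names `binarize` and `hamming_match` and the zero threshold are hypothetical.

```python
import numpy as np

def binarize(responses, threshold=0.0):
    """Turn real-valued responses into a packed bit string (8 bits per byte)."""
    bits = (np.asarray(responses) > threshold).astype(np.uint8)
    return np.packbits(bits, axis=-1)

def hamming_match(desc_a, desc_b):
    """Brute-force nearest neighbour in desc_b for every descriptor in desc_a.

    desc_a: (n, bytes) uint8 array, desc_b: (m, bytes) uint8 array.
    Returns (indices, distances), each of length n.
    """
    # XOR marks differing bits; counting them gives the Hamming distance.
    xor = desc_a[:, None, :] ^ desc_b[None, :, :]
    dist = np.unpackbits(xor, axis=-1).sum(axis=-1)  # (n, m)
    return dist.argmin(axis=1), dist.min(axis=1)
```

A descriptor of 256 binary responses occupies 32 bytes in this layout, and the distance computation reduces to XOR and bit counting.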
7. Realtime Video Based Water Surface Approximation - Chuan Li, Martin Shaw, David Pickup, Darren Cosker, Phil Willis, Peter Hall (University of Bath)
This paper describes an approach for automatically producing convincing water surfaces from video data in real time. Fluid simulation has long been studied in the Computer Graphics literature, but the methods developed are expensive and require input from highly trained artists. In contrast, our method is a low cost Computer Vision based solution which requires only a single video as a source. Our output consists of an animated mesh of the water surface captured together with surface velocities and texture maps from the video data. As an example of what can be done with this data, a modified form of video textures is used to create naturalistic infinite transition loops of the captured water surface. We demonstrate our approach over a wide range of inputs, including quiescent lakes, breaking sea waves, and waterfalls. All source videos we use are taken from a third-party publicly available database.
CVMP11_Realtime-video-based-water-surface-approximation.pdf (Adobe PDF - 975Kb)
8. Efficient Dense Reconstruction from Video - Phil Parsonage (The Foundry), Adrian Hilton (University of Surrey), Jon Starck (The Foundry)
We present a framework for efficient reconstruction of dense scene structure from video. Sequential structure-from-motion recovers camera information, providing only sparse 3D points, which we make more dense. First, we present a novel algorithm for sequential frame selection to extract a set of keyframes with sufficient parallax for accurate depth reconstruction. Second, we introduce a technique for efficient reconstruction using dense tracking with geometrically correct optimisation of depth and orientation. Keyframe selection is also performed in optimisation to provide accurate depth reconstruction for different scene elements. We test our work on benchmark footage and scenes containing local non-rigid motion, foreground clutter and occlusions to compare performance to state of the art techniques. We show a substantial increase in speed on real world footage compared to existing methods, when they succeed, and successful reconstructions when they fail.
CVMP11_Efficient-dense-reconstruction-from-video.pdf (Adobe PDF - 4.72Mb)
9. Disparity-aware Stereo 3D Production Tools - Aljoscha Smolic, Steven Poulakos, Simon Heinzle, Pierre Greisen, Manuel Lang, Alexander Hornung, Miquel Farre, Nikolce Stefanoski, Oliver Wang, Lars Schnyder, Rafael Monroy Rodriguez, Markus Gross (Disney Research)
Stereoscopic 3D (S3D) has reached wide levels of adoption by consumer and professional markets. However, production of high quality S3D content is still a difficult and expensive art. Various S3D production tools and systems have been released recently to assist high quality content creation. This paper presents a number of such algorithms, tools and systems developed at Disney Research Zurich, which all make use of disparity-aware processing.
CVMP11_Disparity-Aware-stereo-3D-Production-tools.pdf (Adobe PDF - 1.52Mb)
10. Multi-camera scheduling for video production - Fahad Daniyal, Andrea Cavallaro (QMUL)
We present a novel algorithm for automated video production based on content ranking. The proposed algorithm generates videos by performing camera selection while minimizing the number of inter-camera switches. We model the problem as a finite horizon Partially Observable Markov Decision Process over temporal windows and we use a multivariate Gaussian distribution to represent the content-quality score for each camera. The performance of the proposed approach is demonstrated on a multi-camera setup of fixed cameras with partially overlapping fields of view. Subjective experiments based on the Turing test confirmed the quality of the automatically produced videos. The proposed approach is also compared with recent methods based on Recursive Decision and on Dynamic Bayesian Networks and its results outperform both methods.
CVMP11_Multicamera-scheduling-.pdf (Adobe PDF - 7.92Mb)
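The trade-off described in the abstract above, selecting the best-ranked camera while keeping inter-camera switches low, can be illustrated with a much simpler stand-in than the paper's Partially Observable Markov Decision Process. The sketch below is a deterministic dynamic program over fully known per-frame scores; the function `schedule_cameras`, the score matrix, and the `switch_cost` penalty are all hypothetical and are not taken from the paper.

```python
import numpy as np

def schedule_cameras(scores, switch_cost):
    """Pick one camera per frame, maximising total score minus switch penalties.

    scores:      (T, C) array of content-quality scores (assumed given).
    switch_cost: penalty subtracted for every change of camera (assumed).
    Returns a list of T camera indices.
    """
    scores = np.asarray(scores, dtype=float)
    T, C = scores.shape
    best = scores[0].copy()             # best total value ending in each camera
    back = np.zeros((T, C), dtype=int)  # back-pointers for path recovery

    for t in range(1, T):
        # trans[i, j]: value of being on camera i at t-1 and camera j at t.
        trans = best[:, None] - switch_cost * (1 - np.eye(C))
        back[t] = trans.argmax(axis=0)
        best = trans.max(axis=0) + scores[t]

    # Trace the optimal path backwards from the best final camera.
    path = [int(best.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With `switch_cost` set to zero the schedule simply follows the highest score in every frame; raising it suppresses brief switches whose gain does not cover the penalty.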
11. Making of "Who Cares?" - HD Stereoscopic Free Viewpoint Video - Christian Lipski, Felix Klose, Kai Ruhl, Marcus Magnor (TU Braunschweig)
We present a detailed blueprint of our stereoscopic free-viewpoint video system. Using unsynchronized footage as input, we can render virtual camera paths in the post-production stage. The movement of the virtual camera also extends to the temporal domain, so that slow-motion and freeze-and-rotate shots are possible. As a proof-of-concept, a full length stereoscopic HD music video has been produced using our approach.
CVMP11_Making-of-Who-Cares_HD-stereoscopic-Free-Viewpoint-video.pdf (Adobe PDF - 2.82Mb)
Supplementary video material can be found here.
12. Automatic Object Segmentation from Calibrated Images - Neill Campbell (University of Cambridge), George Vogiatzis (Aston University), Carlos Hernandez-Esteban (Google), Roberto Cipolla (University of Cambridge)
This paper concerns the automatic recovery of binary object/background segmentations of a central object in a set of calibrated images. The resulting segmentations may be used to recover the shape of the object, particularly useful in scenes where MVS methods are unable to recover the full surface, e.g. textureless objects (Plant dataset). The results are also useful as an input to MVS algorithms, either to increase accuracy or decrease computation time. Our approach combines multiple cues (object/background appearance models, epipolar constraints and weak stereo correspondence) into a single energy-based framework that may be solved efficiently. We achieve superior performance to previous publications as well as MVS methods when observing a textureless object.
CVMP11_Automatic-Object-Segmentation-from-calibrated-images.pdf (Adobe PDF - 1.08Mb)
13. Real-time Person Tracking in High-resolution Panoramic Video for Automated Broadcast Production - Rene Kaiser, Marcus Thaler, Andreas Kriechbaum, Hannes Fassold, Werner Bailer (Joanneum Research), Jakub Rosner (Silesian University of Technology)
For enabling immersive user experiences for interactive TV services and automating camera view selection and framing, knowledge of the location of persons in a scene is essential. We describe an architecture for detecting and tracking persons in high-resolution panoramic video streams, obtained from the OmniCam, a panoramic camera stitching video streams from 6 HD resolution tiles. The AV content analysis uses a CUDA accelerated feature point tracker, a blob detector, and a CUDA HOG person detector, which are used for region tracking in each of the tiles before the results are fused for the entire panorama. In this paper we focus on the application of the HOG person detector in real-time and the speedup of the feature point tracker by porting it to NVIDIA's Fermi architecture. Evaluations indicate significant speedup for our feature point tracker implementation, enabling the entire process in a real-time system.
CVMP11_Realtime-person-tracking-in-HiRes-Panoramic-video.pdf (Adobe PDF - 1.45Mb)