 RESEARCH PAPER
 Open Access
Temporally coherent disparity maps using CRFs with fast 4D filtering
 Siavash Arjomand Bigdeli^{1},
 Gregor Budweiser^{2} and
 Matthias Zwicker^{1}
https://doi.org/10.1186/s4107401600112
© The Author(s) 2016
 Received: 15 April 2016
 Accepted: 23 November 2016
 Published: 9 December 2016
Abstract
State-of-the-art methods for disparity estimation achieve good results for single stereo frames, but temporal coherence in stereo videos is often neglected. In this paper, we present a method to compute temporally coherent disparity maps. We define an energy over whole stereo sequences and optimize their conditional random field (CRF) distributions using the mean-field approximation. In addition, we introduce novel terms for smoothness and consistency between the left and right views. We perform CRF optimization by fast, iterative spatiotemporal filtering with linear complexity in the total number of pixels. We propose two CRF optimization techniques, using parallel and sequential updates, and compare them in detail. While parallel updates are not guaranteed to converge, we show that, in practice with appropriate initialization, they provide the same quality as sequential updates and they also lead to faster implementations. Finally, we demonstrate that the results of our approach rank among the state of the art while having significantly fewer flickering artifacts in stereo sequences.
Keywords
 Disparity map estimation
 Temporal smoothness
 Conditional random fields
1 Introduction
While some disparity estimation methods leverage information over several frames of stereo video sequences, most do not attempt to produce temporally coherent disparity maps. In applications like video production for 3D displays, however, temporally coherent disparity maps are crucial. While human observers are more forgiving about incorrect disparities, they easily notice flickering artifacts due to temporally incoherent disparity maps.
We address these challenges by proposing a technique that produces temporally coherent disparity maps over stereo videos. We formulate an energy minimization problem consisting of unary, smoothness, and consistency terms, which we solve using the mean-field approximation of a densely connected conditional random field (CRF). We propose two efficient filtering techniques to solve the mean-field approximation, using parallel and sequential updates. Both have linear complexity in terms of the number of pixels in the input. Parallel updates allow us to process all pixels in a stereo sequence independently, enabling fast GPU implementations. In contrast to sequential updates, parallel updates are not guaranteed to converge. We provide a detailed comparison between both techniques and show that, with proper initialization, parallel updates obtain the same quality of results. Hence, they are preferable in practice.
In summary, our contributions are (1) a new smoothness term that leverages both the left and right images to distinguish between image edges due to disparity discontinuities, and edges due to surface texture; (2) a novel consistency term to obtain a joint left-and-right disparity estimation problem; (3) a temporal smoothness term to achieve temporally coherent disparity maps over stereo video sequences; (4) a comparison of efficient CRF optimization techniques based on parallel and sequential updates.
The rest of this paper is organized as follows: after discussing previous work in Section 2, we introduce our energy formulation that includes a novel consistency term and the temporal extension in Section 3. Next, in Section 4, we discuss energy minimization via the mean-field approximation and using an iterative algorithm with parallel updates. Parallel updates are not guaranteed to converge, however, and we develop an efficient sequential approach in Section 5 that does not suffer from this problem. Finally, we evaluate our approach using standard datasets in Section 6.
This paper is based on a conference publication [1]. Here, we describe the method in more detail and provide further analysis of the CRF inference scheme. We also develop a novel, efficient sequential approach that guarantees convergence unlike the previous parallel approaches. We evaluate the parallel and sequential techniques and conclude that, in practice with appropriate initialization, parallel updates lead to equivalent results but can be implemented more efficiently.
2 Related work
Disparity estimation is commonly defined as a discrete labeling problem. Aggregation-based methods [22] share the cost of each assignment with neighboring pixels to reduce noise. They are efficient but unable to reason about more complex assignment configurations. Optimization-based methods try to find the best assignment of disparities by minimizing an energy function. Semi-global matching (SGM) [12] is a fast and effective approach that enforces local smoothness over many directional scanlines using dynamic programming. Methods such as wSGM [24] and iSGM [11] modified the original SGM to improve performance. Žbontar and LeCun [32] used convolutional neural networks to define a new unary term for SGM that leads to significant quality improvements but incurs a high computational cost. While SGM is able to find a semi-global establishment of disparity labels, it is unable to capture the local structure due to the simple energy function.
On the other hand, filter-based mean-field approximation [17] supports very fast optimization over a fully connected CRF. Yu and Gallup [31] used this approach to obtain disparity maps. Vineet et al. [26] further extend the optimization to include higher-order terms that incorporate information about objects to be used in the disparity estimation problem. Many methods use a multiscale approach to increase robustness to local minima [33]. Zhang et al. [33] aggregate the cost between different scales such that the assignment is consistent in all scales. Vineet et al. [25] run the optimization on coarser scales to initialize finer ones. We use the SGM method to initialize our CRF-based optimization, which further incorporates other complex terms.
Some methods use several stereo frames and attempt to ensure temporal coherence. Slanted plane StereoFlow [30] uses two consecutive frames to improve results. The method computes an initial disparity map using SGM and then jointly optimizes for planar surfaces and local segments. This approach is tailored for applications such as autonomous vehicles with an ego-motion assumption. Vogel et al. [28] use consistency factors between the views that are defined as a data term in their optimization. Using a piecewise rigid model, their method includes consistencies in the temporal dimension that incorporate neighboring views. Unlike these methods, we enforce neither segmentation nor local planarity on our disparity maps. In addition, our method has linear complexity with respect to the number of frames, which allows us to compute the disparity maps of the whole sequence in a single optimization.
Disparity flicker artifacts have been previously addressed [21, 23]. Richardt et al. [23] assumed that a pixel’s disparity persists over time and aggregated the costs between temporally consecutive pixels. Min et al. [21] filtered noisy disparity maps between different frames. Similar to their work, we use a precomputed flow field and enforce temporal coherence along its vectors. In addition to end-to-end disparity error, we propose a quantitative measure to better evaluate the flicker artifacts in disparity sequences and compare with previous works.
3 Energy terms
In this section, we describe our energy terms that characterize the spatiotemporal disparity estimation problem. We assume that the stereo inputs are rectified such that the disparity is only in the horizontal direction, but our method is not limited to this setup. We define random variables \({x^{L}_{i}}\) for the disparity values of pixels i, where i determines the spatial location of the pixel, in the disparity field X ^{ L } of the left image, and similarly \({x_{i}^{R}}\) in X ^{ R } for the right image. Our joint energy function over X ^{ L } and X ^{ R } includes unary (per pixel), smoothness, and consistency terms. We omit the left and right superscripts unless necessary.
3.1 Unary term
where \({\phi _{u}^{L}}(x_{i} = d)\) is the unary cost of assigning disparity d to pixel i in the left image, S ^{ L } and S ^{ R } denote the response to the horizontal Sobel operator, H is the Hamming distance of the center-symmetric census transforms T ^{ L } and T ^{ R } introduced by Spangenberg et al. [24], and \(\lambda _{\text {cen}} = \frac {1}{3}\) is a constant that controls the relative weight of the two terms. The cost for pixel i is averaged over its 8-connected neighbors j∈N(i). We compute the census transform in a 7×7 window on the image after blurring it with a 3×3 box filter. This increases robustness against artifacts such as noise and aliasing. The census transform is a feature that represents the local arrangement of pixels in a neighborhood. It is robust to brightness changes and noise because it only captures whether the brightness of a pixel is larger than that of the center pixel of the neighborhood. Since this transformation loses some textural information, adding the edge difference measure helps to better identify the matching pixels in the other view.
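As a rough illustration of the ingredients described above, the following NumPy sketch combines a horizontal Sobel difference with a census Hamming distance and averages the result over a 3×3 neighborhood. It is a sketch under stated assumptions, not the authors' implementation: the additive combination, the plain (not center-symmetric) census transform, the inclusion of the center pixel in the average, the border clamping, and all function names are assumptions made here for illustration.

```python
import numpy as np

def box3(img):
    """3x3 box filter with edge-replicated borders."""
    p = np.pad(img.astype(np.float64), 1, mode="edge")
    h, w = img.shape
    return sum(p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0

def sobel_x(img):
    """Response to the horizontal Sobel operator."""
    p = np.pad(img.astype(np.float64), 1, mode="edge")
    return ((p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:])
            - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2]))

def census_bits(img, radius=3):
    """Plain census transform in a (2*radius+1)^2 window: one bit per neighbor,
    set when the neighbor is brighter than the center pixel.
    Assumption: the paper uses the center-symmetric variant instead."""
    center = img.astype(np.float64)
    p = np.pad(center, radius, mode="edge")
    h, w = img.shape
    bits = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = p[radius + dy:radius + dy + h, radius + dx:radius + dx + w]
            bits.append(shifted > center)
    return np.stack(bits, axis=-1)

def unary_cost(left, right, d, lam_cen=1.0 / 3.0):
    """Cost of assigning disparity d to every left pixel (x matches x - d on the right)."""
    h, w = left.shape
    cols = np.clip(np.arange(w) - d, 0, w - 1)        # column x - d, clamped at the border
    sobel_diff = np.abs(sobel_x(left) - sobel_x(right)[:, cols])
    t_left = census_bits(box3(left))                  # census on the blurred image
    t_right = census_bits(box3(right))[:, cols]
    hamming = np.count_nonzero(t_left != t_right, axis=-1)
    # Assumed combination: edge difference plus weighted census distance,
    # averaged over the 3x3 neighborhood (center included for simplicity).
    return box3(sobel_diff + lam_cen * hamming)
```

Evaluating `unary_cost` for each candidate disparity and stacking the results would give a cost volume of the kind the later sections operate on. In practice the two cost components would likely need a common scale, which this sketch does not attempt.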
3.2 Disparity-dependent smoothness term
where σ _{ r },σ _{ s }, and σ _{ d } control the kernel support for the three length terms. Applying a Gaussian weight to the sum of the three distances ensures that W ^{ L }(P) decreases when the two pixels are separated by a large distance and it increases when they are close. Because we sum the negative weights W ^{ L }(P) over all paths, the smoothness energy (cost) decreases by the weight of each path and each short path further reduces the energy. In contrast, Hosni et al. [13] used only the path with the minimum distance. A single path, however, is more sensitive to noise. Summing up the weights from all paths not only includes the weight from the shortest path but also increases robustness to noise. Additionally, including all paths favors arrangements where assignments are connected by many long paths in contrast to assignments with few short paths. This choice of weight will later allow us to efficiently compute the smoothness energy.
This distance will be small if the pixel colors along the path have correspondences in the other image under their disparities, even if the image itself has large color dissimilarities along that path.
3.3 Higher order local consistency term
3.4 Temporal extension
4 Energy minimization
Here, we describe our fast spatiotemporal energy minimization based on the mean-field approximation and using parallel updates of the mean field. In addition, we discuss initialization and post-processing, followed by a description of our GPU implementation.
4.1 Mean-field approximation
Here, the sum over k∈{l−1,l,l+1} corresponds to the compatibility function ν in Section 3.3. Although the consistency term ϕ _{ c } is defined over three independent random variables, the expected value here is conditioned on the assignment of disparity d to pixel i; hence, the conditional expected energy only depends on the probabilities of the two remaining variables \({Q_{j}^{L}}\) and \(Q_{j+l}^{R}\).
4.2 Filter-based parallel update iteration
Algorithm 1 minimizes our energy by iteratively updating the mean-field distributions by computing Eq. 3. The first iteration of the algorithm updates the disparity distribution of the left image (Q ^{ L }). In subsequent iterations, we switch between updating the disparity maps of the left and right images (line 5) to avoid oscillations between them. The notation implies that the operations are applied to all variables i and values d in parallel. The first two lines in the loop compute the expected values (Eqs. 4 and 5) and the summation over all pixels j in Eq. 3. First (line 1), we compute intermediate values \(\tilde {Q}_{i}\) that store the contributions that each pixel will make to the conditional expected energies of the smoothness and consistency terms of all other pixels. Next (line 2), at each pixel, we simultaneously compute the expected values (summation over l) and accumulate the contributions from all the other pixels (summation over j) using a single, fast filtering operation over the intermediate values \(\tilde {Q}_{i}\). We provide some more details about the filter implementation below. A single filtering step is possible since we have the same weights W defined in ϕ _{ s } and ϕ _{ c }. In line 3, the disparity potential is computed by adding the unary term, exponentiating, and normalizing to a distribution in line 4, which completes computation of Eq. 3. Finally, the iteration ends by switching the target distribution (line 5).
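The structure of a parallel mean-field iteration can be sketched on a toy 1D problem. The code below is a simplified illustration, not Algorithm 1 itself: it assumes a single view (no left/right alternation and no consistency term), a dense Gaussian spatial kernel in place of the fast edge-aware filter, and a Gaussian label compatibility. The function names and the parameters `sigma_s`, `sigma_d`, and `n_iters` as used here are illustrative choices.

```python
import numpy as np

def softmax(e, axis=-1):
    """Numerically stable exponentiate-and-normalize."""
    e = e - e.max(axis=axis, keepdims=True)
    p = np.exp(e)
    return p / p.sum(axis=axis, keepdims=True)

def parallel_meanfield_1d(unary, sigma_s=2.0, sigma_d=1.0, n_iters=10):
    """unary: (N, M) costs for N pixels and M disparity hypotheses."""
    n, m = unary.shape
    pos = np.arange(n)
    lab = np.arange(m)
    # Dense O(N^2) spatial kernel, used only as a stand-in for the linear-time filter.
    k = np.exp(-(pos[:, None] - pos[None, :]) ** 2 / (2.0 * sigma_s ** 2))
    np.fill_diagonal(k, 0.0)                 # a pixel does not smooth with itself
    # Assumed Gaussian compatibility between disparity hypotheses d and l.
    w = np.exp(-(lab[:, None] - lab[None, :]) ** 2 / sigma_d ** 2)
    q = softmax(-unary)                      # start from the unary term alone
    for _ in range(n_iters):
        mixed = q @ w                        # summation over hypotheses l
        expected = k @ mixed                 # summation over all other pixels j
        # All pixels are updated at once from the previous distribution.
        q = softmax(-unary + expected)       # smoothness weights enter with negative sign
    return q
```

Because every pixel reads the previous iterate and writes the new one, each step maps directly onto per-pixel GPU threads, which is the property the parallel scheme relies on.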
A key element of our algorithm is that we compute the path weights W efficiently using the domain transform filter [7], which allows us to evaluate each filtering operation (line 2 of Algorithm 1) in constant time per pixel. We use interpolated convolution by iteratively applying a moving sum (box filter) in the transformed domain. The joint image and disparity space leads to 3D filtering, and our temporal extension to 4D filtering over two spatial dimensions, the temporal dimension, and the disparity dimension. In the temporal dimension, we filter along the precomputed flow vectors, similarly to Lang et al. [18]. We obtained our best results by iterating over passes along the spatiotemporal directions and filtering in the disparity domain at the end. We refer to the original publication [7] for more details about the domain transform filter.
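To make the idea of a box filter in the transformed domain concrete, here is a minimal 1D sketch. Note the substitution: it uses the normalized-convolution variant of the domain transform, which is simpler to write down than the interpolated convolution used in the paper, and it applies a single pass rather than several passes with shrinking support. The function names are hypothetical.

```python
import numpy as np

def domain_transform_1d(guide, sigma_s, sigma_r):
    """Transformed coordinates ct: unit spacing stretched at guide-signal edges."""
    d_guide = np.abs(np.diff(guide, prepend=guide[0]))
    return np.cumsum(1.0 + (sigma_s / sigma_r) * d_guide)

def nc_filter_1d(signal, ct, sigma_s):
    """Box filter of radius sigma_s*sqrt(3) in the transformed domain.
    Normalized-convolution variant (an assumption; the paper uses
    interpolated convolution)."""
    r = sigma_s * np.sqrt(3.0)
    lo = np.searchsorted(ct, ct - r, side="left")    # first sample inside the box
    hi = np.searchsorted(ct, ct + r, side="right")   # one past the last sample inside
    csum = np.concatenate([[0.0], np.cumsum(signal)])
    # Prefix sums make the per-sample cost independent of the box width.
    return (csum[hi] - csum[lo]) / (hi - lo)
```

Since a strong edge in the guide signal pushes neighboring samples far apart in the transformed coordinates, the box no longer reaches across it, which is what makes the filter edge-aware.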
4.3 Initialization
To initialize Algorithm 1, we leverage semi-global matching (SGM) [12] with penalties P _{1}=4,P _{2}=64 in four directions. Instead of the MAP results of SGM, we use the obtained (min-marginal) energies to initialize our distribution Q _{ i }(d). For a better initialization, we run the first two iterations of the optimization using a large kernel support (σ _{ s }=7,σ _{ r }=100,σ _{ d }=2).
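One plausible way to turn per-pixel energies into an initial distribution is to exponentiate the negated energies and normalize, which is consistent with reading energies back as −log Q later on. The sketch below assumes exactly that mapping; the text does not spell out the conversion, and `init_distribution` is a hypothetical name.

```python
import numpy as np

def init_distribution(energies):
    """energies: (..., M) min-marginal energies per disparity hypothesis.
    Returns Q with Q(d) proportional to exp(-energy(d)) (assumed mapping)."""
    e = energies - energies.min(axis=-1, keepdims=True)   # shift for numerical stability
    q = np.exp(-e)
    return q / q.sum(axis=-1, keepdims=True)
```

Under this mapping, −log Q differs from the input energies only by a per-pixel constant, so the ranking of hypotheses from SGM is preserved at the start of the optimization.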
4.4 Final disparity map
We compute the final disparity of each pixel by finding the hypothesis with the minimum energy − log(Q _{ i }(d)) from Algorithm 1. For accuracy below the level of the disparity discretization, we fit a quadratic to the three disparity costs centered at the minimum. We remove spikes by applying a 5×5 median filter. We detect occluded regions with a left-right consistency check, which marks pixels whose disparity difference between the views is higher than a threshold. We then replace each disparity marked as occluded with the last non-occluded disparity in the left direction for the left view (similarly for the right view).
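The quadratic fit can be written in closed form from the three costs around the minimum. The following sketch shows the standard parabola-vertex formula; the function name and the handling of minima at the first or last hypothesis (no refinement) are choices made here for illustration.

```python
import numpy as np

def subpixel_disparity(cost):
    """cost: (..., M) energies per disparity hypothesis, e.g. -log Q.
    Returns the fractional index of the minimum of a parabola fitted to
    the three costs centered at the discrete minimum."""
    m = cost.shape[-1]
    d0 = np.asarray(np.argmin(cost, axis=-1))
    dc = np.clip(d0, 1, m - 2)                      # keep the 3-sample stencil in range

    def take(idx):
        return np.take_along_axis(cost, np.asarray(idx)[..., None], axis=-1)[..., 0]

    c_prev, c_mid, c_next = take(dc - 1), take(dc), take(dc + 1)
    denom = c_prev - 2.0 * c_mid + c_next           # curvature of the parabola
    safe = np.where(denom > 0, denom, 1.0)
    offset = np.where(denom > 0, 0.5 * (c_prev - c_next) / safe, 0.0)
    interior = d0 == dc                             # skip refinement at the range ends
    return d0 + np.where(interior, np.clip(offset, -0.5, 0.5), 0.0)
```

The result is an index into the hypothesis layers, so it still has to be scaled by the disparity step between layers to obtain a disparity in pixels.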
4.5 Implementation
The CPU version of the proposed pipeline supports 256 or more disparity hypotheses. We also implemented a GPU version for the whole pipeline that takes advantage of parallelism in the optimization at the pixel level. We ran our experiments on an Nvidia Titan Black graphics card with 6 GB of on-board memory. We allocate memory for a batch of left and right images, including the disparity hypothesis layers, which requires 2×Width×Height×Frames×Disparities floating-point values. Because of the limited GPU memory, we are currently restricted to batches of 14 frames at a resolution of 960×540 and 32 disparity layers. Note that we evaluate the unary term at a finer discretization of disparity steps, typically at one-pixel steps. We then store the minimum for each of the 32 layers. At the end of the optimization, the disparity is computed and finalized as described above, and by fitting the quadratic to the 32 layers, we achieve finer levels of disparity. After the disparities of a batch of frames are computed, we move forward by seven frames and compute the disparities for the next batch. We finally interpolate the disparity values of the overlapping frames in consecutive batches for smoother transitions.
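The size of one such buffer follows directly from the formula above. The short sketch below evaluates it for the stated batch configuration, assuming 32-bit (4-byte) floating-point values, and lists the start frames of overlapping batches; the function names and the treatment of the sequence tail are assumptions.

```python
def batch_memory_bytes(width, height, frames, disparities, bytes_per_value=4):
    """Bytes for 2 x Width x Height x Frames x Disparities values.
    Assumes 4-byte floats; the precision is not stated in the text."""
    return 2 * width * height * frames * disparities * bytes_per_value

def batch_starts(n_frames, batch=14, step=7):
    """Start frames of batches of `batch` frames advanced by `step` frames.
    Frames past the last full batch are not handled in this sketch."""
    last = max(n_frames - batch, 0)
    return list(range(0, last + 1, step))

# 960x540, 14 frames, 32 layers: 464,486,400 values, about 1.86 GB per buffer.
print(batch_memory_bytes(960, 540, 14, 32) / 1e9)
print(batch_starts(28))
```

With a step of seven and a batch of 14, every frame other than the first and last seven is covered by two batches, which is what allows the interpolation between consecutive batches.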
5 Convergence analysis
Our proposed Algorithm 1 in Section 4 and other filter-based mean-field approximation methods [17, 26] update the random variables in the mean field in parallel. While parallel updates lead to very fast implementations, they are not guaranteed to converge. The goal of this section is to answer two questions: First, how good are results obtained using parallel updates of the mean field compared to sequential updates, which are guaranteed to converge to a fixed point? Second, how well can the mean field approximate our energy functional compared to methods that do not make the same assumption? To answer the first question, we develop an efficient method that applies mean-field inference with guaranteed convergence using sequential updates, and we compare its results with those of the parallel implementation. To answer the second, we compare our approach with the minimized energy of Graph Cuts [2], which does not rely on the mean-field approximation.
Keep in mind, however, that this is only for explanatory purposes. We also implemented the sequential approach for the same energy and update equations as in Section 3 for our evaluation. The main challenge is now to compute the summation over all variables j in Eq. 6 efficiently but sequentially over the pixels i. This is what we focus on next.
5.1 Sequential updates for meanfield approximation
To optimize the mean-field approximation, each variable update needs to reduce the relative entropy (KL-divergence) between the estimated and the true distribution [15]. In the parallel scheme, while each variable tries to reduce its dependent energy in each update, all other variables change their distributions too, which invalidates the update of each variable. This can lead to oscillations in the distribution and makes the optimization more prone to local minima in the energy functional.
5.1.1 Leveraging constant time filtering

The collection pass (Fig. 4 a) traverses the pixels in the inverse order of the update sequence (compare to Fig. 3). At each pixel, it collects the contributions from all variables that come later in the update sequence and stores them in a temporary buffer, shown in red. The key point is that we compute each step (each new red pixel) in this pass in constant time using the technique by Gastal and Oliveira [8], instead of linear time as illustrated in Fig. 4 a.

The update pass (Fig. 4 b) traverses the pixels in the update sequence (as in Fig. 3). In each step, it accumulates the contributions to the current pixel from all previous pixels that have already been updated (green), again in constant time. In addition, we add the contribution to the current pixel from all pixels that have not yet been updated (that is, the value of the corresponding red pixel from Fig. 4 a) to complete the update of the current pixel.
We first give a brief explanation of the constant time filtering process for accumulating the contributions to the expected energy and then show how the filter is employed in our two-pass algorithm. Gastal and Oliveira [8] showed that processing signals with infinite impulse response (IIR) filters can be performed using a summation of first-order recursive operations. In other words, a Kth order IIR filter that needs K feedback operations per pixel can be replaced with a summation of K first-order filters that need one feedback operation per pixel. For a two-dimensional signal f, two orthogonal 1D filters G in the horizontal direction and H in the vertical direction are used such that H∗G∗f corresponds to a 2D filtering of signal f.
which is the convolution of the two vertical and horizontal filters h and g. The reader is referred to Gastal and Oliveira [8] for more details about the filtering operations.
The crucial insight from Eqs. 12 and 13 is that the 2D filtered output signal at pixel (y,x) is expressed as a sum of two contributions, \(C^{}_{f,r,s}(y,x)\) and \(C^{}_{f,r,s}(y,x)\), which represent the contributions from all pixels before (y,x) and all pixels after (y,x) in the update sequence. We compute \(C^{}_{f,r,s}(y,x)\) in the collection pass, and \(C^{}_{f,r,s}(y,x)\) in the update pass (Fig. 4 a, b). Note that the smoothness term between a pixel and itself is zero; hence, the expected smoothness energy for a variable is a sum over all other variables. Therefore, the middle term in Eq. 12 is zero.
All values in Eq. 12 can be computed with O(K) operations; therefore, the cost of computing the expected energy at each pixel is constant in the number of pixels and linear in the order K of the kernel function. Using this scheme, a Gaussian filter can be approximated very closely (MSE<2.5×10^{−8}) by using two recursive filters, that is K=2.
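The split into contributions from earlier and later pixels can be illustrated in 1D with a single first-order recursive filter. This sketch assumes a real exponential kernel a^{|i−j|} rather than the sum of K=2 recursive filters that approximates a Gaussian in the paper; `split_exponential_sum` is a hypothetical name.

```python
import numpy as np

def split_exponential_sum(f, a):
    """For each i, return the two halves of sum_{j != i} a^|i-j| * f[j]:
    `before` collects j < i and `after` collects j > i.
    Each entry costs one multiply-add, independent of the signal length."""
    n = len(f)
    before = np.zeros(n)
    after = np.zeros(n)
    for i in range(1, n):                    # causal pass
        before[i] = a * (before[i - 1] + f[i - 1])
    for i in range(n - 2, -1, -1):           # anti-causal pass
        after[i] = a * (after[i + 1] + f[i + 1])
    return before, after
```

The sum `before + after` excludes the sample itself, which mirrors the statement above that a pixel contributes no smoothness energy to itself.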
5.1.2 Efficient sequential update algorithm
Algorithm 2 shows the proposed sequential iteration of the mean-field approximation in a 2D fully connected grid with distribution Q using the update sequence from bottom right to top left (Fig. 4 b). First, the collection pass operates in reverse order (top left to bottom right) to compute and store the contributions to the expected energy from pixels in the sequence that have not been updated (Fig. 4 a).
Second, in the update pass, we now proceed in the update sequence order as in Fig. 4 b, with analogous computations to the previous pass. Here, we update the buffer \(\hat {Q}\) by adding the contributions to the expected energies from the green (previously updated) half of the variables (line 7).
Note that in our algorithm we perform the update steps described so far for all hypotheses separately, but we omitted this in the notation for simplicity. To obtain the final expected energy of a pixel, we now need to perform the summation over all hypotheses (Eq. 4) in an inner loop (line 8). We also take into account the unary term here. The compatibility function of the hypotheses \(w(d,l)=\exp (-|d-l|^{2}/{\sigma _{d}^{2}})\) corresponds to the third factor in Eq. 2. We then use the expected energy to update the distribution (line 9).
The proposed sequential update does not change the linear complexity of the algorithm in the number of pixels; however, it includes additional complex exponentials and multiplications for the IIR filtering (O(N M K) for N pixels, M hypotheses, and a Kth order smoothness kernel). Although the sequential iteration is guaranteed to converge and minimize the KL-divergence, its result is biased with respect to the chosen update sequence due to the nature of the mean-field approximation (i.e., the result depends on the order in which variables are updated). To reduce this bias, in each iteration, we estimate the distribution over four sequences (top-to-bottom, bottom-to-top, left-to-right, and right-to-left) and update with the mixture of these distributions. Methods such as Jaakkola and Jordan [14] use the KL-divergence to optimally mix mean-field distributions; however, we found that simply averaging them is enough in our case.
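The two-pass sequential scheme and the averaging over update sequences can be sketched in 1D. The code below is a toy illustration rather than Algorithm 2: it assumes a first-order exponential spatial kernel with decay `a`, a Gaussian label compatibility, no consistency term, and only the two 1D sequences (left-to-right and right-to-left). All names are hypothetical.

```python
import numpy as np

def softmax(e):
    e = e - e.max()
    p = np.exp(e)
    return p / p.sum()

def sweep(unary, q, a, w):
    """One sequential sweep that updates pixels in the order 0..N-1."""
    n, m = unary.shape
    q = q.copy()
    # Collection pass: reverse order, contributions from pixels not yet updated.
    later = np.zeros((n, m))
    for i in range(n - 2, -1, -1):
        later[i] = a * (later[i + 1] + q[i + 1])
    # Update pass: `earlier` holds contributions from already updated pixels.
    earlier = np.zeros(m)
    for i in range(n):
        expected = (earlier + later[i]) @ w      # summation over hypotheses
        q[i] = softmax(-unary[i] + expected)     # includes the unary term
        earlier = a * (earlier + q[i])           # constant-time recursive step
    return q

def sequential_meanfield_1d(unary, a=0.6, sigma_d=1.0, n_iters=5):
    m = unary.shape[1]
    lab = np.arange(m)
    w = np.exp(-(lab[:, None] - lab[None, :]) ** 2 / sigma_d ** 2)
    q = np.stack([softmax(-u) for u in unary])
    for _ in range(n_iters):
        fwd = sweep(unary, q, a, w)
        bwd = sweep(unary[::-1], q[::-1], a, w)[::-1]
        q = 0.5 * (fwd + bwd)                    # simple average of both sequences
    return q
```

Unlike the parallel scheme, each pixel here sees the already updated distributions of the pixels before it in the sequence, so the sweep cannot be split into independent per-pixel work.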
5.2 Convergence results
Without initialization, we observe that parallel updates (blue) converge to a higher energy and KL-divergence than the sequential approach (green). This confirms that sequential updates are more robust to local minima in the energy functional compared to the parallel approach. Initializing the distribution before parallel updates (red) using SGM (Section 4.3) leads to convergence to a lower energy and KL-divergence, closing the gap to the sequential approach. This is because SGM (as the first iteration of tree-reweighted message passing [6]) can find the global establishment of the variables to some extent. After the initialization, the parallel updates can refine the local configuration of the variables more independently. Note that at the beginning, the initialization increases the energy and KL-divergences sharply, because it tries to minimize a much simpler energy functional that does not necessarily have the same solution as our desired energy.
It is interesting to see that SGM-initialized parallel updates perform better than the sequential approach in terms of the KL-divergence (Fig. 6 (right)). This could be explained by the fact that, in contrast to the sequential approach, parallel updates do not suffer from directional bias. In practice, parallel updates can be implemented much more efficiently, for example, using GPU devices, since operations can be done for each pixel separately. Therefore, they are more attractive in practice. In the absence of a good initialization, however, the sequential update can be expected to obtain better results.
6 Results and conclusions
As seen above, initialized parallel updates lead to the best results in practice. Hence, in this section, we report results and evaluations of this technique, as described in Section 4, in more detail.
6.1 KITTI stereo evaluation
Performance of each step of the proposed method
| Included terms | %>3px | Time (s) |
| --- | --- | --- |
| ϕ _{ u } | 22.30 | 16 |
| ϕ _{ u },ϕ _{ s } | 6.88 | 25 |
| E _{SGM} | 4.52 | 35 |
| (Init.) ϕ _{ u },ϕ _{ s } | 4.02 | 60 |
| (Init.) ϕ _{ u },ϕ _{ s },ϕ _{ c } | 3.67 | 60 |
6.2 Stereo sequences
To measure the temporal coherence, we compared the flicker index (IESNA standard [4]) of the final disparity maps. This index is computed in a temporal window of five frames as the ratio of the time-averaged disparities and the disparities above that average, which indicates how much disparities deviate from their average value in a temporal window.
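A per-pixel flicker index over a sliding temporal window can be sketched as follows. The sketch uses the common lighting definition of the index (area of the signal above its average divided by the total area under the signal) applied to disparities; how the values are aggregated over pixels and frames in the evaluation is not specified here, so the function name and the output layout are assumptions.

```python
import numpy as np

def flicker_index(seq, window=5):
    """seq: (T, ...) disparities over time. Returns one index per window
    position and pixel: area above the window mean / total area."""
    seq = np.asarray(seq, dtype=np.float64)
    out = []
    for t in range(seq.shape[0] - window + 1):
        win = seq[t:t + window]
        mean = win.mean(axis=0)
        above = np.clip(win - mean, 0.0, None).sum(axis=0)   # area above the average
        total = win.sum(axis=0)                              # total area
        safe = np.where(total > 0, total, 1.0)
        out.append(np.where(total > 0, above / safe, 0.0))
    return np.stack(out)
```

A temporally constant disparity yields an index of zero, and larger frame-to-frame oscillations around the average yield larger values.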
6.3 Conclusions
We have presented a robust method to compute disparity maps of stereo sequences in a single optimization. The optimization is solved efficiently using 4D filtering in pixel-disparity space. The proposed method ranks among the state of the art in challenging tests (KITTI) and produces fewer flicker artifacts in stereo videos.
We have developed a new and efficient filter-based optimization algorithm that performs sequential variable updates in the mean-field approximation. This algorithm guarantees convergence along with a decrease of the KL-divergence in each iteration, a guarantee that is not available in previous filter-based mean-field approximation methods with parallel variable updates. In addition, our experiments showed that the new algorithm can perform well in comparison to Graph Cuts, a very well-established optimization method. We showed that with an intuitive initialization, the parallel scheme can perform as well as the sequential method. However, the right initialization might not be available all the time, in which case the proposed sequential algorithm can be used instead.
7 Endnote
^{1} http://www.cgg.unibe.ch/publications/temporallyconsistentdisparitymaps
Declarations
Acknowledgements
This research was supported by the Swiss Commission for Technology and Innovation (CTI) under project nr. 15592.1 PFESES.
Authors’ contributions
SAB performed the primary development and analysis for this work and the initial drafting of the manuscript. GB participated in the design and implementation of the proposed GPU pipeline. MZ coordinated SAB and GB to complete this work and played an essential role in editing the paper. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
1. Bigdeli SA, Budweiser G, Zwicker M (2015) Temporally coherent disparity maps using CRFs with fast 4D filtering. In: Pattern Recognition (ACPR), 2015 3rd IAPR Asian Conference on, 301–305. IEEE.
2. Boykov Y, Veksler O, Zabih R (2001) Fast approximate energy minimization via graph cuts. Pattern Anal Mach Intell IEEE Trans 23(11): 1222–1239.
3. Chakrabarti A, Xiong Y, Gortler SJ, Zickler T (2015) Low-level vision by consensus in a spatial hierarchy of regions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 4009–4017. IEEE.
4. DiLaura D, Houser K, Mistrick R, Steffy G (2000) The IESNA lighting handbook: reference & application, 10th edition. Illuminating Engineering Society of North America, New York.
5. Donatsch D, Bigdeli SA, Robert P, Zwicker M (2014) Hand-held 3D light field photography and applications. Visual Comput 30(6–8): 897–907.
6. Drory A, Haubold C, Avidan S, Hamprecht FA (2014) Semi-global matching: a principled derivation in terms of message passing. In: German Conference on Pattern Recognition, 43–53. Springer.
7. Gastal ESL, Oliveira MM (2011) Domain transform for edge-aware image and video processing. In: ACM Transactions on Graphics (TOG), 69. ACM.
8. Gastal ESL, Oliveira MM (2015) High-Order Recursive Filtering of Non-Uniformly Sampled Signals for Image and Video Processing. In: Computer Graphics Forum, 81–93. Wiley Online Library.
9. Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, 3354–3361. IEEE.
10. Güney F, Geiger A (2015) Displets: Resolving stereo ambiguities using object knowledge. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4165–4175.
11. Hermann S, Klette R (2012) Iterative semi-global matching for robust driver assistance systems. In: Asian Conference on Computer Vision, 465–478. Springer.
12. Hirschmuller H (2008) Stereo processing by semi-global matching and mutual information. IEEE Trans. PAMI 30(2): 328–341.
13. Hosni A, Bleyer M, Gelautz M, Rhemann C (2009) Local stereo matching using geodesic support weights. In: 2009 16th IEEE International Conference on Image Processing (ICIP), 2093–2096. IEEE.
14. Jaakkola TS, Jordan MI (1998) Improving the mean field approximation via the use of mixture distributions. In: Learning in graphical models, 163–173. Springer.
15. Koller D, Friedman N (2009) Probabilistic Graphical Models: Principles and Techniques (Adaptive Computation and Machine Learning series). MIT Press.
16. Kolmogorov V (2006) Convergent tree-reweighted message passing for energy minimization. Pattern Anal Mach Intell IEEE Trans 28(10): 1568–1583.
17. Krähenbühl P, Koltun V (2011) Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. In: Proc. NIPS, 109–117. http://papers.nips.cc/paper/4296efficientinferenceinfullyconnectedcrfswithgaussianedgepotentials.pdf.
18. Lang M, Wang O, Aydin T, Smolic A, Gross MH (2012) Practical temporal consistency for image-based graphics applications. ACM Trans Graph 31(4): 34.
19. Mei X, Sun X, Zhou M, Jiao S, Wang H, Zhang X (2011) On building an accurate stereo matching system on graphics hardware. In: Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, 467–474. IEEE.
20. Menze M, Geiger A (2015) Object scene flow for autonomous vehicles. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3061–3070.
21. Min D, Lu J, Do MN (2012) Depth video enhancement based on weighted mode filtering. IEEE Trans Imag Proc 21(3): 1176–1190.
22. Rhemann C, Hosni A, Bleyer M, Rother C, Gelautz M (2011) Fast cost-volume filtering for visual correspondence and beyond. In: Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, 3017–3024. IEEE.
23. Richardt C, Orr D, Davies I, Criminisi A, Dodgson NA (2010) Real-time spatiotemporal stereo matching using the dual-cross-bilateral grid. In: European Conference on Computer Vision, 510–523. Springer.
24. Spangenberg R, Langner T, Rojas R (2013) Weighted semi-global matching and center-symmetric census transform for robust driver assistance. In: International Conference on Computer Analysis of Images and Patterns, 34–41. Springer.
25. Vineet V, Warrell J, Sturgess P, Torr P (2012) Improved Initialization and Gaussian Mixture Pairwise Terms for Dense Random Fields with Mean-field Inference. In: Bowden R, Collomosse J, Mikolajczyk K (eds) Proceedings of the British Machine Vision Conference, 73.1–73.11. BMVA Press. doi:http://dx.doi.org/10.5244/C.26.73.
26. Vineet V, Warrell J, Torr PHS (2014) Filter-based mean-field inference for random fields with higher-order terms and product label-spaces. Int J Comput Vis 110(3): 290–307.
27. Vogel C, Roth S, Schindler K (2014) View-consistent 3D scene flow estimation over multiple frames. In: European Conference on Computer Vision, 263–278. Springer, ECCV.
28. Vogel C, Schindler K, Roth S (2015) 3D scene flow estimation with a piecewise rigid scene model. Int J Comput Vis 115(1): 1–28. Springer.
29. Yamaguchi K, McAllester D, Urtasun R (2013) Robust monocular epipolar flow estimation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1862–1869.
30. Yamaguchi K, McAllester D, Urtasun R (2014) Efficient joint segmentation, occlusion labeling, stereo and flow estimation. In: European Conference on Computer Vision, 756–771. Springer.
31. Yu F, Gallup D (2014) 3D reconstruction from accidental motion. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3986–3993. IEEE.
32. Zbontar J, LeCun Y (2015) Computing the stereo matching cost with a convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1592–1599.
33. Zhang K, Fang Y, Min D, Sun L, Yang S, Yan S, Tian Q (2014) Cross-scale cost aggregation for stereo matching. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1590–1597.
34. Zhang K, Lu J, Lafruit G (2009) Cross-based local stereo matching using orthogonal integral images. Circ Syst Video Technol IEEE Trans 19(7): 1073–1079.