
Phase disambiguation using spatio-temporally modulated illumination in depth sensing


Phase ambiguity is a major problem in depth measurement with either time-of-flight or phase shifting. Resolving the ambiguity with a low-frequency pattern sacrifices the depth resolution, and using multiple frequencies requires many observations. In this paper, we propose a phase disambiguation method that combines temporal and spatial modulation so that high depth resolution is preserved while the number of observations is kept small. A key observation is that the phase ambiguities of the temporal and spatial domains appear differently with respect to the depth. Using this difference, the phase can be disambiguated over a wider range of interest. We develop a prototype to show the effectiveness of our method through real-world experiments.

1 Introduction

Depth measurement is widely used in applications such as augmented reality, factory automation, robotics, and autonomous driving. In the computer vision field, there are two well-known techniques for measuring scene depth using active illumination. One is the time-of-flight camera, which uses temporally modulated illumination to measure the travel time of light; the other is phase shifting, which uses spatially modulated illumination to find the correspondence between the projector and the camera for triangulation.

A common problem is how to resolve the periodic ambiguity of the phase, because either measurement gives a phase that is defined only between 0 and 2π. A typical solution is to use multiple frequencies to resolve the phase ambiguity. However, the phase ambiguity still exists at the frequency of the greatest common divisor, so several measurements are required to obtain a wider range of interest. Another possible approach is to use a low frequency, which sacrifices the depth resolution. The aim of this study is to resolve the phase ambiguity with fewer observations while guaranteeing both a wider range of interest and a better depth resolution.

A key observation of this paper is that the phase ambiguities of the time-of-flight (ToF) and the phase shifting appear differently in the depth domain. Since the temporal phase is proportional to the depth, the depth candidates from the phase appear at equal intervals along the depth. On the other hand, the spatial phase is defined in the disparity domain; hence, the depth candidates appear at gradually increasing intervals. Based on this difference, the phase ambiguity can be resolved by combining temporal and spatial modulation. Because spurious candidate depths that satisfy both measured phases seldom appear, the number of phases can be reduced to one for each frequency. In this paper, we discuss ordinary ToF and phase shifting in the same framework. We show that precise depth can be measured over a wide range by combining temporal and spatial modulation. We also derive the resolution and the range of interest theoretically, analyze the recoverability, and build a prototype to show the effectiveness of our method via real-world experiments.

This paper extends its preliminary version [1] with the following differences. Extensions have been made to (1) reveal the depth resolution and the range of interest of our proposed method, (2) develop an efficient implementation, and (3) confirm that the unrecoverable depth due to ambiguity seldom exists by simulation.

The rest of the paper is organized as follows. Related work is discussed in Section 2, a brief review of the ordinary time-of-flight and phase shifting algorithms is provided in Section 3, a spatio-temporal modulation technique is proposed in Section 4, the resolution and range of interest of our method are analyzed in Section 5, experiments with a prototype system are shown in Section 6, and we conclude with some discussion in Section 7.

2 Related work

Active depth measurements have been widely studied in the computer vision field. Earlier work used a projector-camera system to convert the projector’s pixel index into multiple projection images based on the gray code [2]. The phase shifting approach [3] recovers subpixel correspondences by detecting the phase of the sinusoid. Gupta and Nayar [4] unwrapped the phase from slightly different frequencies so that it became robust to indirect light transport with a small budget of projection numbers. Mirdehghan et al. [5] proposed an optimal code for the structured light technique. The time-of-flight method is another way to measure depth. It emits amplitude-modulated light and detects a delayed signal that corresponds to the scene depth [6]. Because there is a tradeoff between the range of interest and the depth resolution, a better resolution is obtained by limiting the range of interest [7]. We combine these techniques to realize both better resolution and a wider range of interest.

Another problem regarding the ToF is multi-path interference due to indirect light transport. Recovering the correct depth of multi-path scenes has been broadly studied using a parametric model [8, 9], K-sparsity [10, 11], frequency analysis [12], and data-driven approaches [13–15]. Because the scene depth can be recovered from the first-returning photon, the depth can be obtained after recovering light-in-flight imaging [16–21]. Multi-path interference can also be mitigated by combining ToF and a projector. Naik et al. [22] combined the ToF camera and a projector-camera system to mitigate multi-path interference using direct-global separation [23]. Similar ideas are implemented with ToF projectors that can modulate both spatially and temporally [24, 25]. In both cases, direct-global separation is utilized to mitigate multi-path interference. We use a similar system, not only for mitigating multi-path interference but also for phase disambiguation.

To obtain fine resolution, Gupta et al. [26] propose an optimal code for ToF modulation. Gutierrez-Barragan et al. [27] propose an optimization approach for designing practical coding functions under hardware constraints. Kadambi et al. [28] use the polarization cue to recover a smooth surface. Our method works at a more fundamental layer; hence, these techniques can be incorporated into our method to boost the resolution. Interferometry can also measure small objects with micrometer resolution [29] in a carefully controlled environment. Li et al. [30] recover micro-resolution ToF using the superheterodyne technique. Maeda et al. [31] apply the heterodyne technique to polarization imaging to obtain accurate depth.

Phase unwrapping is a subproblem in depth measurement. The phase has to be unwrapped with either the phase shifting or the ToF; otherwise, the estimated depth has a 2π ambiguity. The number of observations can be reduced by sacrificing the spatial resolution. The projector’s coordinates can be obtained from a single image using a color code [32], a wave grid pattern [33], or a light-field ToF [34]. Our method falls into this class but neither sacrifices the spatial resolution nor requires many patterns. Our method leverages the asymmetric relation between spatial and temporal wrapping to solve the ambiguity of the phase.

3 Depth measurement techniques using modulated illumination

Before explaining our method, we briefly review the ToF and phase shifting methods. We respectively explain them as the phase measurements using temporally or spatially modulated light.

3.1 Temporal modulation (time-of-flight)

The ToF camera emits the temporally modulated light as shown in Fig. 1a. It measures the amplitude decay and phase delay of the modulated light, and the phase delay corresponds to the time it takes for the light to make a round trip.

Fig. 1

Modulation variations. a ToF modulates the light temporally. b Phase shifting modulates the light spatially. c Our method combines temporal and spatial modulations at the same time to mitigate the phase ambiguity problem while preserving the depth resolution

The ToF camera measures the correlation between the signals emitted and those received. For each frequency, the phase delay is calculated from the correlations with NT reference signals, which are temporally shifted. For the k-th signal, the correlation ik(x) at the camera pixel x is represented as

$$\begin{array}{*{20}l} {i}_{k}({x}) &= g\left({t} + \frac{2\pi k}{N_{T}}\right) * s({x}, {t}) \end{array} $$
$$\begin{array}{*{20}l} &= \frac{{A}({x})}{2} \cos{\left({{\phi}_{T}}({x}) + \frac{2\pi k}{N_{T}}\right)} + {O}({x}), \end{array} $$

where \(g\left ({t} + \frac {2\pi k}{N_{T}}\right)\) is the reference signal with the shifted phase 2πk/NT, s is the returned signal, the operator ∗ represents the correlation, A is the amplitude decay, ϕT is the phase delay, and O is the ambient light. In the case of NT=4, the phase ϕT and the amplitude A of the returned signal can be recovered by a direct conversion method from multiple observations while changing the phase \(\frac {2\pi k}{N_{T}}\) as

$$\begin{array}{*{20}l} {{\phi}_{T}}({x}) &= \arctan{\left(\frac{{i}_{3}({x}) - {i}_{1}({x})}{{i}_{0}({x}) - {i}_{2}({x})} \right)}, \end{array} $$
$$\begin{array}{*{20}l} {A}({x}) &= \sqrt{({i}_{3}({x}) - {i}_{1}({x}))^{2} + \left({i}_{0}({x}) - {i}_{2}({x}) \right)^{2}}. \end{array} $$

The depth d is obtained as

$$\begin{array}{*{20}l} {d}({x}) = \frac{c}{2 {\omega_{T}}}{{\phi}_{T}}({x}), \end{array} $$

where ωT is the angular modulation frequency and c is the speed of light.
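The four-sample recovery and the depth conversion above can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the simulated amplitude and offset, and the 0.35 m test depth are hypothetical, ωT is treated as an angular frequency in rad/s, and `atan2` is used in place of the plain arctangent so that the recovered phase covers the full range from 0 to 2π.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def correlation_samples(amplitude, phase, offset, n_shifts=4):
    """Simulate i_k = (A / 2) cos(phi_T + 2 pi k / N_T) + O for k = 0..N_T-1."""
    return [
        amplitude / 2.0 * math.cos(phase + 2.0 * math.pi * k / n_shifts) + offset
        for k in range(n_shifts)
    ]


def recover_phase_amplitude(i0, i1, i2, i3):
    """Direct conversion for N_T = 4; returns (phase in [0, 2 pi), amplitude)."""
    phase = math.atan2(i3 - i1, i0 - i2) % (2.0 * math.pi)
    amplitude = math.hypot(i3 - i1, i0 - i2)
    return phase, amplitude


def tof_depth(phase, omega_t):
    """d = c phi_T / (2 omega_T), with omega_T in rad/s and d in metres."""
    return C * phase / (2.0 * omega_t)


omega_t = 2.0 * math.pi * 60e6   # 60 MHz modulation, expressed in rad/s
true_depth = 0.35                # metres (hypothetical test value)
true_phase = 2.0 * omega_t * true_depth / C

samples = correlation_samples(amplitude=1.0, phase=true_phase, offset=0.2)
phase, amplitude = recover_phase_amplitude(*samples)
depth = tof_depth(phase, omega_t)
print(f"phase = {phase:.4f} rad, amplitude = {amplitude:.4f}, depth = {depth:.4f} m")
```

The ambient term O cancels in both differences, which is why the recovered phase does not depend on the chosen offset.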

3.2 Spatial modulation (phase shifting)

The phase shifting spatially modulates the projection pattern. Finding the correspondences between the projector and camera pixels is the main part of the spatial phase shifting. The idea is to project the sinusoidal pattern as shown in Fig. 1b and measure the phase of the sinusoid for each pixel, which corresponds to the projector’s pixel coordinates.

The observed intensity of the camera Il(x) for the l-th shift is represented as

$$\begin{array}{*{20}l} {I}_{l}({x}) = {A}({x})\cos{\left({{\phi}_{S}}({x}) - \frac{2\pi l}{N_{S}} \right)} + {O}({x}), \end{array} $$

where ϕS is the spatial phase of the projection pattern due to disparity. There are three unknown parameters, namely the offset O(x), the amplitude A(x), and the phase ϕS(x); therefore, they can be recovered from NS≥3 observations while changing the phase of the pattern. In the case of NS=4, the spatial phase ϕS and the amplitude A can be recovered analogously to the ToF as

$$\begin{array}{*{20}l} {{\phi}_{S}}({x}) &= \arctan{\left(\frac{{I}_{1}({x}) - {I}_{3}({x})}{{I}_{0}({x}) - {I}_{2}({x})}\right)}, \end{array} $$
$$\begin{array}{*{20}l} {A}({x}) &= \frac{1}{2}\sqrt{({I}_{1}({x}) - {I}_{3}({x}))^{2} + \left({I}_{0}({x}) - {I}_{2}({x}) \right)^{2}}. \end{array} $$

From the estimated disparity, the scene depth can be recovered using the triangulation theory. For example, when the parallel stereo is assumed, the depth is inversely proportional to the disparity as

$$\begin{array}{*{20}l} {d}({x}) = \frac{{b}{f}}{{x} - \frac{{{\phi}_{S}}({x})}{\omega_{S}}} \end{array} $$

where \({x} - \frac {{\phi }_{S}({x})}{{\omega _{S}}}\) is the disparity, ωS is the spatial angular frequency of the projection pattern, f is the focal length, and b is the baseline of the pro-cam system. Here, x represents the horizontal pixel position.
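A minimal sketch of the spatial phase recovery and the parallel-stereo triangulation is given below. The numerical setup is hypothetical (a focal length expressed in pixels, a baseline in millimetres, and a pattern period long enough that no wrapping occurs), and the function names are illustrative. The numerator is written as I1 − I3 so that it agrees with the minus sign of the shift term in the observation model, for which I1 − I3 = 2A sin ϕS and I0 − I2 = 2A cos ϕS.

```python
import math

TWO_PI = 2.0 * math.pi


def pattern_samples(amplitude, phase, offset, n_shifts=4):
    """Simulate I_l = A cos(phi_S - 2 pi l / N_S) + O for l = 0..N_S-1."""
    return [
        amplitude * math.cos(phase - TWO_PI * l / n_shifts) + offset
        for l in range(n_shifts)
    ]


def recover_spatial_phase(i0, i1, i2, i3):
    """N_S = 4 recovery; returns (phase in [0, 2 pi), amplitude A)."""
    phase = math.atan2(i1 - i3, i0 - i2) % TWO_PI
    amplitude = 0.5 * math.hypot(i1 - i3, i0 - i2)
    return phase, amplitude


def triangulate(x, phase, omega_s, baseline, focal):
    """d = b f / (x - phi_S / omega_S) for a parallel projector-camera pair."""
    disparity = x - phase / omega_s
    return baseline * focal / disparity


# Hypothetical setup: lengths in mm, pixel quantities in pixels.
baseline, focal = 70.0, 4000.0   # focal length in pixels is an assumption
omega_s = TWO_PI / 1024.0        # one period spans 1024 projector pixels (no wrapping here)
x, true_depth = 900.0, 350.0
true_phase = omega_s * (x - baseline * focal / true_depth)

samples = pattern_samples(amplitude=0.8, phase=true_phase, offset=0.3)
phase_s, amplitude_s = recover_spatial_phase(*samples)
depth_s = triangulate(x, phase_s, omega_s, baseline, focal)
print(f"phase = {phase_s:.4f} rad, depth = {depth_s:.2f} mm")
```

With a shorter pattern period the same code returns only the wrapped phase, and the triangulated depth is then one of several candidates.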

3.3 Phase ambiguity and depth resolution

A common problem in both temporal and spatial methods is 2π ambiguity, where the phase is wrapped when the depth exceeds the maximum depth of interest. A naive approach is using a low frequency to avoid the phase ambiguity. However, a tradeoff exists between the range of interest and the depth resolution. While the phase ambiguity does not appear at a lower frequency, the depth resolution becomes low as shown in Fig. 2a. With a higher frequency, the depth resolution improves while the phase ambiguity becomes significant, and the depth cannot be uniquely recovered for a wide range of interest as shown in Fig. 2b.

Fig. 2

Tradeoff among the depth resolution, the range of interest, and the number of measurements. The dashed blue line represents the low-frequency phase, and the solid line represents the high-frequency phase. Horizontal red bands represent the resolution of the measured phase. Intersections of the blue lines and the horizontal red bands (depicted as red circles) are the candidate depths, and the corresponding depth resolution is illustrated as vertical red bands. a, b While the resolution in phase is the same, the corresponding depth resolution varies depending on the frequency. With a higher frequency, better depth resolution is obtained; however, there is depth ambiguity. c Using multiple frequencies, the range of interest can be extended to that of the frequency of the greatest common divisor, and the depth resolution is determined by the highest frequency. d The bottom table summarizes the tradeoff

The phase ambiguity is usually relaxed by using multiple frequencies in either the temporal or the spatial domain. However, multiple captures are required, which sacrifices real-time capability, as shown in Fig. 2c. We propose a hybrid approach to disambiguation that takes advantage of the different natures of temporal and spatial modulation.

4 Proposed method

We propose a hybrid method of temporal and spatial modulation as shown in Fig. 1c. The phase ambiguity can be resolved by using both temporal and spatial phases instead of using multiple frequencies in either domain.

4.1 Spatio-temporal phase disambiguation

Our key idea is that the depth candidates from the ambiguity of the temporal and spatial phases are different. In the case of the temporal phase, the intervals of the depth candidates are constant along the depth because the depth is proportional to the phase, as shown in Eq. (5). On the other hand, the spatial phase is defined in the disparity domain. Because the depth is inversely proportional to the disparity (as shown in Eq. (9)), the intervals of depth candidates increase along with the depth. Figure 3 shows the phase observations along with the scene depth. Multiple depth candidates correspond to a single phase. The depth candidates appear at the same interval for the temporal phase, while the intervals of the spatial phase increase. This difference is a key feature of our method to resolve the phase ambiguity.

Fig. 3

Phase observations with the depth. While depth candidates of the temporal phase appear at the same intervals, those of the spatial pattern appear at increasing intervals. This difference is the cue to disambiguate the depth candidate. The unique depth candidate that satisfies both the temporal phase and the spatial phase can be obtained

Spurious depths that satisfy both the temporal and the spatial phases seldom appear. The unwrapped phase is not restricted by the greatest common divisor, and the pair of temporal and spatial phases is unique over a wider range of interest. The candidate depths are obtained from the temporal and spatial phases, respectively, as

$$\begin{array}{*{20}l} {d}_{T} &= \frac{{c}}{2 {\omega_{T}}}(2\pi n_{T} + {{\phi}_{T}}) \end{array} $$
$$\begin{array}{*{20}l} {d}_{S} &= \frac{{b} {f}}{{x} - \frac{2\pi n_{S} + {{\phi}_{S}}}{{\omega_{S}}} }. \end{array} $$

An integer pair (nT,nS) other than the true one that satisfies dT=dS seldom exists. Therefore, the phase ambiguity problem can be resolved using phases of different domains.
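The two candidate sets can be enumerated directly from the equations for dT and dS. The sketch below is illustrative only: the geometry (baseline in mm, focal length in pixels, pixel column x), the 3000 mm test depth, and the 1 mm matching tolerance are hypothetical choices, and the wrap counts are assumed to run over non-negative integers.

```python
import math

TWO_PI = 2.0 * math.pi
C_MM = 299_792_458.0e3  # speed of light in mm/s


def temporal_candidates(phi_t, omega_t, d_limit):
    """d_T = c (2 pi n_T + phi_T) / (2 omega_T) for n_T = 0, 1, ... up to d_limit."""
    depths, n = [], 0
    while True:
        d = C_MM * (TWO_PI * n + phi_t) / (2.0 * omega_t)
        if d > d_limit:
            return depths
        depths.append(d)
        n += 1


def spatial_candidates(phi_s, omega_s, x, baseline, focal, d_limit):
    """d_S = b f / (x - (2 pi n_S + phi_S) / omega_S) for n_S = 0, 1, ... up to d_limit."""
    depths, n = [], 0
    while True:
        disparity = x - (TWO_PI * n + phi_s) / omega_s
        if disparity <= 0.0:
            return depths
        d = baseline * focal / disparity
        if d > d_limit:
            return depths
        depths.append(d)
        n += 1


omega_t = TWO_PI * 60e6            # 60 MHz, in rad/s
omega_s = TWO_PI / 60.0            # 60-pixel pattern period, in rad/pixel
x, baseline, focal = 900.0, 70.0, 4000.0
true_depth = 3000.0                # mm
phi_t = (2.0 * omega_t * true_depth / C_MM) % TWO_PI
phi_s = (omega_s * (x - baseline * focal / true_depth)) % TWO_PI

d_t = temporal_candidates(phi_t, omega_t, d_limit=10_000.0)
d_s = spatial_candidates(phi_s, omega_s, x, baseline, focal, d_limit=10_000.0)
gaps_t = [b - a for a, b in zip(d_t, d_t[1:])]
gaps_s = [b - a for a, b in zip(d_s, d_s[1:])]
# Depths supported by both phases (1 mm tolerance, an arbitrary choice).
common = [dt for dt in d_t if any(abs(dt - ds) < 1.0 for ds in d_s)]
print(len(d_t), len(d_s), common)
```

In this configuration the temporal gaps are all equal while the spatial gaps grow with depth, and only the true depth is shared by both candidate lists.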

4.2 Phase recovery and depth estimation

Defining I0 as the irradiance, the emitted signal from the projector with the k-th temporal shift and the l-th spatial shift I(p,t,k,l) can be expressed as

$$ \begin{aligned} I({p}, t, k, l) &= I_{0} \left(\frac{1}{2}\cos \left(\omega_{T} t + \frac{2\pi k}{N_{T}}\right) + \frac{1}{2} \right)\\&\quad \left(\frac{1}{2}\cos\left({\omega_{S}}{p} - \frac{2\pi l}{N_{S}}\right) + \frac{1}{2}\right), \end{aligned} $$

where t is time and p is the projector’s pixel. The returned signal r(x,t,k,l) at the camera pixel x is represented as

$$ {{}\begin{aligned} r(x, t, k, l)& =I_{0} \kappa(x) \left(\frac{1}{2}\cos \left(\omega_{T} t - \phi_{T}(x) - \frac{2\pi k}{N_{T}}\right) + \frac{1}{2}\right)\\& \left(\frac{1}{2}\cos\left(\phi_{S}(x) - \frac{2\pi l}{N_{S}}\right) + \frac{1}{2}\right) \\ &+ o(x), \end{aligned}} $$

where κ is the reflectance of the target object, o(x) is the ambient light, ϕT(x) is the phase delay corresponding to the round-trip time, and ϕS(x) is the phase corresponding to the disparity (x−p). The intensity is the correlation with the reference signal \(g_{{\omega _{T}}}(t)\) [35] as

$$\begin{array}{*{20}l} i({x}, k, l) &= \int_{0}^{T} r({x}, t, k, l)g_{{\omega_{T}}}(t) dt \\ &\approx\,{A}({x}) \left(\frac{1}{2} \cos\left({{\phi}_{T}}({x}) + \frac{2\pi k}{N_{T}}\right) + \frac{1}{2}\right)\\&\quad \left(\frac{1}{2} \cos\left({{\phi}_{S}}({x}) - \frac{2\pi l}{N_{S}}\right) + \frac{1}{2}\right) \\ &\quad + {O}({x}), \end{array} $$

where T is the exposure time. The temporal phase ϕT and spatial phase ϕS are obtained from 8 observations with NT=4 and NS=4 as

$$ \left\{{\begin{aligned} {{\phi}_{T}}({x}) &= \arctan{\frac{{i}({x}, 3, 0) - {i}({x}, 1, 0)}{{i}({x}, 0, 0) - {i}({x}, 2, 0)}} \\ {{\phi}_{S}}(x) &= \arctan{\frac{{i}(x, 0, 1)-{i}(x, 0, 3) }{{i}(x, 0, 0)-{i}(x, 0, 2)}}. \end{aligned}}\right. $$

Now, we have two phases: the temporal phase ϕT and the spatial phase ϕS. Depth estimation from the two phases is similar to the unwrapping problem in both multi-frequency phase shifting and ToF, and it can be solved by searching a lookup table [4]. The observed phases should respectively equal the phases computed from the same depth. The computed phases \({\Tilde {\phi _{T}}}(d)\) and \({\Tilde {\phi _{S}}}(d, {x})\) are obtained as

$$\begin{array}{*{20}l} {\Tilde{\phi_{T}}}(d) &= \frac{2 {\omega_{T}} d}{c} \bmod{2 \pi} \end{array} $$
$$\begin{array}{*{20}l} {\Tilde{\phi_{S}}}(d, {x}) &= {\omega_{S}} \left(x - \frac{{b} {f}}{d} \right) \bmod{2\pi}. \end{array} $$

A lookup table is built for each horizontal pixel position x of the camera because the spatial phase depends on the pixel position. The table \(\mathcal {T}_{{x}}\) at the horizontal position x consists of the vector \(\Phi _{D_{i}, {x}} = [{\Tilde {\phi _{T}}}(D_{i}), {\Tilde {\phi _{S}}}(D_{i}, {x})]\) of the candidate depth Di as

$$\begin{array}{*{20}l} \mathcal{T}_{{x}}(D_{i}) = \Phi_{{D_{i}, {x}}} = \left[{\Tilde{\phi_{T}}}(D_{i}), {\Tilde{\phi_{S}}}(D_{i}, {x})\right]. \end{array} $$

For each pixel, the depth can be estimated by searching the lookup table as

$$\begin{array}{*{20}l} \hat{d}({x}) = \arg\min_{d} \left\lVert{\mathcal{T}_{{x}}({d}) - \left[{{\phi}_{T}}({x}), {{\phi}_{S}}({x})\right]}\right\rVert^{2}_{2}. \end{array} $$
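The per-column lookup can be sketched as a brute-force nearest-neighbor search over sampled candidate depths. This is an illustrative sketch, not the authors' code: the 1 mm depth grid, the geometry, and the function names are hypothetical, and the phase differences are measured on the circle (so that phases just below 2π and just above 0 count as close), which is an implementation choice beyond the plain squared norm written above.

```python
import math

TWO_PI = 2.0 * math.pi
C_MM = 299_792_458.0e3  # speed of light in mm/s


def model_phases(d, x, omega_t, omega_s, baseline, focal):
    """Wrapped temporal and spatial phases predicted for depth d at column x."""
    phi_t = (2.0 * omega_t * d / C_MM) % TWO_PI
    phi_s = (omega_s * (x - baseline * focal / d)) % TWO_PI
    return phi_t, phi_s


def circular_diff(a, b):
    """Distance between two phases measured on the circle."""
    diff = abs(a - b) % TWO_PI
    return min(diff, TWO_PI - diff)


def build_table(x, depths, omega_t, omega_s, baseline, focal):
    """T_x: one row (D_i, computed phi_T, computed phi_S) per candidate depth."""
    return [(d, *model_phases(d, x, omega_t, omega_s, baseline, focal)) for d in depths]


def lookup_depth(table, phi_t, phi_s):
    """Return the candidate depth whose phase pair is closest to the measurement."""
    def cost(row):
        return circular_diff(row[1], phi_t) ** 2 + circular_diff(row[2], phi_s) ** 2
    return min(table, key=cost)[0]


omega_t, omega_s = TWO_PI * 60e6, TWO_PI / 60.0
x, baseline, focal = 900.0, 70.0, 4000.0      # hypothetical geometry (pixels, mm, pixels)
depths = [300.0 + i for i in range(5701)]     # 300 mm to 6000 mm in 1 mm steps
table = build_table(x, depths, omega_t, omega_s, baseline, focal)

# Noise-free measurement generated from the same model at 4321 mm.
measured = model_phases(4321.0, x, omega_t, omega_s, baseline, focal)
estimate = lookup_depth(table, *measured)
print(estimate)
```

The search range here spans more than two temporal wraps and many spatial wraps, so neither phase alone would identify the depth. With noisy phases the achievable accuracy depends on the grid spacing and on how close the nearest aliased phase pair lies.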

Efficient implementation. In practice, building a lookup table for each horizontal pixel position is not necessary. Although the spatial phase corresponding to a given depth depends on the position of the camera pixel, the disparity does not. The depth of all camera pixels can therefore be obtained from a single lookup table built from pairs of the temporal phase and the disparity, after converting the measured spatial phase to a disparity. The disparity is obtained from the measured spatial phase ϕS and the pixel position x as

$$\begin{array}{*{20}l} {\delta}({x}, {{\phi}_{S}}({x})) &= {x} - \frac{{{\phi}_{S}}({x})}{{\omega_{S}}} \end{array} $$
$$\begin{array}{*{20}l} &= \frac{bf}{\tilde{d}}, \end{array} $$

where δ represents the disparity and \(\tilde {d}\) is the wrapped depth. The table \(\mathcal {T'}\) consists of the vector \(\Phi _{D_{i}}' = [{\Tilde {\phi _{T}}}(D_{i}), {\Tilde {\delta }}(D_{i})]\) of the candidate depth Di as

$$\begin{array}{*{20}l} {\Tilde{\delta}}(D_{i}) &= \frac{{b} {f}}{D_{i}} \bmod{\frac{2\pi}{{\omega_{S}}}} \end{array} $$
$$\begin{array}{*{20}l} \mathcal{T'}(D_{i}) &= \Phi_{D_{i}}'=\left [{\Tilde{\phi_{T}}}(D_{i}), {\Tilde{\delta}}(D_{i})\right], \end{array} $$

where \({\Tilde {\delta }}\) is the disparity computed from the candidate depths. For each pixel, the depth can be estimated by searching the lookup table as

$$\begin{array}{*{20}l} \hat{d}({x}) = \arg\min_{d} \left\lVert{\mathcal{T'}({d}) - [{{\phi}_{T}}({x}), {\delta}({x}, {{\phi}_{S}})]}\right\rVert^{2}_{2}. \end{array} $$
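A sketch of the shared-table variant follows. Several details are assumptions made for the illustration rather than statements from the text: the measured disparity is reduced modulo the pattern period 2π/ωS before comparison so that it is directly comparable with the table entries, the disparity error is scaled by ωS so that both error terms are in radians, and the geometry and depth grid are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi
C_MM = 299_792_458.0e3  # speed of light in mm/s


def circular_diff(a, b, period):
    """Distance between two values that are both wrapped to [0, period)."""
    diff = abs(a - b) % period
    return min(diff, period - diff)


def build_shared_table(depths, omega_t, omega_s, baseline, focal):
    """T': rows (D_i, computed phi_T, wrapped disparity), independent of x."""
    period = TWO_PI / omega_s
    return [
        (d, (2.0 * omega_t * d / C_MM) % TWO_PI, (baseline * focal / d) % period)
        for d in depths
    ]


def measured_disparity(x, phi_s, omega_s):
    """delta = x - phi_S / omega_S, reduced modulo the pattern period."""
    return (x - phi_s / omega_s) % (TWO_PI / omega_s)


def lookup_depth(table, phi_t, delta, omega_s):
    period = TWO_PI / omega_s

    def cost(row):
        e_t = circular_diff(row[1], phi_t, TWO_PI)
        e_s = omega_s * circular_diff(row[2], delta, period)  # pixels -> radians
        return e_t ** 2 + e_s ** 2

    return min(table, key=cost)[0]


omega_t, omega_s = TWO_PI * 60e6, TWO_PI / 60.0
baseline, focal = 70.0, 4000.0                # hypothetical: mm and pixels
depths = [300.0 + i for i in range(5701)]     # 300 mm to 6000 mm in 1 mm steps
table = build_shared_table(depths, omega_t, omega_s, baseline, focal)

estimates = []
for x in (900.0, 1500.0):                     # two different camera columns, one table
    phi_t = (2.0 * omega_t * 4321.0 / C_MM) % TWO_PI
    phi_s = (omega_s * (x - baseline * focal / 4321.0)) % TWO_PI
    delta = measured_disparity(x, phi_s, omega_s)
    estimates.append(lookup_depth(table, phi_t, delta, omega_s))
print(estimates)
```

Both columns recover the same depth from one table, which is the saving over building a separate table per horizontal position.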

5 Analysis of the proposed method

Depth resolution. The resolution of the proposed method is better than that of ToF in the near range and better than that of phase shifting in the far range.

The resolution of ordinary ToF and phase shifting is respectively represented as [6, 25]

$$\begin{array}{*{20}l} {\Delta d}_{T} &= \frac{c\pi}{{\omega_{T}}}\frac{\sqrt{B}}{2\sqrt{8}A}, \end{array} $$
$$\begin{array}{*{20}l} {\Delta d}_{S} &= \frac{2\pi{d}^{2}}{{b}{f}{\omega_{S}}}\frac{\sqrt{B}}{2\sqrt{8}A}, \end{array} $$

where A and B are the numbers of photo-electrons that the sensor can accumulate, representing the amplitude and the DC component, respectively. We suppose that A and B are parameters of the hardware and are independent of the scene. However, the returned light is influenced by light falloff in practice; hence, future work is expected to include this effect for a more accurate analysis.

Figure 4 shows the depth resolution of ToF and phase shifting along with the depth according to Eqs. (25) and (26). The resolution of ToF is constant at any depth while the resolution of phase shifting is proportional to the square of the depth. The proposed method achieves the resolution that is close to the better resolution of either phase shifting or time-of-flight as shown in Fig. 4.

Fig. 4

Depth resolution along with the depth. According to Eqs. (25) and (26), the resolution of ToF is constant (blue) and the resolution of phase shifting is proportional to the square of the depth (orange). The depth dcross is the depth where the resolution curves of ToF and phase shifting cross. The proposed method can achieve a resolution that is close to that of phase shifting in the near range before dcross and close to that of ToF in the far range after dcross (green)

The depth dcross is defined as the depth where the resolution of ToF equals the resolution of phase shifting. In the range nearer than dcross, the resolution of our method is better than that of ToF and close to that of phase shifting. In the range farther than dcross, the resolution of our method is better than that of phase shifting and close to that of ToF. The depth dcross is given as

$$\begin{array}{*{20}l} {\Delta d}_{S} &= {\Delta d}_{T} \end{array} $$
$$\begin{array}{*{20}l} {d_{\text{cross}}} &= \sqrt{\frac{{c} {b} {f} {\omega_{S}}}{2 {\omega_{T}}}}. \end{array} $$

To improve on the resolution of pure ToF, the maximum range of the system should be set shorter than dcross.
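The expression for dcross follows in one step from Eqs. (25) and (26). The intermediate line below is a sketch of that step: when the two resolutions are equated at d = dcross, the common noise factor \(\sqrt{B}/(2\sqrt{8}A)\) cancels.

$$\begin{aligned} \frac{2\pi d_{\text{cross}}^{2}}{b f \omega_{S}} &= \frac{c\pi}{\omega_{T}} \\ d_{\text{cross}}^{2} &= \frac{c\, b\, f\, \omega_{S}}{2\, \omega_{T}}. \end{aligned}$$

Because A and B cancel, dcross depends only on the two frequencies and the geometry (b, f), under the stated assumption that A and B are the same for both measurements.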

Range of interest. The range of interest (ROI) of the proposed method is determined by the relation between the temporal and the spatial frequencies.

Fig. 5

Upper and lower bounds of the ROI. Orange lines represent the candidate depths of spatial modulation; blue lines represent the candidate depths of temporal modulation. The width of the line shows the resolution. a If the depth is nearer than dmin, several candidate depths from spatial modulation (orange lines) exist within the resolution of temporal modulation (blue band). b On the other hand, if the depth is farther than dmax, several candidate depths from temporal modulation (blue lines) exist within the spatial resolution (orange band)

Nearest range. When the spatial frequency is too high compared with the temporal frequency, the phase ambiguity problem cannot be resolved because multiple candidate depths exist within the resolution of the ToF, as shown in Fig. 5a. The spatial frequency varies depending on the depth because the projection is perspective: the shorter the distance, the higher the spatial frequency. This property gives the nearest ROI of the proposed method. The nearest ROI dmin is where the wrapping distance of the spatial phase is equal to the resolution of the ToF at the given temporal and spatial frequencies as

$$\begin{array}{*{20}l} {d}_{S}|_{n_{S}=n_{S}^{\prime}} \ - {d}_{S}|_{n_{S}=n_{S}^{\prime} - 1} = \frac{{\Delta d}_{T}}{2}, \end{array} $$

where \({d}_{S}|_{n_{S}=n_{S}^{'}}\) is the unwrapped depth and \({d}_{S}|_{n_{S}=n_{S}' - 1}\) is the neighboring depth candidate from Eq. (11). Substituting Eq. (17) and rearranging, the minimum depth of the range of interest dmin can be obtained as (Footnote 1)

$$\begin{array}{*{20}l} {d_{\text{min}}} = \frac{{\Delta d}_{T}}{4} +{\frac{1}{2}\sqrt{\frac{{\Delta d}^{2}_{T}}{4} + \frac{{\omega_{S}} {b} {f} {\Delta d}_{T}}{\pi}}}. \end{array} $$
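One way to arrive at this expression is sketched below, using the shorthand \(P = 2\pi/\omega_{S}\) for the pattern period (a symbol introduced here only for brevity). Neighboring spatial candidates differ by one period in disparity, so a candidate at depth d, whose disparity is bf/d, has its nearer neighbor at depth bf/(bf/d + P):

$$\begin{aligned} d - \frac{b f}{\frac{b f}{d} + P} &= \frac{\Delta d_{T}}{2} \\ \frac{P d^{2}}{b f + P d} &= \frac{\Delta d_{T}}{2} \\ d^{2} - \frac{\Delta d_{T}}{2}\, d - \frac{\omega_{S}\, b\, f\, \Delta d_{T}}{4\pi} &= 0. \end{aligned}$$

Taking the positive root of this quadratic gives the expression for dmin above.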

Farthest range. When the spatial frequency is too low compared with the temporal frequency, the phase ambiguity problem cannot be resolved because multiple candidate depths exist within the resolution of the spatial phase shifting, as shown in Fig. 5b. Because the depth resolution of the spatial phase shifting degrades with the square of the depth (Eq. (26)), a farthest ROI dmax exists. The farthest ROI dmax is where the wrapping distance of the temporal phase is equal to the resolution of the phase shifting as

$$\begin{array}{*{20}l} {d}_{T}|_{n_{T}=n_{T}'} \ - {d}_{T}|_{n_{T}=n_{T}'-1} = \frac{\Delta d_{S}}{2}, \end{array} $$

where \({d}_{T}|_{n_{T}=n_{T}'}\) is the unwrapped depth and \({d}_{T}|_{n_{T}=n_{T}'-1}\) is the neighboring depth candidate from Eq. (10). Substituting Eq. (16) and Eq. (26) and rearranging, the farthest ROI dmax can be obtained as (Footnote 2)

$$\begin{array}{*{20}l} {d_{\text{max}}} = \sqrt{\frac{{\omega_{S}} {b} {f} {c}^{2} \pi}{{\omega_{T}}^{2} {\Delta d}_{T}}}. \end{array} $$
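The three characteristic depths can be evaluated together to see how a given configuration is bounded. The values below are hypothetical (in particular the ToF resolution of 10 mm and the focal length in pixels), and the function names are illustrative; for a real sensor, ΔdT would come from Eq. (25).

```python
import math

C_MM = 299_792_458.0e3  # speed of light in mm/s


def d_cross(omega_t, omega_s, baseline, focal):
    """Depth at which the ToF and phase-shifting resolutions are equal."""
    return math.sqrt(C_MM * baseline * focal * omega_s / (2.0 * omega_t))


def d_min(delta_d_t, omega_s, baseline, focal):
    """Nearest depth of the range of interest."""
    inner = delta_d_t ** 2 / 4.0 + omega_s * baseline * focal * delta_d_t / math.pi
    return delta_d_t / 4.0 + 0.5 * math.sqrt(inner)


def d_max(delta_d_t, omega_t, omega_s, baseline, focal):
    """Farthest depth of the range of interest."""
    return math.sqrt(
        omega_s * baseline * focal * C_MM ** 2 * math.pi / (omega_t ** 2 * delta_d_t)
    )


# Hypothetical configuration.
omega_t = 2.0 * math.pi * 60e6     # rad/s
omega_s = 2.0 * math.pi / 60.0     # rad per projector pixel
baseline, focal = 70.0, 4000.0     # mm and pixels (focal length in pixels is assumed)
delta_d_t = 10.0                   # mm, assumed ToF depth resolution

near = d_min(delta_d_t, omega_s, baseline, focal)
cross = d_cross(omega_t, omega_s, baseline, focal)
far = d_max(delta_d_t, omega_t, omega_s, baseline, focal)
print(f"d_min = {near:.1f} mm, d_cross = {cross:.1f} mm, d_max = {far:.1f} mm")
```

A useful sanity check is to substitute the returned depths back into the two defining conditions: the spacing of neighboring spatial candidates at dmin should equal ΔdT/2, and the temporal wrapping distance cπ/ωT should equal ΔdS/2 at dmax.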

Unrecoverable point. There are few unrecoverable depths in the proposed method. Figure 6 shows the pairs of temporal and spatial phases corresponding to the depth. The vertical axis is the temporal phase, and the horizontal axis is the spatial phase. The color of the curves represents the depth. The intersections of the curves are unrecoverable depths because different depths have the same phase pair. This is a limitation of the method; however, these points generally appear sparsely in the image and hence can be estimated by looking at neighboring pixels.

Fig. 6

The transition of temporal and spatial phases with respect to the depth. The vertical axis represents the temporal phase and the horizontal axis represents the spatial phase. The color represents the depth. The intersections of the curves have the same phase pair at the different depths. These depths cannot be recovered uniquely

We confirm via simulation that unrecoverable points seldom exist. We evaluate the percentage of unrecoverable pixels in an image using an indoor dataset [36]. Temporal and spatial phases were rendered, and the depth image was estimated by our method from these phase images. The temporal frequency was set to 50 MHz, and the spatial frequency to 1/0.6 mm⁻¹. One hundred scenes were selected randomly from the dataset.

The results are shown in Fig. 7. The depths of some pixels cannot be recovered because of multiple candidates. The average ratio of unrecoverable pixels in each image is less than 5%. These points exist sparsely in the image; hence, it is possible to select the correct candidate by looking at the surrounding pixels.

Fig. 7

Some results of the simulation. Black indicates pixels that cannot be recovered due to depth ambiguity. Unrecoverable pixels seldom exist in the image

Brightness of the pattern. One may think that the temporal phase cannot be obtained if the spatial pattern is completely black. Because a spatial sinusoidal pattern is projected, all the pixels have a chance to receive photons unless the spatial pattern is extremely low. A possible solution is to add a constant value to the spatial pattern so that no pixel is always black. In this case, the observation Eq. (14) is rewritten as

$$\begin{array}{*{20}l} {i}({x}, k, l) =&{A}({x}) \left(\frac{1}{2} \cos\left({{\phi}_{T}}({x}) + \frac{2\pi k}{N_{T}} \right) + \frac{1}{2} \right) \\&\quad \left(A_{S} \cos\left({{\phi}_{S}}({x}) - \frac{2\pi l}{N_{S}} \right) + O_{S} \right) \\&+ {O}({x}), \end{array} $$

where AS and OS (0<OS−AS and OS+AS≤1) are the amplitude and offset of the spatial modulation, respectively. Analogous to Eq. (14), both phases can be obtained by the same equations as Eq. (15) in the NT=NS=4 case. So, it is not necessary to increase the number of observations.
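The claim that the offset pattern needs no extra observations can be checked numerically. The sketch below simulates the modified observation model and recovers both phases from the frames with l = 0 or k = 0. The amplitude, ambient level, and the values AS = 0.4 and OS = 0.6 are hypothetical, chosen so that the offset exceeds the amplitude and their sum does not exceed one. The spatial numerator is written as i(0,1) − i(0,3) to match the minus sign of the spatial shift in the observation model.

```python
import math

TWO_PI = 2.0 * math.pi


def observe(phi_t, phi_s, k, l, a_s, o_s, amplitude=1.0, ambient=0.1):
    """Observation with spatial amplitude A_S and offset O_S (N_T = N_S = 4)."""
    temporal = 0.5 * math.cos(phi_t + TWO_PI * k / 4.0) + 0.5
    spatial = a_s * math.cos(phi_s - TWO_PI * l / 4.0) + o_s
    return amplitude * temporal * spatial + ambient


def recover_phases(i):
    """i(k, l) returns one observation; only frames with l = 0 or k = 0 are used."""
    phi_t = math.atan2(i(3, 0) - i(1, 0), i(0, 0) - i(2, 0)) % TWO_PI
    # The spatial shift enters with a minus sign, hence the order i(0,1) - i(0,3).
    phi_s = math.atan2(i(0, 1) - i(0, 3), i(0, 0) - i(0, 2)) % TWO_PI
    return phi_t, phi_s


true_t, true_s = 1.0, 3.0   # phi_S close to pi, near the darkest part of the l = 0 pattern
a_s, o_s = 0.4, 0.6         # hypothetical spatial amplitude and offset
est_t, est_s = recover_phases(lambda k, l: observe(true_t, true_s, k, l, a_s, o_s))
print(f"phi_T = {est_t:.4f}, phi_S = {est_s:.4f}")
```

The spatial factor multiplies both the numerator and the denominator of the temporal ratio, so it cancels as long as it is nonzero, which the offset guarantees. The temporal factor plays the same role for the spatial ratio and vanishes only when ϕT equals π.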

6 Experiment

We demonstrated the effectiveness of our proposed method with real-world experiments.

Hardware prototype. We developed a hardware prototype that can illuminate a scene with a spatio-temporal modulated pattern. Our prototype was built onto a ToF camera (Texas Instruments OPT8241-CDK-EVM). The light source was replaced with a laser diode and a DMD system that can project the spatial pattern. The light source was an 830-nm laser diode (Hamamatsu Photonics L9277-42), and its emission was synchronized with the ToF sensor. The light emitted by the diode was collimated and expanded through lenses, and then reflected onto a DMD device (Texas Instruments DLP6500) that had 1920×1080 pixels. Finally, the spatio-temporal pattern was projected onto the scene through a projection lens, as shown in Fig. 8.

Fig. 8

Hardware prototype. The light source unit consists of a laser diode and a DMD device. The emission of the laser diode is temporally modulated by the sync signal from the ToF camera and then spatially modulated by the DMD. The ToF camera and the projection lens of the projector are placed side by side

First, the measurement system was calibrated in a standard way for the pro-cam systems using a reference board [37]. The phase of the ToF on each pixel was then calibrated to share the same coordinates as the pro-cam system. A white plane board was captured while its position was moved for the phase calibration. For each measurement of the board, the pair of the raw phase and the ground-truth depth was obtained because the depth of the board was measured by the ordinary phase shifting. The parameter to recover the depth from the phase was calibrated by line fitting.

Result. First, we measured a white planar board placed approximately 350 mm from the camera and slightly slanted, as shown in Fig. 9a. The temporal frequency was 60 MHz, and the period of the spatial pattern was 60 pixels on the projection image. The baseline between the camera and the projector was approximately 70 mm, and the focal length of the projection lens was 35 mm.

Fig. 9

Results with a white planar board. Ordinary ToF, phase shifting (single high frequency), and our method are compared. a The object was placed at a slight slant. b The estimated depth images. Because the depth cannot be identified in the phase shifting, the depth image cannot be visualized. c The cross-section of the red line is shown. While the ordinary ToF is noisy and phase shifting has many candidates, our method recovers a smooth and unique depth candidate

The depths were obtained by an ordinary ToF with a single low frequency, phase shifting with single high frequency, and our method for the comparison. Figure 9b shows the estimated depth images. Both the ToF and our method recover the global depth. The depth image with phase shifting cannot be visualized because it has multiple depth candidates. The cross-section of the red line is shown in Fig. 9c. While the depth measured by the ordinary ToF is noisy and there are many depth candidates due to phase ambiguity in the phase shifting, our method recovers a smooth surface while resolving the phase ambiguity. The region near the edge is not correctly disambiguated because the resolution of the temporal measurement exceeds the interval of the phase shifting. The ToF resolution near the edge is lower than what we expected because the illumination is very low near the edge. However, decreasing the spatial frequency might have mitigated it.

Finally, we measured a plaster bust placed approximately 400 mm from the camera, as shown in Fig. 10a. The estimated depth images are shown in Fig. 10b. The cross-section of the depth is shown in Fig. 10c. Our method recovers a unique and smooth depth.

Fig. 10

Results with a plaster bust. a The scene. b The depth maps. Black pixels represent the occlusion. c The cross-section of the red lines drawn on (b). Our method recovers a unique and smooth surface

7 Conclusion

We developed a depth sensing method that uses spatio-temporally modulated illumination. We showed that the phase ambiguities of the temporal and spatial modulations are different, so it is possible to effectively resolve the ambiguities while reducing the observations and preserving the depth resolution.

Our proposed method inherits not only the strengths of the time-of-flight camera and of active stereo using a projector-camera system but also their weaknesses. While the proposed method can achieve a better resolution and a wider range of interest, it may suffer from occlusion, which sacrifices the ToF camera’s potential. In practice, however, current ToF cameras are not co-axial setups either, and they do not suffer much from occlusion. If the spatio-temporal projector is configured in a micro-baseline setup similar to that of a ToF camera, the system will likewise not suffer much from occlusion.

In this paper, the depth of the ToF measurement is defined as the distance between the camera and the target, whereas the depth of the projector-camera system used for phase shifting is defined as the distance between the center of the baseline and the target. In practice, this difference should be corrected in an implementation, although it does not affect our key idea. Indeed, this model mismatch is absorbed by the calibration step that builds a look-up table.

Our hardware prototype has some limitations. Because the DMD produces the sinusoidal pattern by switching its mirrors on and off, it can introduce artifacts into the ToF measurement. We ignored this effect, but it should be addressed by controlling the DMD appropriately or by using a solid spatial light modulator. The quality of the spatio-temporally modulated illumination of our prototype is not very high. The temporal phase contains a systematic distortion, and because the pattern is blurred, the spatial resolution of the projector is currently limited to 64 pixels on the DMD, corresponding to 4 pixels on the camera. The blur might be due to the collimation and the alignment accuracy of the optics or to diffraction on the DMD. The light source cannot emit a spatial pattern as fine as, or finer than, the camera pixel size, which degrades the phase-shifting measurement. In future implementations, we will develop a better light source unit to improve the temporal phase measurements and generate higher spatial resolutions.

8 Appendix

8.1 Derivation of Eq. (30)

We restate Eq. (29), which is the starting point for deriving Eq. (30):

$$\begin{array}{*{20}l} {d}_{S}|_{n_{S}=n_{S}'} \ - {d}_{S}|_{n_{S}=n_{S}' - 1} = \frac{{\Delta d}_{T}}{2}. \end{array} $$

The neighboring depth candidate \({d}_{S}|_{n_{S}=n_{S}'-1}\) is given by

$$\begin{array}{*{20}l} {d}_{S}|_{n_{S}=n_{S}^{\prime}-1} = \frac{{b} {f}}{{x} - \frac{2\pi (n_{S}^{\prime}-1) + {{\phi}_{S}}}{{\omega_{S}}}}. \end{array} $$

The unwrapped depth \({d}_{S}|_{n_{S}=n_{S}'}\) that satisfies Eq. (29) is the minimum depth of the range of interest, \(d_{\text{min}}\):

$$\begin{array}{*{20}l} {d}_{S}|_{n_{S}=n_{S}^{\prime}} = \frac{{b} {f}}{{x} - \frac{2\pi n_{S}^{\prime} + {{\phi}_{S}}}{{\omega_{S}}}} = {d_{\text{min}}}. \end{array} $$

Substituting Eqs. (A.1) and (A.2) into Eq. (29),

$$\begin{array}{*{20}l} {d_{\text{min}}} - \frac{{b} {f}}{{x} - \frac{2\pi (n_{S}' - 1) + {{\phi}_{S}}}{{\omega_{S}}}} = \frac{{\Delta d}_{T}}{2}, \notag \\ {d_{\text{min}}} - \frac{{b} {f}}{{x} - \frac{2\pi n_{S}' + {{\phi}_{S}}}{{\omega_{S}}} + \frac{2\pi}{{\omega_{S}}}} = \frac{{\Delta d}_{T}}{2}. \end{array} $$

Substituting Eq. (A.2) into the denominator,

$$\begin{array}{*{20}l} {d_{\text{min}}} - \frac{{b} {f}}{\frac{{b} {f}}{{d_{\text{min}}}} + \frac{2\pi}{{\omega_{S}}}} = \frac{{\Delta d}_{T}}{2}. \end{array} $$

Multiplying both sides of the equation by \(\frac {{b}{f}}{{d_{\text {min}}}} + \frac {2\pi }{{\omega _{S}}}\) and rearranging,

$$\begin{array}{*{20}l} {d_{\text{min}}} \left(\frac{{b} {f}}{{d_{\text{min}}}} + \frac{2\pi}{{\omega_{S}}} \right) - {b} {f} = \frac{{\Delta d}_{T}}{2} \left(\frac{{b} {f}}{{d_{\text{min}}}} + \frac{2\pi}{{\omega_{S}}} \right), \end{array} $$
$$\begin{array}{*{20}l} {d_{\text{min}}}^{2} - \frac{{\Delta d}_{T}}{2} {d_{\text{min}}} - \frac{{\omega_{S}}}{2 \pi} \frac{{\Delta d}_{T}}{2} {b} {f} = 0. \end{array} $$

Solving the quadratic equation for \(d_{\text{min}}\), we obtain

$$\begin{array}{*{20}l} {d_{\text{min}}} &= \frac{{\Delta d}_{T}}{4} +{\frac{1}{2}\sqrt{\frac{{\Delta d}^{2}_{T}}{4} + \frac{{\omega_{S}} {b} {f} {\Delta d}_{T}}{\pi}}}, \end{array} $$

where the other solution is always negative and therefore outside the admissible range \(d_{\text{min}} > 0\).
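As a numerical sanity check, the closed form for \(d_{\text{min}}\) can be evaluated and substituted back into the spacing condition of Eq. (29). The sketch below writes the neighboring candidate with index \(n_{S}'-1\) as \(bf/(bf/d_{\text{min}} + 2\pi/\omega_{S})\), which follows from the definitions of the two candidates above. All parameter values, including the focal length in pixels, are hypothetical.

```python
import math

def d_min(delta_d_t, omega_s, b, f):
    """Closed-form minimum depth of the range of interest."""
    return delta_d_t / 4.0 + 0.5 * math.sqrt(
        delta_d_t ** 2 / 4.0 + omega_s * b * f * delta_d_t / math.pi)

def neighbor_gap(d, omega_s, b, f):
    """Gap between the candidate at depth d and its neighbor with index n_S - 1."""
    u = b * f / d                      # equals x - (2*pi*n_S + phi_S)/omega_S
    return d - b * f / (u + 2.0 * math.pi / omega_s)

# Hypothetical values: ToF resolution 20 mm, 60-pixel pattern period,
# 70 mm baseline, focal length of 3500 pixels.
delta_d_t = 20.0
omega_s = 2.0 * math.pi / 60.0
b, f = 70.0, 3500.0

dmin = d_min(delta_d_t, omega_s, b, f)
gap = neighbor_gap(dmin, omega_s, b, f)
print(f"d_min = {dmin:.2f} mm, gap at d_min = {gap:.2f} mm")
```

At \(d_{\text{min}}\) the gap equals \(\Delta d_{T}/2\); at larger depths the candidates are farther apart, so the ToF measurement can still tell them apart.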

8.2 Derivation of Eq. (32)

Substituting Eqs. (10) and (26) into Eq. (31),

$$\begin{array}{*{20}l} \frac{{c}}{2 {\omega_{T}}}(2\pi n_{T} + {{\phi}_{T}}) - \frac{{c}}{2 {\omega_{T}}}(2\pi (n_{T} - 1) + {{\phi}_{T}}) &=\frac{1}{2} \frac{2\pi{d_{\text{max}}}^{2}}{{b}{f}{\omega_{S}}}\frac{\sqrt{B}}{2\sqrt{8}A}, \\ \frac{{c} \pi}{{\omega_{T}}} &= \frac{\pi{d_{\text{max}}}^{2}}{{b}{f}{\omega_{S}}}\frac{\sqrt{B}}{2\sqrt{8}A}. \end{array} $$

Rearranging the equation,

$$\begin{array}{*{20}l} {d_{\text{max}}}^{2} &= \frac{{\omega_{S}} {b} {f} {c}}{{\omega_{T}}}\frac{2\sqrt{8}A}{\sqrt{B}}. \end{array} $$

Substituting Eq. (25) into Eq. (A.9) to cancel A and B,

$$\begin{array}{*{20}l} {d_{\text{max}}}^{2} &= \frac{{\omega_{S}} {b} {f} {c}^{2}\pi}{{\omega_{T}}^{2}{\Delta d}_{T}}. \end{array} $$


Taking the square root of both sides,

$$\begin{array}{*{20}l} {d_{\text{max}}} &= \sqrt{\frac{{\omega_{S}} {b} {f} {c}^{2} \pi}{{\omega_{T}}^{2} {\Delta d}_{T}}}, \end{array} $$

where the positive root is taken because \(d_{\text{max}} > 0\).
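The closed form for \(d_{\text{max}}\) can be checked numerically in the same spirit. The sketch below assumes that Eq. (25) has the form \({\Delta d}_{T} = \frac{c\pi}{\omega_{T}}\frac{\sqrt{B}}{2\sqrt{8}A}\), which is what the cancellation of A and B above implies, and that the spatial depth resolution is the right-hand-side factor \(\frac{2\pi d^{2}}{bf\omega_{S}}\frac{\sqrt{B}}{2\sqrt{8}A}\) used in the substitution. The parameter values are hypothetical.

```python
import math

C_MM_PER_S = 299_792_458.0 * 1e3       # speed of light [mm/s]

def d_max(delta_d_t, omega_s, omega_t, b, f, c=C_MM_PER_S):
    """Closed-form maximum depth of the range of interest."""
    return math.sqrt(omega_s * b * f * c ** 2 * math.pi
                     / (omega_t ** 2 * delta_d_t))

def spatial_resolution(d, ratio, omega_s, b, f):
    """Depth resolution of phase shifting at depth d.

    ratio stands for sqrt(B) / (2 * sqrt(8) * A).
    """
    return 2.0 * math.pi * d ** 2 / (b * f * omega_s) * ratio

# Hypothetical values: 60 MHz modulation, 20 mm ToF resolution,
# 60-pixel pattern period, 70 mm baseline, focal length of 3500 pixels.
omega_t = 2.0 * math.pi * 60e6         # rad/s
omega_s = 2.0 * math.pi / 60.0         # rad/px
delta_d_t, b, f = 20.0, 70.0, 3500.0

dmax = d_max(delta_d_t, omega_s, omega_t, b, f)
ratio = omega_t * delta_d_t / (C_MM_PER_S * math.pi)   # assumed form of Eq. (25)
tof_interval = C_MM_PER_S * math.pi / omega_t          # spacing of ToF candidates
print(f"d_max = {dmax:.0f} mm, ToF candidate spacing = {tof_interval:.0f} mm")
```

At \(d_{\text{max}}\) the spacing of the ToF candidates equals half the spatial depth resolution, which is the condition from which the derivation starts.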

Availability of data and materials

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.


  1. Please see Appendix for the derivation.

  2. Please see Appendix for the derivation.





Abbreviations

DMD: Digital mirror device


References

  1. Kushida T, Tanaka K, Aoto T, Funatomi T, Mukaigawa Y (2019) Spatio-temporal phase disambiguation in depth sensing. In: Proc. ICCP.

  2. Inokuchi S, Sato K, Matsuda F (1984) Range imaging system for 3-D object recognition. In: Proc. International Conference on Pattern Recognition, 806–808. IEEE Computer Society Press.

  3. Salvi J, Fernandez S, Pribanic T, Llado X (2010) A state of the art in structured light patterns for surface profilometry. Pattern Recog 43.

  4. Gupta M, Nayar SK (2012) Micro phase shifting. In: Proc. CVPR, 813–820. IEEE.

  5. Mirdehghan P, Chen W, Kutulakos KN (2018) Optimal structured light à la carte. In: Proc. CVPR.

  6. Lange R, Seitz P (2001) Solid-state time-of-flight range camera. IEEE J Quantum Electron 37(3):390–397.

  7. Yasutomi K, Usui T, Han S-M, Takasawa T, Kagawa K, Kawahito S (2016) A submillimeter range resolution time-of-flight. IEEE Trans Electron Devices 63(1):182–188.

  8. Heide F, Xiao L, Kolb A, Hullin MB, Heidrich W (2014) Imaging in scattering media using correlation image sensors and sparse convolutional coding. Opt Express 22(21):26338–50.

  9. Kirmani A, Benedetti A, Chou PA (2013) Spumic: simultaneous phase unwrapping and multipath interference cancellation in time-of-flight cameras using spectral methods. In: IEEE International Conference on Multimedia and Expo (ICME), 1–6.

  10. Freedman D, Krupka E, Smolin Y, Leichter I, Schmidt M (2014) SRA: Fast Removal of General Multipath for ToF Sensors. In: Proc. ECCV, 1–15.

  11. Qiao H, Lin J, Liu Y, Hullin MB, Dai Q (2015) Resolving transient time profile in ToF imaging via log-sum sparse regularization. Opt Lett 40(6):918–21.

  12. Kadambi A, Schiel J, Raskar R (2016) Macroscopic interferometry: rethinking depth estimation with frequency-domain time-of-flight. In: Proc. CVPR, 893–902.

  13. Marco J, Hernandez Q, Muñoz A, Dong Y, Jarabo A, Kim MH, Tong X, Gutierrez D (2017) DeepTof: off-the-shelf real-time correction of multipath interference in time-of-flight imaging. ACM Trans Graph 36(6):219:1–219:12.

  14. Tanaka K, Mukaigawa Y, Funatomi T, Kubo H, Matsushita Y, Yagi Y (2018) Material classification from time-of-flight distortions. IEEE TPAMI.

  15. Su S, Heide F, Wetzstein G, Heidrich W (2018) Deep end-to-end time-of-flight imaging. In: Proc. CVPR.

  16. Velten A, Willwacher T, Gupta O, Veeraraghavan A, Bawendi MG, Raskar R (2012) Recovering three-dimensional shape around a corner using ultrafast time-of-flight imaging. Nat Commun 3(745).

  17. Heide F, Hullin MB, Gregson J, Heidrich W (2013) Low-budget transient imaging using photonic mixer devices. ACM ToG 32(4):1.

  18. Kitano K, Okamoto T, Tanaka K, Aoto T, Kubo H, Funatomi T, Mukaigawa Y (2017) Recovering temporal PSF using ToF camera with delayed light emission. IPSJ Trans Comput Vis Appl 9(15).

  19. Kadambi A, Whyte R, Bhandari A, Streeter L, Barsi C, Dorrington A, Raskar R (2013) Coded time of flight cameras: sparse deconvolution to address multipath interference and recover time profiles. ACM ToG 32(6):1–10.

  20. O’Toole M, Heide F, Xiao L, Hullin MB, Heidrich W, Kutulakos KN (2014) Temporal frequency probing for 5D transient analysis of global light transport. ACM ToG 33(4):1–11.

  21. O’Toole M, Heide F, Lindell D, Zang K, Diamond S, Wetzstein G (2017) Reconstructing transient images from single-photon sensors. In: Proc. CVPR.

  22. Naik N, Kadambi A, Rhemann C, Izadi S, Raskar R, Bing Kang S (2015) A light transport model for mitigating multipath interference in time-of-flight sensors. In: Proc. CVPR, 73–81.

  23. Nayar SK, Krishnan G, Grossberg MD, Raskar R (2006) Fast separation of direct and global components of a scene using high frequency illumination. ACM ToG 25(3):935–944.

  24. Whyte R, Streeter L, Cree MJ, Dorrington AA (2015) Resolving multiple propagation paths in time of flight range cameras using direct and global separation methods. Opt Eng 54:54–549.

  25. Agresti G, Zanuttigh P (2018) Combination of spatially-modulated ToF and structured light for MPI-free depth estimation. In: ECCV Workshop on 3D Reconstruction in the Wild. IEEE.

  26. Gupta M, Velten A, Nayar SK, Breitbach E (2018) What are optimal coding functions for time-of-flight imaging? ACM ToG 37(2):13:1–13:18.

  27. Gutierrez-Barragan F, Reza S, Velten A, Gupta M (2019) Practical coding function design for time-of-flight imaging. In: Proc. CVPR.

  28. Kadambi A, Taamazyan V, Shi B, Raskar R (2015) Polarized 3D: high-quality depth sensing with polarization cues. In: Proc. ICCV, 3370–3378.

  29. Gkioulekas I, Levin A, Durand F, Zickler T (2015) Micron-scale light transport decomposition using interferometry. ACM ToG 34(4):37:1–37:14.

  30. Li F, Willomitzer F, Rangarajan P, Gupta M, Velten A, Cossairt O (2018) Sh-tof: micro resolution time-of-flight imaging with superheterodyne interferometry. In: Proc. ICCP.

  31. Maeda T, Kadambi A, Schechner YY, Raskar R (2018) Dynamic heterodyne interferometry. In: Proc. ICCP. IEEE.

  32. Sagawa R, Kawasaki H, Furukawa R, Kiyota S (2011) Dense one-shot 3D reconstruction by detecting continuous regions with parallel line projection. In: Proc. ICCV.

  33. Sagawa R, Sakashita K, Kasuya N, Kawasaki H, Furukawa R, Yagi Y (2012) Grid-based active stereo with single-colored wave pattern for dense one-shot 3D scan. In: 3DIMPVT, 363–370.

  34. Jayasuriya S, Pediredla A, Sivaramakrishnan S, Molnar A, Veeraraghavan A (2015) Depth fields: extending light field techniques to time-of-flight imaging. In: 2015 International Conference on 3D Vision, 1–9.

  35. Heide F, Heidrich W, Hullin M, Wetzstein G (2015) Doppler time-of-flight imaging. ACM ToG 34(4):36:1–36:11.

  36. McCormac J, Handa A, Leutenegger S, Davison AJ (2017) SceneNet RGB-D: can 5m synthetic images beat generic ImageNet pre-training on indoor segmentation?

  37. Zhang Z (2000) A flexible new technique for camera calibration. TPAMI 22:1330–1334.


We thank all the people who gave us various insightful and constructive comments.


This work is partly supported by JST CREST JPMJCR1764 and JSPS Kaken grant JP18H03265 and JP18K19822.

Author information

TK contributed to the concept, conducted experiments, and wrote the manuscript; KT and TA contributed to the concept and optical design and edited the manuscript; and TF and YM supervised the project and improved the representation. The authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Takahiro Kushida.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit

About this article

Cite this article

Kushida, T., Tanaka, K., Aoto, T. et al. Phase disambiguation using spatio-temporally modulated illumination in depth sensing. IPSJ T Comput Vis Appl 12, 1 (2020).
