 EXPRESS PAPER
 Open Access
 Published:
Multibody motion segmentation for an arbitrary number of independent motions
IPSJ Transactions on Computer Vision and Applications volume 8, Article number: 1 (2016)
Abstract
We propose a new method for segmenting feature point trajectories tracked through a video sequence without assuming a number of independent motions. Our method realizes motion segmentation of feature point trajectories by hierarchically separating the trajectories into two affine spaces in a situation that we do not know the number of independently moving objects. We judge that input trajectories should be separated by comparing the likelihoods computed from those trajectories before/after separation. We also consider integration of the resulting separated trajectories for avoiding too much segmentations. By using real video images, we confirmed the efficiency of our proposed method.
Introduction
Separating independently moving objects in a video sequence is one of the important tasks in computer vision applications. Costeira and Kanade [1] proposed a segmentation algorithm based on the shape interaction matrix. Sugaya and Kanatani [3] proposed a multistage learning strategy using multiple models. Yan and Pollefeys [8] proposed a new local subspace fitting scheme. Vidal et al. [7] proposed a segmentation algorithm based on generalized principal component analysis (GPCA) [6]. By introducing GPCA for computing an initial segmentation, Sugaya and Kanatani [4] improved the multistage learning.
However, all these methods assume the number of moving objects. Kanatani and Matsunaga [2] proposed a method for estimating the number of independently moving objects based on the rank estimation of the affine space using the geometric minimum description length (MDL). However, estimating the number of independently moving objects based on the rank of the affine space is very difficult for real image sequences. For example, if an object motion is planar, the dimension of an affine space which includes its trajectories degenerates from 3D to 2D. Moreover, if two objects merely translate without rotation, the two 2D affine spaces are parallel to each other. This means that a 3D affine space which includes those 2D affine spaces exists.
For this problem, we propose a new method for segmenting feature point trajectories without assuming the number of objects. Based on the fact that trajectories of a rigidly moving object is constrained to a 2D or 3D affine space, we hierarchically separate input trajectories into two affine spaces until all the trajectories are divided into 2D or 3D affine spaces. In order to judge whether input trajectories should be divided or not, we compare the likelihoods before/after separation. After the separation process, we also check whether the separated trajectories should be integrated by comparing the likelihoods to avoid that the trajectories which belong to the same object are separated into different groups.
Proposed method
From the fact that trajectories of a rigidly moving object is constrained to a 2D or 3D affine space, we can separate independently moving objects by hierarchically separating input trajectories into two affine spaces until all the trajectories are divided into 2D or 3D affine spaces. In order to realize the above separation, we need to overcome two problems. One problem is to properly estimate the dimension of the affine space which includes input trajectories. The other problem is that we need to judge whether input trajectories should be divided to stop hierarchical separation.
For the first problem, we can regard the rank of the moment matrix of the input trajectory vectors. The rank of the moment matrix can be obtained as the number of positive eigenvalues of the matrix. However, in the presence of noise, all eigenvalues are nonzero in general. Hence, we need to truncate small eigenvalues, but it is difficult to determine a proper threshold. We compute the rank of the moment matrix by using the geometric MDL [2].
For the second problem, we compare the average likelihoods of the trajectories for the affine spaces which are fitted to all the trajectories and the divided ones. We compute the average likelihoods before/after division and divide those trajectories if the likelihood after division is larger than that before division.
We summarize the algorithm of our proposed method as follows:

1.
Fit an affine space to the input trajectories, and compute its dimension d by using the geometric MDL.

2.
If d≤2, then we stop the division process for the target trajectories.

3.
Divide the trajectories into two affine spaces.

(a)
Convert the trajectory vectors into 3D vectors.
Please refer to [4] for the detail computation.

(b)
Fit two planes to those 3D vectors by the Taubin method.

(c)
Convert the trajectory vectors into dD vectors.

(d)
Separate the dD vectors into two affine spaces by the EM algorithm of Sugaya and Kanatani [3, 4].

(a)

4.
Compute the average likelihoods P and P ^{′} of the trajectories before/after separation and accept the separation, and go to step 1 if the following inequality is satisfied.
$$ \lfloor{\log_{10}P'}\rfloor\lfloor{\log_{10}P}\rfloor > 0, $$(1)where ⌊·⌋ is the floor function. In our experience, if we compared the average likelihoods directly, the separation judgement was not stable. Thus, we compare the exponent part of the average likelihood.

5.
Else, reject the separation.
We hierarchically iterate the above procedures until all the input trajectories are not separated.
Rank estimation of the affine space
We compute the eigenvalues of the moment matrix of the 2MD trajectory vectors p _{ α } [4], α=1,…,N and estimate its rank by using the geometric MDL. M is the number of the image frame.

(1)
Define the 2M×2M moment matrix M by
$$ \boldsymbol{M} = \sum_{\alpha=1}^{N}(\boldsymbol{p}_{\alpha}  \boldsymbol{p}_{C})(\boldsymbol{p}_{\alpha}  \boldsymbol{p}_{C})^{\top}, \quad \boldsymbol{p}_{C} = \frac{1}{N}\sum_{\alpha=1}^{N}\boldsymbol{p}_{\alpha}. $$(2) 
(2)
Compute the eigenvalues of M, and let λ _{1}≥⋯≥λ _{2M } be the sorted eigenvalues.

(3)
Compute the residuals J _{ r }, r=2,...,2M, for the fitted rD affine space by
$$ J_{r} = \sum_{\beta=r+1}^{2M}\lambda_{\beta}. $$(3) 
(4)
Compute the geometric MDLs [2] for each rank by
$$ \text{GMDL}(r) = J_{r}  \Bigl(rN + (r + 1)(2M  r)\Bigr)\epsilon^{2} \log\Bigl(\frac{\epsilon}{L}\Bigr)^{2}, $$(4)where ε is the standard deviation of feature point tracking accuracy, which we call this noise level, and L is a reference length, for which we can use an arbitrary value whose order is approximately the same as the data, say the image size.

(5)
Estimate the rank \(\hat {r}\) of the affine space which includes the input trajectories as
$$ \hat{r} = \arg\min_{r} \text{GMDL}(r) $$(5)
Separation of the trajectories
We separate the input trajectories by using the EM algorithm of Sugaya and Kanatani [3, 4] and can compute the likelihood P(αk) of the αth point for the fitted affine space k in the separation process. The likelihood P(αk) can be computed in the form
where \(\boldsymbol {p}_{C}^{(k)}\) and V ^{(k)} are the centroid and the covariance matrix of class k, respectively.
We compute the likelihoods P(α) and P ^{′}(α) of p _{ α } for the affine spaces before/after separation and then compute the average likelihoods P and P ^{′} by
Integration of the separated trajectories
After separation, we check whether the separated trajectories should be integrated by comparing the average likelihoods to avoid that the trajectories which belong to the same object are separated into different groups.
For all the separated trajectory groups, we integrate the two groups of the separated trajectories if the average likelihood Q before integration and Q ^{′} after integration satisfy the following inequality:
where Q and Q ^{′} are computed from Eqs. (6) and (7) for each affine space.
Real image experiments
By using real video images, we tested our method and computed the separation accuracy with RANSAC and LSA of Yan and Pollefeys [8] and GPCA of Vidal et al. [7].
Separation process
Figure 1 shows a separation process of our method. Figure 1 a shows the five decimated images of the input video sequence. The red points are the tracked feature points by the KLT tracker. We explain this separation process by showing tree expression of our separation in Fig. 2.
First, we estimated the dimension of the affine space which includes all the input trajectories and obtained that its dimension was 4. Since the resulting dimension was larger than 2, then we separated those input trajectories. We show the separation result in Fig. 1 b. For this result, we computed the average likelihoods for the affine spaces before/after separation and obtained the results 5.63×10^{−7} and 8.11×10^{−5}, respectively. Since these values satisfied the inequality in Eq. (1), we accepted the separation.
In the second stage, we estimated the dimensions of the affine spaces for the separated trajectories shown in Fig. 1 b of green and blue points. The resulting dimension of the blue points was 2 and stopped separation. Since the estimated dimension of the green points was 3, we separated those trajectories. The result is shown in Fig. 1 c. For this separation, the computed average likelihoods satisfied Eq. (1), and then, we accepted the separation. In the third stage, we estimated the dimensions of the affine spaces for the separated trajectories shown in Fig. 1 c of red and green points. The resulting dimension of the red points was 2 and stopped separation. The estimated dimension of the green points was 3, but the average likelihoods before/after separation did not satisfy Eq. (1). Hence, we stop separation. We show the final separation result in Fig. 1 d. In this sequence, the number of independently moving objects, which includes background points, is 3 and our method correctly separates the input trajectories into three groups.
Integrating process
We show another result in Fig. 4. Figure 4 a shows the five decimated images of the input video sequence. We also show the separation process in Fig. 3 and show the results in each separation process in Fig. 4 b–f. Figure 4 g is the final separation result. From this result, we can see that missseparation exists in Fig. 4 e and the points which belong to the same object are separated into three groups, which are the blue, green, and orange points in Fig. 4 g.
For all the pairs of the separated groups, we computed the average likelihoods before/after integration and check whether the separated groups should be integrated or not. Table 1 shows the computed average likelihoods. From this, the blue, green, and orange points are integrated. Figure 4 g shows the integrated result.
Accuracy comparison
We compared the accuracy of our method with RANSAC and LSA of Yan and Pollefeys [8] and GPCA of Vidal et al. [7]. We used the THopkins database [5] and original data and computed the separation accuracy for each method. Table 2 shows the result. The accuracy is computed by (number of correctly separated points)/(number of input points). As we can see, our method is superior to the compared existing methods.
Conclusions
In this paper, we propose a new method for segmenting feature point trajectories without assuming the number of objects. Based on the fact that trajectories of a rigidly moving object is constrained to a 2D or 3D affine space, we hierarchically separate input trajectories into two affine spaces until all the trajectories are divided into 2D or 3D affine spaces. In order to judge whether input trajectories should be divided or not, we compare the likelihoods before/after separation. After the separation process, we also check whether the separated trajectories should be integrated by comparing the likelihoods to avoid that the trajectories which belong to the same object are separated into different groups.
By using real video sequences, we checked the separation and integration processes of our method and confirmed the accuracy of our method by comparing existing methods.
References
Costeira JP, Kanade T (1998) A multibody factorization method for independently moving objects. Int J Comput Vision 29(3): 159–179.
Kanatani K, Matsunaga C (2002) Estimating the number of independent motions for multibody motion segmentation In: Proc of the 5th Asian Conference on Computer Vision (ACCV2002), 7–12, Melbourne, Australia.
Sugaya Y, Kanatani K (2004) Multistage unsupervised learning for multibody motion segmentation. IEICE Trans Inform Syst E87D(7): 1935–1942.
Sugaya Y, Kanatani K (2010) Improved multistage learning for multibody motion segmentation In: Proc. of International Conference of Computer Vision Theory and Applications (VISAPP2010), 199–206, Angers, France.
Sugaya Y, Kanatani K (2013) Removing mistracking of multibody motion video database Hopkins155 In: Proc. of the 24th British Machine Vision Conference (BMVC2013), Bristol, U.K.
Vidal R, Ma Y, Sastry S (2005) Generalized principal component analysis (GPCA). IEEE Trans Pattern Anal Mach Intell 27(12): 1945–1959.
Vidal R, Tron R, Hartley R (2008) Multiframe motion segmentation with missing data using PowerFactorization and GPCA. Int J Comput Vis 79(1): 85–105.
Yan J, Pollefeys M (2006) A general framework for motion segmentation: independent, articulated, rigid, nonrigid, degenerate and nondegenerate In: Proc. of the 9th European Conference on Computer Vision (ECCV2006), 94–106, Graz, Austria.
Author information
Authors and Affiliations
Corresponding author
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
YSa introduced the hierarchical separation of the feature point trajectories, carried out all the experiments, and wrote the manuscript. YSu introduced the rank estimation of the affine space using the geometric MDL and separation judgement using likelihoods of the fitted affine spaces and revised the manuscript written by YSa. Both authors read and approved the final manuscript.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Sako, Y., Sugaya, Y. Multibody motion segmentation for an arbitrary number of independent motions. IPSJ T Comput Vis Appl 8, 1 (2016). https://doi.org/10.1186/s4107401600023
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s4107401600023
Keywords
 Separation Process
 Affine Space
 Green Point
 Moment Matrix
 Motion Segmentation