# Multibody motion segmentation for an arbitrary number of independent motions

Yutaro Sako and Yasuyuki Sugaya

**8**:1

**DOI: **10.1186/s41074-016-0002-3

© The Author(s) 2016

**Received: **5 May 2016

**Accepted: **17 June 2016

**Published: **2 August 2016

## Abstract

We propose a new method for segmenting feature point trajectories tracked through a video sequence without assuming the number of independent motions. Our method realizes motion segmentation by hierarchically separating the trajectories into two affine spaces when the number of independently moving objects is unknown. We judge whether input trajectories should be separated by comparing the likelihoods computed from those trajectories before/after separation. We also consider integration of the resulting separated trajectories to avoid over-segmentation. Using real video images, we confirmed the effectiveness of our proposed method.

## 1 Introduction

Separating independently moving objects in a video sequence is an important task in computer vision applications. Costeira and Kanade [1] proposed a segmentation algorithm based on the shape interaction matrix. Sugaya and Kanatani [3] proposed a multi-stage learning strategy using multiple models. Yan and Pollefeys [8] proposed a new local subspace fitting scheme. Vidal et al. [7] proposed a segmentation algorithm based on generalized principal component analysis (GPCA) [6]. By introducing GPCA for computing an initial segmentation, Sugaya and Kanatani [4] improved the multi-stage learning.

However, all these methods assume that the number of moving objects is known. Kanatani and Matsunaga [2] proposed a method for estimating the number of independently moving objects based on the rank estimation of the affine space using the geometric minimum description length (MDL). However, estimating the number of independently moving objects from the rank of the affine space is very difficult for real image sequences. For example, if an object motion is planar, the dimension of the affine space which includes its trajectories degenerates from 3-D to 2-D. Moreover, if two objects merely translate without rotation, their two 2-D affine spaces are parallel to each other, which means that a single 3-D affine space exists that includes both 2-D affine spaces.
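This degeneracy is easy to reproduce numerically. The following sketch (synthetic data, NumPy assumed) builds two purely translating objects and checks that their combined trajectory vectors span only a 3-D affine space, i.e., the same rank that a single rotating object would produce:

```python
import numpy as np

M, N = 10, 20                      # frames, feature points per object
t = np.arange(M, dtype=float)
rng = np.random.default_rng(0)

def translating_trajectories(v, offset):
    """2M-D trajectory vectors of N points on a purely translating object:
    the position at frame t is the initial position plus t*v."""
    pts = rng.uniform(-1.0, 1.0, (N, 2)) + np.asarray(offset, dtype=float)
    traj = pts[:, None, :] + t[None, :, None] * np.asarray(v, dtype=float)
    return traj.reshape(N, 2 * M)  # rows: (x_1, y_1, ..., x_M, y_M)

# two objects translating in different directions, spatially offset
P = np.vstack([translating_trajectories((1.0, 0.0), (0.0, 0.0)),
               translating_trajectories((0.0, 1.0), (5.0, 5.0))])
rank = np.linalg.matrix_rank(P - P.mean(axis=0), tol=1e-6)
print(rank)  # 3: the two parallel 2-D affine spaces fit in one 3-D affine space
```

A rank-based count alone therefore cannot tell two translating objects apart from a single rotating one.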

For this problem, we propose a new method for segmenting feature point trajectories without assuming the number of objects. Based on the fact that the trajectories of a rigidly moving object are constrained to a 2-D or 3-D affine space, we hierarchically separate input trajectories into two affine spaces until all the trajectories are divided into 2-D or 3-D affine spaces. In order to judge whether input trajectories should be divided, we compare the likelihoods before/after separation. After the separation process, we also check whether the separated trajectories should be integrated by comparing the likelihoods, to avoid trajectories that belong to the same object being separated into different groups.

## 2 Proposed method

Since the trajectories of a rigidly moving object are constrained to a 2-D or 3-D affine space, we can separate independently moving objects by hierarchically separating input trajectories into two affine spaces until all the trajectories are divided into 2-D or 3-D affine spaces. To realize this separation, we need to overcome two problems. The first is to properly estimate the dimension of the affine space which includes the input trajectories. The second is to judge whether input trajectories should be divided, so that the hierarchical separation can be stopped.

For the first problem, we can use the rank of the moment matrix of the input trajectory vectors, which can be obtained as the number of its positive eigenvalues. However, in the presence of noise, all eigenvalues are nonzero in general. Hence, we need to truncate small eigenvalues, but it is difficult to determine a proper threshold. We therefore compute the rank of the moment matrix by using the geometric MDL [2].

For the second problem, we compare the average likelihoods of the trajectories for the affine spaces fitted to all the trajectories and to the divided ones. We compute the average likelihoods before/after division and divide those trajectories if the likelihood after division is larger than that before division.

The separation procedure is as follows:

1. Fit an affine space to the input trajectories, and compute its dimension *d* by using the geometric MDL.
2. If *d*≤2, stop the division process for the target trajectories.
3. Divide the trajectories into two affine spaces:
    (a) Convert the trajectory vectors into 3-D vectors (see [4] for the detailed computation).
    (b) Fit two planes to those 3-D vectors by the Taubin method.
    (c) Convert the trajectory vectors into *d*-D vectors.
    (d) Classify each trajectory vector into one of the two affine spaces.
4. Compute the average likelihoods *P* and *P*^{′} of the trajectories before/after separation; accept the separation and go to step 1 if the following inequality is satisfied:
$$ \lfloor{\log_{10}P'}\rfloor-\lfloor{\log_{10}P}\rfloor > 0, $$(1)
where ⌊·⌋ is the floor function. In our experience, directly comparing the average likelihoods made the separation judgement unstable, so we compare only their exponent parts.
5. Otherwise, reject the separation.

We hierarchically iterate the above procedure until no more trajectories are separated.
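The hierarchical loop can be sketched in Python as follows. This is a simplified stand-in rather than the authors' implementation: the geometric-MDL dimension estimate is replaced by a relative eigenvalue threshold, the GPCA/Taubin split of step 3 by plain 2-means, and the class likelihood by a ridge-regularized Gaussian with the constant factor dropped (it cancels in Eq. (1)):

```python
import numpy as np

def estimate_dim(X, thresh=1e-6):
    """Stand-in for the geometric-MDL rank estimate: count eigenvalues of
    the centered moment matrix above a fraction of the largest one."""
    Xc = X - X.mean(axis=0)
    lam = np.linalg.eigvalsh(Xc.T @ Xc)[::-1]
    return int(np.sum(lam > thresh * lam[0]))

def split_two(X, iters=20, seed=0):
    """Stand-in for the GPCA/Taubin initial split: plain 2-means on the
    trajectory vectors, returning a boolean membership mask."""
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), 2, replace=False)].copy()
    mask = np.zeros(len(X), dtype=bool)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2)
        mask = d[:, 0] < d[:, 1]
        for k, m in enumerate((mask, ~mask)):
            if m.any():
                c[k] = X[m].mean(axis=0)
    return mask

def avg_likelihood(X, groups):
    """Average Gaussian likelihood of all points, each evaluated for its own
    group's centroid and ridge-regularized covariance."""
    vals = []
    for g in groups:
        Y = X[g]
        Yc = Y - Y.mean(axis=0)
        V = Yc.T @ Yc / len(Y) + 1e-6 * np.eye(X.shape[1])
        q = np.einsum('ij,jk,ik->i', Yc, np.linalg.inv(V), Yc)
        _, logdet = np.linalg.slogdet(V)
        vals.extend(np.exp(-0.5 * q - 0.5 * logdet))
    return float(np.mean(vals))

def separate(X, idx=None):
    """Hierarchical separation: split while the affine dimension exceeds 2
    and the exponent of the average likelihood increases (Eq. (1))."""
    if idx is None:
        idx = np.arange(len(X))
    if len(idx) < 4 or estimate_dim(X[idx]) <= 2:
        return [idx]
    mask = split_two(X[idx])
    if mask.all() or not mask.any():
        return [idx]
    P = avg_likelihood(X, [idx])
    Pp = avg_likelihood(X, [idx[mask], idx[~mask]])
    if np.floor(np.log10(Pp)) - np.floor(np.log10(P)) > 0:  # Eq. (1)
        return separate(X, idx[mask]) + separate(X, idx[~mask])
    return [idx]
```

`separate(X)` returns a list of index arrays, one per group; the recursion stops exactly when step 2 or the inequality of step 4 rejects further division.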

## 3 Rank estimation of the affine space

We compute the 2*M*-D trajectory vectors p_{ α } [4], *α*=1,…,*N*, from the input trajectories and estimate the rank of the affine space which includes them by using the geometric MDL, where *M* is the number of image frames. The procedure is as follows:

1. Define the 2*M*×2*M* moment matrix M by
$$ \boldsymbol{M} = \sum_{\alpha=1}^{N}(\boldsymbol{p}_{\alpha} - \boldsymbol{p}_{C})(\boldsymbol{p}_{\alpha} - \boldsymbol{p}_{C})^{\top}, \quad \boldsymbol{p}_{C} = \frac{1}{N}\sum_{\alpha=1}^{N}\boldsymbol{p}_{\alpha}. $$(2)
2. Compute the eigenvalues of M, and let *λ*_{1}≥⋯≥*λ*_{2M} be the sorted eigenvalues.
3. Compute the residuals *J*_{ r }, *r*=2,…,2*M*, of the fitted *r*-D affine space by
$$ J_{r} = \sum_{\beta=r+1}^{2M}\lambda_{\beta}. $$(3)
4. Compute the geometric MDL [2] for each rank by
$$ \text{G-MDL}(r) = J_{r} - \Bigl(rN + (r + 1)(2M - r)\Bigr)\epsilon^{2} \log\Bigl(\frac{\epsilon}{L}\Bigr)^{2}, $$(4)
where *ε* is the standard deviation of the feature point tracking accuracy, which we call the *noise level*, and *L* is a reference length, for which we can use an arbitrary value of approximately the same order as the data, say the image size.
5. Estimate the rank \(\hat{r}\) of the affine space which includes the input trajectories as
$$ \hat{r} = \arg\min_{r} \text{G-MDL}(r). $$(5)
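Steps 1-5 above can be sketched in a few lines of NumPy; the default `eps` (noise level) and `L` (reference length) values are illustrative assumptions:

```python
import numpy as np

def gmdl_rank(P, eps=0.5, L=600.0):
    """Rank estimation by the geometric MDL, Eqs. (2)-(5): P holds N 2M-D
    trajectory vectors as rows; eps is the tracking noise level (pixels)
    and L a reference length of the order of the image size."""
    N, twoM = P.shape
    Pc = P - P.mean(axis=0)                 # subtract the centroid p_C
    lam = np.sort(np.linalg.eigvalsh(Pc.T @ Pc))[::-1]  # lambda_1 >= ...

    def gmdl(r):
        J = lam[r:].sum()                   # residual J_r, Eq. (3)
        penalty = (r * N + (r + 1) * (twoM - r)) * eps**2 * np.log((eps / L)**2)
        return J - penalty                  # Eq. (4)

    return min(range(2, twoM), key=gmdl)    # Eq. (5)
```

Because `eps < L`, the logarithm is negative and the subtracted term grows with `r`, so the minimization trades the residual off against a rank penalty instead of relying on a hand-tuned eigenvalue threshold.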

## 4 Separation of the trajectories

In the separation process, we compute the likelihood *P*(*α*|*k*) of the *α*-th point for the fitted affine space *k*. The likelihood *P*(*α*|*k*) can be computed in the form
$$ P(\alpha|k) = \frac{\exp\Bigl(-\frac{1}{2}(\boldsymbol{p}_{\alpha}-\boldsymbol{p}_{C}^{(k)})^{\top}\boldsymbol{V}^{(k)-1}(\boldsymbol{p}_{\alpha}-\boldsymbol{p}_{C}^{(k)})\Bigr)}{\sqrt{(2\pi)^{2M}\det\boldsymbol{V}^{(k)}}}, $$
where \(\boldsymbol{p}_{C}^{(k)}\) and V^{(k)} are the centroid and the covariance matrix of class *k*, respectively.

We compute the likelihoods *P*(*α*) and *P*^{′}(*α*) of p_{ α } for the affine spaces before/after separation and then compute the average likelihoods *P* and *P*^{′} by
$$ P = \frac{1}{N}\sum_{\alpha=1}^{N}P(\alpha), $$(6)
$$ P' = \frac{1}{N}\sum_{\alpha=1}^{N}P'(\alpha). $$(7)
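A Gaussian class likelihood of this form can be evaluated stably in the log domain; the ridge term below is our addition to keep a rank-deficient covariance invertible, not part of the original method:

```python
import numpy as np

def class_stats(X):
    """Centroid p_C^(k) and covariance V^(k) of the trajectory vectors
    (rows of X) belonging to class k."""
    mu = X.mean(axis=0)
    Xc = X - mu
    return mu, Xc.T @ Xc / len(X)

def log_likelihood(p, mu, V, ridge=1e-9):
    """log P(alpha|k) for a Gaussian with centroid mu and covariance V."""
    V = V + ridge * np.eye(len(V))          # keep V invertible
    _, logdet = np.linalg.slogdet(2.0 * np.pi * V)
    d = p - mu
    return -0.5 * (logdet + d @ np.linalg.solve(V, d))
```

The average likelihoods of Eqs. (6) and (7) are then means of `np.exp(log_likelihood(...))` over the points, each point evaluated for its own class.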

## 5 Integration of the separated trajectories

After separation, we check whether the separated trajectories should be integrated by comparing the average likelihoods, to avoid trajectories that belong to the same object being separated into different groups.

We integrate two separated groups if the average likelihoods *Q* before integration and *Q*^{′} after integration satisfy the following inequality:
$$ \lfloor{\log_{10}Q'}\rfloor-\lfloor{\log_{10}Q}\rfloor \geq 0, $$(8)
where *Q* and *Q*^{′} are computed from Eqs. (6) and (7) for each affine space.
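The exponent comparison is a one-liner; the non-strict threshold for integration is our assumption, mirroring the strict comparison of Eq. (1):

```python
import numpy as np

def should_integrate(Q, Q_prime):
    """Integrate two groups when the exponent of the average likelihood
    does not decrease after integration (threshold assumed)."""
    return bool(np.floor(np.log10(Q_prime)) - np.floor(np.log10(Q)) >= 0)
```

For example, `should_integrate(2e-5, 3e-3)` is true because the exponent rises from −5 to −3, while `should_integrate(3e-3, 2e-6)` is false.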

## 6 Real image experiments

Using real video images, we tested our method and compared its separation accuracy with those of RANSAC, LSA of Yan and Pollefeys [8], and GPCA of Vidal et al. [7].

### 6.1 Separation process

First, we estimated the dimension of the affine space which includes all the input trajectories and obtained 4. Since the resulting dimension was larger than 2, we separated those input trajectories. We show the separation result in Fig. 1b. For this result, we computed the average likelihoods for the affine spaces before/after separation and obtained 5.63×10^{−7} and 8.11×10^{−5}, respectively. Since these values satisfied the inequality in Eq. (1), we accepted the separation.

In the second stage, we estimated the dimensions of the affine spaces for the separated trajectories shown in Fig. 1b as green and blue points. The resulting dimension of the blue points was 2, so we stopped their separation. Since the estimated dimension of the green points was 3, we separated those trajectories. The result is shown in Fig. 1c. For this separation, the computed average likelihoods satisfied Eq. (1), and we accepted the separation. In the third stage, we estimated the dimensions of the affine spaces for the separated trajectories shown in Fig. 1c as red and green points. The resulting dimension of the red points was 2, so we stopped their separation. The estimated dimension of the green points was 3, but the average likelihoods before/after separation did not satisfy Eq. (1). Hence, we stopped the separation. We show the final separation result in Fig. 1d. In this sequence, the number of independently moving objects, including the background points, is 3, and our method correctly separated the input trajectories into three groups.

### 6.2 Integration process

For every pair of the separated groups, we computed the average likelihoods before/after integration and judged whether the pair should be integrated (∘: integrate, ×: do not integrate):

| | Cyan | Green | Orange | Magenta |
|---|---|---|---|---|
| Red | × | × | × | × |
| Blue | × | ∘ | ∘ | × |
| Cyan | | × | × | × |
| Green | | | × | × |

### 6.3 Accuracy comparison

| | Our method (%) | RANSAC (%) | LSA (%) | GPCA (%) |
|---|---|---|---|---|
| Data1 | 100 | 92.50 | 93.43 | 89.31 |
| Data2 | 98.16 | 57.60 | 77.42 | 62.21 |
| Data3 | 100 | 84.82 | 96.43 | 81.25 |
| Data4 | 99.52 | 80.95 | 90.48 | 80.95 |
| Data5 | 100 | 80.30 | 92.75 | 80.30 |
| Data6 | 99.06 | 68.40 | 99.53 | 68.40 |
| Data7 | 100 | 45.49 | 96.99 | 45.49 |
| Data8 | 90.82 | 78.31 | 89.54 | Did not converge |
| Data9 | 100 | 72.22 | 96.43 | 90.87 |
| Data10 | 100 | 91.03 | 93.40 | 92.08 |
| Data11 | 99.61 | 92.87 | 76.11 | 85.55 |
| Data12 | 99.63 | 85.56 | 94.44 | 80.00 |
| Data13 | 100 | 90.26 | 56.06 | 77.67 |

## 7 Conclusions

In this paper, we proposed a new method for segmenting feature point trajectories without assuming the number of objects. Based on the fact that the trajectories of a rigidly moving object are constrained to a 2-D or 3-D affine space, we hierarchically separate input trajectories into two affine spaces until all the trajectories are divided into 2-D or 3-D affine spaces. In order to judge whether input trajectories should be divided, we compare the likelihoods before/after separation. After the separation process, we also check whether the separated trajectories should be integrated by comparing the likelihoods, to avoid trajectories that belong to the same object being separated into different groups.

Using real video sequences, we verified the separation and integration processes of our method and confirmed its accuracy by comparison with existing methods.

## Declarations

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## References

- Costeira JP, Kanade T (1998) A multibody factorization method for independently moving objects. Int J Comput Vis 29(3): 159–179.
- Kanatani K, Matsunaga C (2002) Estimating the number of independent motions for multibody motion segmentation. In: Proc. of the 5th Asian Conference on Computer Vision (ACCV2002), 7–12, Melbourne, Australia.
- Sugaya Y, Kanatani K (2004) Multi-stage unsupervised learning for multi-body motion segmentation. IEICE Trans Inf Syst E87-D(7): 1935–1942.
- Sugaya Y, Kanatani K (2010) Improved multistage learning for multibody motion segmentation. In: Proc. of the International Conference on Computer Vision Theory and Applications (VISAPP2010), 199–206, Angers, France.
- Sugaya Y, Kanatani K (2013) Removing mistracking of multibody motion video database Hopkins155. In: Proc. of the 24th British Machine Vision Conference (BMVC2013), Bristol, U.K.
- Vidal R, Ma Y, Sastry S (2005) Generalized principal component analysis (GPCA). IEEE Trans Pattern Anal Mach Intell 27(12): 1945–1959.
- Vidal R, Tron R, Hartley R (2008) Multiframe motion segmentation with missing data using PowerFactorization and GPCA. Int J Comput Vis 79(1): 85–105.
- Yan J, Pollefeys M (2006) A general framework for motion segmentation: independent, articulated, rigid, non-rigid, degenerate and non-degenerate. In: Proc. of the 9th European Conference on Computer Vision (ECCV2006), 94–106, Graz, Austria.