Open Access

Multiple fish tracking with an NACA airfoil model for collective behavior analysis

IPSJ Transactions on Computer Vision and Applications 2016, 8:4

DOI: 10.1186/s41074-016-0004-1

Received: 28 April 2016

Accepted: 17 June 2016

Published: 2 August 2016


We propose a visual tracking method with an NACA airfoil model for dense fish schools in which occlusions occur frequently. Although much progress has been made in tracking multiple objects, it remains challenging to track individuals due to factors such as occlusion and variation in target appearance. In this paper, we first introduce an NACA airfoil model as a deformable appearance model of fish. For occluded fish, we estimate their positions, angles, and postures with template matching and a simulated annealing algorithm that effectively optimizes their parameters. To improve tracking performance, we repeatedly apply the parameter estimation algorithm forwards and backwards over the scene. We prepared two real fish scenes in which the average number of fish per frame is over 25 and fish overlap one another more than 50 times. Experimental results for these scenes show that our method tracks fish more reliably than a tracking method based on a mixture particle filter: over 75 % of the fish in each scene are tracked throughout the scene, and the average positional error is less than 4 % of the mean body length of the school.


Keywords: Visual tracking, NACA airfoil model, Collective behavior

1 Introduction

The tracking of multiple fish in a tank to measure their behaviors has many important applications in various fields of natural science, such as animal behavior and neuroscience [1–3]. Automatic surveillance of fish in aquariums and fish farms is also important for observing the growth and health of fish in order to improve their survival rate.

A number of multiple target tracking methods have been investigated, most of which are intended for tracking humans, e.g., [4–6]. Methods for tracking multiple fish in a shallow tank have also been developed, e.g., [3, 7–10]. However, it is quite difficult to track targets when they are homogeneous and their density is high, such as the fish in the school shown in Fig. 1a. Videos of multiple fish pose many difficulties for visual tracking: fish frequently overlap with each other, their textures are weak, they deform their bodies by beating their tails, and identification is difficult because they are homogeneous. The detection of fish, i.e., counting their number and estimating their positions and directions, in a cluster of fish such as that shown in Fig. 1a is difficult, even manually. Therefore, it is not appropriate to simply apply the tracking-by-detection frameworks that have been widely used to track multiple targets, e.g., [5, 6], in this situation.
Fig. 1

a Snapshot of a school of sardines. b Camera setup

Terayama et al. tracked multiple fish in such a dense school using an appearance model based on the images of fish in a video [11]. They showed that if the number of fish in a cluster is known, their positions and other parameters can be estimated by matching all combinations of the possible parameters. However, their algorithm is quite slow because of the huge number of parameter combinations, and their model is not parameterized.

In this paper, we propose a novel multiple fish tracking method for a dense school of fish. First, we introduce a parameterized appearance model based on the NACA0012 airfoil model1, which has been adopted in biomechanics and computational fluid dynamics research, e.g., in [12], to represent a fish body. The model is simple but can effectively represent the deformations of fish caused by tail beating with few parameters, as compared to the models in [7–9]. The results of our experiments, in which two types of swimming event were easily detected, show the effectiveness of this model. Second, we propose a practical tracking method, which estimates the parameters of fish within a realistic time by using simulated annealing (SA) [13]. The approach for parameter estimation is based on that in [11]; however, their algorithm is unrealistic because it matches all combinations of parameters exhaustively. Since it is difficult to estimate the number and positions of fish in a cluster, the proposed method begins by tracking only isolated fish that do not overlap with others. As a consequence, a fish that is occluded at the beginning of the video and only becomes isolated later cannot be tracked from the beginning of its trajectory. Finally, to deal with this problem, we propose a forward-backward tracking algorithm. This algorithm corresponds to manual tracking, in which we track fish in a cluster of overlapping fish by playing the video forward and backward repeatedly.

In the rest of the paper, we describe our tracking method for multiple fish in Section 2. We show the results of experiments using movies recorded in an aquarium and our event detection results in Section 3. Finally, we summarize this paper and state the plan for future work in Section 4.

2 Our method

In this section, we explain the details of our method: the appearance model, the tracking algorithm using SA, and forward-backward tracking. Figure 2 shows overviews of our tracking algorithm.
Fig. 2

Overviews of proposed tracking method. a Tracking with simulated annealing. b Forward-backward tracking

2.1 Appearance model of fish

We employ the NACA0012 airfoil model as the basis of a deformable appearance model of fish. Figure 3 a shows the NACA0012 model. As in [12], we define the deformation equation h(x,ϕ) around the initial center line in Fig. 3 a as
$$h(x,\phi) = A\left(-(x-1)^{2} +1\right) \cos\frac{2\pi}{\lambda}(x-c \phi), \tag{1}$$
Fig. 3

a NACA0012 airfoil model. b Appearance model for ϕ=0 and A=0.01. c Appearance model for ϕ=0.1 and A=0.1. d Largely deformed appearance model for ϕ=0.066 and A=0.3

where the parameters A, λ, c, x, and ϕ represent the maximum amplitude, wave length, phase velocity, position from the head as shown in Fig. 3a, and phase of one beat cycle, respectively. For each scene, we first calculate the average brightness of the fish and construct 92 normal appearance models from the NACA0012 shape and Eq. (1) by varying A and ϕ and filling the resulting outline with that brightness. We call these models the NACA model. Figure 3b, c shows examples of the NACA model. We set λ to 2 and c to 2, and vary A from 0.01 to 0.3 and ϕ from 0 to 1. To deal with large deformations (bending), we add several largely deformed models based on h(x,ϕ) with a large amplitude. Figure 3d shows an example of a largely deformed model.
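As a concrete illustration, the appearance model of Eq. (1) can be sketched in a few lines of Python. This is a simplified sketch: we add the standard NACA0012 half-thickness vertically to the bent center line, whereas a faithful implementation would offset it along the local normal of the center line.

```python
import numpy as np

def naca0012_half_thickness(x, t=0.12):
    """Half-thickness of the symmetric NACA0012 section at chordwise x in [0, 1]
    (the standard four-digit-series thickness polynomial)."""
    return 5.0 * t * (0.2969 * np.sqrt(x) - 0.1260 * x
                      - 0.3516 * x**2 + 0.2843 * x**3 - 0.1015 * x**4)

def centerline(x, phi, A, lam=2.0, c=2.0):
    """Deformed center line h(x, phi) from Eq. (1): an amplitude envelope
    A * (-(x-1)^2 + 1) times a traveling wave cos(2*pi/lam * (x - c*phi)).
    lam = 2 and c = 2 are the values used in the paper."""
    return A * (-(x - 1.0)**2 + 1.0) * np.cos(2.0 * np.pi / lam * (x - c * phi))

def fish_outline(phi, A, n=100):
    """Upper and lower outlines of the deformed appearance model.
    Simplification: the half-thickness is added vertically to the bent
    center line instead of along its local normal."""
    x = np.linspace(0.0, 1.0, n)
    h = centerline(x, phi, A)
    yt = naca0012_half_thickness(x)
    return x, h + yt, h - yt
```

Rasterizing the region between the two outlines, filled with the measured average brightness, would yield one template of the NACA model for a given (A, ϕ) pair.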

2.2 Multiple fish tracking with simulated annealing

Initially, we track only isolated fish, because it is difficult to estimate the number of fish in a cluster of overlapping fish. When two or more fish tracked by our method begin to overlap, we estimate their parameters by matching the overlapped image against the image drawn from their parameters with the NACA model, using SA. The details of our tracking algorithm are as follows.

We estimate not only the positions of targets but also their direction angles, the parameters A and ϕ of the NACA model, the length scale, and the thickness scale. We call these fish parameters FPs. Table 1 summarizes the parameters used in our method.
Table 1

Parameters of our method



Parameter                   Unit    Value explored
Position (x, y)                     The entire image
Direction angle
A (amplitude)                       0.01, 0.04, 0.07, 0.10, 0.15, 0.22, 0.30
ϕ (phase of beat cycle)
Length scale                1 %     75–150 %
Thickness scale             1 %     70–130 %

For each frame t in a scene, we first binarize the frame and extract fish candidate regions (FCRs) from the binarized image, as shown in image (ii) in Fig. 2a. To each FCR, we assign fish IDs from the tracking results of the previous frames by calculating the minimum of the similarity2 between the FCR and all tracked fish. Image (iii) in Fig. 2a shows examples of the assignment of IDs to FCRs. If no ID is assigned to an FCR and its area is in the range [a_l, a_m], we assign a new ID and begin to track the FCR as a new fish. We do not assign IDs to an FCR, and terminate tracking, if there is little or no overlap between the FCR and any image drawn from the FPs.
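A minimal sketch of this FCR extraction step follows. The threshold value and the assumption that fish appear darker than the background are illustrative; the paper does not specify its binarization details, and an OpenCV `connectedComponentsWithStats` call would replace the hand-written flood fill in practice.

```python
import numpy as np
from collections import deque

def extract_fcrs(frame_gray, thresh=128, a_l=100, a_m=450):
    """Binarize a grayscale frame and return fish candidate regions (FCRs).
    Assumption: fish are darker than the background, so foreground = pixel < thresh.
    Connected components are found with a simple 4-neighbour flood fill."""
    fg = frame_gray < thresh
    labels = np.zeros(fg.shape, dtype=int)
    H, W = fg.shape
    fcrs, next_id = [], 1
    for sy, sx in zip(*np.nonzero(fg)):
        if labels[sy, sx]:
            continue
        # Flood-fill one connected component starting from (sy, sx).
        q, pixels = deque([(sy, sx)]), []
        labels[sy, sx] = next_id
        while q:
            y, x = q.popleft()
            pixels.append((y, x))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < H and 0 <= nx < W and fg[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = next_id
                    q.append((ny, nx))
        next_id += 1
        area = len(pixels)
        if area >= a_l:  # drop binarization noise
            ys, xs = zip(*pixels)
            bbox = (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)
            # Only a region whose area fits a single fish may start a new track.
            fcrs.append({"area": area, "bbox": bbox,
                         "isolated_candidate": a_l <= area <= a_m})
    return fcrs
```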

For an FCR that consists of multiple fish, it is difficult to estimate the number of fish in the FCR and their FPs simultaneously. However, if the number of fish in the FCR is known, we can accurately estimate their FPs by minimizing the sum of absolute differences (SAD) between the FCR and the image drawn from the FPs and the NACA model, as shown in [11]. We minimize the SAD using SA [13] to accelerate the tracking process, whereas all combinations of parameters were matched exhaustively in [11].

In this paper, we define a neighbor of an FP as the FP in which one of the original parameters is changed by its unit. We define the cooling rate γ and the acceptance probability function p(s, s′, T) for the current similarity s, a candidate new similarity s′, and the temperature T as
$$\gamma = \alpha^{1/m}, \tag{2}$$
$$p(s, s^{\prime}, T) = \left\{\begin{array}{ll} 1 &\text{if } s^{\prime}<s\\ \exp{(-(s^{\prime}-s)/T)} &\text{otherwise}, \end{array}\right. \tag{3}$$
where m is the number of IDs assigned to the FCR and α is a constant close to, but smaller than, 1. Note that we employed the SAD as the similarity measure. We define the threshold th_s(lp) for terminating the optimization process, in terms of the number of loops lp and the threshold parameters th_min, th_max, th_Δ, lp_0, and lp_max, as
$$th_{s}(lp) = \left\{\begin{array}{ll} th_{\text{min}}+th_{\Delta}\times lp &\text{if } lp\leq lp_{0}\\ th_{\text{max}} &\text{if } lp_{0}<lp. \end{array}\right. \tag{4}$$

We also terminate the optimization process if lp ≥ lp_max.

Note that our optimization process is more practical than that in [11], because the order of our algorithm for an FCR that has m assigned IDs is \(\mathcal {O}(m)\) from Eq. (2). We refer to the proposed tracking method with SA as SAT (SA Tracking).
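Under these definitions, one SA pass for a single FCR can be sketched as follows. Here `sad` and `neighbor` stand in for the rendering and matching machinery described above, and the initial temperature and per-loop iteration count are illustrative assumptions, not values from the paper.

```python
import math
import random

def th_s(lp, th_min=40, th_max=70, th_delta=4, lp0=5):
    """Loop-dependent termination threshold of Eq. (4), with the parameter
    values used in the paper's experiments as defaults."""
    return th_min + th_delta * lp if lp <= lp0 else th_max

def anneal_fps(fps0, sad, neighbor, m, alpha=0.995, T0=50.0,
               lp_max=20, iters_per_loop=200):
    """Minimize the SAD between an FCR and the image drawn from the fish
    parameters (FPs) by simulated annealing.  `sad(fps)` scores a candidate
    and `neighbor(fps)` perturbs one parameter by its unit; both are
    placeholders for the paper's rendering/matching code.  The cooling rate
    gamma = alpha**(1/m) for m fish in the FCR follows Eq. (2)."""
    gamma = alpha ** (1.0 / m)
    fps, s = fps0, sad(fps0)
    best_fps, best_s = fps, s
    T = T0
    for lp in range(1, lp_max + 1):
        for _ in range(iters_per_loop):
            cand = neighbor(fps)
            s_new = sad(cand)
            # Accept downhill moves always, uphill moves with
            # probability exp(-(s' - s)/T), as in Eq. (3).
            if s_new < s or random.random() < math.exp(-(s_new - s) / T):
                fps, s = cand, s_new
                if s < best_s:
                    best_fps, best_s = fps, s
            T *= gamma
        if best_s <= th_s(lp):  # good enough for this loop count: stop early
            break
    return best_fps, best_s
```

With a toy one-dimensional "FP" the optimizer behaves as expected: starting far from the optimum, it descends below the th_max threshold well within lp_max loops.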

2.3 Forward-backward tracking

We then apply the SAT process to the same scene in reverse, i.e., we track the fish forwards and backwards repeatedly. We refer to the preceding tracking pass as the former process. Figure 2b shows an overview of reverse tracking.

During reverse tracking, if the FPs of an FCR were appropriately estimated in the former process, we simply trace those FPs (case 1 in Fig. 2b). If they were not, we estimate new FPs with the SAT (case 2 in Fig. 2b).

When the tracklet tracked in the current process reaches a tracklet estimated in the former process, we connect the two tracklets if they are close (case 3 in Fig. 2b). We calculate the distance d between the FPs p_p estimated by the current tracking and p_t estimated by the former process in the same frame, defined by
$$d(p_{p},p_{t})=\sqrt{(x_{p}-x_{t})^{2}+(y_{p}-y_{t})^{2}+(\theta_{p}-\theta_{t})^{2}}, \tag{5}$$

where x_p, x_t, y_p, and y_t are positions and θ_p and θ_t are direction angles. We integrate the trajectories when the distance is smaller than d_cn.
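The connection test of Eq. (5) is straightforward to sketch. Storing a tracklet as a dictionary mapping frame index to (x, y, θ) is a container choice of ours for illustration, not the paper's.

```python
import math

def tracklet_distance(p_p, p_t):
    """Distance of Eq. (5) between two FP estimates (x, y, theta) in the same
    frame: Euclidean in position with the angle difference folded in."""
    (xp, yp, tp), (xt, yt, tt) = p_p, p_t
    return math.sqrt((xp - xt) ** 2 + (yp - yt) ** 2 + (tp - tt) ** 2)

def try_connect(track_a, track_b, frame, d_cn=10.0):
    """Connect two tracklets if their FP estimates at the shared frame are
    within d_cn (the paper sets d_cn = 10).  Tracklets are dicts mapping
    frame index -> (x, y, theta)."""
    if frame not in track_a or frame not in track_b:
        return False
    return tracklet_distance(track_a[frame], track_b[frame]) < d_cn
```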

We repeat the forward-backward tracking process until no new estimated FPs appear, in order to improve the tracking performance. We call this process FBT.
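The overall FBT loop can be sketched as follows, with `sat_pass` standing in for one full SAT sweep over the scene; its interface (returning the set of (frame, fish ID) pairs whose FPs were estimated) is an assumption made for this sketch.

```python
def forward_backward_tracking(scene_frames, sat_pass, max_repeats=5):
    """Repeat the SAT pass forwards and backwards until no new FPs appear.
    `sat_pass(frames, known)` is a placeholder for one SAT sweep; it returns
    the set of (frame, fish_id) pairs whose FPs it could estimate given the
    already-known estimates."""
    known = set()
    for _ in range(max_repeats):
        new_fwd = sat_pass(scene_frames, known) - known
        known |= new_fwd
        new_bwd = sat_pass(list(reversed(scene_frames)), known) - known
        known |= new_bwd
        if not (new_fwd or new_bwd):
            break  # converged: no new estimated FPs appeared in either direction
    return known
```

In a toy scenario where fish "B" only becomes trackable once fish "A" is known (mimicking a fish that is occluded until the forward pass resolves its neighbor), the backward pass recovers B's full trajectory.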

3 Experimental results

We conducted experiments to show the effectiveness of the proposed method. We first describe the dataset used in the experiments. To compare tracking performance, we also performed tracking using an implementation based on [4], which we call MPF (mixture particle filter). We prepared 30,000 and 40,000 particles for scenes A and B, respectively, in order to assign hundreds of particles to each fish. The parameters of a particle, such as position, direction angle, and NACA model parameters, and the likelihood function of the MPF are also based on the NACA model, as in the proposed SAT.

3.1 Dataset

We recorded videos of schools of sardines at Kujukushima Aquarium Umikirara, Nagasaki, Japan in March 2015. The videos were recorded at 30 fps using a HERO4 video camera. Figure 1 a, b shows a snapshot of the video and the camera setup, respectively.

For our experiments, we extracted 5-s scenes A and B (400 × 300 pixels) from the movies. Figure 4 a, b shows the first frames of the scenes. We set parameter a_l to 100, a_m to 450, α to 0.995, th_min to 40, th_max to 70, th_Δ to 4, lp_0 to 5, lp_max to 20, and d_cn to 10.
Fig. 4

a, b Snapshots of scenes A and B. c, d Examples of SM with MPF and with MPF and SAT. The numbers in c and d are tracking IDs. e, f Results of FBT for scenes A and B. The result of 0 in the repeat number of FBT corresponds to that of SAT

We manually prepared the ground-truth (GT) trajectories of all the fish in the scenes. In this study, we tracked only the fish having a connected (occluded) component that is completely contained in the frame of the scenes. For example, the fish in the white dotted oval in Fig. 4 b is not tracked in this frame.

Table 2 summarizes the basic data of the scenes: the average number of fish in each frame (AN), the total number of trajectories (NT) in each scene and the total number of overlappings with others (NO) seen from each trajectory.
Table 2

Basic data of scenes A and B

3.2 Evaluation metrics

We used five metrics to evaluate the tracking results, based on the metrics in [6]. For each scene, we calculated the ratio of fish detected correctly by our tracking methods to the GT detections (Rcll) and to all detections, which may contain failures (Prcn). We considered a pair of an estimated parameter and GT correctly matched if the distance between the estimated head position and the GT position is less than five pixels. To measure the tracking performance, we calculated the ratio of GT trajectories that are covered by the estimated tracklets for more than 95 % of their length to all the GT trajectories (MT). Our method cannot track a fish that is overlapped in every frame of the scene; we therefore also measured MT excluding such fish (MT-I). To measure tracking failures, we counted the total number of ID switchings during fish crossings and particle migrations (switchings and migrations, SMs).
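These metrics can be sketched as follows. The greedy nearest-neighbour matching under the 5-pixel criterion is our simplification of the evaluation procedure; an optimal assignment (e.g., Hungarian matching) could be substituted.

```python
import math

def frame_matches(gt_pts, est_pts, match_dist=5.0):
    """Greedily match estimated head positions to ground-truth positions
    within match_dist pixels (5 px in the paper); returns the number of
    correct detections in one frame."""
    unused = list(est_pts)
    correct = 0
    for g in gt_pts:
        best, best_d = None, match_dist
        for e in unused:
            d = math.hypot(g[0] - e[0], g[1] - e[1])
            if d < best_d:
                best, best_d = e, d
        if best is not None:
            unused.remove(best)  # each estimate may match at most one GT fish
            correct += 1
    return correct

def rcll_prcn(gt_by_frame, est_by_frame, match_dist=5.0):
    """Rcll = correct / all GT detections; Prcn = correct / all detections."""
    correct = sum(frame_matches(g, e, match_dist)
                  for g, e in zip(gt_by_frame, est_by_frame))
    n_gt = sum(len(g) for g in gt_by_frame)
    n_est = sum(len(e) for e in est_by_frame)
    return correct / n_gt, correct / n_est

def mostly_tracked(coverage_ratios, threshold=0.95):
    """MT: fraction of GT trajectories covered by estimated tracklets for
    more than 95 % of their length."""
    return sum(r > threshold for r in coverage_ratios) / len(coverage_ratios)
```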

3.3 Experiment 1

To evaluate the effectiveness of our method compared to the MPF, Table 3 shows the results of the SAT and the MPF. Figure 5a shows a snapshot of the tracking results in scene B using the SAT. All isolated fish are correctly tracked by both methods, except for one fish with the MPF in scene A. The number of SMs with the SAT is smaller than that with the MPF. Figure 4c, d shows examples of SM with the MPF and with both methods, respectively.
Fig. 5

a Snapshots of tracking results in scene B obtained using our method. The fish parameters of nine occluded fish in the white oval in b can be estimated. The numbers in a and b are tracking IDs. c, d Space-time trajectory plot of the entire sequence of scenes A and B. e, f Gliding (blue point) and bending (red point) events of scenes A and B. In c, d, e, and f, the X and Y axes represent 2D space

Table 3

Quantitative comparison of tracking results
The italicized values are the best scores among the methods in each scene

We tracked fish by repeating the FBT process five times. The results are shown in the SAT+FBT row in Table 3. Figure 5c, d shows space-time trajectory plots of the entire sequences of scenes A and B. By virtue of backward tracking, we can estimate the FPs of the cluster of nine fish in the white oval in Fig. 5b in the first frame. Figure 4e, f shows the improvement in tracking performance using FBT. Over 75 % (approximately 90 % in MT-I) of the fish in each scene are correctly tracked by FBT, and the average differences between the GT and estimated positions are less than 4 % of the mean body length in each scene.

The experiments were performed using our non-optimized implementation in Python and OpenCV. The average SAT computation times for processing one frame of scenes A and B were about 4 and 6 min, respectively.

3.4 Experiment 2

Since we employ a parametrized appearance model, we can easily detect events that are useful for collective behavior analysis, such as bending (Fig. 3d) and gliding. Gliding is a swimming phase in which there is no tail beating; in bending, the fish strongly curve their bodies, for example to change the direction of their movement. We extracted such events using the amplitude parameter A. The blue points in Fig. 5e, f show gliding events; the fish are bending at the red points in Fig. 5e, f.
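Because each tracked fish carries an explicit amplitude A in every frame, event extraction reduces to thresholding the estimated parameter. The threshold values below are illustrative assumptions, not those used in the paper.

```python
def detect_events(amplitudes, glide_max=0.02, bend_min=0.25):
    """Classify frames by the estimated amplitude parameter A of the NACA
    model: near-zero A means no tail beat (gliding), very large A means a
    strongly bent body (bending).  Both thresholds are illustrative."""
    events = []
    for t, A in enumerate(amplitudes):
        if A <= glide_max:
            events.append((t, "gliding"))
        elif A >= bend_min:
            events.append((t, "bending"))
    return events
```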

From the viewpoints of biomechanics and animal behavior, such measurements are essential, because gliding and bending are related to the energy consumption of swimming [14] and constitute a type of information transmission in a school [2, 15].

4 Conclusion

In this paper, we proposed an appearance-based tracking method for multiple fish tracking. Over 75 % (approximately 90 % in MT-I) of the fish in two scenes were successfully tracked. The experimental results indicate that our method is practical for multiple fish tracking and collective motion analysis.

Our method is suited to fish filmed from below, because the NACA model represents a fish viewed from below or above. We would like to extend the applicable range of our method to movies taken from other directions. Our future work also includes improving tracking performance by introducing data association frameworks and models of the interactions between fish in order to predict their states in the next frame. Furthermore, it is worth accelerating our algorithm in order to track thousands of fish in a school.

5 Endnotes

1 The NACA airfoils are models of shapes for aircraft wing sections originally developed by the National Advisory Committee for Aeronautics (NACA). The digits represent the parameters of the shapes.

2 We employed the sum of absolute differences (SAD) as the similarity measure.



This work was supported by JSPS KAKENHI Grant Numbers 26240023 and 26610114. The authors sincerely appreciate the cooperation of Kujukushima Aquarium Umikirara in recording movies of the school of sardines.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors’ Affiliations

Graduate School of Frontier Sciences, the University of Tokyo
Faculty of Science and Engineering, Kindai University
Graduate School of Human and Environmental Studies, Kyoto University


  1. Vicsek T, Zafeiris A (2012) Collective motion. Phys Rep 517(3): 71–140.
  2. Strandburg-Peshkin A, Twomey CR, Bode NW, Kao AB, Katz Y, Ioannou CC, Rosenthal SB, Torney CJ, Wu HS, Levin SA, et al. (2013) Visual sensory networks and effective information transfer in animal groups. Curr Biol 23(17): 709–711.
  3. Delcourt J, Denoël M, Ylieff M, Poncin P (2013) Video multitracking of fish behaviour: a synthesis and future perspectives. Fish Fish 14(2): 186–204.
  4. Vermaak J, Doucet A, Pérez P (2003) Maintaining multimodality through mixture tracking. In: Proc 9th IEEE Int Conf Comput Vis, 1110–1116.
  5. Okuma K, Taleghani A, De Freitas N, Little JJ, Lowe DG (2004) A boosted particle filter: multitarget detection and tracking. In: Proc 8th European Conf Comput Vis, 28–39.
  6. Wu B, Nevatia R (2006) Tracking of multiple, partially occluded humans based on static body part detection. In: Proc IEEE Comput Soc Conf Comput Vis Pattern Recog, 951–958.
  7. Ukita N, Kitajima T, Kidode M (2004) Estimating the positions and postures of non-rigid objects lacking sufficient features based on the stick and ellipse model. In: Proc Conf Comput Vis Pattern Recog Workshop.
  8. Mitsugami I, Kakusho K, Minoh M (2009) Efficient particle filtering for a non-rigid object based on PCA about changes of its shape and motion. IEICE Trans Inf Syst E92-D(8): 1270–1278.
  9. Butail S, Paley DA (2012) Three-dimensional reconstruction of the fast-start swimming kinematics of densely schooling fish. J R Soc Interface 9(66): 77–88.
  10. Fukunaga T, Kubota S, Oda S, Iwasaki W (2015) GroupTracker: video tracking system for multiple animals under severe occlusion. Comput Biol Chem 57: 39–45.
  11. Terayama K, Hongo K, Habe H, Sakagami M-a (2015) Appearance-based multiple fish tracking for collective motion analysis. In: Proc 3rd IAPR Asian Conf Pattern Recog, 361–365.
  12. Akimoto H, Miyata H (1993) Finite-volume simulation of a flow about a moving body with deformation. In: Proc 5th Int Symp Comput Fluid Dynamics, 13–18.
  13. Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598): 671–680.
  14. Hemelrijk CK, Reid DAP, Hildenbrandt H, Padding JT (2014) The increased efficiency of fish swimming in a school. Fish Fish 16(3): 511–521.
  15. Radakov DV (1973) Schooling in the Ecology of Fish. Wiley, New York.


© The Author(s) 2016