Generic and attribute-specific deep representations for maritime vessels
- Berkan Solmaz†1,
- Erhan Gundogdu†1,
- Veysel Yucesoy1 and
- Aykut Koc1
https://doi.org/10.1186/s41074-017-0033-4
© The Author(s) 2017
Received: 21 June 2017
Accepted: 7 November 2017
Published: 11 December 2017
Abstract
Fine-grained visual categorization has recently received great attention as the volumes of labeled datasets for classification of specific objects, such as cars, bird species, and aircraft, have been increasing. The availability of large datasets led to significant performance improvements in several vision-based classification tasks. Visual classification of maritime vessels is another important task, assisting naval security and surveillance applications. We introduce MARVEL, a large-scale image dataset for maritime vessels, consisting of 2 million user-uploaded images and their various attributes, including vessel identity, type, category, year built, length, and tonnage, collected from a community website. The images were categorized into vessel type classes and also into superclasses defined by combining semantically similar classes, following a semi-automatic clustering scheme. For the analysis of the presented dataset, extensive experiments have been performed, involving several potentially useful applications: vessel type classification, identity verification, retrieval, and identity recognition with and without prior vessel type knowledge. Furthermore, we attempted interesting problems of visual marine surveillance, namely predicting and classifying maritime vessel attributes such as length, summer deadweight, draught, and gross tonnage by solely interpreting the visual content in the wild, where no additional cues such as scale, orientation, or location are provided. By utilizing generic and attribute-specific deep representations for maritime vessels, we obtained promising results for the aforementioned applications.
1 Introduction
The coastal and marine surveillance systems are mainly based on sensors such as radar and sonar, which allow detecting marine vessels and taking responsive actions. Vision-based surveillance systems containing electro-optic imaging sensors can also be exploited for developing robust and cost-effective systems. Categorization of maritime vessels is of utmost importance to improve the capabilities of such systems. For a given image of a ship, the goal is to automatically identify it using computer vision and machine learning techniques. Vessel images include important clues regarding different attributes such as vessel type, category, gross tonnage, length and draught. A large-scale dataset would be beneficial for extracting such clues and learning compelling models from images containing several types of vessels.
Presence of benchmark datasets [1] with large quantities of images and manual labels with meaningful attributes has resulted in a significant increase in visual object categorization performance by allowing the use of convenient machine learning methods such as deep architectures [2]. Later, these powerful deep architectures have been employed in a more challenging problem, fine-grained visual categorization, by either training on datasets from scratch [3], by fine-tuning deep architectures trained on large-scale datasets [4], or by exploiting the previously trained architectures with specific modifications [5].
To classify images with a fine-grained resolution, a considerable amount of training data is necessary for a respectable model generalization. Thus, fine-grained datasets were collected for specific object categories. Some examples are aircraft datasets [6, 7]; the Caltech-UCSD bird species dataset [8] consisting of 12 K images; and car make and model datasets, namely the Stanford cars dataset [9] containing 16 K car images and the CompCars dataset [10] of 130 K images. One work related to marine vessel recognition is [11], where 130,000 random example images from the Shipspotting website [12] are utilized and a convolutional neural network [2] is trained for classifying vessel types. In our dataset, 140,000 images are engaged for vessel type classification among 26 superclasses constructed using a semi-supervised clustering approach. Furthermore, the constructed vessel superclasses are balanced; the training set is arranged to have an equal number of examples from each superclass, after augmenting data for vessel type classes with a lower number of examples. In contrast, there is a significant imbalance of examples among the classes in [11], which may result in a bias in classification towards the dominant classes with more examples. Such imbalance makes it more difficult to validate the performance of different classifiers. In this work, for measuring vessel classification performance, we report mean per-class accuracies. In addition, we accomplish further important tasks with a vast amount of vessel images and obtain encouraging results, which will be described in detail in the following sections.
In order to utilize state-of-the-art fine-grained visual classification methods for maritime vessel categorization, we collected a dataset consisting of a total of 2 million images downloaded from the Shipspotting website [12], where hobby photographers upload images of maritime vessels and corresponding detailed annotations including types, categories, tonnage, draught, length, summer deadweight, year built, and International Maritime Organization (IMO) numbers, which uniquely identify ships. To the best of our knowledge, the collected dataset, MARitime VEsseLs (MARVEL) [13, 14], is the largest-scale dataset with meta-data composed of the aforementioned attributes, suited for fine-grained visual categorization, recognition, retrieval, and verification tasks, as well as any possible future applications.
In addition to the introduced large-scale dataset, our other major contributions are presenting generic representations for maritime vessels, as well as targeting visual vessel analysis from five different aspects: (1) vessel type classification, (2) vessel identity verification, (3) vessel retrieval, (4) vessel identity recognition with and without prior type knowledge, and (5) specific vessel attributes (draught, length, gross tonnage, and summer deadweight) prediction and classification. To verify the practicality of MARVEL and encourage researchers, we present baseline results for these tasks. By providing relevant splits of the dataset for each application and inspecting the consistency of associated labels, we form a comparison basis for visual analysis of maritime vessels. Moreover, we believe our structured dataset will be a benchmark for evaluating approaches designed for fine-grained recognition. The researchers may also develop several new applications with the help of this dataset in addition to the aforementioned applications.
2 MARVEL dataset properties
MARVEL dataset consists of 2 million marine vessel images collected from Shipspotting website [12]. For most of the images in the dataset, the following attributes are available: beam, year built, draught, flag, gross tonnage, IMO number, name, length, category, summer deadweight, MMSI, vessel type.
Distribution of collected vessel images: Number of images belonging to each photo category, individual vessel, and vessel type are depicted in a, b, and c, respectively. The largest group among photo categories is chemical and product tankers. General cargo is the vessel type including highest number of images. Further statistics are provided on the right columns: In b, 8388 marine vessels are present containing at least 50 images. In c, there are 132 vessel type categories including at least 100 images
Histograms of four vessel attribute values on MARVEL dataset: a draught, b length, c gross tonnage, and d summer deadweight
3 Potential computer vision tasks on MARVEL dataset
The huge quantity of images and annotations in MARVEL makes it possible to directly employ recent methods utilizing deep architectures such as AlexNet [2] for vessel categorization. One may choose one of the provided vessel attributes, such as vessel type or category, and apply classification methods for categorizing images according to the selected attribute.
In MARVEL there are more than 8000 unique vessels (carrying unique IMO numbers) having more than 50 example images as shown in Fig. 1 b. It is also feasible to use the dataset for both vessel verification and identity recognition, which could be a vital part of a maritime security system, analogous to a scenario where vehicle make and model recognition is crucial for a traffic security system.
The focus of this study on the MARVEL dataset is fivefold: (1) vessel classification, since the content of the cargo that a ship carries, specified by its type, is crucial for maritime surveillance; (2) identity verification, where the ultimate goal is to find out whether a pair of images belongs to the same vessel with a unique IMO number; (3) retrieval, where one might desire to query a vessel image and retrieve a certain number of similar images from a database; (4) identity recognition, a challenging yet interesting task that aims at recognizing a specific vessel within vessels of the same type or among all other vessels (this might be likened to a facial recognition task); and finally (5) specific attribute prediction and classification, where the objective is to estimate the draught, length, gross tonnage, and summer deadweight of a vessel by simply analyzing the 2-D visual content. To achieve these goals, we design generic and attribute-specific representations which are powerful in describing marine vessel images.
Visual comparison of two very similar classes: crude oil tanker (top row) and oil products tanker (bottom row)
Vessel verification task serves for deciding whether a pair of vessel images belong to the same vessel or not. This may be beneficial for a naval surveillance scenario, where a specific vessel is required to be tracked using an electro-optic imaging system.
For the task of vessel retrieval, which relates to vessel classification, the goal is that, given a query image, several images with similar content are retrieved from the database.
Vessel recognition aims at revealing the accurate identity of a vessel by analyzing an unseen example image of it and finding out the matching vessel within a group of vessels. This task may be particularly useful for scenarios of marine surveillance and port registration. For this task, first, we performed recognition for vessels considering their type labels, for instance, identifying a passenger ship among other passenger ships. Next, we attempt a more challenging recognition problem, identifying vessels where no additional cues such as vessel type labels or category labels are given.
Moreover, as novel problems, we attempt the tasks of predicting and classifying vessel attributes: draught, gross tonnage, length, and summer deadweight. The objective here is to quantify these attributes based on 2-D visual content only, which may improve the practicality of coastal surveillance systems, since it avoids the need for retaining meta-data for optical systems, namely camera parameters, camera position, and distance to the vessel, while estimating the physical dimensions of a vessel based on its appearance. Another beneficial use of this task may be for safe marine traffic routing as well as for the calculation of port access and transit fees, when vessel dimensions need to be known. Furthermore, there are studies showing that attribute-based representations are helpful for several computer vision tasks including object recognition [16], detection [17], and identification [18]. The attribute-based learned representations for marine vessels in this work may be utilized in a similar fashion, aiding other visual analysis tasks.
4 Superclasses for vessel types
To generate superclasses from vessel types, the 50 major vessel types containing the largest number of example images are selected and sorted according to their quantity. The vessel type with the largest number of images employed in our superclass generation is general cargo, consisting of 324,561 example images. The class with the smallest number of images is the timber carrier, with only 1837 images. In this work, to investigate the visual similarities among vessel types, the MatConvNet Toolbox [19] implementation of a pre-trained convolutional neural network (CNN) architecture, VGG-F [20], is adopted. Features are extracted after resizing images to 224×224. Utilizing the penultimate layer activations of VGG-F [20] as visual representations of images, each image is described by a 4096-dimensional feature vector. Based on these feature vectors, we calculated a dissimilarity matrix for the 50 major vessel classes. To generate superclasses, 1/10 of all collected images belonging to the 50 major classes are randomly selected (approximately 130,000 images) and individual class statistics are estimated. Prior to calculating the dissimilarity matrix, we removed outliers following the preprocessing step explained below.
4.1 Outlier removal
Although image annotations for most categories are valid and correct, interior images of vessels are also present in the MARVEL dataset. Thus, we prune outliers within individual vessel types and exclude them while computing the dissimilarity matrix. First, feature vector dimensionality is reduced to 10 by principal component analysis (PCA) using all examples of the 50 major vessel type classes, since Kullback-Leibler divergence is utilized in dissimilarity computation and determinants of very high-dimensional matrices become unbounded. After dimensionality reduction, each vessel type class is processed independently and a Gaussian distribution is fitted to it; the mean and covariance of each distribution are estimated. The feature vectors of each class are whitened to obtain unit variance within the class. We intend to filter out unlikely examples in the dataset to obtain a clear dissimilarity matrix. To do so, we utilize the χ² distribution, since the data are already whitened. For each example in an individual class, the sum of the squared entries of its 10-dimensional feature vector is treated as a sample drawn from the χ² distribution with 10 degrees of freedom. The cumulative distribution function (cdf) value of each sample is calculated, and the example is removed from the class set if the cdf value is greater than 0.95, which corresponds to the samples drawn from the 5% tail of the χ² distribution.
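The pruning step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the features of one class are assumed to be already PCA-reduced, NumPy is used for the linear algebra, and the function names `chi2_cdf_even_dof` and `prune_outliers` are hypothetical. Since the squared norm of a whitened vector equals the squared Mahalanobis distance to the class mean, whitening is folded into that distance. The closed-form cdf used here holds only for an even number of degrees of freedom, which covers the 10-dimensional case.

```python
import math
import numpy as np


def chi2_cdf_even_dof(x, dof):
    """Chi-squared cdf in closed form; valid only for an even number of degrees of freedom."""
    if dof % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    series = sum(half ** i / math.factorial(i) for i in range(dof // 2))
    return 1.0 - math.exp(-half) * series


def prune_outliers(features, cdf_threshold=0.95):
    """Return a boolean mask marking the examples of one class that are kept.

    features: (n_samples, d) array of PCA-reduced vectors for a single class.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    centered = features - mean
    # Squared norm after whitening == squared Mahalanobis distance.
    sq_dist = np.einsum("ij,jk,ik->i", centered, np.linalg.inv(cov), centered)
    dof = features.shape[1]
    cdf = np.array([chi2_cdf_even_dof(v, dof) for v in sq_dist])
    # Examples in the upper tail (cdf above the threshold) are treated as outliers.
    return cdf <= cdf_threshold
```

For Gaussian data, roughly 95% of the examples are expected to survive with the default threshold; images that are far from the class distribution fall in the discarded tail.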
4.2 Dissimilarity matrix and superclass generation
Dissimilarity matrix for 50 major vessel type classes, computed based on symmetrized divergence. Lower values indicate more similarity
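As a hedged sketch of how such a dissimilarity entry could be computed, the code below evaluates the Kullback-Leibler divergence between two Gaussians in closed form and symmetrizes it. It assumes that the symmetrized divergence is the sum of the two directed divergences (the text does not state whether a sum or an average is used), and the function names are hypothetical.

```python
import numpy as np


def gaussian_kl(mean_p, cov_p, mean_q, cov_q):
    """KL(P || Q) for multivariate Gaussians P = N(mean_p, cov_p), Q = N(mean_q, cov_q)."""
    k = mean_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mean_q - mean_p
    # slogdet is more stable than det for the log-determinant ratio.
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (
        np.trace(cov_q_inv @ cov_p)
        + diff @ cov_q_inv @ diff
        - k
        + logdet_q
        - logdet_p
    )


def symmetrized_kl(mean_p, cov_p, mean_q, cov_q):
    """Assumed symmetrization: sum of the two directed divergences."""
    return gaussian_kl(mean_p, cov_p, mean_q, cov_q) + gaussian_kl(
        mean_q, cov_q, mean_p, cov_p
    )
```

Applying `symmetrized_kl` to every pair of class Gaussians would fill a symmetric matrix with zeros on the diagonal, in which lower values indicate more similar classes.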
Distribution of the vessel types. In total, 1,190,169 images, belonging to one of 26 superclasses, are available for vessel type classification
4.3 Superclass classification
Normalized confusion matrix for categorization of 26 superclasses representing vessel types. Accuracy, computed by averaging diagonal entries, is 73.14%
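The accuracy figure quoted in the caption, obtained by averaging the diagonal of a normalized confusion matrix, corresponds to the mean per-class accuracy. A minimal sketch, assuming that rows index the true class and columns the predicted class (the function name is hypothetical):

```python
import numpy as np


def mean_per_class_accuracy(confusion):
    """Average of the diagonal after normalizing each row (true class) to sum to 1."""
    confusion = np.asarray(confusion, dtype=float)
    row_sums = confusion.sum(axis=1, keepdims=True)
    normalized = confusion / row_sums
    return float(np.mean(np.diag(normalized)))
```

Unlike overall accuracy, this measure gives every class the same weight regardless of how many test examples it has, which is why it is less sensitive to class imbalance.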
5 Experiments on potential applications
In this section, we make use of our dataset, MARVEL, for potential maritime applications: vessel verification, retrieval, identity recognition, and attribute prediction and classification. In the following subsections, these applications and the necessary experimental settings are explained.
During all experiments, we follow training and testing strategies similar to [10]. First, 8000 vessels with unique IMO numbers are selected such that each vessel has 50 example images, resulting in a total of 400,000 images. These data are divided into two splits: training and testing. The training set consists of 4035 vessels (201,750 example images in total), and the test set contains 3965 vessels (198,250 example images in total). There exist 109 vessel type labels among the 400,000 examples, and the training and test sets are split in such a way that the number of vessel types is identical in both sets. In the rest of the paper, we call the training split of this subset the IMO training set and the test split the IMO test set.
We propose three deep CNN-based generic representations for marine vessels on the IMO training set by making use of vessel type and/or vessel IMO labels. To this end, we train the same architecture of [2] as in the vessel classification task and modify it with the aim of capturing more details in vessel images: for the last layer, rather than 26 label classes, we use 109, 4035, and 4144 label classes. These three different classifiers focus on discriminating vessel types, vessel IMO numbers (classifying individual vessels on the IMO training set), and both vessel types and IMO numbers (jointly classifying the type and IMO number of vessels on the IMO training set), respectively. We compare the performances of these three representations over the computer vision tasks described below in detail.
Deep representations for example images are extracted as the penultimate layer activations of the trained networks (as in the superclass generation part in Section 4) with 4096 dimensions. Since more discriminative features are desired, we extract the penultimate layer activations prior to the rectified linear unit (ReLU) layer, which carry more information than the activations after ReLU because ReLU sets the negative values to zero. This choice yields better vessel verification performance than using the deep representations after ReLU.
During all experiments utilizing convolutional neural networks, we use a batch size of 256 without normalization and a decaying learning rate, consisting of logarithmically equally spaced values between 0.01 and 0.0001. For superclass classification, we train the networks for 60 epochs, and for attribute classification and prediction, we train the networks for 50 epochs, since we notice that the training error does not decrease with further training. The implementation of the networks is based on the MatConvNet Toolbox [19].
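A learning rate schedule of this kind can be sketched in a few lines. The example assumes one learning rate value per epoch, which the text does not state explicitly, and the function name is hypothetical.

```python
import numpy as np


def log_spaced_learning_rates(num_epochs, start=0.01, end=0.0001):
    """Logarithmically equally spaced learning rates, one per epoch (assumed)."""
    return np.logspace(np.log10(start), np.log10(end), num=num_epochs)
```

With `num_epochs=60` the schedule starts at 0.01, ends at 0.0001, and multiplies the rate by the same constant factor from one epoch to the next.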
5.1 Vessel verification, retrieval, and recognition
5.1.1 Vessel verification
Akin to face verification [24], car model verification is applied in CompCars dataset [10] to serve for conceivable purposes in transportation systems. That kind of task is claimed to be more complicated compared to face verification, since car model verification is performed on images with unconstrained viewpoints. On MARVEL dataset, we perform maritime vessel verification where the attribute to be verified is the vessel identity. Please note that our task is more challenging compared to identifying other attributes such as category or vessel type. Furthermore, this problem is more challenging than both car model and face verification tasks, since it is desired to identify/verify pairs of individual vessels by looking only at their appearances which have more diversity.
Precision-recall curves for vessel verification task for three representations designed for marine vessels: 109 (shown in blue), 4144 (shown in green) dimensional output, and Siamese network based (shown in orange)
Vessel verification results on 50,000 positive pairs and 50,000 negative pairs of vessels for the nearest neighbor and SVM classifiers by utilizing the generic and end-to-end learning-based vessel representations learned in IMO training set, which does not contain any images of the vessels in IMO test set
Classifier | Representation | True positives | True negatives | False positives | False negatives | Accuracy | Precision | Recall |
---|---|---|---|---|---|---|---|---|
NN | 109-dimensional output based | 44,978 | 40,198 | 9,802 | 5,022 | 85.18% | 0.82 | 0.90 |
SVM | 109-dimensional output based | 45,503 | 45,422 | 4,578 | 4,497 | 90.93% | 0.91 | 0.91 |
NN | 4144-dimensional output based | 47,305 | 41,148 | 8,852 | 2,695 | 88.45% | 0.84 | 0.95 |
SVM | 4144-dimensional output based | 46,225 | 47,744 | 2,256 | 3,775 | 93.97% | 0.95 | 0.92 |
NN | Siamese network based | 44,459 | 40,390 | 9,610 | 5,541 | 84.85% | 0.82 | 0.89 |
SVM | Siamese network based | 45,869 | 46,150 | 3,850 | 4,131 | 92.02% | 0.92 | 0.92 |
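The accuracy, precision, and recall columns follow directly from the four counts in each row. The short sketch below recomputes them; the function name is hypothetical, and the usage example takes its counts from the SVM row of the 4144-dimensional output-based representation.

```python
def verification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, and recall from pair-verification counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Counts from the SVM / 4144-dimensional output-based row of the table.
metrics = verification_metrics(tp=46225, tn=47744, fp=2256, fn=3775)
```

Because the test set contains equal numbers of positive and negative pairs, accuracy here is not distorted by class imbalance.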
5.1.2 Vessel retrieval
A considerable amount of research effort [27–30] has been devoted to content-based image retrieval (CBIR) as the volumes of image databases are growing dramatically. In particular, vessel retrieval is another promising application, potentially required in a maritime security system, where a user would like to query a database with a vessel image and retrieve similar images. It may also help in annotating vessel images uploaded to a database when no meta-data is present. In our application, the retrieved content is chosen neither as the superclasses of vessel types, which we constructed as the coarse attribute in Section 4.3, nor as the IMO number (aiming to identify the exact vessel), which is too fine for a retrieval task (this is studied as a recognition problem in Section 5.1.3). Instead, we use the 109 vessel types of the 8000 unique vessels with 50 example images as the content for the retrieval task. We perform content-based vessel retrieval, using Euclidean (L2) and chi-squared (χ²) distances as the similarity metrics for four different vessel representations.
Vessel retrieval results for four representations: the feature vectors of pre-trained VGG-F network (shown in magenta), AlexNet network based 109 (shown in blue), 4144 (shown in green) dimensional output based, and Siamese network (shown in orange) representations
Here, the deep representations learned specifically for maritime vessels significantly outperform the deep representation (VGG-F) learned for general object categorization for 1000 classes [2, 20] for both distance metrics. In addition, the χ² distance is superior to the L2 distance in CBIR for the tested representations. The 109-dimensional output-based generic representation performs the best in this experiment, since it is specifically designed for learning vessel types. The retrieval performance of the Siamese network, utilizing end-to-end learning, is lower than that of the 109- and 4144-dimensional output-based representations.
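Distance-based retrieval with the two metrics can be illustrated as below. This is only a sketch under assumptions: the χ² distance is taken in its common form 0.5 · Σ (x − y)² / (x + y), which presumes non-negative feature entries (the text does not say how the features were prepared for this metric), and all function names are hypothetical.

```python
import numpy as np


def l2_distance(query, database):
    """Euclidean distance from one query vector to every row of the database."""
    return np.sqrt(np.sum((database - query) ** 2, axis=1))


def chi2_distance(query, database):
    """Chi-squared distance; assumes non-negative feature entries."""
    num = (database - query) ** 2
    den = database + query
    # Terms whose denominator is zero (both entries zero) contribute nothing.
    ratio = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return 0.5 * ratio.sum(axis=1)


def retrieve(query, database, k, distance=chi2_distance):
    """Indices of the k database rows closest to the query."""
    query = np.asarray(query, dtype=float)
    database = np.asarray(database, dtype=float)
    return np.argsort(distance(query, database), kind="stable")[:k]
```

A retrieved image would then count as relevant when its vessel type label matches that of the query.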
5.1.3 Vessel recognition
Visual object recognition is one of the most crucial topics of computer vision. In particular, face recognition has been studied extensively, and state-of-the-art methods [31, 32], which perform effectively on the benchmark datasets [33–35], have been proposed. Since encouraging performance results are obtained with recent methods, another application we perform on MARVEL is the vessel recognition task, where the ultimate goal is to determine a vessel’s identity from its visual appearance. Such a task might not be meaningful for object types other than maritime vessels or faces, such as cars, since cars of the same model and color have no visual differences and are technically not distinguishable. In contrast, individual vessels generally carry distinctive features, as the shapes of vessels belonging to the same vessel type category may vary significantly due to their customized construction processes. Here, we utilize the learned generic vessel representations as feature vectors for vessels.
Vessel type specific recognition: Average recognition accuracies computed within each of the 29 vessel types on IMO testing set are depicted for extracted 109- (blue), 4035- (red), and 4144- (green) dimensional output-based representations and VGG-VD-19-based 4144-dimensional output-based representation (gray) learned in IMO training set
Vessels belonging to the research survey vessel, suction dredger, and supply vessel type classes are the most distinguishable ones, with recognition accuracies above 90%. On the other hand, vessels of the crude oil tanker, vehicle carrier, and containership classes have less distinct differences, and slightly lower recognition performance is achieved compared to the rest of the classes. Please note that, as the number of unique vessels in a vessel type group increases, the random chance and recognition rates slightly decrease as expected, since it becomes a more challenging recognition problem. Yet, recognition accuracies over 77% can be obtained even when the number of unique vessels exceeds a hundred, such as in the ro-ro cargo and chemical tanker vessel types.
Vessel recognition performance on IMO testing set, composed of 3965 marine vessels, by utilizing nearest neighbor search on 109-, 4035-, and 4144-dimensional output-based representations learned in IMO training set
Measure | 109-dim. output based representation | 4035-dim. output based representation | 4144-dim. output based representation | 4144-dim. output based representation (VGG-VD-19) |
---|---|---|---|---|
Recognition accuracy | 23.87% | 59.25% | 65.13% | 65.78% |
5.2 Vessel attribute prediction and classification
MARVEL dataset includes several labeled vessel attributes some of which relate to the visual content. Here, as interesting applications, by studying only the visual content, we targeted predicting and classifying four important attributes: draught, gross tonnage, length, and summer deadweight.
The draught of a vessel is a measure describing the vertical distance between the waterline and the bottom of the vessel hull. Draught, defining the minimum depth of water in which a vessel can operate, is an important factor for navigating and routing vessels while avoiding shallow water pathways. The length of a vessel matters for navigation and marine traffic routing, as well as for calculating fees during vessel registration. Consequently, estimating the length of a vessel effectively from a single image may be very beneficial for maritime applications. Gross tonnage is a nonlinear measure calculated based on the overall interior volume (from keel to funnel) of a vessel. It is important in determining the number of staff, safety rules, registration fees, and port dues. Summer deadweight defines how much mass a ship can safely carry. It excludes the weight of the ship and includes the sum of the weights of cargo, fuel, fresh water, ballast water, provisions, passengers, and crew [36].
Such attribute estimation is especially valuable for coastal guarding and surveillance, since it allows grasping the physical specifications of a vessel remotely from a single captured image. In order to achieve these objectives, we both test the use of our powerful 4144-dimensional output-based generic vessel representation and employ specific attribute-based deep representations. Please note that estimating these attributes is very challenging due to the lack of any notion of scale, pose, perspective, camera parameters, etc. The only available information is the appearance of a vessel. For all experiments of attribute prediction, we learn models on the IMO training set and evaluate the performances of the learned models on the IMO testing set. Images missing valid attribute labels were not used in these experiments. Attribute labels, as opposed to being discrete numbers as in vessel type labels or IMO number labels, are continuous and might be unique for each vessel.
We design two sets of experiments: regression and classification. Approaching the problem as a regression task, we represent vessel images by either generic deep models we designed for marine vessels or deep models trained for estimating specific attributes. As in the previous experiments, we extract the penultimate layer activations of the trained networks as feature vectors and utilize a support vector regressor [25, 37] for prediction. For learning attribute-specific deep models, we use AlexNet as a base CNN architecture and modify the last loss layer with an objective to minimize an L2-norm loss, approaching the problem as a least squares regression. For performance evaluation, we compute two measures.
Vessel attribute prediction performance, measured as correlation of manual truth and predicted labels for 158,850 images in IMO testing set
Method | Draught | Gross tonnage | Length | Summer deadweight |
---|---|---|---|---|
SVM | 0.7556 | 0.8301 | 0.8696 | 0.7930 |
CNN | 0.7911 | 0.2699 | 0.9042 | 0.0830 |
Vessel attribute prediction performance, measured as coefficient of determination between manual truth and predicted labels for 158,850 images in IMO testing set
Method | Draught | Gross tonnage | Length | Summer deadweight |
---|---|---|---|---|
SVM | 0.598 | 0.554 | 0.743 | 0.481 |
CNN | 0.770 | 0.419 | 0.863 | 0.466 |
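The two measures reported in the tables above, correlation and coefficient of determination, can be computed as in the following sketch. It assumes the Pearson correlation coefficient and the usual definition R² = 1 − SS_res / SS_tot; the text does not spell out either formula, and the function names are hypothetical.

```python
import numpy as np


def pearson_correlation(y_true, y_pred):
    """Pearson correlation between true and predicted attribute values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def coefficient_of_determination(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

The two measures capture different things: predictions that are perfectly correlated with the true values but wrongly scaled, for example `[2, 4, 6, 8]` against `[1, 2, 3, 4]`, give a correlation of 1 and a negative R² under this definition.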
Predicted and true values of draught within example vessel categories: Significant correlations (r) are found after hypothesis testing as indicated by p values for asphalt/bitumen tankers (a), cable layer (b), patrol vessels (c), and supply vessels (d)
Confusion matrices for classifying draught: a generic vessel features combined with a support vector machine classifier and b learned draught-specific representation combined with a softmax classifier
Confusion matrices for classifying gross tonnage: a generic vessel features combined with a support vector machine classifier and b learned gross tonnage-specific representation combined with a softmax classifier
Confusion matrices for classifying length: a generic vessel features combined with a support vector machine classifier and b learned length-specific representation combined with a softmax classifier
Confusion matrices for classifying summer-deadweight: a generic vessel features combined with a support vector machine classifier and b learned summer deadweight-specific representation combined with a softmax classifier
Vessel attribute classification performance of generic and attribute-specific representations, calculated for four attributes on 158,850 images of IMO testing set
Classified attribute | Employed representation | Top 1 accuracy | Top 2 accuracy | Top 3 accuracy | Top 4 accuracy | Top 5 accuracy |
---|---|---|---|---|---|---|
Draught | Generic model + SVM | 0.1302 | 0.3104 | 0.4432 | 0.5506 | 0.6320 |
Gross tonnage | Generic model + SVM | 0.4755 | 0.6393 | 0.7418 | 0.8178 | 0.8678 |
Length | Generic model + SVM | 0.4539 | 0.6345 | 0.7317 | 0.8019 | 0.8510 |
Summer deadweight | Generic model + SVM | 0.4304 | 0.6209 | 0.7310 | 0.7998 | 0.8525 |
Draught | Attribute-specific trained CNN | 0.1834 | 0.4159 | 0.5761 | 0.6884 | 0.7774 |
Gross tonnage | Attribute-specific trained CNN | 0.5515 | 0.7492 | 0.8556 | 0.9131 | 0.9454 |
Length | Attribute-specific trained CNN | 0.5289 | 0.7266 | 0.8257 | 0.8896 | 0.9328 |
Summer deadweight | Attribute-specific trained CNN | 0.5155 | 0.7364 | 0.8317 | 0.8938 | 0.9288 |
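The top-k accuracies in the table count a prediction as correct when the true class is among the k highest-scoring classes. A minimal sketch, assuming per-class scores are available for each image (how the continuous attributes were divided into classes is not covered here, and the function name is hypothetical):

```python
import numpy as np


def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k highest-scoring classes.

    scores: (n_samples, n_classes) array; labels: (n_samples,) integer class indices.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Sort classes by descending score and keep the first k per example.
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())
```

By construction the value cannot decrease as k grows, which matches the monotone increase from the Top 1 to the Top 5 columns.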
6 Discussions
Introducing MARVEL, a large-scale dataset for maritime vessels, our goal is to point out several research problems and applications for maritime images. MARVEL dataset, composed of a massive number of images and their meta-data, carries interesting attributes to be considered for visual analysis tasks. In this work, we presented our efforts for visual classification of maritime vessel types, retrieval, identity verification, identity recognition, and estimation of physical attributes such as draught, length, and tonnage of vessels. For each of these tasks, we provide the details (experimental settings, labels, training and testing splits) to make results reproducible.
For organizing the dataset, first, we performed semantic analysis and combined vessel type classes which are visually indistinguishable. Next, we pruned annotations for attributes semi-automatically, converting them to certain metric units, filtering out the missing and wrong entries, and ensuring the reliability of the labels. We also present baseline results for several computer vision tasks to inspire future applications on MARVEL. Moreover, we provide generic deep representations for maritime vessels and demonstrate their success in the aforementioned tasks by performing extensive experiments. We achieve promising performance in vessel classification, recognition, and retrieval. We also observe that attributes are predictable as long as they are visually distinguishable. Hence, attributes such as length and draught can be estimated accurately by solely exploiting visual data. What remains of key interest for future work is the enhancement of performance for the aforesaid tasks, which can be achieved by utilizing more powerful visual representations and developing more sophisticated methods.
7 Endnote
1 A negative pair is a pair of images showing two different vessels, whereas a positive pair is a pair of images showing the same vessel.
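As a small illustration of this definition, the sketch below labels every unordered pair of images as positive or negative from the vessel identities. The `(image_id, vessel_id)` tuple layout and the function name are assumptions made for the example, and the exhaustive enumeration is shown only for clarity; it is not claimed to be how pairs were sampled in the experiments.

```python
from itertools import combinations


def label_pairs(images):
    """Label every unordered image pair: 1 for a positive pair (same
    vessel), 0 for a negative pair (different vessels).

    `images` is a list of (image_id, vessel_id) tuples; this layout is
    hypothetical and only serves the illustration.
    """
    pairs = []
    for (img_a, vessel_a), (img_b, vessel_b) in combinations(images, 2):
        pairs.append((img_a, img_b, 1 if vessel_a == vessel_b else 0))
    return pairs
```

Because the number of pairs grows quadratically with the number of images, a real pipeline would more likely sample a subset of pairs than enumerate all of them.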
Declarations
Acknowledgements
We would like to thank Koray Akçay for his invaluable support and expert consultancy on maritime vessels.
Authors’ contributions
VY took charge of data collection and organization. VY and EH generated the statistics of the collected dataset. EH implemented and performed the representation learning for marine vessels and carried out the vessel type classification experiments. BS designed the marine vessel applications (verification, retrieval, recognition, attribute estimation) and implemented and carried out the related experiments. BS proposed superclass generation, and EH implemented and executed the task. EH organized the initial manuscript, and BS created the supplemental material. BS later revised and extended the work and the writing. AK coordinated the work during the study and revised the English. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.