Deep-sea polymetallic nodule image recognition method based on an improved Mask R-CNN model

WENG Zebang, LI Xiaohu, LI Jie, LI Zhenggang, WANG Hao, ZHU Zhimin, MENG Xingwei, LI Huaiming

Journal of Marine Sciences, 2025, Vol. 43, Issue 3: 32-39. DOI: 10.3969/j.issn.1001-909X.2025.03.004


Abstract

Optical surveys and assessments of deep-sea polymetallic nodules face challenges such as low contrast, small-object detection, and boundary ambiguity. This study proposes an improved Mask R-CNN model for nodule image segmentation that incorporates dynamic sparse convolution (DSConv) and a simple parameter-free attention module (SimAM). SimAM effectively suppresses interference from the sediment background, while DSConv alleviates boundary blurring. The combined model achieves an accuracy of 91.5%, a precision of 78.0%, a recall of 75.1%, and an IoU of 69.4%. When the improved and original models were applied to actual survey lines, the proportion of seabed nodule coverage estimates with an error below 5% increased from 57% with the original model to 77% with the improved model. This research provides a reliable technical solution for calculating deep-sea polymetallic nodule coverage, and its modular design can be extended to target recognition and image segmentation in other fields.
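
The abstract describes the model only at a high level. Below is a minimal PyTorch sketch, not the authors' published implementation, of two of the ideas it names: a parameter-free SimAM block that re-weights backbone features to suppress sediment background, and simple helpers that turn predicted nodule masks into a coverage rate and an "error below 5%" statistic analogous to the one reported above. The names SimAM, coverage_rate, and within_tolerance are illustrative assumptions; DSConv is omitted because the abstract does not specify its exact form.

```python
# Hypothetical sketch: SimAM attention and coverage-rate helpers (not the authors' code).
import torch
import torch.nn as nn


class SimAM(nn.Module):
    """Parameter-free attention: each activation is re-weighted by a sigmoid of its
    inverse energy, computed per channel from spatial mean and variance."""

    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda  # regularizer in the energy function

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map, e.g. a backbone/FPN level inside Mask R-CNN
        b, c, h, w = x.shape
        n = h * w - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)    # (t - mu)^2
        v = d.sum(dim=(2, 3), keepdim=True) / n              # spatial variance
        energy_inv = d / (4 * (v + self.e_lambda)) + 0.5     # 1 / e_t*
        return x * torch.sigmoid(energy_inv)                 # background down-weighted


def coverage_rate(mask: torch.Tensor) -> float:
    """Nodule coverage of one seabed image: nodule pixels / total pixels."""
    return mask.bool().float().mean().item()


def within_tolerance(pred_cov, true_cov, tol: float = 0.05) -> float:
    """Fraction of images whose absolute coverage error is below `tol`."""
    pred = torch.as_tensor(pred_cov, dtype=torch.float32)
    true = torch.as_tensor(true_cov, dtype=torch.float32)
    return ((pred - true).abs() < tol).float().mean().item()


if __name__ == "__main__":
    feats = torch.randn(1, 256, 64, 64)        # dummy backbone features
    attended = SimAM()(feats)                  # same shape as input
    pred_mask = torch.rand(512, 512) > 0.7     # dummy predicted nodule mask
    print(attended.shape, coverage_rate(pred_mask))
```

In a Mask R-CNN pipeline such an attention block would typically be inserted after backbone or FPN convolution stages; the exact insertion points used in the paper are not given in the abstract.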

Key words

polymetallic nodules / image segmentation / Mask R-CNN / coverage rate / SimAM / DSConv

Cite this article

WENG Zebang, LI Xiaohu, LI Jie, et al. Deep-sea polymetallic nodule image recognition method based on an improved Mask R-CNN model[J]. Journal of Marine Sciences, 2025, 43(3): 32-39. https://doi.org/10.3969/j.issn.1001-909X.2025.03.004
