Modern AI Techniques for Image Segmentation: A Comprehensive Analysis

Authors

  • Sameeksha Shrivastava, IPS Academy, Institute of Engineering and Science, Indore
  • Mayank Shrivastava, Sri Aurobindo Institute of Technology, Indore

DOI:

https://doi.org/10.26821/IJSHRE.12.7.2024.120707

Keywords:

Image segmentation, Transformer architectures, Foundation models, Computer vision, Deep learning

Abstract

Image segmentation changed substantially between 2018 and 2023 as CNN-based architectures gave way to transformer-based approaches and foundation models emerged. This analysis surveys current AI-based segmentation techniques and evaluates their performance across applications ranging from medical imaging to autonomous driving.

Key findings: transformer-based methods achieved 54.0% mean intersection-over-union (mIoU) on ADE20K, compared with 45.7% for traditional CNN approaches, while foundation models demonstrated zero-shot segmentation across domains. Real-time methods exceeded 125 frames per second (FPS) while maintaining competitive accuracy, which makes deployment in resource-constrained environments practical.
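For readers unfamiliar with the metric behind the ADE20K figures, the following is a minimal sketch of how mIoU can be computed from per-pixel class labels. The function name `mean_iou`, the toy label lists, and the convention of skipping classes absent from both prediction and ground truth are assumptions made for illustration; they are not taken from the paper. Benchmark toolkits typically accumulate intersections and unions over the whole dataset rather than per image.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union across classes.

    pred and target are flattened per-pixel class labels of equal length.
    Classes that appear in neither sequence are skipped (assumed convention).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    intersection = [0] * num_classes   # pixels where pred == target == c
    pred_count = [0] * num_classes     # pixels predicted as class c
    target_count = [0] * num_classes   # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes (hypothetical data).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

In the toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. A reported score such as 54.0% is this quantity averaged over the benchmark's class set and expressed as a percentage.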

Published

2025-07-24

How to Cite

Shrivastava, S., & Shrivastava, M. (2025). Modern AI Techniques for Image Segmentation: A Comprehensive Analysis. iJournals: International Journal of Software & Hardware Research in Engineering ISSN: 2347-4890, 12(7). https://doi.org/10.26821/IJSHRE.12.7.2024.120707