Kechen Song (Associate Professor)


  • Supervisor of Doctorate Candidates  Supervisor of Master's Candidates
  • Name (English):Kechen Song
  • E-Mail:
  • Education Level:With Certificate of Graduation for Doctorate Study
  • Gender:Male
  • Degree:Doctorate
  • Status:Employed
  • Alma Mater:Northeastern University
  • Teacher College:School of Mechanical Engineering and Automation



Strip Steel Surface Defect Detection

    Surface Defect Segmentation and Classification based on Few-shot Learning

    This paper proposes a simple but effective few-shot segmentation method named cross position aggregation network (CPANet), which aims to learn a network that can segment untrained strip steel surface defect (S3D) categories with only a few labeled defective samples. Using a cross-position proxy (CPP) module, CPANet can effectively aggregate long-range relationships of discrete defects, and a support auxiliary (SA) can further improve the feature aggregation capability of CPP. Moreover, CPANet introduces a space-squeeze attention (SSA) module to aggregate multi-scale context information of defect features and suppress disadvantageous interference from background information. In addition, a novel S3D few-shot semantic segmentation dataset, FSSD-12, is proposed to evaluate CPANet. Extensive comparison and ablation experiments show that CPANet with the ResNet-50 backbone achieves state-of-the-art performance on FSSD-12.
    Hu Feng, Kechen Song, et al. Cross Position Aggregation Network for Few-shot Strip Steel Surface Defect Segmentation [J]. IEEE Transactions on Instrumentation and Measurement, 2023, 72, 5007410. (paper) (code & dataset)

    Metal surface defect segmentation can play an important role in quality control during the production and manufacturing stages. Two major challenges remain in industrial applications. One is that the number of metal surface defect samples is severely insufficient, and the other is that most existing algorithms can only be used for one specific type of surface defect and are difficult to generalize to other metal surfaces. In this work, a theory of few-shot metal generic surface defect segmentation is introduced to address these challenges. The Triplet-Graph Reasoning Network (TGRNet) and a novel dataset, Surface Defects-4i, are proposed to realize this theory. Surface Defects-4i includes multiple categories of metal surface defect images to verify the generalization performance of TGRNet and adds non-metal categories (leather and tile) as extensions.
    Yanqi Bao, Kechen Song, et al. Triplet-Graph Reasoning Network for Few-shot Metal Generic Surface Defect Segmentation [J]. IEEE Transactions on Instrumentation and Measurement, 2021. (paper) (code & dataset) (ESI highly cited, 5/2022)

    In this article, we propose a novel few-shot defect classification method, which aims to recognize novel defect classes with few labeled samples. Specifically, the proposed method follows a transductive paradigm and consists of two modules: a graph embedding and distribution transformation (GEDT) module and an optimal transport (OPT) module. The GEDT module not only makes full use of the correlation information between different features in the support set and the query set but also ensures a consistent distribution of the graph embedding results. The OPT module is then leveraged to implement few-shot classification in a transductive manner. Finally, experiments are conducted on the proposed metal surface defect dataset, and the results demonstrate that the proposed method achieves state-of-the-art performance under both one-shot and five-shot settings.
    Weiwei Xiao, Kechen Song, et al. Graph Embedding and Optimal Transport for Few-Shot Classification of Metal Surface Defect [J]. IEEE Transactions on Instrumentation and Measurement, 2022. (paper) (code & dataset)


    In this paper, we propose a feature-aware network (FaNet) for few-shot defect classification, which can effectively distinguish new classes with a small number of labeled samples. FaNet uses ResNet12 as its baseline. The feature-attention convolution (FAC) module is applied to extract comprehensive feature information from the base classes and to fuse semantic information by capturing the long-range feature relationships between the upper and lower layers. Meanwhile, during the test phase, an online feature-enhance integration (FEI) module is adopted to average the noise from the support set and query set defect images, further enhancing image features among the different tasks. In addition, we construct a large-scale few-shot classification dataset of strip steel surface defects (FSC-20) with 20 different types. Experimental results show that the proposed method achieves the best performance compared to state-of-the-art methods on the 5-way 1-shot and 5-way 5-shot tasks.

    Wenli Zhao, Kechen Song, et al. FaNet: Feature-aware Network for Few Shot Classification of Strip Steel Surface Defects [J]. Measurement, 2023. (paper) (code & dataset)

    Surface Defect Detection, Segmentation and Classification based on Supervised/Semi-supervised Learning



    In this paper, we propose a novel defect detection system based on deep learning that focuses on a practical industrial application: steel plate defect inspection. To achieve strong classification ability, the system employs a baseline convolutional neural network (CNN) to generate feature maps at each stage. The proposed multilevel-feature fusion network (MFN) then combines multiple hierarchical features into one feature, which can include more location details of defects. Based on these multilevel features, a region proposal network (RPN) is adopted to generate regions of interest (ROIs). For each ROI, a detector consisting of a classifier and a bounding box regressor produces the final detection results. Finally, we set up a defect detection dataset, NEU-DET, for training and evaluating our method. On NEU-DET, our method achieves 74.8/82.3 mAP with baseline networks ResNet34/50 by using 300 proposals. In addition, by using only 50 proposals, our method can detect at 20 fps on a single GPU and reach 92% of the above performance, which indicates its potential for real-time detection.
    Yu He, Kechen Song, et al. An End-to-end Steel Surface Defect Detection Approach via Fusing Multiple Hierarchical Features [J]. IEEE Transactions on Instrumentation and Measurement, 2020. (paper) (dataset) (Popular Articles, 12/2020--1/2023) (ESI highly cited, 12/2020-1/2023) (ESI Hot Paper, 5/2022)



    This article proposes a pyramid feature fusion and global context attention network for pixel-wise detection of surface defects, called PGA-Net. In the framework, multiscale features are first extracted from the backbone network. Then the pyramid feature fusion module fuses these features into five resolutions through efficient dense skip connections. Finally, the global context attention module is applied to the fusion feature maps of adjacent resolutions, which allows effective information to propagate from low-resolution fusion feature maps to high-resolution ones. In addition, a boundary refinement block is added to the framework to refine the defect boundary and improve the prediction. The final prediction is the fusion of the fusion feature maps at the five resolutions. Evaluation on four real-world defect datasets demonstrates that the proposed method outperforms the state-of-the-art methods on mean intersection over union and mean pixel accuracy (NEU-Seg: 82.15%, DAGM 2007: 74.78%, MT_defect: 71.31%, Road_defect: 79.54%).

    Hongwen Dong, Kechen Song, et al. PGA-Net: Pyramid Feature Fusion and Global Context Attention Network for Automated Surface Defect Detection [J]. IEEE Transactions on Industrial Informatics, 2020, 16(12), 7448-7458. (paper) (dataset) (ESI highly cited, 7/2021--1/2023)
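
    The two metrics named in the abstract above, mean intersection over union (mIoU) and mean pixel accuracy (mPA), can be illustrated with a minimal sketch. It uses the common confusion-matrix definitions; the function names and the toy masks are assumptions made for illustration and are not taken from the PGA-Net code.

```python
def confusion_matrix(truth, pred, num_classes):
    """Count pixels: rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(truth, pred):
        matrix[t][p] += 1
    return matrix


def mean_iou_and_pixel_accuracy(matrix):
    """Return (mIoU, mPA) averaged over classes present in the ground truth."""
    ious, accuracies = [], []
    n = len(matrix)
    for c in range(n):
        true_positive = matrix[c][c]
        truth_total = sum(matrix[c])                      # pixels labeled c
        pred_total = sum(matrix[r][c] for r in range(n))  # pixels predicted c
        if truth_total == 0:
            continue  # class absent from the ground truth
        union = truth_total + pred_total - true_positive
        ious.append(true_positive / union)
        accuracies.append(true_positive / truth_total)
    return sum(ious) / len(ious), sum(accuracies) / len(accuracies)


# Hypothetical flattened 2x4 masks: 0 = background, 1 = defect.
truth = [0, 0, 1, 1, 0, 1, 1, 0]
pred = [0, 1, 1, 1, 0, 0, 1, 0]
miou, mpa = mean_iou_and_pixel_accuracy(confusion_matrix(truth, pred, 2))
print(f"mIoU = {miou:.2f}, mPA = {mpa:.2f}")
```

    In this toy case each class has three correctly labeled pixels out of four, so the per-class IoU is 3/5 and the per-class accuracy is 3/4.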

    Semi-supervised Classification
    Defect inspection is very important for guaranteeing the surface quality of industrial steel products, but related methods are based primarily on supervised learning, which requires ample labeled samples for training. However, inspecting defects on steel surfaces is a data-limited task because samples are difficult to collect and expert labeling is expensive. Unlike previous works in which only labeled samples are treated using supervised classifiers, we propose a semi-supervised learning (SSL) defect classification approach based on multi-training of two different networks: a categorized generative adversarial network (GAN) and a residual network. This method uses the GAN to generate a large number of unlabeled samples. A multi-training algorithm that uses two classifiers based on different learning strategies is then proposed to integrate both labeled and unlabeled samples into the SSL process. Finally, through the multiple training process, our SSL method achieves higher accuracy and better robustness than the supervised one using only limited labeled samples. Experimental results demonstrate the effectiveness of the proposed method, which achieves a classification accuracy of 99.56%.

    Yu He, Kechen Song, et al. Semi-supervised Defect Classification of Steel Surface Based on Multi-training and Generative Adversarial Network [J]. Optics and Lasers in Engineering, 2019, 122: 294-302. (paper)
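
    One common way to bring unlabeled samples into training with two classifiers is to keep only the samples on which both classifiers agree with high confidence. The sketch below illustrates that general idea; the agreement rule, the 0.9 threshold, and all names are assumptions for illustration, and the multi-training algorithm in the paper may differ.

```python
def select_pseudo_labels(probs_a, probs_b, threshold=0.9):
    """Return (index, label) pairs where both classifiers agree confidently.

    probs_a and probs_b hold one class-probability list per unlabeled sample.
    The agreement rule and the default threshold are illustrative assumptions.
    """
    selected = []
    for i, (pa, pb) in enumerate(zip(probs_a, probs_b)):
        label_a = max(range(len(pa)), key=pa.__getitem__)  # argmax of classifier A
        label_b = max(range(len(pb)), key=pb.__getitem__)  # argmax of classifier B
        if label_a == label_b and min(pa[label_a], pb[label_b]) >= threshold:
            selected.append((i, label_a))
    return selected


# Hypothetical outputs of two classifiers on three unlabeled samples.
probs_first = [[0.95, 0.05], [0.60, 0.40], [0.08, 0.92]]
probs_second = [[0.97, 0.03], [0.30, 0.70], [0.05, 0.95]]
pseudo = select_pseudo_labels(probs_first, probs_second)
print(pseudo)
```

    The second sample is dropped because the two classifiers disagree on its label; the other two receive pseudo-labels and could be added to the labeled pool in the next training round.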

    Surface Defect Feature Extraction and Recognition based on Traditional Methods

    Adjacent Evaluation Completed Local Binary Patterns (AECLBP):
    Automatic recognition of hot-rolled steel strip surface defects is important to steel surface inspection systems. To improve the recognition rate, a new, simple, yet noise-robust feature descriptor named the adjacent evaluation completed local binary patterns (AECLBP) is proposed for defect recognition. In the proposed approach, an adjacent evaluation window around each neighbor is constructed to modify the threshold scheme of the completed local binary pattern (CLBP). Experimental results demonstrate that the proposed approach performs well in defect recognition under intra-class feature variations and illumination and grayscale changes. Even in the toughest situation with additive Gaussian noise, the AECLBP can still achieve moderate recognition accuracy. In addition, the adjacent evaluation window strategy can also be used in other local binary pattern (LBP) variants.

    Kechen Song and Yunhui Yan. A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects [J]. Applied Surface Science, 2013, 285: 858-864. (database)
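
    The thresholding idea can be sketched on a plain 8-neighbor LBP code. In the sketch below, the adjacent-evaluation variant replaces each neighbor's gray value with the mean of a 3x3 window around that neighbor (excluding the neighbor itself) before comparing it with the center. The window size, the radius of 1, and the function names are simplifying assumptions; the sketch does not reproduce the full AECLBP descriptor.

```python
# Offsets of the 8 neighbors, clockwise from the top-left pixel.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def pixel(image, r, c):
    """Read a pixel, clamping coordinates at the image border."""
    r = min(max(r, 0), len(image) - 1)
    c = min(max(c, 0), len(image[0]) - 1)
    return image[r][c]


def window_mean(image, r, c):
    """Mean of the 3x3 window centered on (r, c), excluding the center itself."""
    values = [pixel(image, r + dr, c + dc) for dr, dc in NEIGHBORS]
    return sum(values) / len(values)


def lbp_code(image, r, c, adjacent_evaluation=False):
    """8-bit LBP code at (r, c); each neighbor contributes one bit."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBORS):
        if adjacent_evaluation:
            value = window_mean(image, r + dr, c + dc)
        else:
            value = pixel(image, r + dr, c + dc)
        if value >= center:
            code |= 1 << bit
    return code


# Hypothetical 5x5 patch: bright surface (20), darker center (10),
# and one noisy pixel (0) directly above the center.
noisy = [[20] * 5 for _ in range(5)]
noisy[2][2] = 10
noisy[1][2] = 0

plain_code = lbp_code(noisy, 2, 2)
evaluated_code = lbp_code(noisy, 2, 2, adjacent_evaluation=True)
print(plain_code, evaluated_code)
```

    In this toy patch the single noisy pixel flips one bit of the plain LBP code, while the window mean smooths it out and the code matches the one the noise-free patch would give.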

    Scattering Convolution Network (SCN):
    To improve the tolerance of current feature extraction methods to local deformations, a scattering operator is applied to extract features for defect recognition. First, a scattering transform builds a non-linear invariant representation by cascading wavelet transforms and modulus pooling operators, which average the amplitude of iterated wavelet coefficients. Then, an improved network named the scattering convolution network (SCN) is introduced to build large-scale invariants. Finally, a surface defect database named the Northeastern University (NEU) surface defect database is constructed to evaluate the effectiveness of the feature extraction methods for defect recognition. Experimental results demonstrate that the SCN method performs excellently in defect recognition under intra-class feature variations and illumination and grayscale changes. Even with a small number of training samples, the SCN method can still achieve moderate recognition accuracy.

    Kechen Song, Shaopeng Hu and Yunhui Yan. Automatic Recognition of Surface Defects on Hot-Rolled Steel Strip Using Scattering Convolution Network [J]. Journal of Computational Information Systems, 2014, 10(7): 3049-3055. (paper)

    Surface Defect Detection based on Traditional Methods

    Saliency Linear Scanning Morphology (SLSM):
    Surface defect detection of silicon steel strip is an important part of non-destructive testing systems in the iron and steel industry. To detect defect objects of interest on silicon steel strip under oil pollution interference, a new detection method based on saliency linear scanning morphology is proposed. In the proposed method, visual saliency extraction is employed to suppress the cluttered background, and a saliency map is obtained to highlight the potential objects. Then, a linear scanning operation is proposed to obtain the region of oil pollution. Finally, morphological edge processing is proposed to remove the edges of oil pollution interference and of reflective pseudo-defects. Experimental results demonstrate that the proposed method performs well in detecting surface defects including wipe-crack defects, scratch defects and small defects.

    Kechen Song, Shaopeng Hu, Yunhui Yan and Jun Li. Surface defect detection method using saliency linear scanning morphology for silicon steel strip under oil pollution interference [J]. ISIJ International, 2014, 54(11): 2598-2607.

    Saliency Convex Active Contour Model (SCACM):
    Accurate detection of surface defects is an indispensable part of steel surface inspection systems. To detect micro surface defects of silicon steel strip, a new detection method based on the saliency convex active contour model is proposed. In the proposed method, visual saliency extraction is employed to suppress the cluttered background and highlight the potential objects. The extracted saliency map is then exploited as a feature, which is fused into a convex energy minimization function of a local-based active contour. Meanwhile, a numerical minimization algorithm is introduced to separate the micro surface defects from the cluttered background. Experimental results demonstrate that the proposed method performs well in detecting micro surface defects including spot defects and steel-pit defects. Even in a cluttered background, the proposed method detects almost all of the micro defects without any false objects.

    Kechen Song and Yunhui Yan. Micro surface defect detection method for silicon steel strip based on saliency convex active contour model [J]. Mathematical Problems in Engineering, 2013. (paper)

    Surface Defect Image Segmentation based on Traditional Methods

    Convex Active Contour Segmentation Model:


    To solve problems in the Chan-Vese model and the Local Binary Fitting (LBF) model, such as sensitivity to the initial contour position and slow segmentation of strip steel defect images, a novel local information-based convex active contour (LICAC) model is proposed. By converting the non-convex optimization problem into a convex one and applying the Split Bregman method for fast solution, the sensitivity to the initial contour position that occurs in the Chan-Vese and LBF models is resolved. With the introduction of local information, the new model efficiently segments strip surface defect images with non-uniform gray levels. The model is used to segment single-target-region strip defect images covering four common defect categories (weld, rust, holes, and scratches), and the experimental results show that its segmentation quality and running time are better than those of the other two models. In addition, the model can also segment multi-target-region defect images; four common defect categories (scratches, inclusion, pitting, and wrinkles) are tested, and the experimental results verify the validity of the model.

    SONG Kechen, YAN Yunhui, PENG Yishu, DONG Dewei. Convex Active Contour Segmentation Model of Strip Steel Defects Image Based on Local Information [J]. Journal of Mechanical Engineering, 2012, 48(20): 1-7. (Chinese)

    Structure Tensor and Active Contour:
    In order to address the segmentation problem for cold rolled silicon steel surface defect based on the texture background, a novel method based on structure tensor and active contour model is proposed. Firstly, image local information is introduced to the structure tensor. In the extracted feature space of structure tensor, KL distance is treated as a regional similarity measure of the probability density to establish active contour model for image segmentation. Finally the numerical solution of Split-Bregman is used to solve the model. The proposed method is introduced to segment silicon steel surface defects, which are longitudinal scratches, horizontal scratches, foreign bodies, and holes. The experimental results show that this method can segment the silicon steel surface defect areas accurately.


    SONG Kechen, YAN Yunhui, WANG Zhan, HU Changfa. Research on segmentation method for silicon steel surface defect based on structure tensor and active contour [J]. Computer Engineering and Applications, 2012, 48(32): 224-228. (Chinese)
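
    The regional similarity measure mentioned in the abstract above, the KL distance between probability densities, can be illustrated with a minimal sketch that compares the feature histograms of two regions. The scalar feature values, the bin count, and the function names are assumptions for illustration; in the paper the densities are estimated in the structure tensor feature space.

```python
import math


def histogram(values, bins, low, high, eps=1e-6):
    """Normalized histogram with a small floor so no bin is exactly zero."""
    counts = [eps] * bins
    width = (high - low) / bins
    for v in values:
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical feature values sampled inside and outside a contour.
inside = [0.10, 0.15, 0.20, 0.12]
outside = [0.80, 0.85, 0.90, 0.70]
p = histogram(inside, bins=4, low=0.0, high=1.0)
q = histogram(outside, bins=4, low=0.0, high=1.0)
print(kl_divergence(p, p), kl_divergence(p, q))
```

    A region compared with itself gives a divergence of zero, while two regions with well-separated feature distributions give a large value, which is the property an active contour energy can exploit to pull the contour toward the defect boundary.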

Rail Surface Defect Detection

    In-service and No-service Rail 


    No-service Rail Surface Defect Detection based on Stereoscopic Images (RGB-D)

    Unsupervised Saliency Detection


    An unsupervised stereoscopic saliency detection method based on a binocular line-scanning system is proposed in this article. This method can simultaneously obtain a highly precise image as well as profile information while avoiding the decoding distortion of the structured light reconstruction method.
    Menghui Niu, Kechen Song, et al. Unsupervised Saliency Detection of Rail Surface Defects using Stereoscopic Images [J]. IEEE Transactions on Industrial Informatics, 2021, 17(3), 2271-2281. (paper) (code) (RSDDS-113 dataset) (ESI highly cited, 7/2021-1/2023) Reported by Imaging & Machine Vision Europe.

    Collaborative Learning Attention Network

    We propose a neural network named collaborative learning attention network (CLANet) for no-service rail surface defect inspection. The proposed method consists of three main stages: feature extraction, cross-modal information fusion, and defect location and segmentation. A multimodal attention block is proposed to highlight complex defect objects with a new cross-modal fusion strategy. Furthermore, a dual-stream decoder enriches the representation of advanced features and avoids the dilution of information in the decoding stage. To address the scarcity of defective data, an industrial RGB-D dataset, NEU RSDDS-AUG, is built. Finally, ablation studies verify the effectiveness of our proposed method.

    Jingpeng Wang, Kechen Song, et al. Collaborative Learning Attention Network Based on RGB Image and Depth Image for Surface Defect Inspection of No-Service Rail [J]. IEEE/ASME Transactions on Mechatronics, 2022. (paper)

    No-service Rail Surface Defect Detection based on RGB Images



    In this article, we propose an acquisition scheme with two lamps and a color line-scan charge-coupled device (CCD) to alleviate uneven illumination. Then, a multiple context information segmentation network is proposed to improve the segmentation of no-service rail surface defects (NRSDs). The network makes full use of context information based on a dense block, a pyramid pooling module, and multi-information integration. In addition, an attention mechanism is applied to optimize the extracted information by filtering noise. To address the shortage of real samples, we propose to use artificial samples to train the network, and an NRSD dataset, NRSD-MN, is built with artificial and natural NRSDs. Experimental results show that our method is feasible and segments both artificial and natural NRSDs well.
    Defu Zhang, Kechen Song, et al. MCnet: Multiple Context Information Segmentation Network of No-service Rail Surface Defects [J]. IEEE Transactions on Instrumentation and Measurement, 2021, 70, 5004309. (paper) (code) (dataset)

    Image-level weakly supervised segmentation

    A novel image-level weakly supervised segmentation formulation is proposed for no-service rail surface defects. These defects are decomposed into three sub-categories (strip-shaped, spot-shaped, block-shaped) according to the size prior information (area and shape). Then, a method is presented with a pooling combination module. The pooling combination module makes full use of the size attributes of the sub-category by utilizing different pooling functions for different sub-categories. Experimental results demonstrate that our method is effective and outperforms the state-of-the-art methods.


    Defu Zhang, Kechen Song, et al. An image-level weakly supervised segmentation method for No-service rail surface defect with size prior [J]. Mechanical Systems and Signal Processing, 2022, 165, 108334. (paper) (code)

    In-service Rail Surface Defect Detection

    Line-Level Label


    A novel inspection scheme for rail surface defects (RSDs) is presented for limited samples with line-level labels; it regards defect images as sequence data and classifies pixel lines. Thousands of pixel lines are easy to collect, and line-level labeling is a simple annotation task. Two methods, OC-IAN and OC-TD, are then designed for inspecting express rail defects and common/heavy rail defects, respectively. Both employ a one-dimensional convolutional neural network (ODCNN) to extract features and a long short-term memory (LSTM) network to extract context information. The main differences are that OC-TD applies a double-branch structure and removes the attention module.

    Defu Zhang, Kechen Song, et al. Two Deep Learning Networks for Rail Surface Defect Inspection of Limited Samples with Line-Level Label [J]. IEEE Transactions on Industrial Informatics, 2021,17(10),6731-6741.  (paper)
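
    To make the notion of a line-level label concrete, the sketch below turns an image into a sequence of pixel lines, each paired with one binary label. It assumes that a pixel line is an image row and derives the labels from a pixel mask purely for illustration; in the scheme described above the lines are annotated directly, and the function and variable names here are hypothetical.

```python
def line_level_labels(mask):
    """Label each pixel line (row) 1 if it contains any defect pixel, else 0."""
    return [1 if any(row) else 0 for row in mask]


def as_sequence(image, labels):
    """Pair every pixel line with its label, giving one sequence step per line."""
    return list(zip(image, labels))


# Hypothetical 4x3 grayscale image and its pixel-level defect mask.
image = [[12, 14, 13], [11, 90, 12], [95, 92, 13], [12, 13, 12]]
mask = [[0, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 0]]

labels = line_level_labels(mask)
sequence = as_sequence(image, labels)
print(labels)
```

    Each element of the sequence is a one-dimensional signal with a single label, which is the kind of input a one-dimensional CNN followed by an LSTM can process line by line.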

    In-service and No-service Rail Surface Defect Detection

    We propose a novel one-shot unsupervised domain adaptation framework. Specifically, we introduce a shape consistent style transfer module that performs pixel-level distribution alignment between the training and test images. Based on the one-shot test image, the training image is reconstructed to have the same appearance as the test image. Meanwhile, we employ a multi-task learning strategy to prevent content distortion of the reconstructed images. To improve the robustness of the model to distribution differences, we design an edge-aware defect segmentation model and train the model using the reconstructed training images. The experimental results show that our method effectively improves the robustness of the model to distribution differences and achieves satisfying results in the task of rail surface defect segmentation.
    Shuai Ma, Kechen Song, et al. Shape Consistent One-Shot Unsupervised Domain Adaptation for Rail Surface Defect Segmentation [J]. IEEE Transactions on Industrial Informatics, 2022. (paper)

    We propose a cross-scale fusion and domain adversarial network (CFDANet) to improve the generalization ability of deep neural networks on unseen datasets. To alleviate the domain shift caused by defect scale differences, we design a dual-encoder to extract multi-scale features from images of different resolutions. Then, those features are adaptively fused through a cross-scale fusion module. For the domain shift caused by inconsistent rail appearance, we introduce transferable-aware domain adversarial learning to extract domain invariant features from different datasets. Moreover, we further propose a transferable curriculum to suppress the negative impact of images with low transferability. Experimental results show that our CFDANet can accurately segment defects in unseen datasets and surpass other state-of-the-art domain generalization methods in all five target domain settings.

    Shuai Ma, Kechen Song, et al. Cross-scale Fusion and Domain Adversarial Network for Generalizable Rail Surface Defect Segmentation on Unseen Datasets [J]. Journal of Intelligent Manufacturing, 2022. (paper)

    Anomaly Detection
    An innovative generative adversarial network based on adaptive pyramid graph (APG) and variation residuals (APGVR-GAN) is proposed, aiming to improve the robustness of anomaly detection in railway products and other complex industrial supplies. First, the APG module is embedded in the encoder–decoder–encoder pattern, capturing the correlation description between neighbor regions, which is utilized to enhance the detection of abnormal defects with weak texture. Next, the variation residual module is employed to enhance the expression of various normal samples in the latent space and improve the identification ability for abnormal samples. Then, the dual-probability prototype loss is proposed to make different normal samples have more concentrated expression and more similar probability distribution centers in latent space. Finally, an adaptive focal-gate loss and a regularized log-likelihood loss are designed to overcome the imbalance problem in training samples with different background information. The effectiveness of the model is verified on three new railway datasets and three other industrial public datasets. 


    Menghui Niu, Kechen Song, et al. An Adaptive Pyramid Graph and Variation Residual-Based Anomaly Detection Network for Rail Surface Defects [J]. IEEE Transactions on Instrumentation and Measurement, 2021, 70, 5020013. (paper)

Pavement Distress Detection

    Automatic Inspection and Evaluation System for Pavement Distress



    We propose a three-stage automatic inspection and evaluation system for pavement distress based on improved deep convolutional neural networks (CNNs). First, the system integrates multi-level context information from the CNN classification model to construct discriminative super-features that determine whether there is distress in the pavement image and what type it is, so as to achieve rapid detection of pavement distress. Then, the pavement images with distress are fed into the CNN segmentation model to highlight the distress region pixel-wise. In the segmentation model, a novel pyramid feature extraction module and a novel guidance attention mechanism are introduced. Finally, we evaluate the degree of pavement damage according to the results of the CNN segmentation model.
    Hongwen Dong, Kechen Song, et al. Automatic Inspection and Evaluation System for Pavement Distress [J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23(8), 12377-12387. (paper)
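
    The final evaluation stage works on the segmentation output. As a minimal sketch of how a damage degree could be read from a binary segmentation mask, the example below computes the fraction of pixels marked as distress. This area ratio and the function name are assumptions for illustration; the abstract does not state which evaluation index the system actually uses.

```python
def distress_ratio(mask):
    """Fraction of pixels marked as distress in a binary segmentation mask."""
    total = sum(len(row) for row in mask)
    distress = sum(1 for row in mask for value in row if value)
    return distress / total


# Hypothetical 4x5 segmentation mask: 1 = distress pixel, 0 = intact pavement.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
ratio = distress_ratio(mask)
print(f"distress area ratio = {ratio:.2f}")
```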

    Detection and Classification for Pavement Distress Images

    Few-Shot Classification


    We propose a new few-shot pavement distress detection method based on metric learning, which can effectively learn new categories from a few labeled samples. We adopt a backbone network (ResNet18) to extract multilevel feature information from the base classes and then send the extracted features into the metric module. In the metric module, we introduce an attention mechanism to learn the feature attributes of “what” and “where” and focus the model on the desired characteristics. We also introduce a new metric loss function that maximizes the distance between different categories while minimizing the distance within the same category. In the testing stage, we calculate the cosine similarity between the support set and the query set to complete novel category detection.
    Hongwen Dong, Kechen Song, et al. Deep metric learning-based for multi-target few-shot pavement distress classification [J]. IEEE Transactions on Industrial Informatics, 2022, 18(3), 1801-1810. (paper) (code) (ESI highly cited, 7/2022-1/2023)
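
    The testing stage described above compares query features with support features by cosine similarity. The sketch below shows one simple way to do this: average the support embeddings of each class into a prototype and assign the query to the most similar prototype. The prototype averaging, the single-label decision, the two-dimensional embeddings, and the class names are assumptions for illustration and are not taken from the paper's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def class_prototypes(support):
    """Average the support embeddings of each class into one prototype."""
    prototypes = {}
    for label, embeddings in support.items():
        dim = len(embeddings[0])
        prototypes[label] = [
            sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)
        ]
    return prototypes


def classify(query, prototypes):
    """Return the label whose prototype is most similar to the query."""
    return max(prototypes, key=lambda label: cosine_similarity(query, prototypes[label]))


# Hypothetical 2-way 2-shot task with toy 2-D embeddings.
support = {
    "crack": [[1.0, 0.1], [0.9, 0.2]],
    "pothole": [[0.1, 1.0], [0.2, 0.8]],
}
prototypes = class_prototypes(support)
prediction = classify([0.8, 0.3], prototypes)
print(prediction)
```

    In practice the embeddings would come from the trained feature extractor and metric module; a multi-target setting would compare the similarities against a threshold per class rather than taking a single maximum.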

    Patch-aware Mutual Reasoning Network (PMRN)
    We propose a novel Patch-aware Mutual Reasoning Network (PMRN) that uses only the prior knowledge of non-defective samples for defect detection. Concretely, a patch-aware mutual reasoning module and a spatial shuffle perception module are devised to reason about mutual dependencies and explore dislocation relationships. In addition, an adaptive soft gated anomaly measurement function is developed to calculate reconstruction deviations; it softly controls the information flow according to the complexity of the current scenario.
    Yanyan Wang, Kechen Song, et al. Unsupervised defect detection with patch-aware mutual reasoning network in image data [J]. Automation in Construction, 2022, 142, 104472. (paper)



    RENet is proposed for accurate and robust pavement crack detection. The rectangular convolution pyramid module is first built on deep layers so that the features can describe defects with different structures. The optimized contextual information and features of shallower layers are gradually merged into three resolutions. Subsequently, the hierarchical feature fusion refinement module and the boundary refinement module are applied to each branch. These two modules effectively promote the seamless fusion of features at various scales and make the model pay more attention to boundaries. Finally, the outputs of the three branches are integrated to obtain the final prediction map.

    Yanyan Wang, Kechen Song, et al. RENet: Rectangular Convolution Pyramid and Edge Enhancement Network for Salient Object Detection of Pavement Cracks [J]. Measurement, 2021, 170, 108698. (paper)

    Relevance-aware and Cross-reasoning Network (RCN)

    This paper proposes a relevance-aware and cross-reasoning network (RCN) for anomaly segmentation of pavement defects, which can segment defects using only non-defective images for training. A relevance-aware transformer-based encoder is first devised to model intrinsic interdependencies across local features, thus improving representations of complex non-defective images. Next, a dual decoder strategy is proposed to remap the encoder-generated latent dependencies at the local semantic and global detailed levels, respectively. Specifically, a cross-reasoning refinement module is built in the local decoder to reason about the cross-relationship between spatial and channel dimensions. Finally, a context-aware abnormal distillation measurement is developed to evaluate the semantic reconstruction deviations during inference. Under the guidance of semantic affinity, this measurement allows our model to highlight defective areas adaptively. Extensive experimental results on four datasets indicate that RCN outperforms other leading anomaly segmentation methods.

    Yanyan Wang, Menghui Niu, Kechen Song, et al. Normal-knowledge-based Pavement Defect Segmentation Using Relevance-aware and Cross-reasoning Mechanisms [J]. IEEE Transactions on Intelligent Transportation Systems, 2022. (paper)

Multi-Modal Image Analysis and Application

    A Novel Visible-Depth-Thermal Image Dataset of Salient Object Detection



    Visual perception plays an important role in the industrial information field, especially in robotic grasping applications. To detect the object to be grasped quickly and accurately, salient object detection (SOD) is employed. Although existing SOD methods have achieved impressive performance, they still have limitations in the complex interference environments of practical applications. To better deal with such environments, a novel triple-modal image fusion strategy is proposed to implement SOD for robotic visual perception, namely visible-depth-thermal (VDT) SOD. Meanwhile, we build an image acquisition system for variable lighting scenes and construct a novel benchmark dataset for VDT SOD (the VDT-2048 dataset). Multiple modal images assist each other to highlight the salient regions, but they inevitably introduce interference as well. To achieve effective cross-modal feature fusion while suppressing information interference, a hierarchical weighted suppress interference (HWSI) method is proposed. Comprehensive experimental results show that our method achieves better performance than state-of-the-art methods.
    Kechen Song, et al. A Novel Visible-Depth-Thermal Image Dataset of Salient Object Detection for Robotic Visual Perception [J]. IEEE/ASME Transactions on Mechatronics, 2022. (paper) (Dataset & Code)

    RGB-T Image Analysis Technology and Application: A Survey

    RGB-Thermal infrared (RGB-T) image analysis has been actively studied in recent years. In the past decade, it has received wide attention and made important research progress in many applications. This paper provides a comprehensive review of RGB-T image analysis technology and application, including several hot fields: image fusion, salient object detection, semantic segmentation, pedestrian detection, object tracking, and person re-identification. The first two belong to the preprocessing technology for many computer vision tasks, and the rest belong to the application direction. This paper extensively reviews 400+ papers spanning more than 10 different application tasks. Furthermore, for each specific task, this paper comprehensively analyzes the various methods and presents the performance of the state-of-the-art methods. This paper also provides an in-depth analysis of the challenges for RGB-T image analysis as well as some potential technical improvements in the future.
     Kechen Song, Ying Zhao, et al.  RGB-T Image Analysis Technology and Application: A Survey [J]. Engineering Applications of Artificial Intelligence,  2023, 120, 105919. (paper)

    RGB-T Salient Object Detection

    A Variable Illumination Dataset: VI-RGBT1500

    We propose a variable illumination dataset named VI-RGBT1500 for RGBT image SOD. This is the first time that different illuminations are taken into account in the construction of an RGBT SOD dataset. Three illumination conditions (sufficient, uneven, and insufficient illumination) are used to collect 1500 pairs of RGBT images.

     Kechen Song, Liming Huang, et al.  Multiple Graph Affinity Interactive Network and A Variable Illumination Dataset for RGBT Image Salient Object Detection [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023.   (paper)(code & dataset)

    The current RGB-T datasets contain only a tiny amount of low-illumination data. RGB-T SOD methods trained on these datasets do not detect salient objects well in extremely low-illumination scenes. Detection performance on low-illumination data could be improved by spending a lot of labor on labeling such data, but we instead address the problem by making full use of the characteristics of thermal (T) images. We therefore propose a T-aware guided early fusion network for cross-illumination salient object detection.


    Han Wang, Kechen Song, et al.  Thermal Images-Aware Guided Early Fusion Network for Cross-Illumination RGB-T Salient Object Detection [J]. Engineering Applications of Artificial Intelligence, 2023, 118, 105640.   (paper)(code)

    CGFNet: Cross-Guided Fusion Network for RGB-T Salient Object Detection


    A novel Cross-Guided Fusion Network (CGFNet) for RGB-T salient object detection is proposed. Specifically, a Cross-Scale Alternate Guiding Fusion (CSAGF) module is proposed to mine the high-level semantic information and provide global context support. Subsequently, we design a Guidance Fusion Module (GFM) to achieve sufficient cross-modality fusion by using one modality as the main guidance and the other as auxiliary. Finally, the Cross-Guided Fusion Module (CGFM) is presented and serves as the main decoding block.
    Jie Wang, Kechen Song, et al. CGFNet: Cross-Guided Fusion Network for RGB-T Salient Object Detection [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(5), 2949-2961. (paper)(code)(ESI highly cited, 11/2022-1/2023)

    Multi-graph Fusion and Learning for RGBT Image Saliency Detection
    This research presents an unsupervised RGBT saliency detection method based on multi-graph fusion and learning. Firstly, RGB images and T images are adaptively fused based on boundary information to produce more accurate superpixels. Next, a multi-graph fusion model is proposed to selectively learn useful information from multi-modal images. Finally, we implement the theory of finding good neighbors in the graph affinity and propose different algorithms for two stages of saliency ranking. Experimental results on three RGBT datasets show that the proposed method is effective compared with the state-of-the-art algorithms. 


    Liming Huang, Kechen Song, et al. Multi-graph Fusion and Learning for RGBT Image Saliency Detection [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(3), 1366-1377. (paper)(code)(ESI highly cited, 1/2023)

    Unidirectional RGB-T salient object detection


    The U-shaped encoder–decoder architecture based on CNNs has become well established in salient object detection (SOD) tasks, and it has revealed two drawbacks while driving the rapid development of saliency detection. (1) The inherent characteristics of CNNs make it difficult to learn long-range dependencies and model global correlations. (2) To improve the performance of saliency detection, the encoder and decoder should complement each other and work together. However, the existing encoder–decoder architecture treats the encoder and decoder independently of each other. Specifically, the encoder is responsible for extracting features and the decoder fuses multi-level or multi-modal features to produce prediction maps. That is, the encoder alone serves the decoder, while the valuable information produced by decoder fusion does not feed back to facilitate feature extraction. Therefore, we propose a unidirectional RGB-T salient object detection network with intertwined driving of encoding and fusion to solve the above problems.

    Jie Wang, Kechen Song, et al. Unidirectional RGB-T salient object detection with intertwined driving of encoding and fusion [J]. Engineering Applications of Artificial Intelligence, 2022, 114, 105162. (paper)

    We propose a novel Modal Complementary Fusion Network (MCFNet) to alleviate the contamination effect of low-quality images from both global and local perspectives. Specifically, we design a modal reweight module (MRM) to evaluate the global quality of images and adaptively reweight RGB-T features by explicitly modelling interdependencies between RGB and thermal images. Furthermore, we propose a spatial complementary fusion module (SCFM) to explore the complementary local regions between RGB-T images and selectively fuse multi-modal features. Finally, multi-scale features are fused to obtain the salient detection result.


    Shuai Ma, Kechen Song, et al. Modal Complementary Fusion Network for RGB-T Salient Object Detection [J]. Applied Intelligence, 2023, 53, 9038-9055. (paper) (code)
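
    The idea of reweighting two modalities by their estimated global quality can be sketched as follows. This is a simplified illustration, not MCFNet's modal reweight module: it assumes each modality already has a scalar quality score (how the score is obtained is out of scope here) and converts the two scores into softmax weights that scale the features before fusion. All names and the scoring scheme are hypothetical.

```python
import math


def modal_weights(q_rgb, q_t):
    """Softmax over two scalar quality scores (hypothetical scores)."""
    m = max(q_rgb, q_t)  # subtract the max for numerical stability
    e_rgb = math.exp(q_rgb - m)
    e_t = math.exp(q_t - m)
    total = e_rgb + e_t
    return e_rgb / total, e_t / total


def fuse(rgb_feat, t_feat, q_rgb, q_t):
    """Weighted sum of RGB and thermal feature vectors of equal length."""
    w_rgb, w_t = modal_weights(q_rgb, q_t)
    return [w_rgb * r + w_t * t for r, t in zip(rgb_feat, t_feat)]


# Equal quality scores give equal weights, so the fused feature is the average.
fused = fuse([2.0, 4.0], [0.0, 0.0], q_rgb=0.0, q_t=0.0)
print(fused)  # -> [1.0, 2.0]
```

    With a much higher thermal score, the weights shift toward the thermal features, which is the intended behaviour when the RGB image is of low quality (for example, under insufficient illumination).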

    Low-rank Tensor Learning and Unified Collaborative Ranking


    We propose a novel RGB-T saliency detection method in this letter. To this end, we first regard superpixels as graph nodes and calculate the affinity matrix for each feature. Then, we propose a low-rank tensor learning model for the graph affinity, which can suppress redundant information and improve the relevance of similar image regions. Finally, a novel ranking algorithm is proposed to jointly obtain the optimal affinity matrix and saliency values under a unified structure. Test results on two RGB-T datasets show that the proposed method performs well against the state-of-the-art algorithms.
    Liming Huang, Kechen Song, et al. RGB-T Saliency Detection via Low-rank Tensor Learning and Unified Collaborative Ranking [J]. IEEE Signal Processing Letters, 2020, 27, 1585-1589. (paper) (code and datasets)
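
    For readers unfamiliar with graph-based saliency ranking, the following is a minimal sketch of a generic ranking iteration over an affinity graph. It is not the low-rank tensor learning model or the unified collaborative ranking proposed in the letter; it only assumes a fixed affinity matrix W over a handful of nodes (standing in for superpixels) and a query vector y marking seed nodes. The matrix values, alpha, and the iteration count are hypothetical.

```python
def normalize(W):
    """Symmetric normalization S = D^{-1/2} W D^{-1/2} of an affinity matrix."""
    n = len(W)
    deg = [sum(row) for row in W]
    return [
        [
            W[i][j] / ((deg[i] * deg[j]) ** 0.5) if deg[i] > 0 and deg[j] > 0 else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def rank(W, y, alpha=0.9, iters=200):
    """Iterate f <- alpha * S f + (1 - alpha) * y to propagate query scores."""
    S = normalize(W)
    n = len(y)
    f = list(y)
    for _ in range(iters):
        f = [
            alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
    return f


# Four nodes: 0-1 and 2-3 are strongly connected, 1-2 only weakly.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
scores = rank(W, y=[1.0, 0.0, 0.0, 0.0])
# Scores decay with graph distance from the query node 0.
print(scores)
```

    Nodes that are strongly connected to the query receive high scores, while nodes reachable only through weak affinities receive low ones, which is why the quality of the affinity matrix matters so much for this family of methods.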

    A novel information flow fusion network (IFFNet) method is proposed for RGB-T cross-modal images. The proposed IFFNet consists of an information filtering module and a novel information flow paradigm. Validation on three available RGB-T salient object detection datasets shows that our proposed method performs competitively against the state-of-the-art methods.
    Kechen Song, Liming Huang, et al. A Potential Vision-Based Measurements Technology: Information Flow Fusion Detection Method Using RGB-Thermal Infrared Images [J]. IEEE Transactions on Instrumentation & Measurement, 2023, 72, 5004813. (paper)(code)

    GRNet: Cross-Modality Salient Object Detection Network with Universality and Anti-interference
    Although cross-modality salient object detection has achieved excellent results, the current methods need to be improved in terms of universality and anti-interference. Therefore, we propose a cross-modality salient object detection network with universality and anti-interference. First, we offer a feature extraction strategy to enhance the features in the feature extraction stage. It promotes the mutual improvement of different modal information and prevents interference from affecting the subsequent process. Then we use the graph mapping reasoning module (GMRM) to infer the high-level semantics and obtain valuable information. This enables the proposed method to accurately locate objects under different scenes and interference, improving its universality and anti-interference. Finally, we adopt a mutual guidance fusion module (MGFM), including a modality adaptive fusion module (MAFM) and an across-level mutual guidance fusion module (ALMGFM), to carry out an efficient and reasonable fusion of multi-scale and multi-modality information.
    Hongwei Wen, Kechen Song, et al. Cross-Modality Salient Object Detection Network with Universality and Anti-interference [J]. Knowledge-Based Systems, 2023, 264, 110322. (paper)(code)

    RGB-T Few-shot Semantic Segmentation

    Few-shot semantic segmentation (FSS) has drawn great attention in the computer vision community due to its remarkable potential for segmenting novel objects with few pixel-annotated samples. However, when the number of samples is insufficient, interference factors such as insufficient illumination and complex backgrounds pose a greater challenge to segmentation performance than in the fully supervised setting. Therefore, we propose the visible and thermal (V-T) few-shot semantic segmentation task, which utilizes the complementary and similar information of visible and thermal images to boost few-shot segmentation performance.

    Yanqi Bao, Kechen Song, et al. Visible and Thermal Images Fusion Architecture for Few-shot Semantic Segmentation [J]. Journal of Visual Communication and Image Representation, 2021, 80, 103306.  (paper)(code and datasets)

    RGB-T Object Tracking

    Learning Discriminative Update Adaptive Spatial-Temporal Regularized Correlation Filter for RGB-T Tracking
    We propose a novel adaptive spatial-temporal regularized correlation filter model to learn an appropriate regularization for robust tracking, and a relative peak discriminative method for model updating to avoid model degradation. In addition, to better integrate the unique advantages of the two modalities and adapt to the changing appearance of the target, an adaptive weighting ensemble scheme and a multi-scale search mechanism are adopted, respectively. To optimize the proposed model, we design an efficient ADMM algorithm, which greatly improves efficiency. Extensive experiments have been carried out on two available datasets, RGBT234 and RGBT210, and the results indicate that the proposed tracker performs favorably in both accuracy and robustness against the state-of-the-art RGB-T trackers.

    Mingzheng Feng, Kechen Song, et al. Learning Discriminative Update Adaptive Spatial-Temporal Regularized Correlation Filter for RGB-T Tracking [J]. Journal of Visual Communication and Image Representation, 2020, 72, 102881.  (paper)

Robotic Visual Grasping Detection


    Data-driven robotic visual grasping detection for unknown objects
    This paper presents a comprehensive survey of data-driven robotic visual grasping detection (DRVGD) for unknown objects. We review both object-oriented and scene-oriented aspects, using the DRVGD for unknown objects as a guide. Object-oriented DRVGD targets the physical information of unknown objects, such as shape, texture, and rigidity, which can be used to classify objects as conventional or challenging. Scene-oriented DRVGD focuses on unstructured scenes, which are explored in two aspects based on the object-to-object position relationships: grasping isolated objects and grasping stacked objects. In addition, this paper provides a detailed review of associated grasping representations and datasets. Finally, the challenges of DRVGD and future directions are pointed out.
    Hongkun Tian, Kechen Song, et al. Data-driven Robotic Visual Grasping Detection for Unknown Objects: A Problem-oriented Review [J]. Expert Systems With Applications, 2023, 211, 118624. (paper)

    Lightweight Pixel-Wise Generative Robot Grasping Detection
    Grasping detection is one of the essential tasks for robots to achieve automation and intelligence. Existing grasp detection mainly relies on data-driven discriminative and generative strategies. Generative strategies have significant advantages over discriminative strategies in terms of efficiency. RGB and depth (RGB-D) data are widely used as grasping data sources due to the sufficient amount of information and low cost of acquisition. RGB-D fusion has shown advantages over using only RGB or only depth. However, existing research has mainly focused on early fusion and late fusion, which makes it challenging to fully utilize information from both modalities. It is crucial to improve grasping accuracy by leveraging the knowledge of both modalities while keeping the model lightweight and real-time. Therefore, this article proposes a pixel-wise RGB-D dense fusion method based on a generative strategy. The technique is validated experimentally on both public datasets and a real robot platform. Accuracy rates of 98.9% and 94.0% are achieved on the Cornell and Jacquard datasets, and single-image processing takes only 15 ms. The average success rate of the AUBO i5 robotic platform with the DH-AG-95 parallel gripper reached 94.0% for single-object scenes, 86.7% for three-object scenes, and 84% for five-object scenes. Our approach outperforms existing state-of-the-art methods.
    Hongkun Tian, Kechen Song, et al. Light-weight Pixel-wise Generative Robot Grasping Detection Based on RGB-D Dense Fusion [J]. IEEE Transactions on Instrumentation and Measurement, 2022, 71, 5017912. (paper) (video)

    Rotation Adaptive Grasping Estimation Network Oriented to Unknown Objects Based on Novel RGB-D Fusion Strategy

    This paper proposes a framework for rotation adaptive grasping estimation based on a novel RGB-D fusion strategy. Specifically, the RGB-D data is fused with shared weights in stages based on the proposed Multi-step Weight-learning Fusion (MWF) strategy. The spatial position encoding is learned autonomously based on the proposed Rotation Adaptive Conjoin (RAC) encoder to achieve spatial and rotational adaptiveness oriented to unknown objects with unknown poses. In addition, the Multi-dimensional Interaction-guided Attention (MIA) decoding strategy based on the fused multiscale features is proposed to highlight the practical elements and suppress the invalid ones. The method has been validated on the Cornell and Jacquard grasping datasets with cross-validation accuracies of 99.3% and 94.6%. The single-object and multi-object scene grasping success rates on the robot platform are 95.625% and 87.5%, respectively.

    Hongkun Tian, Kechen Song, et al. Rotation Adaptive Grasping Estimation Network Oriented to Unknown Objects Based on Novel RGB-D Fusion Strategy [J]. Engineering Applications of Artificial Intelligence, 2023. (paper)

Multi-Exposure Fusion for Curved Workpieces


    CW-MEF dataset:


    To fill the gap of MEF datasets in the industrial field, a novel curved workpieces dataset called CW-MEF is proposed for the MEF task. The samples in the dataset have been carefully selected to cover mainstream mechanical workpieces, critical parts such as engine blades, and hardware tools that are common in daily life. All samples are prone to producing reflections under light, so the selected workpiece samples are highly representative. The CW-MEF dataset is divided into 44 categories and contains a total of 4113 images, all of which are 1280 × 1024 in size. The dataset is available at:

    Chongyan Sun, Kechen Song, et al. A Multi-Exposure Fusion Method for Reflection Suppression of Curved Workpieces [J]. IEEE Transactions on Instrumentation and Measurement, 2022, 71, 5021104. (paper) (code & dataset)