Object detection is a fundamental and well-studied task in computer vision. It aims to classify and localize target objects in an image. With recent advancements in deep learning technologies, several state-of-the-art methods for object detection have emerged. Object detection has been widely applied to many real-world applications, including autonomous driving, robot vision, intelligent transportation, remote sensing, military operations, and surveillance.
Most object detectors perform well on large objects but poorly on small ones. We refer to objects as small when they occupy a small pixel area or field of view in the input image. In generic object detectors, the features of small objects lose importance as they pass through the multiple layers of the backbone. Accurate detection of small objects is indispensable yet challenging due to poor visual appearance, insufficient context information, noisy representation, indistinguishable features, complicated backgrounds, limited resolution, and severe occlusion. Moreover, modern real-time object detection systems mainly pursue speed at the cost of computational resources, and their poor detection accuracy on small objects limits their practical feasibility. Thus, improvements in this area would directly benefit autonomous driving systems.
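To make the loss of spatial detail concrete, a rough back-of-the-envelope sketch (assuming the stride-8/16/32 detection heads typical of YOLOv5-style detectors; exact strides vary by architecture) shows how few feature-map cells a small object actually occupies after backbone downsampling:

```python
# Sketch: how many feature-map cells an object covers after backbone
# downsampling. Strides of 8, 16, and 32 are assumed here, as in
# YOLOv5-style detectors; exact values depend on the architecture.

def feature_footprint(object_px: int, stride: int) -> float:
    """Approximate side length, in feature-map cells, that an object covers."""
    return object_px / stride

for obj_px in (16, 32, 128):  # e.g. a distant traffic sign vs. a nearby vehicle
    cells = [feature_footprint(obj_px, s) for s in (8, 16, 32)]
    print(f"{obj_px:3d}px object -> "
          f"{cells[0]:.1f} / {cells[1]:.1f} / {cells[2]:.1f} cells "
          f"at stride 8 / 16 / 32")
```

A 16-pixel traffic sign collapses to half a cell at the stride-32 level, so almost all of its spatial detail is gone by the deepest detection head; this is why shallow, high-resolution features matter for small objects.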
Detecting target objects on the road is an essential task for autonomous driving. For most existing road object detectors, the detection accuracy for small objects is less than half that for large objects. Small objects cover fewer pixels, making it difficult to extract features from their low-resolution representation, so the model can easily confuse them with the background, resulting in missed or incorrect detections. Moreover, one of the most critical challenges for an object detector is that accuracy is not well balanced across object scales. In the context of autonomous driving, traffic signs and traffic lights can be regarded as small objects. Although many studies suggest increasing the representational capacity of the network in depth and width for more accurate detection, this increases the complexity and cost of the model. Accordingly, such models are less suited to autonomous driving systems because of their real-time resource constraints.
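The scale imbalance also shows up directly in the IoU metric used to score detections: the same localization error costs a small box far more overlap than a large one. A minimal sketch (box sizes chosen purely for illustration):

```python
# Sketch: the same 4-pixel localization error hurts a small box's IoU far
# more than a large box's. Boxes are (x1, y1, x2, y2); sizes are illustrative.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

for side in (16, 128):  # small traffic light vs. large vehicle
    gt = (0, 0, side, side)
    pred = (4, 0, side + 4, side)  # prediction shifted right by 4 pixels
    print(f"{side:3d}px box, 4px shift -> IoU = {iou(gt, pred):.2f}")
    # 16px box -> IoU = 0.60; 128px box -> IoU = 0.94
```

At the common 0.5 IoU matching threshold, the small box is already close to being counted as a miss, while the large box is barely affected by the identical error.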
In general, deep learning-based object detection models are categorized into (1) two-stage and (2) one-stage detection algorithms. Two-stage models achieve higher accuracy than one-stage models at the cost of speed and complexity, which limits their direct benefit in practical driving scenarios. Recently, considerable effort has gone into matching or even surpassing this accuracy with one-stage models, and many new one-stage detectors have been developed for such applications. In this letter, we focus on the popular one-stage detector You Only Look Once version 5 (YOLOv5). It is the most recent version in the YOLO family, with a clear and flexible structure aiming for high performance and speed on accessible platforms. However, current systems that apply this model rely either on conventional training methods, regularization/normalization techniques, or the tuning of specific parameters to improve performance, with limited or no consideration of architectural modifications. Although YOLOv5 is a generic object detector, it is not optimized for the detection of small objects and therefore cannot adapt to specific use cases in practice.
This letter proposes architectural improvements to the original YOLOv5 model for better small object detection. To this end, we consider the actual road environment of autonomous driving systems and detect small road objects such as traffic signs and traffic lights. Moreover, we discuss how our modifications improve accuracy on this task while maintaining real-time speed with only a slight increase in the computational complexity of the system. The highlights of our contributions are:
We optimize the existing YOLOv5 model and design a modified architecture, named iS-YOLOv5, aiming for better detection of small objects in autonomous driving scenarios.
We investigate the applicability of our model in diverse weather scenarios to highlight its significance in the context of more robust and efficient object detection.
Extensive experimentation on the BDD100K dataset demonstrates the efficacy of the proposed model. Moreover, we provide empirical results for traffic sign and traffic light detection on the TT100K and DTLD datasets, respectively.
Over the years, many researchers have shown significant interest in developing and employing deep learning-based models to enhance performance on object detection tasks. With the advent of the YOLO series, various applications have adopted YOLO and its architectural successors for object detection, prioritizing real-time detection speed over detection accuracy. Hence, many studies have proposed applying YOLO models in autonomous driving.
We first discuss the motivations of our work (Section 3.1). Then, we provide a brief overview of the YOLOv5 architecture and discuss its shortcomings (Section 3.2). Finally, a series of novel architectural changes is introduced to optimize and improve the detection performance for small objects (Section 3.3).
In this section, we describe the autonomous driving datasets, training environment, and performance evaluation indicators. Thereafter, we verify the superiority of the proposed method through several experiments.
In this letter, we study and analyze the effect of different architectural modifications applied to the popular YOLOv5 structure to improve the detection performance for small-scale objects without sacrificing the detection accuracy for large objects. To achieve this, we refine and optimize the flow of information through the network layers. Accordingly, we propose the iS-YOLOv5 model, which boosts detection accuracy and speed without greatly increasing the computational complexity of the model.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.