Ranjan Sapkota et al.

In this study, a robust method for 3D pose estimation of immature green apples (fruitlets) in commercial orchards was developed, utilizing the YOLO11 object pose detection model alongside Vision Transformers (ViT) for depth estimation. For object detection and pose estimation, the performance of YOLO11 (YOLO11n, YOLO11s, YOLO11m, YOLO11l, and YOLO11x) and YOLOv8 (YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x) was compared under identical hyperparameter settings across all configurations. Likewise, for RGB to RGB-D mapping, the Dense Prediction Transformer (DPT) and Depth Anything V2 were investigated. YOLO11n surpassed all configurations of YOLO11 and YOLOv8 in terms of box precision and pose precision, achieving scores of 0.91 and 0.915, respectively. Conversely, YOLOv8n exhibited the highest box and pose recall scores of 0.905 and 0.925, respectively. Regarding mean average precision at 50% intersection over union (mAP@50), YOLO11s led all configurations with a box mAP@50 score of 0.94, while YOLOv8n achieved the highest pose mAP@50 score of 0.96. In terms of image processing speed, YOLO11n outperformed all configurations with an inference speed of 2.7 ms, significantly faster than the quickest YOLOv8 configuration, YOLOv8n, which processed images in 7.8 ms; this marks a substantial improvement in inference speed over the previous iteration. Subsequent integration of ViTs for depth estimation of the green fruit poses revealed that Depth Anything V2 outperformed the Dense Prediction Transformer in 3D pose length validation, achieving the lowest Root Mean Square Error (RMSE) of 1.52 and Mean Absolute Error (MAE) of 1.28, demonstrating exceptional precision in estimating immature green fruit lengths. The DPT was the second most accurate, with an RMSE of 3.29 and an MAE of 2.62. In contrast, measurements derived from Intel RealSense point clouds exhibited the largest discrepancies from the ground truth, with an RMSE of 9.98 and an MAE of 7.74. These findings emphasize the effectiveness of YOLO11 in detecting and estimating the pose of immature green fruits, and illustrate how Vision Transformers such as Depth Anything V2 can convert RGB images into RGB-D data, improving the precision and reducing the computational requirements of 3D pose estimation for future robotic thinning applications in commercial orchards.
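A minimal sketch of the kind of detection-plus-depth pipeline the abstract describes, assuming the Ultralytics YOLO API and the Hugging Face depth-estimation pipeline; the checkpoint names, input image, and sample length values are illustrative assumptions (the study's fruitlet-fine-tuned weights are not referenced here), not the authors' released code.

```python
# Illustrative sketch: fruitlet pose detection with Ultralytics YOLO11,
# followed by monocular depth estimation with Depth Anything V2.
import numpy as np
from PIL import Image
from transformers import pipeline
from ultralytics import YOLO

image_path = "orchard_frame.jpg"                 # hypothetical input frame
pose_model = YOLO("yolo11n-pose.pt")             # stock pose weights; the study
                                                 # fine-tuned on fruitlet data
depth_estimator = pipeline(
    "depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")

# 2D detection and keypoint (pose) estimation on the RGB frame
result = pose_model(image_path)[0]
keypoints = result.keypoints.xy.cpu().numpy()    # (n_fruitlets, n_kpts, 2)

# Dense relative depth map predicted from the same RGB frame (RGB -> RGB-D)
depth_map = np.array(depth_estimator(Image.open(image_path))["depth"])

# Lift each 2D keypoint to pseudo-3D by sampling the depth map at its pixel
for fruitlet in keypoints:
    points_3d = [(x, y, depth_map[int(y), int(x)]) for x, y in fruitlet]
    print(points_3d)

# Length validation as in the study: RMSE and MAE against ground-truth lengths
predicted = np.array([31.2, 28.5, 30.1])         # hypothetical estimates (mm)
measured = np.array([30.0, 29.4, 31.0])          # hypothetical ground truth (mm)
rmse = float(np.sqrt(np.mean((predicted - measured) ** 2)))
mae = float(np.mean(np.abs(predicted - measured)))
print(f"RMSE={rmse:.2f} mm, MAE={mae:.2f} mm")
```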

Ranjan Sapkota et al.

Object detection, specifically fruitlet detection, is a crucial image processing technique in agricultural automation, enabling the accurate identification of fruitlets on orchard trees within images. It is vital for early fruit load management and overall crop management, facilitating the effective deployment of automation and robotics to optimize orchard productivity and resource use. This study systematically evaluated all configurations of the YOLOv8, YOLOv9, YOLOv10, and YOLO11 object detection algorithms in terms of precision, recall, mean Average Precision at 50% Intersection over Union (mAP@50), and computational speed (pre-processing, inference, and post-processing times) for immature green apple (fruitlet) detection in commercial orchards. Additionally, this research performed and validated in-field counting of fruitlets using an iPhone and machine vision sensors across four apple varieties (Scifresh, Scilate, Honeycrisp, and Cosmic Crisp). This investigation of a total of 22 configurations of YOLOv8, YOLOv9, YOLOv10, and YOLO11 (5 for YOLOv8, 6 for YOLOv9, 6 for YOLOv10, and 5 for YOLO11) revealed that YOLOv9 gelan-base and YOLO11s outperform all other configurations in terms of mAP@50, with scores of 0.935 and 0.933, respectively. Specifically, YOLOv9 gelan-base achieved the highest mAP@50 of 0.935, outperforming YOLO11s's 0.933, YOLOv10s's 0.924, and YOLOv8s's 0.924. In terms of recall, YOLOv9 gelan-base achieved the highest value among the YOLOv9 configurations (0.899), and YOLO11m performed best among the YOLO11 configurations (0.897). In terms of inference speed, YOLO11n was the fastest at only 2.4 ms, while the fastest configurations across YOLOv10, YOLOv9, and YOLOv8 were YOLOv10n, YOLOv9 gelan-s, and YOLOv8n at 5.5, 11.5, and 4.1 ms, respectively.
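A hedged sketch of how such a multi-configuration benchmark could be set up with the Ultralytics API, which exposes precision, recall, mAP@50, and per-stage timings; the dataset YAML, sample image, and checkpoint list are assumptions (the gelan variants, for instance, ship with the original YOLOv9 repository and may need separate conversion), and this is not the study's exact harness.

```python
# Illustrative benchmark sketch: evaluate several YOLO configurations on a
# fruitlet dataset under identical settings and report accuracy plus timings.
from ultralytics import YOLO

CONFIGS = ["yolov8n.pt", "yolov9c.pt", "yolov10n.pt", "yolo11n.pt"]  # assumed weights

for weights in CONFIGS:
    model = YOLO(weights)

    # Validation on a labeled split; "fruitlets.yaml" is a hypothetical dataset file
    metrics = model.val(data="fruitlets.yaml", imgsz=640)

    # A single prediction exposes pre-processing, inference, and post-processing
    # times (milliseconds) for the speed comparison
    speed = model("sample_orchard_image.jpg")[0].speed

    print(f"{weights}: mAP@50={metrics.box.map50:.3f} "
          f"P={metrics.box.mp:.3f} R={metrics.box.mr:.3f} | "
          f"pre={speed['preprocess']:.1f} ms "
          f"inf={speed['inference']:.1f} ms "
          f"post={speed['postprocess']:.1f} ms")
```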

Ranjan Sapkota et al.

Given the rapid emergence and application of Large Language Models (LLMs) across various scientific fields, their applicability in agriculture remains only partially explored. This paper conducts an in-depth review of LLMs in agriculture, focusing on how LLMs can be developed and implemented to optimize agricultural processes, increase efficiency, and reduce costs. Recent studies have explored the capabilities of LLMs in agricultural information processing and decision-making. Nevertheless, a comprehensive understanding of the capabilities, challenges, limitations, and future directions of LLMs in agricultural information processing and application remains in its early stages. Such exploration is essential to provide the community with a broader perspective and a clearer understanding of LLM applications, serving as a baseline for the current state and trends in the subject matter. To bridge this gap, this survey reviews the progress of LLMs and their utilization in agriculture, with an additional focus on 11 key research questions (RQs), of which 4 are general and 7 are agriculture-focused. By addressing these RQs, this review outlines the current opportunities, challenges, limitations, and future roadmap for LLMs in agriculture. The findings indicate that multi-modal LLMs not only simplify complex agricultural challenges but also significantly enhance decision-making and improve the efficiency of agricultural image processing. These advancements position LLMs as an essential tool for the future of farming. For continued research and understanding, an organized and regularly updated list of papers on LLMs is available at https://github.com/JiajiaLi04/Multi-Modal-LLMs-in-Agriculture.