
EP-4371086-B1 - IMAGE PROCESSING


Inventors

  • LANE, Tom
  • SILVESTER, Robert
  • TREVARTHEN, Nigel

Dates

Publication Date
2026-05-06
Application Date
2022-08-19

Claims (16)

  1. A method, comprising: · receiving an image (101) of a crop (4), the image comprising an array of elements which includes depth information, wherein the image is a multiple-channel image comprising at least a colour channel and a depth channel providing per-element depth information (87); · providing the image to a trained convolutional neural network (120) to generate a response map (121) comprising an image comprising intensity values (123) having respective peaks (122) corresponding to the stem of a plant in the crop; · obtaining, from the response map, coordinates (131) corresponding to the respective peaks (122); and · converting the coordinates (131) in image coordinates into stem locations (29) in real-world dimensions using the provided depth information.
  2. The method of claim 1, wherein the trained convolutional neural network includes at least one encoder-decoder module or wherein the trained convolutional neural network is a multi-stage pyramid network.
  3. The method of claim 1 or 2, further comprising: · receiving a mapping array (81) for mapping depth information (85) to a corresponding element, wherein the mapping array comprises per-element depth information.
  4. The method of any one of claims 1 to 3, further comprising: · receiving the per-element depth information (83) from a depth sensing image sensor (17).
  5. The method of any one of claims 1 to 4, wherein the multiple-channel image (101) further includes an infrared channel.
  6. The method of any one of claims 1 to 5, wherein the multiple-channel image (101) further includes an optical flow image.
  7. The method of any one of claims 1 to 6, wherein the intensity values have Gaussian distributions (123) in the vicinity of each detected location.
  8. The method of any one of claims 1 to 7, wherein the trained convolutional neural network (120) comprises a series of at least two encoder-decoder modules (302).
  9. The method of claim 1, further comprising: · converting the coordinates (131) in image coordinates into coordinates (132) in camera coordinates; and · amalgamating the camera coordinates corresponding to the same stem of the same plant from more than one image into an amalgamated stem location (29).
  10. The method of any one of claims 1 to 9, further comprising: · calculating a trajectory (601) in dependence on the detected stem location (29); and · transmitting a control message (30) to a control system (23) in dependence on the trajectory.
  11. A computer program which, when executed by at least one processor, performs the method of any one of claims 1 to 10.
  12. A computer program product comprising a computer-readable medium storing a computer program which, when executed by at least one processor, performs the method of any one of claims 1 to 10.
  13. An image processing system (20) comprising: · at least one processor (31, 32); · memory (33); the at least one processor configured to perform the method of any one of claims 1 to 10.
  14. A system (11) comprising: · a multiple-image sensor system (12) for obtaining images (14, 16); and · the image processing system (20) of claim 13; wherein the multiple-image sensor system is arranged to provide the images (14, 16) to the image processing system and the image processing system is configured to process the images.
  15. The system of claim 14, further comprising: · a control system (23), wherein the system is configured to control the control system.
  16. A vehicle (1, 2) comprising the system (11) of claim 14 or 15.

Description

Field

The present invention relates to image processing.

Background

Computer vision and image recognition are increasingly being used in agriculture and horticulture, for example, to help manage crop production and automate farming.

T. Hague, N. D. Tillett and H. Wheeler: "Automated Crop and Weed Monitoring in Widely Spaced Cereals", Precision Agriculture, volume 7, pp. 21-32 (2006) describes an approach for automatic assessment of crop and weed area in images of widely-spaced (0.25 m) cereal crops, captured from a tractor-mounted camera.

WO 2013/134480 A1 describes a method of real-time plant selection and removal from a plant field. The method includes capturing an image of a section of the plant field, segmenting the image into regions indicative of individual plants within the section, selecting the optimal plants for retention from the image based on the image and previously thinned plant field sections, and sending instructions to the plant removal mechanism for removal of the plants corresponding to the unselected regions of the image before the machine passes the unselected regions.

A. English, P. Ross, D. Ball and P. Corke: "Vision based guidance for robot navigation in agriculture", 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, pp. 1693-1698 (2014) describes a method of vision-based texture tracking to guide autonomous vehicles in agricultural fields. The method works by extracting and tracking the direction and lateral offset of the dominant parallel texture in a simulated overhead view of the scene and hence abstracts away crop-specific details such as colour, spacing and periodicity.

A. English, P. Ross, D. Ball, B. Upcroft and P. Corke: "Learning crop models for vision-based guidance of agricultural robots", 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 2015, pp. 1158-1163 describes a vision-based method of guiding autonomous vehicles within crop rows in agricultural fields. The location of the crop rows is estimated with an SVM regression algorithm using colour, texture and 3D structure descriptors from a forward-facing stereo camera pair.

P. Lottes, J. Behley, N. Chebrolu, A. Milioto and C. Stachniss: "Joint Stem Detection and Crop-Weed Classification for Plant-Specific Treatment in Precision Farming", 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 2018, pp. 8233-8238 describes an approach which outputs the stem location for weeds, which allows for mechanical treatments, and the covered area of the weed for selective spraying. The approach uses an end-to-end trainable fully-convolutional network that simultaneously estimates stem positions as well as the covered area of crops and weeds. It jointly learns the class-wise stem detection and the pixel-wise semantic segmentation.

Additional relevant prior art includes:

  • Wu Xiaolong et al.: "Robotic weed control using automated weed and crop classification", Journal of Field Robotics, vol. 37, no. 2, March 2020, pp. 322-340, ISSN: 1556-4959, DOI: 10.1002/rob.21938, XP055830470, retrieved from the Internet: http://www.ipb.uni-bonn.de/pdfs/wu2020jfr.pdf
  • Philipp Lottes et al.: "Fully Convolutional Networks with Sequential Information for Robust Crop and Weed Detection in Precision Farming", arXiv.org, Cornell University Library, 9 June 2018, DOI: 10.1109/LRA.2018.2846289, XP080888818
  • Sa Inkyu et al.: "weedNet: Dense Semantic Weed Classification Using Multispectral Images and MAV for Smart Farming", IEEE Robotics and Automation Letters, vol. 3, no. 1, 2018, pp. 588-595, DOI: 10.1109/LRA.2017.2774979, XP011674527
  • English Andrew et al.: "Learning crop models for vision-based guidance of agricultural robots", 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 28 September 2015, pp. 1158-1163, DOI: 10.1109/IROS.2015.7353516, XP032831784
  • Asad Muhammad Hamza et al.: "Weed Density Estimation Using Semantic Segmentation", arXiv.org, Cornell University Library, 27 January 2020, pp. 162-171, XP047534567

Summary

According to a first aspect of the present invention there is provided a method comprising receiving an image of a crop, the image comprising an array of elements which includes depth information, wherein the image is a multiple-channel image comprising at least a colour channel and a depth channel providing per-element depth information, providing the image to a trained convolutional neural network to generate a response map comprising an image comprising intensity values having respective peaks corresponding to the stem of a plant in the crop, obtaining, from the response map, coordinates corresponding to the respective peaks, and converting the coordinates in image coordinates into stem locations in real-world dimensions using the provided depth information.
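As a non-authoritative illustration, the following Python sketch shows one way the inference steps of the first aspect could be realised: a multiple-channel image carrying a depth channel is passed through a trained convolutional neural network, peak coordinates are read off the response map, and the per-element depth is used to convert them into camera-frame stem locations. The model interface, the peak threshold and the camera intrinsics (fx, fy, cx, cy) are illustrative assumptions, not taken from the patent.

```python
# Minimal sketch of the claimed pipeline (claim 1): multi-channel image in,
# real-world stem locations out. StemNet-style model, intrinsics and the
# threshold are illustrative assumptions, not taken from the patent.
import numpy as np
import torch

def detect_stems(rgbd, model, fx, fy, cx, cy, threshold=0.5):
    """rgbd: H x W x 4 array (colour channels plus per-element depth in metres).
    model: trained CNN assumed to return a 1 x 1 x H x W response map."""
    x = torch.from_numpy(rgbd).float().permute(2, 0, 1).unsqueeze(0)
    with torch.no_grad():
        response = model(x)[0, 0].numpy()  # H x W response map

    # Obtain image coordinates of local maxima above the threshold.
    peaks = []
    for v in range(1, response.shape[0] - 1):
        for u in range(1, response.shape[1] - 1):
            patch = response[v - 1:v + 2, u - 1:u + 2]
            if response[v, u] >= threshold and response[v, u] == patch.max():
                peaks.append((u, v))

    # Convert image coordinates into camera-frame stem locations using the
    # per-element depth channel and a pinhole camera model.
    stems = []
    for u, v in peaks:
        z = rgbd[v, u, 3]
        stems.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return stems
```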
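Claim 7 specifies that the intensity values have Gaussian distributions in the vicinity of each detected location. One common way to obtain such a response map, not prescribed by the patent itself, is to train the network against target heatmaps with a Gaussian placed at each annotated stem. The sketch below builds such a target; the value of sigma is an assumption.

```python
# Sketch of a Gaussian target heatmap of the kind claim 7 describes: an
# intensity image with a Gaussian peak around each annotated stem location.
# The value of sigma is an illustrative assumption.
import numpy as np

def gaussian_response_map(height, width, stem_coords, sigma=4.0):
    """stem_coords: iterable of (u, v) image coordinates of annotated stems."""
    vv, uu = np.mgrid[0:height, 0:width]
    target = np.zeros((height, width))
    for u, v in stem_coords:
        g = np.exp(-((uu - u) ** 2 + (vv - v) ** 2) / (2.0 * sigma ** 2))
        target = np.maximum(target, g)  # overlapping peaks keep the stronger value
    return target
```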
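Claim 9 amalgamates camera coordinates corresponding to the same stem observed in more than one image into a single amalgamated stem location. A minimal sketch of one possible amalgamation step follows, assuming the detections have already been expressed in a common frame (for example via vehicle odometry) and assuming a 5 cm merge radius; neither assumption comes from the patent.

```python
# Sketch of the amalgamation step of claim 9: detections of the same stem
# gathered from several images are merged into one amalgamated location.
# The 5 cm merge radius is an illustrative assumption.
import numpy as np

def amalgamate(detections, radius=0.05):
    """detections: list of (x, y, z) points in a common frame, from multiple images."""
    clusters = []  # each cluster is a list of nearby points
    for p in map(np.asarray, detections):
        for cluster in clusters:
            if np.linalg.norm(np.mean(cluster, axis=0) - p) < radius:
                cluster.append(p)
                break
        else:
            clusters.append([p])
    # One amalgamated stem location per cluster: the mean of its members.
    return [tuple(np.mean(c, axis=0)) for c in clusters]
```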