CN-116958180-B - Image processing method, device, apparatus, storage medium, and program
Abstract
Embodiments of the disclosure provide an image processing method, apparatus, device, storage medium, and program. The method comprises: acquiring an interface image corresponding to a first user interface; sliding at least one detection frame in the interface image to determine N detection areas in the interface image and respectively determine detection results of the N detection areas, wherein the detection result of each detection area comprises coordinate information of the detection area, a first probability that the detection area is an area occupied by a super-frame object, and/or a second probability that the detection area is an area occupied by the super-frame part of the super-frame object; and determining, according to the detection results of the N detection areas, a first target area occupied by the super-frame object in the interface image and/or a second target area occupied by the super-frame part of the super-frame object. Through this process, the detection efficiency for super-frame objects can be improved, saving labor cost and time cost.
Inventors
- LIANG XIAOYUN
- GAO YONGQIANG
- YANG PING
Assignees
- 北京字节跳动网络技术有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20220420
Claims (11)
- 1. An image processing method, comprising: acquiring an interface image corresponding to a first user interface, wherein the interface image comprises at least one display object; sliding at least one detection frame in the interface image to determine N detection areas in the interface image and respectively determine detection results of the N detection areas, wherein the detection result of each detection area comprises coordinate information of the detection area, a first probability that the detection area is an area occupied by a super-frame object, and a second probability that the detection area is an area occupied by a super-frame part of the super-frame object, N being an integer greater than 1; determining a plurality of first candidate areas and a plurality of second candidate areas among the N detection areas according to the first probabilities and the second probabilities corresponding to the N detection areas, wherein the first probability corresponding to each first candidate area is greater than a first threshold, and the second probability corresponding to each second candidate area is greater than a second threshold; identifying the display objects in the interface image to obtain coordinate information of S display objects, S being an integer greater than or equal to 1; and determining, according to the coordinate information of the plurality of first candidate areas, the coordinate information of the plurality of second candidate areas, and the coordinate information of the S display objects, a first target area occupied by the super-frame object among the plurality of first candidate areas and a second target area occupied by the super-frame part of the super-frame object among the plurality of second candidate areas; wherein the super-frame object is a display object at least part of whose area exceeds a preset display boundary in the interface image, and the super-frame part is the part of the super-frame object that exceeds the preset display boundary; and the second target area satisfies the following two conditions: a first overlapping area exists between the second target area and at least one of the S display objects; and a second overlapping area exists between the second target area and the first target area, the second overlapping area being located at the edge of the first target area.
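As an illustrative sketch only (not part of the claims): the thresholding step of claim 1 splits the N detection results into first and second candidate areas by comparing the two probabilities of each detection area against the two thresholds. The `Detection` class, its field names, and the threshold values below are hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    box: tuple        # coordinate information (x1, y1, x2, y2)
    p_object: float   # first probability: area occupied by a super-frame object
    p_part: float     # second probability: area occupied by the super-frame part

def select_candidates(detections, first_threshold, second_threshold):
    """Split N detection results into first/second candidate areas (claim 1)."""
    first = [d.box for d in detections if d.p_object > first_threshold]
    second = [d.box for d in detections if d.p_part > second_threshold]
    return first, second
```

The target areas are then chosen among these candidates using the coordinate information of the candidates and of the S recognized display objects, per the overlap conditions stated at the end of the claim.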
- 2. The method of claim 1, wherein sliding at least one detection frame in the interface image to determine N detection areas in the interface image and respectively determine the detection results of the N detection areas comprises: extracting features of K different scales from the interface image to obtain feature maps respectively corresponding to the K scales, K being an integer greater than 1; and sliding at least one detection frame over the feature map corresponding to the i-th scale to determine N_i detection areas from the feature map corresponding to the i-th scale and respectively determine the detection results of the N_i detection areas, N_i being an integer greater than 1; wherein i takes the values K, K-1, K-2, ..., 1 in sequence, and N = N_1 + N_2 + ... + N_K.
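Sliding a detection frame over a feature map, as in claim 2, can be pictured as a plain window enumeration. The stride and frame size below are hypothetical; the patent does not fix them.

```python
def slide_detection_frame(height, width, frame_h, frame_w, stride=1):
    """Enumerate the detection areas obtained by sliding one detection frame
    over a height x width feature map; each area is (x1, y1, x2, y2)."""
    areas = []
    for y in range(0, height - frame_h + 1, stride):
        for x in range(0, width - frame_w + 1, stride):
            areas.append((x, y, x + frame_w, y + frame_h))
    return areas
```

Running this once per scale (and once per detection-frame shape) yields the N_i detection areas for the i-th scale, summing to N overall.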
- 3. The method of claim 2, wherein, when i < K, sliding at least one detection frame over the feature map corresponding to the i-th scale to determine N_i detection areas from the feature map corresponding to the i-th scale and respectively determine the detection results of the N_i detection areas comprises: fusing the feature map corresponding to the i-th scale with the feature map corresponding to the (i+1)-th scale to obtain a fused feature map corresponding to the i-th scale; and sliding at least one detection frame over the fused feature map corresponding to the i-th scale to determine the N_i detection areas from the fused feature map corresponding to the i-th scale and respectively determine the detection results of the N_i detection areas.
- 4. The method of claim 3, wherein the i-th scale is greater than the (i+1)-th scale, and fusing the feature map corresponding to the i-th scale with the feature map corresponding to the (i+1)-th scale to obtain the fused feature map corresponding to the i-th scale comprises: up-sampling the feature map corresponding to the (i+1)-th scale to obtain a sampled feature map, wherein the size of the sampled feature map is the same as that of the feature map corresponding to the i-th scale; and fusing the sampled feature map with the feature map corresponding to the i-th scale to obtain the fused feature map corresponding to the i-th scale.
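Claims 3 and 4 describe a feature-pyramid-style fusion: the smaller (i+1)-th-scale map is up-sampled to the i-th-scale size and merged with it. A minimal sketch, assuming nearest-neighbour 2x up-sampling and element-wise addition as the fusion operation (the patent fixes neither choice):

```python
def upsample_2x(fmap):
    """Nearest-neighbour 2x up-sampling of a 2-D feature map (list of lists)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in (0, 1)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                   # duplicate each row
    return out

def fuse(fmap_i, fmap_next):
    """Up-sample the (i+1)-th scale map to the i-th scale size, then add."""
    up = upsample_2x(fmap_next)
    assert len(up) == len(fmap_i) and len(up[0]) == len(fmap_i[0])
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(fmap_i, up)]
```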
- 5. The method of claim 4, wherein the first target area satisfies the following two conditions: a first overlapping area exists between the first target area and at least one of the S display objects; and a second overlapping area exists between the first target area and the second target area, the second overlapping area being located at the edge of the first target area.
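The two overlap conditions in claims 1 and 5 reduce to simple box intersection. The reading of "located at the edge" below, namely that the overlap touches at least one side of the first target area, is an assumption made for illustration only.

```python
def intersect(a, b):
    """Intersection of two (x1, y1, x2, y2) boxes, or None if they don't overlap."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

def second_target_valid(second, first, display_objects):
    """Check the claim-1 conditions on a second target area: it overlaps at
    least one display object, and its overlap with the first target area
    touches at least one edge of the first target area."""
    if not any(intersect(second, obj) for obj in display_objects):
        return False
    ov = intersect(second, first)
    if ov is None:
        return False
    return (ov[0] == first[0] or ov[1] == first[1]
            or ov[2] == first[2] or ov[3] == first[3])
```

This matches the intuition that the out-of-bounds (super-frame) part of an object must sit on the object's boundary, not in its interior.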
- 6. The method of any one of claims 1 to 5, wherein sliding at least one detection frame in the interface image to determine N detection areas in the interface image and respectively determine the detection results of the N detection areas comprises: sliding, by a preset model, at least one detection frame in the interface image to determine the N detection areas in the interface image and respectively determine the detection results of the N detection areas; and determining, according to the detection results of the N detection areas, the first target area occupied by the super-frame object in the interface image and/or the second target area occupied by the super-frame part of the super-frame object comprises: determining, by the preset model and according to the detection results of the N detection areas, the first target area occupied by the super-frame object and/or the second target area occupied by the super-frame part of the super-frame object in the interface image; wherein the preset model is obtained by training with a plurality of groups of training samples, each group of training samples comprising a sample image, the area occupied by a super-frame object in the sample image, and the area occupied by the super-frame part of the super-frame object in the sample image.
- 7. The method of any one of claims 1 to 5, wherein the display object is any one of text, an icon, an image, and an interface element.
- 8. An image processing apparatus, comprising: an acquisition module configured to acquire an interface image corresponding to a first user interface, wherein the interface image comprises at least one display object; a detection module configured to slide at least one detection frame in the interface image to determine N detection areas in the interface image and respectively determine detection results of the N detection areas, wherein the detection result of each detection area comprises coordinate information of the detection area, a first probability that the detection area is an area occupied by a super-frame object, and a second probability that the detection area is an area occupied by a super-frame part of the super-frame object, N being an integer greater than 1; and a determination module configured to: determine a plurality of first candidate areas and a plurality of second candidate areas among the N detection areas according to the first probabilities and the second probabilities corresponding to the N detection areas, wherein the first probability corresponding to each first candidate area is greater than a first threshold, and the second probability corresponding to each second candidate area is greater than a second threshold; identify the display objects in the interface image to obtain coordinate information of S display objects, S being an integer greater than or equal to 1; and determine, according to the coordinate information of the plurality of first candidate areas, the coordinate information of the plurality of second candidate areas, and the coordinate information of the S display objects, a first target area occupied by the super-frame object among the plurality of first candidate areas and a second target area occupied by the super-frame part of the super-frame object among the plurality of second candidate areas; wherein the super-frame object is a display object at least part of whose area exceeds a preset display boundary in the interface image, and the super-frame part is the part of the super-frame object that exceeds the preset display boundary.
- 9. An electronic device, comprising a processor and a memory; wherein the memory stores computer-executable instructions; and the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the image processing method according to any one of claims 1 to 7.
- 10. A computer-readable storage medium, in which computer-executable instructions are stored, which, when executed by a processor, implement the image processing method according to any one of claims 1 to 7.
- 11. A computer program product comprising a computer program which, when executed by a processor, implements the image processing method according to any one of claims 1 to 7.
Description
Image processing method, device, apparatus, storage medium, and program
Technical Field
Embodiments of the disclosure relate to the technical field of artificial intelligence, and in particular to an image processing method, device, apparatus, storage medium, and program.
Background
With the development of technology, human-machine interaction is becoming increasingly common. The operation interface that a user directly faces when interacting with an application program in an electronic device is generally called a User Interface (UI). A user interface contains various interface elements, such as views, windows, dialog boxes, menus, buttons, tabs, and the like. Text may be displayed on some of these interface elements. In some cases, a text super-frame (i.e., text that exceeds the boundary of an interface element) may occur. This may affect the aesthetic appearance of the interface and may also obscure other interface elements. At present, whether a text super-frame exists in a user interface is mainly checked by visual inspection, so the detection efficiency is low and considerable labor cost and time cost are required.
Disclosure of Invention
Embodiments of the disclosure provide an image processing method, device, apparatus, storage medium, and program, which improve the detection efficiency for super-frame objects and thereby reduce labor cost and time cost.
In a first aspect, an embodiment of the present disclosure provides an image processing method, comprising: acquiring an interface image corresponding to a first user interface, wherein the interface image comprises at least one display object; sliding at least one detection frame in the interface image to determine N detection areas in the interface image and respectively determine detection results of the N detection areas, wherein the detection result of each detection area comprises coordinate information of the detection area, a first probability that the detection area is an area occupied by a super-frame object, and/or a second probability that the detection area is an area occupied by a super-frame part of the super-frame object, N being an integer greater than 1; and determining, according to the detection results of the N detection areas, a first target area occupied by the super-frame object and/or a second target area occupied by the super-frame part of the super-frame object in the interface image, wherein the super-frame object is a display object at least part of whose area exceeds a preset display boundary in the interface image, and the super-frame part is the part of the super-frame object that exceeds the preset display boundary.
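The first-aspect method can be sketched end-to-end with toy data. Picking the highest-probability candidate above each threshold, as below, is a simplified stand-in for the coordinate- and overlap-based selection the disclosure actually describes; the function name and data layout are hypothetical.

```python
def detect_super_frames(detections, first_threshold, second_threshold):
    """Toy pass over N detection results, each (box, p_object, p_part):
    returns the first/second target areas as the highest-probability
    candidate above each threshold, or None when no candidate qualifies."""
    firsts = [(p1, box) for box, p1, p2 in detections if p1 > first_threshold]
    seconds = [(p2, box) for box, p1, p2 in detections if p2 > second_threshold]
    first_target = max(firsts)[1] if firsts else None
    second_target = max(seconds)[1] if seconds else None
    return first_target, second_target
```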
In a second aspect, an embodiment of the present disclosure provides an image processing apparatus, comprising: an acquisition module configured to acquire an interface image corresponding to a first user interface, wherein the interface image comprises at least one display object; a detection module configured to slide at least one detection frame in the interface image to determine N detection areas in the interface image and respectively determine detection results of the N detection areas, wherein the detection result of each detection area comprises coordinate information of the detection area, a first probability that the detection area is an area occupied by a super-frame object, and/or a second probability that the detection area is an area occupied by a super-frame part of the super-frame object, N being an integer greater than 1; and a determination module configured to determine, according to the detection results of the N detection areas, a first target area occupied by the super-frame object and/or a second target area occupied by the super-frame part of the super-frame object in the interface image, wherein the super-frame object is a display object at least part of whose area exceeds a preset display boundary in the interface image, and the super-frame part is the part of the super-frame object that exceeds the preset display boundary. In a third aspect, an embodiment of the present disclosure provides an electronic device, comprising a processor and a memory; the memory stores computer-executable instructions; and the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the image processing method described in the first aspect above.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, implement the image processing method described in the first aspect above. In a fifth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, implements the image processing method described in the first aspect above.