CN-116975343-B - Interface image labeling method and device and electronic equipment

Abstract

Embodiments of the present disclosure provide a method, an apparatus, and an electronic device for labeling interface images. The method comprises: aggregating a set of interface images to be labeled to obtain at least one sub-image set, wherein the similarity between the interface images in each sub-image set is greater than or equal to a similarity threshold; selecting one interface image in each sub-image set as a reference interface image; receiving a user's labeling instructions for the objects to be labeled in each reference interface image to obtain labeling results; and labeling the remaining interface images in the at least one sub-image set using the labeling results. Some of the interface images are labeled manually, and the manually labeled interface images are then used to label the unlabeled ones, which can improve labeling accuracy.
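
As a non-limiting illustration, the flow summarized above can be sketched in Python. The sketch makes several assumptions that are not stated in the disclosure: each interface image has already been reduced to a 64-bit fingerprint, similarity is derived from the Hamming distance between fingerprints, groups are formed greedily, and the first image of each group serves as the reference. The names `hash_similarity`, `aggregate`, `label_set`, and `ask_user` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def hash_similarity(a: int, b: int, bits: int = 64) -> float:
    """Similarity in [0, 1] between two fingerprints (assumed metric)."""
    return 1.0 - bin(a ^ b).count("1") / bits


def aggregate(images: Sequence[int], threshold: float) -> List[List[int]]:
    """Greedily group image indices so every pair in a group meets the threshold."""
    groups: List[List[int]] = []
    for idx, img in enumerate(images):
        for group in groups:
            # Join the first group whose members are all similar enough.
            if all(hash_similarity(img, images[j]) >= threshold for j in group):
                group.append(idx)
                break
        else:
            # No existing group fits, so this image starts a new sub-image set.
            groups.append([idx])
    return groups


def label_set(
    images: Sequence[int],
    threshold: float,
    ask_user: Callable[[int], str],
) -> Dict[int, str]:
    """Label one reference per group manually, then reuse that result for the rest."""
    labels: Dict[int, str] = {}
    for group in aggregate(images, threshold):
        reference = group[0]          # assumed choice: any image in the group
        result = ask_user(reference)  # stands in for the user's labeling instruction
        for idx in group:
            labels[idx] = result
    return labels
```

In this sketch the user is consulted once per sub-image set rather than once per image, which is the point of selecting a reference interface image.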

Inventors

ZHANG GONG
GAO YONGQIANG
YANG PING

Assignees

Beijing ByteDance Network Technology Co., Ltd. (北京字节跳动网络技术有限公司)

Dates

Publication Date: 20260508
Application Date: 20220420

Claims (17)

1. A method for labeling an interface image, comprising: aggregating a set of interface images to be labeled using different similarity thresholds to obtain a plurality of sub-image sets corresponding to each similarity threshold, wherein the similarity between interface images in each sub-image set corresponding to each similarity threshold is greater than or equal to the corresponding similarity threshold; selecting an interface image from each sub-image set corresponding to a first similarity threshold as a first reference interface image, wherein the first similarity threshold is the minimum of the different similarity thresholds; receiving a first labeling instruction of a user on an object to be labeled in each first reference interface image to obtain a first labeling result of the first similarity threshold; selecting an interface image from each second sub-image set corresponding to a second similarity threshold as a second reference interface image, and selecting an interface image from each third sub-image set corresponding to a third similarity threshold as a third reference interface image, wherein the second similarity threshold is smaller than the third similarity threshold; labeling the second reference interface image in each second sub-image set using the first labeling result to obtain a second labeling result of the second similarity threshold; labeling each third reference interface image using the second labeling result; and acquiring a mixed similarity between each third reference interface image and each other interface image, and labeling the other interface images in each third sub-image set according to the mixed similarity.
2. The method of claim 1, wherein aggregating the set of interface images to be labeled using different similarity thresholds to obtain a plurality of sub-image sets corresponding to each similarity threshold comprises: aggregating, in the set of interface images to be labeled, interface images whose similarity is greater than or equal to the first similarity threshold to obtain a plurality of first sub-image sets corresponding to the first similarity threshold; aggregating, in the plurality of first sub-image sets corresponding to the first similarity threshold, interface images whose similarity is greater than or equal to the second similarity threshold to obtain a plurality of second sub-image sets corresponding to the second similarity threshold, wherein the first similarity threshold is smaller than the second similarity threshold; and aggregating, in the plurality of second sub-image sets corresponding to the second similarity threshold, interface images whose similarity is greater than or equal to the third similarity threshold to obtain a plurality of third sub-image sets corresponding to the third similarity threshold.
3. The method according to claim 2, wherein selecting an interface image from each sub-image set corresponding to the first similarity threshold as the first reference interface image comprises: taking any one of the interface images in each first sub-image set as the first reference interface image of that first sub-image set; or taking, in each first sub-image set, the interface image having the largest number of similarities greater than or equal to the first similarity threshold as the first reference interface image of that first sub-image set.
4. The method of claim 1, further comprising, before labeling the second reference interface image in each second sub-image set using the first labeling result: acquiring the similarity between each second reference interface image and each first reference interface image; and determining whether, among the similarities between each second reference interface image and each first reference interface image, the number of similarities greater than or equal to a first preset similarity is 0; wherein labeling the second reference interface image in each second sub-image set using the first labeling result comprises: in response to the number not being 0, labeling the second reference interface image in each second sub-image set using the first labeling result.
5. The method of claim 4, wherein the first labeling result includes a position of at least one first region in each first reference interface image, and wherein acquiring the similarity between each second reference interface image and each first reference interface image comprises: acquiring a first similarity between each second reference interface image and each first reference interface image, wherein the first similarity is the overall similarity between the interface images; acquiring a second similarity, in each second reference interface image, at the positions corresponding to the first regions in each first reference interface image; and acquiring the similarity between each second reference interface image and each first reference interface image according to the first similarity and the second similarity.
6. The method of claim 5, wherein labeling the second reference interface image in each second sub-image set using the first labeling result in response to the number not being 0 comprises: in response to the number being greater than the preset number, labeling a target second reference interface image using the labeling result of a target first reference interface image, wherein the first similarity between the target second reference interface image and the target first reference interface image is greater than or equal to a second preset similarity, and the second similarity between the target second reference interface image and the target first reference interface image is greater than or equal to a third preset similarity.
7. The method of claim 6, wherein the first labeling result includes a label of at least one first region in each first reference interface image, and wherein labeling the target second reference interface image using the first labeling result comprises: constructing a first labeling dictionary by taking the position of each first region in the target first reference interface image as a key of the first labeling dictionary and taking a first p-hash value and the label of each first region in the target first reference interface image as the value of the first labeling dictionary; traversing each first region in the first labeling dictionary, and acquiring a third similarity between a second p-hash value at the position corresponding to each first region in the target second reference interface image and the first p-hash value at the position of that first region; and in response to the third similarity being greater than the third preset similarity, labeling a second region corresponding to the position of each first region in the target second reference interface image according to the label of that first region.
8. The method of claim 7, wherein there are a plurality of third similarities, and wherein labeling the second region corresponding to the position of each first region in the target second reference interface image according to the label of each first region comprises: taking the label of the first region corresponding to the maximum third similarity as the label of the second region corresponding to that first region.
9. The method of claim 8, wherein traversing each first region in the first labeling dictionary comprises: traversing each first region in the first labeling dictionary using a non-maximum suppression method.
10. The method of claim 6, further comprising: in response to the number being 0, labeling each second reference interface image according to a second labeling instruction of the user on the object to be labeled in each second reference interface image; or in response to the number being greater than the preset number, and the first similarity between the target second reference interface image and any first reference interface image being smaller than the second preset similarity and/or the second similarity between the target second reference interface image and any first reference interface image being smaller than the third preset similarity, labeling the target second reference interface image according to a third labeling instruction of the user on the object to be labeled in the target second reference interface image.
11. The method according to any one of claims 1-10, further comprising, after labeling each third reference interface image using the second labeling result: training a recognition model using the interface images in the labeled interface image set as training samples, wherein the recognition model is used to recognize the object to be labeled in interface images.
12. The method according to any one of claims 1-10, further comprising, before aggregating the set of interface images to be labeled using different similarity thresholds: performing frame extraction on a video to be labeled to obtain the set of interface images to be labeled.
13. The method of claim 12, wherein the video is a video of a game application and the object to be labeled is a user-operable control in an interface image.
14. An apparatus for labeling an interface image, comprising: a processing module configured to: aggregate a set of interface images to be labeled using different similarity thresholds to obtain a plurality of sub-image sets corresponding to each similarity threshold, wherein the similarity between interface images in each sub-image set corresponding to each similarity threshold is greater than or equal to the corresponding similarity threshold; and select an interface image from each sub-image set corresponding to a first similarity threshold as a first reference interface image, wherein the first similarity threshold is the minimum of the different similarity thresholds; and a transceiver module configured to receive a first labeling instruction of a user on an object to be labeled in each first reference interface image to obtain a first labeling result of the first similarity threshold; wherein the processing module is further configured to: select an interface image from each second sub-image set corresponding to a second similarity threshold as a second reference interface image, and select an interface image from each third sub-image set corresponding to a third similarity threshold as a third reference interface image, wherein the second similarity threshold is smaller than the third similarity threshold; label the second reference interface image in each second sub-image set using the first labeling result to obtain a second labeling result of the second similarity threshold; label each third reference interface image using the second labeling result; and acquire a mixed similarity between each third reference interface image and each other interface image, so as to label the other interface images in each third sub-image set according to the mixed similarity.
15. An electronic device, comprising at least one processor and a memory, wherein the memory stores computer-executable instructions, and the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the method of any one of claims 1-13.
16. A computer readable storage medium having stored therein computer executable instructions which, when executed by a processor, implement the method of any one of claims 1-13.
17. A computer program product comprising computer instructions which, when executed by a processor, implement the method of any of claims 1-13.
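
As a non-limiting illustration, the labeling dictionary recited in claim 7 can be sketched in Python. The sketch makes assumptions that are not part of the claims: images are plain grayscale pixel grids, a region position is an `(x, y, width, height)` tuple, and a simple mean-threshold hash stands in for the p-hash value (a real p-hash is typically DCT-based). The names `region_hash`, `similarity`, `build_label_dictionary`, and `transfer_labels` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), an assumed position format
Image = List[List[int]]          # rows of grayscale pixel values
Hash = Tuple[int, ...]


def region_hash(image: Image, box: Box) -> Hash:
    """Mean-threshold hash of one region; a simplified stand-in for a p-hash."""
    x, y, w, h = box
    pixels = [image[r][c] for r in range(y, y + h) for c in range(x, x + w)]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)


def similarity(h1: Hash, h2: Hash) -> float:
    """Fraction of matching bits between two hashes of equal length."""
    return sum(1 for a, b in zip(h1, h2) if a == b) / len(h1)


def build_label_dictionary(
    reference: Image, regions: Dict[Box, str]
) -> Dict[Box, Tuple[Hash, str]]:
    """Key: region position. Value: (hash of the region, label of the region)."""
    return {box: (region_hash(reference, box), label) for box, label in regions.items()}


def transfer_labels(
    dictionary: Dict[Box, Tuple[Hash, str]], target: Image, min_similarity: float
) -> Dict[Box, str]:
    """Copy a label to the target only where the same position looks alike."""
    transferred: Dict[Box, str] = {}
    for box, (ref_hash, label) in dictionary.items():
        if similarity(region_hash(target, box), ref_hash) > min_similarity:
            transferred[box] = label
    return transferred
```

Regions whose similarity does not exceed the threshold are left unlabeled in this sketch; the selection of the maximum similarity in claim 8 and the non-maximum suppression in claim 9 are not shown.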

Description

Interface image labeling method and device and electronic equipment

Technical Field

Embodiments of the present disclosure relate to the technical field of image annotation, and in particular to a method, an apparatus, and an electronic device for labeling interface images.

Background

The user interface (UI) of a game application includes text, UI controls, character images, and the like. In an automated testing scenario for a game application, a testing tool can recognize the UI through a machine learning model and automatically trigger the UI controls in it, thereby completing functional testing of the application's interface. The machine learning model is trained on UI-control labels on the application's UI, so labeling the UI controls is particularly important.

At present, UI controls in an interface image can be recognized and labeled using computer vision techniques, but the accuracy of such labeling depends on the amount of training data available to the computer vision model, so the labeling accuracy of the UI controls cannot be guaranteed.

Disclosure of Invention

Embodiments of the present disclosure provide a method, an apparatus, and an electronic device for labeling interface images, which can improve the accuracy of interface image labeling.
In a first aspect, an embodiment of the present disclosure provides a method for labeling an interface image, including: aggregating a set of interface images to be labeled to obtain at least one sub-image set, where the similarity between interface images in each sub-image set is greater than or equal to a similarity threshold; selecting an interface image in each sub-image set as a reference interface image; receiving a labeling instruction of a user on an object to be labeled in each reference interface image to obtain a labeling result; and labeling the remaining interface images in the at least one sub-image set using the labeling result.

In a second aspect, an embodiment of the present disclosure provides an apparatus for labeling an interface image, including:

a processing module, configured to aggregate a set of interface images to be labeled to obtain at least one sub-image set, where the similarity between the interface images in each sub-image set is greater than or equal to a similarity threshold, and to select one interface image from each sub-image set as a reference interface image; and

a transceiver module, configured to receive a labeling instruction of a user on an object to be labeled in each reference interface image to obtain a labeling result;

where the processing module is further configured to label the remaining interface images in the at least one sub-image set using the labeling result.

In a third aspect, an embodiment of the present disclosure provides an electronic device, including a processor and a memory, where the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory, so that the processor performs the method for labeling an interface image as described in the first aspect above.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method for labeling an interface image according to the first aspect.

In a fifth aspect, an embodiment of the present disclosure provides a computer program product comprising computer instructions which, when executed by a processor, implement the method for labeling an interface image as described in the first aspect above.

According to the method, apparatus, and electronic device for labeling interface images provided by the embodiments, some of the interface images are labeled manually, and the labeled interface images are then used to label the unlabeled ones. On the one hand, the matching similarity relationship among the interface images can be relied on to improve labeling accuracy; on the other hand, manual labeling ensures labeling accuracy, so using the manually labeled interface images to label the unlabeled interface images can improve labeling accuracy.

Drawings

In order to illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present disclosure, and a person of ordinary skill in the art may obtain other drawings from them without inventive effort.

FIG. 1 is a schematic illustration of an interface image;
FIG. 2 is a schematic illustration of two different game interface images;
FIG. 3A is a flowchart illu