CN-122019770-A - File classification method, apparatus, storage medium, and program product
Abstract
The embodiments of the present application provide a file classification method, a device, a storage medium, and a program product. The method comprises: extracting feature information of a first file stored in a network disk; and determining a classification label of the first file by using a classification model according to the feature information of the first file, wherein the classification label is a hierarchical label having at least two levels in a hierarchical relationship, and is used for displaying the first file by category and/or searching for the first file. The method achieves fine-grained classification of network disk files, which simplifies users' file management operations and improves operation efficiency.
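The abstract leaves the label format open. As a minimal sketch, assuming a hierarchical label is stored as a tuple of category names ordered from the highest level down (for example `("Animal", "Pet", "Cat")`, a hypothetical taxonomy), files can be grouped into a nested tree for classified display. The function name `build_label_tree` and the node layout are illustrative assumptions, not taken from the application.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a hierarchical label is a tuple of category
# names ordered from the top level to the most specific level.
Label = Tuple[str, ...]


def build_label_tree(files: Dict[str, List[Label]]) -> dict:
    """Group file names under every level of each of their hierarchical labels.

    Returns a nested dict in which each node has a "files" list and a
    "children" dict keyed by the next-level category name.
    """
    def new_node() -> dict:
        return {"files": [], "children": {}}

    root = new_node()
    for name, labels in files.items():
        for label in labels:
            node = root
            for level in label:
                node = node["children"].setdefault(level, new_node())
                # Record the file at every level so it also appears under
                # its parent categories, not only under the leaf.
                if name not in node["files"]:
                    node["files"].append(name)
    return root
```

With `IMG_001.jpg` labelled `("Animal", "Pet", "Cat")` and `IMG_002.jpg` labelled `("Animal", "Pet", "Dog")`, both files appear under the `Animal` and `Pet` nodes while only `IMG_001.jpg` appears under `Cat`, so a user can browse at whichever level is convenient.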
Inventors
- GU WEI
Assignees
- 优视科技(中国)有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20251208
Claims (15)
- 1. A file classification method, comprising: extracting feature information of a first file stored in a network disk; and determining a classification label of the first file by using a classification model according to the feature information of the first file, wherein the classification label comprises a hierarchical label having at least two levels in a hierarchical relationship, and the classification label is used for displaying the first file by category and/or searching for the first file.
- 2. The method of claim 1, wherein the feature information comprises type information and at least one of image information, path information, at least part of the file content, and metadata information.
- 3. The method of claim 2, wherein the feature information comprises the type information and the image information, the image information comprising original image data obtained by image-decoding the first file; and determining the classification label of the first file by using the classification model according to the feature information of the first file comprises: when the type information is a picture type, performing object classification recognition on the first file by using the classification model based on the original image data, so as to obtain an object classification label having at least two levels in a hierarchical relationship; wherein the classification label of the first file comprises the object classification label.
- 4. The method of claim 3, wherein determining the classification label of the first file by using the classification model according to the feature information of the first file further comprises: when the type information is a picture type, performing face classification recognition on the first file by using the classification model based on the original image data to obtain a face classification label, wherein the face classification label indicates whether the first file contains a face and, if it does, the face information; and the classification label of the first file comprises the face classification label.
- 5. The method of claim 4, wherein the feature information further comprises path information, and the image information further comprises character information obtained by performing optical character recognition on the first file; and determining the classification label of the first file by using the classification model according to the feature information of the first file further comprises: when the type information is a picture type and the object classification label comprises a material label, performing material classification recognition on the first file by using the classification model based on the character information, the original image data, and the path information, so as to obtain a material classification label having at least two levels in a hierarchical relationship; wherein the classification label of the first file further comprises the material classification label.
- 6. The method of claim 3, wherein the feature information further comprises metadata information including at least one of a shooting time, a shooting device, and a shooting location, and the classification label of the first file further comprises the metadata information.
- 7. The method of claim 2, wherein the feature information comprises the type information and the path information, the path information comprising the path of the first file and the file names of other files in the same directory level as the first file, and the type information of the first file indicates a document type, a video type, or an audio type; and determining the classification label of the first file by using the classification model according to the feature information of the first file comprises: when the type information is a document type, a video type, or an audio type, determining the classification label of the first file by using the classification model based on the path information.
- 8. The method of claim 7, wherein the feature information further comprises the at least part of the file content; determining the classification label of the first file by using the classification model based on the path information comprises: determining the classification label of the first file by using the classification model according to the path information and the at least part of the file content; and the at least part of the file content is extracted as follows: when the type information is a document type, extracting a preset number of words of text from the first file, or extracting a table of contents or an abstract of the first file; when the type information is a video type, extracting at least part of the subtitles of the first file, extracting auxiliary enhancement information from at least some frames of the first file, or converting at least part of the speech in the first file into text; and when the type information is an audio type, converting at least part of the speech in the first file into text.
- 9. The method according to claim 4 or 5, wherein performing object classification recognition on the first file by using the classification model based on the original image data to obtain the object classification label having at least two levels comprises: inputting the original image data into an object classification model to obtain the object classification label having at least two levels, wherein the object classification model is trained on a plurality of picture samples in batches, and the sample labels of the picture samples in each batch comprise object classification labels that belong to the same highest-level object category and have at least two levels in a hierarchical relationship.
- 10. The method according to any one of claims 1-8, further comprising: acquiring a plurality of second files whose classification labels comprise a target label, the target label being a face or a pet; and clustering the plurality of second files to obtain at least one target label cluster.
- 11. The method of claim 10, further comprising: acquiring a third file whose classification label comprises the target label, and comparing the similarity between the third file and the at least one target label cluster; and adding the third file to the target label cluster with the highest similarity among the at least one target label cluster, or generating a new target label cluster comprising the third file.
- 12. The method according to any one of claims 1-8, further comprising: in response to a file search instruction, converting the search text in the file search instruction into a structured search label; and retrieving, from the network disk, files matching the structured search label based on the classification labels of the files in the network disk.
- 13. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the electronic device to perform the method of any one of claims 1-12.
- 14. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method of any one of claims 1-12.
- 15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-12.
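Claim 8 lists alternative sources of partial file content for each file type but does not say how to choose among them. As a minimal sketch, assuming each source is supplied as a zero-argument loader (wrapping, say, a subtitle parser or a speech-to-text engine, neither shown) and that sources are tried in a fixed preference order, the first non-empty source is used and capped at a preset word count. The source names, the preference order, and the default of 200 words are hypothetical choices, not values from the application.

```python
from typing import Callable, Dict, List

# Hypothetical source names per file type, tried in this assumed order.
SOURCE_ORDER: Dict[str, List[str]] = {
    "document": ["toc", "abstract", "body"],
    "video": ["subtitles", "enhancement_info", "speech"],
    "audio": ["speech"],
}


def first_words(text: str, max_words: int) -> str:
    """Keep at most max_words whitespace-separated words."""
    return " ".join(text.split()[:max_words])


def extract_partial_content(
    file_type: str, sources: Dict[str, Callable[[], str]], max_words: int = 200
) -> str:
    """Return partial content from the first available, non-empty source.

    `sources` maps a source name to a loader returning text. Whatever
    source is chosen, its text is capped at max_words words. Types with no
    entry in SOURCE_ORDER (e.g. pictures) yield an empty string.
    """
    for name in SOURCE_ORDER.get(file_type, []):
        loader = sources.get(name)
        if loader is None:
            continue
        text = loader()
        if text.strip():
            return first_words(text, max_words)
    return ""
```

Passing loaders rather than precomputed strings means expensive steps such as speech-to-text only run when no cheaper source (a table of contents or subtitles) is available.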
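Claim 9 constrains only how training batches are formed: every picture sample in a batch carries a label under the same highest-level object category. The sketch below illustrates that grouping constraint alone, assuming samples are `(picture_id, hierarchical_label)` pairs with labels stored as top-down tuples; the batch size, the seeded shuffling, and the function name `batches_by_top_level` are assumptions, and the training loop itself is not shown.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Label = Tuple[str, ...]          # e.g. ("Animal", "Pet", "Cat"); hypothetical
Sample = Tuple[str, Label]       # (picture id, hierarchical label)


def batches_by_top_level(
    samples: Sequence[Sample], batch_size: int, seed: int = 0
) -> List[List[Sample]]:
    """Split samples into batches whose labels share one top-level category."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    groups: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        groups[sample[1][0]].append(sample)  # key on the highest level only

    rng = random.Random(seed)
    batches: List[List[Sample]] = []
    for top_level in sorted(groups):
        group = list(groups[top_level])
        rng.shuffle(group)  # shuffle within a category, never across categories
        for start in range(0, len(group), batch_size):
            batches.append(group[start:start + batch_size])
    rng.shuffle(batches)  # vary the order in which categories are visited
    return batches
```

Shuffling the list of batches, rather than the samples globally, keeps each batch homogeneous at the top level while still mixing categories over an epoch.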
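Claims 10 and 11 cluster files tagged with a face or pet target label and then place each new file into the most similar cluster or a new one, without naming a clustering algorithm, a feature representation, or a similarity threshold. The sketch below swaps in a simple greedy, threshold-based (leader-style) clustering over embedding vectors with cosine similarity; the embeddings (assumed to come from some face or pet recognition model), the class name `TargetClusters`, and the 0.8 default threshold are all hypothetical. Feeding the second files one by one covers claim 10, and later calls for a third file cover claim 11.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class TargetClusters:
    """Greedy threshold clustering of target-label embeddings (hypothetical)."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.members: List[List[str]] = []      # file ids per cluster
        self.centroids: List[List[float]] = []  # mean embedding per cluster

    def add(self, file_id: str, embedding: Sequence[float]) -> int:
        """Join the most similar cluster, or start a new one; return its index."""
        best, best_sim = -1, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(embedding, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            n = len(self.members[best])
            # Update the centroid as the running mean of member embeddings.
            self.centroids[best] = [
                (c * n + e) / (n + 1) for c, e in zip(self.centroids[best], embedding)
            ]
            self.members[best].append(file_id)
            return best
        self.centroids.append(list(embedding))
        self.members.append([file_id])
        return len(self.members) - 1
```

Because the result depends on insertion order, a batch algorithm (for example agglomerative clustering) may be preferable for the initial grouping, with the incremental `add` step kept for newly uploaded files.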
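Claim 12 converts free search text into a structured search label and matches it against stored classification labels, but does not say how the conversion is done (a language model would be one option). As a minimal sketch, a hypothetical keyword table stands in for that conversion, and a file matches when every requested label is a prefix of one of its hierarchical labels, so a search for a parent category also finds files in its subcategories. `KEYWORD_LABELS`, `to_structured_label`, and `search` are illustrative names only.

```python
from typing import Dict, List, Tuple

Label = Tuple[str, ...]

# Hypothetical keyword-to-label table standing in for the text-to-label step.
KEYWORD_LABELS: Dict[str, Label] = {
    "cat": ("Animal", "Pet", "Cat"),
    "pet": ("Animal", "Pet"),
    "invoice": ("Material", "Receipt", "Invoice"),
}


def to_structured_label(query: str) -> List[Label]:
    """Map each recognised word of the search text to a hierarchical label."""
    return [KEYWORD_LABELS[w] for w in query.lower().split() if w in KEYWORD_LABELS]


def search(files: Dict[str, List[Label]], query: str) -> List[str]:
    """Return files whose labels satisfy every structured label in the query.

    A stored label satisfies a query label when the query label is a prefix
    of it, e.g. ("Animal", "Pet") matches ("Animal", "Pet", "Cat").
    """
    wanted = to_structured_label(query)
    if not wanted:
        return []
    return [
        name
        for name, labels in files.items()
        if all(any(lab[:len(q)] == q for lab in labels) for q in wanted)
    ]
```

For example, the query `"my pet photos"` maps to `("Animal", "Pet")` and returns every file labelled with any pet category, while words absent from the table are ignored.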
Description
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a file classification method, device, storage medium, and program product.
Background
A network disk, also called a network USB drive or network hard disk, is a network-based online storage service; in essence, the network disk service provider allocates the hardware resources of its servers to users. A network disk provides users with file management functions such as storage, sharing, access, and backup, and users can manage and edit the files in it over the network. The types of data that users store in network disks are increasingly diverse, including audio and video, pictures, documents, and other forms, and the volume of these data assets keeps growing. This makes management challenging: searching, editing, classifying, and cleaning up files is inconvenient for users, and the operations involved are cumbersome and inefficient. Mainstream network disk products currently offer basic file classification, but most of them classify files only by simple attributes such as time and location, which still does not solve the problems of cumbersome and inefficient operation.
Disclosure of Invention
Aspects of the present application provide a file classification method, device, storage medium, and program product that achieve fine-grained classification of network disk files, thereby simplifying users' file management operations and improving operation efficiency.
In a first aspect, an embodiment of the present application provides a file classification method, comprising: extracting feature information of a first file stored in a network disk; and determining a classification label of the first file by using a classification model according to the feature information of the first file, wherein the classification label comprises a hierarchical label having at least two levels in a hierarchical relationship, and the classification label is used for displaying the first file by category and/or searching for the first file.
In some implementations, the feature information includes type information and at least one of image information, path information, at least part of the file content, and metadata information.
In some implementations, the feature information includes the type information and the image information, the image information including original image data obtained by image-decoding the first file; and determining the classification label of the first file by using the classification model according to the feature information of the first file includes: when the type information is a picture type, performing object classification recognition on the first file by using the classification model based on the original image data, so as to obtain an object classification label having at least two levels in a hierarchical relationship; the classification label of the first file includes the object classification label.
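In the claims, which classification steps run depends on the file's type information: pictures go through object and face classification, material classification only follows when an object label indicates material, and documents, videos, and audio are classified from path information, optionally with partial content. The sketch below expresses that gating as a plan of steps and their feature inputs, assuming a top-down tuple label format and a hypothetical top-level category named `"Material"`; every step and feature name is illustrative, and the models themselves are not shown.

```python
from typing import List, Sequence, Tuple

Label = Tuple[str, ...]
Step = Tuple[str, List[str]]  # (step name, names of the feature inputs it uses)


def plan_steps(file_type: str, object_labels: Sequence[Label] = ()) -> List[Step]:
    """Choose classification steps from the file's type information.

    object_labels are results of an earlier object-classification pass;
    they decide whether the material step runs. All names are hypothetical.
    """
    if file_type == "picture":
        steps: List[Step] = [
            ("object_classification", ["raw_image"]),
            ("face_classification", ["raw_image"]),
        ]
        # Material classification only when some object label contains "Material".
        if any("Material" in label for label in object_labels):
            steps.append(("material_classification", ["ocr_text", "raw_image", "path"]))
        # Shooting metadata is attached to the label rather than predicted.
        steps.append(("attach_metadata", ["shooting_time", "shooting_device", "shooting_location"]))
        return steps
    if file_type in ("document", "video", "audio"):
        # Path information, optionally combined with partial file content.
        return [("path_content_classification", ["path", "partial_content"])]
    return []
```

Since material classification depends on object labels, a caller would run the object step first and call `plan_steps` again with its output; keeping the plan separate from the models makes the type conditions easy to test in isolation.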
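The object classification label above must span at least two hierarchical levels. One simple way to obtain such a label, assuming a hypothetical taxonomy stored as a child-to-parent map and a model that scores leaf categories, is to take the best-scoring leaf and walk up to the root. The application does not say its model works this way; it could equally predict each level directly. `PARENT` and `leaf_to_hierarchical` are illustrative names.

```python
from typing import Dict, Optional, Tuple

# Hypothetical taxonomy: category -> parent category (None at the top level).
PARENT: Dict[str, Optional[str]] = {
    "Animal": None, "Pet": "Animal", "Cat": "Pet", "Dog": "Pet",
    "Material": None, "Certificate": "Material", "ID card": "Certificate",
}


def leaf_to_hierarchical(leaf_scores: Dict[str, float]) -> Tuple[str, ...]:
    """Pick the best-scoring leaf category and expand it into a top-down path."""
    if not leaf_scores:
        raise ValueError("leaf_scores must not be empty")
    leaf = max(leaf_scores, key=lambda name: leaf_scores[name])
    path = []
    node: Optional[str] = leaf
    while node is not None:
        path.append(node)
        node = PARENT[node]  # raises KeyError for a category missing from the taxonomy
    return tuple(reversed(path))
```

Deriving the upper levels from the taxonomy guarantees that the levels of a label are always mutually consistent (a "Cat" can never end up under "Material").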
In some implementations, determining the classification label of the first file by using the classification model according to the feature information of the first file further includes: when the type information is a picture type, performing face classification recognition on the first file by using the classification model based on the original image data to obtain a face classification label, wherein the face classification label indicates whether the first file contains a face and, if it does, the face information; the classification label of the first file includes the face classification label.
In some implementations, the feature information further includes path information, and the image information further includes character information obtained by performing optical character recognition on the first file; and determining the classification label of the first file by using the classification model according to the feature information of the first file further includes: when the type information is a picture type and the object classification label includes a material label, performing material classification recognition on the first file by using the classification model based on the character information, the original image data, and the path information, so as to obtain a material classification label having at least two levels in a hierarchical relationship; the classification label of the first file further includes the material classification label.
In some implementations, the feature information further includes metadata information including at least one of a shooting time, a shooting device, and a shooting location, a