US-12625910-B2 - Hand-drawing shape-based document retrieval


Abstract

The present disclosure provides methods, apparatuses and computer program products for hand-drawing shape-based document retrieval. An input hand-drawing shape may be obtained. A hand-drawing shape feature of the hand-drawing shape may be extracted through a feature extracting model. At least one target document may be retrieved by using the hand-drawing shape feature and a feature index library associated with a plurality of candidate documents, at least one document page in the target document locally matching the hand-drawing shape.

Inventors

Ran Bi
Yingxia LI
Yuwang Wang
Xiaoyi Zhang

Assignees

MICROSOFT TECHNOLOGY LICENSING, LLC

Dates

Publication Date: 2026-05-12
Application Date: 2022-11-23
Priority Date: 2022-01-27

Claims (20)

1. A method for hand-drawing shape-based document retrieval, comprising: obtaining an input arbitrary non-textual hand-drawing shape comprising at least one geometric or graphical element that is not a handwritten word or text; extracting, through a feature extracting model without using word recognition or text-based processing, a hand-drawing shape feature of the hand-drawing shape; preprocessing the hand-drawing shape, to scale the hand-drawing shape to a predetermined size; and retrieving at least one target document by using the hand-drawing shape feature and a feature index library associated with a plurality of candidate documents, at least one document page in the target document locally matching the hand-drawing shape by comparing the hand-drawing shape feature to a document page image feature in a local region of the document page image, wherein the at least one target document is retrieved based on a match between the hand-drawing shape feature and a target document shape feature in the feature index library associated with the at least one document, and wherein the retrieval is performed without recognizing or transcribing any words or text in the input hand-drawing shape.
2. The method of claim 1, wherein the extracting a hand-drawing shape feature comprises: extracting the hand-drawing shape feature of the preprocessed hand-drawing shape.
3. The method of claim 1, wherein the feature index library comprises a plurality of data items respectively corresponding to a plurality of document page images, and each data item at least comprises: a document page image feature extracted from a document page image corresponding to a document page in a candidate document; and an index of the document page image.
4. The method of claim 3, wherein the retrieving at least one target document comprises: calculating a similarity value between the hand-drawing shape feature and each document page image feature in the feature index library; ranking the plurality of document page images based on similarity values; and selecting at least one candidate document corresponding to at least one highest-ranked document page image, as the at least one target document.
5. The method of claim 1, further comprising: for each candidate document in the plurality of candidate documents, obtaining a group of document page images corresponding to the candidate document, each document page image in the group of document page images corresponding to a document page in a group of document pages included in the candidate document; and establishing, at least through the feature extracting model, the feature index library based on a plurality of groups of document page images respectively corresponding to the plurality of candidate documents.
6. The method of claim 5, wherein the establishing the feature index library comprises, for each document page image: extracting, through the feature extracting model, a document page image feature of the document page image; and storing the document page image feature and an index of the document page image together into the feature index library.
7. The method of claim 6, further comprising: preprocessing the document page image, to scale the document page image to a predetermined size, wherein the extracting a document page image feature comprises: extracting the document page image feature of the preprocessed document page image.
8. The method of claim 1, wherein the feature extracting model is based on deep convolutional neural network.
9. The method of claim 1, further comprising: training the feature extracting model by using a training dataset which includes a group of training samples, each training sample at least comprising a hand-drawing shape sample, a document page image sample and a similarity label.
10. The method of claim 9, further comprising: performing data augmentation to at least one training sample in the training dataset, to randomly change a hand-drawing shape sample and a document page image sample in the training sample.
11. The method of claim 10, wherein the data augmentation comprises at least one of: random cropping and scaling; random region erasing; random perspective converting; and random color jittering.
12. The method of claim 1, further comprising: presenting a retrieval result about the document page and/or the target document.
13. The method of claim 1, wherein the plurality of candidate documents are personal documents of a user and/or public documents on a network.
14. An apparatus for hand-drawing shape-based document retrieval, comprising: at least one processor; and a memory storing computer-executable instructions that, when executed, cause the at least one processor to: obtain an input arbitrary non-textual hand-drawing shape comprising at least one geometric or graphical element that is not a handwritten word or text; extract, through a feature extracting model without using word recognition or text-based processing, a hand-drawing shape feature of the hand-drawing shape; preprocess the hand-drawing shape, to scale the hand-drawing shape to a predetermined size; and retrieve at least one target document by using the hand-drawing shape feature and a feature index library associated with a plurality of candidate documents, at least one document page in the target document locally matching the hand-drawing shape by comparing the hand-drawing shape feature to a document page image feature in a local region of the document page image, wherein the at least one target document is retrieved based on a match between the hand-drawing shape feature and a target document shape feature in the feature index library associated with the at least one document, and wherein the retrieval is performed without recognizing or transcribing any words or text in the input hand-drawing shape.
15. The apparatus of claim 14, wherein the instructions to extract a hand-drawing shape feature comprises instructions to extract the hand-drawing shape feature of the preprocessed hand-drawing shape.
16. The apparatus of claim 14, wherein the feature index library comprises a plurality of data items respectively corresponding to a plurality of document page images, and each data item at least comprises: a document page image feature extracted from a document page image corresponding to a document page in a candidate document; and an index of the document page image.
17. At least one non-transitory machine-readable medium comprising instructions for hand-drawing shape-based document retrieval that, when executed by at least one processor, cause the at least one processor to: obtain an input arbitrary non-textual hand-drawing shape comprising at least one geometric or graphical element that is not a handwritten word or text; extract, through a feature extracting model without using word recognition or text-based processing, a hand-drawing shape feature of the hand-drawing shape; preprocess the hand-drawing shape, to scale the hand-drawing shape to a predetermined size; and retrieve at least one target document by using the hand-drawing shape feature and a feature index library associated with a plurality of candidate documents, at least one document page in the target document locally matching the hand-drawing shape by comparing the hand-drawing shape feature to a document page image feature in a local region of the document page image, wherein the at least one target document is retrieved based on a match between the hand-drawing shape feature and a target document shape feature in the feature index library associated with the at least one document, and wherein the retrieval is performed without recognizing or transcribing any words or text in the input hand-drawing shape.
18. The at least one non-transitory machine-readable medium of claim 17, wherein the instructions to extract a hand-drawing shape feature comprises instructions to extract the hand-drawing shape feature of the preprocessed hand-drawing shape.
19. The at least one non-transitory machine-readable medium of claim 17, wherein the feature index library comprises a plurality of data items respectively corresponding to a plurality of document page images, and each data item at least comprises: a document page image feature extracted from a document page image corresponding to a document page in a candidate document; and an index of the document page image.
20. The at least one non-transitory machine-readable medium of claim 17, further comprising instructions that cause the at least one processor to: for each candidate document in the plurality of candidate documents, obtain a group of document page images corresponding to the candidate document, each document page image in the group of document page images corresponding to a document page in a group of document pages included in the candidate document; and establish, at least through the feature extracting model, the feature index library based on a plurality of groups of document page images respectively corresponding to the plurality of candidate documents.
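
Claims 3, 4 and 6 together describe a feature index library made of data items (a document page image feature plus an index of the page image) and a retrieval step that scores, ranks and selects. The flow can be sketched roughly as follows. This is only an illustration under stated assumptions: cosine similarity stands in for the unspecified "similarity value", the feature vectors are hand-written placeholders rather than outputs of the feature extracting model, and the names `DataItem`, `doc_id`, `page_no`, `retrieve` and `top_k` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DataItem:
    """One entry of the feature index library: a page-image feature plus its index."""
    feature: Sequence[float]  # placeholder for a feature produced by the model
    doc_id: str               # hypothetical identifier of the candidate document
    page_no: int              # hypothetical page number within that document


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; an assumed choice, the claims only say 'similarity value'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_feature: Sequence[float],
             library: List[DataItem],
             top_k: int = 1) -> List[str]:
    """Score every page feature, rank pages, and return the documents that own
    the top_k highest-ranked pages (duplicates removed, rank order kept)."""
    scored = [(cosine_similarity(query_feature, item.feature), item)
              for item in library]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    targets: List[str] = []
    for _, item in scored[:top_k]:
        if item.doc_id not in targets:
            targets.append(item.doc_id)
    return targets


# Placeholder library: two pages of one document, one page of another.
library = [
    DataItem([1.0, 0.0, 0.0], "report.pptx", 1),
    DataItem([0.0, 1.0, 0.0], "report.pptx", 2),
    DataItem([0.6, 0.8, 0.0], "notes.docx", 1),
]
query = [0.1, 0.9, 0.0]  # placeholder hand-drawing shape feature

print(retrieve(query, library, top_k=2))  # ['report.pptx', 'notes.docx']
```

Ranking is done per page image while the result is reported per document, which mirrors the claim language: a document is selected because at least one of its pages ranks highest. A linear scan is used here for clarity; a real index over many pages would likely need an approximate nearest-neighbour structure, which the claims do not specify.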

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application is a U.S. National Stage Filing under 35 U.S.C. 371 of International Patent Application Serial No. PCT/US2022/050834, filed Nov. 23, 2022, and published as WO 2023/146625 A1 on Aug. 3, 2023, which claims priority to Chinese Application No. 202210100572.1, filed Jan. 27, 2022, which applications and publication are incorporated herein by reference in their entirety.

BACKGROUND

Users of various computing environments may desire to retrieve or search for information or content of interest through information retrieval services. For example, a user of an operating system may desire to find a specific document in local storage. Operating systems generally provide a document retrieval service for document retrieval so as to help users to find documents of interest locally. Moreover, for example, a network user may desire to use a search engine to find webpage content of interest on the network. A search engine may provide a search service to return search results in response to search queries from users. Some search engines may provide an image search service for image retrieval, which can return image search results matching an input image based on the input image provided by a user.

SUMMARY

This Summary is provided to introduce a selection of concepts that are further described below in the Detailed Description. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Embodiments of the present disclosure propose methods, apparatuses and computer program products for hand-drawing shape-based document retrieval. An input hand-drawing shape may be obtained. A hand-drawing shape feature of the hand-drawing shape may be extracted through a feature extracting model. At least one target document may be retrieved by using the hand-drawing shape feature and a feature index library associated with a plurality of candidate documents, at least one document page in the target document locally matching the hand-drawing shape.

It should be noted that the above one or more aspects comprise the features hereinafter fully described and particularly pointed out in the claims. The following description and the drawings set forth in detail certain illustrative features of the one or more aspects. These features are only indicative of the various manners in which the principles of various aspects may be employed, and this disclosure is intended to include all such aspects and their equivalents.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosed aspects will hereinafter be described in conjunction with the appended drawings that are provided to illustrate and not to limit the disclosed aspects.

FIG. 1 illustrates an example of a hand-drawing shape and a document page image according to an embodiment.
FIG. 2 illustrates an exemplary process for hand-drawing shape-based document retrieval according to an embodiment.
FIG. 3 illustrates an exemplary process for establishing a feature index library according to an embodiment.
FIG. 4 illustrates an exemplary process for training a feature extracting model according to an embodiment.
FIG. 5A to FIG. 5F illustrate examples of data augmentation according to embodiments.
FIG. 6A to FIG. 6C illustrate an exemplary user interface according to an embodiment.
FIG. 7 illustrates a flowchart of an exemplary method for hand-drawing shape-based document retrieval according to an embodiment.
FIG. 8 illustrates an exemplary apparatus for hand-drawing shape-based document retrieval according to an embodiment.
FIG. 9 illustrates an exemplary apparatus for hand-drawing shape-based document retrieval according to an embodiment.
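
The Summary's requirement that a document page "locally" match the hand-drawing shape can be illustrated with a toy comparison. The sketch below is not the disclosed model: it assumes a page is represented by one small feature vector per local region, scores a local match as the best cosine similarity over regions, and contrasts that with a whole-page score obtained by mean-pooling the regions. The function names, the region layout and the vectors are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def local_match(shape_feature: Sequence[float],
                region_features: List[Sequence[float]]) -> Tuple[float, int]:
    """Best similarity between the shape and any single page region.
    Returns (score, index of the best-matching region)."""
    best_score, best_region = -1.0, -1
    for idx, feat in enumerate(region_features):
        score = cosine(shape_feature, feat)
        if score > best_score:
            best_score, best_region = score, idx
    return best_score, best_region


def global_match(shape_feature: Sequence[float],
                 region_features: List[Sequence[float]]) -> float:
    """Similarity between the shape and the mean-pooled whole-page feature."""
    n = len(region_features)
    pooled = [sum(f[i] for f in region_features) / n
              for i in range(len(shape_feature))]
    return cosine(shape_feature, pooled)


# Hypothetical page: three text-like regions and one region holding a diagram.
regions = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
shape = [1.0, 0.0]  # placeholder feature of a drawn shape resembling the diagram

score, region = local_match(shape, regions)
print(score, region)                           # 1.0 2
print(round(global_match(shape, regions), 3))  # 0.316
```

In this toy setup the page scores highly under local matching because one region resembles the drawn shape, while the pooled whole-page score stays low because most of the page is unrelated content.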
DETAILED DESCRIPTION

The present disclosure will now be discussed with reference to several example implementations. It is to be understood that these implementations are discussed only for enabling those skilled in the art to better understand and thus implement the embodiments of the present disclosure, rather than suggesting any limitations on the scope of the present disclosure.

Generally, a document retrieval service provided by an operating system may receive a keyword input by a user and return a list of documents containing the keyword. Such document retrieval service usually can only support keyword-based document retrieval, but cannot support image retrieval, i.e., cannot support performing document retrieval based on input images.

Moreover, generally, an image search service provided by a search engine may receive an input image uploaded by a user as a search query, then search for an image matching the input image on the network, and provide a search result that contains the searched image or a link to it. Such image search service is based on image global matching, which aims to find an image that is exactly the same or as similar as possible to the input image, e.g., the entirety of the searched ima