CN-121366431-B - Large-model-based traditional Chinese painting image hierarchical semantic extraction and visualization method and system

Abstract

The invention discloses a large-model-based method and system for hierarchical semantic extraction and visualization of traditional Chinese painting images. Facial features, posture features and co-occurrence features of the figures in a traditional Chinese painting image are extracted at the micro, meso and macro levels, respectively. Based on these three levels of structural features, a visual language model and a large language model are then introduced to perform multi-level semantic extraction. This addresses the single-dimension semantic information extraction of the prior art, overcomes the limitation that traditional methods rely on shallow visual features and struggle to model deep semantics, and improves understanding of the historical context and artistic intent embodied in traditional Chinese paintings. In addition, the structural and semantic features of the images are displayed through a three-layer feature clustering view, a semantic association view, a time evolution analysis view and a detail presentation view, providing users with an interactive analysis platform and method that assists multi-level, in-depth semantic mining and exploration of traditional Chinese painting images.

Inventors

CHEN WEI
GU XIAOYAN
ZHANG WEI
WANG YIFANG

Assignees

Zhejiang University (浙江大学)

Dates

Publication Date: 2026-05-08
Application Date: 2025-12-23

Claims (6)

1. A large-model-based method for hierarchical semantic extraction and visualization of traditional Chinese painting images, characterized by comprising the following steps: performing multi-level structural feature extraction on a traditional Chinese painting image to extract, respectively, facial features, posture features and co-occurrence features of the figures in the image; performing multi-level semantic extraction on the image based on a visual language model and a large language model to obtain the theme category and the feature category of the image; and, based on the multi-level structural feature extraction result and the multi-level semantic extraction result, constructing and displaying a three-layer feature clustering view, a semantic association view, a time evolution analysis view and a detail presentation view; wherein extracting the facial features of the figures comprises: selecting 42 facial key points from the 68-point facial key point annotation scheme as target key points, the 42 key points comprising key points 18-27, 37-48 and 49-68 of that scheme; performing facial key point detection on the image with a facial key point estimation model, using the target key points as detection targets, to obtain the facial key points of the figures; and computing the final facial features from the coordinates of the obtained facial key points; wherein extracting the co-occurrence features of the figures comprises: performing entity detection on the image with an object detection model to obtain the object image entities in the image; screening the obtained object image entities to remove incomplete entities and retain complete ones; combining the screened entities pairwise into object image entity pairs and filtering out frequently occurring pairs that carry insufficient information; and using the Euclidean distance to represent the spatial relationship between the entities of each pair, the final co-occurrence feature being expressed as $\{(o_i, o_j, d_{ij}) \mid o_i, o_j \in O\}$, where $o_i$ and $o_j$ are the names of object image entities in the image, $d_{ij}$ is the normalized relative distance between $o_i$ and $o_j$, and $O$ is the set of object image entities obtained by entity detection; and wherein performing multi-level semantic extraction on the image based on the visual language model and the large language model to obtain the theme category and the feature category of the image comprises: generating a natural language description of the image with the visual language model, based on the multi-level structural feature extraction result; performing clustering modeling of the semantic themes in the images with the large language model, based on the natural language descriptions, to obtain a theme feature classification system; and automatically classifying and interpreting the image based on the theme feature classification system to obtain the theme category and the feature category of the image.
2. The large-model-based method for hierarchical semantic extraction and visualization of traditional Chinese painting images according to claim 1, wherein computing the final facial features from the coordinates of the obtained facial key points comprises: calculating the inclination of the eyebrows, eyes and mouth using the formula [formula omitted], where [symbol omitted] is the direction angle between selected facial key points; calculating the expression level using the formula [formula omitted], where [symbols omitted] denote the degree of eye opening and the degree of mouth opening, respectively; and expressing the final facial features as [formula omitted].
3. The large-model-based method for hierarchical semantic extraction and visualization of traditional Chinese painting images according to claim 1, wherein extracting the posture features of the figures in the traditional Chinese painting image comprises: performing human body key point detection on the image with a human body key point estimation model, using 17 human body key points as detection targets, to obtain the skeletal key points of the figures, each skeletal key point being expressed as $p_i = (x_i, y_i)$, where $x_i$ and $y_i$ are its horizontal and vertical coordinates, respectively; calculating the bone length as the Euclidean distance between connected skeletal key points $p_i$ and $p_j$, by the formula $l_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2},\ (i, j) \in E$, where $E$ denotes the predefined skeletal key point connection relationship; calculating the bone orientation as the direction angle between each connected skeletal key point pair and the horizontal axis, by the formula [formula omitted]; and expressing the final posture feature as [formula omitted].
4. The large-model-based method for hierarchical semantic extraction and visualization of traditional Chinese painting images according to claim 1, wherein performing clustering modeling of the semantic themes in the traditional Chinese painting images with the large language model, based on the natural language descriptions of the images, to obtain the theme feature classification system comprises: randomly sampling the traditional Chinese painting image data in batches and extracting candidate themes; calling the large language model (LLM) to perform semantic theme clustering on each batch of images, generating cluster labels and classification reasons; aggregating the candidate themes extracted from all batches and re-clustering them for refinement; merging similar theme labels and removing ambiguous theme labels to construct an initial theme feature classification system; and refining and updating the initial theme feature classification system through manual expert verification and model-assisted optimization to obtain the final theme feature classification system.
5. A large-model-based system for hierarchical semantic extraction and visualization of traditional Chinese painting images, characterized by comprising: a multi-level structural feature extraction device, configured to perform multi-level structural feature extraction on a traditional Chinese painting image and extract, respectively, facial features, posture features and co-occurrence features of the figures in the image; a multi-level semantic extraction device, configured to perform multi-level semantic extraction on the image based on a visual language model and a large language model to obtain the theme category and the feature category of the image; and a multi-level semantic visualization device, configured to construct and display a three-layer feature clustering view, a semantic association view, a time evolution analysis view and a detail presentation view based on the multi-level structural feature extraction result and the multi-level semantic extraction result; wherein the multi-level structural feature extraction device is specifically configured to: select 42 facial key points from the 68-point facial key point annotation scheme as target key points, the 42 key points comprising key points 18-27, 37-48 and 49-68 of that scheme; perform facial key point detection on the image with a facial key point estimation model, using the target key points as detection targets, to obtain the facial key points of the figures; compute the final facial features from the coordinates of the obtained facial key points; perform entity detection on the image with an object detection model to obtain the object image entities in the image; screen the obtained object image entities to remove incomplete entities and retain complete ones; combine the screened entities pairwise into object image entity pairs and filter out frequently occurring pairs that carry insufficient information; and use the Euclidean distance to represent the spatial relationship between the entities of each pair, the final co-occurrence feature being expressed as $\{(o_i, o_j, d_{ij}) \mid o_i, o_j \in O\}$, where $o_i$ and $o_j$ are the names of object image entities in the image, $d_{ij}$ is the normalized relative distance between $o_i$ and $o_j$, and $O$ is the set of object image entities obtained by entity detection; and wherein the multi-level semantic extraction device is specifically configured to: generate a natural language description of the image with the visual language model, based on the multi-level structural feature extraction result; perform clustering modeling of the semantic themes in the images with the large language model, based on the natural language descriptions, to obtain a theme feature classification system; and automatically classify and interpret the image based on the theme feature classification system to obtain the theme category and the feature category of the image.
6. The large-model-based system for hierarchical semantic extraction and visualization of traditional Chinese painting images according to claim 5, wherein the multi-level semantic visualization device comprises: a feature distribution view module, configured to generate a facial feature clustering sub-view, a posture feature clustering sub-view and a co-occurrence feature clustering sub-view based, respectively, on the facial features, posture features and co-occurrence features of the figures extracted by the multi-level structural feature extraction device; a natural language search module, configured to generate a natural language search box, display it in the three-layer feature clustering view, perform feature matching on the natural language text entered in the search box, and return the most relevant set of traditional Chinese painting images according to the feature matching result; a semantic association view generation module, configured to generate the semantic association view based on the multi-level structural feature extraction result and the multi-level semantic extraction result; a time evolution analysis view generation module, configured to generate the time evolution analysis view based on the multi-level structural feature extraction result, the multi-level semantic extraction result and the creation years of the traditional Chinese painting works; and a detail presentation view generation module, configured to generate the detail presentation view based on the multi-level semantic extraction result and the metadata of the traditional Chinese painting works.
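
As a non-limiting illustration of claims 1 and 5, the following Python sketch lists the 42 target facial key points and builds co-occurrence triples of the form (entity name, entity name, normalized distance). The function name `cooccurrence_features`, the use of bounding-box centres as entity positions, normalization by the image diagonal and the `skip_pairs` argument are assumptions made for this example only; the claims do not specify them.

```python
from itertools import combinations
from math import hypot

# Key points 18-27, 37-48 and 49-68 of the 68-point scheme (1-based).
# In the commonly used numbering these ranges cover the eyebrows,
# the eyes and the mouth: 10 + 12 + 20 = 42 points.
TARGET_FACE_POINTS = [*range(18, 28), *range(37, 49), *range(49, 69)]


def cooccurrence_features(entities, image_size, skip_pairs=frozenset()):
    """Build (name_i, name_j, d_ij) triples for every retained entity pair.

    entities   -- list of (name, (cx, cy)); box centres in pixels (assumed)
    image_size -- (width, height) of the painting image in pixels
    skip_pairs -- frozensets of two names to drop; stands in for the
                  "frequent, low-information" pair filter (hypothetical)
    """
    width, height = image_size
    diagonal = hypot(width, height)  # assumed normalization constant
    features = []
    for (name_a, (ax, ay)), (name_b, (bx, by)) in combinations(entities, 2):
        if frozenset((name_a, name_b)) in skip_pairs:
            continue
        # Euclidean distance between the two entities, scaled to [0, 1].
        distance = hypot(ax - bx, ay - by) / diagonal
        features.append((name_a, name_b, distance))
    return features


if __name__ == "__main__":
    detected = [("scholar", (0, 0)), ("pine", (30, 40)), ("rock", (60, 80))]
    for triple in cooccurrence_features(detected, (60, 80)):
        print(triple)
```

Dividing by the image diagonal keeps every distance between 0 and 1 regardless of image resolution; any other normalization that makes paintings of different sizes comparable would serve the same purpose.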

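As a non-limiting illustration of the bone length and bone orientation recited in claim 3, the following Python sketch computes both quantities for each connected pair of skeletal key points. The COCO-style ordering of the 17 key points, the edge list `SKELETON_EDGES`, the use of `atan2` for the direction angle and the concatenated output layout are assumptions made for this example; the claim only states that the connection relationship E is predefined.

```python
from math import atan2, hypot

# Hypothetical connection relationship E, using COCO-style indices:
# 5/6 shoulders, 7/8 elbows, 9/10 wrists, 11/12 hips, 13/14 knees,
# 15/16 ankles (indices 0-4 are the nose, eyes and ears).
SKELETON_EDGES = [
    (5, 7), (7, 9), (6, 8), (8, 10), (5, 6), (5, 11),
    (6, 12), (11, 12), (11, 13), (13, 15), (12, 14), (14, 16),
]


def posture_features(keypoints, edges=SKELETON_EDGES):
    """Return bone lengths followed by bone orientations.

    keypoints -- sequence of 17 (x, y) pairs, one per skeletal key point
    edges     -- index pairs (i, j) describing which key points are connected
    """
    lengths = []
    orientations = []
    for i, j in edges:
        (xi, yi), (xj, yj) = keypoints[i], keypoints[j]
        # Bone length: Euclidean distance between the connected key points.
        lengths.append(hypot(xj - xi, yj - yi))
        # Bone orientation: angle to the horizontal axis, in radians.
        orientations.append(atan2(yj - yi, xj - xi))
    return lengths + orientations


if __name__ == "__main__":
    points = [(0.0, 0.0)] * 17
    points[5] = (0.0, 0.0)  # left shoulder
    points[7] = (3.0, 4.0)  # left elbow
    print(posture_features(points, edges=[(5, 7)]))
```

Note that image coordinates usually have the vertical axis pointing downward, so the sign of the angle depends on the coordinate convention of the key point estimation model.
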
Description

Large-model-based traditional Chinese painting image hierarchical semantic extraction and visualization method and system

Technical Field

The invention relates to the technical field of image data processing, and in particular to a large-model-based method and system for hierarchical semantic extraction and visualization of traditional Chinese painting images.

Background

Traditional Chinese painting is created with tools and materials such as the brush, ink and rice paper. It emphasizes artistic conception and vividness of spirit, and its subject matter covers categories such as landscapes, flowers and birds, and figures. As an important vehicle of Chinese cultural heritage, it is widely used in art exhibition, education and cultural digitization. In traditional Chinese paintings, figures are core visual elements that are commonly used to express the artist's intent, depict historical scenes and convey social ideas, and they often carry the key semantic and contextual information of the whole painting. Accurately identifying and analyzing the figure elements in a painting and the multi-level semantics they carry is therefore of great value for the intelligent understanding and in-depth digital modeling of traditional Chinese painting.

To understand art images, current techniques mainly adopt traditional computer vision methods such as image classification, style recognition and object detection. Most of these methods rely on low-level visual features such as color, texture and contour, or use convolutional neural networks to extract deep representations for style analysis or image generation tasks.
In analyzing figure elements, some methods introduce emotion recognition and pose estimation techniques, for example extracting key points with OpenPose to determine a figure's motion state, or recognizing facial expressions and emotional cues from local features. These methods achieve a preliminary perception of the static structure of the figures in a picture. In addition, visual language models (VLMs) have emerged in recent years and offer a new approach to fusing image and text information. Models such as CLIP are trained on massive numbers of image-text pairs so that images and natural language are aligned in a unified semantic space. Such models are gradually being applied to the classification and retrieval of museum collections, improving the semantic structuring and accessibility of artworks to some extent.

Meanwhile, several systems address the interpretation of image content. The DARK system detects repeated themes and symbolic structures in images, giving researchers tool support for identifying common compositional patterns in ancient images. The InTaVia system builds a transnational cultural knowledge graph that helps experts establish semantic connections among historical figures, events and visual art, and supports the reconstruction of cultural narratives across works. The Virtual Rosetta project attempts to cluster historical images visually and uses virtual reality to enhance immersive display. The Arnold system targets the display and management of multi-dimensional image collections and helps users organize and classify cultural image resources.
Although the above systems provide some analytical support along the temporal, spatial and stylistic dimensions of art images, most of their functions still rely on external image metadata (such as the year of creation, the location of the work and artist information) and lack the ability to systematically model the semantic structures inside an image (such as figures' expressions, action states and relationships between figures and the background). These systems also generally lack mechanisms for the structured extraction and semantic reconstruction of key visual elements in images (particularly figure elements), which makes it difficult to meet users' needs for in-depth exploration in cultural understanding, style analysis, narrative reasoning and the like.

In summary, the following technical problems remain to be solved in the semantic understanding and visual analysis of traditional Chinese painting images. First, semantic information is extracted along a single dimension, and multi-level structural features such as figures' facial expressions, body postures and co-occurrence relationships with background objects are often ignored. Second, a unified modeling mechanism for semantic features is lacking; semantic representations are scattered and unsystematic, and therefore lack hierarchy and completeness. Third, the semantics embodied in traditional Chinese paintings depend heavily on specific historical and cultural backgrounds, the tradit