CN-122024144-A - Dynamic advertisement putting method based on terminal side AI real-time content analysis

CN 122024144 A

Abstract

The invention relates to the technical field of digital advertising, in particular to a dynamic advertisement delivery method based on terminal-side AI real-time content analysis. The method constructs a lightweight scene monitoring model based on optical flow and inter-frame difference, calculates the motion saliency of a video stream in real time, and triggers key-frame extraction. A main analysis model performs instance segmentation on each key frame, generating a pixel-level mask and refined boundary for the target subject; region-of-interest cropping and background rejection are then performed on the basis of the mask. A hybrid semantic analysis model extracts, in parallel, explicit discrete labels and an implicit high-dimensional visual feature vector, capturing the texture, material, and style attributes of the target. Combining an end-side knowledge graph with a visual dominance score, the method performs semantic consistency checking and logical conflict resolution on the recognition results to generate a visual feature representation. Finally, on the basis of histogram analysis and edge detection, a visual interaction floating layer with adaptive transparency and position is rendered over the video layer, and digital advertisements are delivered to users.

Inventors

  • LAI YILING
  • ZHANG CHUNYAN
  • ZHANG YOULIANG

Assignees

  • 杭州华数智屏信息技术有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-04-13

Claims (10)

  1. A dynamic advertisement delivery method based on terminal-side AI real-time content analysis, characterized in that it is executed on the terminal device side and comprises the following steps: constructing a lightweight scene monitoring model based on inter-frame difference and an optical flow method, continuously monitoring the currently played video stream, extracting a key frame from the video stream when a preset trigger condition is met, and performing subject detection and segmentation on the key frame with a main analysis model to determine the boundary of the subject in the key frame; cropping a subject close-up image from the key frame according to the boundary, performing forward inference on the cropped close-up image with a hybrid semantic analysis model, and outputting semantic characterization data in parallel; and generating a privacy-protected commodity request based on the visual feature representation, sending the request to an advertisement server, receiving returned recommended commodity data, and rendering an intelligent interactive floating layer on the video playing interface of the terminal device.
  2. The method for dynamic advertisement delivery based on terminal-side AI real-time content analysis according to claim 1, wherein the judgment of the preset trigger condition comprises the following steps: adopting the distribution of magnitude and direction of the optical-flow field of inter-frame pixel displacement as input features, and computing an optical-flow statistical histogram over the current frame and the previous N frames; setting an optical-flow magnitude threshold and a direction entropy threshold, and jointly judging the histogram to detect in real time whether the video picture is in a stable state and free of motion blur; and adopting a lightweight focus-tracking algorithm to record the number of on-screen dwell frames of the same target, judging that a stable focus object exists when the dwell frame count exceeds a preset threshold.
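The joint judgment in claim 2 can be sketched in pure Python. This is a minimal illustration, not the patent's implementation: the bin count, thresholds, and the "stable if low magnitude or coherent motion" rule are illustrative assumptions.

```python
import math

def direction_entropy(flows, bins=8):
    # Histogram flow directions into `bins` angular sectors and
    # compute the Shannon entropy of that distribution (in bits).
    counts = [0] * bins
    for dx, dy in flows:
        ang = math.atan2(dy, dx) % (2 * math.pi)
        counts[min(int(ang / (2 * math.pi) * bins), bins - 1)] += 1
    total = sum(counts)
    probs = [c / total for c in counts if c]
    return -sum(p * math.log2(p) for p in probs)

def is_stable_scene(flows, mag_thresh=1.5, entropy_thresh=2.0):
    # Declare the picture "stable" when mean flow magnitude is low,
    # or when motion is directionally coherent (low entropy), e.g. a
    # smooth camera pan rather than chaotic multi-object motion.
    mean_mag = sum(math.hypot(dx, dy) for dx, dy in flows) / len(flows)
    return mean_mag < mag_thresh or direction_entropy(flows) < entropy_thresh

# A nearly static frame: tiny displacements in two directions.
calm = [(0.1, 0.0)] * 50 + [(0.0, 0.1)] * 50
# Chaotic motion: large flows spread evenly across 8 directions.
chaotic = [(3 * math.cos(0.1 + k * math.pi / 4),
            3 * math.sin(0.1 + k * math.pi / 4)) for k in range(8)] * 10
```

A real implementation would obtain `flows` from a dense optical-flow estimator; the two-threshold joint test mirrors the magnitude-plus-entropy judgment described in the claim.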
  3. The method for dynamic advertisement delivery based on terminal-side AI real-time content analysis according to claim 1, wherein the lightweight scene monitoring model is a three-layer convolutional neural network (CNN), and the training and use steps comprise: performing supervised training of the network on a video-frame dataset annotated with binary "analyzable" labels, so that the network outputs a continuous analyzability score between 0 and 1 that jointly evaluates picture stability, blur, subject saliency, and illumination suitability; key-frame extraction is triggered when the analyzability score computed in real time exceeds a preset threshold.
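The trigger side of claim 3 reduces to a scalar score compared against a threshold. The sketch below stands in for the CNN's regression head with a weighted combination of the four quality factors named in the claim; the weights and the 0.7 threshold are assumptions for illustration only.

```python
def analyzability_score(stability, sharpness, saliency, illumination,
                        weights=(0.3, 0.3, 0.2, 0.2)):
    # Each factor is assumed pre-normalised to [0, 1]; the real model
    # would regress this score directly from the frame pixels.
    s = sum(w * v for w, v in
            zip(weights, (stability, sharpness, saliency, illumination)))
    return max(0.0, min(1.0, s))

def should_extract_keyframe(score, threshold=0.7):
    # Key-frame extraction fires only above the preset threshold.
    return score > threshold
```

The point of gating on a single score is that the expensive segmentation model in claim 4 only runs on frames worth analyzing.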
  4. The method for dynamic advertisement delivery based on end-side AI real-time content analysis according to claim 1, wherein the main analysis model is a YOLOv n-seg model optimized for the mobile end, and determining the boundary of the subject in the key frame specifically comprises: performing a single instance-segmentation inference pass on the extracted key frame with the main analysis model, outputting the pixel-level masks and bounding-box coordinates of all identified subjects in the picture together with per-subject category confidences; screening out the core subject with the highest confidence whose category belongs to commodities; and determining the pixel-level boundary of the single target subject from the pixel-level mask of the core subject.
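The screening step in claim 4 is a simple filter-then-argmax over the segmentation outputs. A minimal sketch, where the commodity class set, the detection dict layout, and all field names are assumptions for illustration:

```python
# Assumed set of commodity-relevant class labels.
COMMODITY_CLASSES = {"handbag", "shoe", "watch", "bottle"}

def select_core_subject(detections):
    # detections: list of dicts with "cls" (label), "conf" (score),
    # "bbox" (x0, y0, x1, y1) and "mask" (pixel-level mask), as an
    # instance-segmentation model would emit per subject.
    candidates = [d for d in detections if d["cls"] in COMMODITY_CLASSES]
    if not candidates:
        return None  # nothing in the frame belongs to a commodity class
    return max(candidates, key=lambda d: d["conf"])

frame_detections = [
    {"cls": "person",  "conf": 0.95, "bbox": (0, 0, 50, 90), "mask": None},
    {"cls": "handbag", "conf": 0.88, "bbox": (40, 30, 70, 60), "mask": None},
    {"cls": "shoe",    "conf": 0.41, "bbox": (10, 80, 25, 95), "mask": None},
]
```

Note that the highest-confidence detection overall (the person) is deliberately skipped: only commodity classes are eligible as the core subject.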
  5. The method for dynamic advertisement delivery based on end-side AI real-time content analysis of claim 4, further comprising, after determining the boundary of the subject: computing a minimum compact bounding box for every independent subject in the same key frame whose confidence exceeds a preset threshold, based on its independent pixel-level mask; cropping the corresponding multi-subject close-up regions from the key frame according to the minimum compact bounding boxes; and refining the cropped regions with morphological operations and edge feathering to remove background pixel interference introduced by bounding-box expansion, generating a set of subject close-up images free of background interference.
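The minimum compact bounding box and background rejection of claim 5 can be illustrated on a toy binary mask. This sketch zeroes background pixels and crops to the tight mask bounds; the real pipeline would additionally apply morphological opening/closing and edge feathering, which are omitted here.

```python
def tight_bbox(mask):
    # mask: 2D list of 0/1; returns inclusive (x0, y0, x1, y1) bounds
    # of the foreground, i.e. the minimum compact bounding box.
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = sorted({x for row in mask for x, v in enumerate(row) if v})
    return (xs[0], ys[0], xs[-1], ys[-1])

def crop_with_mask(image, mask):
    # Zero out background pixels, then crop to the tight bbox, so the
    # close-up carries no background interference from box expansion.
    x0, y0, x1, y1 = tight_bbox(mask)
    return [[image[y][x] if mask[y][x] else 0
             for x in range(x0, x1 + 1)]
            for y in range(y0, y1 + 1)]

# 4x4 toy image where pixel value encodes its (row, col) position.
image = [[10 * y + x for x in range(4)] for y in range(4)]
mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
```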
  6. The method for dynamic advertisement delivery based on terminal-side AI real-time content analysis according to claim 1, wherein the hybrid semantic analysis model adopts a hybrid semantic feature-extraction architecture, constructed and executed as follows: adopting pre-trained MobileCLIP as the backbone network and fine-tuning it with transfer learning and contrastive learning on a multi-label dataset in the commodity vertical domain; performing forward inference on the cropped subject close-up image with the fine-tuned model and outputting two types of semantic characterization data in parallel, wherein the first type is an explicit structured label group comprising discrete dimension labels for subject category, brand, color, style, material, and scene, used for exact keyword matching, and the second type is an implicit dense semantic vector that maps image features into a high-dimensional continuous vector space, capturing unstructured semantic information including vintage style, surface texture, and atmosphere features, and supporting fuzzy semantic matching based on vector similarity.
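The dual output of claim 6 — discrete labels for exact matching plus a dense vector for fuzzy matching — can be sketched as follows. The model call itself is mocked; the confidence floor and label names are assumptions, and cosine similarity stands in for the claim's vector-similarity matching.

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense semantic vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def semantic_outputs(embedding, label_scores, conf_floor=0.5):
    # Parallel outputs: (1) explicit labels above a confidence floor,
    # for exact keyword matching; (2) the raw embedding, for fuzzy
    # vector-similarity matching against a commodity index.
    explicit = {k: v for k, v in label_scores.items() if v >= conf_floor}
    return explicit, embedding

# Mock inference result for one subject close-up.
labels, vec = semantic_outputs(
    embedding=[0.6, 0.8, 0.0],
    label_scores={"handbag": 0.93, "leather": 0.81, "neon": 0.12},
)
```

In a real deployment the embedding would come from the fine-tuned MobileCLIP head, and fuzzy matching would compare it against pre-computed commodity embeddings.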
  7. The method for dynamic advertisement delivery based on end-side AI real-time content analysis according to claim 1, wherein the step of performing semantic-logic verification and optimization based on the end-side lightweight knowledge graph specifically comprises: constructing an end-side lightweight knowledge graph and a hierarchical classification tree, wherein the hierarchical classification tree defines hypernym-hyponym containment relations among labels to normalize their semantic granularity, and the knowledge graph defines part-whole relations among label entities together with a co-occurrence probability matrix that quantifies the semantic association strength and logical compatibility of different labels within the same spatio-temporal scene.
  8. The method of claim 7, wherein the semantic verification and query optimization step further comprises multi-dimensional label cleansing and logical conflict resolution: first, performing preliminary screening based on confidence, applying an adaptive threshold truncation algorithm to the confidence values output by the hybrid semantic analysis model to screen out a candidate label set; second, performing graph-based semantic-logic verification by traversing label pairs in the candidate set and querying the co-occurrence probability matrix; and finally, performing arbitration based on visual dominance: for each semantically conflicting label pair, computing visual dominance scores in the key frame, weighted by pixel area ratio, distance from the subject center, and texture sharpness, retaining the label with the higher visual dominance score and eliminating logically contradictory labels, thereby generating the visual feature representation.
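The arbitration loop of claims 7-8 can be sketched end to end. The co-occurrence values, dominance weights, and compatibility floor below are toy assumptions; only the structure (query matrix, score both sides of a conflict, keep the visually dominant label) follows the claims.

```python
# Assumed toy co-occurrence matrix: probability that two labels
# legitimately share the same spatio-temporal scene.
CO_OCCURRENCE = {
    frozenset({"leather", "handbag"}):    0.90,
    frozenset({"swimsuit", "snow_boots"}): 0.02,
}

def visual_dominance(label, stats):
    # stats[label]: area_ratio, center_dist (0 = dead centre, 1 = edge)
    # and sharpness, each in [0, 1]. Weights are illustrative.
    s = stats[label]
    return (0.5 * s["area_ratio"]
            + 0.3 * (1 - s["center_dist"])
            + 0.2 * s["sharpness"])

def resolve_conflicts(labels, stats, compat_floor=0.1):
    # For each label pair with co-occurrence below the floor, drop the
    # side with the lower visual dominance score.
    labels = list(labels)
    kept = set(labels)
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            if CO_OCCURRENCE.get(frozenset({a, b}), 1.0) < compat_floor:
                loser = min((a, b), key=lambda l: visual_dominance(l, stats))
                kept.discard(loser)
    return kept
```

Unlisted pairs default to compatible (probability 1.0), so only explicitly contradictory combinations trigger arbitration.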
  9. The method for dynamic advertisement delivery based on end-side AI real-time content analysis of claim 1, wherein the step of generating the privacy-protected commodity request comprises: hashing and/or anonymizing the generated visual feature representation with the SHA-256 algorithm, retaining only tag IDs and uploading no original images or user-profile data; constructing a lightweight JSON request with the processed tags as query parameters; and sending the request to the advertisement server over HTTPS.
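Claim 9's request construction maps directly onto Python's standard `hashlib` and `json` modules. A minimal sketch; the salt, field names, and request schema are assumptions (the claim specifies only SHA-256 hashing, tag-only payload, JSON, and HTTPS transport, which is omitted here):

```python
import hashlib
import json

def build_ad_request(tag_ids, salt="per-device-salt"):
    # Hash each tag ID with SHA-256 so the server never sees raw tags;
    # no image bytes or user-profile data ever enter the payload.
    hashed = [hashlib.sha256((salt + t).encode("utf-8")).hexdigest()
              for t in sorted(tag_ids)]
    return json.dumps({"v": 1, "tags": hashed}, separators=(",", ":"))

request_body = build_ad_request(["tag_handbag", "tag_leather"])
```

Sorting before hashing makes the payload deterministic for identical tag sets, which helps server-side caching without leaking tag identity.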
  10. The method for dynamic advertisement delivery based on terminal-side AI real-time content analysis according to claim 1, wherein the step of rendering the intelligent interactive floating layer on the video playing interface of the terminal device comprises: performing dynamic Alpha-channel fusion between a real-time HSV/luminance histogram of the video frame and the background layer of the advertisement component, automatically adjusting the component's transparency according to the average brightness of the current picture; and adopting edge detection to avoid face and subtitle regions, so that the floating position of the AI assistant is intelligently adjusted upon wake-up, the assistant being combined with an end-side small language model (SLM) to provide the user with an interactive agent for advertising.
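The brightness-to-transparency mapping in claim 10 can be illustrated with a linear ramp over mean luminance. The alpha range below is an assumed design choice, and the frame is modelled as a 2D grid of 8-bit luma values rather than a full HSV histogram:

```python
def adaptive_alpha(frame, lo=0.35, hi=0.85):
    # frame: 2D list of luma values in [0, 255]. Brighter scenes get a
    # more opaque overlay so it stays legible; dark scenes fade it back
    # to avoid a glaring patch on the picture.
    total = sum(sum(row) for row in frame)
    mean = total / (len(frame) * len(frame[0]))
    return lo + (hi - lo) * (mean / 255.0)

dark_frame = [[0] * 8 for _ in range(8)]
bright_frame = [[255] * 8 for _ in range(8)]
```

Placement avoidance (faces, subtitles) would run separately, e.g. masking out edge-dense and detected-face regions before choosing the floating layer's anchor point.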

Description

Dynamic advertisement delivery method based on terminal-side AI real-time content analysis

Technical Field

The invention relates to the technical field of digital advertising, in particular to a dynamic advertisement delivery method based on terminal-side AI real-time content analysis.

Background

General-purpose visual recognition systems mainly perform coarse-grained full-image classification or simple object detection based on a predefined discrete label system. They can recognize only basic categories such as "car" or "clothes", and lack the ability to perceive fine-grained visual properties of a target subject: the texture of an object (e.g., distinguishing leather from canvas surfaces), its design style (e.g., a blend of vintage and modern visual elements), and the overall light-and-shadow atmosphere of the picture are difficult to extract effectively. Recognition that stops at the category level means the system cannot distinguish targets that look similar but differ semantically; label conflicts or semantic ambiguity often arise, and the requirement of accurate retrieval based on deep visual features cannot be met. In terms of object segmentation and region-of-interest localization, existing anchor-point localization techniques typically frame objects with rectangular bounding boxes; this rough geometric approximation cannot fit the irregular edges of non-rigid objects (e.g., character contours, fluids), so the extracted regions contain significant amounts of ineffective background pixel noise.
When the dynamic visual component is rendered, positioning errors easily cause the floating layer to occlude key visual information in the video (such as the subtitle region, facial expressions, or gestures), destroying the compositional integrity of the picture and seriously disturbing the user's immersive viewing experience. To realize semantic-logic self-consistency verification of target objects in a video stream under the constraints of mobile terminal devices (limited CPU, memory, and power consumption), so that the visual analysis task can be completed with end-side lightweight models, a dynamic advertisement delivery method based on end-side AI real-time content analysis is provided.

Disclosure of Invention

The invention aims to provide a dynamic advertisement delivery method based on end-side AI real-time content analysis that realizes real-time, pixel-level, fine-grained semantic understanding and logical self-consistency verification of video streams by means of an edge computing architecture and a hybrid semantic analysis model.
The method is executed on the terminal device side and comprises the following steps: constructing a lightweight scene monitoring model based on inter-frame difference and an optical flow method, continuously monitoring the currently played video stream, extracting a key frame when a preset trigger condition is met, and performing subject detection and segmentation on the key frame with a main analysis model to determine the boundary of the subject in the key frame; cropping a subject close-up image from the key frame according to the boundary, performing forward inference on the cropped close-up image with a hybrid semantic analysis model, and outputting semantic characterization data in parallel; and generating a privacy-protected commodity request based on the visual feature representation, sending it to an advertisement server, receiving returned recommended commodity data, and rendering an intelligent interactive floating layer on the video playing interface of the terminal device. Preferably, the judgment of the preset trigger condition comprises: adopting the distribution of magnitude and direction of the optical-flow field of inter-frame pixel displacement as input features, and computing an optical-flow statistical histogram over the current frame and the previous N frames; setting an optical-flow magnitude threshold and a direction entropy threshold, and jointly judging the histogram to detect in real time whether the video picture is in a stable state and free of motion blur; and adopting a lightweight focus-tracking algorithm to record the number of on-screen dwell frames of the same target, judging that a stable focus object exists when the dwell frame count exceeds a preset threshold.
Preferably, the lightweight scene monitoring model is a three-layer convolutional neural network (CNN), and the training and use steps include: performing supervised training of the convolutional neur