CN-121997934-A - Ancient book content integrity detection method and system based on semantic structural features
Abstract
The invention discloses a method and a system for detecting the integrity of ancient book content based on semantic structural features, relating to the technical field of information processing. The system comprises a data acquisition unit, a chapter integrity analysis unit, a sentence integrity analysis unit, an integrity comprehensive analysis unit, a result feedback unit and a display terminal. Semantic structure recognition is used to extract the structural features of chapters and of paragraphs and sentences, and an integrity analysis model then assesses the integrity of the ancient book content at both the chapter level and the paragraph-and-sentence level. This overcomes the reliance of traditional approaches on manual proofreading or on simple page number comparison and keyword extraction, automates multi-level analysis of both structural and semantic content integrity, and improves the efficiency of detecting and identifying the content integrity of ancient books.
Inventors
- CHEN XIANG
- HE JUNHUI
- FANG HAO
- ZHANG WENHAO
- DU WENJING
- YU HONGJI
Assignees
- 四川农业大学
Dates
- Publication Date: 20260508
- Application Date: 20260123
Claims (6)
- 1. An ancient book content integrity detection system based on semantic structural features, comprising a data acquisition unit, a chapter integrity analysis unit, a sentence integrity analysis unit, an integrity comprehensive analysis unit, a result feedback unit and a display terminal, wherein: the data acquisition unit is used for extracting chapter structural feature data information and sentence structural feature data information of the current ancient book text from the indexed current ancient book text content using a semantic structural feature recognition technology, and for sending the chapter structural feature data information to the chapter integrity analysis unit and the sentence structural feature data information to the sentence integrity analysis unit; the chapter integrity analysis unit is used for receiving the chapter structural feature data information and performing chapter content integrity analysis processing, thereby generating a chapter high-level complete signal, a chapter medium-level complete signal or a chapter low-level complete signal, and sending it to the integrity comprehensive analysis unit; the sentence integrity analysis unit is used for receiving the sentence structural feature data information and performing paragraph-and-sentence content integrity analysis processing, thereby generating a superior signal, a medium signal or a poor signal of ancient book paragraph-and-sentence content integrity, and sending it to the integrity comprehensive analysis unit; the integrity comprehensive analysis unit is used for performing chapter and sentence integrity comprehensive analysis processing on the received chapter integrity level judgment signal and sentence integrity level judgment signal, thereby generating a primary comprehensive integrity feedback signal, a secondary comprehensive integrity feedback signal or a tertiary comprehensive integrity feedback signal, and sending it to the result feedback unit; the result feedback unit is used for receiving the comprehensive integrity feedback signal of the corresponding level, performing early warning analysis processing, and sending the result to the display terminal for display as a textual description.
- 2. The ancient book content integrity detection system based on semantic structural features according to claim 1, wherein the chapter content integrity analysis processing comprises the following specific process: performing semantic structural feature recognition on the indexed current ancient book text content; normalizing the chapter page number comparison total difference value, the total number of missing quotations and the chapter keyword recurrence frequency of each chapter in the extracted chapter structural feature data information of the current ancient book text; analyzing the degree of integrity of each chapter of the current ancient book text and determining the chapter integrity coefficient of each chapter, where xzj_i denotes the chapter integrity coefficient of the i-th chapter of the current ancient book text, zym_i, nqs_i and pfx_i denote respectively the chapter page number comparison total difference value, the total number of missing quotations and the chapter keyword recurrence frequency of the i-th chapter, / denotes division, and a1, a2 and a3 denote respectively the weights of the chapter page number comparison total difference value, the total number of missing quotations and the chapter keyword recurrence frequency, the specific values being set by a person skilled in the art; performing numerical modelling analysis on the chapter integrity coefficients of the chapters of the current ancient book text by establishing a two-dimensional coordinate system with the value of the chapter integrity coefficient as the ordinate and the chapter number as the abscissa, marking the chapter integrity coefficient of each chapter in the coordinate system as an anchor point, and thereby establishing a chapter integrity anchor point map of the current ancient book text; using a reference line preset in the two-dimensional coordinate system, marking the chapters whose chapter integrity coefficients lie on or above the reference line as complete chapters and counting the number of complete chapters, and marking the chapters whose chapter integrity coefficients lie below the reference line as incomplete chapters and counting the number of incomplete chapters; comparing the number of complete chapters with the number of incomplete chapters to generate the corresponding chapter integrity level judgment signal, the specific comparison process being as follows: if the number of complete chapters is greater than the number of incomplete chapters, a chapter high-level complete signal is generated; if the number of complete chapters is equal to the number of incomplete chapters, a chapter medium-level complete signal is generated; and if the number of complete chapters is less than the number of incomplete chapters, a chapter low-level complete signal is generated.
- 3. The ancient book content integrity detection system based on semantic structural features according to claim 1, wherein the paragraph-and-sentence content integrity analysis processing comprises the following specific process: performing semantic structural feature recognition on the indexed current ancient book text content; normalizing the total number of missing paragraph topics, the total number of time mismatches and the total number of missing term explanations in the extracted sentence structural feature data information of the current ancient book text; analyzing the degree of paragraph-and-sentence integrity of the current ancient book text and determining its sentence integrity coefficient as xjd = (b1 + b2 + b3) / (b1 × cbt + b2 × ccp + b3 × csy), where xjd denotes the sentence integrity coefficient of the current ancient book text, cbt, ccp and csy denote respectively the total number of missing paragraph topics, the total number of time mismatches and the total number of missing term explanations, / denotes division, and b1, b2 and b3 denote respectively the weights of the total number of missing paragraph topics, the total number of time mismatches and the total number of missing term explanations, the specific values being set by a person skilled in the art; presetting integrity gradient reference thresholds CZ1 and CZ2 for the sentence integrity coefficient of the current ancient book text, where CZ1 is less than CZ2, and comparing the sentence integrity coefficient with CZ1 and CZ2, the specific comparison process being as follows: if the sentence integrity coefficient of the current ancient book text is greater than the threshold CZ2, a superior signal of ancient book paragraph-and-sentence content integrity is generated; if the sentence integrity coefficient lies between the thresholds CZ1 and CZ2, a medium signal of ancient book paragraph-and-sentence content integrity is generated; and if the sentence integrity coefficient is less than the threshold CZ1, a poor signal of ancient book paragraph-and-sentence content integrity is generated.
- 4. The ancient book content integrity detection system based on semantic structural features according to claim 1, wherein the chapter and sentence integrity comprehensive analysis processing comprises the following specific process: according to the received chapter integrity level judgment signal, labeling the chapter high-level complete signal, the chapter medium-level complete signal and the chapter low-level complete signal as W1, W2 and W3 respectively; according to the received sentence integrity level judgment signal, labeling the superior signal, the medium signal and the poor signal of ancient book paragraph-and-sentence content integrity as M1, M2 and M3 respectively; analyzing the two types of labeled signals together, the specific analysis process being as follows: if both of the simultaneously acquired labeled signals carry the number 1, a primary comprehensive integrity feedback signal is generated; if only one of the two simultaneously acquired labeled signals carries the number 1, a secondary comprehensive integrity feedback signal is generated; and if neither of the two simultaneously acquired labeled signals carries the number 1, a tertiary comprehensive integrity feedback signal is generated.
- 5. The ancient book content integrity detection system based on semantic structural features according to claim 1, wherein the early warning analysis processing comprises the following specific process: when a primary comprehensive integrity feedback signal is received, generating the textual description "the chapter and sentence content of the current ancient book text is complete and can be used directly for subsequent learning and reference" and sending it to the display terminal for display; when a secondary comprehensive integrity feedback signal is received, generating the textual description "the chapter content or the sentence content of the current ancient book text is incomplete and should not be used for subsequent learning and reference for the time being" and sending it to the display terminal for display; when a tertiary comprehensive integrity feedback signal is received, generating the textual description "the chapter and sentence content of the current ancient book text is incomplete and cannot be used for subsequent learning and reference" and sending it to the display terminal for display.
- 6. An ancient book content integrity detection method based on semantic structural features, applied to the ancient book content integrity detection system based on semantic structural features according to any one of claims 1 to 5, comprising the following steps: S1, extracting chapter structural feature data information and sentence structural feature data information of the current ancient book text from the indexed current ancient book text content using a semantic structural feature recognition technology; S2, receiving the chapter structural feature data information and performing chapter content integrity analysis processing, thereby generating a chapter high-level complete signal, a chapter medium-level complete signal or a chapter low-level complete signal; S3, receiving the sentence structural feature data information and performing paragraph-and-sentence content integrity analysis processing, thereby generating a superior signal, a medium signal or a poor signal of ancient book paragraph-and-sentence content integrity; S4, performing chapter and sentence integrity comprehensive analysis processing on the received chapter integrity level judgment signal and sentence integrity level judgment signal, thereby generating a primary comprehensive integrity feedback signal, a secondary comprehensive integrity feedback signal or a tertiary comprehensive integrity feedback signal; S5, receiving the comprehensive integrity feedback signal of the corresponding level, performing early warning analysis processing, and sending the result to the display terminal for display as a textual description.
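The decision logic recited in claims 2 to 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The function names, the default weights of 1.0 and the handling of a zero denominator are hypothetical. The chapter integrity coefficients are taken as inputs because their formula is not given in the text above. Treating the "between CZ1 and CZ2" band as inclusive of both thresholds, and reading the three comprehensive cases as "both signals numbered 1", "exactly one" and "neither", are likewise assumptions.

```python
import math
from typing import Sequence


def chapter_signal(coefficients: Sequence[float], reference_line: float) -> str:
    """Claim 2: count chapters on/above vs. below the reference line."""
    complete = sum(1 for c in coefficients if c >= reference_line)
    incomplete = len(coefficients) - complete
    if complete > incomplete:
        return "W1"  # chapter high-level complete signal
    if complete == incomplete:
        return "W2"  # chapter medium-level complete signal
    return "W3"      # chapter low-level complete signal


def sentence_coefficient(cbt: float, ccp: float, csy: float,
                         b1: float = 1.0, b2: float = 1.0, b3: float = 1.0) -> float:
    """Claim 3: xjd = (b1 + b2 + b3) / (b1*cbt + b2*ccp + b3*csy)."""
    denominator = b1 * cbt + b2 * ccp + b3 * csy
    if denominator == 0:
        # Assumption: no recorded defects is treated as maximally complete.
        return math.inf
    return (b1 + b2 + b3) / denominator


def sentence_signal(xjd: float, cz1: float, cz2: float) -> str:
    """Claim 3: compare xjd with thresholds CZ1 < CZ2."""
    if xjd > cz2:
        return "M1"  # superior paragraph-and-sentence integrity
    if xjd < cz1:
        return "M3"  # poor paragraph-and-sentence integrity
    return "M2"      # medium (assumed inclusive of both thresholds)


def comprehensive_level(chapter: str, sentence: str) -> int:
    """Claim 4 (as read here): both '1' -> 1, exactly one -> 2, none -> 3."""
    ones = int(chapter == "W1") + int(sentence == "M1")
    return {2: 1, 1: 2, 0: 3}[ones]


# Claim 5: textual descriptions sent to the display terminal.
FEEDBACK = {
    1: "Chapter and sentence content is complete; usable directly for "
       "subsequent learning and reference.",
    2: "Chapter content or sentence content is incomplete; not to be used "
       "for subsequent learning and reference for the time being.",
    3: "Chapter and sentence content is incomplete; cannot be used for "
       "subsequent learning and reference.",
}


def feedback_text(chapter: str, sentence: str) -> str:
    """Map the two level signals to the displayed description."""
    return FEEDBACK[comprehensive_level(chapter, sentence)]
```

As a usage example with made-up inputs: chapter coefficients `[0.9, 0.8, 0.3]` against a reference line of `0.5` give two complete chapters and one incomplete chapter, hence `W1`. Counts `cbt = ccp = csy = 1` with unit weights give `xjd = 1.0`, which falls between hypothetical thresholds `CZ1 = 0.5` and `CZ2 = 2.0`, hence `M2`. Only one signal carries the number 1, so `comprehensive_level("W1", "M2")` returns `2`.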
Description
Ancient book content integrity detection method and system based on semantic structural features
Technical Field
The invention relates to the technical field of information processing, and in particular to a method and a system for detecting the integrity of ancient book content based on semantic structural features.
Background
With the deepening integration of digital technology and cultural preservation, information processing technology, and natural language processing in particular, is becoming an important force in the intelligent transmission of cultural heritage. Among the many kinds of cultural data resources, ancient books are, by virtue of their unique historical value, language style and place in cultural transmission, core objects of current work in digital humanities, smart libraries and the reconstruction of cultural knowledge. However, scanning and archiving or keyword retrieval alone clearly cannot support the in-depth use of ancient books, and detecting the structural integrity of their content is gradually becoming a key quality metric for high-quality digitized ancient book resources.
However, existing means of detecting ancient book content integrity still have obvious shortcomings in assessing semantic integrity at the structural level of the content. Traditional methods often rely on manual proofreading or on simple page number comparison and keyword extraction. They struggle with problems common in ancient books, such as disordered typesetting, missing chapters and pages, omitted quotations and ambiguous interpretation, and in particular cannot effectively judge whether an ancient book retains its full value for learning and transmission. More critically, without an understanding of semantic structure it is difficult to judge how paragraphs and sentences connect, whether the temporal logic is disordered, and whether terms lack defining explanations, so a comprehensive quality assessment of the content is hard to achieve. These technical gaps directly limit the intelligent use of ancient book content and reduce the effectiveness of digitized documents in education, research and cultural dissemination.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a method and a system for detecting the integrity of ancient book content based on semantic structural features, which solve the problems set out in the background above.
The object of the invention is achieved by the following technical scheme: an ancient book content integrity detection system based on semantic structural features comprises a data acquisition unit, a chapter integrity analysis unit, a sentence integrity analysis unit, an integrity comprehensive analysis unit, a result feedback unit and a display terminal, wherein the data acquisition unit is used for extracting chapter structural feature data information and sentence structural feature data information of the current ancient book text from the indexed current ancient book text content using a semantic structural feature recognition technology, and for sending the chapter structural feature data information to the chapter integrity analysis unit and the sentence structural feature data information to the sentence integrity analysis unit; the chapter integrity analysis unit is used for receiving the chapter structural feature data information and performing chapter content integrity analysis processing, thereby generating a chapter high-level complete signal, a chapter medium-level complete signal or a chapter low-level complete signal, and sending it to the integrity comprehensive analysis unit; the sentence integrity analysis unit is used for receiving the sentence structural feature data information and performing paragraph-and-sentence content integrity analysis processing, thereby generating a superior signal, a medium signal or a poor signal of ancient book paragraph-and-sentence content integrity, and sending it to the integrity comprehensive analysis unit; the integrity comprehensive analysis unit is used for performing chapter and sentence integrity comprehensive analysis processing on the received chapter integrity level judgment signal and sentence integrity level judgment signal, thereby generating a primary comprehensive integrity feedback signal, a secondary comprehensive integrity feedback signal or a tertiary comprehensive integrity feedback signal, and sending the primary comprehensive integrity feedback signal, the secondary comprehensive integrity feedback signal and the tertiary