CN-114580356-B - Text editing method and system
Abstract
The invention discloses a text editing method comprising: receiving a long text and parsing the chapter titles in it; obtaining a chapter list once parsing is complete and generating chapter structured data in units of chapters; editing the chapters by using the chapter structured data; and exporting the edited long text. The invention also discloses a text editing system. The invention parses the long text into structured data and edits chapter information according to that data, removes non-text content from chapter names, monitors chapter word counts so that chapter names can still be handled normally when word counts are abnormal, and can reconstruct the chapter structure from the structured data to generate text with the set as the structural unit. This improves the efficiency of converting existing text, such as a novel or a script, into a recording script, and in turn improves the efficiency and flexibility of audio book production.
Inventors
- ZHU FENGYUN
- FAN ZIYE
- YANG YAHUI
Assignees
- 大连即时智能科技有限公司
Dates
- Publication Date: 20260421
- Application Date: 20220126
- Priority Date: 20220126
Claims (6)
- 1. A text editing method, comprising: step 1, receiving a long text and parsing the chapter titles in the long text; step 2, after parsing is completed, obtaining a parsing result and a chapter list, and generating chapter structured data in units of chapters according to the parsing result; step 3, after the chapter structured data is generated, editing the chapters by using the chapter structured data; step 4, after the editing operation is completed, exporting the edited long text; wherein, in step 3, the editing operation on the chapters comprises adding, deleting, modifying, moving and searching operations, and the sorting sequence numbers of the chapters are recalculated after an adding, deleting, modifying or moving operation is performed; the operation of adding a chapter comprises dividing the content of the current chapter into parts to form a new chapter, or creating a new chapter and adding new content to it; the editing operation further comprises a reconstruction operation performed on the chapters by using the chapter structured data; the chapter structure is reconstructed according to a chapter word count criterion, and the structural unit after reconstruction is a set; the chapter word count criterion comprises setting a word count range for each set, the range comprising a minimum value and a maximum value; starting from the end position of the current last set, the text is searched onward for the character start position and the character end position corresponding to the word count range, and the text within the found range is highlighted on the interface; the end position of the current set is selected within the found range according to content requirements, a new next set is divided off, and division continues onward with that end position as the starting position of the new set; when the text of a re-divided set contains several chapter titles, one chapter name is selected as the set name of that set; in step 1, the original text of the long text is traversed, the chapter titles of the original text are matched, and the chapter sequence numbers and chapter names in the original text are parsed; a chapter title format template is selected from a chapter title format template library, and the chapter sequence numbers and chapter names of the long text are parsed according to the selected template; whether non-text content exists in the chapter names is judged, and removal of the non-text content is selectable; in step 2, the chapter structured data comprises an original text sequence number, a sequence difference, a chapter name, chapter content and a chapter word count; when the sorting sequence number is inconsistent with the original text sequence number, the difference between the two is the sequence difference, which is used to judge whether a chapter is missing or redundant; the chapter word count is the result of counting the words of the text in the chapter; in step 4, the exported chapter sequence number is the original text sequence number or the sorting sequence number, and the file format of the exported text comprises one file composed of several chapters or one file per single chapter; each chapter of the exported text takes the chapter sequence number and the title as its first line, and the file name of the exported text is determined according to the chapter sequence number and the chapter name; when the chapter structure has been reconstructed with the set as the structural unit, text with the set as the structural unit is exported, and the range of sets to be exported is selected before export; the file format of the exported text with the set as the structural unit comprises one file composed of several sets or one file per single set, each set of the exported text takes the set sequence number and the set name as its first line, and the file name of the exported file is determined according to the set sequence number and the set name.
- 2. The text editing method of claim 1, wherein whether a piece of text in the original text is a chapter title is judged by using a grammar-based matching method or a machine learning model classification method, the grammar-based matching method comprising a regular expression; the chapter sequence number and the chapter name are extracted from a chapter title by using a grammar-based parsing method or a machine learning model prediction method, the grammar-based parsing method comprising a regular expression; when the chapter title format template library contains no template that fits the chapter title format, a user-defined regular expression is selected for parsing; non-text content is matched by using a grammar-based matching method or a machine learning model classification method, the grammar-based matching method comprising a regular expression; and non-text content is removed by using a grammar-based replacement method or a machine learning model prediction method, the grammar-based replacement method comprising a regular expression.
- 3. The text editing method of claim 1, wherein the chapter word counts are modeled and analyzed by using a statistical model; the statistical model is used to judge whether the word count of a chapter is too large or too small; a chapter whose word count is too large or too small is judged to have an abnormal word count; and chapters judged to have an abnormal word count are highlighted, so that the chapter list can be filtered and the text edited on the basis of the abnormality information.
- 4. A text editing system, comprising an analysis device, a data processor, an editor and an exporting device connected to one another in sequence; the analysis device receives a long text and parses the chapter titles in the long text; after parsing is completed, the analysis device obtains a parsing result and a chapter list, and sends the parsing result to the data processor connected to it; the data processor sends the generated chapter structured data to the editor connected to it, and the editor edits the chapters by using the chapter structured data; after the editor completes the editing operation, the edited long text is exported by the exporting device connected to the editor; editing the chapters comprises adding, deleting, modifying, moving and searching operations, and the sorting sequence numbers of the chapters are recalculated after an adding, deleting, modifying or moving operation is performed; the operation of adding a chapter comprises dividing the content of the current chapter into parts to form a new chapter, or creating a new chapter and adding new content to it; the editing operation further comprises a reconstruction operation performed on the chapters by using the chapter structured data; the chapter structure is reconstructed according to a chapter word count criterion, and the structural unit after reconstruction is a set; the chapter word count criterion comprises setting a word count range for each set, the range comprising a minimum value and a maximum value; starting from the end position of the current last set, the text is searched onward for the character start position and the character end position corresponding to the word count range, and the text within the found range is highlighted on the interface; the end position of the current set is selected within the found range according to content requirements, a new next set is divided off, and division continues onward with that end position as the starting position of the new set; when the text of a re-divided set contains several chapter titles, one chapter name is selected as the set name of that set, and the re-divided sets are sorted so that each set obtains a corresponding set sequence number; the analysis device traverses the original text of the long text, matches the chapter titles of the original text, and parses the chapter sequence numbers and chapter names in the original text; the analysis device comprises a chapter title format template library, an editing user selects a chapter title format template from the library, and the chapter sequence numbers and chapter names of the long text are parsed according to the template selected by the editing user; the analysis device judges whether non-text content exists in the chapter names, and the editing user chooses whether to remove the non-text content; the chapter structured data comprises an original text sequence number, a sequence difference, a chapter name, chapter content and a chapter word count; when the sorting sequence number is inconsistent with the original text sequence number, the difference between the two is the sequence difference, which is used to judge whether a chapter is missing or redundant; the chapter word count is the result of counting the words of the text in the chapter; the chapter sequence number exported by the exporting device is the original text sequence number or the sorting sequence number, and the file format of the text exported by the exporting device comprises one file composed of several chapters or one file per single chapter; each chapter of the text exported by the exporting device takes the chapter sequence number and the title as its first line, and the file name of the exported text is determined according to the chapter sequence number and the chapter name; when the chapter structure has been reconstructed with the set as the structural unit, the exporting device exports text with the set as the structural unit, and the range of sets to be exported is selected before export; the exporting device exports a starting sequence number for the first set in the specified range of sets; the file format of the text exported by the exporting device with the set as the structural unit comprises one file composed of several sets or one file per single set, each set of the text exported by the exporting device takes the set sequence number and the set name as its first line, and the file name of the file exported by the exporting device is determined according to the set sequence number and the set name.
- 5. The text editing system of claim 4, wherein the analysis device judges whether a piece of text in the original text is a chapter title by using a grammar-based matching method or a machine learning model classification method; the analysis device extracts the chapter sequence number and the chapter name from a chapter title by using a grammar-based parsing method or a machine learning model prediction method, the grammar-based parsing method comprising a regular expression; when the chapter title format template library contains no template that fits the chapter title format, a user-defined regular expression is selected for parsing; the analysis device matches non-text content by using a grammar-based matching method or a machine learning model classification method, the grammar-based matching method comprising a regular expression; and the analysis device removes non-text content by using a grammar-based replacement method or a machine learning model prediction method, the grammar-based replacement method comprising a regular expression.
- 6. The text editing system of claim 4, wherein the data processor models and analyzes the chapter word counts by using a statistical model; the statistical model is used to judge whether the word count of a chapter is too large or too small; a chapter whose word count is too large or too small is judged to have an abnormal word count; and chapters judged to have an abnormal word count are highlighted, so that the chapter list can be filtered and the text edited on the basis of the abnormality information.
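The word-count check in claims 3 and 6 can be illustrated with a short sketch. The claims do not name a particular statistical model, so the z-score test below, the threshold of 2.0, and the function name `flag_abnormal_word_counts` are all assumptions chosen for illustration rather than the patented method.

```python
import statistics


def flag_abnormal_word_counts(word_counts, threshold=2.0):
    """Map chapter index to "too many" or "too few" for outlying word counts.

    Assumption: a chapter is abnormal when its word count lies more than
    `threshold` sample standard deviations from the mean of all chapters.
    """
    if len(word_counts) < 2:
        return {}  # a standard deviation needs at least two chapters
    mean = statistics.mean(word_counts)
    spread = statistics.stdev(word_counts)
    if spread == 0:
        return {}  # all chapters have the same length, nothing to flag
    flags = {}
    for index, count in enumerate(word_counts):
        z_score = (count - mean) / spread
        if z_score > threshold:
            flags[index] = "too many"
        elif z_score < -threshold:
            flags[index] = "too few"
    return flags
```

An editor interface could highlight the flagged indices in the chapter list. A chapter with far too many words often indicates a chapter title that the parser missed, which is one plausible reason for surfacing these cases to the editing user.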
Description
Text editing method and system
Technical Field
The present invention relates to the field of text processing technologies, and in particular to a text editing method and system.
Background
With the development of technology and the diversification of ways to acquire information, digital media continue to challenge traditional media and the reading habits of the public keep changing, which has promoted audio reading material as a form of reading. Audio reading material is electronic reading material that uses sound as its medium, including audio news, audio novels and the like. It overlaps with, yet differs from, both digital media and traditional media, and has unique advantages that meet the needs of a variety of users.
However, the inventor found that audio book production in the prior art is neither systematic nor procedural. In particular, converting existing text, such as a novel or a script, into a recording script takes a great deal of labor and time, which results in low production efficiency and poor flexibility.
Disclosure of Invention
To solve the above technical problems in the prior art, a text editing method is provided, comprising: step 1, receiving a long text and parsing the chapter titles in the long text; step 2, after parsing is completed, obtaining a parsing result and a chapter list, and generating chapter structured data in units of chapters according to the parsing result; step 3, after the chapter structured data is generated, editing the chapters by using the chapter structured data; and step 4, after the editing operation is completed, exporting the edited long text.
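Steps 2 to 4 above can be sketched as follows, assuming step 1 has already produced `(number, name, content)` tuples. The `Chapter` class, the function names, the sign convention for the sequence difference (original number minus sorting number), the whitespace-based word count, and the `Chapter N Name` heading format are illustrative assumptions, not details taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Chapter:
    original_number: int   # sequence number parsed from the source title
    name: str
    content: str
    sort_number: int = 0   # position in the current chapter list, from 1

    @property
    def sequence_difference(self):
        # Assumed sign convention. A non-zero value hints at a missing
        # or redundant chapter somewhere before this one.
        return self.original_number - self.sort_number

    @property
    def word_count(self):
        # Assumption: whitespace-separated tokens. Chinese text would
        # more likely be counted by characters.
        return len(self.content.split())


def renumber(chapters):
    """Recalculate sorting numbers after an add, delete, modify or move."""
    for position, chapter in enumerate(chapters, start=1):
        chapter.sort_number = position
    return chapters


def build_chapters(parsed):
    """Step 2: turn (number, name, content) tuples into structured data."""
    return renumber([Chapter(n, name, content) for n, name, content in parsed])


def delete_chapter(chapters, index):
    """Step 3 (one editing operation): delete a chapter, then renumber."""
    del chapters[index]
    return renumber(chapters)


def export(chapters, use_original_numbers=False):
    """Step 4: each chapter begins with a line of its number and name."""
    blocks = []
    for chapter in chapters:
        number = chapter.original_number if use_original_numbers else chapter.sort_number
        blocks.append(f"Chapter {number} {chapter.name}\n{chapter.content}")
    return "\n\n".join(blocks)
```

For a source whose titles are numbered 1, 2 and 4, the third chapter gets sorting number 3 and a sequence difference of 1 under this convention, which points to a missing chapter 3.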
In one embodiment, in step 1, the original text of the long text is traversed, the chapter titles of the original text are matched, and the chapter sequence numbers and chapter names in the original text are parsed; a chapter title format template is selected from a chapter title format template library, and the chapter sequence numbers and chapter names of the long text are parsed according to the selected template; whether non-text content exists in the chapter names is judged, and removal of the non-text content is selectable.
In one embodiment, whether a piece of text in the original text is a chapter title is judged by using a grammar-based matching method or a machine learning model classification method, the grammar-based matching method comprising a regular expression; the chapter sequence number and the chapter name are extracted from a chapter title by using a grammar-based parsing method or a machine learning model prediction method, the grammar-based parsing method comprising a regular expression; when the chapter title format template library contains no template that fits the chapter title format, a user-defined regular expression is selected for parsing; non-text content is matched by using a grammar-based matching method or a machine learning model classification method, the grammar-based matching method comprising a regular expression; and non-text content is removed by using a grammar-based replacement method or a machine learning model prediction method, the grammar-based replacement method comprising a regular expression.
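The regular-expression variant of this parsing step can be sketched as follows. The two templates in `TITLE_TEMPLATES`, the named groups `number` and `name`, and the definition of non-text content as bracketed notes are hypothetical choices for illustration; the patent does not list its actual templates or say what counts as non-text content.

```python
import re

# Hypothetical template library: each pattern exposes the named groups
# "number" and "name". These are illustrative, not the patent's templates.
TITLE_TEMPLATES = {
    "chapter_arabic": re.compile(r"^\s*Chapter\s+(?P<number>\d+)[\s:.\-]*(?P<name>.*)$"),
    "chinese_arabic": re.compile(r"^\s*第\s*(?P<number>\d+)\s*章\s*(?P<name>.*)$"),
}

# Assumed notion of non-text content: bracketed notes such as "(revised)"
# or "[VIP]", in ASCII or full-width brackets, inside a chapter name.
NON_TEXT = re.compile(r"[\(\[（【][^\)\]）】]*[\)\]）】]")


def parse_title(line, template, strip_non_text=True):
    """Return (chapter_number, chapter_name) if the line is a title, else None."""
    match = template.match(line)
    if match is None:
        return None
    name = match.group("name").strip()
    if strip_non_text:
        name = NON_TEXT.sub("", name).strip()
    return int(match.group("number")), name


def parse_chapters(text, template):
    """Traverse the text line by line and group body lines under each title."""
    chapters = []
    for line in text.splitlines():
        parsed = parse_title(line, template)
        if parsed is not None:
            chapters.append({"number": parsed[0], "name": parsed[1], "lines": []})
        elif chapters:
            chapters[-1]["lines"].append(line)
    return chapters
```

Because `parse_chapters` accepts any compiled pattern that defines the two named groups, a user-defined regular expression can be passed in when no library template fits, which mirrors the fallback described above. Titles that use Chinese numerals would need an additional template and a numeral conversion step, which this sketch omits.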
In one embodiment, in step 2, the chapter structured data comprises an original text sequence number, a sequence difference, a chapter name, chapter content and a chapter word count; when the sorting sequence number is inconsistent with the original text sequence number, the difference between the two is the sequence difference, which is used to judge whether a chapter is missing or redundant; the chapter content comprises the text of one or more paragraphs, and the chapter word count is the result of counting the words of the text in the chapter.
In one embodiment, the chapter word counts are modeled and analyzed by using a statistical model; the statistical model is used to judge whether the word count of a chapter is too large or too small; a chapter whose word count is too large or too small is judged to have an abnormal word count; and chapters judged to have an abnormal word count are highlighted, so that the chapter list can be filtered and the text edited on the basis of the abnormality information.
In one embodiment, in step 3, the editing operation on the chapters comprises adding, deleting, modifying, moving and searching operations, and the sorting sequence numbers of the chapters are recalculated after an adding, deleting, modifying or moving operation is performed; wherein the operation of adding chapters includes dividing the conten