CN-122018708-A - Pinyin supplementary labeling system and method for distinguishing the six sounds machine/product, odd/seven, rare/Western in Chinese pronunciation
Abstract
The invention relates to the field of language and text information processing, and in particular to a supplementary labeling system, an input method, electronic equipment and a computer-readable storage medium that use machine-readable symbols to uniquely distinguish three groups of identically spelled but differently pronounced sounds in the Chinese pinyin system: machine/product, odd/seven and rare/Western. The aim is a pinyin supplementary labeling system that keeps the original spelling of the Chinese pinyin scheme and marks the "product", "seven" and "Western" pronunciations with symbols already encoded in Unicode, each formed by a prefixed dot plus the original letter (·j, ·q, ·x). The scheme is described in connection with a blind test on 100,000 news sentences.
Inventors
- HUO LIYUAN
Assignees
- 霍立远
Dates
- Publication Date
- 20260512
- Application Date
- 20260115
Claims (8)
- 1. A pinyin supplementary labeling system for distinguishing the six sounds "machine/product, odd/seven, rare/West" in Chinese pronunciation, comprising: a mapping module for establishing and storing the following unique mapping: "product" pronunciation ↔ ·j, "seven" pronunciation ↔ ·q, "West" pronunciation ↔ ·x; a coding module for embedding the mapping into electronic text in UTF-8 encoding; a parsing module for identifying and extracting the supplementary annotation within milliseconds through the regular expression "/·[jqx]/"; a speech synthesis module for calling the voice library units corresponding to ·j, ·q and ·x according to the parsing result and outputting a speech waveform consistent with the target pronunciation; and a speech recognition post-processing module for preferentially selecting, when homophone candidates are received, candidate words whose pinyin field contains ·j, ·q or ·x, so as to improve recognition accuracy.
- 2. The system of claim 1, wherein the mapping is stored as a JSON table whose keys are "·j", "·q" and "·x" and whose values are the corresponding International Phonetic Alphabet symbols, vowel tone feature vectors, and example word lists.
- 3. The system of claim 1 or 2, further comprising an input method skin that displays miniature "·j ·q ·x" labels to the right of the original pinyin in the candidate window, so that the user can enter them with a single keystroke.
- 4. A pinyin supplementary labeling method, comprising the following steps: S1, receiving an original pinyin character string input by a user; S2, judging whether the current character string is one of the syllables ji, qi or xi; S3, if so, showing the user a pop-up floating window for choosing between the "machine/product, odd/seven, rare/Western" pronunciations; S4, receiving the target pronunciation confirmed by the user through a click or by voice; S5, when the target pronunciation is "product", "seven" or "Western", automatically inserting the U+00B7 middle dot before j, q or x to generate ·j, ·q or ·x; S6, writing the annotated pinyin characters back to the text buffer and synchronously writing a hidden phoneme label (phone).
- 5. The method according to claim 4, wherein the floating window in step S3 uses a machine learning model to predict the default pronunciation from context, and automatically commits a pronunciation whose prediction confidence is at least 95%, without secondary confirmation by the user.
- 6. A computer readable storage medium having stored thereon a computer program which when executed by a processor realizes the steps of the method of claim 4 or 5.
- 7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the functions of the system or method of any one of claims 1-5 when the program is executed.
- 8. The electronic device of claim 7, wherein the device is an electronic dictionary, a smart speaker, a smartphone, an in-vehicle voice assistant, an online translator, or a tablet for teaching Chinese as a foreign language.
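The parsing and insertion steps recited in claims 1, 2, 4 and 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation. The names `MAPPING`, `find_annotations`, `is_candidate`, `annotate` and `choose_reading` are hypothetical, the `gloss` field stands in for the table values that the claims do not spell out, the pattern is assumed to be a U+00B7 middle dot followed by j, q or x, and step S2 is read broadly as "any syllable whose initial is j, q or x".

```python
import json
import re

MIDDLE_DOT = "\u00b7"  # U+00B7, the prefix that step S5 inserts before j, q, x

# Hypothetical JSON table in the shape claim 2 describes. The "gloss" field is
# a placeholder for illustration; the real values (IPA symbols, feature
# vectors, example words) are not given in the claims and are not invented here.
MAPPING_JSON = json.dumps(
    {
        MIDDLE_DOT + "j": {"gloss": "product"},
        MIDDLE_DOT + "q": {"gloss": "seven"},
        MIDDLE_DOT + "x": {"gloss": "West"},
    },
    ensure_ascii=False,
)
MAPPING = json.loads(MAPPING_JSON)

# Assumed pattern: a middle dot immediately followed by j, q or x.
ANNOTATION = re.compile(MIDDLE_DOT + "([jqx])")

AUTO_COMMIT_CONFIDENCE = 0.95  # threshold stated in claim 5


def find_annotations(text):
    """Return (offset, marker, gloss) for each supplementary annotation in text."""
    return [
        (m.start(), m.group(0), MAPPING[m.group(0)]["gloss"])
        for m in ANNOTATION.finditer(text)
    ]


def is_candidate(syllable):
    """Step S2, read broadly (assumption): any syllable with initial j, q or x."""
    return bool(syllable) and syllable[0] in "jqx"


def annotate(syllable, dotted_reading):
    """Steps S2 and S5: prefix U+00B7 when the dotted reading was chosen."""
    if is_candidate(syllable) and dotted_reading:
        return MIDDLE_DOT + syllable
    return syllable  # undotted form keeps the default "machine/odd/rare" reading


def choose_reading(predicted_dotted, confidence, ask_user):
    """Claim 5: accept the prediction at >= 95% confidence, otherwise ask the user."""
    if confidence >= AUTO_COMMIT_CONFIDENCE:
        return predicted_dotted
    return ask_user()


# Usage: annotate three syllables, then recover the annotations from the text.
text = " ".join([annotate("ji", True), annotate("qi", False), annotate("xi", True)])
print(text)                    # ·ji qi ·xi
print(find_annotations(text))  # offsets, markers and glosses of the dotted initials
print(text.encode("utf-8"))    # the middle dot occupies two bytes in UTF-8
```

Because the undotted form is left untouched, text without any annotation passes through `find_annotations` with an empty result, which matches the backward-compatible behaviour the description claims for existing text.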
Description
Pinyin supplementary labeling system and method for distinguishing the six sounds machine/product, odd/seven, rare/Western in Chinese pronunciation
Technical Field
The invention relates to the field of language and text information processing, and in particular to a supplementary labeling system, an input method, electronic equipment and a computer-readable storage medium that use machine-readable symbols to uniquely distinguish three groups of identically spelled but differently pronounced sounds, "machine/product, odd/seven and rare/Western", in the Chinese pinyin system.
Background
In the current Chinese pinyin scheme, each of the initial consonants j, q and x corresponds to two different groups of sounds: "machine, odd, rare" and "product, seven, Western". For example:
(I) The former sound is j as in "machine" and the latter is j as in "product"; examples (given by English gloss) include: body, collective, note, age, emergency, disease, city, future, combination, joint, frightening, delicacy, near, approach, gold pulse, jin pulse, rescue, just in the market, worry, clean, giant dragon, gathering, good wine, street, hijack, abstinence bar, receipt for a loan, rape, frying, entanglement, knot, see people, base person, etc.
(II) The former sound is q as in "odd" and the latter is q as in "seven"; examples include: which is true, seventy, expects, seven generations, riding, seven stars, chessman, wife, duty, gunshot, light wind, breeze, toppling over, qingdao, light work, clear womb, easy, sense, zheng, strive, interesting, dive, shallow water, seeking for the day, autumn, poultry, family, lead tin, migration, bridge, small, insect, maggot and so on.
(III) The former sound is x as in "rare" and the latter is x as in "Western"; examples include: hope, joss, box, auspicious, union, by sheer luck, surname, porridge, week, clouds, stars, delight, must, effective, laugh, blood, snownight, rest and maintenance, vamp, inclined plane, joke with, fine speaking, salted fish, fresh fish, dawn, small, surprise, fine wash, scorpion, wedge, etc.
Because the two spellings are identical, a system cannot determine the target pronunciation from the pinyin text alone in scenarios such as speech recognition (ASR), speech synthesis (TTS), machine translation, Chinese teaching, audiobook reading, recitation of ancient texts and dialect preservation. This results in the following:
1. Speech synthesis assigns one pronunciation where the other is intended (the mix-up described by the idiom "putting Zhang's hat on Li's head");
2. Speech recognition post-processing faces an explosion of ambiguous fields, and accuracy falls by 3-7%;
3. Teaching Chinese as a foreign language requires additional oral, listen-and-repeat demonstration, which raises the learning cost;
4. When ancient texts, dialects, poems and the like need to preserve historical pronunciation, there is no written means of marking it.
The prior art attempts to solve these problems with numeric suffixes (1) and (2), variant tone superscripts, International Phonetic Alphabet (IPA) side notes and similar means. All of these break pinyin's keyboard-friendly principle that a syllable is a plain string of letters, and none can be adopted seamlessly by mainstream input methods, code tables, lexicons and Unicode planes, which makes industrial deployment difficult.
Disclosure of Invention
The invention aims to provide a pinyin supplementary labeling system with zero learning cost, full platform compatibility, machine readability and extensibility. On the premise of keeping the original spelling of the Chinese pinyin scheme, it uses symbols already encoded in Unicode, each formed by a prefixed dot plus the original letter (·j, ·q, ·x), to uniquely distinguish the "product", "seven" and "Western" pronunciations, thereby achieving the following:
a) Human-readable: on seeing ·j, the reader knows to read it as in "product";
b) Machine-parsable: captured in one pass by a regular expression, with no dictionary needed;
c) Keyboard-enterable: typed directly in a Chinese input method as "dian + j";
d) Backward compatible: the form without a dot defaults to the "machine/odd/rare" reading, so existing text does not need to be modified;
e) Extensible: the same symbol scheme can cover more such sound distinctions.
1. A pinyin supplementary labeling system for distinguishing the six sounds "machine/product, odd/seven, rare/West" in Chinese pronunciation, comprising: a mapping module for establishing and storing the following unique mapping: "product" pronunciation ↔ ·j, "seven" pronunciation ↔ ·q, "West" pronunciation ↔ ·x; a coding module for embedding the mapping into electronic text in UTF-8 encoding; a parsing module for identifying and extracting the supplementary annotation within milliseconds through the regular expression "/·[jqx]/"; a speech synthesis module for calling the voice library units corresponding to ·j, ·q and ·x according to the parsing result and outputting a speech waveform consistent with the target pronunciation; and a speech recognition post-processing module for preferentially selecting candidate words containing ·j, ·q and ·x in the pinyin field when the homonym candida