CN-121366048-B - Intelligent identification system and method for preventing bill repetition
Abstract
The invention discloses an intelligent identification system and method for preventing bill repetition, relating to the technical field of data identification. The digital signature and text information of each electronic bill are collected, a timestamp is attached, and the historical reimbursement behavior is constructed. The writing features of the digital signature are extracted to construct a bill to be detected. Bills to be detected are screened within a preset interval according to their timestamps, the similarity of their writing features is calculated, and suspected duplicate bills are marked. The suspected duplicate bills are sorted to form a sequence to be detected, which is compared with the historical reimbursement sequence. If an identical sequence exists, the bills are marked as passed; otherwise, the first bill is reimbursed automatically and the rest are transferred to manual audit. The manual audit results are collected and the corresponding bill reimbursement history record is updated. The system comprises a data acquisition construction module, a data feature extraction module, a screening similarity judgment module, a classification comparison processing module and a result record updating module.
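The screening step summarized above (time-window filtering followed by pairwise writing-feature comparison) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes writing features are already encoded as numeric vectors, that the window is symmetric around the anchor bill's timestamp, and that cosine similarity stands in for the unspecified "preset feature similarity calculation method". The names `BillToDetect`, `screen_suspected_duplicates`, `window` and `threshold` are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List, Sequence, Set


@dataclass
class BillToDetect:
    bill_id: str
    timestamp: float           # submission time in seconds (assumed unit)
    features: Sequence[float]  # hypothetical numeric encoding of the writing features


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity metric; the patent only names a preset method."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def screen_suspected_duplicates(anchor: BillToDetect, bills: List[BillToDetect],
                                window: float, threshold: float) -> Set[str]:
    """Return ids of bills marked as suspected duplicates around one anchor bill."""
    # Temporary set: bills whose timestamps lie within +/- window of the anchor.
    temp = [b for b in bills if abs(b.timestamp - anchor.timestamp) <= window]
    suspected: Set[str] = set()
    # Pairwise comparison; both bills of a pair above the threshold are marked.
    for a, b in combinations(temp, 2):
        if cosine_similarity(a.features, b.features) > threshold:
            suspected.update((a.bill_id, b.bill_id))
    return suspected


bills = [
    BillToDetect("B1", 0.0, [1.0, 0.0, 0.5]),
    BillToDetect("B2", 30.0, [1.0, 0.0, 0.5]),
    BillToDetect("B3", 45.0, [0.0, 1.0, 0.0]),
    BillToDetect("B4", 9000.0, [1.0, 0.0, 0.5]),  # outside the 600 s window of B1
]
print(sorted(screen_suspected_duplicates(bills[0], bills, window=600.0, threshold=0.95)))
# ['B1', 'B2']
```

B4 has the same features as B1 but is excluded by the time window, which is the point of screening by timestamp before comparing features.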
Inventors
- Zhang Xiangfei
- Zhu Ce
- Chen Beiqi
- Shen Tianjie
- He Tingting
- Tong Shan
- Jiang Yifan
Assignees
- 上海市大数据中心 (Shanghai Municipal Big Data Center)
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2025-12-19
Claims (9)
- 1. An intelligent identification method for preventing bill repetition, characterized by comprising the following steps: S1, acquiring the digital signature and text information of an electronic bill, and attaching a timestamp and the historical reimbursement behavior, wherein the historical reimbursement behavior consists of all bill reimbursement sequences associated with the same digital signature; the specific steps of step S1 are as follows: S1-1, acquiring the digital signature and text information carried by the electronic bill, wherein the text information comprises all text on the electronic bill; S1-2, generating a timestamp from the actual time at which the electronic bill is submitted, and binding the timestamp to the digital signature and text information of the corresponding electronic bill to form a preliminary data set; S1-3, extracting all past bill reimbursement data corresponding to a given digital signature and arranging it in chronological order of reimbursement to form a complete bill reimbursement sequence, thereby constructing the historical reimbursement behavior; S2, cleaning and standardizing the acquired digital signature and text information, extracting the writing features of the digital signature, and constructing a bill to be detected; S3, extracting all bills to be detected whose timestamps fall within a preset range, collecting all of their writing features, and marking the bills to be detected whose similarity exceeds a preset threshold as suspected duplicate bills; S4, extracting all suspected duplicate bills, classifying them according to their writing features, sorting them by timestamp to form a sequence to be detected, and querying whether the historical reimbursement behavior contains a bill reimbursement sequence identical to the sequence to be detected; if so, marking the bills as passed; if not, reimbursing the first suspected duplicate bill and transferring the subsequent suspected duplicate bills to manual audit; and S5, collecting the manual audit results and updating the corresponding bill reimbursement history record.
- 2. The intelligent identification method for preventing bill repetition according to claim 1, wherein the specific steps of step S2 are as follows: S2-1, normalizing the digital signature according to a format preset by the system, and uniformly converting the text information into a preset format; S2-2, extracting writing features from the normalized digital signature, wherein the writing features comprise the stroke logic and structural layout of the signature; and S2-3, associating and integrating the extracted writing features and the converted text information with the timestamp and the historical reimbursement behavior to form a bill to be detected.
- 3. The intelligent identification method for preventing bill repetition according to claim 2, wherein the specific steps of step S3 are as follows: S3-1, taking a given bill to be detected, constructing its preset time interval from its timestamp and the timestamp screening range preset by the system, and extracting, from all bills to be detected, those whose timestamps fall within the preset time interval to form a temporary set of bills to be detected; S3-2, extracting, one by one, the digital signature writing features of each bill in the temporary set, all extracted writing features together forming a temporary writing feature set; and S3-3, performing a pairwise comparison of all writing features in the temporary writing feature set using a preset feature similarity calculation method to obtain a similarity value for every pair, comparing each similarity value with the threshold preset by the system, and, if the similarity value of a pair exceeds the preset threshold, marking the bills to be detected associated with both features as suspected duplicate bills, then collecting and organizing the information of all bills so marked into a suspected duplicate bill list.
- 4. The intelligent identification method for preventing bill repetition according to claim 3, wherein the specific steps of step S4 are as follows: S4-1, classifying all suspected duplicate bills according to the extracted writing features, suspected duplicate bills whose writing features are completely identical being placed in the same category; S4-2, sorting the suspected duplicate bills in each category by their bound timestamps, and associating the digital signature and standardized text information of each bill with its position in that order to form the sequence to be detected for each category; and S4-3, extracting the bill reimbursement sequence from the historical reimbursement behavior associated with the digital signature of the sequence to be detected and comparing the sequence to be detected with it; if the two are completely identical, marking all suspected duplicate bills of the category as passed; if not, automatically reimbursing the suspected duplicate bill that is first in the sorted order and transferring the remaining bills to the manual audit process; and recording the comparison result and the associated bill information.
- 5. The intelligent identification method for preventing bill repetition according to claim 4, wherein the specific steps of step S5 are as follows: S5-1, collecting the comparison results produced by the manual audit process and combining them with the digital signature, standardized text information and timestamp bound to each bill to form a record document; and S5-2, updating the bill reimbursement history record corresponding to the digital signature according to the record document.
- 6. An intelligent identification system for preventing bill repetition, applying the intelligent identification method for preventing bill repetition according to any one of claims 1-5, characterized in that the system comprises a data acquisition construction module, a data feature extraction module, a screening similarity judgment module, a classification comparison processing module and a result record updating module; wherein the data acquisition construction module is used for acquiring the digital signatures and text information of electronic bills, generating timestamps, and constructing the historical reimbursement behavior in the form of all bill reimbursement sequences corresponding to the same digital signature; the data feature extraction module is used for cleaning and standardizing the acquired digital signatures and text information, extracting the writing features of the digital signatures, and associating them with the timestamps and the historical reimbursement behavior to construct bills to be detected; the screening similarity judgment module is used for screening bills within a preset time interval according to the timestamps of the bills to be detected and marking suspected duplicate bills by calculating writing feature similarity; the classification comparison processing module is used for classifying the suspected duplicate bills according to their writing features, sorting them by timestamp to form a sequence to be detected, comparing that sequence with the historical reimbursement sequence and processing it accordingly; and the result record updating module is used for collecting the comparison results generated by the manual audit process, combining the results with the bill information, and updating the reimbursement history records of the bills corresponding to the digital signatures; the output end of the data acquisition construction module is electrically connected with the input end of the data feature extraction module, the output end of the data feature extraction module is electrically connected with the input end of the screening similarity judgment module, the output end of the screening similarity judgment module is electrically connected with the input end of the classification comparison processing module, and the output end of the classification comparison processing module is electrically connected with the input end of the result record updating module.
- 7. The intelligent identification system for preventing bill repetition according to claim 6, wherein the data acquisition construction module comprises an information acquisition unit and a historical behavior construction unit; the information acquisition unit is used for acquiring the digital signature of an electronic bill and the text information formed by all characters on the bill, and for generating a timestamp bound to the bill from the actual time of bill submission; and the historical behavior construction unit is used for extracting all past bill reimbursement data corresponding to a given digital signature and arranging it in chronological order of reimbursement to form a complete bill reimbursement sequence.
- 8. The intelligent identification system for preventing bill repetition according to claim 6, wherein the data feature extraction module comprises a data normalization unit and a feature extraction unit; the feature extraction unit is used for extracting stroke-logic and structural-layout writing features from the normalized digital signature and associating and integrating them with the converted text information, the timestamp and the historical reimbursement behavior to form a bill to be detected; the screening similarity judgment module comprises a time screening unit and a similarity judgment unit; the time screening unit is used for constructing a preset time interval from the timestamp of a given bill to be detected and the timestamp screening range preset by the system, and for extracting, from all bills to be detected, those whose timestamps fall within the interval to form a temporary set of bills to be detected; and the similarity judgment unit is used for performing a pairwise comparison of the writing features of all bills in the temporary set using a preset feature similarity calculation method, comparing each similarity value with a preset threshold, and marking and collecting the suspected duplicate bills.
- 9. The intelligent identification system for preventing bill repetition according to claim 6, wherein the classification comparison processing module comprises a bill classification sorting unit and a comparison result processing unit; the bill classification sorting unit is used for classifying all suspected duplicate bills according to the extracted writing features, placing bills with identical writing features in the same category, sorting the bills within each category by timestamp, and forming a sequence to be detected from the associated information; the result record updating module comprises a result collection unit and a record updating unit; the result collection unit is used for collecting the bill comparison results generated by the manual audit process and combining them with the digital signature, standardized text information and timestamp bound to each bill to form a record document; and the record updating unit is used for updating the bill reimbursement history record of the digital signature corresponding to the bill according to that record document.
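Claim 6 recites the five modules as a chain in which each module's output end feeds the next module's input end. In a software reading, that chain is plain function composition, sketched below. This is an illustration under assumptions only: the module bodies are hypothetical stubs that merely record that they ran, the dictionary `Record` format is invented for the example, and `make_stub_module` and `run_chain` are not names from the patent.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

# Hypothetical record passed between modules; the patent does not define its fields.
Record = Dict[str, Any]
Module = Callable[[Record], Record]

# Module order follows the output-to-input connections recited in claim 6.
MODULE_NAMES: List[str] = [
    "data_acquisition_construction",
    "data_feature_extraction",
    "screening_similarity_judgment",
    "classification_comparison_processing",
    "result_record_updating",
]


def make_stub_module(name: str) -> Module:
    """Placeholder module that only appends its name to a trace; real logic is omitted."""
    def module(record: Record) -> Record:
        trace = list(record.get("trace", []))  # copy so the input record is not mutated
        trace.append(name)
        return {**record, "trace": trace}
    return module


def run_chain(modules: List[Module], record: Record) -> Record:
    """Feed each module's output into the next module's input, in order."""
    return reduce(lambda acc, module: module(acc), modules, record)


chain = [make_stub_module(n) for n in MODULE_NAMES]
result = run_chain(chain, {"bill_id": "B1"})
print(result["trace"][-1])  # result_record_updating
```

Keeping each stage a pure function from record to record mirrors the one-directional wiring of the claim and lets any single module be replaced without touching the others.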
Description
Technical Field
The invention relates to the technical field of big data analysis, and in particular to an intelligent identification system and method for preventing bill repetition.
Background
Driven both by digital transformation and by the deepening of enterprises' cross-regional operations, bills serve as the core vouchers for financial reimbursement and compliance audits, and the scenarios in which they circulate are increasingly complex. The adoption of electronic invoices has solved the inefficiency of transferring paper bills, but the same bill may now be submitted simultaneously by several sub-departments. The system then judges the bill to have been reimbursed repeatedly, several rounds of manual review become necessary, and bill review efficiency drops.
Disclosure of Invention
The invention aims to provide an intelligent identification system and method for preventing bill repetition, so as to solve the problems described in the background above.
To achieve this aim, the invention provides the following technical solution: an intelligent identification method for preventing bill repetition, comprising the following steps: S1, acquiring the digital signature and text information of an electronic bill, and attaching a timestamp and the historical reimbursement behavior, wherein the historical reimbursement behavior consists of all bill reimbursement sequences associated with the same digital signature; S2, cleaning and standardizing the acquired digital signature and text information, extracting the writing features of the digital signature, and constructing a bill to be detected; S3, extracting all bills to be detected whose timestamps fall within a preset range, collecting all of their writing features, and marking the bills to be detected whose similarity exceeds a preset threshold as suspected duplicate bills; S4, extracting all suspected duplicate bills, classifying them according to their writing features, sorting them by timestamp to form a sequence to be detected, and querying whether the historical reimbursement behavior contains a bill reimbursement sequence identical to the sequence to be detected; if so, marking the bills as passed; if not, reimbursing the first suspected duplicate bill and transferring the subsequent suspected duplicate bills to manual audit; and S5, collecting the manual audit results and updating the corresponding bill reimbursement history record.
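The decision logic of S4 and the history update of S5 can be sketched as below. This is a minimal sketch under stated assumptions: writing features are hashable tuples so that "completely identical" features can key a category, all bills in a category share one digital signature, and each position in a sequence is represented by a hypothetical `text_key` derived from the standardized text (for example an invoice number). `SuspectedBill`, `process_suspected`, `update_history` and the decision labels are illustrative names, not terms from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class SuspectedBill:
    bill_id: str
    signature: str               # digital signature identifier
    features: Tuple[float, ...]  # writing features; a tuple so it can key a category
    timestamp: float
    text_key: str                # hypothetical key derived from the standardized text


def process_suspected(bills: List[SuspectedBill],
                      history: Dict[str, List[Tuple[str, ...]]]) -> Dict[str, str]:
    """Return a decision per bill_id: 'passed', 'auto_reimbursed' or 'manual_audit'."""
    # S4-1: bills with completely identical writing features share a category.
    categories: Dict[Tuple[float, ...], List[SuspectedBill]] = defaultdict(list)
    for bill in bills:
        categories[bill.features].append(bill)

    decisions: Dict[str, str] = {}
    for members in categories.values():
        # S4-2: order the category by timestamp to form the sequence to be detected.
        members.sort(key=lambda b: b.timestamp)
        sequence = tuple(b.text_key for b in members)
        # S4-3: compare with the historical sequences of the same (assumed shared) signature.
        if sequence in history.get(members[0].signature, []):
            for b in members:
                decisions[b.bill_id] = "passed"
        else:
            decisions[members[0].bill_id] = "auto_reimbursed"
            for b in members[1:]:
                decisions[b.bill_id] = "manual_audit"
    return decisions


def update_history(history: Dict[str, List[Tuple[str, ...]]],
                   signature: str, audited_sequence: Sequence[str]) -> None:
    """S5-2 (assumed form): append an audited sequence to the signature's history."""
    history.setdefault(signature, []).append(tuple(audited_sequence))


bills = [
    SuspectedBill("B2", "sigA", (0.1, 0.9), 20.0, "INV-7"),
    SuspectedBill("B1", "sigA", (0.1, 0.9), 10.0, "INV-7"),
    SuspectedBill("B3", "sigB", (0.5, 0.5), 15.0, "INV-9"),
    SuspectedBill("B4", "sigB", (0.5, 0.5), 25.0, "INV-8"),
]
history = {"sigB": [("INV-9", "INV-8")]}
print(process_suspected(bills, history))
```

In this example the sigB category matches a known historical pattern and passes without review, while the sigA pair has no matching history, so B1 (earlier timestamp) is reimbursed and B2 goes to manual audit.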
Further, the specific steps of step S1 are as follows: S1-1, acquiring the digital signature and text information carried by an electronic bill, wherein the text information comprises all text on the electronic bill; S1-2, generating a timestamp from the actual time at which the electronic bill is submitted, and binding the timestamp to the digital signature and text information of the corresponding electronic bill to form a preliminary data set; and S1-3, extracting all past bill reimbursement data corresponding to a given digital signature and arranging it in chronological order of reimbursement to form a complete bill reimbursement sequence, thereby constructing the historical reimbursement behavior.
Further, the specific steps of step S2 are as follows: S2-1, normalizing the digital signature according to a format preset by the system, and uniformly converting the text information into a preset format; S2-2, extracting writing features from the normalized digital signature, wherein the writing features comprise the stroke logic and structural layout of the signature; and S2-3, associating and integrating the extracted writing features and the converted text information with the timestamp and the historical reimbursement behavior to form a bill to be detected.
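Steps S1-2, S1-3 and the text part of S2-1 can be sketched with the standard library alone. The sketch rests on assumptions the patent does not state: timestamps are timezone-aware UTC datetimes, past reimbursement data arrives as dictionaries with hypothetical `signature`, `bill_id` and `reimbursed_at` keys, and the "preset format" for text is taken to be Unicode NFKC normalization (which folds full-width characters common on Chinese invoices) plus whitespace collapsing. Signature stroke-feature extraction (S2-2) is not modelled.

```python
import unicodedata
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional


@dataclass
class PreliminaryRecord:
    signature: str
    text: str
    timestamp: datetime


def bind_timestamp(signature: str, text: str,
                   submitted_at: Optional[datetime] = None) -> PreliminaryRecord:
    """S1-2: bind the submission time (UTC assumed) to the signature and text."""
    ts = submitted_at if submitted_at is not None else datetime.now(timezone.utc)
    return PreliminaryRecord(signature, text, ts)


def normalize_text(text: str) -> str:
    """S2-1 (assumed rules): NFKC folds full-width characters, then whitespace collapses."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def build_history(past: Iterable[dict]) -> Dict[str, List[str]]:
    """S1-3: group past reimbursements by signature, ordered by reimbursement time."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for row in past:
        grouped[row["signature"]].append(row)
    return {
        sig: [r["bill_id"] for r in sorted(rows, key=lambda r: r["reimbursed_at"])]
        for sig, rows in grouped.items()
    }


past = [
    {"signature": "sigA", "bill_id": "B7", "reimbursed_at": datetime(2025, 3, 2)},
    {"signature": "sigA", "bill_id": "B5", "reimbursed_at": datetime(2025, 1, 9)},
    {"signature": "sigB", "bill_id": "B6", "reimbursed_at": datetime(2025, 2, 1)},
]
print(build_history(past))            # {'sigA': ['B5', 'B7'], 'sigB': ['B6']}
print(normalize_text("ＩＮＶ－００１  Total\t100"))  # INV-001 Total 100
```

Sorting at construction time means the history for each signature is already a chronological sequence, which is the form S4 compares against.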
Further, the specific steps of step S3 are as follows: S3-1, taking a given bill to be detected, constructing its preset time interval from its timestamp and the timestamp screening range preset by the system, and extracting, from all bills to be detected, those whose timestamps fall within the preset time interval to form a temporary set of bills to be detected; S3-2, extracting, one by one, the digital signature writing features of each bill in the temporary set, all extracted writing features together forming a temporary writing feature set; and S3-3, performing a pairwise comparison of all writing features in the temporary writing feature set using a preset feature similarity calculation method to obtain a similarity value for every pair, comparing