
KR-20260064317-A - ELECTRONIC DEVICE AND METHOD FOR DETECTING PERSONAL INFORMATION


Abstract

A method for detecting personal information, performed by an electronic device, is provided. The method may include: obtaining a first text in a first language; detecting personal information in the first text; obtaining a second text by translating the first text into a second language; calculating a first score based on a pronunciation analysis of the personal information and the second text; calculating a second score based on a semantic analysis of the personal information and the second text; and detecting, in the second text, personal information elements corresponding to the personal information of the first text, based on the first score and the second score.
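The abstract does not say how the first (pronunciation) score and the second (semantic) score are combined to decide which spans of the second text are personal information elements. The following is a minimal sketch that assumes a weighted sum compared against a single threshold. The function `detect_pi_elements`, the `Detection` record, the default weights and threshold, and the toy scores are all hypothetical, and the translation step and the two scoring functions are treated as given callables rather than reproduced.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical score signature: (personal-information item, candidate span) -> score in [0, 1]
ScoreFn = Callable[[str, str], float]


@dataclass
class Detection:
    candidate: str       # span of the translated (second) text
    first_score: float   # pronunciation-based score
    second_score: float  # semantics-based score
    combined: float      # fused score (assumed weighted sum)


def detect_pi_elements(
    pi_item: str,
    candidates: List[str],
    pronunciation_score: ScoreFn,
    semantic_score: ScoreFn,
    w_pron: float = 0.5,
    w_sem: float = 0.5,
    threshold: float = 0.6,
) -> List[Detection]:
    """Return candidate spans that likely correspond to pi_item.

    Assumption: the two scores are fused by a weighted sum and compared with
    one threshold; the source does not specify the fusion rule.
    """
    detections = []
    for cand in candidates:
        s1 = pronunciation_score(pi_item, cand)
        s2 = semantic_score(pi_item, cand)
        combined = w_pron * s1 + w_sem * s2
        if combined >= threshold:
            detections.append(Detection(cand, s1, s2, combined))
    # Highest-confidence matches first
    return sorted(detections, key=lambda d: d.combined, reverse=True)


# Toy example with hypothetical scores: the Korean name 김민수 translated into English.
toy_pron = {"Kim Min-su": 1.0, "Mr. Kim": 0.1, "Seoul": 0.1}
toy_sem = {"Kim Min-su": 0.8, "Mr. Kim": 0.8, "Seoul": 0.0}
hits = detect_pi_elements(
    "김민수",
    list(toy_pron),
    pronunciation_score=lambda pi, cand: toy_pron[cand],
    semantic_score=lambda pi, cand: toy_sem[cand],
)
```

With these toy scores only "Kim Min-su" passes (0.5 × 1.0 + 0.5 × 0.8 = 0.9), while "Mr. Kim" reaches 0.45. Per-language or per-category weights (as in claim 6) could be modeled by choosing `w_pron` and `w_sem` per language pair; that choice is left open here.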

Inventors

  • 이상호

Assignees

  • Samsung Electronics Co., Ltd. (삼성전자주식회사)

Dates

Publication Date
2026-05-07
Application Date
2024-10-31

Claims (20)

  1. A method for detecting personal information, performed by an electronic device, the method comprising: obtaining a first text in a first language; detecting personal information in the first text; obtaining a second text by translating the first text into a second language; calculating a first score based on a pronunciation analysis of the personal information and the second text; calculating a second score based on a semantic analysis of the personal information and the second text; and detecting, in the second text, personal information elements corresponding to the personal information of the first text, based on the first score and the second score.
  2. The method of claim 1, wherein the calculating of the first score comprises: converting the personal information and the second text into the International Phonetic Alphabet (IPA); and calculating the first score, which represents a similarity between the IPA of the second text and the IPA of the personal information.
  3. The method of claim 2, wherein the calculating of the first score comprises applying weights to the similarity between the IPA of the second text and the IPA of the personal information, based on phonetic symbol groups in which similar phonetic symbols are grouped together.
  4. The method of claim 1, wherein the calculating of the second score comprises calculating, by applying an attention mechanism, the second score representing a degree of association between the first text and the second text.
  5. The method of claim 4, wherein the obtaining of the second text comprises: translating the first text into a third language, which is an intermediate language, to obtain a third text; and obtaining the second text by translating the third text into the second language.
  6. The method of claim 1, further comprising applying a weight corresponding to at least one of a language characteristic or a personal information characteristic to at least one of the first score or the second score.
  7. The method of claim 1, further comprising determining a priority between the pronunciation analysis and the semantic analysis by comparing at least one of the first score or the second score with a threshold value.
  8. The method of claim 1, further comprising: obtaining a user input selecting the second language; and applying settings for performing the pronunciation analysis and the semantic analysis based on identification information of the first language and the second language.
  9. The method of claim 8, further comprising determining whether to perform the pronunciation analysis based on the identification information of the first language and the second language.
  10. The method of claim 1, further comprising determining a protection level of a document containing the second text based on the personal information elements detected in the second text.
  11. An electronic device comprising: a communication interface; at least one processor; and a memory storing instructions, wherein the instructions, when executed by the at least one processor, cause the electronic device to: obtain a first text in a first language; detect personal information in the first text; obtain a second text by translating the first text into a second language; calculate a first score based on a pronunciation analysis of the personal information and the second text; calculate a second score based on a semantic analysis of the personal information and the second text; and detect, in the second text, personal information elements corresponding to the personal information of the first text, based on the first score and the second score.
  12. The electronic device of claim 11, wherein the instructions, when executed by the at least one processor, cause the electronic device to: convert the personal information and the second text into the International Phonetic Alphabet (IPA); and calculate the first score, which indicates a similarity between the IPA of the second text and the IPA of the personal information.
  13. The electronic device of claim 12, wherein the instructions, when executed by the at least one processor, cause the electronic device to apply weights to the similarity between the IPA of the second text and the IPA of the personal information, based on phonetic symbol groups in which similar phonetic symbols are grouped together.
  14. The electronic device of claim 11, wherein the instructions, when executed by the at least one processor, cause the electronic device to calculate, by applying an attention mechanism, the second score representing a degree of association between the first text and the second text.
  15. The electronic device of claim 11, wherein the instructions, when executed by the at least one processor, cause the electronic device to: translate the first text into a third language, which is an intermediate language, to obtain a third text; and obtain the second text by translating the third text into the second language.
  16. The electronic device of claim 11, wherein the instructions, when executed by the at least one processor, cause the electronic device to apply a weight corresponding to at least one of a language characteristic or a personal information characteristic to at least one of the first score or the second score.
  17. The electronic device of claim 11, wherein the instructions, when executed by the at least one processor, cause the electronic device to determine a priority between the pronunciation analysis and the semantic analysis by comparing at least one of the first score or the second score with a threshold value.
  18. The electronic device of claim 11, wherein the instructions, when executed by the at least one processor, cause the electronic device to: obtain a user input selecting the second language; and apply settings for performing the pronunciation analysis and the semantic analysis based on identification information of the first language and the second language.
  19. The electronic device of claim 18, wherein the instructions, when executed by the at least one processor, cause the electronic device to determine whether to perform the pronunciation analysis based on the identification information of the first language and the second language.
  20. A computer-readable recording medium having recorded thereon a program for executing, on a computer, the method of any one of claims 1 to 10.
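Claims 2, 3, 12, and 13 describe a first score computed as a similarity between IPA transcriptions, with weights based on groups of similar phonetic symbols. One plausible realization, sketched here as an assumption rather than the disclosed method, is a weighted Levenshtein distance in which substituting a symbol with another from the same group costs less than substituting across groups. The group table `PHONETIC_GROUPS`, the 0.5 same-group cost, and the helper names are hypothetical; each character is treated as one IPA symbol (diacritics and multi-character symbols are ignored), and the IPA strings are assumed to come from a separate grapheme-to-phoneme step that is not shown.

```python
from typing import FrozenSet, List, Sequence

# Hypothetical grouping of phonetically similar IPA symbols (illustrative only).
PHONETIC_GROUPS: List[FrozenSet[str]] = [
    frozenset({"p", "b"}),
    frozenset({"t", "d"}),
    frozenset({"k", "ɡ", "g"}),
    frozenset({"s", "z", "ʃ"}),
    frozenset({"l", "ɾ", "r"}),
    frozenset({"i", "ɪ"}),
    frozenset({"u", "ʊ"}),
    frozenset({"e", "ɛ"}),
    frozenset({"o", "ɔ"}),
]


def _group_of(symbol: str) -> int:
    """Index of the group containing symbol, or -1 if it is ungrouped."""
    for idx, group in enumerate(PHONETIC_GROUPS):
        if symbol in group:
            return idx
    return -1


def substitution_cost(a: str, b: str, same_group_cost: float = 0.5) -> float:
    """0 for identical symbols, reduced cost within a group, 1 otherwise."""
    if a == b:
        return 0.0
    ga = _group_of(a)
    if ga != -1 and ga == _group_of(b):
        return same_group_cost
    return 1.0


def ipa_similarity(ipa_a: Sequence[str], ipa_b: Sequence[str]) -> float:
    """Group-weighted Levenshtein similarity in [0, 1] between IPA sequences."""
    n, m = len(ipa_a), len(ipa_b)
    if n == 0 and m == 0:
        return 1.0
    # dp[i][j] = minimum cost of turning ipa_a[:i] into ipa_b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = float(i)
    for j in range(1, m + 1):
        dp[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + 1.0,  # deletion
                dp[i][j - 1] + 1.0,  # insertion
                dp[i - 1][j - 1] + substitution_cost(ipa_a[i - 1], ipa_b[j - 1]),
            )
    # Normalize by the longer sequence so the result stays in [0, 1]
    return 1.0 - dp[n][m] / max(n, m)
```

For example, hypothetical transcriptions `kimminsu` (Korean side) and `kɪmmɪnsu` (English side) differ only in two same-group vowels, so this sketch scores them 0.875, whereas an unweighted edit distance would give 0.75. The reduced same-group cost is what lets transliterated names survive small vowel or voicing shifts introduced by translation.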

Description

ELECTRONIC DEVICE AND METHOD FOR DETECTING PERSONAL INFORMATION

The present disclosure relates to an electronic device and method for detecting personal information elements within text using a single cross-language model capable of detecting personal information in multiple languages.

As interest in and the importance of personal information protection grow, regulations to safeguard personal information continue to be introduced, and various technologies for this purpose are being developed in parallel. In particular, technologies capable of accurately detecting personal information within text are becoming increasingly important. However, because personal information is expressed differently in different languages, detection can become difficult, or a new detection method may have to be applied every time the language changes. This inefficiency becomes more severe as the volume of data increases. There is therefore a growing need for methods that can efficiently detect the same personal information across languages, regardless of the specific language.

FIG. 1 is a diagram illustrating, by way of example, an operation in which an electronic device according to one embodiment of the present disclosure detects personal information across languages.
FIG. 2 is a flowchart illustrating an operation in which an electronic device according to one embodiment of the present disclosure detects personal information.
FIG. 3 is a diagram illustrating an operation in which an electronic device according to one embodiment of the present disclosure detects personal information.
FIG. 4 is a diagram illustrating the operation of a pronunciation analysis module of a cross-language model according to one embodiment of the present disclosure.
FIG. 5 is a diagram illustrating the operation of a semantic analysis module of a cross-language model according to one embodiment of the present disclosure.
FIG. 6 is a diagram illustrating an operation in which an electronic device according to one embodiment of the present disclosure detects personal information by combining pronunciation analysis and semantic analysis.
FIG. 7 is a diagram illustrating an example of an operation in which an electronic device according to one embodiment of the present disclosure detects personal information across languages.
FIG. 8 is a diagram illustrating an example of an operation in which an electronic device according to one embodiment of the present disclosure detects personal information across languages.
FIG. 9 is a flowchart illustrating an operation in which an electronic device according to one embodiment of the present disclosure applies settings for personal information detection.
FIG. 10 is a flowchart illustrating an operation in which an electronic device according to one embodiment of the present disclosure detects personal information.
FIG. 11 is a block diagram illustrating the configuration of an electronic device according to one embodiment of the present disclosure.
FIG. 12 is a block diagram illustrating the configuration of a server according to one embodiment of the present disclosure.

The terms used in this specification will be briefly explained, and then the present disclosure will be described in detail.

In the present disclosure, the expression "at least one of a, b, or c" may refer to "a," "b," "c," "a and b," "a and c," "b and c," "all of a, b, and c," or variations thereof.

The terms used in this disclosure have been selected from general terms that are as widely used as possible, in consideration of their functions in this disclosure; however, these terms may vary depending on the intent of those skilled in the art, legal precedent, the emergence of new technologies, and the like. In addition, in specific cases, some terms have been selected at the applicant's discretion, and in such cases their meanings are described in detail in the relevant parts of the description.
Therefore, terms used in this disclosure should be defined not merely by their names, but based on their meanings and the overall content of this disclosure. Singular expressions may include plural expressions unless the context clearly indicates otherwise. Terms used herein, including technical or scientific terms, may have the same meanings as those generally understood by a person of ordinary skill in the art to which this specification pertains. Additionally, terms including ordinal numbers, such as "first" or "second," used in this specification may be used to describe various components, but these components should not be limited by those terms. Such terms are used solely to distinguish one component from another. Throughout the specification, when a part is described as "comprising" a certain component, this means that, unless specifically stated otherwise, it does not exclude other components but may further include additional components. Furthermore, terms such as "part" or "module" as used in the specification refer to a unit that processes at least one function or operation, and this may be im