KR-20260067880-A - System and method for automatically generating network diagrams through document-based data extraction
Abstract
A system and method for automatically generating a network diagram through document-based data extraction are disclosed. A system for automatically generating a network diagram according to one embodiment includes a data extraction module that extracts network components and connection information from an input document, and a network diagram generation module that structures the extracted information to generate a network diagram.
Inventors
- 방혁준
- 임정환
Assignees
- 쿤텍 주식회사
Dates
- Publication Date
- 20260513
- Application Date
- 20241106
Claims (4)
- A system for automatically extracting network configuration information from a document and generating a diagram, characterized by comprising a data extraction module that extracts network components and connection information from an input document, and a network diagram generation module that structures the extracted information to generate a network diagram.
- The network diagram automatic generation system of claim 1, wherein the data extraction module includes a text-based data extraction module and an image-based data extraction module, and the text-based data extraction module extracts network device information from text using regular expressions and natural language processing (NLP) technology.
- The network diagram automatic generation system of claim 1, wherein the image-based data extraction module detects network device and connection information in an image using OCR and a deep learning-based object detection model.
- The network diagram automatic generation system of claim 1, wherein the network diagram generation module generates structured data in JSON or XML format from the extracted network information and visualizes it as a network diagram.
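Claims 2 and 4 together describe a text branch that extracts network information with regular expressions and NLP and then emits structured JSON or XML data. The following Python sketch illustrates that branch under stated assumptions. Only the IPv4 pattern comes from the description. The MAC pattern, the fixed "X is connected to Y through port N" template (a simple stand-in for the NLP step, not an NLP model), the function names `extract` and `to_json`, and the nodes/links JSON layout are all hypothetical.

```python
import json
import re

# IPv4 pattern quoted in the description; like the original, it does not
# check that each octet is at most 255.
IPV4 = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")
# Hypothetical MAC pattern: six hex pairs separated by ":" or "-".
MAC = re.compile(r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b")
# Hypothetical stand-in for the NLP step: one fixed sentence template,
# "<Device> is connected to <Device> through port <N>".
DEVICE = r"[A-Z][\w-]*(?: [A-Z0-9][\w-]*)?"
LINK = re.compile(
    rf"(?P<src>{DEVICE}) is connected to (?P<dst>{DEVICE}) through port (?P<port>\d+)"
)


def extract(text: str) -> dict:
    """Collect devices, links, and address strings found in `text`."""
    nodes: dict[str, dict] = {}
    links = []
    for m in LINK.finditer(text):
        src, dst = m.group("src"), m.group("dst")
        # Register each device once, in order of first appearance.
        nodes.setdefault(src, {"id": src})
        nodes.setdefault(dst, {"id": dst})
        links.append({"source": src, "target": dst, "port": int(m.group("port"))})
    return {
        "nodes": list(nodes.values()),
        "links": links,
        "ip_addresses": IPV4.findall(text),
        "mac_addresses": MAC.findall(text),
    }


def to_json(text: str) -> str:
    """Serialize the extracted structure as JSON (one of the forms named in claim 4)."""
    return json.dumps(extract(text), indent=2)
```

The nodes/links layout is a common graph shape that a diagramming tool could consume. An XML variant could be built from the same dictionary with the standard library's `xml.etree.ElementTree` module.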
Description
System and method for automatically generating network diagrams through document-based data extraction
The present invention proposes a system and method that automatically collect network configuration information using document-based data extraction technology and automatically generate a network diagram from that information, making network configuration more efficient. The information needed to build a network diagram is scattered across formats such as documents, spreadsheets, and databases, so collecting and visualizing it manually is time-consuming and labor-intensive. Existing diagram creation methods suffer from inefficient manual input, difficult maintenance as the underlying data changes, and the difficulty of integrated management as data sources diversify. Accordingly, the present invention proposes a system that generates network diagrams automatically through automated data collection, automatic visualization, and real-time updates, with the aim of making network management more efficient, delivering accurate information, and speeding up system operation.
FIG. 1 is a configuration diagram of a system for automatically generating a network diagram through document-based data extraction according to an embodiment of the present invention.
FIG. 2 is a flowchart of the automatic generation of a network diagram through document-based data extraction according to an embodiment of the present invention.
The system comprises the following units:
- Document Input Unit: Receives a document file containing network information. The document file may be in various formats, such as PDF, image, scanned, or text files.
- Data Extraction Unit: Extracts data from the input document. It consists of a document format detection unit, an image analysis unit, and a text extraction unit.
- Document Format Detection Unit: Analyzes the document received from the document input unit, classifies its format as text, image, or scanned file, and passes the document to the corresponding format-specific processing unit.
- Image Analysis Unit: Extracts text data from the image and scanned files passed from the document format detection unit and sends that text data to the text extraction unit.
- Text Extraction Unit: Extracts network-related information from text data.
- Data Structuring Unit: Converts the network-related information extracted by the data extraction unit into a structured format.
- Network Diagram Generation Unit: Generates a network diagram from the structured data produced by the data structuring unit.
The document input unit receives a document file containing network information in one of the formats listed above and passes it to the document format detection unit of the data extraction unit. The document format detection unit distinguishes three types of input: text, image, and scanned files. Text files are identified by their extensions, such as txt, md, doc, and docx, and by checking whether the file content consists of plain text strings. Image files are usually stored in bitmap or vector formats and are identified by their extensions, such as jpg, jpeg, png, bmp, tiff, and svg, and by their file headers. For example, a JPEG file begins with the bytes FFD8, and a PNG file begins with 89504E47.
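The extension and header checks above can be sketched in Python as follows. This is an illustrative assumption, not the disclosed implementation. The function names `detect_format`, `route`, and `detect_file`, the destination labels, and the "no NUL byte" test used as a plain-text heuristic for files without an extension are all hypothetical. Scanned-file detection, which relies on OCR, is not attempted. Files matching none of the listed rules (for example PDF) are reported as unknown.

```python
from pathlib import Path

# Extension lists taken from the description.
TEXT_EXTENSIONS = {".txt", ".md", ".doc", ".docx"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tiff", ".svg"}
# Leading header bytes named in the description: JPEG (FFD8) and PNG (89504E47).
IMAGE_SIGNATURES = (b"\xff\xd8", b"\x89PNG")


def detect_format(path: str, head: bytes) -> str:
    """Classify a document as "text", "image", or "unknown".

    `head` holds the first few bytes of the file. Scanned-file detection,
    which the description performs with OCR, is not attempted here.
    """
    # Header bytes take priority over the file name.
    if head.startswith(IMAGE_SIGNATURES):
        return "image"
    ext = Path(path).suffix.lower()
    if ext in IMAGE_EXTENSIONS:
        return "image"
    if ext in TEXT_EXTENSIONS:
        return "text"
    # Hypothetical plain-text heuristic for files without an extension:
    # treat the content as text if it contains no NUL bytes.
    if ext == "" and b"\x00" not in head:
        return "text"
    return "unknown"


def route(kind: str) -> str:
    """Map a detected format to the unit that processes it (labels are illustrative)."""
    return {"text": "text extraction unit", "image": "image analysis unit"}.get(
        kind, "unsupported"
    )


def detect_file(path: str) -> str:
    """Read the leading bytes of a file on disk and classify it."""
    with open(path, "rb") as f:
        return detect_format(path, f.read(16))
```

For example, `route(detect_format("topology.png", b"\x89PNG\r\n\x1a\n"))` yields `"image analysis unit"`. Checking header bytes before the extension means a mislabeled JPEG or PNG is still routed to image analysis.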
Although scanned files are stored in an image format, they are typically scanned copies of documents or book pages, so optical character recognition (OCR) is used to identify scanned files that contain text within the image. According to the format assigned by the document format detection unit, text files are passed to the text extraction unit, and image and scanned files are passed to the image analysis unit. The text extraction unit analyzes the text files classified by the document format detection unit and extracts network-related information from their structured text using regular expressions and natural language processing. Regular expressions identify and extract network data such as IP addresses, subnets, MAC addresses, and port numbers. For example, IP addresses can be found with a pattern such as \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b. Natural language processing extracts phrases describing device names or connection relationships from the text. For example, in a sentence such as "Router A is connected to Switch B through port 24," "Router A", "Switch B", and connection information "