US-12619772-B2 - Sensitive information disclosure prediction system for social media users and method thereof
Abstract
The system comprises a feature extraction processor for extracting a set of features from user data; a sensitivity score calculation processor for calculating a sensitivity score using a machine learning technique; an SVM model for classifying the dataset; a tracking unit for monitoring the outbound data transfers made by a DLP agent and identifying the outgoing data transfers of different data types; a central processing unit for determining a first reputation score and destination entity for a first outbound data transfer to a specified recipient entity, determining a subsequent standing score, and determining a first violation of a DLP policy, thereby determining a second violation of the DLP policy; and a controlling unit for carrying out at least one of reporting and remedial actions in response to the first, second, or both violations and generating an alert for approving or denying a particular first or second outbound data transfer.
Inventors
- Geetha Raju
- Karthika Subbaraj
- S. Bose
- Poongodi Manoharan
Assignees
- Geetha Raju
- Karthika Subbaraj
- S. Bose
- Poongodi Manoharan
Dates
- Publication Date: 2026-05-05
- Application Date: 2023-08-24
Claims (20)
- 1 . A sensitive information disclosure prediction system for social media users, the system comprises: at least one hardware processor; and a cloud server platform memory having program instructions stored thereon executable by the at least one hardware processor that, when executed, direct the at least one hardware processor to: collect various facets of a data of a user, said data selected from a user-specific and profile-related information including personal preferences and a profile picture, and generate an information regarding interactions of social network sites (SNS) users including a visit duration, an information about users' friends, an activity data containing information that users shared and created on a website and a group data containing an information about each of a group user forms and takes part in; extract features and selecting the features thereby splitting a dataset based on a certain threshold value(s) of the features using an Extra-Trees ensemble method; calculate a sensitivity score of each of a split data; classify the dataset into a rank selected from extremely sensitive, very sensitive, moderately sensitive, low sensitivity, and very low sensitive using the sensitivity score of each of the split data; monitor outbound data transfers made by a computing device by a data loss prevention (DLP) agent running on the system and identify outgoing data transfers; determine a first reputation score based on classification and a destination entity for a first outbound data transfer to a specified recipient entity, where the first outbound data transfer is a first data type from a plurality of data types, and determine a second outbound information move in light of grouping and objective substance, wherein the second outbound information move is a subsequent information kind of a majority of information types, and determine a first violation of a DLP policy by comparing the first reputation score to a reputation threshold, where the first 
violation is determined when the first reputation score is lower than the reputation threshold thereby determining a second violation of the DLP policy by comparing a second reputation score to the reputation threshold, with the second violation being identified when the second reputation score is lower than the reputation threshold; and carry out at least one of reporting actions or remedial actions in response to one of a first, second, or both violations that are found, wherein carrying out at least one of the reporting actions or remedial actions comprises: a) generating an alert for approving or denying a particular first or second outbound data transfer; b) obtaining user input for approving or denying a particular outbound data transfer; and c) permitting a separate outbound information move when a client endorses a particular outbound information move when the user denies the respective outbound data transfer, and preventing that particular outbound data transfer.
- 2 . The system of claim 1 , wherein feature selection and cut-off points are randomized in the Extra-Trees ensemble method, wherein the feature selection is used for obtaining a feature vector to train a support vector machine (SVM) model.
- 3 . The system of claim 1 , wherein the at least one hardware processor updates the first reputation score of the destination entity for the first data type based on the first reputation score and update second reputation score of the destination entity based on the second reputation score for a second data type thereby communicating with a network community service for sharing the first reputation score for the first data type and the second reputation score for the second data type.
- 4 . The system of claim 1 , wherein the first reputation score and the second reputation score is determined upon assigning a certain first data type and second data type to the data of a particular first outbound data transfer and second outbound data transfer and connecting the information to the destination entity thereby calculating the first reputation score and the second reputation score using the data type and a target entity; and wherein calculation of the first reputation score and the second reputation score requires tracking a number of DLP policy violations that are discovered through prior data transfers to the destination entity, wherein the first reputation score and the second reputation score are calculated based on the data type being transferred to the destination entity and a tracked number of discovered violations.
- 5 . The system of claim 4 , wherein the at least one hardware processor is further directed to: calculate the first reputation score and the second reputation score upon determining whether the computing device is transmitting data to the destination entity for a first time; determine whether a desired data type is sent to the target entity for the first time with a specific outbound data transfer; determine whether an overall reputation score of the target entity is below a certain threshold; or determine whether the overall reputation score of the destination entity for a relevant data type is within a certain level.
- 6 . The system of claim 1 , wherein the data from each outbound data transfer is fed into one of several different categories, with each category representing a different type of data, wherein the user input is received from a client to characterize a majority of classifications, wherein one or more data-type reputation ratings is obtained for the destination entity for the plurality of data types, and an at least one overall reputation score of the destination entity from a network community service; and wherein an activity type is denied if a target public audience specified in a request falls outside of an access limits for the activity type, wherein an activity monitor provides a request response to a requestor indicating that the request is denied, in case the request is denied by the activity monitor, wherein the activity monitor does not provide a requesting party with a data entity that falls under an indicated activity type when the request is denied.
- 7 . The system of claim 6 , wherein the at least one hardware processor is further directed to: receive a user nomination for the activity type, wherein the activity type is being a classification of user-related activities for which activity data is collected; allow the user to select from a predefined set of access restrictions an access restriction for accessing data entities of the activity type, wherein the access restriction defines a set of users and services that is capable of accessing data entities belonging to a respective activity type, wherein each of the data entities comprising a data record of an activity associated with the user, wherein the access restriction configured with a filter that filters a subset of the data entities associated with the activity type; and configure the activity monitor to grant access to data entities of the activity type.
- 8 . The system of claim 6 , wherein the at least one hardware processor is further directed to: allow the user to define the activity type, and provides a user interface (UI) element configured to allow the user to select at least one activity type; permit an automatic generation and formulation of a privacy profile, wherein the privacy profile showing suggested access limits for each selected activity category; and generate the privacy profile upon obtaining an activity record that corresponds to the data entity and evaluate the activity data to find a link thereby determining the privacy profile.
- 9 . The system of claim 1 , wherein the generating an alert for approving or denying a particular first or second outbound data transfer using the at least one hardware processor, wherein the at least one hardware processor is further directed to: detect initiated outbound data transfers and categorize an initiated outbound data transfers as either first or subsequent data movements; instantiate a notification for user interaction when an outbound data transfer is identified, wherein the notification offering a choice to approve or deny a detected transfer; embed relevant data details within the notification, wherein the notification comprising a destination, a data type, and a timestamp, thereby furnishing users with information for informed decision-making; capture and interpret a user's decision to approve or deny from the notification; elevate an urgency of specific alerts and prioritize specific alerts based on defined criteria, wherein the defined criteria comprising a data size and a destination sensitivity; engage with stored historical user decisions, using a history to refine and potentially pre-empt for alerts for repetitive and routine transfers; upon receipt of a user decision, dispense a confirmation message reflecting a user's chosen action; and grant system administrators to delineate parameters and triggers for alert generation.
- 10 . The system of claim 1 , wherein the at least one hardware processor is further directed to: determine the first outbound data transfer to derive the first reputation score based on the classification of a contained data, the destination entity receiving the first outbound data transfer, and a specific data type of the first outbound data transfer, selected from a predefined set of possible data types; process a subsequent outbound data transfer to determine a second reputation score by considering the classification of a subsequent data, an intended endpoint or objective entity for the subsequent outbound data transfer, and associated data type of the second outbound data transfer, chosen from the predefined set of data types; compare the first reputation score and the second reputation score to a predefined reputation threshold to identify potential violations of the data loss prevention (DLP) policy, marking a violation if reputation score is below a set threshold; fetches real-time data for analysis and dynamically update the predefined set of data types to account for emerging data categories; notify system administrators of detected DLP policy violations; classify data in the subsequent outbound data transfer based on a hierarchical structure that weighs sensitivity and importance of data categories; regulates the reputation threshold based on sensitivity requirements set by an organization; log records of evaluated data transfers and detected violations; map destination entities based on trust levels to influence a derivation of reputation scores; present reputation scores and potential DLP policy violations and offer administrative controls for setting or adjusting thresholds via a user interface; and refine and improve an accuracy of reputation score calculations over time by integrating them with machine learning techniques.
- 11 . The system of claim 1 , wherein approving or denying the particular outbound data transfer by the at least one hardware processor, wherein the at least one hardware processor is further directed to: identify outbound data transfers that necessitate user approval and subsequently generate an alert using modalities comprising pop-ups, system alerts and emails; upon user interaction with a generated notification, visually represent details of an impending outbound data transfer, including a source of data, a destination entity, a data type, a volume or size, a timestamp; present distinct user action options, recognize and add a destination to a list of trusted entities; receive and interpret a user's chosen action and authorize or prohibit the outbound data transfer based on a selection by the user; document and store every user decision, and built with a capability to use stored decisions for audit trails, user preference learning, or reference purposes; furnish the user with a confirmation message corresponding to their action, providing statuses, wherein the statuses comprising a transfer in progress and a transfer stopped; default to pre-set behaviors if a user remains indecisive within a designated timeframe; and adapt to user behaviors over time, optimizing system responses based on prior logged decisions.
- 12 . The system of claim 1 , wherein permitting the separate outbound information move using the at least one hardware processor, wherein the at least one hardware processor is further directed to: detect and catalog separate outbound data movements initiated from within the system; capture a user decision related to an identified outbound data transfers and solicit an endorsement and denial of a particular outbound data movement; analyze the user's decision, wherein the user endorses a particular outbound data transfer, the at least one hardware processor authorizes a progression of that respective data movement, and conversely, upon user denial, and instantaneously halts and prevents the particular outbound data transfer; capture, record, and store each user decision concerning outbound data transfers, thereby built to offer insights and provide a traceable record for both endorsed and denied data transfer requests; provide feedback to the user or system administrator concerning issues preventing an execution of the user's decision; monitor and log a status of outbound data transfers, and ensure that only user-endorsed transfers proceed while denied transfers remain halted or terminated; and provide the user with a timely confirmation message that mirrors their decision, delivering real-time statuses, wherein the real-time statuses comprising a transfer approved and a transfer denied.
- 13 . A sensitive information disclosure prediction method for social media users, the method comprises: collecting various facets of a user data selected from a user-specific and profile-related information including personal preferences and a profile picture, generating an information regarding interactions of social network sites (SNS) users including a visit duration, an information about users' friends, an activity data containing information that users shared and created on a website, and a group data containing information about each of a group user forms and takes part in; extracting features and selecting the features thereby splitting a dataset based on a certain threshold value(s) of the features using an Extra-Trees ensemble method; calculating a sensitivity score of each of a split data using a machine learning technique; classifying the dataset into a rank selected from extremely sensitive, very sensitive, moderately sensitive, low sensitivity, and very low sensitive using the sensitivity score of each of the split data; monitoring outbound data transfers made by a computing device by a data loss prevention (DLP) agent and identifying outgoing data transfers as one of several different data types; determining a first reputation score based on a classification and a destination entity for a first outbound data transfer to a specified recipient entity, where the first outbound data transfer is a first data type from a plurality of data types, and determining a second outbound information move in light of a grouping and an objective substance, wherein the second outbound information move is a subsequent information kind of a majority of information types, and determining a first violation of a DLP policy by comparing the first reputation score to a reputation threshold, where the first violation is determined when the first reputation score is lower than the reputation threshold thereby determining a second violation of the DLP policy by comparing a 
second reputation score to the reputation threshold, with the second violation being identified when the second reputation score is lower than the reputation threshold; and carrying out at least one of a reporting actions or remedial actions in response to first, second, or both violations that are found, wherein carrying out at least one of the reporting actions or remedial actions comprising steps of: a) generating an alert for approving or denying a particular first or second outbound data transfer; b) obtaining user input for approving or denying a particular outbound data transfer; and c) permitting a separate outbound information move when a client endorses a particular outbound information move when the user denies the respective outbound data transfer, and preventing that particular outbound data transfer.
- 14 . The method of claim 13 , further comprising: accessing a user's profile on a social network site (SNS) and initiating data collection by fetching profile metadata, including user-defined preferences; capturing graphical data, specifically analyzing and extracting feature vectors from the profile picture; logging user activity metrics and visit duration; crawling and indexing user's friends' profiles to obtain relational data; capturing posts, comments, likes, and shares to generate a user interaction dataset; and iterating through user's group associations, detailing groups joined, content shared within them, and engagement metrics.
- 15 . The method of claim 13 , further comprising: enhancing data utility and privacy by implementing the Extra-Trees ensemble method for high-dimensional data to break down user datasets; analyzing a constructed feature space to determine and extract feature vectors; and dividing the dataset based on computed threshold values of significant features.
- 16 . The method of claim 13 , further comprising: elevating privacy measures by assigning a sensitivity score to data fragments leveraging machine learning techniques, emphasizing differential privacy techniques; balancing between raw sensitive data from the SNS and a weighted importance assigned by the user to their data; isolating features capable of direct user identification or tagged as highly sensitive; and utilizing semantic transformation techniques to modify less sensitive features, ensuring data integrity and reduced privacy concerns.
- 17 . The method of claim 13 , further comprising: engaging an optimized Support Vector Machine (SVM) model for multi-category classification; and assigning sensitivity labels ranging from “extremely sensitive” to “very low sensitive” based on sensitivity scores.
- 18 . The method of claim 13 , wherein monitoring data transfers in real-time by: invoking a data loss prevention (DLP) agent to scrutinize outbound data packets; and employing deep packet inspection to categorize the data being transferred as per a predefined set of data type categories.
- 19 . The method of claim 13 , further comprises strategizing data transfer safety by: computing reputation scores using neural network models based on data classification and the destination entity; analyzing historical data transfers, based on frequency and nature of data types transferred to compute subsequent reputation scores; setting and dynamically adjusting reputation thresholds, leveraging reinforcement learning models that adapt to changing data patterns and external feedback; and triggering violation flags when computed reputation scores breach set thresholds.
- 20 . The method of claim 13 , further comprises executing responsive protocols, both reactive and proactive, including: activating system-generated alerts, using heuristics to decide when to prompt the user based on data transfer risks; incorporating user feedback loops, wherein the system learns from user actions to refine its alert and action mechanisms; and conditionally allowing data transfers based on user input, utilizing cryptographic techniques to safeguard data in transit, when a transfer is flagged but permitted by the user, and halting those not endorsed.
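The threshold logic recited in claims 1 and 4 can be illustrated with a minimal sketch. It is not the patented implementation: the numeric rank cut-offs, the linear per-violation penalty, the default threshold of 0.5, and every name (`sensitivity_rank`, `ReputationTracker`, `record_violation`, `evaluate`) are assumptions made for illustration, since the claims name the five ranks and the "lower than the reputation threshold" test but give no formulas.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical cut-offs: the claims list five ranks but no numeric boundaries.
RANK_CUTOFFS = [
    (0.8, "extremely sensitive"),
    (0.6, "very sensitive"),
    (0.4, "moderately sensitive"),
    (0.2, "low sensitivity"),
]


def sensitivity_rank(score: float) -> str:
    """Map a sensitivity score in [0, 1] to one of the five claimed ranks."""
    for cutoff, label in RANK_CUTOFFS:
        if score >= cutoff:
            return label
    return "very low sensitive"


@dataclass
class ReputationTracker:
    """Tracks violations per (destination, data type) pair, as in claim 4."""

    threshold: float = 0.5  # assumed reputation threshold
    penalty: float = 0.2    # assumed score reduction per tracked violation
    violations: dict = field(default_factory=lambda: defaultdict(int))

    def record_violation(self, destination: str, data_type: str) -> None:
        """Count one discovered DLP policy violation for this pair."""
        self.violations[(destination, data_type)] += 1

    def score(self, destination: str, data_type: str) -> float:
        """Reputation falls linearly with tracked violations (an assumption)."""
        count = self.violations[(destination, data_type)]
        return max(0.0, 1.0 - self.penalty * count)

    def evaluate(self, destination: str, data_type: str,
                 user_approves: Callable[[], bool]) -> str:
        """Apply the claim 1 test: a violation exists when score < threshold.

        On a violation the user is asked to approve or deny; an approved
        transfer proceeds and a denied transfer is prevented.
        """
        if self.score(destination, data_type) >= self.threshold:
            return "allowed"
        return "allowed_by_user" if user_approves() else "blocked"
```

In this sketch the user prompt is passed in as a callback, so the same check can be driven by a pop-up, an email approval, or a stored default decision. For example, after three recorded violations for one destination and data type, the score drops below 0.5 and `evaluate` defers to the user's choice, while other data types sent to the same destination are unaffected.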
Description
TECHNICAL FIELD
The present disclosure relates to systems for preventing sharing of sensitive information. More particularly, it relates to a sensitive information disclosure prediction system and method for social media users that use machine learning techniques and a support vector machine (SVM).
BACKGROUND
Social media platforms have become an integral part of modern society, allowing users to connect, share content, and engage with others online. However, the widespread use of social media has also raised concerns about the privacy and security of personal information shared on these platforms.
Existing solutions in this domain include various privacy settings and user controls offered by social media platforms. These settings allow users to limit the visibility of their posts, control who can access their personal information, and manage the sharing of sensitive data. However, these settings often rely on the users themselves to make informed decisions about their privacy, which can be challenging and prone to errors.
Another existing approach is the use of automated techniques employed by social media platforms to detect and flag potentially sensitive content. These techniques analyze the content of user posts, comments, and messages, and employ machine learning techniques to identify potentially harmful or inappropriate content. While these systems are effective in detecting explicit content, they may not adequately address situations where sensitive information is indirectly disclosed or shared unintentionally.
Additionally, there are research studies and academic papers that focus on user behavior analysis to predict potential sensitive information disclosure on social media. These studies often utilize data mining and natural language processing techniques to analyze patterns and trends in user behavior, identifying potential indicators of sensitive information disclosure.
However, these studies are limited to academic research and lack practical implementation in real-world social media platforms.
Hence, there is a need for an innovative system that can accurately predict potential instances of sensitive information disclosure on social media platforms, considering both explicit and implicit disclosure scenarios. The system should leverage advanced machine learning and data analysis techniques to assess user behavior, content patterns, and contextual information to provide proactive measures for users to protect their sensitive information effectively.
The present disclosure aims to overcome the limitations of existing solutions by providing a sensitive information disclosure prediction system that utilizes advanced techniques and contextual analysis to identify potential instances of sensitive information disclosure on social media platforms. By leveraging machine learning models and user behavior analysis, the system offers enhanced accuracy and reliability in predicting disclosure scenarios, thereby empowering users to take proactive measures to protect their privacy and prevent unintended information exposure.
BRIEF SUMMARY
The present disclosure seeks to provide a sensitive information disclosure prediction system for social media users to protect their private information from unwanted inferences. The prediction system offers a data analytics component for accurate forecasts of users' privacy control with the help of a machine learning technique and a support vector machine (SVM).
In an embodiment, a sensitive information disclosure prediction system for social media users is disclosed.
The system includes a data collection unit for collecting various facets of user data selected from user-specific and profile-related information, including personal preferences and a profile picture; data regarding the interactions of SNS (social network sites) users, including visit duration; information about the users' friends; activity data containing information that users shared and/or created on the website; and group data containing information about each group that the user forms and takes part in. The system further includes a feature extraction processor for extracting a set of features and selecting the features, thereby splitting the dataset based on certain threshold value(s) of the features using an Extra-Trees ensemble method. The system further includes a sensitivity score calculation processor for calculating the sensitivity score of each of the split data using a machine learning technique, wherein the sensitivity score is identified using both the sensitive information in the SNS and the importance of the information for users, wherein each feature that leads directly to the identification of an individual and/or is considered highly sensitive is detached, and other features having less sensitivity from the user's perspective are replaced with less semantic values to decrease the privacy risks for users. The system includes a support vector machine (SVM) model for c