DE-112012005197-B4 - Event mining in social networks

Abstract

Method (100,500) for detecting an event from a social stream, wherein the method (100,500) comprises the steps of: receiving a social stream from a social network, where the social stream contains at least one object, and where the object contains text, sender information of the text, and recipient information of the text (101,510); assigning the object to a cluster based on a similarity value between the object and the cluster (102,520); monitoring changes in at least one of the clusters (103,540); and triggering an alarm when the changes in at least one of the clusters exceed a first threshold (104,550), where at least one of the steps is performed using a computer unit, wherein the method further includes maintaining the changes in the clusters as historical data of the social stream (530), wherein a sketch-based technique is used in the maintaining step, and wherein the sketch-based technique is used to estimate a structural similarity value between the object and the clusters.
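
The abstract names a sketch-based technique for estimating structural similarity but does not say which sketch is used. The following is a minimal illustration, assuming a count-min sketch keyed by (cluster, node) pairs; the class `CountMinSketch`, the function `structural_similarity`, and the similarity definition itself (the share of a cluster's node-frequency mass that falls on the object's sender and recipients) are hypothetical choices made for this example, not the patented formula.

```python
import hashlib


class CountMinSketch:
    """Fixed-size counter table; estimates never undercount (assumed sketch type)."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One column per row, derived from a row-specific hash of the key.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, count=1):
        for row, col in self._cells(key):
            self.table[row][col] += count

    def estimate(self, key):
        # The minimum over rows is the least collision-inflated counter.
        return min(self.table[row][col] for row, col in self._cells(key))


def structural_similarity(sketch, cluster_id, cluster_total, nodes):
    """Estimated share of the cluster's node-frequency mass on `nodes`.

    Hypothetical definition for illustration; `cluster_total` is an exact
    running count of node occurrences kept alongside the sketch.
    """
    unique = set(nodes)
    if cluster_total == 0 or not unique:
        return 0.0
    overlap = sum(sketch.estimate(f"{cluster_id}|{n}") for n in unique)
    # Collisions can only inflate the estimate, so cap it at 1.
    return min(overlap / cluster_total, 1.0)


# Cluster "c0" has seen alice three times and bob once.
sketch = CountMinSketch()
for node, count in [("alice", 3), ("bob", 1)]:
    sketch.add(f"c0|{node}", count)
print(structural_similarity(sketch, "c0", 4, ["alice"]))
```

Because the table has a fixed size, memory stays bounded however many distinct senders and recipients appear in the stream, at the cost of estimates that may be too high but are never too low.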

Inventors

  • Karthik Subbian, c/o IBM India Ltd.
  • Charu Aggarwal, c/o IBM

Assignees

  • INTERNATIONAL BUSINESS MACHINES CORPORATION

Dates

Publication Date
20260513
Application Date
20121123
Priority Date
20111213

Claims (15)

  1. Method (100,500) for detecting an event from a social stream, wherein the method (100,500) comprises the steps of: receiving a social stream from a social network, where the social stream contains at least one object, and where the object contains text, sender information of the text, and recipient information of the text (101,510); assigning the object to a cluster based on a similarity value between the object and the cluster (102,520); monitoring changes in at least one of the clusters (103,540); and triggering an alarm when the changes in at least one of the clusters exceed a first threshold (104,550), where at least one of the steps is performed using a computer unit, where the method further includes maintaining the changes in the clusters as historical data of the social stream (530), where a sketch-based technique is used in the maintaining step, and where the sketch-based technique is used to estimate a structural similarity value between the object and the clusters.
  2. Method (100,500) according to Claim 1, wherein the similarity value is determined by calculating a value selected from a group consisting of a structural similarity value, a content-based similarity value, a temporal similarity value, and combinations thereof.
  3. Method (100,500) according to Claim 1, wherein the assigning step (520) further includes the step of: assigning the object to an existing cluster if a similarity value between the object and the existing cluster is greater than a second threshold.
  4. Method (100,500) according to Claim 1, wherein the assigning step (520) further includes the steps of: creating a new cluster with the object if the similarity value between the object and the existing cluster is less than a second threshold; and replacing an obsolete cluster with the new cluster.
  5. Method (100,500) according to Claim 3, where the second threshold is calculated from the expected value and the standard deviation of the similarity value.
  6. Method (100,500) according to Claim 1, where the historical data is used for supervised event detection.
  7. Method (100,500) according to Claim 6, wherein the step of triggering an alarm (550) further includes the step of: using an event signature and a horizon signature.
  8. System (200) for detecting an event from a social stream, wherein the system (200) comprises: a receiver module (220) for receiving a social stream from a social network, where the social stream contains at least one object, and where the object contains text, sender information of the text, and recipient information of the text; a cluster module (230) for assigning the object to a cluster based on a similarity value between the object and the cluster; a monitoring module (240) for monitoring changes in at least one of the clusters; and a triggering module (250) for triggering an alarm when the changes in at least one of the clusters exceed a first threshold, where the system further comprises a maintenance module for maintaining the changes in the clusters as historical data of the social stream, wherein the maintenance module uses a sketch-based technique, and wherein the sketch-based technique is used to estimate a structural similarity value between the object and the clusters.
  9. System (200) according to Claim 8, wherein the similarity value is determined by calculating a value selected from a group consisting of a structural similarity value, a content-based similarity value, a temporal similarity value, and combinations thereof.
  10. System (200) according to Claim 8, wherein the cluster module (230) further includes: an existing-cluster module for assigning the object to an existing cluster if a similarity value between the object and the existing cluster is greater than a second threshold.
  11. System (200) according to Claim 8, wherein the cluster module (230) further includes: a new-cluster module for creating a new cluster with the object if the similarity value between the object and the existing cluster is less than a second threshold; and a replace module for replacing an obsolete cluster with the new cluster.
  12. System (200) according to Claim 10, where the second threshold is calculated from the expected value and the standard deviation of the similarity value.
  13. System (200) according to Claim 8, where the historical data is used for supervised event detection.
  14. System (200) according to Claim 13, wherein the triggering module (250) further includes: a signature module for using an event signature and a horizon signature.
  15. A computer-readable storage medium that physically embodies computer-readable program code with computer-readable instructions which, when executed, cause a computer to perform the steps of the method according to Claim 1.
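
The assignment logic of Claims 3 to 5 (and 10 to 12) can be sketched roughly as follows. The claims say only that the second threshold is calculated from the expected value and standard deviation of the similarity value; the specific form `mean - k * std`, the use of cosine similarity over word counts as a stand-in for the combined similarity value, and the rule that the "obsolete" cluster is the least recently updated one are all assumptions made for this illustration. The names `Cluster`, `StreamClusterer`, and `cosine` are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Cluster:
    words: dict = field(default_factory=dict)  # word -> frequency
    last_update: int = 0


def cosine(a, b):
    """Cosine similarity of two word-frequency dicts (content-based stand-in)."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class StreamClusterer:
    def __init__(self, max_clusters, k=3.0):
        self.max_clusters = max_clusters
        self.k = k                      # assumed multiplier on the std deviation
        self.clusters = []
        self.n = 0                      # running count of observed similarities
        self.s1 = 0.0                   # running sum
        self.s2 = 0.0                   # running sum of squares

    def threshold(self):
        """Second threshold, assumed here to be mean - k * std."""
        if self.n == 0:
            return float("-inf")
        mean = self.s1 / self.n
        var = max(self.s2 / self.n - mean * mean, 0.0)
        return mean - self.k * math.sqrt(var)

    def assign(self, words, now):
        """Return the index of the cluster that receives the object."""
        if self.clusters:
            sims = [cosine(words, c.words) for c in self.clusters]
            best = max(range(len(sims)), key=sims.__getitem__)
            thr = self.threshold()
            self.n += 1
            self.s1 += sims[best]
            self.s2 += sims[best] ** 2
            if sims[best] >= thr:
                # Similar enough: merge into the closest existing cluster.
                target = self.clusters[best]
                for w, f in words.items():
                    target.words[w] = target.words.get(w, 0) + f
                target.last_update = now
                return best
        new = Cluster(dict(words), now)
        if len(self.clusters) < self.max_clusters:
            self.clusters.append(new)
            return len(self.clusters) - 1
        # Cluster budget exhausted: replace the least recently updated cluster.
        stale = min(range(len(self.clusters)),
                    key=lambda i: self.clusters[i].last_update)
        self.clusters[stale] = new
        return stale


clusterer = StreamClusterer(max_clusters=2, k=1.0)
stream = [
    {"quake": 1, "tokyo": 1},
    {"quake": 1, "tokyo": 1},
    {"goal": 1, "match": 1},
    {"goal": 1, "match": 1},
    {"vote": 1, "election": 1},
]
results = [clusterer.assign(obj, t) for t, obj in enumerate(stream, start=1)]
print(results)
```

Keeping only a running count, sum, and sum of squares means the threshold can be updated in a single pass over the stream without storing past similarity values.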

Description

BACKGROUND OF THE INVENTION

Field of the invention

The present invention relates to event mining and, more precisely, to the detection of new events from a social stream.

Description of the state of the art

The problem of text mining has been extensively studied within the information retrieval community because text data is available in a wide variety of scenarios, such as the web, social networks, newsfeeds, and many others. Much of this text data originates in temporal applications, such as newsfeeds and social media streams, where text arrives as a continuous and massive influx of documents. Streaming applications pose a particular challenge because the data often has to be processed in a single pass, and not all of it can be stored on disk for reprocessing.

A key problem in the context of temporal and streaming text data is online event detection, which is closely related to the problem of topic detection and tracking. It is also closely related to stream partitioning, and it attempts to identify new thematic trends within the text stream and their significant development. The idea is that important and newsworthy real-life events (such as the recent unrest in the Middle East) are often captured as temporal bursts of closely related documents within a social stream.

The problem can arise in both supervised and unsupervised scenarios. In the unsupervised case, it is assumed that no training data is available to guide the event detection process for the stream. In the supervised case, historical data on events is available to guide the event detection process.

A method for evaluating social flows is known from publication US 2010/0119053 A1. A method for detecting outliers in data based on clusters is known from publication US 2007/0226212 A1, whereby individual data points are assigned to clusters based on similarities with other data points.

BRIEF SUMMARY OF THE INVENTION

Accordingly, one aspect of the present invention provides a method for detecting an event from a social stream. The method comprises the steps of: receiving a social stream from a social network, wherein the social stream contains at least one object and the object contains text, sender information of the text, and recipient information of the text; assigning the object to a cluster based on a similarity value between the object and the clusters; monitoring changes in at least one of the clusters; and triggering an alarm when the changes in at least one of the clusters exceed a first threshold, wherein at least one of the steps is performed using a computer unit.

Another aspect of the present invention provides a system for detecting an event from a social stream. The system includes: a receiver module for receiving a social stream from a social network, wherein the social stream contains at least one object and the object contains text, sender information of the text, and recipient information of the text; a cluster module for assigning the object to a cluster based on a similarity value between the object and the clusters; a monitoring module for monitoring changes in at least one of the clusters; and a triggering module for triggering an alarm when the changes in at least one of the clusters exceed a first threshold.

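
The monitoring and triggering steps summarized above might look roughly like the following. The summary does not define how a "change" in a cluster is measured, so this example assumes one plausible measure: the growth in a cluster's share of stream objects between two consecutive time windows of equal length. The function names `fractional_presence` and `should_alarm` and the `horizon` parameter are hypothetical.

```python
def fractional_presence(assignments, cluster_id, start, end):
    """Share of objects with start < timestamp <= end that went to cluster_id."""
    in_window = [c for t, c in assignments if start < t <= end]
    if not in_window:
        return 0.0
    return in_window.count(cluster_id) / len(in_window)


def should_alarm(assignments, cluster_id, now, horizon, first_threshold):
    """True if the cluster's share grew by more than first_threshold.

    Compares the most recent window (now - horizon, now] with the window
    immediately before it. This change measure is an assumption made for
    illustration only.
    """
    recent = fractional_presence(assignments, cluster_id, now - horizon, now)
    previous = fractional_presence(
        assignments, cluster_id, now - 2 * horizon, now - horizon
    )
    return (recent - previous) > first_threshold


# (timestamp, cluster_id) pairs as produced by the assigning step.
assignments = [
    (1, "a"), (2, "a"), (3, "b"), (4, "a"),
    (5, "b"), (6, "b"), (7, "b"), (8, "b"),
]
for cid in ("a", "b"):
    print(cid, should_alarm(assignments, cid, now=8, horizon=4, first_threshold=0.5))
```

In this toy stream, cluster "b" grows from a quarter of the earlier window to all of the recent window, while cluster "a" shrinks, so only "b" would raise an alarm under the assumed measure.
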
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a flowchart illustrating a method 100 for detecting an event in a social network according to a preferred embodiment of the present invention.

FIG. 2 shows a system for detecting an event in a social network according to a preferred embodiment of the present invention.

FIG. 3 shows a hardware configuration for realizing or executing at least one embodiment of the present invention.

FIG. 4 shows a flowchart illustrating a method for assigning an object to an existing cluster or creating a new cluster during a partitioning step 102 according to a preferred embodiment of the present invention.

FIG. 5 shows a flowchart illustrating a method 500 for maintenance according to a further preferred embodiment of the invention.

FIG. 6 shows a detailed overall algorithm for maintenance according to a further preferred embodiment of the invention.

FIG. 7 illustrates the effectiveness results of the clustering algorithm with respect to cluster purity according to a preferred embodiment of the present invention.

FIG. 8 shows the efficiency of the clustering approach with an increasing number of clusters according to a preferred embodiment of the present invention.

FIG. 9 illustrates the results of the supervised event detection method according to a preferred embodiment of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The foregoing and further features of the present invention will become apparent from a detailed description of embodiments thereof, which are shown in conjunction with the attached drawings. Identical reference numerals refer to the same or similar parts in the attached drawings of the invention. As will be apparent to those skilled