
CN-122027320-A - Intelligent OTA flow identification method based on machine learning


Abstract

The invention discloses a machine-learning-based intelligent OTA traffic identification method in the interdisciplinary field of communication network security and artificial intelligence. The method parses hotel search requests and filters abnormal traffic; predicts shopping frequency and conversion rate in real time; determines the key influencing factors through SHAP analysis; groups requests and eliminates outliers; calculates the TTL dynamically; divides traffic into seven grades, P0 to P6, according to PDCC and conversion performance and applies a differentiated caching or cache-bypass (pass-through) strategy to each grade; generates structured cache keys and writes them to Redis; and builds a closed-loop optimization mechanism through end-to-end instrumentation (tracking points) and periodic model retraining. The invention aims to solve the resource mismatch caused by static caching strategies and coarse-grained traffic diversion in traditional OTA systems, improving the cache hit rate and response speed, reducing cloud service costs, and jointly optimizing cost, performance and user experience.

Inventors

  • TAN XIAOKANG
  • Tian Hanze
  • GAO JUN

Assignees

  • 陕西航星数科信息技术有限公司

Dates

Publication Date
20260512
Application Date
20260306

Claims (10)

  1. A machine-learning-based intelligent OTA traffic identification method, characterized by comprising the following steps: Step 1, receiving a hotel search request from an online travel agency (OTA) channel and performing parameter parsing and validity verification on the request; Step 2, invoking a deployed traffic value recognition model to evaluate the current request in real time, wherein the model is built on the LightGBM algorithm, takes a multidimensional feature vector as input, and outputs the predicted shopping frequency and the predicted positive-price conversion rate for the request; Step 3, decomposing the model's decision process with the SHAP interpretability framework, calculating the contribution $\phi_i$ of each input feature $i$ to the final prediction, and determining the dominant influencing factors by ranking these contributions; Step 4, combining the prediction result, hotel identifier, check-in date, historical shopping frequency and conversion rate indicators into four-dimensional data points, inputting them into a pre-trained K-means clustering model with the number of clusters set to 6 and Euclidean distance as the similarity measure, and outputting the number of the cluster to which the current request belongs; Step 5, performing Z-score anomaly detection to exclude outlier data: for each of the two key indicators, historical shopping frequency and conversion rate, calculating the standardized score $z = (x - \mu)/\sigma$ of the observed value $x$ relative to the historical mean $\mu$ and standard deviation $\sigma$, and, if $|z| > 3$, marking the value as anomalous and terminating the subsequent caching process; Step 6, starting a TTL intelligent computing engine, looking up a preset behavior weight coefficient table by the combination of length of stay and number of adults to obtain the corresponding weight coefficient, and weighting it with a base period of 769 hours to generate the preliminary cache-duration suggestions buffering_ttl and pricing_ttl, where TTL (time to live) is the cache lifetime; Step 7, introducing a cluster-driven dynamic adjustment mechanism that selects a TTL correction strategy according to the cluster number; and Step 8, superimposing a time-dimension adjustment factor: determining the class of the hour-level time period in which the system currently is, and determining from that class the cache lifetime of the data item corresponding to the request.
  2. The machine-learning-based intelligent OTA traffic identification method according to claim 1, wherein in Step 1 the parameters are parsed into the key fields check-in date, check-out date, number of rooms, number of adults, purchase price level and search timestamp, and the validity verification performs abnormal-traffic filtering: a request in which the number of rooms is greater than 9 is judged invalid and intercepted.
  3. The machine-learning-based intelligent OTA traffic identification method according to claim 1, wherein the multidimensional feature vector in Step 2 comprises date_sine_base, ap, check_in_weekend, check_in_mole, day_of_week and check_in_day, where date_sine_base is the number of days since a fixed reference date; ap is the purchase price level, normalized to the interval [0, 1]; check_in_weekend is a Boolean indicating whether the check-in date falls on a weekend; check_in_mole is the check-in month as an integer from 1 to 12; day_of_week is the check-in day of the week as an integer from 0 to 6; and check_in_day is the check-in day of the month as an integer from 1 to 31.
  4. The machine-learning-based intelligent OTA traffic identification method according to claim 1, wherein the TTL correction strategy in Step 7 sets the final TTL to 0 min when the request belongs to the lowest or second-lowest cluster, to 60 min for the third-lowest cluster, and to 30 min for the hot-hotel cluster, with 240 min used as the default in all other cases; concretely, cluster numbers 0 and 1 map to 0 min, cluster number 2 maps to 60 min, cluster number 3 maps to 30 min, and cluster numbers 4 and 5 map to 240 min.
  5. The machine-learning-based intelligent OTA traffic identification method according to claim 1, wherein the determination in Step 8 is: if the current time falls in an interval with high request volume but a low success rate, the TTL obtained in Step 7 is scaled down linearly by an attenuation coefficient of 0.8 (i.e., multiplied by 0.8); otherwise the original duration is kept unchanged.
  6. The machine-learning-based intelligent OTA traffic identification method according to claim 1, further comprising: Step 9, constructing a hierarchical decision matrix from the PDCC cost indicator and the L2B and L2P conversion performance, and dividing the traffic into seven grades, P0 to P6; Step 10, for a request to be cached, generating a structured cache key of the form ota:cache:gtp:group:hotel_id:checkin:checkout:rooms:adults:child, and writing the hotel quotation data into the Redis persistent storage system with the SETEX command, where the TTL parameter is converted to seconds by multiplying the minute value output by Step 8 by 60; Step 11, embedding a tracking-point (instrumentation) collection module along the full request-processing link, continuously recording core business metrics, and synchronizing this feedback data to the big data platform in real time to update the training sample set; Step 12, periodically triggering a model retraining task that re-optimizes the LightGBM model parameters $\theta$ on the latest accumulated tracking data by minimizing a dual-objective loss function while strengthening a regularization term to suppress the risk of overfitting, and, after training completes, pushing the new model to the online service cluster and updating the SHAP interpretation layer and the cluster-center configuration.
  7. The machine-learning-based intelligent OTA traffic identification method according to claim 6, wherein in Step 9 the PDCC (Per Dollar Cloud Cost) indicator combines the expected cloud resource consumption cost with the expected revenue, L2B is the lead-to-booking conversion rate, and L2P is the lead-to-payment conversion rate; the traffic grading conditions are: P0 requires a specified PDCC condition together with an actual booking record and price-checking behavior; P1 requires PDCC, L2B and L2P conditions to be met jointly; P2 lies in the same PDCC interval but neither conversion indicator meets its threshold; P3 has no conversion history; and P4 to P6 correspond to different combinations of conversion states.
  8. The machine-learning-based intelligent OTA traffic identification method according to claim 6, wherein in the cache-key generation rule of Step 10, gtp is the global transaction protocol version number, group is the business group code, hotel_id is the unique hotel identifier, checkin and checkout are expressed in YYYYMMDD format, rooms, adults and children are integers, and all fields are joined with ASCII colons (:).
  9. The machine-learning-based intelligent OTA traffic identification method according to claim 6, wherein the core business metrics in Step 11 comprise cache hit status, end-to-end response latency, user click actions and order conversion results; the data items collected at the tracking points comprise a cache-hit flag bit, end-to-end response latency, user click actions, whether the user entered the payment process, and whether an order was ultimately placed; every event carries a unique request ID and a timestamp and is transmitted asynchronously to the data warehouse through a message queue.
  10. The machine-learning-based intelligent OTA traffic identification method according to claim 6, wherein the model retraining in Step 12 is executed once every 6 hours in incremental-learning mode, loading the sample data added during those 6 hours; an A/B test is performed after training, and if the new model's prediction accuracy on the held-out samples exceeds the old model's by more than 1.5 percent, the new model is put online automatically.
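Claims 2 and 3 describe the request filter and the Step 2 feature vector concretely enough to sketch. The Python below is a minimal illustration under stated assumptions, not the patent's implementation: the `SearchRequest` class, the `BASE_DATE` value (the source says only "a fixed reference date"), and the reading of "weekend" as Saturday or Sunday are all hypothetical; the feature names are copied from claim 3 as they appear there.

```python
from dataclasses import dataclass
from datetime import date

# Assumption: the source gives no value for the "fixed reference date".
BASE_DATE = date(2020, 1, 1)
# Claim 2: a request asking for more than 9 rooms is treated as abnormal traffic.
MAX_ROOMS = 9


@dataclass
class SearchRequest:
    """Hypothetical container for the fields parsed in Step 1."""
    check_in: date
    check_out: date
    rooms: int
    adults: int
    ap: float  # purchase price level, assumed already normalized to [0, 1]


def is_valid(req: SearchRequest) -> bool:
    """Abnormal-traffic filter from claim 2: reject requests for more than 9 rooms."""
    return req.rooms <= MAX_ROOMS


def build_features(req: SearchRequest) -> dict:
    """Build the six claim-3 features from the check-in date and price level."""
    ci = req.check_in
    return {
        "date_sine_base": (ci - BASE_DATE).days,  # days since the reference date
        "ap": req.ap,
        # Assumption: weekend means Saturday or Sunday (weekday() 5 or 6).
        "check_in_weekend": ci.weekday() >= 5,
        "check_in_mole": ci.month,    # 1..12
        "day_of_week": ci.weekday(),  # 0..6, Monday = 0 (Python convention)
        "check_in_day": ci.day,       # 1..31
    }


req = SearchRequest(date(2026, 5, 16), date(2026, 5, 18), rooms=2, adults=2, ap=0.4)
if is_valid(req):
    print(build_features(req))
```

Only the claim-2 room rule is implemented as the validity check; any further checks a production filter might add are not described in the source.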
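At request time, Step 4 in claim 1 only needs the assignment half of K-means: with the 6 centroids already trained offline, a request is attributed to the centroid nearest in Euclidean distance. A minimal sketch follows; the centroid values are made-up placeholders, and how the hotel identifier and check-in date are encoded as numbers is not covered because the source does not say.

```python
import math
from typing import Sequence

N_CLUSTERS = 6  # Step 4: the cluster count is fixed at 6


def assign_cluster(point: Sequence[float],
                   centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid nearest to `point` (Euclidean distance)."""
    if not centroids:
        raise ValueError("at least one centroid is required")
    best_idx, best_dist = 0, math.inf
    for idx, centroid in enumerate(centroids):
        if len(centroid) != len(point):
            raise ValueError("point and centroid dimensions differ")
        dist = math.dist(point, centroid)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


# Placeholder 4-dimensional centroids, one per cluster (hypothetical values).
CENTROIDS = [[float(k)] * 4 for k in range(N_CLUSTERS)]
print(assign_cluster([2.1, 1.9, 2.0, 2.2], CENTROIDS))  # prints 2
```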
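The Step 5 outlier test in claim 1 follows directly from the z-score definition. The sketch below assumes the population standard deviation and treats a zero-variance history as "not anomalous"; the source specifies neither choice.

```python
import statistics
from typing import Sequence

Z_THRESHOLD = 3.0  # Step 5: |z| > 3 marks the value as anomalous


def z_score(x: float, history: Sequence[float]) -> float:
    """Standardized score of x against the historical mean and standard deviation."""
    mu = statistics.fmean(history)
    sigma = statistics.pstdev(history)  # assumption: population standard deviation
    if sigma == 0:
        return 0.0  # assumption: a constant history cannot flag an outlier
    return (x - mu) / sigma


def is_outlier(shop_freq: float, conv_rate: float,
               freq_history: Sequence[float],
               conv_history: Sequence[float]) -> bool:
    """True if either key indicator deviates more than 3 sigma from its history."""
    return (abs(z_score(shop_freq, freq_history)) > Z_THRESHOLD
            or abs(z_score(conv_rate, conv_history)) > Z_THRESHOLD)


freq_hist = [10, 12, 10, 12, 10, 12, 10, 12]  # mean 11, sigma 1
conv_hist = [0.1, 0.3, 0.1, 0.3]              # mean 0.2, sigma 0.1
print(is_outlier(15, 0.25, freq_hist, conv_hist))  # frequency z = 4, so True
```

A request flagged here would skip the caching path entirely, as Step 5 describes.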
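Claims 4 and 5, together with the unit conversion in claim 6, fix the final TTL arithmetic. A minimal sketch, assuming the hour-class test of Step 8 has already been reduced to a boolean (the source does not define which hours count as "high request volume, low success rate") and leaving out Step 6's weight table, whose coefficients are not disclosed. Rounding to whole seconds is also an assumption, made because Redis SETEX takes an integer expiry.

```python
# Claim 4: final TTL in minutes per cluster number; 0 means "do not cache".
CLUSTER_TTL_MIN = {0: 0, 1: 0, 2: 60, 3: 30}
DEFAULT_TTL_MIN = 240  # clusters 4 and 5 (and any other case)
ATTENUATION = 0.8      # claim 5: factor applied in high-volume, low-success hours


def cluster_ttl_minutes(cluster_id: int) -> int:
    """Step 7: cluster-driven TTL correction."""
    return CLUSTER_TTL_MIN.get(cluster_id, DEFAULT_TTL_MIN)


def time_adjusted_minutes(ttl_min: float, high_volume_low_success: bool) -> float:
    """Step 8: scale the TTL by 0.8 in a busy but unreliable hour, else keep it."""
    return ttl_min * ATTENUATION if high_volume_low_success else ttl_min


def ttl_seconds(ttl_min: float) -> int:
    """Step 10: minutes x 60, rounded to whole seconds (assumed) for SETEX."""
    return round(ttl_min * 60)


for cid in range(6):
    minutes = time_adjusted_minutes(cluster_ttl_minutes(cid), high_volume_low_success=True)
    print(cid, ttl_seconds(minutes))
```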
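Step 10 in claim 6 and claim 8 define the cache-key fields and the SETEX write. The sketch assumes the key starts with `ota:cache`, which is how the damaged key template in claim 6 appears to read. `write_quote` accepts any client exposing `setex(name, time, value)`, which matches redis-py's `Redis.setex`. Skipping the write when the TTL is 0 is an interpretation of claim 4's 0-minute clusters, since Redis rejects a non-positive SETEX expiry. `InMemoryClient` is a hypothetical stand-in for local testing.

```python
from datetime import date
from typing import Dict, Protocol, Tuple

KEY_PREFIX = ("ota", "cache")  # assumption: prefix read from claim 6's template


class SetexClient(Protocol):
    """Anything with a redis-py style setex(name, time, value) method."""
    def setex(self, name: str, time: int, value: str) -> object: ...


def cache_key(gtp: str, group: str, hotel_id: str, checkin: date,
              checkout: date, rooms: int, adults: int, children: int) -> str:
    """Claim 8: dates as YYYYMMDD, integers as-is, all fields joined by ':'."""
    fields = [*KEY_PREFIX, gtp, group, hotel_id,
              checkin.strftime("%Y%m%d"), checkout.strftime("%Y%m%d"),
              rooms, adults, children]
    return ":".join(str(f) for f in fields)


def write_quote(client: SetexClient, key: str, quote_json: str, ttl_seconds: int) -> bool:
    """Write the quotation with an expiry; a TTL of 0 means the entry is not cached."""
    if ttl_seconds <= 0:
        return False
    client.setex(key, ttl_seconds, quote_json)
    return True


class InMemoryClient:
    """Hypothetical dict-backed stand-in for a Redis client (expiry not enforced)."""
    def __init__(self) -> None:
        self.store: Dict[str, Tuple[int, str]] = {}

    def setex(self, name: str, time: int, value: str) -> bool:
        self.store[name] = (time, value)
        return True


client = InMemoryClient()
key = cache_key("v1", "g01", "H123", date(2026, 5, 16), date(2026, 5, 18), 1, 2, 0)
print(key, write_quote(client, key, '{"price": 680}', 14400))
```

With redis-py installed, a `redis.Redis()` instance can be passed in place of `InMemoryClient`.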
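Claim 9 lists the fields that every tracking event carries. A minimal sketch of such an event and its non-blocking hand-off: the field names are hypothetical, `queue.Queue` stands in for the real message queue (the source does not name one), and JSON is an assumed wire format.

```python
import json
import queue
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class TrackingEvent:
    """One tracking-point record; field names are illustrative."""
    request_id: str
    cache_hit: bool          # cache-hit flag bit
    latency_ms: float        # end-to-end response latency
    clicked: bool            # user click action
    entered_payment: bool    # whether the user entered the payment process
    ordered: bool            # whether an order was ultimately placed
    timestamp: float = field(default_factory=time.time)


def emit(event: TrackingEvent, mq: "queue.Queue[str]") -> None:
    """Serialize the event and enqueue it without blocking the request path."""
    mq.put_nowait(json.dumps(asdict(event)))


mq: "queue.Queue[str]" = queue.Queue()
emit(TrackingEvent(str(uuid.uuid4()), cache_hit=True, latency_ms=12.5,
                   clicked=False, entered_payment=False, ordered=False), mq)
print(mq.get_nowait())
```

A separate consumer would drain the queue and forward events to the data warehouse; that side is not shown.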
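Claim 10's schedule and promotion rule reduce to a time-window filter and a comparison. The sketch below reads "improved by more than 1.5 percent" as a relative gain over the old model's accuracy. That reading, the `(timestamp, sample)` tuple layout, the function names, and the zero-accuracy edge case are assumptions. The A/B test itself is out of scope.

```python
from datetime import datetime, timedelta
from typing import Any, List, Tuple

RETRAIN_WINDOW = timedelta(hours=6)  # claim 10: retrain every 6 hours
MIN_RELATIVE_GAIN = 0.015            # claim 10: > 1.5 % improvement (read as relative)


def incremental_batch(samples: List[Tuple[datetime, Any]],
                      now: datetime) -> List[Tuple[datetime, Any]]:
    """Select samples added in the last 6 hours for incremental training."""
    cutoff = now - RETRAIN_WINDOW
    return [s for s in samples if cutoff < s[0] <= now]


def should_promote(old_acc: float, new_acc: float) -> bool:
    """Promote the new model only if held-out accuracy rises by more than 1.5 %."""
    if old_acc <= 0:
        return new_acc > old_acc  # assumed handling of a degenerate baseline
    return (new_acc - old_acc) / old_acc > MIN_RELATIVE_GAIN


print(should_promote(0.80, 0.82), should_promote(0.80, 0.81))
```

If the intended reading is an absolute gain of 1.5 percentage points, the comparison becomes `new_acc - old_acc > 0.015`.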

Description

Intelligent OTA flow identification method based on machine learning

Technical Field

The invention relates to the interdisciplinary field of communication network security and artificial intelligence, and in particular to a machine-learning-based intelligent OTA traffic identification method.

Background

As online travel agency (OTA) platforms keep expanding, the volume of hotel search requests is growing exponentially, which severely challenges the real-time responsiveness and resource-scheduling efficiency of back-end systems. Traditional caching mechanisms generally use a static cache-duration policy and a single-dimensional cache key design, and cannot dynamically perceive the intrinsic business value of traffic; under high concurrency, the system therefore faces the twin problems of unbalanced resource allocation and uncontrolled cost. In cloud-native architectures in particular, pay-as-you-go billing for computing resources greatly amplifies how much invalid or inefficient requests erode operating costs, so an intelligent cache system is needed that can accurately identify traffic value and handle traffic differently according to that value. Machine-learning-based traffic value identification and cache management has therefore become a key direction for improving resource utilization on OTA platforms. The technique analyzes multidimensional features of user search behavior to build a quantifiable and interpretable traffic value evaluation model, and uses that model to drive dynamic adjustment of the caching strategy.
Its core aim is to serve high-value traffic with real-time pass-through queries to protect the user experience, while caching low-value traffic efficiently to reduce calls to suppliers, thereby substantially cutting cloud service spending without degrading service quality. In the prior art, mainstream OTA platforms still rely heavily on rule-based caching, for example setting one fixed TTL for all requests or making coarse-grained caching decisions based only on request frequency. Such methods cannot capture the relationship between temporal patterns of user behavior and commercial value: high-value traffic is forced into repeated queries because cache entries expire too early, missing the best quotation window, while a large number of requests with low conversion potential keep triggering real-time computation and waste computing resources. Although some systems use simple statistical metrics as cache weights, these metrics hardly reflect the true cost-benefit ratio of traffic and, in particular, cannot support fine-grained cost-control objectives such as Per Dollar Cloud Cost. In addition, existing cache architectures generally cannot model multi-source behavioral data jointly, so high-frequency distribution characteristics such as length of stay and number of adults cannot be exploited to guide cache key design and TTL calculation; as a result, the cache hit rate stays low for long periods and overall system performance is limited.

Disclosure of Invention

The invention aims to provide a machine-learning-based intelligent OTA traffic identification method that solves the resource mismatch caused by static caching strategies and coarse-grained traffic diversion in traditional OTA systems.
To solve the above technical problems, the invention provides the following technical solution: a machine-learning-based intelligent OTA traffic identification method comprising the following steps: Step 1, receiving a hotel search request from an online travel agency channel, performing parameter parsing and validity verification on the request, extracting the key fields check-in date, check-out date, number of rooms, number of adults, purchase price level and search timestamp, and performing abnormal-traffic filtering, whereby a request in which the number of rooms is greater than 9 is judged invalid and intercepted; Step 2, invoking a deployed traffic value recognition model to evaluate the current request in real time, wherein the model is built on the LightGBM algorithm, takes as input a multidimensional feature vector comprising date_sine_base, ap, check_in_weekend, check_in_mole, day_of_week and check_in_day, and outputs the predicted shopping frequency and predicted positive-price conversion rate for the request; decomposing the model's decision process with the SHAP interpretability framework and calculating the contribution $\phi$ of each input feature to