US-12626159-B2 - Music recommendation method and apparatus

US 12626159 B2

Abstract

A music recommendation method and apparatus are provided, to determine an attention mode of a user in a complex environment by using viewpoint information of the user, thereby more precisely implementing music matching. According to a first aspect, a music recommendation method is provided. The method includes: receiving visual data of a user (S501); obtaining at least one attention unit and attention duration of the at least one attention unit based on the visual data (S502); determining an attention mode of the user based on the attention duration of the at least one attention unit (S503); and determining recommended music information based on the attention mode (S504).

Inventors

  • Shu Fang
  • Libin Zhang

Assignees

  • HUAWEI TECHNOLOGIES CO., LTD.

Dates

Publication Date
2026-05-12
Application Date
2023-02-27

Claims (17)

  1. A music recommendation method comprising: receiving visual data of a user, wherein the visual data comprises picture information viewed by the user; obtaining at least one attention unit and attention duration of the at least one attention unit based on the visual data, wherein obtaining the at least one attention unit and the attention duration of the at least one attention unit comprises: obtaining the at least one attention unit based on the picture information, determining a similarity between a first attention unit of the at least one attention unit and a second attention unit of the at least one attention unit, and when the similarity is greater than or equal to a first threshold, determining that attention duration of the second attention unit is equal to a sum of attention duration of the first attention unit and the attention duration of the second attention unit, wherein the first attention unit and the second attention unit are attention units at different moments in time, and the attention duration of the at least one attention unit comprises the attention duration of the second attention unit; determining an attention mode of the user based on the attention duration of the at least one attention unit; and determining recommended music information based on the attention mode.
  2. The method according to claim 1, wherein the visual data further comprises viewpoint information of the user, and the viewpoint information comprises a position of a viewpoint and attention duration of the viewpoint.
  3. The method according to claim 1, wherein determining the attention mode of the user comprises: when a standard deviation of the attention duration of the at least one attention unit is greater than or equal to a second threshold, determining that the attention mode of the user is a staring mode; or when a standard deviation of the attention duration of the at least one attention unit is less than a second threshold, determining that the attention mode of the user is a scanning mode.
  4. The method according to claim 1, wherein determining the recommended music information comprises: when the attention mode is a scanning mode, determining the music information based on the picture information; or when the attention mode is a staring mode, determining the music information based on an attention unit with highest attention in the at least one attention unit.
  5. The method according to claim 4, wherein determining the recommended music information further comprises: determining a behavior state of the user at each moment within a first time period based on the attention mode; determining a behavior state of the user within the first time period based on the state at each moment; and determining the music information based on the behavior state within the first time period.
  6. The method according to claim 1, wherein obtaining the at least one attention unit and the attention duration of the at least one attention unit further comprises: when the similarity is less than the first threshold, reserving the attention duration of the first attention unit and the attention duration of the second attention unit, wherein the attention duration of the at least one attention unit comprises the attention duration of the first attention unit and the attention duration of the second attention unit.
  7. A music recommendation apparatus, comprising: a memory to store executable instructions; and a processor coupled to the memory to execute the executable instructions to cause the music recommendation apparatus to perform operations comprising: receiving visual data of a user, wherein the visual data comprises picture information viewed by the user; obtaining at least one attention unit and attention duration of the at least one attention unit based on the visual data, wherein obtaining the at least one attention unit and the attention duration of the at least one attention unit comprises: obtaining the at least one attention unit based on the picture information, determining a similarity between a first attention unit of the at least one attention unit and a second attention unit of the at least one attention unit, and when the similarity is greater than or equal to a first threshold, determining that attention duration of the second attention unit is equal to a sum of attention duration of the first attention unit and the attention duration of the second attention unit, wherein the first attention unit and the second attention unit are attention units at different moments in time, and the attention duration of the at least one attention unit comprises the attention duration of the second attention unit; determining an attention mode of the user based on the attention duration of the at least one attention unit; and determining recommended music information based on the attention mode.
  8. The music recommendation apparatus according to claim 7, wherein the visual data further comprises viewpoint information of the user, and the viewpoint information comprises a position of a viewpoint and attention duration of the viewpoint.
  9. The music recommendation apparatus according to claim 7, wherein determining the attention mode of the user comprises: when a standard deviation of the attention duration of the at least one attention unit is greater than or equal to a second threshold, determining that the attention mode of the user is a staring mode; or when a standard deviation of the attention duration of the at least one attention unit is less than a second threshold, determining that the attention mode of the user is a scanning mode.
  10. The music recommendation apparatus according to claim 7, wherein determining the recommended music information comprises: when the attention mode is a scanning mode, determining the music information based on the picture information; or when the attention mode is a staring mode, determining the music information based on an attention unit with highest attention in the at least one attention unit.
  11. The music recommendation apparatus according to claim 10, wherein determining the recommended music information further comprises: determining a behavior state of the user at each moment within a first time period based on the attention mode; determining a behavior state of the user within the first time period based on the state at each moment; and determining the music information based on the behavior state within the first time period.
  12. The music recommendation apparatus according to claim 7, wherein obtaining the at least one attention unit and the attention duration of the at least one attention unit further comprises: when the similarity is less than the first threshold, reserving the attention duration of the first attention unit and the attention duration of the second attention unit, wherein the attention duration of the at least one attention unit comprises the attention duration of the first attention unit and the attention duration of the second attention unit.
  13. A non-transitory computer-readable storage medium storing executable instructions that, when executed by a processor of an apparatus, cause the apparatus to perform operations comprising: receiving visual data of a user, wherein the visual data comprises picture information viewed by the user; obtaining at least one attention unit and attention duration of the at least one attention unit based on the visual data, wherein obtaining the at least one attention unit and the attention duration of the at least one attention unit comprises: obtaining the at least one attention unit based on the picture information, determining a similarity between a first attention unit of the at least one attention unit and a second attention unit of the at least one attention unit, and when the similarity is greater than or equal to a first threshold, determining that attention duration of the second attention unit is equal to a sum of attention duration of the first attention unit and the attention duration of the second attention unit, wherein the first attention unit and the second attention unit are attention units at different moments in time, and the attention duration of the at least one attention unit comprises the attention duration of the second attention unit; determining an attention mode of the user based on the attention duration of the at least one attention unit; and determining recommended music information based on the attention mode.
  14. The non-transitory computer-readable storage medium according to claim 13, wherein the visual data further comprises viewpoint information of the user, and the viewpoint information comprises a position of a viewpoint and attention duration of the viewpoint.
  15. The non-transitory computer-readable storage medium according to claim 13, wherein determining the attention mode of the user comprises: when a standard deviation of the attention duration of the at least one attention unit is greater than or equal to a second threshold, determining that the attention mode of the user is a staring mode; or when a standard deviation of the attention duration of the at least one attention unit is less than a second threshold, determining that the attention mode of the user is a scanning mode.
  16. The non-transitory computer-readable storage medium according to claim 13, wherein determining the recommended music information comprises: when the attention mode is a scanning mode, determining the music information based on the picture information; or when the attention mode is a staring mode, determining the music information based on an attention unit with highest attention in the at least one attention unit.
  17. The non-transitory computer-readable storage medium according to claim 13, wherein obtaining the at least one attention unit and the attention duration of the at least one attention unit further comprises: when the similarity is less than the first threshold, reserving the attention duration of the first attention unit and the attention duration of the second attention unit, wherein the attention duration of the at least one attention unit comprises the attention duration of the first attention unit and the attention duration of the second attention unit.
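
The core logic of claims 1, 3, 4, and 6 can be sketched as follows. This is an illustrative reading of the claims only, not an implementation from the patent: the feature representation, the similarity function, and the threshold values are assumptions introduced for the sketch.

```python
from dataclasses import dataclass

@dataclass
class AttentionUnit:
    label: str        # hypothetical label for the region the user looked at
    duration: float   # accumulated attention duration (e.g. seconds)

def merge_similar(units, similarity, first_threshold):
    """Claims 1 and 6: when two attention units observed at different
    moments are similar enough, fold the earlier unit's duration into the
    later one; otherwise both durations are kept ("reserved")."""
    merged = []
    for unit in units:  # units assumed ordered by observation time
        for kept in merged:
            if similarity(kept, unit) >= first_threshold:
                unit.duration += kept.duration  # sum of both durations
                merged.remove(kept)
                break
        merged.append(unit)
    return merged

def attention_mode(units, second_threshold):
    """Claim 3: staring mode if the standard deviation of the attention
    durations is >= the second threshold, scanning mode otherwise."""
    durations = [u.duration for u in units]
    mean = sum(durations) / len(durations)
    std = (sum((d - mean) ** 2 for d in durations) / len(durations)) ** 0.5
    return "staring" if std >= second_threshold else "scanning"

def recommendation_basis(units, mode):
    """Claim 4: in scanning mode, match music against the whole picture;
    in staring mode, against the unit with the highest attention."""
    if mode == "scanning":
        return "picture"
    return max(units, key=lambda u: u.duration).label
```

For example, durations of [10, 1, 1] give a standard deviation of about 4.2, so with a second threshold of 3 the user is in staring mode and the 10-second unit drives the recommendation.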

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This disclosure is a continuation of International Application No. PCT/CN2020/112414, filed on Aug. 31, 2020. The disclosure of the aforementioned application is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

This disclosure relates to the field of artificial intelligence, and more specifically, to a music recommendation method and apparatus.

BACKGROUND

A personalized music recommendation technology can improve the music experience of a user. A conventional method implements music recommendation by applying a data mining technique to the user's historical music playback information; such a method cannot take the user's current state into account. Some current methods collect the user's current state information by using different sensors. For example, related music recommendation is implemented by sensing environmental information such as position, weather, time, season, ambient sound, and an environment picture; or by measuring the user's current state, for example, analyzing the user's current psychological state by collecting brain waves, capturing the picture seen by the user, or obtaining the user's heart rate.

In one current method, music recommendation is performed based on a captured image of what the user sees, which involves a music-image matching process. In an actual scenario, however, an environment may contain many scenes; if music recommendation is based only on the entire image, the music matching degree is reduced.

SUMMARY

This disclosure provides a music recommendation method and apparatus, to determine an attention mode of a user in a complex environment by using viewpoint information of the user, thereby more precisely implementing music matching. According to a first aspect, a music recommendation method is provided.
The method includes: receiving visual data of a user; obtaining at least one attention unit and attention duration of the at least one attention unit based on the visual data; determining an attention mode of the user based on the attention duration of the at least one attention unit; and determining recommended music information based on the attention mode.

In the music recommendation method in this embodiment of this disclosure, the attention mode of the user is determined based on visual information of the user, so that the attention content of the user is determined more precisely and more suitable music is recommended. The recommended music matches what the user is genuinely interested in and the user's actual behavior state, thereby improving user experience.

With reference to the first aspect, in a possible implementation of the first aspect, the visual data includes viewpoint information of the user and picture information viewed by the user, and the viewpoint information includes a position of a viewpoint and attention duration of the viewpoint.

With reference to the first aspect, in a possible implementation of the first aspect, the obtaining at least one attention unit and attention duration of the at least one attention unit based on the visual data includes: obtaining the at least one attention unit based on the picture information; and obtaining a sum of the attention duration of the viewpoints in each attention unit, to use the sum as the attention duration of that attention unit.

In the music recommendation method in this embodiment of this disclosure, an initial attention unit is determined based on the obtained picture information, and the duration of each attention unit is determined based on the viewpoint information of the user.
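
The viewpoint-summing step above can be illustrated with a small sketch. The bounding-box representation of attention units and the viewpoint record format are assumptions made for illustration; the patent does not specify how regions or viewpoints are encoded.

```python
def unit_durations(unit_boxes, viewpoints):
    """For each attention unit (modeled here as a hypothetical bounding
    box (x0, y0, x1, y1)), sum the attention duration of every viewpoint
    whose position falls inside that unit's region."""
    totals = {unit_id: 0.0 for unit_id in unit_boxes}
    for vp in viewpoints:  # vp: {"x": ..., "y": ..., "duration": ...}
        for unit_id, (x0, y0, x1, y1) in unit_boxes.items():
            if x0 <= vp["x"] <= x1 and y0 <= vp["y"] <= y1:
                totals[unit_id] += vp["duration"]
                break  # count a viewpoint toward at most one unit
    return totals
```

A unit's total then serves as its attention duration for the later mode-classification step.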
Compared with the conventional technology, in which music recommendation is implemented based only on an entire picture viewed by a user, the viewpoint information can precisely indicate the attention content that the user is interested in, so that the recommended music better meets the user's requirements.

With reference to the first aspect, in a possible implementation of the first aspect, the obtaining at least one attention unit and attention duration of the at least one attention unit based on the visual data further includes: determining a similarity between a first attention unit and a second attention unit in the at least one attention unit, where the first attention unit and the second attention unit are attention units at different moments; and if the similarity is greater than or equal to a first threshold, the attention duration of the second attention unit is equal to a sum of the attention duration of the first attention unit and the attention duration of the second attention unit.

In the music recommendation method in this embodiment of this disclosure, the first attention unit and the second attention unit may be attention units in image frames at di