EP-4742248-A1 - OPTIMIZING OPTICAL NANO-BIOSENSORS MADE OF SINGLE-WALLED CARBON NANOTUBES WRAPPED IN SINGLE-STRANDED SEQUENCES OF NUCLEOTIDES FOR DETECTION OF MOLECULES SUCH AS GLUCOSE AND CANCER BIOMARKERS

Abstract

The invention is notably directed to a computer-implemented method of optimizing an optical sensor (1) comprising a single-walled carbon nanotube (2), or SWCNT for short, which is wrapped in a single-stranded sequence (3) of nucleotides (ssSN). The method revolves around two or more optimization cycles. Each optimization cycle includes clustering ssSNs of an input set of ssSNs to obtain clusters and selecting representative ssSNs of the clusters obtained. Next, one accesses measurements of optical responses of optical sensors to a target molecule, where the optical sensors include SWCNTs wrapped in actual ssSNs, which are synthetized in accordance with the representative ssSNs selected. After that, a top ssSN is identified among the representative ssSNs: the one that led to the highest optical response. Importantly, the top ssSN identified at each optimization cycle (except the last) is mutated to obtain mutants and form a superset of sequences including the top ssSN and its mutants, prior to performing the next optimization cycle, which uses this superset as a new input set of ssSNs. This way, a local optimum is identified in the initial input space of the ssSNs. Additional optimization can be achieved through supervised learning and pattern recognition. The invention is further directed to related engineering methods (to synthetize ssSNs and wrap SWCNTs) and computer programs (to perform the optimizations).
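The cycle structure summarized above can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not the patented implementation: the names `optimize`, `select_representatives`, `measure`, and `mutate` are hypothetical, and the clustering step, the wet-lab measurement, and the mutation operator are passed in as callables so that only the control flow is shown. In particular, it shows that the top ssSN is mutated after every cycle except the last.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def optimize(
    input_set: Sequence[str],
    select_representatives: Callable[[Sequence[str]], List[str]],
    measure: Callable[[List[str]], Dict[str, float]],
    mutate: Callable[[str], List[str]],
    n_cycles: int,
) -> List[Tuple[str, float]]:
    """Run n_cycles optimization cycles; return (top ssSN, response) per cycle.

    Hypothetical helper: all three callables are stand-ins supplied by the caller.
    """
    if n_cycles < 2:
        raise ValueError("the method performs two or more optimization cycles")
    history: List[Tuple[str, float]] = []
    current = list(input_set)
    for cycle in range(n_cycles):
        # Clustering + selection: one representative ssSN per cluster.
        reps = select_representatives(current)
        # Access measured optical responses of sensors wrapped in the representatives.
        responses = measure(reps)
        # Top ssSN: the representative with the highest optical response.
        top = max(reps, key=lambda s: responses[s])
        history.append((top, responses[top]))
        # Mutate between cycles only; the superset becomes the next input set.
        if cycle < n_cycles - 1:
            current = [top] + mutate(top)
    return history
```

For a quick dry run, `measure` can be replaced by any scoring function (for example, counting a base), which makes it easy to check that the loop climbs toward a local optimum of that score.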

Inventors

  • BOGHOSSIAN, ARDEMIS ANOUSH
  • Rabbani, Yahya
  • Bregy, Joey

Assignees

  • ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE (EPFL)

Dates

Publication Date
20260513
Application Date
20241108

Claims (15)

  1. A computer-implemented method of optimizing an optical sensor (1) comprising a single-walled carbon nanotube (2), or SWCNT, wrapped in a single-stranded sequence (3) of nucleotides, or ssSN, wherein the method comprises performing (S20) two or more optimization cycles, wherein each cycle of the optimization cycles includes: clustering (S24) ssSNs of an input set of ssSNs to obtain clusters (C11-C33) and selecting (S25) representative ssSNs of the clusters obtained; accessing (S26) measurements of optical responses of optical sensors (1) to a target molecule, the optical sensors including SWCNTs wrapped in actual ssSNs, the latter synthetized in accordance with the representative ssSNs selected; and identifying (S27), among the representative ssSNs, a top ssSN that led to a highest one of the optical responses, and mutating (S29) the top ssSN identified at each of the optimization cycles but a last cycle thereof to obtain mutants and form a superset of sequences including the top ssSN and said mutants, prior to performing (S20) a next one of the optimization cycles, using this superset as a new input set of ssSNs.
  2. The method according to claim 1, wherein, at mutating (S29) the top ssSN, the top ssSN is mutated (S29) several times through changes in both base and length.
  3. The method according to claim 2, wherein, at mutating (S29) the top ssSN, the top ssSN is mutated (S29) by introducing n_bsm base sequence mutations and adding n_ba bases at extremities of the top ssSN, where each of n_bsm and n_ba is equal to 1, 2, or 3.
  4. The method according to any one of claims 1 to 3, wherein, at said each cycle, the ssSNs are clustered (S24) according to one or more distance functions based on N_d distinct sequence properties of the ssSNs, whereby the clusters (C11-C33) obtained at said each cycle include at least N_d parallel sets (S1-S3) of clusters, where N_d ≥ 2, preferably N_d = 3, and the representative ssSNs are selected (S25) in each cluster (C11-C33) of the distinct sets (S1-S3) of clusters obtained, wherein each of the representative ssSNs is preferably selected (S25) based on its distance to a centre of a respective one of the clusters (C11-C33).
  5. The method according to claim 4, wherein the distinct sequence properties include two or more, preferably each, of: an alignment similarity of the sequences, a k-mer frequency in the sequences, where k is preferably between 3 and 7, and more preferably equal to 3, and folding properties of the sequences.
  6. The method according to claim 1 or 5, wherein said each cycle further comprises determining (S23) optimal numbers of clusters for respective ones of the N_d parallel sets (S1-S3), preferably using the elbow method, whereby the ssSNs are clustered (S24) in accordance with the optimal numbers of clusters determined, preferably using an unsupervised machine learning clustering method, more preferably the k-means clustering method.
  7. The method according to any one of claims 1 to 6, wherein the method further comprises, after said last cycle, determining (S40) one or more candidate ssSNs that potentially provide highest optical responses to the target molecule, based on at least a subset, preferably the full set, of all representative ssSNs corresponding to all optical sensors (1) for which measurements of optical responses were accessed throughout the optimization cycles, using (S44, S48) inference techniques relying on one or more computational models.
  8. The method according to claim 7, wherein said one or more models involve one or more machine learning models, and the one or more candidate ssSNs are determined (S40) by: training (S42) each model of the one or more machine learning models on a training dataset consisting of pairs of inputs and outputs, where the inputs correspond to at least a subset of said all representative ssSNs, respectively, and the outputs are based on measurements of optical responses as accessed (S26) for respective ones of the optical sensors (1); and running (S44) said each model, once trained, on a test set of ssSNs to infer (S44) respective optical response performances, whereby the one or more candidate ssSNs are eventually determined (S48) based on the optical response performances inferred.
  9. The method according to claim 8, wherein the set of one or more machine learning models comprises an odd number of at least three machine learning models, each configured as a classifier, whereby the optical response performances are inferred (S44) as classifications into classes that include, preferably consist of, a first class corresponding to a high optical response and a second class corresponding to a low optical response, and the method further comprises performing (S46) a majority vote based on the classifications obtained from each model of the set of machine learning models.
  10. The method according to claim 8 or 9, wherein the method further comprises performing (S48) pattern recognition on test sequences of the test set of ssSNs based on the optical response performances inferred therefor to identify sequence patterns that respond best, optically, to the target molecule, whereby the one or more candidate ssSNs are eventually determined based on the sequence patterns identified, and preferably predicting one or more new sequences based on the identified patterns, whereby the one or more candidate ssSNs are eventually determined based on the one or more new sequences predicted.
  11. The method according to claim 10, wherein said sequence patterns include patterns in terms of one or more, preferably two, and more preferably three, of k-mers, base positions, and lengths, of the test sequences.
  12. A method of engineering one or more optical sensors (1), each comprising a single-walled carbon nanotube (SWCNT) wrapped in a single-stranded sequence of nucleotides (ssSN), wherein the method comprises performing two or more optimization cycles, each including: receiving (S31), from a computerized system, representative ssSNs as selected during a corresponding optimization cycle of the method according to any one of claims 1 to 11; wrapping (S34) actual ssSNs around SWCNTs to obtain optical sensors (1), the actual ssSNs synthetized (S32) in accordance with the representative ssSNs selected, wherein the actual ssSNs are preferably wrapped around the SWCNTs using an exchange protocol; characterizing (S36) optical responses of the optical sensors (1) to a target molecule to obtain measurements of said optical responses; and forwarding (S26) measurements of the optical responses to the computerized system for it to identify (S27), among the representative ssSNs, one or more top ssSNs that led to highest optical responses of the optical sensors (1).
  13. The method according to claim 12, wherein said optical responses are characterized (S36) through near infrared fluorescence spectroscopy, and include, for each of the optical sensors (1), a fluorescence peak shift and/or a fluorescence peak intensity change, each obtained for one or more predetermined excitation wavelengths.
  14. The method according to any one of claims 1 to 13, wherein the target molecule is glucose and the ssSNs are single-stranded sequences of DNA.
  15. A computer program comprising software code adapted to perform a method of optimizing an optical sensor (1) according to any one of claims 1 to 11 when executed by processing means.
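Claims 2 and 3 describe mutating the top ssSN through changes in both base and length. The sketch below is one possible reading, not the inventors' code, and rests on assumptions: the n_bsm substitutions hit distinct positions and always change the base, each of the n_ba added bases goes to the 5' or 3' end at random, and the superset is built by drawing n_bsm and n_ba uniformly from {1, 2, 3}. The names `mutate_once` and `build_superset` are hypothetical.

```python
import random
from typing import List

BASES = "ACGT"


def mutate_once(seq: str, n_bsm: int, n_ba: int, rng: random.Random) -> str:
    """Apply n_bsm base substitutions, then add n_ba bases at the extremities.

    Hypothetical helper; placement of added bases (random end) is an assumption.
    """
    if not (1 <= n_bsm <= 3 and 1 <= n_ba <= 3):
        raise ValueError("claim 3 uses n_bsm and n_ba in {1, 2, 3}")
    bases = list(seq)
    # Substitute n_bsm distinct positions, each with a base different from the original.
    for pos in rng.sample(range(len(bases)), n_bsm):
        bases[pos] = rng.choice([b for b in BASES if b != bases[pos]])
    # Add n_ba random bases, each at the 5' (start) or 3' (end) extremity.
    for _ in range(n_ba):
        base = rng.choice(BASES)
        if rng.random() < 0.5:
            bases.insert(0, base)
        else:
            bases.append(base)
    return "".join(bases)


def build_superset(top: str, n_mutants: int, seed: int = 0) -> List[str]:
    """Return the top ssSN followed by n_mutants distinct mutants (next input set)."""
    rng = random.Random(seed)
    superset = [top]
    seen = {top}
    while len(superset) < n_mutants + 1:
        mutant = mutate_once(top, rng.randint(1, 3), rng.randint(1, 3), rng)
        if mutant not in seen:  # keep the superset free of duplicates
            seen.add(mutant)
            superset.append(mutant)
    return superset
```

Because every mutant gains at least one base, no mutant can coincide with the top ssSN, so the superset always contains the parent plus strictly longer variants.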

Description

TECHNICAL FIELD

The invention relates in general to the field of computer-implemented methods of optimizing optical sensors comprising single-walled carbon nanotubes wrapped in single-stranded sequences of nucleotides (e.g., DNA sequences), as well as to related methods of engineering optical sensors and related computer programs. The goal is to identify at least one sequence of nucleotides that shows a measurable response to a target molecule, such as glucose or a cancer biomarker, noting that most sequences will typically not respond, especially when dealing with small target molecules. In particular, the invention concerns a method that determines a local optimum in a large space of input DNA sequences by clustering the sequences, selecting representative sequences in each cluster, and accessing measurements of optical responses of nano-biosensors wrapped in the selected sequences. Next, a top sequence is identified and mutated to form a superset that serves as the input set of sequences for the next optimization cycle.

BACKGROUND

Monitoring blood glucose is vital for diabetes management. However, current methods are often inconvenient or imprecise. Nano-biosensors made of single-walled carbon nanotubes (SWCNTs) wrapped in single-stranded sequences of DNA (ssDNA) can potentially be used as optical sensors for monitoring blood glucose, owing to their transparency, photostability, sensitivity, and selectivity. SWCNTs, which can be represented as graphene sheets rolled into cylinders, exhibit remarkable physical, mechanical, and chemical properties, providing an excellent platform for designing nano-biosensors. However, identifying an optimal DNA sequence for glucose responsiveness within the vast space of possible sequences presents a significant challenge. The development of ssDNA-SWCNT sensors faces major challenges, particularly in finding an initial DNA sequence that effectively responds to a target analyte.
Researchers lack knowledge about how specific DNA sequences affect sensor properties, making it necessary to screen large DNA libraries, which is time-consuming and inefficient. Techniques like random screening and SELEX can identify sequences with strong binding affinities but often fail to achieve the desired fluorescence response needed for sensor applications.

Moreover, as the present inventors realized, target molecules other than glucose could potentially be sensed using SWCNTs wrapped in single-stranded sequences of nucleotides. However, a similar problem remains: it is necessary to determine one or more sequences of nucleotides that respond measurably, optically, to a target molecule. Therefore, novel methods are needed to optimize and engineer optical nano-biosensors as described above.

The following document forms part of the background art: Lambert, B. P. J. G. Directed evolution of DNA-wrapped single-walled carbon nanotube complexes for optical sensing. PhD thesis (EPFL, Lausanne, 2021). doi:10.5075/epfl-thesis8406.

SUMMARY

According to a first aspect, the invention is embodied as a computer-implemented method of optimizing an optical sensor comprising a single-walled carbon nanotube (SWCNT) wrapped in a single-stranded sequence of nucleotides (ssSN). The method comprises performing two or more optimization cycles, each of which includes the following steps. First, ssSNs of an input set of ssSNs are clustered to obtain clusters. The clustering step may be performed so as to achieve an optimal number of clusters. Representative ssSNs are then selected in each of the clusters obtained. Next, optical response measurements are accessed, i.e., measurements of the optical responses of optical sensors to a target molecule. The optical sensors at issue include SWCNTs wrapped in actual ssSNs, the latter having been synthetized in accordance with the selected representative ssSNs.
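The clustering and selection steps can be illustrated with one of the sequence properties named in claim 5, the k-mer frequency with k = 3. This is a minimal sketch under stated assumptions, not the inventors' code: each ssSN is mapped to its normalized 3-mer frequency vector, the vectors are grouped with plain Lloyd's k-means (using a deterministic farthest-first initialization, chosen here only for reproducibility), and the member closest to each cluster centre is kept as the representative, in the spirit of claim 4. The number of clusters is passed in rather than chosen with the elbow method of claim 6, the alignment- and folding-based cluster sets are not shown, and all function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Vector = List[float]


def kmer_profile(seq: str, k: int = 3) -> Vector:
    """Normalized frequencies of all 4**k DNA k-mers occurring in seq."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0.0] * len(index)
    n_windows = len(seq) - k + 1
    for i in range(n_windows):  # empty range if seq is shorter than k
        counts[index[seq[i:i + k]]] += 1.0
    return [c / max(n_windows, 1) for c in counts]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points: List[Vector], n_clusters: int) -> List[Vector]:
    """Start at the first point, then repeatedly add the point farthest from all chosen centres."""
    centers = [list(points[0])]
    while len(centers) < n_clusters:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans(points: List[Vector], n_clusters: int, n_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    centers = farthest_first(points, n_clusters)
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [min(range(n_clusters), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: each centre moves to the mean of its members.
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def select_representatives(seqs: Sequence[str], n_clusters: int, k: int = 3) -> List[str]:
    """One representative per non-empty cluster: the member nearest its centre."""
    points = [kmer_profile(s, k) for s in seqs]
    labels, centers = kmeans(points, n_clusters)
    reps = []
    for c in range(n_clusters):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            best = min(members, key=lambda i: sq_dist(points[i], centers[c]))
            reps.append(seqs[best])
    return reps
```

Only the selected representatives would then be synthetized and measured, which is what keeps the number of wet-lab experiments per cycle small.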
Interestingly, nanosensor arrays can potentially be used. For example, high-throughput screening and measurements of different ssDNA sequences and different SWCNT chiralities can be performed, as in embodiments described later.

A top ssSN is subsequently identified among the representative ssSNs. The top ssSN is the ssSN that led to the highest optical response among all the optical responses for which measurements were accessed. Importantly, however, a mutation step is performed between two optimization cycles. That is, the method further comprises mutating the top ssSN identified at each optimization cycle (except the last) to obtain mutants and form a superset of sequences including (or consisting of) the top ssSN and the mutants obtained. This mutation is performed prior to the next optimization cycle, which uses this superset as a new input set of ssSNs. Note that several top ssSNs could actually be identified during each cycle. As noted in the background section, one challenge is the lack of understanding of how sequences of nucleotide