SwRI Creates Chemistry-Based LLM Named GAMES to Accelerate Drug Discovery

CN1CCN(C)CC1

Image: 

Southwest Research Institute (SwRI) has developed a large language model known as Generative Approaches for Molecular Encodings (GAMES) designed to create Simplified Molecular Input Line Entry System (SMILES) strings, which provide a text-based format for representing chemical molecular structures.

 


Credit: Southwest Research Institute

SAN ANTONIO — August 14, 2025 — Scientists and engineers at Southwest Research Institute have developed a specialized large language model (LLM) aimed at enhancing drug design and discovery processes.

A multidisciplinary team has created the Generative Approaches for Molecular Encodings (GAMES) LLM to produce Simplified Molecular Input Line Entry System (SMILES) strings, a standard method for representing molecular structures via short text characters for better storage, retrieval, and modeling. This initiative was funded by SwRI’s LAMP program, an internal research effort to advance LLM technologies. The GAMES model has been trained to understand and generate valid new SMILES combinations.
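To make the SMILES format concrete, the short Python sketch below runs a surface-level syntax check on the string shown in the image above, CN1CCN(C)CC1. It only verifies that parentheses balance and that ring-closure digits come in pairs. The function name is hypothetical, and this is not how GAMES validates its output, which the release does not describe; real validation would normally rely on a cheminformatics toolkit.

```python
def surface_check_smiles(smiles: str) -> bool:
    """Return True if branches and ring closures are paired.

    Illustrative assumption only: this checks surface syntax, not chemistry
    (no valence, aromaticity, or element checks). Two-digit ring closures
    written as %nn are not treated specially.
    """
    if not smiles:
        return False

    depth = 0            # current nesting depth of branch parentheses
    open_rings = set()   # ring-closure digits seen an odd number of times
    in_bracket = False   # True while inside a [...] atom specification

    for ch in smiles:
        if ch == "[":
            if in_bracket:
                return False
            in_bracket = True
        elif ch == "]":
            if not in_bracket:
                return False
            in_bracket = False
        elif in_bracket:
            # Digits inside brackets are isotopes, H counts, or charges.
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch in "0123456789":
            # First occurrence opens a ring bond, second one closes it.
            open_rings ^= {ch}

    return depth == 0 and not open_rings and not in_bracket


if __name__ == "__main__":
    for candidate in ["CN1CCN(C)CC1", "CN1CCN(C)CC", "CN1CCN(CCC1"]:
        print(candidate, surface_check_smiles(candidate))
```

In the first string, the two `1` digits mark the atoms that close the six-membered ring and `(C)` is a branch; the second and third strings fail because a ring bond or a parenthesis is left open.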

“This project illustrates a systematic approach to building databases and networks of molecules for AI-driven processing and comparison solely through language,” stated Dr. Jonathan Bohmann, Institute Scientist and lead developer of SwRI’s Rhodium™ molecular docking software, which is used for virtual screening of drug compounds.

Rhodium software employs descriptors along with graphical processing to visualize the chemical properties of compounds. Integrating GAMES into the Rhodium workflow provides a more efficient general method for drug discovery and design.

“With LLMs, we can apply machine learning and AI directly to molecules using SMILES strings, which are easily readable text characters that don’t need an abstract representation translation,” Bohmann explained.

SwRI has trained the GAMES model using various classes of carbon-based molecules and other reference compounds to validate and refine the SMILES strings it produces.

“This project highlights the potential of training LLMs in specialized scientific areas to concentrate on specific goals,” noted SwRI Lead Computer Scientist Michael Hartnett. “In this context, our fine-tuning focuses on extracting the most pertinent knowledge within the drug discovery field.”

GAMES uses LoRA (Low-Rank Adaptation) and QLoRA (Quantized LoRA) techniques to fine-tune LLMs efficiently, reducing the hardware and energy required to operate Rhodium models. The team hopes to apply this methodology to additional applications and fields within the Institute.
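The hardware savings come from how LoRA works in general: the pretrained weight matrix stays frozen, and only two small low-rank matrices are trained on top of it. QLoRA additionally keeps the frozen weights in a quantized, lower-precision form. The Python sketch below compares the number of trainable weights for a single layer. The layer size and rank are illustrative assumptions, not figures from the GAMES project, and the function name is hypothetical.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Compare trainable weights for one d_out x d_in layer.

    Full fine-tuning updates every entry of the weight matrix W.
    LoRA freezes W and trains B (d_out x rank) and A (rank x d_in),
    so the adapted layer computes (W + B @ A) @ x.
    """
    full = d_out * d_in
    lora = rank * (d_in + d_out)
    return full, lora


if __name__ == "__main__":
    # Assumed, illustrative sizes: a 4096 x 4096 projection with rank 8.
    full, lora = lora_trainable_params(4096, 4096, 8)
    print(f"full fine-tuning: {full:,} trainable weights")
    print(f"LoRA (rank 8):    {lora:,} trainable weights ({lora / full:.2%} of full)")
```

Under these assumed sizes, the adapter trains 65,536 weights instead of 16,777,216, which is under half a percent of the layer.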

“Generating accurate SMILES with LLMs could revolutionize the drug discovery process, especially when trained on targeted datasets,” remarked SwRI Research Scientist Daniel Hinojosa. “The fine-tuned techniques have led to significant performance enhancements, boosting the number of valid SMILES while decreasing invalid outputs. Structured datasets and specialized training methods were crucial for this success.”
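One simple way to track the valid-versus-invalid counts Hinojosa mentions is to run every generated string through a validator and report the fractions. The Python sketch below is an assumption about how such a tally could look, not the team's evaluation code. The default validator is a deliberately crude placeholder that only checks balanced parentheses; in practice a cheminformatics parser would be passed in as `is_valid`. All names here are hypothetical.

```python
from typing import Callable, Iterable


def balanced_parentheses(smiles: str) -> bool:
    """Placeholder validator (assumption): non-empty with balanced parentheses."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


def validity_report(
    outputs: Iterable[str],
    is_valid: Callable[[str], bool] = balanced_parentheses,
) -> dict:
    """Count valid, invalid, and unique valid strings in a batch of outputs."""
    items = [s.strip() for s in outputs]
    valid = [s for s in items if is_valid(s)]
    total = len(items)
    return {
        "total": total,
        "valid": len(valid),
        "invalid": total - len(valid),
        "valid_fraction": len(valid) / total if total else 0.0,
        "unique_valid": len(set(valid)),  # duplicates add no new candidates
    }


if __name__ == "__main__":
    # Made-up sample batch for illustration; not GAMES output.
    sample = ["CN1CCN(C)CC1", "CC(=O)O", "CC(C", "CN1CCN(C)CC1"]
    print(validity_report(sample))
```

Comparing this report before and after fine-tuning, on the same prompts, gives a like-for-like view of whether the share of valid strings went up.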

The researchers are optimistic that GAMES will provide a robust framework for ranking compounds identified in chemical libraries according to their drug-likeness, a term that covers the properties that increase the likelihood of a compound securing approval as a safe medication. Furthermore, they intend to systematically explore chemical landscapes through testing. Hinojosa and Bohmann are seeking additional internal funding to propel the next phase of the project.

“Although we are still in the early stages of development, the outcomes are already affecting ongoing research initiatives at SwRI,” Bohmann said.

GAMES has been funded through the SwRI Internal Research and Development Program. In 2024, SwRI allocated over $11 million towards future technologies to expand its knowledge base, enhance its reputation as a leader in science and technology, and promote professional growth among its staff.

For more information, visit https://www.swri.org/what-we-do/internal-research-development or SwRI’s structure-based virtual screening page at https://www.swri.org/markets/biomedical-health/pharmaceutical-development/drug-discovery/structure-based-virtual-screening.






Alex Parker

Alex Parker is a tech enthusiast and digital tools reviewer with over a decade of experience exploring software solutions that boost productivity. He specializes in file management, conversion technologies, and emerging AI-driven applications, helping readers choose the right tools for their needs.