This artificial intelligence (AI) research could make it easy to teach old AI models new tricks without any expensive fine-tuning or retraining sessions.
Artificial intelligence researchers at Google Research and Google DeepMind developed a strategy by which a large language model (LLM) can be enhanced with other language models.
This development addresses one of the largest outstanding issues affecting LLMs by enabling developers to infuse existing models with new features and abilities without needing to start from scratch or engage in expensive retraining and fine-tuning sessions.
According to the Google Research team, augmenting an LLM with another language model both improves performance on existing tasks and enables new tasks that neither model could accomplish on its own.
Teaching Old Chatbots New Tricks
The research was conducted using Google’s PaLM2-S LLM, a model the search engine firm said is comparable to GPT-4, the AI model that underpins OpenAI’s ChatGPT.
In the team’s experiments, PaLM2-S was benchmarked on its own and then again after being augmented with smaller, specialized language models. The tasks included translation, where the augmented version showed up to a 13% improvement over the baseline, and coding.
When tested in coding tasks, the hybrid model showed considerable improvements, according to the paper.
“Similarly, when PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks—on par with fully fine-tuned counterparts.”
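For readers unfamiliar with the term, a "relative improvement" measures the gain as a fraction of the base model's own score rather than as a raw point difference. The short Python sketch below illustrates the arithmetic only. The function name and both scores are hypothetical placeholders chosen so the numbers work out cleanly; they are not figures from the paper.

```python
def relative_improvement(baseline: float, augmented: float) -> float:
    """Return the change of `augmented` over `baseline` as a fraction of `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (augmented - baseline) / baseline


# Hypothetical benchmark scores, used only to illustrate the calculation.
baseline_score = 20.0
augmented_score = 28.0

gain = relative_improvement(baseline_score, augmented_score)
print(f"relative improvement: {gain:.0%}")
```

With these made-up scores, an 8-point gain on a 20-point baseline works out to a 40% relative improvement, which is why a relative figure can look much larger than the absolute change behind it.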
Possibly Massive Impacts
On the surface, the reported performance gains could have immediate implications for the AI industry. The boost in translation performance, for instance, was greatest when translating a language with little support into English. That remains an open problem in machine learning, and Google’s work here could move the needle.
In the bigger picture, however, this vein of research might address the Sword of Damocles hanging over the heads of most tech CEOs in the AI industry: legal issues that could dismantle the very foundation of chatbots like ChatGPT.
Copyright vs. Artificial Intelligence
The creators of some of the most popular large language models are named as defendants in many lawsuits that hinge on claims that these AI networks are trained on copyrighted data.
The question legislators and the courts will need to answer is whether a for-profit firm can legally use such data to train its language models. At the extreme, if the courts rule that developers cannot use such data and that any models trained on copyrighted material must be purged, it might prove technically impossible or financially infeasible to keep offering the affected services.
Fundamentally, due to the high costs involved in training large language models, and their dependence on lots of data, products like ChatGPT, as they are developed today, may not be viable in a more-regulated United States AI landscape.
However, if Google’s new LLM augmentation scheme pans out with further development, much of the scaling requirement and cost of spinning up an LLM from scratch or retraining an existing one might be mitigated.