The digital world keeps advancing at a rapid pace. In that context, Meta confirmed its commitment to open science by publicly releasing LLaMA (Large Language Model Meta AI) on February 24, 2023. LLaMA is a state-of-the-art foundational large language model designed to help researchers advance their work in this subfield of artificial intelligence (AI).
The buzz in tech over the last several weeks has focused mainly on the language models developed and deployed by big tech companies such as Google, Microsoft, and OpenAI. However, Facebook’s parent company, Meta, continues to do a lot of work in this field and is unveiling a new AI language generator known as LLaMA.
Smaller, more performant models such as LLaMA let researchers who lack access to large amounts of infrastructure study these models, further democratizing access to this important and fast-changing field.
LLaMA is not like Bing or ChatGPT; it is not a model that anybody can talk to. Instead, it is a research tool Meta believes will “democratize access in this important, fast-changing field.” In short, it lets experts tease out the problems of AI language models, from bias and toxicity to their tendency to simply make up information.
Training smaller foundation models like LLaMA is desirable in the large language model space because it requires far less computing power and fewer resources to test new approaches, validate others’ work, and explore new use cases.
Foundation models are trained on a large set of unlabeled data, which makes them well suited to fine-tuning for a variety of tasks. Meta is making LLaMA available in several sizes (7B, 13B, 33B, and 65B parameters) and sharing a LLaMA model card that explains how the model was built, in keeping with the company’s approach to Responsible AI practices.
Over the past 12 months, large language models, which are natural language processing (NLP) systems with billions of parameters, have shown new capabilities to solve mathematical theorems, answer reading comprehension questions, generate creative text, predict protein structures, and a lot more. They are now among the clearest examples of the substantial benefits AI could offer at scale to billions of people.
Even with all the recent advances in large language models, full research access to them remains limited because of the resources needed to train and run such large models. This limited access has restricted researchers’ ability to understand how and why large language models work, hindering progress on efforts to improve their robustness and mitigate known issues, including toxicity, bias, and the potential to generate misinformation.
Smaller models trained on more tokens (pieces of words) are easier to retrain and fine-tune for specific potential product use cases. Meta trained LLaMA 33B and LLaMA 65B on 1.4 trillion tokens. The smallest model, LLaMA 7B, was trained on one trillion tokens.
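To make the "smaller models, more tokens" point concrete, the short Python sketch below divides the token counts quoted above by each model's parameter count. It uses only the figures given in this article (the 13B model is left out because its token count is not stated here), and the names `MODELS` and `tokens_per_parameter` are illustrative, not part of any Meta release.

```python
# Figures quoted in this article: (parameter count, training tokens).
MODELS = {
    "LLaMA 7B": (7e9, 1.0e12),
    "LLaMA 33B": (33e9, 1.4e12),
    "LLaMA 65B": (65e9, 1.4e12),
}


def tokens_per_parameter(params, tokens):
    """Return how many training tokens the model saw per parameter."""
    return tokens / params


ratios = {
    name: tokens_per_parameter(params, tokens)
    for name, (params, tokens) in MODELS.items()
}

for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.0f} training tokens per parameter")
```

By this rough measure, the smallest model sees far more training data relative to its size than the largest one does.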
Like other large language models, LLaMA works by taking a sequence of words as input and predicting the next word to recursively generate text. To train the model, developers chose text from the 20 languages with the most speakers, focusing mainly on those with Latin and Cyrillic alphabets.
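The predict-the-next-word loop can be illustrated with a toy Python sketch. This is not LLaMA or anything close to it: it is a simple word-pair counter standing in for the neural network, and the function names (`train_bigram`, `predict_next`, `generate`) and the sample sentence are made up for illustration. What it shares with a real language model is the loop itself, where each predicted word is appended to the input before the next prediction.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for each word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if there is none."""
    followers = counts.get(word)
    if not followers:
        return None
    # Break ties alphabetically so the result is deterministic.
    return min(followers, key=lambda w: (-followers[w], w))


def generate(counts, prompt, max_new_words=5):
    """Repeatedly predict the next word and feed it back in as input."""
    words = prompt.split()
    for _ in range(max_new_words):
        nxt = predict_next(counts, words[-1])
        if nxt is None:
            break
        words.append(nxt)
    return " ".join(words)


counts = train_bigram("the cat sat on the mat and the cat slept")
print(generate(counts, "the", max_new_words=4))  # the cat sat on the
```

A real model conditions on the whole preceding sequence rather than only the last word, and works on tokens rather than whole words, but the generate-and-feed-back structure is the same.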
More research is still needed to address the risks of bias, toxic comments, and hallucinations in large language models. Like other models, LLaMA shares these challenges.
As a foundation model, LLaMA is designed to be versatile and can be applied to many different use cases, unlike a fine-tuned model that is designed for a particular task. Because Meta is sharing the code for LLaMA, other researchers can more readily test new approaches to limiting or eliminating these problems in large language models.
In the paper, Meta also provided a set of evaluations on benchmarks that measure model bias and toxicity, to show the model’s limitations and to support further research in this crucial area.
To maintain the model’s integrity and avoid misuse, Meta said that it is releasing the LLaMA model under a noncommercial license that is primarily focused on research use cases.
Access to the model will be granted on a case-by-case basis to academic researchers; people affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world. Anyone interested in applying for access can find the link to the application in the Meta research paper.
The company wrote in a post:
“We believe that the entire AI community — academic researchers, civil society, policymakers, and industry — must work together to develop clear guidelines around responsible AI in general and responsible large language models in particular. We look forward to seeing what the community can learn — and eventually build — using LLaMA.”
In short, Meta believes that the whole artificial intelligence (AI) community, including civil society, industry, academic researchers, and policymakers, needs to work together to develop clear guidelines around responsible AI in general and responsible large language models in particular.