The Power of Model Merging: Creating State-of-the-Art Language Models on a Budget
Developing massive language models with trillions of parameters has led to remarkable breakthroughs in natural language processing. But training these behemoths comes with an equally massive price tag, putting them out of reach for all but the most well-resourced organizations.
However, an emerging technique called model merging now enables researchers to create highly capable models at a fraction of the computational cost by combining the knowledge and abilities of multiple smaller models. When specialized models are blended strategically, the unified result can match or even exceed the performance of models orders of magnitude larger.
A prime example comes from a study by Xiaoding Lu et al., which found that blending just three language models in the 6B-13B parameter range could rival the 175B-parameter GPT-3 on both in-domain and out-of-domain tasks. The blended system achieved these results while being 24 times more computationally efficient at inference than GPT-3 [1].
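Notably, the "blending" in that paper does not mean averaging weights: each conversational turn is answered by one of the constituent chat models, sampled at random, with every model conditioning on the shared conversation history. A minimal sketch of that routing loop is below; the model names and the `generate` stub are illustrative placeholders, not the authors' code.

```python
import random

# Constituent chat models; the names are illustrative placeholders, not the
# models used in the paper.
BLEND = ["chat-model-6b", "chat-model-13b-a", "chat-model-13b-b"]

def generate(model_name: str, history: list[str]) -> str:
    """Stand-in for a call to the chosen model's chat endpoint."""
    return f"[{model_name}] reply to: {history[-1]}"

def blended_reply(history: list[str]) -> str:
    # Each turn, one constituent model is sampled uniformly at random.
    # Because every model conditions on the same shared history, the group
    # behaves like a single, more capable conversational model.
    model_name = random.choice(BLEND)
    return generate(model_name, history)

if __name__ == "__main__":
    history = ["Hi! Can you recommend a sci-fi novel?"]
    for _ in range(3):
        reply = blended_reply(history)
        print(reply)
        history.append(reply)
        history.append("Tell me more.")
```

The appeal of this design is that no model ever sees more than an ordinary chat request, so the ensemble's inference cost stays at the level of its largest constituent.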
But figuring out the optimal way to combine different models is a complex challenge. That’s where evolutionary algorithms come in. Researchers at Sakana AI in Tokyo have pioneered a novel evolutionary approach that can automatically discover the best recipes for merging diverse open-source models.
Evolutionary Optimization Unlocks Powerful Cross-Domain Capabilities
Sakana AI’s method operates in both the parameter space (tuning how the individual models’ weights are mixed) and the data flow space (optimizing the path data takes through the merged model’s layers). Surprisingly, this automated approach can uncover powerful combinations that exceed the performance of any individual source model, without requiring extensive additional training data or compute [2].
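To make the parameter-space search concrete, the toy sketch below runs a simple (1+lambda) evolution strategy over a handful of mixing weights between two compatible checkpoints. It is a deliberately simplified stand-in: the real recipe search uses a far more capable evolutionary optimizer and also evolves the data-flow path, neither of which is reproduced here, and the `fitness` callback is an assumed placeholder for evaluating the merged model on a validation set.

```python
import random

def merge_state_dicts(sd_a, sd_b, mix_weights):
    """Interpolate two compatible state dicts (name -> array/tensor), cycling
    tensors through a small set of mixing weights as a crude stand-in for
    per-layer merge coefficients."""
    merged = {}
    for i, key in enumerate(sd_a):
        w = mix_weights[i % len(mix_weights)]
        merged[key] = (1.0 - w) * sd_a[key] + w * sd_b[key]
    return merged

def evolve_merge(sd_a, sd_b, fitness, n_weights=8, generations=50, children=8, sigma=0.1):
    """Toy (1+lambda) evolution strategy over the mixing weights.

    `fitness` scores a merged state dict on a held-out task (higher is better);
    in practice it would run the merged model on a small validation set.
    """
    best = [0.5] * n_weights
    best_score = fitness(merge_state_dicts(sd_a, sd_b, best))
    for _ in range(generations):
        for _ in range(children):
            # Mutate the current best recipe and keep a child only if it wins.
            child = [min(1.0, max(0.0, w + random.gauss(0.0, sigma))) for w in best]
            score = fitness(merge_state_dicts(sd_a, sd_b, child))
            if score > best_score:
                best, best_score = child, score
    return best, best_score
```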
Even more exciting, evolutionary optimization is enabling the creation of models with entirely new cross-domain capabilities by merging models trained on different tasks, languages, and modalities. For example, Sakana AI generated a Japanese language model with strong mathematical reasoning skills by merging a Japanese LLM with an English math model. In another experiment, they created a culturally-aware Japanese vision-language model by combining a Japanese LLM with an English VLM [2].
Remarkably, both of these automatically generated models achieved state-of-the-art results on a variety of Japanese LLM and vision benchmarks, despite not being explicitly trained for those tasks. The Japanese math model even outperformed some 70B parameter Japanese LLMs while using an order of magnitude fewer parameters. These results underscore the potential for model merging to efficiently imbue foundation models with powerful new abilities.
Open-Source Tools Accelerate Adoption of Model Merging
Recognizing the immense potential of model merging, the open-source community has embraced this new paradigm. Thousands of models have already been merged with open-source tools like MergeKit, a library that makes it easy to combine models via merging algorithms such as SLERP, TIES, and DARE. Many of the top-performing models on the Open LLM Leaderboard are now merges created with MergeKit [3].
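To give a feel for what one of these algorithms does, SLERP (spherical linear interpolation) treats each pair of corresponding weight tensors as points on a hypersphere and interpolates along the arc between them rather than along the straight line. The hand-rolled NumPy function below only illustrates the underlying math; it is not MergeKit’s implementation, which is driven by declarative merge configs rather than standalone calls like this.

```python
import numpy as np

def slerp(w_a: np.ndarray, w_b: np.ndarray, t: float, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two corresponding weight tensors.

    t=0 returns w_a, t=1 returns w_b; intermediate values follow the arc on the
    hypersphere instead of the straight (lerp) path.
    """
    a, b = w_a.ravel(), w_b.ravel()
    a_norm = a / (np.linalg.norm(a) + eps)
    b_norm = b / (np.linalg.norm(b) + eps)
    dot = np.clip(np.dot(a_norm, b_norm), -1.0, 1.0)
    theta = np.arccos(dot)
    if theta < eps:  # nearly colinear: fall back to plain linear interpolation
        merged = (1.0 - t) * a + t * b
    else:
        merged = (np.sin((1.0 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)
    return merged.reshape(w_a.shape)

# Usage sketch: merge every tensor in two compatible checkpoints at t=0.5,
# where state_a and state_b map parameter names to NumPy arrays.
# merged = {k: slerp(state_a[k], state_b[k], t=0.5) for k in state_a}
```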
New techniques are also emerging to further optimize the model merging process. For example, the Model Stock method from Dong-Hwan Jang et al. can approximate an optimal merged model using just a few fine-tuned models by leveraging the geometric properties of the weight space [4]. Fine-tuning merged models with direct preference optimization (DPO) has also proven effective for boosting performance [5].
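Model Stock’s core observation is that fine-tuned weights cluster around a central point, so a good merge can be approximated by pulling the average of a few fine-tuned checkpoints back toward the pretrained anchor. The sketch below leaves the interpolation ratio as an explicit parameter; in the paper it is derived per layer from angles measured in weight space, a derivation not reproduced here, so treat this as an illustration of the idea rather than the method itself.

```python
import numpy as np

def model_stock_style_merge(finetuned, pretrained, ratio):
    """Interpolate the average of several fine-tuned checkpoints toward the
    pretrained anchor.

    `finetuned` is a list of state dicts (name -> np.ndarray), `pretrained` is
    the anchor state dict, and `ratio` in [0, 1] controls how far the merge
    moves away from the anchor. The paper derives this ratio from weight-space
    geometry; here it is a user-supplied assumption.
    """
    merged = {}
    for name, w0 in pretrained.items():
        w_avg = np.mean([sd[name] for sd in finetuned], axis=0)
        merged[name] = ratio * w_avg + (1.0 - ratio) * w0
    return merged
```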
The Future Belongs to Artful Model Blending
As these automated tools and optimization methods continue to advance, model merging is poised to become an increasingly powerful and accessible approach for building state-of-the-art AI models. By harnessing the collective knowledge embedded in the expanding universe of open-source models, even modestly resourced organizations can now punch well above their weight and achieve cutting-edge results.
While developing massive monolithic language models was once the only path to NLP breakthroughs, model merging has changed the game. Expect to see this technique drive many of the most impactful and novel AI advances in the years ahead, as researchers mix and match models in creative new ways. The age of the trillion-parameter model may be giving way to a future defined by artful and efficient model blending.
References:
[1] Xiaoding Lu et al. “Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM.” arXiv preprint arXiv:2401.02994 (2024).
[2] Takuya Akiba et al. “Evolutionary Optimization of Model Merging Recipes.” arXiv preprint arXiv:2403.13187 (2024).
[3] Charles Goddard et al. “Arcee’s MergeKit: A Toolkit for Merging Large Language Models.” arXiv preprint arXiv:2403.13257 (2024).
[4] Dong-Hwan Jang et al. “Model Stock: All we need is just a few fine-tuned models.” arXiv preprint arXiv:2403.19522 (2024).
[5] Wenqi Glantz. “Exploring mergekit for Model Merge, AutoEval for Model Evaluation, and DPO for Model Fine-tuning.” Towards Data Science (2024).