Fine-tuning guide for [Llama2]
Fine-tuning case studies by a company that offers fine-tuning as a service, run on its own platform
Llama-2 model
Better than GPT-4 in some niche cases
Functional Expression Extraction from Unstructured Text (ViGGO)
SQL generation (SQL-create-context)
7B is sufficient for both.
In particular, fine-tuning Llama-13b improved accuracy from 58% to 98% on functional representation, from 42% to 89% on SQL generation, and from 28% to 47% on GSM.
Fine-tuning basics
In all three tasks, standard full-parameter fine-tuning was used.
Data sharding across workers
Model sharding with DeepSpeed
Special tokens
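The setup above (data sharded across workers, model weights sharded with DeepSpeed) can be sketched as follows. This is a minimal illustration, not the post's actual code: the helper name `shard_data` and the config values are my own assumptions; only the ZeRO-stage-3 idea comes from DeepSpeed itself.

```python
def shard_data(examples, num_workers, rank):
    """Round-robin data sharding: every num_workers-th example goes to this worker."""
    return examples[rank::num_workers]

# Illustrative DeepSpeed config fragment: ZeRO stage 3 shards model
# parameters, gradients, and optimizer state across workers.
# (Values here are assumptions, not taken from the case study.)
deepspeed_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
    "bf16": {"enabled": True},
}

examples = list(range(10))
shards = [shard_data(examples, num_workers=4, rank=r) for r in range(4)]
# Each worker sees a disjoint slice; together they cover the dataset.
```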
Explanation of ViGGO
Effectiveness of Fine Tuning
In an earlier blog post, we discussed the idea that fine tuning is not about facts, but about form.
Some important questions
Fine-tuning helps because far more examples can be incorporated into the weights of the neural network than could be shown in a prompt.
ViGGO revolves around pattern recognition and requires a basic grasp of language and basic concepts, but does not require complex logical reasoning.
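ViGGO meaning representations have the general shape `intent(slot[value], ...)`; the "pattern recognition" the model must learn is mapping text onto this structure. A minimal parser sketch follows; the example MR string is made up in the dataset's general shape, not copied from the corpus.

```python
import re

def parse_mr(mr):
    """Split a ViGGO-style meaning representation into its intent and slots."""
    intent, args = re.match(r"(\w+)\((.*)\)$", mr).groups()
    slots = dict(re.findall(r"(\w+)\[([^\]]*)\]", args))
    return intent, slots

# Hypothetical MR in the ViGGO shape, not an actual corpus entry.
intent, slots = parse_mr("give_opinion(name[SomeGame], rating[good])")
```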
Evaluation
SQL generation with Llama-2 fine-tuning model
The success of this task depends on the LLM's ability to learn the "structure" of SQL and translate natural language into that structure.
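As a sketch of what one training pair might look like: the SQL-create-context dataset has `question`, `context` (a CREATE TABLE statement), and `answer` (the SQL) fields, and a record is typically rendered into a prompt/completion pair. The prompt template wording below is my own assumption, not the post's.

```python
def format_example(record):
    """Render one SQL-create-context record as a (prompt, completion) pair.
    The template text is illustrative, not the case study's actual format."""
    prompt = (
        "Given the schema:\n"
        f"{record['context']}\n"
        "Answer the question with a SQL query.\n"
        f"Question: {record['question']}\n"
    )
    return prompt, record["answer"]

record = {
    "question": "How many heads of departments are older than 56?",
    "context": "CREATE TABLE head (age INTEGER)",
    "answer": "SELECT COUNT(*) FROM head WHERE age > 56",
}
prompt, completion = format_example(record)
```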
Results
Arithmetic Reasoning for Elementary School Students (GSM8k)
The fine-tuning task on this dataset differs from the previous two: rather than simply learning a structure, the goal was to see how well fine-tuning could improve the LLM's ability to reason about math problems.
GPT-3.5 was cut from the comparison because it is difficult to verify whether an answer is correct when it is given in free-form sentences.
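This verification problem is why GSM8k evaluation usually relies on the dataset's reference format, where each solution ends with `#### <number>`. A small extractor sketch follows; the fallback of taking the last number in free-form model output is an assumed heuristic, not something described in the original post.

```python
import re

def extract_final_answer(text):
    """Pull the final numeric answer from a GSM8k-style solution.
    Reference answers end with '#### <number>'; for free-form model
    output we fall back to the last number in the text (an assumed heuristic)."""
    m = re.search(r"####\s*(-?[\d,]+)", text)
    if m:
        return m.group(1).replace(",", "")
    nums = re.findall(r"-?\d+", text.replace(",", ""))
    return nums[-1] if nums else None
```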
The chat versions already perform better at the 7B and 13B sizes.
They decided that 8k data points was not enough, so they took the approach of increasing the data further, and report that results improved even more.
This page is auto-translated from [/nishio/Fine-Tuning Llama-2: A Comprehensive Case Study for Tailoring Models to Unique Applications](https://scrapbox.io/nishio/Fine-Tuning Llama-2: A Comprehensive Case Study for Tailoring Models to Unique Applications) using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.