Meta has launched an AI "killer tool" for coding: how does the open-source, free code model Code Llama compare to ChatGPT?

Original source: AGI Innovation Lab

Image source: Generated by Unbounded AI

Meta recently released Code Llama, a large language model built by fine-tuning Llama 2, which can generate code from text prompts and is open-sourced for both research and commercial use.

Code Llama is a state-of-the-art open LLM for coding tasks. It has the potential to make current developer workflows faster and more efficient, to lower the barrier to entry for people learning to code, and to serve as a productivity and educational tool that helps programmers write more robust, well-documented software.

How Code Llama works

In July this year, Meta (formerly Facebook) released Llama 2, a free, commercially usable open-source model. The latest release, Code Llama, is a coding-specialized version of Llama 2, created by further training Llama 2 on its code-specific datasets, sampling more data from those same datasets for longer.

Overall, Code Llama builds on top of Llama 2 with enhanced coding capabilities. It can generate code, and natural language about code, from either code or natural-language prompts (e.g., "Write me a function that outputs the Fibonacci sequence."). It can also be used for code completion and debugging.
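As a sense of what such a prompt asks for, the following hand-written sketch shows the kind of function one would expect back (an illustrative example, not actual model output):

```python
def fibonacci(n: int) -> list[int]:
    """Return the first n numbers of the Fibonacci sequence."""
    sequence = []
    a, b = 0, 1
    for _ in range(n):
        sequence.append(a)
        a, b = b, a + b
    return sequence

print(fibonacci(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```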

Code Llama supports many of the most popular languages in use today, including Python, C++, Java, PHP, TypeScript (JavaScript), C#, and Bash.

Code Llama is currently available in three sizes: 7 billion, 13 billion, and 34 billion parameters.

Each version is trained on 500B tokens of code and code-related data. The 7-billion- and 13-billion-parameter base and instruct models are also trained with fill-in-the-middle (FIM) capability, allowing them to insert new code into existing code, which means they can support tasks such as code completion out of the box.
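To make the fill-in-the-middle idea concrete, here is a minimal infilling sketch using the Hugging Face transformers integration of Code Llama; the model ID, the <FILL_ME> placeholder, and the generation arguments are assumptions based on the published integration and may differ in your setup.

```python
# pip install transformers accelerate
import torch
from transformers import pipeline

# Assumed Hub checkpoint name for the 7B base model.
generator = pipeline(
    "text-generation",
    model="codellama/CodeLlama-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)

# <FILL_ME> marks the gap: the model completes the middle from the
# surrounding prefix and suffix (fill-in-the-middle).
prompt = (
    "def remove_non_ascii(s: str) -> str:\n"
    '    """ <FILL_ME>\n'
    "    return result\n"
)
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```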

These three models address different serving and latency requirements. For example, the 7B model can run on a single GPU. The 34B model returns the best results and offers better coding assistance, but the smaller 7B and 13B models are faster and better suited to tasks that require low latency, such as real-time code completion.

The Code Llama models provide stable generations with up to 100,000 tokens of context. All models are trained on sequences of 16,000 tokens and show improvements on inputs of up to 100,000 tokens.

Besides being a prerequisite for generating longer programs, longer input sequences unlock exciting new use cases for a code LLM. For example, users can provide the model with more context from their codebase to make its generations more relevant. Longer context also helps in debugging scenarios within larger codebases, where it can be hard for a developer to keep track of all the code related to a specific problem. When debugging a large amount of code, developers can pass the entire relevant stretch of code into the model.

Meta has also fine-tuned two additional variants of Code Llama: Code Llama - Python and Code Llama - Instruct.

  • Code Llama - Python is a language-specialized variant of Code Llama, further fine-tuned on 100B tokens of Python code.
  • Code Llama - Instruct is an instruction-fine-tuned and aligned version of Code Llama. Instruction tuning continues the training process, but with a different objective: the model is given a natural-language instruction as input together with the expected output. This makes it better at understanding what people expect from a prompt. Meta recommends using the Code Llama - Instruct variant for code generation, because it has been fine-tuned to generate helpful and safe answers in natural language (see the prompt sketch below).
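As an illustration of how the Instruct variant can be prompted, here is a minimal sketch using the Hugging Face transformers library; the [INST] wrapping mirrors the Llama-2-style chat format that the Instruct checkpoints are reported to use, and the model ID and generation settings are assumptions that may differ in your environment.

```python
import torch
from transformers import pipeline

# Assumed Hub checkpoint name for the 7B Instruct variant.
generator = pipeline(
    "text-generation",
    model="codellama/CodeLlama-7b-Instruct-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Llama-2-style instruction wrapping: the natural-language request sits between [INST] tags.
prompt = "[INST] Write a Python function that checks whether a string is a palindrome. [/INST]"
print(generator(prompt, max_new_tokens=128)[0]["generated_text"])
```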

However, neither Code Llama nor Code Llama - Python is recommended for general natural-language tasks, since neither model is designed to follow natural-language instructions. Code Llama is specialized for code-specific tasks and is not suitable as a base model for other tasks.

How does Code Llama perform?

HumanEval and Mostly Basic Python Programming (MBPP) are two commonly used benchmarks of coding proficiency: HumanEval tests a model's ability to complete code from a docstring, and MBPP tests its ability to write code from a description. Testing Code Llama against these two benchmarks shows that it outperforms open-source, code-specific LLMs as well as Llama 2 itself. For example, Code Llama 34B scores 53.7% on HumanEval and 56.2% on MBPP, surpassing ChatGPT but still trailing GPT-4 on HumanEval.
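To make the benchmark setup concrete, a HumanEval-style task pairs a function signature and docstring (the prompt) with hidden unit tests that the generated body must pass. The snippet below is a simplified, hypothetical illustration of that evaluation loop, not code from either benchmark:

```python
# The prompt the model sees: a signature plus a docstring.
PROMPT = (
    "def is_palindrome(s: str) -> bool:\n"
    '    """Return True if s reads the same forwards and backwards."""\n'
)

# A candidate completion (hand-written here; in the benchmark it comes from the model).
COMPLETION = "    return s == s[::-1]\n"

def check(candidate) -> bool:
    """Hidden tests: the sample counts as correct only if all assertions hold."""
    try:
        assert candidate("level") is True
        assert candidate("hello") is False
        return True
    except AssertionError:
        return False

namespace = {}
exec(PROMPT + COMPLETION, namespace)  # assemble the full function and run the tests
print(check(namespace["is_palindrome"]))  # True -> this sample would count toward pass@1
```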

Chart source: Meta

Has a fine-tuned CodeLlama-34B surpassed GPT-4?

Although Code Llama did not come out on top in these tests, that is not the whole story; another highlight is further fine-tuning. Users can fine-tune the open-source Code Llama again to build a version best suited to their own needs.

Phind recently fine-tuned CodeLlama-34B and CodeLlama-34B-Python on its own dataset, and the fine-tuned versions achieved 67.6% and 69.5% on HumanEval respectively, surpassing the 67% that OpenAI reported for GPT-4 in March.


Hands-on comparison: ChatGPT vs. Code Llama

This comparison uses the GPT-3.5 version of ChatGPT and the Code Llama model hosted on the Perplexity platform. We asked eight questions to compare whether the two models successfully generate working code.

Question 1:

"Using Python. Given two strings word1 and word2. Merge the strings by adding letters in alternating order, starting with word1. If one string is longer than the other, append additional letters to the merged strings. end.

Return the merged string.

Example 1: • Input: word1="abc", word2="pqr" • Output: "apbqcr"

🟢 ChatGPT: +1 for success 🔵 Code Llama: +1 for success
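For reference, one straightforward Python solution to this task (hand-written for illustration, not either model's output):

```python
def merge_alternately(word1: str, word2: str) -> str:
    merged = []
    for a, b in zip(word1, word2):
        merged.extend((a, b))
    # Append whatever is left of the longer string.
    n = min(len(word1), len(word2))
    merged.append(word1[n:] or word2[n:])
    return "".join(merged)

assert merge_alternately("abc", "pqr") == "apbqcr"
```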

Question 2:

"Using Python. Given a string s, just reverse all vowels in the string and return it.

The vowels are "a", "e", "i", "o", and "u", which can occur multiple times in both lowercase and uppercase.

Example 1:

Input: s="hello" Output: "Hall"

🟢 ChatGPT: +1 for success 🔵 Code Llama: Failed +0
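For reference, a typical two-pointer solution in Python (hand-written for illustration, not either model's output):

```python
def reverse_vowels(s: str) -> str:
    vowels = set("aeiouAEIOU")
    chars = list(s)
    i, j = 0, len(chars) - 1
    while i < j:
        if chars[i] not in vowels:
            i += 1
        elif chars[j] not in vowels:
            j -= 1
        else:
            # Swap the pair of vowels and move both pointers inward.
            chars[i], chars[j] = chars[j], chars[i]
            i, j = i + 1, j - 1
    return "".join(chars)

assert reverse_vowels("hello") == "holle"
```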

Question 3:

"Using Python. Given an integer array nums, shift all 0's to the end of it while maintaining the relative order of the nonzero elements. Note that you have to do this in-place, without making a copy of the array.

Example 1:

Input: nums = [0,1,0,3,12] Output: [1,3,12,0,0]"

🟢 ChatGPT: +1 for success 🔵 Code Llama: Failed +0
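For reference, a common in-place solution sketch (hand-written, not either model's output):

```python
def move_zeroes(nums: list[int]) -> None:
    """Move all zeros to the end in place, preserving the order of nonzero elements."""
    insert = 0
    for value in nums:
        if value != 0:
            nums[insert] = value
            insert += 1
    for i in range(insert, len(nums)):
        nums[i] = 0

nums = [0, 1, 0, 3, 12]
move_zeroes(nums)
assert nums == [1, 3, 12, 0, 0]
```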

Question 4:

"Using Python. You have a long flowerbed where some plots are planted and some are not. However, adjacent plots cannot be planted with flowers. Given an integer array of 0 and 1 for a flower bed, where 0 means empty and 1 means not empty, and an integer n, returns true if n new flowers can be planted in the flower bed without violating the no-adjacent flower rule, Otherwise return false.

Example 1: Input: flowerbed = [1,0,0,0,1], n = 1 Output: true Example 2: Input: flower bed = [1,0,0,0,1], n = 2 output: false

🟢 ChatGPT: +1 for success 🔵 Code Llama: +1 for success
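For reference, a simple greedy solution sketch (hand-written, not either model's output):

```python
def can_place_flowers(flowerbed: list[int], n: int) -> bool:
    planted = 0
    for i, plot in enumerate(flowerbed):
        left_empty = i == 0 or flowerbed[i - 1] == 0
        right_empty = i == len(flowerbed) - 1 or flowerbed[i + 1] == 0
        if plot == 0 and left_empty and right_empty:
            flowerbed[i] = 1  # greedily plant here
            planted += 1
    return planted >= n

assert can_place_flowers([1, 0, 0, 0, 1], 1) is True
assert can_place_flowers([1, 0, 0, 0, 1], 2) is False
```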

Question 5:

"Using Python. Given an input string s, reverse the order of the words.

A word is defined as a sequence of non-whitespace characters. Words in s will be separated by at least one space.

Returns a string of words joined by single spaces in reverse order.

Note that s may contain leading or trailing spaces or multiple spaces between two words. The returned string should have only one space to separate words. Do not include any extra spaces.

Example 1: Input: s = "The sky is blue" Output: "Blue is the sky""

🟢 ChatGPT: +1 for success 🔵 Code Llama: +1 for success
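For reference, a concise solution using Python's built-in string handling (hand-written, not either model's output):

```python
def reverse_words(s: str) -> str:
    # split() with no arguments drops leading, trailing, and repeated whitespace.
    return " ".join(reversed(s.split()))

assert reverse_words("the sky is blue") == "blue is sky the"
```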

Question 6:

"Using Python. Given a string s and an integer k, return the maximum number of vowels in any substring of length k in s. The vowels in English are "a", "e", "i", "o" and "u".

Example 1: Input: s = "leetcode", k = 3 Output: 2 Explanation: "lee", "eet" and "ode" contain 2 vowels.

🟢 ChatGPT: +1 for success 🔵 Code Llama: +1 for success
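For reference, a sliding-window solution sketch (hand-written, not either model's output):

```python
def max_vowels(s: str, k: int) -> int:
    vowels = set("aeiou")
    count = sum(ch in vowels for ch in s[:k])  # vowels in the first window
    best = count
    for i in range(k, len(s)):
        count += (s[i] in vowels) - (s[i - k] in vowels)  # slide the window by one
        best = max(best, count)
    return best

assert max_vowels("leetcode", 3) == 2
```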

Question 7:

"Using Python. Given a string s that contains asterisks *.

With one operation, you can:

Choose a star in s.

Remove the closest non-asterisk character to its left, and remove the asterisk itself.

Return the string after all stars have been removed.

Example 1: Input: s="leet**cod*e" Output: "lecoe""

🟢 ChatGPT: +1 for success 🔵 Code Llama: Failed +0
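For reference, a stack-based solution sketch (hand-written, not either model's output):

```python
def remove_stars(s: str) -> str:
    stack = []
    for ch in s:
        if ch == "*":
            stack.pop()  # drop the closest non-star character to the left
        else:
            stack.append(ch)
    return "".join(stack)

assert remove_stars("leet**cod*e") == "lecoe"
```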

Question 8:

"Using Python. Given an array of integer temperatures representing daily temperatures, return an array of answers where answer [i] is the number of days after day i you have to wait for warmer temperatures. If there is no day in the future to do this, keep the answer [i] == 0。

Example 1: Input: temperatures = [73,74,75,71,69,72,76,73] Output: [1,1,4,2,1,1,0,0]"

🟢 ChatGPT: +1 for success 🔵 Code Llama: +1 for success
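For reference, a monotonic-stack solution sketch (hand-written, not either model's output):

```python
def daily_temperatures(temperatures: list[int]) -> list[int]:
    answer = [0] * len(temperatures)
    stack = []  # indices of days still waiting for a warmer day
    for i, temp in enumerate(temperatures):
        while stack and temperatures[stack[-1]] < temp:
            prev = stack.pop()
            answer[prev] = i - prev  # days waited until this warmer day
        stack.append(i)
    return answer

assert daily_temperatures([73, 74, 75, 71, 69, 72, 76, 73]) == [1, 1, 4, 2, 1, 1, 0, 0]
```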

Final Results:

🟢 ChatGPT: 8/8 🔵 Code Llama: 5/8

In summary, Code Llama does not show an obvious advantage over ChatGPT in this hands-on comparison, although these tests alone are not a definitive basis for judgment. Moreover, as an open-source model, Code Llama is easier than ChatGPT for users to customize to their own needs, which may open up more possibilities.
