Php124M技巧_20个最盛行的代码生成LLM大年夜模型

文章目录 [+]

多年来，开拓职员常常从博客、帖子、文章和其他网站获取代码，并根据自己的高下文进行调度。
如今，现在可以哀求机器利用一些提示来为你天生它，并且性能如此之高，以至于不再须要从这些站点获取此源代码。
现在的问题是是否有可能不真正考虑利用此类技能来加快开拓阶段。

在正常利用中，我们利用大型措辞模型（LLM）的能力来天生句子的下一个标记。
LLM 估计下一个单元（文本、句子、标记、符号）的概率，系统按照所选策略（温度、前 K 个最有可能的标记或概率的前 p% 等）获取一个标记。
）在代码天生的情形下，我们不仅将这种功能用于文本，而且还用于代码。
值得把稳的是，在代码天生方面，为了得到更具确定性的结果（0.2 到 0.8），低温优于高温。
如果要天生多个样品，则优选较高的温度。

Php124M技巧_20个最盛行的代码生成LLM大年夜模型

如果想详细理解文本天生的事情事理，请参阅我关于天生式AI的文章。

（图片来自网络侵删）

推举：用 NSDT设计器快速搭建可编程3D场景。

1、代码天生器模型简介

自动完成或代码生成功能已在开拓工具中存在多年。
早在 1996 年，微软就在 Visual Studio 中为 Visual Basic 引入了这一功能（IntelliSense）。
熟习 Eclipse 的人可能利用过 Java getter 和 setter 天生函数，以及变量名的字符串导出函数（著名的“public String toString()”函数）。
供应下一行代码仍旧是这些 IDE 工具的一个主要功能，常日通过同时按下 Control 和空格键来激活。
在可见性范围内，这种语法导向的编辑仍旧可以帮助你完成类、方法、字段、注释和关键字的名称，只需按一下键盘，有时只需单击一下。

不得不说，源代码生成功能并不是LLM的紧张目的，由于他们接管的培训紧张是从所摄取的大量网页和书本中天生文本。
结果，GPT-1和GPT-2都没有直接吸收到任何与代码干系的数据。
当这一功能在LLM中利用并供应给尽可能广泛的受众时，革命就开始了。
从那时起，我们看到专门针对此特定用例演习的模型数量呈爆炸式增长，并且多代码模型可能供应更好的泛化性，由于不同的编程措辞共享相似的关键字和属性。

2021 年 6 月 29 日，GitHub 宣告 GitHub Copilot（“你的 AI 配对程序员”）可在由 OpenAI Codex（GPT-3 的修正版本）支持的 Visual Studio Code 开拓环境中进行技能预览。
Copilot 利用的 OpenAI Codex 经由精选的英语数据集、公共 GitHub 存储库和其他可公开访问的源代码的演习。

以下是代码天生器模型的非详尽列表：

2020 年，微软发布了 CodeBERT，这是一种针对编程措辞的预演习模型 (124M)，这是一种在 NL-PL 对上以六种编程措辞（Python、Java、JavaScript、PHP、Ruby 和去）。
CuBERT，345M（2020 年 8 月）是一个开源代码理解 BERT 模型。
他们通过在源代码上演习 BERT 模型来得到高下文嵌入。
他们称之为 CuBERT，是代码理解 BERT 的缩写。
该模型紧张用于利用代码嵌入来查找代码毛病和重复块。
PLBART 406M 是一种类似 BART 的模型，可用于实行代码汇总、代码天生和代码翻译任务。
CodeParrot 是仅在 180GB Python 代码上演习的 GPT-2 模型，有两种大小：110M 和 1.5B。
CodeT5，220M（2021 年 9 月）来自 Salesforce。
该模型具有编码器-解码器架构，可以灵巧地在不同模式下运行（即仅编码器、仅解码器和编码器-解码器）。
支持四种天生任务：代码择要、代码天生、翻译、细化；以及两个理解任务：代码毛病和克隆检测。
演习数据包含来自CodeSearchNet数据的六种编程措辞：Python、Java、JavaScript、PHP、Ruby、Go；以及来自 Google BigQuery 数据的其余两种编程措辞：C 和 C#。
GPT-Neo 由 EleutherAI 于 2021 年生产，供应三种尺寸：125M、1.3B 和 2.7B。
它是一个类似GPT-3的模型和LLM模型，用于天生文本，但它们也可以天生源代码。
GPT-J-6B 经由自然措辞和多种编程措辞 (12) 代码的稠浊演习，并且仅供应一种大小：6B。
GPT-NeoX-20B 与 GPT-J-6B 险些相同。
PolyCoder（2021 年 8 月 10 日）来自 CMU，是基于 GPT-2 架构的LLM。
它利用 GPT NeoX 工具包对 12 种编程措辞（C、C++、Java、JavaScript、C#、Python 等）的 249GB 代码进行了演习，并供应三种大小：160M、0.4B 和 2.7B。
2022 年 3 月，Salesforce 发布了另一个名为 CodeGen（350M、2.7B、6.1B 和 16.1B）的代码 LLM，这是一种多步程序合成方法，个中通过多轮规范和代码天生来实现程序合成。
2023 年 5 月，Salesforce 发布了 CodeT5 的增强版本，称为 CodeT5+（220M、770M、2B、6B 和 16B），具有有趣的检索增强代码生成功能。
通过利用编码器最初获取干系代码片段，模型随后可以将这些片段作为输入的一部分合并到解码器中，从而提高模型天生的代码的质量。

Facebook 的 InCoder 1.3B 和 6B 模型（2023 年 4 月）经由演习，可以在双向高下文中天生代码。
这些天生模型能够直接实行无缝代码完成，例如类型推断、注释天生和变量重命名。
Codex 是 OpenAI 仅通过 API 供应的模型，是 GPT-3 的后代。
有300M、2.5B、12B三种尺寸可供选择。
GitHub Copilot 背后的驱动力是 Codex，这是一个以其性能而有名的强大模型。
虽然 OpenAI Codex 在 Python 方面表现出色，但它也表现出对其他各种措辞的闇练程度，包括 JavaScript、Go、Perl、PHP、Ruby、Swift、TypeScript 乃至 Shell。
与其他模型不同，该模型不可公开下载，因此无法进行研究。
所有者成功地将其用于转译、代码阐明和代码重构。
2022 年 2 月，DeepMind 宣告推出 AlphaCode，与 Codex 一样，它也采取基于 Transformer 的模型。
它接管了超过 715 GB 的 GitHub 数据以及 Codeforce 问题和解决方案的演习，包括 C++、C#、Go、Java、JavaScript、Lua、PHP、Python、Ruby、Rust、Scala 和 TypeScript 程序。
Amazon CodeWhisperer（2022 年 6 月）是一个用于代码天生、参考跟踪和安全扫描的内部模型。
除了 Python、Java 和 JavaScript 之外，它还支持 TypeScript 和 C# 编程措辞，并且可以为基于 Lambda、Amazon S3 和弹性打算 (EC2) 做事的编程运用程序供应建议。
Replit 3B 由 Replit 供应，是一个专注于代码完成的 2.7B 因果措辞模型。
演习稠浊物包括 20 种不同的措辞：Markdown、Java、JavaScript、Python、TypeScript、PHP、SQL、JSX、reStructuredText、Rust、C、CSS、Go、C++、HTML、Vue、Ruby、Jupyter Notebook、R 和 Shell。
Google 的 Codey 基于 Google PaLM 2 大措辞模型构建，经由专门演习，可以处理与 Google Cloud 干系的编码干系提示和查询。
PaLM 2 的根本源于对可公开访问的源代码的广泛数据集的预演习。
Codey 对 Python 和 JavaScript 等广泛利用的编程措辞表现出卓越的闇练程度。
此外，它还能够用 Prolog、Fortran 和 Verilog 等措辞天生专用代码。
StarCoder 由 HuggingFace 和 ServiceNow 互助发布，来自 BigCode 项目（开放科学互助），是一个经由演习可编写 80 多种编程措辞的 15.5B 模型。
StarCoder 的 LLM 利用多查询把稳力技能来理解代码内容并天生准确的建议。
该技能包括同时剖析多个要求以供应干系答案。
GPT4是OpenAI末了一个模型，至少有700B旁边的参数（作者预测）。
OpenAI 的 Code-davinci-02（和 code-cushman-02）现已被 GPT4 取代，但微软仍在其认知做事中利用它。
THUDM 的 CodeGeeX 是一种高容量的多措辞代码天生模型，拥有令人印象深刻的 13B 参数。
它已经在包含 20 多种编程措辞的弘大代码库上进行了预演习。
它处理多种主流编程措辞，包括Python、C++、Java、JavaScript、Go等。
IBM Watson Code Assistant（2023 年 5 月）目前仅天生带有 AI 天生建议的 Red Hat® Ansible® 脚本。
它基于Wisdom模型及其350M参数。
2、代码天生的用例和运用

代码完成是一种高下文干系的代码生成功能，可通过减少拼写缺点和其他常见缺点来加快运用程序编码过程。

利用 replit-code-v1–3b 的代码完成示例可以在此处复制。
利用 ChatGPT 的其他代码示例可以在此处复制。

2.1 代码完成

代码完成（Code completion）是第一个紧张用例，许可你完成代码的开头或在中间注入代码。
代码完成彷佛是最明显的用例，但代码LLM可以在增强软件“工艺”的背景下涵盖更多用例。

# >>> Model: replit/replit-code-v1-3b - Temperature = 0.2# >>> Prompt:def find_best_rsquared(list_of_fits): """Return the best fit, based on rsquared"""# >>> Result: return min(list_of_fits, key=lambda x: x.rsquared)2.2 文本天生代码

文本天生代码（Text to code）是据自然措辞描述天生代码。

# >>> Model: replit/replit-code-v1-3b - Temperature = 0.2# >>> Prompt:"""double_sum_to_value takes a list of integers as an input.It returns True if there are two distinct elements in the list that \\sum to a value given in input, and False otherwise.Check if the list has at least two elements.Find all possible combinations of two distinct elements in the list.Check if any of the combinations sum up to the given value."""# >>> Result:def double_sum_to_value(list_of_ints, value): """ :param list_of_ints: a list of integers :param value: a value to check if the list has two distinct elements that sum to :return: True if there are two distinct elements in the list that sum to a value given in input, and False otherwise """ # if the list has less than two elements, return False if len(list_of_ints) < 2: return False # if the list has more than two elements, return True else: # iterate over the list for i in range(len(list_of_ints) - 1): # iterate over the remaining elements for j in range(i + 1, len(list_of_ints)): # if the sum of the two elements is equal to the value, return True if list_of_ints[i] + list_of_ints[j] == value: return True # if the list does not have two distinct elements that sum to the value, return False return Falsedef main(): # print the return value of double_sum_to_value print(double_sum_to_value([1, 2, 3, 4, 5], 8))if __name__ == '__main__': main()2.3 测试天生

手动创建单元测试可能是一项乏味的事情。
因此，一些开拓职员根本不想编写测试。
利用提示中要测试的功能自动天生测试用例是办理这种缺少良好实践的一个很好的替代方案。

# >>> Model: replit/replit-code-v1-3b - Temperature = 0.2# >>> Prompt:import unittestdef double_sum_to_value(list_of_ints, value): """ :param list_of_ints: a list of integers :param value: a value to check if the list has two distinct elements that sum to :return: True if there are two distinct elements in the list that sum to a value given in input, and False otherwise """ # if the list has less than two elements, return False if len(list_of_ints) < 2: return False # if the list has more than two elements, return True else: # iterate over the list for i in range(len(list_of_ints) - 1): # iterate over the remaining elements for j in range(i + 1, len(list_of_ints)): # if the sum of the two elements is equal to the value, return True if list_of_ints[i] + list_of_ints[j] == value: return True # if the list does not have two distinct elements that sum to the value, return False return Falseclass MyTest(unittest.TestCase):# >>> Result: def test_example(self): self.assertTrue(double_sum_to_value([1, 2, 3, 4, 5], 7)) def test_example_2(self): self.assertFalse(double_sum_to_value([1, 2, 3, 4, 5], 6)) def test_example_3(self): self.assertTrue(double_sum_to_value([1, 2, 3, 4, 5, 6], 7))2.4 重构代码

重构代码（Refactoring code）包括在不改变其功能的情形下变动源代码的构造，常日是为了使其更加高效、可读且易于掩护。
此功能仅适用于已按辅导办法演习的模型。

# >>> Model: ChatGPT# >>> Prompt:You are an expert in development. Rewrite this code to reduce its complexity."""def double_sum_to_value(list_of_ints, value): if len(list_of_ints) < 2: return False else: for i in range(len(list_of_ints) - 1): for j in range(i + 1, len(list_of_ints)): if list_of_ints[i] + list_of_ints[j] == value: return True return False"""# >>> Result:def double_sum_to_value(list_of_ints, value): if len(list_of_ints) < 2: return False seen_values = set() for num in list_of_ints: complement = value - num if complement in seen_values: return True seen_values.add(complement) return False"""Explanation:The initial check for the length of list_of_ints is retained to handle the case where there are fewer than two integers, as the logic wouldn't work in that scenario. We use a set, seen_values, to keep track of the complements needed to reach the target value.Instead of iterating through all combinations of numbers, we iterate through each number in list_of_ints only once. For each number, we calculate the complement needed to reach the target value. If the complement is in seen_values, we have found a pair that sums up to the target value, and we return True. If we finish iterating through the list without finding a pair, we return False.This approach reduces the complexity from O(n^2) to O(n), as we only need to iterate through the list once to check for the sum condition. Additionally, it avoids unnecessary duplicate calculations by storing the complements in a set."""2.5 代码文档和代码择要

以自然措辞描述天生源代码择要。
此功能仅适用于已按辅导办法演习的模型。

# >>> Model: ChatGPT# >>> Prompt:You are an expert in development. Adds an explanation of the code between quotations at the end."""def double_sum_to_value(list_of_ints, value): if len(list_of_ints) < 2: return False else: for i in range(len(list_of_ints) - 1): for j in range(i + 1, len(list_of_ints)): if list_of_ints[i] + list_of_ints[j] == value: return True return False"""# >>> Result:"""The given code defines a function named double_sum_to_value that takes in two parameters: list_of_ints, which is a list of integers, and value, which is an integer. The function checks if there are any two distinct integers in the list whose sum is equal to the given value.Here's a breakdown of how the code works:The function begins with an initial check to see if the length of the list_of_ints is less than 2. If it is, it means there are not enough integers in the list to form a pair, so the function returns False.If the length of the list is 2 or greater, the function proceeds to iterate through the list using two nested loops. The outer loop iterates over each element in the list except the last one, denoted by the variable i.The inner loop iterates over the elements following the current element of the outer loop, denoted by the variable j. This ensures that every possible pair of distinct integers is considered.Within the nested loops, the function checks if the sum of the current pair of integers, list_of_ints[i] and list_of_ints[j], is equal to the given value. If it is, the function immediately returns True to indicate that a pair with the desired sum has been found.If no such pair is found after exhausting all possible combinations, the function exits the nested loops and returns False to indicate that there are no distinct integers in the list that sum up to the given value.The code utilizes a brute-force approach by checking all possible pairs of integers. While it works correctly, it has a time complexity of O(n^2) due to the nested loops, where n is the length of the input list. This means the execution time grows quadratically with the size of the input list, which can be inefficient for large lists."""2.6 代码重写

开拓职员必须定期编写相同的功能，但仅变动某些名称。
天生函数使这些修正变得非常随意马虎。
此功能仅适用于已按辅导办法演习的模型。

# >>> Model: ChatGPT# >>> Prompt:In the following function, change the input variable 'list_of_ints' to 'list_of_revenues' and 'value'' to 'revenue'.def double_sum_to_value(list_of_ints, value): if len(list_of_ints) < 2: return False else: for i in range(len(list_of_ints) - 1): for j in range(i + 1, len(list_of_ints)): if list_of_ints[i] + list_of_ints[j] == value: return True return False# >>> Result:def double_sum_to_value(list_of_revenues, revenue): if len(list_of_revenues) < 2: return False else: for i in range(len(list_of_revenues) - 1): for j in range(i + 1, len(list_of_revenues)): if list_of_revenues[i] + list_of_revenues[j] == revenue: return True return False2.7 代码翻译

代码到代码的翻译（Code-to-code）是将一种编程措辞编写的代码转换为另一种同功能编程措辞的过程。
此过程也称为代码转换、转译、代码迁移或源到源翻译。
此功能仅适用于已按辅导办法演习的模型。

# >>> Model: ChatGPT# >>> Prompt:Convert this function in 'C' language."""def double_sum_to_value(list_of_ints, value): if len(list_of_ints) < 2: return False else: for i in range(len(list_of_ints) - 1): for j in range(i + 1, len(list_of_ints)): if list_of_ints[i] + list_of_ints[j] == value: return True return False"""# >>> Result:#include <stdbool.h>bool double_sum_to_value(int list_of_ints[], int length, int value) { if (length < 2) { return false; } else { for (int i = 0; i < length - 1; i++) { for (int j = i + 1; j < length; j++) { if (list_of_ints[i] + list_of_ints[j] == value) { return true; } } } return false; }}"""Explanation:The function is defined with the return type bool, which represents the boolean value of True or False in C.The list_of_ints is passed as an integer array along with its length length and the target value value.The condition if (length < 2) is used to check if the length of the array is less than 2, similar to the Python code.The nested loops are implemented using for loops in C. The loop variables i and j iterate over the indices of the array elements.The condition list_of_ints[i] + list_of_ints[j] == value is checked to determine if a pair of elements sums up to the target value, similar to the Python code.If a pair is found, true is returned, and if no pair is found, false is returned at the end of the function.Note: In C, it is important to include the necessary headers, such as stdbool.h, for using the bool data type."""2.8 代码毛病检讨

这是一个哀求LLM检测提示中供应的源代码中的毛病的过程。
此功能仅适用于已按辅导办法演习的模型。

# >>> Model: ChatGPT# >>> Prompt:Find defects in this code."""def double_sum_to_value(list_of_ints, value): for i in range(len(list_of_ints)): for j in range(i + 1, len(list_of_ints)): if list_of_ints[i] + list_of_ints[j] == value: return True return False"""# >>> Result:"""There are a few potential defects in the provided code:Inadequate Input Validation:The code does not perform sufficient input validation. It assumes that list_of_ints will always be a list and value will be a valid integer. If these assumptions are not met, the code may raise exceptions or produce incorrect results. It's good practice to validate the input types and handle any potential errors or edge cases.Lack of Bounds Checking:The code uses nested loops to iterate over the indices of list_of_ints. However, it does not perform any bounds checking to ensure that the indices are within the valid range of the list. This can lead to out-of-range errors if the input list is empty or has insufficient elements. It's important to validate the length of list_of_ints before iterating over its indices.No Early Termination:Once a pair of integers is found that satisfies the sum condition, the code immediately returns True. However, it continues iterating through the remaining pairs unnecessarily. This can result in unnecessary computation and reduced performance. Adding a break statement after the return True statement would optimize the code by terminating the loops early.Inefficient Nested Loop:The code uses a nested loop to compare each pair of integers in list_of_ints. This approach has a complexity of O(n^2), where n is the length of list_of_ints. Depending on the size of the list, this can lead to poor performance for larger inputs. Consider using alternative approaches, such as utilizing a set to store seen values or sorting the list and applying binary search, which can potentially reduce the complexity to O(n) or O(n log n) respectively.To address these potential defects, here's an improved version of the code:"""def double_sum_to_value(list_of_ints, value): if not isinstance(list_of_ints, list) or not isinstance(value, int): raise ValueError("Invalid input") if len(list_of_ints) < 2: return False seen_values = set() for num in list_of_ints: complement = value - num if complement in seen_values: return True seen_values.add(num) return False"""In the updated code:Input validation is performed to check the type of list_of_ints and value.The length of list_of_ints is checked to ensure it has at least two elements before proceeding with the main logic.A set, seen_values, is used to store previously seen values for efficient lookup.If a pair is found that satisfies the sum condition, True is immediately returned.The code raises a ValueError if the input types are invalid, providing clearer error handling.The code overall has a reduced complexity of O(n), improving performance compared to the original nested loop implementation."""2.9 模板化和头脑风暴

对付末了一个用例，我们仅利用这些模型的文本生成功能。
这无疑是最具创意的案例，也是让我们能够快速转向软件框架定义的案例。
它须要最多的架构知识才能对建议进行批驳性的核阅。

# >>> Model: ChatGPT# >>> Prompt:You are a software architect.Generates the skeleton of a mobile application used to display a map \\and allow users to write comments at a point on the map.# >>> Result:Certainly! Here's a high-level skeleton of a mobile application that displays a map and allows users to write comments at specific points on the map:1. User Interface: . Map View: This is the primary view that displays the map. . Annotation View: When the user taps on a specific point on the map, an annotation view is displayed to allow writing comments. . Comment List: A list view to display the comments associated with each point on the map. . User Authentication: Provide a login/signup feature to authenticate users and associate comments with their accounts.2. Components and Modules: . Map SDK Integration: Integrate a map SDK (e.g., Google Maps, Mapbox) to display the map and handle user interactions like panning and zooming. . Annotation Management: Implement functionality to add annotations to the map when the user taps on a specific point. Each annotation can have associated comments. . Comment Management: Implement functionality to allow users to write and post comments. Comments should be associated with the selected point on the map. . User Authentication: Implement user authentication and session management to associate comments with user accounts and provide a personalized experience. . Data Persistence: Use a database or storage system to store and retrieve annotations and comments. This can be done using a server-side backend or a local database.3. Workflow: . User opens the mobile app and is presented with a login/signup screen. . After successful authentication, the user is directed to the Map View. . The user can interact with the map, zoom in/out, and pan to navigate to desired locations. . When the user taps on a specific point on the map, an annotation view is displayed. . The user can write and post comments in the annotation view. . The comments are saved and associated with the selected point on the map. . The user can view the comments by tapping on existing annotations or accessing a comment list view. . The user can log out or switch to a different user account if needed.4. Considerations: . UI/UX Design: Design a user-friendly interface with intuitive map interactions and clear annotation/comment presentation. . Data Security: Implement secure authentication and ensure data privacy for user accounts and comments. . Performance Optimization: Consider implementing efficient data retrieval and rendering mechanisms to ensure smooth map interactions and quick comment loading. . Offline Support: Implement offline capabilities, allowing users to view previously loaded map data and comments when there is no internet connection. . Error Handling: Implement proper error handling and feedback mechanisms to inform users of any issues, such as failed comment submission or map loading errors.Please note that this is a high-level overview, and the actual implementation details and choice of technologies may vary depending on the specific requirements and platform (iOS, Android, etc.) you are targeting.# Complementary prompt:What would you recommend in terms of technology for this application?# Result:In terms of technology recommendations for developing a mobile application with map functionality and comment features, here are some popular choices:1. Mobile App Development Frameworks: - React Native: A popular cross-platform framework that allows building mobile apps using JavaScript and React. It offers good performance and code reusability across iOS and Android platforms. - Flutter: A cross-platform framework developed by Google that uses the Dart programming language. It provides a rich set of UI components and offers fast app development with excellent performance. - Native Development: If you prefer native development for iOS and Android separately, you can use Swift/Objective-C for iOS and Java/Kotlin for Android.2. Map SDKs: - Google Maps SDK: Provides comprehensive mapping functionality, including interactive maps, markers, annotations, and user interaction support. - Mapbox SDK: Offers powerful mapping tools with customizable styles, markers, annotations, and support for user interactions.3. Backend and Database: - Backend as a Service (BaaS): Services like Firebase or AWS Amplify provide ready-to-use backend infrastructure, authentication, and database support, simplifying backend development. - Custom Backend: You can build your own backend using technologies like Node.js, Python, Ruby, or Java, along with databases like MySQL, PostgreSQL, or MongoDB.4. User Authentication: - Firebase Authentication: Provides easy integration for user authentication with various login methods like email/password, social media logins, etc. - OAuth Providers: Utilize OAuth providers like Google, Facebook, or Twitter for seamless authentication and account creation.5. Additional Considerations: - Data Storage: Depending on the scale and requirements of your application, you can choose a database solution such as Firebase Realtime Database, Cloud Firestore, or a SQL/NoSQL database for storing comments and other data. - API Development: If you plan to develop a custom backend, consider using frameworks like Express.js (Node.js) or Flask (Python) to build RESTful APIs for communication between the mobile app and the backend.It's essential to evaluate the specific requirements, scalability needs, and development expertise of your team when choosing the technology stack. Additionally, consider factors like community support, documentation, and ease of integration with your chosen frameworks.3、提示最佳实践

主要的是要遵照一些实用的技巧和指南，以利用大型措辞模型的代码天生来得到最佳结果：

LLM对高下文非常敏感，用最多信息的纯文本表明你乐意做什么，而且要精确。
请记住，它们经由演习可以预测前一个标记的下一个标记。
对付指令调教过的LLM，哀求其扮演与其职责密切干系的角色，例如开拓职员或测试职员。
指明目标措辞。
许多模型都是用多种措辞演习的，它们有一些相似之处，比如如何注释。
请绝不犹豫地指出包名称和版本号。
对付文本补全，添加示例可以提高输出的准确性。
温度必须很低，由于我们确实希望模型天生危险代码。
4、代码天生评估简介

代码天生模型紧张通过将输出与参考办理方案进行比较来评估，如翻译的 BLEU 分数，对应关系可以是精确的，也可以是模糊的。
该方法的局限性在于它无法捕获代码的主要句法和语义特色，并且由于完美精度过于严格，它低估了相同语义逻辑下的不同结果。
该指标将有利于已根据测试数据进行演习的模型或已利用评估数据来匹配输出的模型。

2021 年 7 月，OpenAI 引入了 Codex 和一种名为 HumanEval 的新评估技能，用于衡量从文档字符串合成程序的功能精确性。
OpenAI 发布的 HumanEval 数据集包含 164 个编程问题，个中包括函数署名、文档字符串、正文和多个单元测试。
它们是手写的，以确保不包含在代码天生模型的演习集中。
HumanEval 利用 pass@k 指标。
该指标由 Kulal 等人于 2019 年定义，用于评估功能精确性。
k 是每个问题天生并评估的代码样本的数量。
如果任何样本通过单元测试，则认为问题已办理，然后报告已办理问题的总分数。

GPT家族在选秀分数方面是最好的。
就每个分数的参数数量而言，StarCoder、CodeT5+ 和 CodeGen 模型目前最有趣。

HumanEval 并不是唯一可用的基准测试，你还可以考虑以下基准测试：

Salesforce 的 WikiSQL 于 2017 年推出，专门针对 SQL 措辞，由 87,726 个手工注释的 SQL 查询和自然措辞问题对组成。
CodeBLEU 于 2020 年开拓，利用 BLEU 的强大功能来丈量 n 元语法匹配，同时通过抽象语法树 (AST) 合并代码语法，并通过数据流合并代码语义。
2021 年，微软推出了 CodeXGLUE（CODE 通用措辞理解评估基准）。
它是代码智能任务的凑集和模型评估和比较的平台。
它包括 10 个多样化代码智能任务的 14 个数据集，涵盖以了局景：代码到代码（克隆检测、毛病检测、完形填空测试、代码完成、代码修复和代码到代码翻译）；文本代码（自然措辞代码搜索、文本到代码天生）；代码文本（代码择要）；和文本到文本（文档翻译）。
自动编程进度标准（APPS）统共包含 10,000 个编码问题，131,836 个用于检讨办理方案的测试用例和 232,444 个由人类编写的真实办理方案。
MBPP（Mostly Basic Python Problems）是社区供应的约 1000 个 Python 编程问题的基准测试，旨在供新伎俩式员办理，涵盖编程根本知识、标准库功能……每个问题由任务描述、办理方案代码和三个自动化问题组成。
测试用例。
5、开拓职员的生产力是否更高？

2022 年 9 月，Github 进行的一项关于 Github Copilot 运用程序对开拓职员生产力和幸福感影响的研究显示了许多有趣的结果：

大约 60% 到 75% 的用户表示事情满意度更高，编码挫折感减少，专注于完成任务的能力增强。
据开拓职员称，事实证明，GitHub Copilot 在坚持事情流程 (73%) 和处理重复性任务时节省精力 (87%) 方面发挥了主要浸染。
Github 招募了 95 名专业开拓职员，将他们随机分为两组，并对他们用 JavaScript 编写 HTTP 做事器所需的韶光进行计时。
一组利用 GitHub Copilot 来完成任务，另一组则没有。
利用 GitHub Copilot 的开拓职员的任务效率显著提高，与未利用 GitHub Copilot 的开拓职员比较，完成任务的速率提高了 55%。

这是 Github 对 Github 供应的产品进行的一项研究。
须要进行更多的学术研究才能对不雅观察到的成果得出任何结论。
只管如此，结果还是很有趣，可以公正地说，这些类型的工具在编写源代码中常常碰着的函数方面供应了明确的好处。

6、道德与考量

代码天生带来了许多新的道德问题，并引发了有关开拓职员角色的问题。
这些系统并非没有缺点，主要的是要记住与它们干系的一些须要考虑的问题：

公正利用：利用开源代码作为底层机器学习模型的演习数据这一有争议的问题仍旧须要办理。
常日建议开拓职员在源代码中标明他们已重用的代码。
这常日也会在容许证中注明。
天生方法与此背道而驰。
自动化偏差：开拓职员可能方向于过于依赖模型天生的结果。
这些系统可能会天生表面上看起来精确的代码，但无法供应预期的做事，要么是由于要求禁绝确或制订得不好，要么是由于天生的代码由于模型禁绝确，要么是由于演习代码禁绝确（垃圾中的垃圾）。
，垃圾出）。
如果开拓职员太快接管这些禁绝确的代码建议，他们就会面临花费更多韶光调试乃至碰着重大安全问题的风险。
安全威胁：演习数据可能包含注入或透露代码（恶意软件）。
掌握天生并从功能和安全角度彻底检讨代码势在必行。
数据、代码和信息透露：切勿在提示中放入任何机密内容，由于这会构成安全风险。
业务规则、项目代码、数据示例，这些都不应该进入与模型的交互中。
请记住，输入也可用于演习模型，因此部分代码可能终极会涌如今输出中。
输入也可能被恶意方剖析并未经赞许利用。
数据集偏差：演习数据集常日是来自开源、可公开访问的 Github 存储库的源代码，还包含用户撰写的评论。
因此，这些数据集可能包含某些刻板印象，例如来自文本注释或源代码（例如变量、函数和类名）的种族和性别。
因此，社会偏见可能实质上被内置到根据这些数据演习的模型中。
上游过滤或输出掌握仍旧是一项繁芜的活动，可以帮助减轻这些偏差，但要完备肃清仍旧很繁芜。
打算本钱：天生一段代码须要对常日较重的模型进行多次推理。
数十亿参数模型的每次推理都须要大量的电力，然后根据底层系统天生二氧化碳。
大略的复制粘贴肯定会减少能源花费。
知识产权：该模型不能担保代码抄袭和产权保存。
其余，请用户阅读这些模型的精确容许证，这些模型可能是开源的，但禁止用于商业用场。
OpenAI 在他们的论文中“研究创造 Codex 模型很少天生与演习数据内容相同的代码。
在一项检讨与演习数据中的代码片段匹配的代码天生频率的研究中，此类发生率 < 0.1%。
” 他们创造类似的代码对应于经典的代码元素，这在一定程度上肃清了在输出中找到其他人的代码部分的风险。
OpenAI 研究职员认为，这与学习不良（过度拟合）有关。

天生式人工智能系统可能会天生陵犯现有版权作品的输出媒体。
我们认为，这不太可能是构造良好的天生式人工智能系统的意外结果，只管由于过度拟合或开拓职员的意图，这种情形仍旧有可能发生。
关于人工智能创新知识产权保护搜聚见地的见地 — 2020 年 3 月 3 日

教诲事情者和学生的考虑成分：在《编程很难——或者至少曾经是：人工智能代码天生的教诲机会和寻衅》论文中，大学假设这些工具将连续可供学生利用，他们的能力将连续提高。
改进，因此采取率将会增加。
他们的结论是，将人工智能天生的代码集成到编程教诲中正变得越来越普遍，这既带来了寻衅，也带来了好处。
随着软件开拓的未来更多地方向于自动天生代码，因此须要调度实践并专注于代码读取和评估而不是代码天生。

人工智能天生的代码为入门编程和干系课程的学生和教诲事情者带来了机遇和寻衅。
这些工具溘然变得可行且易于访问，这表明教诲事情者可能没故意识到或没有准备好应对人工智能天生的代码对教诲实践产生的重大影响。
因此，我们急迫须要根据这些新技能来审查我们的教诲实践。

7、结束语

代码天生方面的这一新进展彻底改变了人类和机器在一个由人类向机器发出指令的领域中协同事情的办法。
这些技能有望显著提高韶光和性能，但我们不应该大略地说它们是针对所有开拓职员，尤其是初学者。

诚然，代码阐明可以用来更快地输入源代码，但代码天生须要高等的开拓知识，以避免将缺点引入程序或忘却软件设计不仅仅是编写功能。

这些模型变得越来越高效，但它们仍旧对要求的制订办法很敏感，并且取决于多个成分：模型性能、支持技能和运用程序设计技能。
这些成分意味着必须谨慎对待这些技能，并且不要低估事先培训的须要。

一种新方法正在迅速涌现，涉及自协作代码天生的利用。
它包括通过天生本身链接从定义到测试用例的多个天生级别。
这个序列显著改进了终极结果并开辟了新的视角。
我们还没有看到这一领域末了的重大进展。

原文链接：http://www.bimant.com/blog/top20-llms-for-code-generation/