Large Language Models for Code Writing: Boosting Performance
The latest efforts to improve the quality of code generated by Large Language Models
Introduction
As Large Language Models (LLMs) grow more popular than ever, a rising number of developers use them to assist their daily code writing, through products such as GitHub Copilot or simply ChatGPT. However, just like code written by human developers, code generated by LLMs can suffer from problems ranging from functional correctness to security.
In this post, we introduce some of the latest efforts made by researchers to improve your code-writing experience when assisted by LLMs.
Language-Oriented Code Sketching
When you use LLMs to write code, the first thing you might want to consider is how to craft your prompt, that is, how to tell the LLM exactly what you need from the generated code. This is, unfortunately, not an easy task, because users often have to mentally imagine possible outcomes until the code is eventually generated.
Recently, researchers observed that users’ prompts often include phrases directly referencing code elements, embedded within narratives that reflect the relationships among these elements. For example, consider the following prompt:
“Define a PyTorch dataset with a loader function that reads JSON”
Here, PyTorch dataset and loader function refer to a class and a function, respectively. Their linguistic relationship indicates that the loader function should be a component of the PyTorch dataset class. This linguistic structure can be used to derive an incomplete code outline, called a code sketch, that offers a preview of the intended code structure. The sketch can also serve as a foundational layer for the LLM to further refine and complete.
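For instance, a code sketch derived from the prompt above might look like the following. This rendering is hypothetical; the element names and placeholders are illustrative, not the paper’s exact output.

```python
# Hypothetical code sketch derived from the example prompt; names and
# placeholders are illustrative, not the paper's exact output.
class PyTorchDataset:        # "PyTorch dataset" -> a class
    def loader(self):        # "loader function" -> a method inside the class
        ...                  # "reads JSON" -> body left for the LLM to complete
```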
Based on this observation, researchers proposed an interactive approach, named Language-Oriented Code Sketching, that incrementally derives a code sketch as the user types a prompt, working much like an AI extension in a code editor or IDE. It includes three key steps (a toy sketch of the loop follows the list):
- Map: As the user types, the system maps the current phrase of the prompt to a list of potential code elements.
- Assemble: For each element, the system assembles it with the code derived from the previous phrases in the prompt by checking the linguistic relationships between the current phrase and the previous ones against a predefined rule set. Valid assemblies are presented as suggestions and previewed in the code editor upon selection.
- Preserve: Once the user accepts a suggestion, the system inserts the corresponding code elements into the code editor and completes the phrase being typed accordingly. The association between the phrase and the code element is preserved.
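To make these steps concrete, here is a minimal, runnable sketch of the Map/Assemble/Preserve loop in Python. The `CodeElement` class and the `map_phrase` and `assemble` helpers are hypothetical toy stand-ins; the paper’s actual implementation differs.

```python
from dataclasses import dataclass

@dataclass
class CodeElement:
    kind: str                            # e.g. "class" or "function"
    name: str
    phrase: str = ""                     # Preserve: the prompt phrase it came from
    parent: "CodeElement | None" = None  # containment link, if any

def map_phrase(phrase: str) -> list[CodeElement]:
    """Map: turn a prompt phrase into candidate code elements (toy rules)."""
    if "dataset" in phrase:
        return [CodeElement("class", "PyTorchDataset", phrase)]
    if "function" in phrase:
        return [CodeElement("function", "loader", phrase)]
    return []

def assemble(sketch: list[CodeElement], candidate: CodeElement) -> CodeElement:
    """Assemble: a toy linguistic rule -- a function mentioned right after
    a class becomes a member of that class."""
    if candidate.kind == "function" and sketch and sketch[-1].kind == "class":
        candidate.parent = sketch[-1]
    return candidate

# Derive the sketch incrementally, phrase by phrase, as the user types.
sketch: list[CodeElement] = []
for phrase in ["a PyTorch dataset", "a loader function"]:
    for candidate in map_phrase(phrase):
        sketch.append(assemble(sketch, candidate))

for element in sketch:
    owner = f" (inside {element.parent.name})" if element.parent else ""
    print(f"{element.kind} {element.name}{owner}")
```

Running this prints a class element and a function element nested inside it, mirroring how the linguistic structure of the example prompt is turned into a code outline.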
Extensive experiments show that the derived code sketch not only provides instant, incremental feedback to the user but can also guide the subsequent code generation process, opening up exciting opportunities for future research.
Codexity
Now that one can effectively get code from LLMs, the next thing to worry about is how secure the generated code is and whether it contains vulnerabilities that attackers can easily exploit. In fact, researchers have already raised concerns that LLMs can introduce security vulnerabilities into auto-generated code that developers could overlook.
To tackle this challenge, the first security-focused code generation framework, Codexity, was recently proposed. Codexity integrates LLMs with static analyzers to establish security awareness and acts as a first guard against potential vulnerabilities introduced by AI programming assistants.
In Codexity’s workflow, the user first selects a repair strategy in the configuration settings to activate the system, such as Iteration Repair, where the model is iteratively queried until a secure answer is found, or Preshot Repair, which uses an additional, cheaper model for an initial completion. The user can then invoke Codexity to complete their code while programming. Codexity takes the existing code snippet to initiate a prompt and generates an initial completion with the selected LLM. The completed code is then routed to a vulnerability detection phase backed by a series of static analysis tools. If the tools report any vulnerability, Codexity extracts the error or warning message and its location information, along with the vulnerable program, to formulate a vulnerability-exposing prompt. Finally, Codexity sends this prompt to the LLM in the background and requests a vulnerability-free program.
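As a rough illustration of this loop, consider the following minimal sketch. The `query_llm` and `run_static_analysis` helpers are hypothetical placeholders; the real framework wires specific LLMs and static analysis tools into this pipeline.

```python
def query_llm(prompt: str) -> str:
    """Hypothetical call to the user-selected LLM."""
    raise NotImplementedError

def run_static_analysis(code: str) -> list[dict]:
    """Hypothetical wrapper around the static analysis tools; returns
    findings, each assumed to carry a 'line' and a 'message' field."""
    raise NotImplementedError

def codexity_complete(snippet: str, max_repairs: int = 3) -> str:
    """Sketch of an iterative-repair loop: re-query until analyzers are silent."""
    code = query_llm(f"Complete the following code:\n{snippet}")
    for _ in range(max_repairs):
        findings = run_static_analysis(code)
        if not findings:
            return code  # no vulnerabilities reported
        # Formulate a vulnerability-exposing prompt from the analyzer output.
        report = "\n".join(f"line {f['line']}: {f['message']}" for f in findings)
        code = query_llm(
            "The following program is vulnerable:\n"
            f"{code}\n\nStatic analysis reports:\n{report}\n\n"
            "Rewrite the program to fix these issues."
        )
    return code  # best effort once the repair budget is exhausted
```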
Extensive experiments with 990 real-world code completion attempts show that, compared to ChatGPT, Codexity prevents the generation of 60% of the vulnerabilities, providing avenues for future research.
Conclusion
The use of Large Language Models has seen significant advances in multiple directions, especially in assisting code implementation. However, code generated by LLMs can still suffer from problems. In this post, we discussed the latest efforts to improve the quality of code generated by LLMs. Through continuous investigation and refinement, we believe that Large Language Models can open up exciting opportunities for code generation.
References
- Zhu-Tian, C., Xiong, Z., Yao, X., & Glassman, E. (2024). Sketch Then Generate: Providing Incremental User Feedback and Guiding LLM Code Generation through Language-Oriented Code Sketches. https://arxiv.org/abs/2405.03998
- Kim, S. Y., Fan, Z., Noller, Y., & Roychoudhury, A. (2024). Codexity: Secure AI-assisted Code Generation. https://arxiv.org/abs/2405.03927