PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback
"PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback" by Shen et al. introduces a novel framework called RRTF (Rank Responses to align Test & Teacher Feedback) that can effectively and efficiently boost pre-trained large language models for code generation. The authors present PanGu-Coder2, which achieves a 62.20% pass on the OpenAI HumanEval benchmark. Furthermore, through an extensive evaluation on CoderEval and LeetCode benchmarks, they show that PanGu-Coder2 consistently outperforms all previous Code LLMs.
The paper situates this work in a fast-moving field: large language models for code (Code LLMs) are flourishing, with new and powerful models released on a weekly basis, and various approaches have been proposed to improve their code generation performance, such as supervised fine-tuning, instruction tuning, and reinforcement learning. RRTF takes a different route: sample multiple candidate responses per prompt, rank them using feedback from test execution and a teacher model, and train the model to prefer the higher-ranked candidates.
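To make the ranking-feedback idea concrete, below is a minimal sketch of an RRHF-style objective of the kind RRTF builds on. The paper's exact loss, scoring, and training pipeline differ, and all names here are illustrative:

```python
import torch
import torch.nn.functional as F

def ranking_feedback_loss(logprobs, rewards):
    """Sketch of an RRHF-style rank loss in the spirit of RRTF.

    logprobs: (n,) mean per-token log-likelihood the model assigns to
              each of n sampled responses for one prompt.
    rewards:  (n,) scores derived from test execution and/or a teacher.
    """
    n = logprobs.shape[0]
    rank_loss = logprobs.new_zeros(())
    for i in range(n):
        for j in range(n):
            if rewards[i] > rewards[j]:
                # Penalize the model when a worse-scored response is
                # assigned a higher likelihood than a better one.
                rank_loss = rank_loss + F.relu(logprobs[j] - logprobs[i])
    # Plus a standard fine-tuning term on the best-ranked response.
    sft_loss = -logprobs[rewards.argmax()]
    return rank_loss + sft_loss
```

In this setup, the rewards would combine unit-test outcomes with rankings from a stronger teacher model, which is the "Test & Teacher Feedback" in the framework's name.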
The results demonstrate the effectiveness of RRTF: PanGu-Coder2 improves on its base model by nearly 30 percentage points and achieves new state-of-the-art performance on the HumanEval, CoderEval, and LeetCode benchmarks, surpassing all previously published Code LLMs.
In conclusion, this paper presents the RRTF framework, a novel approach to boosting pre-trained large language models for code generation, and demonstrates its effectiveness through PanGu-Coder2's results on the benchmarks above.
Check out the full paper here: https://arxiv.org/pdf/2307.14936.pdf
Scaling TransNormer to 175 Billion Parameters
"Scaling TransNormer to 175 Billion Parameters" by Qin et al. presents TransNormerLLM, the first linear attention-based Large Language Model (LLM) that outperforms conventional softmax attention-based models in terms of both accuracy and efficiency. TransNormerLLM evolves from the previous linear attention architecture TransNormer by making advanced modifications that include positional embedding, linear attention acceleration, gating mechanism, tensor normalization, inference acceleration and stabilization.
Specifically, the authors use LRPE (linearized relative positional encoding) together with an exponential decay, which avoids attention dilution while still letting the model retain global interactions between tokens. They also propose Lightning Attention, a technique that makes linear attention more than twice as fast at runtime and cuts memory usage by a factor of four. To further improve TransNormer, they add a gating mechanism to smooth training and a new tensor normalization scheme, together speeding up the model by over 20%.
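To see the mechanism behind the decay, here is a minimal NumPy sketch of causal linear attention with an exponential decay on token distance. This illustrates the general idea only, not the paper's LRPE formulation or the optimized Lightning Attention kernel; the decay rate and the omission of a feature map are simplifying assumptions:

```python
import numpy as np

def decayed_linear_attention(q, k, v, lam=0.99):
    """Causal linear attention with exponential decay:
    o_t = sum_{s <= t} lam**(t - s) * (q_t . k_s) * v_s.
    q, k: (T, d_k); v: (T, d_v); lam in (0, 1) sets how fast the
    contribution of distant tokens fades, which counteracts
    attention dilution over long contexts."""
    T = q.shape[0]
    gap = np.arange(T)[:, None] - np.arange(T)[None, :]   # t - s
    # Decay mask: lam**(t - s) on and below the diagonal, 0 above (causality).
    D = np.where(gap >= 0, lam ** gap.clip(min=0), 0.0)
    scores = (q @ k.T) * D
    return scores @ v
```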
Furthermore, the authors develop a robust inference algorithm that ensures numerical stability and a consistent inference speed regardless of sequence length, yielding superior efficiency during both training and inference. Scalability is central to the model's design, enabling seamless deployment on large-scale clusters and expansion to even larger models while maintaining strong performance.
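The constant per-token inference speed follows from rewriting the same computation as a recurrence over a fixed-size state, so each decoding step costs the same no matter how long the prefix is. A sketch under the same simplified assumptions as above:

```python
import numpy as np

def decode_step(state, q_t, k_t, v_t, lam=0.99):
    """One decoding step of decayed linear attention.
    state: (d_k, d_v) running summary of the whole prefix; updating it
    costs O(d_k * d_v) regardless of how many tokens came before."""
    state = lam * state + np.outer(k_t, v_t)
    o_t = q_t @ state
    return state, o_t

# Usage: iterate over tokens while carrying only the fixed-size state.
d, T = 4, 8
rng = np.random.default_rng(0)
state = np.zeros((d, d))
for t in range(T):
    q_t, k_t, v_t = rng.normal(size=(3, d))
    state, o_t = decode_step(state, q_t, k_t, v_t)
```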
The model design is validated through a series of comprehensive experiments on a self-collected corpus exceeding 6TB in size and containing over 2 trillion tokens. To ensure data quality and relevance, the authors implement a new self-cleaning strategy to filter the collected data. Their pre-trained models will be released to foster community advances in efficient LLMs.
In conclusion, this paper presents TransNormerLLM, a linear attention-based Large Language Model that outperforms conventional softmax attention-based models in terms of both accuracy and efficiency. The authors demonstrate the effectiveness of their approach through a series of comprehensive experiments on their self-collected corpus.
Check out the full paper here: https://arxiv.org/pdf/2307.14995.pdf
Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition
"Scaling Up and Distilling Down: Language-Guided Robot Skill Acquisition" by Ha et al. presents a framework for robot skill acquisition that efficiently scales up data generation of language-labelled robot data and effectively distills this data down into a robust multi-task language-conditioned visuo-motor policy. For scaling up data generation, the authors use a large language model (LLM) to guide high-level planning, and sampling-based robot planners (e.g. motion or grasp samplers) for generating diverse and rich manipulation trajectories. To robustify this data-collection process, the LLM also infers a code-snippet for the success condition of each task, simultaneously enabling the data-collection process to detect failure and retry as well as the automatic labeling of trajectories with success/failure.
To distill the data down, the authors extend the diffusion policy single-task behavior-cloning approach to multi-task settings with language conditioning. They also propose a new multi-task benchmark with 18 tasks across five domains that tests long-horizon behavior, common-sense reasoning, tool use, and intuitive physics. The distilled policy inherits the robust retrying behavior of its data-collection policy while improving absolute success rates by 34.8% on average across the five domains.
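Mechanically, the language conditioning amounts to feeding a text embedding of the task instruction into the denoising network alongside the observation. A minimal PyTorch sketch with assumed dimensions; the paper's actual architecture (backbone, encoders, noise schedule) differs:

```python
import torch
import torch.nn as nn

class LangConditionedDenoiser(nn.Module):
    """Toy denoiser for a language-conditioned diffusion policy: predicts
    the noise added to an action, given the diffusion timestep, an
    observation embedding, and a language embedding."""

    def __init__(self, act_dim=7, obs_dim=64, lang_dim=64, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(act_dim + obs_dim + lang_dim + 1, hidden),
            nn.Mish(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, noisy_action, t, obs_emb, lang_emb):
        t_feat = t.float().unsqueeze(-1)  # scalar timestep as a feature
        x = torch.cat([noisy_action, obs_emb, lang_emb, t_feat], dim=-1)
        return self.net(x)

# One behavior-cloning step with the denoising objective (toy data):
model = LangConditionedDenoiser()
action = torch.randn(8, 7)                        # expert actions, batch of 8
obs, lang = torch.randn(8, 64), torch.randn(8, 64)
t = torch.randint(0, 100, (8,))
noise = torch.randn_like(action)
# (A real pipeline scales `action` and `noise` by a diffusion schedule.)
pred = model(action + noise, t, obs, lang)
loss = nn.functional.mse_loss(pred, noise)
```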
In conclusion, this paper presents a framework that scales up the generation of language-labeled robot data and distills it into a robust multi-task, language-conditioned visuo-motor policy, and the authors demonstrate its effectiveness through comprehensive experiments on their proposed multi-task benchmark.
Check out the full paper here: https://arxiv.org/pdf/2307.14535.pdf