Abstract: Large language models pre-trained for code generation can produce high-quality short code but often struggle to generate coherent long code and to understand higher-level or system-level specifications. A similar issue arises in language modeling for long text generation, where one proposed solution is a latent stochastic process: a document plan is generated first, and text is then produced to be consistent with it. In this study, we investigate whether this technique can be applied to code generation to improve coherence. We base our proposed encoder and decoder on the pre-trained GPT-2-based CodeParrot model and train on the APPS dataset. Evaluating on the HumanEval benchmark, we observe that the modified Time Control model performs similarly to CodeParrot.
Abstract: Large pre-trained language models are widely used in the community. These models are usually trained on unmoderated and unfiltered data from open sources such as the Internet, so the biases visible on online platforms, which reflect those in society, are captured and learned by the models. Because these models are deployed in applications that affect millions of people, their inherent biases can harm the targeted social groups. In this work, we study the general trend in bias reduction as newer pre-trained models are released. We ask whether, as newer, faster, lighter models are developed, they are also being developed responsibly, such that their inherent social biases are reduced compared to their older counterparts. Three recent models (ELECTRA, DeBERTa, and DistilBERT) are evaluated against two bias benchmarks, StereoSet and CrowS-Pairs, and compared to a BERT baseline using the associated metrics. We find that all the models under study do exhibit biases but have generally improved compared to BERT.