Below is a comparison of phi-1's performance with other models. phi-1 achieved 50.6% accuracy on HumanEval, a dataset for evaluating programming ability, and 55.5% on MBPP. This result is ...
The team used its technique against a few different performance tests. In the HumanEval test, which consists of 164 Python programming problems the model has never seen, GPT-4 scored a record 67%, but ...
Elon Musk’s xAI company is upgrading its Grok AI chatbot. The new model outperformed OpenAI’s AI model on one key HumanEval test. Musk stated in a Friday social media post that Grok 1.5 should be ...
Code Llama 70B can generate and debug larger programming strings than Meta’s previous models. is a ...
have surpassed OpenAI's GPT-4 in coding capabilities, has announced its new flagship large language model, Phind-405B. Phind-405B scores 92% on HumanEval, matching Claude 3.5 Sonnet. We're ...
Today, Paris-based Mistral, the AI startup that raised Europe’s largest-ever seed round a year ago and has since become a rising star in the global AI domain, marked its entry into the programming and ...
The small 7B model beats Mistral 7B and Gemma 7B. The 70B beats Claude 3 Sonnet (closed source Anthropic model) and competes against Gemini Pro 1.5 (closed source model from Google). Meta will be ...
Inflection AI, the creator of the Pi AI personal assistant, announced the creation of a powerful new large language model called Inflection-2 that outperforms Google’s PaLM language model across a ...
Microsoft has unveiled a groundbreaking artificial intelligence model, ...