Humaneval - Search News

Microsoft announces ``phi-1'' that hits HumanEval 50.6% exceeding GPT-3.5 with only 1.3 billion parameters

Below is a comparison of the phi-1's performance with other models. phi-1 showed high accuracy of 50.6% in HumanEval, a dataset for evaluating programming ability, and 55.5% in MBPP. This result is ...

New Atlas

GPT-4 becomes 30% more accurate when asked to critique itself

The team used its technique against a few different performance tests. In the HumanEval test, which consists of 164 Python programming problems the model has never seen, GPT-4 scored a record 67%, but ...

Entrepreneur

Elon Musk’s Newest AI Chatbot Outperformed ChatGPT in One Key Area

Elon Musk’s xAI company is upgrading its Grok AI chatbot. The new model outperformed OpenAI’s AI model on one key HumanEval test. Musk stated in a Friday social media post that Grok 1.5 should be ...

The Verge

Meta’s free Code Llama AI programming tool closes the gap with GPT-4

Code Llama 70B can generate and debug larger programming strings than Meta’s previous models. Code Llama 70B can generate and debug larger programming strings than Meta’s previous models. is a ...

GIGAZINE

AI search engine startup Phind announces flagship model 'Phind-405B'

have surpassed OpenAI's GPT-4 in coding capabilities, has announced its new flagship large-scale language model , Phind-405B. Phind-405B scores 92% on HumanEval, matching Claude 3.5 Sonnet. We're ...

VentureBeat

Mistral announces Codestral, a code-generation LLM it says outperforms all others

Today, Paris-based Mistral, the AI startup that raised Europe’s largest-ever seed round a year ago and has since become a rising star in the global AI domain, marked its entry into the programming and ...

NextBigFuture

Meta Llama 3 70B Open Source Model Beats Claude 3 Sonnet

The small 7B model beats Mistral 7B and Gemma 7B. The 70B beats Claude 3 Sonnet (closed source Anthropic model) and competes against Gemini Pro 1.5 (closed source model from Google). Meta will be ...

Searchenginejournal.com

New AI Model Outperforms Google’s Powerful PaLM-2

Inflection AI, the creators of the PI AI Personal Assistant announced the creation of a powerful new large language model called Inflection-2 that outperforms Google’s PaLM language model across a ...

VentureBeat

Microsoft’s GRIN-MoE AI model takes on coding and math, beating competitors in key benchmarks

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More Microsoft has unveiled a groundbreaking artificial intelligence model, ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results