News
OpenAI explains persistent “hallucinations” in AI, where models produce plausible but false answers. The issue stems from ...
The Arc Prize Foundation has a new test for AGI that leading AI models from Anthropic, Google, and DeepSeek score poorly on.
Large language models don’t have a theory of mind the way humans do—but they’re getting better at tasks designed to measure it in humans.
AI models were caught lying to researchers in tests, but it's not time to worry just yet: OpenAI's o1 model, which users can access on ChatGPT Pro, showed "persistent" scheming behavior ...
Kolena, a startup building a platform to test and validate AI models, has raised $15 million in a venture funding round.
Given enough time to "think," small language models can beat LLMs at math and coding tasks by generating and verifying multiple answers.
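The generate-and-verify approach described above can be sketched as a best-of-N loop: sample several candidate answers, keep the ones a cheap checker accepts, and fall back to majority voting otherwise. This is a minimal illustration, not the papers' actual method; `sample_answers` is a hypothetical stand-in for a small model's sampler, with a hard-coded candidate pool for one arithmetic question.

```python
def sample_answers(question: str, n: int = 8) -> list[int]:
    """Hypothetical stand-in for a small model sampling n candidate answers.
    Hard-coded noisy pool for 'What is 17 * 24?' (correct answer: 408)."""
    pool = [398, 408, 418, 408, 400, 388, 408, 428]
    return pool[:n]

def verify(question: str, answer: int) -> bool:
    """A cheap checker; here we simply recompute the arithmetic directly."""
    return answer == 17 * 24

def best_of_n(question: str, n: int = 8) -> int:
    candidates = sample_answers(question, n)
    verified = [a for a in candidates if verify(question, a)]
    if verified:
        # Return the first candidate that passes verification
        return verified[0]
    # No candidate verified: fall back to a majority vote over candidates
    return max(set(candidates), key=candidates.count)

print(best_of_n("What is 17 * 24?"))  # → 408
```

Spending more samples (larger N) trades extra inference time for a higher chance that at least one candidate passes the verifier, which is the core idea behind test-time compute scaling.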
To see just how much the new model changed things, we decided to put both GPT-5 and GPT-4o through our own gauntlet of test prompts.
Anthropic research reveals AI models perform worse with extended reasoning time, challenging industry assumptions about test-time compute scaling in enterprise deployments.