The hottest large language models all love to "talk nonsense". Which one has the worst "hallucination" problem?
Source: Wall Street News
Author: Du Yu
Arthur AI, a New York-based artificial intelligence startup and machine learning monitoring platform, released its latest research report on Thursday, August 17, comparing how prone to "hallucination" (that is, talking nonsense) the large language models (LLMs) are from Microsoft-backed OpenAI, metaverse-focused Meta, Google-backed Anthropic, and Nvidia-backed generative AI unicorn Cohere.
Arthur AI regularly updates this research program, dubbed the "Generative AI Test Evaluation," to rank the strengths and weaknesses of industry leaders and other open-source LLMs.
In the "AI Model Hallucination Test," the researchers examined the answers given by different LLM models with questions in categories as diverse as combinatorics, U.S. presidents, and Moroccan political leaders. Multiple steps of reasoning about the information are required."
The study found that, overall, OpenAI's GPT-4 performed best of all the models tested and "hallucinated" less than its predecessor GPT-3.5; on the math question category, for example, its hallucinations were 33% to 50% lower.
Meanwhile, Meta's Llama-2 landed in the middle of the five models tested, and Anthropic's Claude-2 ranked second, behind only GPT-4. Cohere's LLM was the most prone to "talking nonsense," and it "very confidently gave wrong answers."
The researchers also tested the extent to which the AI models would "hedge" their answers with irrelevant warning phrases to avoid risk, a common example being "As an AI model, I cannot provide an opinion."
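Detecting such hedging can be approximated with simple phrase matching. The sketch below is purely illustrative; the phrase list and sample responses are assumptions, not the phrases or code used in the report:

```python
# Hypothetical hedge-phrase detector (illustrative only).
HEDGE_PHRASES = [
    "as an ai model",
    "i cannot provide an opinion",
    "i'm not able to",
]

def hedging_rate(responses: list[str]) -> float:
    """Fraction of responses containing at least one hedging phrase."""
    hedged = sum(
        any(phrase in r.lower() for phrase in HEDGE_PHRASES)
        for r in responses
    )
    return hedged / len(responses) if responses else 0.0

responses = [
    "As an AI model, I cannot provide an opinion on that.",
    "The answer is 42.",
]
print(hedging_rate(responses))  # 0.5
```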
GPT-4 hedged 50% more, in relative terms, than GPT-3.5, which the report says "quantifies the more frustrating experience users have cited with GPT-4." Cohere's AI model, for its part, did no hedging at all on the three question categories above.
By contrast, Anthropic's Claude-2 was the most reliable in terms of "self-awareness," that is, accurately gauging what it does and does not know and answering only questions supported by its training data.
On the same day the report was published, Arthur also launched Arthur Bench, an open-source AI model evaluation tool for assessing and comparing the performance and accuracy of various LLMs. Enterprises can add customized criteria to suit their own business needs; the goal is to help businesses make informed decisions when adopting AI.
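Arthur Bench's actual API is not described in the article; the following is a hypothetical sketch of the general pattern such a tool follows, with a pluggable scorer standing in for the "customized criteria" mentioned above:

```python
# Hypothetical Bench-style comparison harness (not Arthur Bench's real API).
from typing import Callable

Scorer = Callable[[str, str], float]  # (candidate, reference) -> score in [0, 1]

def exact_match(candidate: str, reference: str) -> float:
    """A simple built-in criterion; a custom scorer could be swapped in here."""
    return float(candidate.strip().lower() == reference.strip().lower())

def run_suite(
    cases: list[tuple[str, str]],         # (prompt, reference) pairs
    model_outputs: dict[str, list[str]],  # model name -> one output per prompt
    scorer: Scorer = exact_match,
) -> dict[str, float]:
    """Average the scorer over all cases for each model."""
    return {
        name: sum(scorer(out, ref) for out, (_, ref) in zip(outs, cases)) / len(cases)
        for name, outs in model_outputs.items()
    }

cases = [("Who wrote Hamlet?", "William Shakespeare")]
outputs = {"model_a": ["William Shakespeare"], "model_b": ["Francis Bacon"]}
print(run_suite(cases, outputs))  # {'model_a': 1.0, 'model_b': 0.0}
```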
"AI hallucinations" (hallucinations) refer to chatbots completely fabricating information and appearing to spout facts in response to user prompt questions.
In a February promotional video for its generative AI chatbot Bard, Google made untrue statements about the James Webb Space Telescope. In June, ChatGPT cited "bogus" cases in a filing to a New York federal court, and the lawyers involved in the filing could face sanctions.
In early June, OpenAI researchers reported that they had found a potential solution to "AI hallucination": training AI models to reward themselves for each correct step in deducing an answer, rather than only rewarding the correct final conclusion. This "process supervision" strategy would encourage AI models to reason in a more human-like, step-by-step way.
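The contrast between the two reward schemes fits in a few lines. In this hypothetical sketch, each reasoning step carries a correctness label (which, in practice, would come from a trained reward model); outcome supervision scores only the final conclusion, while process supervision credits every correct intermediate step:

```python
# Hypothetical sketch contrasting outcome vs. process supervision.
# Each step is (text, is_correct); labels would come from a reward model.
steps = [
    ("48 = 16 * 3", True),
    ("so x = 3", True),
    ("therefore x + 1 = 5", False),  # arithmetic slip mid-chain
    ("the answer is 5", False),
]

def outcome_reward(steps) -> float:
    """Reward only the final conclusion."""
    return 1.0 if steps[-1][1] else 0.0

def process_reward(steps) -> float:
    """Reward each correct intermediate step, encouraging sound reasoning."""
    return sum(ok for _, ok in steps) / len(steps)

print(outcome_reward(steps))  # 0.0 -- no signal about which step went wrong
print(process_reward(steps))  # 0.5 -- credits the two correct early steps
```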
OpenAI itself acknowledged the hallucination problem in that report.
Investment tycoon George Soros also published a column in June arguing that artificial intelligence could do the most to aggravate the polycrisis the world currently faces, one reason being the serious consequences of AI hallucinations.
Earlier, Geoffrey Hinton, regarded as a "godfather of artificial intelligence," who has since left Google, repeatedly and publicly warned of the risks AI poses, up to and including the destruction of human civilization, and predicted that "artificial intelligence needs only 5 to 20 years to surpass human intelligence."