
Sycophantic AI decreases prosocial intentions and promotes dependence
Science, Volume 391, Issue 6792, 26 March 2026

[Structured Abstract]

Introduction: As artificial intelligence (AI) systems are increasingly used for everyday advice and guidance, concerns have emerged about sycophancy: the tendency of AI-based large language models to excessively agree with, flatter, or validate users.
Although prior work has shown that sycophancy carries risks for groups who are already vulnerable to manipulation or delusion, sycophancy’s effects on the general population’s judgments and behaviors remain unknown. Here, we show that sycophancy is widespread in leading AI systems and has harmful effects on users’ social judgments.

Rationale: High-profile incidents have linked sycophancy to psychological harms such as delusions, self-harm, and suicide. Beyond these cases, research in social and moral psychology suggests that unwarranted affirmation can produce subtler but still consequential effects: reinforcing maladaptive beliefs, reducing responsibility-taking, and discouraging behavioral repair after wrongdoing. We hypothesized that AI models excessively affirm users even when socially or morally inappropriate and that such responses negatively influence users’ beliefs and intentions. To test this, we conducted two complementary studies. First, we measured the prevalence of sycophancy across 11 leading AI models using three datasets spanning a variety of use contexts, including everyday advice queries, moral transgressions, and explicitly harmful scenarios. Second, we conducted three preregistered experiments with 2405 participants to understand how sycophancy influences users’ judgments, behavioral intentions, and perceptions of AI. Participants interacted with AI systems in vignette-based settings and a live-chat interaction where they discussed a real past conflict from their lives. We also tested whether effects varied by response style or perceived response source (AI versus human).

Results: We find that sycophancy is both prevalent and harmful. Across 11 AI models, AI affirmed users’ actions 49% more often than humans on average, including in cases involving deception, illegality, or other harms. On posts from r/AmITheAsshole, AI systems affirmed users in 51% of cases where the human consensus did not (0%).
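The prevalence figures above reduce to a simple relative-rate comparison: the fraction of responses that affirm the user, and the AI rate's relative increase over the human baseline (reported as 49%). A minimal sketch of that arithmetic, using made-up toy labels rather than the study's data or code:

```python
# Hypothetical sketch (not the authors' pipeline): computing an affirmation
# rate from binary affirm/no-affirm labels and the relative increase of the
# AI rate over a human baseline.

def affirmation_rate(labels):
    """Fraction of responses labeled as affirming the user (1) vs. not (0)."""
    return sum(labels) / len(labels)

def relative_increase(ai_labels, human_labels):
    """Relative increase of the AI affirmation rate over the human rate."""
    ai = affirmation_rate(ai_labels)
    human = affirmation_rate(human_labels)
    return (ai - human) / human

# Toy data: 1 = response affirms the user's action, 0 = it does not.
ai_labels = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]      # 70% affirm
human_labels = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0]   # 40% affirm

print(f"AI affirmation rate: {affirmation_rate(ai_labels):.0%}")
print(f"Relative increase over humans: {relative_increase(ai_labels, human_labels):.0%}")
```

With these toy labels the AI affirms 75% more often than the human baseline; the paper's 49% figure is the same kind of quantity averaged across models and datasets.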
In our human experiments, even a single interaction with sycophantic AI reduced participants’ willingness to take responsibility and repair interpersonal conflicts, while increasing their conviction that they were right. Yet despite distorting judgment, sycophantic models were trusted and preferred. All of these effects persisted when controlling for individual traits such as demographics and prior familiarity with AI; perceived response source; and response style. This creates perverse incentives for sycophancy to persist: the very feature that causes harm also drives engagement.

Conclusion: AI sycophancy is not merely a stylistic issue or a niche risk, but a prevalent behavior with broad downstream consequences. Although affirmation may feel supportive, sycophancy can undermine users’ capacity for self-correction and responsible decision-making. Yet because it is preferred by users and drives engagement, there has been little incentive for sycophancy to diminish. Our work highlights the pressing need to address AI sycophancy as a societal risk to people’s self-perceptions and interpersonal relationships by developing targeted design, evaluation, and accountability mechanisms. Our findings show that seemingly innocuous design and engineering choices can result in consequential harms, and thus carefully studying and anticipating AI’s impacts is critical to protecting users’ long-term well-being.

Original article: Myra Cheng, Cinoo Lee, Pranav Khadpe, Sunny Yu, et al. (2026). Sycophantic AI decreases prosocial intentions and promotes dependence. Science, Volume 391, Issue 6792, 26 March 2026. https://doi.org/10.1126/science.aec8352

(Translation and responsible editing: MARY) (For the English original, contact WeChat: millerdeng95 or iacmsp)