GPT-4.1淘汰了4.5！全系列百万上下文，主打一个性价比 #科技 #The #cards #系列 #Mini #方面

鱼羊发自凹非寺

量子位 | 公众号 QbitAI

4.1与4.5孰大？OpenAI刚刚给出答案：

发布GPT-4.1，比GPT-4.5强的那种。

今日霍州(www.jrhz.info)©️

新模型系列更新，一共带来三个版本：GPT-4.1，GPT-4.1 mini、GPT-4.1 nano——

与通常中杯大杯超大杯的设置不同，这回翻译过来，是中杯、小杯、超小杯。

OpenAI表示，4.1系列是API专供，不过列位非开发者先别急哈，人家也补充了，在ChatGPT里，4.1的能力将主要通过“融入最新版本的GPT-4o”体现。

今日霍州(www.jrhz.info)©️

能力方面，总结起来4.1系列纸面上最突出的优势有两点：

jrhz.info

长上下文，3个型号均拥有100万token上下文窗口；

性价比，用内部老哥的说法就是：

现在你可以用4%的价格，畅享GPT-4o模型品质。

今日霍州(www.jrhz.info)©️

OpenAI还表示，GPT-4.1系列会在API里取代GPT-4.5 Preview，后者将于今年（2025年）7月14日下架。

GPT-4.1：主打性价比

展开来看，OpenAI整体上是把GPT-4.1和GPT-4o拿来对比的。

今日霍州(www.jrhz.info)©️

以延迟为横轴，以智能为纵轴，可以看到，GPT-4.1比GPT-4o强了一丢丢，而4.1 mini则超出了4o mini一大截。

定量比较的结果是，编码方面，GPT-4.1在衡量真实世界软件工程技能的SWE-bench Verified上得分为54.6%，比GPT-4o的分数提高了21.4%，比GPT-4.5强了26.6%。

今日霍州(www.jrhz.info)©️

指令遵循方面，在MultiChallenge基准中，GPT-4.1得分38.3%，而GPT-4o的得分是27.8%。

今日霍州(www.jrhz.info)©️

长上下文方面，在多模态长下文理解基准Video-MME上，GPT-4.1刷新SOTA，在长篇无字幕类别中得分72.0%，比GPT-4o高了6.7%。

今日霍州(www.jrhz.info)©️

值得注意的是，GPT-4.1 mini在多项基准测试中超过了GPT-4o。

比如在智能评估基准MMLU上，GPT-4.1 mini的得分为87.5%，超过了GPT-4o的85.7%，同时延迟降低一半，成本降低83%。

今日霍州(www.jrhz.info)©️

GPT-4.1 nano则被定位为OpenAI“目前速度最快、成本最低”的模型。并且在部分测试中有超出GPT-4o mini的表现。

编码能力

OpenAI着重强调了GPT-4.1的编码能力。除了在各种编程任务上都超过GPT-4o，OpenAI还演示了其在前端编程方面的实际优势：

能够创建功能更强大、更美观的Web应用。

人类评分的结果显示，在80%的对比测试中，GPT-4.1的网站都比GPT-4o的网站更受欢迎。

能够创建功能更强大、更美观的Web应用。

人类评分的结果显示，在80%的对比测试中，GPT-4.1的网站都比GPT-4o的网站更受欢迎。

比如给出同一段提示词：

Prompt: Make a flashcard web application. The user should be able to create flashcards, search through their existing flashcards, review flashcards, and see statistics on flashcards reviewed. Preload ten cards containing a Hindi word or phrase and its English translation. Review interface: In the review interface, clicking or pressing Space should flip the card with a smooth 3-D animation to reveal the translation. Pressing the arrow keys should navigate through cards. Search interface: The search bar should dynamically provide a list of results as the user types in a query. Statistics interface: The stats page should show a graph of the number of cards the user has reviewed, and the percentage they have gotten correct. Create cards interface: The create cards page should allow the user to specify the front and back of a flashcard and add to the user’s collection. Each of these interfaces should be accessible in the sidebar. Generate a single page React app (put all styles inline).

GPT-4o生成的网站长这样：