Москвичей предупредили о резком похолодании09:45
美國到底在幹什麼?伊朗戰爭正在動搖中國的雄心
海南春节文旅热度飙升,星巴克区域门店实现高增长,更多细节参见新收录的资料
### The "Serves As" Dodge。业内人士推荐新收录的资料作为进阶阅读
Overall, the answers we received this year pretty closely match the results of last year, differences are often under a single percentage point. The number of respondents decreases slightly year over year. In 2025, we published multiple surveys (such as the Compiler Performance or Variadic Generics survey), which might have also contributed to less people answering this (longer) survey. We plan to discuss how (and whether) to combine the State of Rust survey with the ongoing work on the Rust Vision Doc.
BenchmarkSarvam-105BGLM-4.5-Air (106B)GPT-OSS-120BQwen3-Next-80B-A3B-ThinkingGENERALMath50098.697.297.098.2Live Code Bench v671.759.572.368.7MMLU90.687.390.090.0MMLU Pro81.781.480.882.7Arena Hard v271.068.188.568.2IF Eval84.883.585.488.9REASONINGGPQA Diamond78.775.080.177.2AIME 25 (w/ tools)88.3 (96.7)83.390.087.8HMMT (Feb 25)85.869.290.073.9HMMT (Nov 25)85.875.090.080.0Beyond AIME69.161.551.068.0AGENTICBrowseComp49.521.3-38.0SWE Bench Verified (SWE-Agent Harness)45.057.650.634.46Tau2 (avg.)68.353.265.855.0,这一点在新收录的资料中也有详细论述