{"id":904,"date":"2025-08-05T11:59:24","date_gmt":"2025-08-05T03:59:24","guid":{"rendered":"https:\/\/www.agidt.com\/?p=904"},"modified":"2025-08-05T11:59:24","modified_gmt":"2025-08-05T03:59:24","slug":"%e5%bc%80%e6%ba%90-top-%e9%a1%b9%e7%9b%ae_deepeval","status":"publish","type":"post","link":"https:\/\/www.agidt.com\/?p=904","title":{"rendered":"Top Open-Source Projects_DeepEval"},"content":{"rendered":"\n<p><strong>DeepEval:<\/strong> a framework built specifically for testing and evaluating LLM performance. It incorporates recent research and measures output quality with a range of metrics, such as G-Eval, RAGAS, hallucination, and answer relevancy. Evaluations can run locally on your own machine, and DeepEval works with a variety of models and tools, such as LangChain and LlamaIndex.<\/p>\n\n\n\n<p><strong>What can it do?<\/strong> DeepEval acts like a \u201chealth-check doctor\u201d for LLMs: it examines the health of your chatbot, RAG pipeline, or other AI application. Whether you want to optimize a model, tune your prompts, or make sure your application doesn't \u201cgo off the rails\u201d, DeepEval can help. It supports the following features:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Comprehensive evaluation: test an entire LLM 
application end to end, or check individual components such as the retriever or tool calls<\/li>\n\n\n\n<li>Rich metrics: a wide range of ready-made evaluation metrics, such as answer accuracy, contextual relevancy, toxicity, and bias. You can also define custom metrics<\/li>\n\n\n\n<li>Dataset generation: generate synthetic data to test your model<\/li>\n\n\n\n<li>Safety testing: use red-teaming to check whether your LLM is vulnerable to attacks such as prompt injection<\/li>\n\n\n\n<li>Benchmarking: easily measure your model's performance on well-known datasets such as MMLU and HellaSwag<\/li>\n\n\n\n<li>Cloud support: through the Confident AI platform, you can store test data, generate reports, compare performance across versions, and even monitor LLM performance in production<\/li>\n<\/ul>\n\n\n\n<p><strong>Why does it matter?<\/strong> DeepEval fills a key gap in LLM development: <strong>how to systematically test and improve model outputs<\/strong>. It helps developers debug RAG pipelines and chatbots, and it also helps teams continuously monitor and improve LLM 
performance in production. Its modular design and seamless integration with CI\/CD tools make testing feel as natural as writing code.<\/p>\n\n\n\n<p><strong>Who is it for?<\/strong> DeepEval is developed by the Confident AI team and is aimed at LLM developers at companies of every size, from startups to large enterprises. Whether you use LangChain or LlamaIndex, or want to switch from OpenAI to a self-hosted model (such as DeepSeek R1), DeepEval can help you verify model quality.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" src=\"https:\/\/pbs.twimg.com\/media\/GxhBizwasAALy3t?format=jpg&amp;name=medium\" alt=\"\"\/><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>DeepEval: a framework built specifically for testing and evaluating LLM performance. It incorporates recent research and measures output quality with a range of metrics, such as G-Eval, RAGAS, hallucination, and answer relevancy. Evaluations can run locally on your own machine, and DeepEval works with a variety of models and tools, such as LangChain and 
Lla&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[22],"tags":[],"topic":[],"class_list":["post-904","post","type-post","status-publish","format-standard","hentry","category-ai-essentials"],"_links":{"self":[{"href":"https:\/\/www.agidt.com\/index.php?rest_route=\/wp\/v2\/posts\/904","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.agidt.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.agidt.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.agidt.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.agidt.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=904"}],"version-history":[{"count":1,"href":"https:\/\/www.agidt.com\/index.php?rest_route=\/wp\/v2\/posts\/904\/revisions"}],"predecessor-version":[{"id":910,"href":"https:\/\/www.agidt.com\/index.php?rest_route=\/wp\/v2\/posts\/904\/revisions\/910"}],"wp:attachment":[{"href":"https:\/\/www.agidt.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=904"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.agidt.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=904"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.agidt.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=904"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/www.agidt.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftopic&post=904"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}