Industrial-level evaluation benchmarks for coding LLMs across the full life-cycle of AI-native software development. Enterprise-grade evaluation suite for code LLMs, with benchmarks released on an ongoing basis.
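Benchmarks of this kind usually report functional correctness as pass@k. A minimal sketch of the standard unbiased pass@k estimator (from the HumanEval paper), not taken from this particular suite; the sample counts are illustrative.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: probability that at least one of k samples,
    drawn from n generations of which c passed the tests, is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 generations per task, 37 passed the unit tests, report pass@10.
print(round(pass_at_k(n=200, c=37, k=10), 4))
```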
Pip-compatible CodeBLEU metric implementation available for Linux, macOS, and Windows.
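A short usage sketch of scoring a model's output against a reference with such a package, assuming the PyPI codebleu package and its calc_codebleu entry point; treat the exact signature and return keys as assumptions.

```python
from codebleu import calc_codebleu

reference = "def total(first, second):\n    return first + second"
prediction = "def add(a, b):\n    return a + b"

# Scores combine n-gram, weighted n-gram, AST, and data-flow matches.
result = calc_codebleu(
    [reference], [prediction], lang="python",
    weights=(0.25, 0.25, 0.25, 0.25),
)
print(result["codebleu"])
```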
Backend for automated evaluation of programming tasks in higher education
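One way such a grading backend can check a submission, sketched with the standard library only; the script path, input, and expected output below are hypothetical.

```python
import subprocess

def grade_submission(path, stdin_data, expected, timeout_s=5.0):
    """Run a student's Python script in a subprocess and compare stdout to the expected answer."""
    try:
        proc = subprocess.run(
            ["python3", path],
            input=stdin_data,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "score": 0}
    if proc.returncode != 0:
        return {"status": "runtime_error", "score": 0, "stderr": proc.stderr}
    return {"status": "ok", "score": int(proc.stdout.strip() == expected.strip())}

# Hypothetical test case: the script should print the sum of two integers.
print(grade_submission("submission.py", "2 3\n", "5"))
```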
The SF Code Evaluator
An open-source Python library for code encryption, decryption, and safe evaluation built on Python's built-in ast module, with allowlisted functions and variables, controlled built-in imports, execution timeouts, and blocked attribute access.
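A minimal sketch of the AST-allowlisting approach that description refers to: parse the expression, reject any node type outside an allowlist (including attribute access), and evaluate with a restricted namespace. The node and function allowlists here are illustrative, not the library's actual API.

```python
import ast

ALLOWED_NODES = (
    ast.Expression, ast.Constant, ast.Name, ast.Load,
    ast.BinOp, ast.UnaryOp, ast.BoolOp, ast.Compare, ast.Call,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.Mod,
    ast.USub, ast.And, ast.Or, ast.Not,
    ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
)
ALLOWED_FUNCS = {"abs": abs, "min": min, "max": max, "round": round, "len": len}

def safe_eval(expr, variables=None):
    """Evaluate a simple expression after checking every AST node against an allowlist."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute):
            raise ValueError("attribute access is blocked")
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Call) and not (
            isinstance(node.func, ast.Name) and node.func.id in ALLOWED_FUNCS
        ):
            raise ValueError("only allowlisted functions may be called")
    # Empty __builtins__ prevents access to the real built-in namespace.
    namespace = {"__builtins__": {}, **ALLOWED_FUNCS, **(variables or {})}
    return eval(compile(tree, "<safe_eval>", "eval"), namespace)

print(safe_eval("max(x, 2) ** 2 + 1", {"x": 3}))  # 10
```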
Python library to interact synchronously and asynchronously with tio.run
Python toolkit for automated evaluation and benchmarking of code efficiency, performance, and resource usage. Easily analyze, compare, and score scripts or code snippets in a fast, modular CLI workflow.
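A rough sketch of the kind of measurement such a toolkit automates, using only timeit and tracemalloc from the standard library; the compared snippets and the reported fields are illustrative.

```python
import timeit
import tracemalloc

def profile_snippet(stmt, setup="pass", repeat=5, number=1000):
    """Measure best wall-clock time per run and peak allocated memory for a code snippet."""
    best = min(timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)) / number
    tracemalloc.start()
    exec(compile(setup + "\n" + stmt, "<snippet>", "exec"), {})
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"seconds_per_run": best, "peak_bytes": peak}

# Compare two ways of building a list of squares.
snippets = {
    "loop": "r = []\nfor i in range(1000): r.append(i * i)",
    "comprehension": "r = [i * i for i in range(1000)]",
}
for label, code in snippets.items():
    print(label, profile_snippet(code))
```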