Description:
Contribute to cutting-edge AI evaluation systems by designing real-world benchmark tasks that challenge advanced AI agents. If you're passionate about debugging, system analysis, infrastructure, data pipelines, or AI evaluation, this opportunity is for you.
Position Details
🔹 Role: Terminal Bench Expert
🔹 Employment Type: Contractor Assignment
🔹 Duration: 5 Weeks
🔹 Location: Remote (India, Bangladesh, Brazil, Colombia, Egypt, Ghana, Indonesia, Kenya, Nigeria, Pakistan, Turkey, Vietnam)
🔹 Experience: 3–10 Years
🔹 Start Date: Immediate
🔹 Commitment: Full-time (40 hrs/week)
What You'll Do
✔ Design and develop realistic AI benchmark tasks
✔ Create debugging, investigation, and system failure scenarios
✔ Define evaluation criteria and validation logic
✔ Document solutions and technical workflows
✔ Collaborate with reviewers to improve task quality and difficulty
Ideal Candidate
✅ Strong software engineering, debugging, and analytical skills
✅ Experience with production systems, pipelines, infrastructure, or large-scale workflows
✅ Exposure to AI/LLMs, evaluation frameworks, MLOps, DevOps, Cloud, Data Engineering, Security, Distributed Systems, or related domains
✅ Excellent technical writing and documentation skills
| Organization | Highbrow Technology Inc |
| Industry | Other Jobs |
| Occupational Category | AI Evaluation and Debugging |
| Job Location | Islamabad, Pakistan |
| Shift Type | Morning |
| Job Type | Full Time |
| Gender | No Preference |
| Career Level | Intermediate |
| Experience | 2 Years |
| Posted at | 2026-05-30 10:23 pm |
| Expires on | 2026-07-14 |