AI Evaluation & Debugging

 

Description:

Contribute to cutting-edge AI evaluation systems by designing real-world benchmark tasks that challenge advanced AI agents. If you're passionate about debugging, system analysis, infrastructure, data pipelines, or AI evaluation, this opportunity is for you.

Position Details

🔹 Role: Terminal Bench Expert

🔹 Employment Type: Contractor Assignment

🔹 Duration: 5 Weeks

🔹 Location: Remote (India, Bangladesh, Brazil, Colombia, Egypt, Ghana, Indonesia, Kenya, Nigeria, Pakistan, Turkey, Vietnam)

🔹 Experience: 3–10 Years

🔹 Start Date: Immediate

🔹 Commitment: Full-time (40 hrs/week)

What You'll Do

✔ Design and develop realistic AI benchmark tasks

✔ Create debugging, investigation, and system failure scenarios

✔ Define evaluation criteria and validation logic

✔ Document solutions and technical workflows

✔ Collaborate with reviewers to improve task quality and difficulty

Ideal Candidate

✅ Strong software engineering, debugging, and analytical skills

✅ Experience with production systems, pipelines, infrastructure, or large-scale workflows

✅ Exposure to AI/LLMs, evaluation frameworks, MLOps, DevOps, Cloud, Data Engineering, Security, Distributed Systems, or related domains

✅ Excellent technical writing and documentation skills

Organization: Highbrow Technology Inc
Industry: Other Jobs
Occupational Category: AI Evaluation and Debugging
Job Location: Islamabad, Pakistan
Shift Type: Morning
Job Type: Full Time
Gender: No Preference
Career Level: Intermediate
Experience: 2 Years
Posted at: 2026-05-30 10:23 pm
Expires on: 2026-07-14