AI Evaluation & Debugging

Description:

Contribute to cutting-edge AI evaluation systems by designing real-world benchmark tasks that challenge advanced AI agents. If you're passionate about debugging, system analysis, infrastructure, data pipelines, or AI evaluation, this opportunity is for you.

Position Details

🔹 Role: Terminal Bench Expert

🔹 Employment Type: Contractor Assignment

🔹 Duration: 5 Weeks

🔹 Location: Remote (India, Bangladesh, Brazil, Colombia, Egypt, Ghana, Indonesia, Kenya, Nigeria, Pakistan, Turkey, Vietnam)

🔹 Experience: 3–10 Years

🔹 Start Date: Immediate

🔹 Commitment: Full-time (40 hrs/week)

What You'll Do

✔ Design and develop realistic AI benchmark tasks

✔ Create debugging, investigation, and system failure scenarios

✔ Define evaluation criteria and validation logic

✔ Document solutions and technical workflows

✔ Collaborate with reviewers to improve task quality and difficulty

Ideal Candidate

✅ Strong software engineering, debugging, and analytical skills

✅ Experience with production systems, pipelines, infrastructure, or large-scale workflows

✅ Exposure to AI/LLMs, evaluation frameworks, MLOps, DevOps, Cloud, Data Engineering, Security, Distributed Systems, or related domains

✅ Excellent technical writing and documentation skills

Organization	Highbrow Technology Inc
Industry	Other Jobs
Occupational Category	AI Evaluation AND Debugging
Job Location	Islamabad, Pakistan
Shift Type	Morning
Job Type	Full Time
Gender	No Preference
Career Level	Intermediate
Experience	2 Years
Posted at	2026-05-30 10:23 pm
Expires on	2026-07-14