AI Data Labeler | Technical Content & Evaluation Specialist

Description:

Candidates must NOT use AI- or LLM-generated prompts. We review all responses strictly, and AI-generated answers will be rejected.

Skill

2+ years of proven experience in technical writing, content creation, curriculum design, or AI data labeling/review.

Availability

8 hours per day, including 4 hours of overlap with PST working hours.

Role Overview:

This role is central to advancing AI agent capabilities beyond current performance benchmarks.

Analyze example questions and guidelines to determine the core skill being tested (e.g., complex reasoning, multi-source synthesis, nuance detection).
Create entirely new questions on the same topic and of similar complexity to the example, ensuring each new challenge requires deep resourcefulness rather than simple recall or pattern matching.
Develop an accurate and comprehensive "Ground Truth" answer for each newly created question. This answer must serve as the gold standard for AI performance.
Design a detailed Checklist to evaluate the quality of an answer. This checklist must be precise, quantifiable, and outline all necessary components for a "successful" response, including criteria for accuracy, completeness, logical flow, and resource citation/synthesis.
Obtain and document the responses to the newly created question from leading large language models (e.g., ChatGPT 5 and Claude Sonnet 4.5).

Requirements:

Proven experience in technical writing, content creation, curriculum design, or AI data labeling/review.
Exceptional analytical and critical thinking skills with the ability to deconstruct complex problems into core logical components.
Mastery of synthesis: demonstrated ability to accurately and concisely combine information from multiple, potentially conflicting, sources.
Meticulous attention to detail, both in generating high-quality questions and in producing error-free, comprehensive Ground Truth answers.
Deep understanding of Large Language Models (LLMs) and their common failure modes (e.g., hallucination, superficial answers, lack of depth).
Ability to strictly adhere to complex guidelines and quality control standards.