Description:
We are looking for a skilled AI Data Scientist & Annotation Expert to drive the full lifecycle of AI/ML solutions, from data collection and annotation through model development, training, and deployment. The role combines hands-on annotation expertise (critical for building high-quality training datasets) with data science and machine learning skills to deliver robust, production-ready AI systems.
You will play a central role in curating datasets, experimenting with algorithms, and improving model performance through both advanced data science techniques and precise annotation workflows.
Key Responsibilities:
Data Science & Machine Learning
- Design, build, and evaluate machine learning and deep learning models for NLP, computer vision, or multimodal tasks.
- Perform feature engineering, exploratory data analysis (EDA), and statistical modeling.
- Experiment with different model architectures (CNNs, RNNs, Transformers, LLMs) and optimize hyperparameters for maximum performance.
- Develop scalable ML pipelines and deploy models to production (e.g., with FastAPI, TensorFlow Serving, TorchServe, or containerized environments).
- Collaborate with engineers and product teams to integrate models into end-user applications.
Data Annotation & Curation
- Design annotation workflows and guidelines for various data types (images, PDFs, audio, video, and text).
- Perform and manage manual and semi-automated annotation tasks (e.g., bounding boxes, polygons, segmentation masks, classification labels, and text tagging).
- Implement quality control measures to ensure annotations meet accuracy and consistency standards.
- Train and supervise annotation teams, providing feedback and ensuring adherence to project requirements.
- Work with annotation tools (e.g., Label Studio, CVAT, Supervisely, Prodigy, Labelbox).
- Assist in building gold-standard datasets that serve as the foundation for AI/ML training.
MLOps & DataOps
- Set up pipelines for dataset versioning, experiment tracking, and model reproducibility (using MLflow, DVC, or similar tools).
- Monitor data drift, class imbalance, and annotation consistency to reduce bias in models.
- Collaborate with DevOps teams for deployment, scaling, and monitoring of AI models.