Description:
We’re hiring an AI/ML Engineer with expertise in data processing, pipeline automation, and cloud-based AI/ML implementations. You’ll design robust ETL workflows, optimize data for LLMs and RAG systems, and deploy scalable solutions on Google Cloud. If you’re passionate about turning messy data into actionable insights and automating workflows end-to-end, this role is for you.
Key Responsibilities:
- Data Preparation & Sanitization: Clean, preprocess, and analyze structured/unstructured data for AI/ML models (e.g., NLP, LLMs).
- Pipeline Development: Build ETL/ELT pipelines for efficient data ingestion, transformation, and storage using tools like Apache Airflow or Google Dataflow.
- RAG & LLM Integration: Implement Retrieval-Augmented Generation (RAG) systems, including embedding generation, vector database management (FAISS, Pinecone), and model fine-tuning (see the retrieval sketch after this list).
- Cloud Services: Deploy and optimize AI/ML workflows on Google Cloud Platform (GCP) (e.g., BigQuery, Vertex AI, Cloud Storage).
- Automation: Develop Python scripts to automate data tasks, model training, and pipeline monitoring.
- Collaboration: Work with data scientists, DevOps, and product teams to ensure seamless integration of data pipelines into production systems.
- Innovation: Stay updated on AI/ML trends (e.g., vectorization, distributed computing) and propose improvements.
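To give a sense of the day-to-day work, below is a minimal sketch of the retrieval step in a RAG system, assuming sentence-transformers for embeddings and an in-memory FAISS index. The model name, documents, and query are illustrative placeholders, not a prescribed stack.

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical document set; in practice this would come from a data pipeline.
documents = [
    "Quarterly revenue grew 12% year over year.",
    "The data pipeline ingests CSV exports nightly.",
    "Vertex AI hosts the fine-tuned summarization model.",
]

# Embed the documents with an open-source model (placeholder choice).
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = model.encode(documents).astype("float32")

# Build a flat L2 index and add the document vectors.
index = faiss.IndexFlatL2(doc_vectors.shape[1])
index.add(doc_vectors)

# Retrieve the top-2 documents for a query; in a full RAG system these
# would be injected as context into an LLM prompt.
query_vec = model.encode(["How often does the pipeline run?"]).astype("float32")
distances, ids = index.search(query_vec, 2)
print([documents[i] for i in ids[0]])
```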
Requirements:
- Education: Bachelor’s/Master’s in Computer Science, Data Science, or a related field.
- Experience: 3+ years in AI/ML engineering with a focus on data processing and pipelines.
- Technical Skills:
  - Data Tools: Proficiency in Pandas, NumPy, SQL, and data visualization libraries.
  - ETL/MLOps: Experience with Airflow, Luigi, or Kubeflow (see the DAG sketch after this list).
  - LLMs & Embeddings: Hands-on work with Hugging Face, LangChain, or OpenAI APIs.
  - Vector Databases: FAISS, Pinecone, Chroma, or Milvus.
  - Cloud: GCP (BigQuery, Dataflow, Vertex AI) or equivalent (AWS/Azure).
  - Automation: Advanced Python scripting and familiarity with PySpark.
- Soft Skills: Strong problem-solving, communication, and teamwork abilities.
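For reference, here is a minimal sketch of an ETL-style DAG using Airflow's TaskFlow API, assuming a recent Airflow 2.x release; the task bodies, schedule, and names are illustrative placeholders rather than a required design.

```python
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_etl():
    @task
    def extract() -> list[dict]:
        # In practice this might pull from Cloud Storage or an external API.
        return [{"id": 1, "text": " Raw Record "}]

    @task
    def transform(records: list[dict]) -> list[dict]:
        # Basic sanitization step: trim whitespace and normalize casing.
        return [{**r, "text": r["text"].strip().lower()} for r in records]

    @task
    def load(records: list[dict]) -> None:
        # A production pipeline might write to BigQuery here.
        print(f"Loaded {len(records)} records")

    load(transform(extract()))

example_etl()
```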
Nice to Have:
- GCP Professional Certification (e.g., Data Engineer, Machine Learning Engineer).
- Experience with CI/CD pipelines, Docker, or Kubernetes.
- Contributions to open-source AI/ML projects.