Software Engineer - AI/ML
Full-time Associate
Job Overview
We’re hiring a hands-on AI Learning Engineer who can build and fine-tune generative AI models (diffusion models and LLMs) and vision-language models (VLMs), build classical and deep models from scratch, and productionize them end-to-end.
This role blends modeling (you’ll train and fine-tune models) with production systems (MLOps, LLMops, model optimization, serving, and API/backends).
You will not only use pre-trained models; you will also design, train, optimize, and serve custom models for production use (GenAI, Stable Diffusion, OCR, theft detection, recommenders, etc.).
Responsibilities
- Develop production inference stacks: convert & optimize models (Torch → ONNX → TensorRT when appropriate), quantize/prune, profile FLOPs and latency, and deliver low-latency GPU inference with minimal accuracy loss.
- Create robust model serving infrastructure: FastAPI / gRPC services for inference, streaming outputs (token-level streaming for LLMs, frame/segment streaming for CV), model versioning and routing, autoscaling, model rollback and A/B testing.
- Build CV solutions from scratch: object detection, theft-detection pipelines, OCR (document parsing, structured extraction), and surveillance analytics; integrate and fine-tune Hugging Face pretrained models when beneficial.
- Fine-tune Stable Diffusion and other generative image models for brand/style-consistent image generation and downstream tasks.
- Train and fine-tune VLMs (vision-language models) for multimodal tasks (captioning, visual QA, multimodal retrieval), using both from-scratch training and transfer learning from HF checkpoints.
- Design, train & fine-tune GenAI models (LLMs) for use cases such as conversational agents, summarization, retrieval-augmented generation (RAG), and domain adaptation.
- MLOps / LLMOps / AIOps: CI/CD for training & deployment, dataset versioning, experiment tracking, model registry, monitoring (latency, throughput, model drift, data drift), alerting, and automated retraining pipelines.
- Data acquisition & pipeline work: build scrapers/collectors and scalable ingestion pipelines; implement proxy pools, rate-limit handling, and rotation for reliability (in compliance with, and with respect for, target site terms).
- Third-party model integration: call and compose third-party inference APIs (Hugging Face, OpenAI, other vendors), build fallback & hybrid inference strategies that combine local and cloud models.
Required qualifications:
- Strong experience with computer vision: object detection, segmentation, OCR pipelines (training from scratch and transfer learning).
- Deep knowledge of model optimization: quantization, pruning, distillation, FLOPs analysis, CUDA profiling, mixed precision (AMP), and inference-time trade-offs.
- Demonstrated ability to design & implement models from scratch (not only using pretrained checkpoints): architecture design, loss selection, training loops, evaluation metrics.
- Practical experience training and fine-tuning LLMs (transformers) and generative image models (Stable Diffusion or diffusion frameworks).
- Experience exporting & running models with ONNX, TensorRT, TorchScript, and familiarity with Triton, TorchServe, or ONNX Runtime for production serving.
- Hands-on experience with GPU infrastructure and CUDA (profiling with nvprof/Nsight, memory management, multi-GPU training).
- Solid backend engineering skills: Python, FastAPI (or Flask), asynchronous programming, WebSockets/SSE, REST design.
- Containerization and orchestration: Docker, Kubernetes, Helm, and experience deploying GPU workloads to AWS / GCP / Azure or on-prem.
- Good understanding of classical ML (scikit-learn): regression, classification, clustering; able to design experiments and baselines.
- Strong software engineering practices: unit tests, CI/CD, code reviews, reproducibility.
- Excellent communication skills, able to explain ML tradeoffs to product and frontend teams.
Preferred / Nice-to-have:
- Knowledge of privacy-preserving ML (DP, federated learning) or regulatory constraints for data handling.
- Experience with logging & observability: Prometheus, Grafana, Sentry, OpenTelemetry.