AI Platform Engineer
Full Time
Job Overview
About the Role
We're building an ambitious internal AI Platform to power Bright's next generation of AI-driven products and services. This Kubernetes-hosted platform provides teams across the organisation with the tools to build, deploy, and observe AI-powered applications without managing complex infrastructure themselves.
As an AI Platform Engineer, you'll join a small, high-impact team building critical platform infrastructure for LLM operations (LLMOps). Working under the supervision of two senior/principal platform engineers and reporting to the Head of AI, you'll be instrumental in delivering self-service AI capabilities that enable developers across Bright to build sophisticated AI applications with confidence.
This is an opportunity to work on cutting-edge AI infrastructure, learn from experienced platform engineers, and make a significant impact on how Bright leverages AI technology at scale.
Key Responsibilities
Our roadmap spans multiple interconnected platform epics. You'll contribute to key initiatives including:
Core Platform Services
- Observability & Experimentation: Enhancing Langfuse for LLM tracing, evaluation, and experimentation capabilities
- Developer Self-Service: Building and improving Backstage as an internal developer portal for platform discoverability
- LLM Operations: Deploying and maintaining LiteLLM proxy, Langflow runtime, and other core LLM services (a brief usage sketch follows this list)
- Monitoring & Logging: Implementing platform-wide monitoring (Prometheus/Grafana) and logging infrastructure (Loki)
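To make this concrete: applications on the platform reach models through the LiteLLM proxy's OpenAI-compatible API, and Langfuse tracing is typically wired up at the proxy rather than in every caller. A minimal sketch is below; the proxy URL, environment variable names, and model alias are illustrative assumptions, not the platform's actual values.

```python
# Minimal sketch of a platform tenant calling models through the LiteLLM proxy.
# The URL, env var names, and model alias are placeholders for illustration.
import os

from openai import OpenAI  # LiteLLM proxy exposes an OpenAI-compatible API

client = OpenAI(
    base_url=os.environ.get("LITELLM_PROXY_URL", "http://litellm.platform.svc:4000"),
    api_key=os.environ["LITELLM_API_KEY"],  # key issued by the proxy (illustrative)
)

response = client.chat.completions.create(
    model="gpt-4o",  # an alias the proxy routes to the configured provider
    messages=[{"role": "user", "content": "Summarise this payslip query."}],
)
print(response.choices[0].message.content)
```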
Security & Compliance
- LLM Ops Security: Implementing guardrails (LlamaGuard, Azure Guardrails) and security controls
- GDPR & PII Management: Building automated PII detection, minimization strategies, and compliance tooling (a toy redaction sketch follows this list)
- Incident Response: Establishing security incident response procedures for LLM operations
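As a flavour of the automated PII detection work (not the platform's actual tooling), here is a toy sketch that redacts obvious identifiers from prompts before they reach a model; a production system would use a dedicated detection service rather than hand-rolled regexes.

```python
# Toy illustration of PII redaction before text is sent to an LLM.
# Patterns are deliberately simplified and for illustration only.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "UK_NINO": re.compile(r"\b[A-Z]{2}\d{6}[A-D]\b"),        # simplified NI number shape
    "PHONE": re.compile(r"\b(?:\+44\s?|0)\d{4}\s?\d{6}\b"),  # simplified UK phone shape
}

def redact_pii(text: str) -> str:
    """Replace matched identifiers with typed placeholders, e.g. <EMAIL>."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

print(redact_pii("Contact jane.doe@example.com, NI number QQ123456C."))
# -> "Contact <EMAIL>, NI number <UK_NINO>."
```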
Infrastructure & Reliability
- Kubernetes Operations: Managing AKS clusters, implementing reliable deployment tooling via ArgoCD
- Infrastructure as Code: Productionizing infrastructure with Terraform, eliminating manual configuration
- Autoscaling & Performance: Implementing workload management and autoscaling for AI services
- Storage Solutions: Migrating from self-hosted MinIO to managed Azure Blob Storage
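The storage migration above follows a familiar pattern: MinIO is S3-compatible, so objects can be copied across with the standard S3 and Azure SDKs. A rough sketch follows, with placeholder endpoints, credentials, and bucket/container names; a real migration would stream, batch, retry, and verify checksums.

```python
# Rough sketch of copying objects from a self-hosted MinIO bucket to Azure Blob.
# All names and environment variables here are placeholders.
import os

import boto3  # MinIO speaks the S3 API
from azure.storage.blob import BlobServiceClient

minio = boto3.client(
    "s3",
    endpoint_url=os.environ["MINIO_ENDPOINT"],
    aws_access_key_id=os.environ["MINIO_ACCESS_KEY"],
    aws_secret_access_key=os.environ["MINIO_SECRET_KEY"],
)
blob_service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONN"])
container = blob_service.get_container_client("platform-artifacts")

paginator = minio.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="platform-artifacts"):
    for obj in page.get("Contents", []):
        # Read into memory for simplicity; a real migration would stream large objects.
        data = minio.get_object(Bucket="platform-artifacts", Key=obj["Key"])["Body"].read()
        container.upload_blob(name=obj["Key"], data=data, overwrite=True)
        print(f"copied {obj['Key']}")
```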
Applications Support
You'll also support the deployment and operation of AI applications built on the platform, including:
- RAG (Retrieval-Augmented Generation) applications like Ask IPASS and Ask UK Pay Centre (see the pattern sketch after this list)
- Document processing applications (BrightCapture)
- Employee onboarding automation (Oscar)
- Internal AI assistant (Bright GPT)
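For context on the RAG applications, the underlying pattern is simple: retrieve relevant snippets, then ground the model's prompt in them. The toy sketch below uses keyword overlap purely for illustration; the real applications would use embeddings and a vector store.

```python
# Deliberately tiny illustration of the RAG pattern: retrieve context, then
# build a grounded prompt. Documents and scoring are illustrative only.
DOCS = [
    "Holiday pay is calculated from the previous 52 weeks of earnings.",
    "New starters must submit a P45 or complete a starter checklist.",
]

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Score documents by naive keyword overlap with the question."""
    words = set(question.lower().split())
    scored = sorted(docs, key=lambda d: len(words & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(question: str) -> list[dict]:
    context = "\n".join(retrieve(question, DOCS))
    return [
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ]

# The resulting messages would then be sent through the LiteLLM proxy,
# as in the earlier sketch.
print(build_prompt("How is holiday pay calculated?"))
```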
Skills, Knowledge and Expertise
What We're Looking For
Essential Skills & Experience
- Platform Engineering Fundamentals: 2-4 years' experience with cloud infrastructure, preferably Azure
- Kubernetes: Practical experience deploying and managing applications in Kubernetes (AKS experience is a plus)
- Infrastructure as Code: Hands-on experience with Terraform or similar IaC tools
- CI/CD: Experience with GitOps workflows and tools like ArgoCD, GitHub Actions, or similar
- Programming & Scripting: Proficiency in Python or Go for automation and tooling; shell scripting is essential
- Linux & Containers: Solid understanding of containerization with Docker and container orchestration
Desirable Experience
- Exposure to LLM technologies or AI/ML infrastructure
- Experience with observability tools (Prometheus, Grafana, Loki)
- Knowledge of Helm, Helmfile, and Kustomize for Kubernetes deployments
- Understanding of security best practices and compliance requirements (GDPR)
- Experience with Backend-as-a-Service platforms (Supabase or similar)
- Experience with developer portal platforms (Backstage or similar)
- Application programming experience with .NET and/or TypeScript
What Makes You a Great Fit
- Learning Mindset: You're excited to learn about LLM operations and emerging AI infrastructure patterns
- Systems Thinking: You understand how distributed systems work and can reason about failure modes
- Pragmatic Approach: You balance perfect solutions with shipping value quickly
- Collaboration: You work well with both technical and product stakeholders
- Documentation: You believe good documentation is as important as good code
- Ownership: You take responsibility for your work from development through to production
Team Structure & Reporting
- Reports to: Head of AI
- Works closely with: Two senior/principal platform engineers
- Collaborates with: Application development teams, product managers, and security/compliance stakeholders
- Team size: Small, full-stack AI team covering development, DevOps, operations, and support
What Success Looks Like
In your first 3 months:
- You've contributed to multiple platform epics from our roadmap
- You understand the architecture of our AI platform and can navigate the codebase
- You've successfully deployed services to our Kubernetes clusters
- You're participating in the on-call rotation and can troubleshoot platform issues
In your first 6 months:
- You're independently owning epics and driving them to completion
- You're contributing to architectural decisions and technical direction
- You've improved platform reliability, observability, or developer experience
- You're mentoring junior engineers or helping onboard new team members
Technical Stack
Infrastructure: Azure (AKS, Blob Storage, Cognitive Services), Kubernetes, Terraform
Platform Services: LiteLLM, Langflow, Langfuse, Supabase, Open Web UI, Backstage
Observability: Prometheus, Grafana, Loki, Langfuse tracing
CI/CD: ArgoCD, GitHub Actions, Helmfile
Languages: Python, Go, Shell scripting
Security: Azure Guardrails, LlamaGuard, PII detection tooling
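As a small illustration of working with the observability stack above, the sketch below queries Prometheus's HTTP API for scrape targets that are down; the in-cluster URL is a placeholder, and Grafana dashboards would normally sit on top of queries like this.

```python
# Hedged sketch: ask Prometheus which scrape targets are currently down.
# The URL is a placeholder for whatever address the platform exposes.
import requests

PROM_URL = "http://prometheus.monitoring.svc:9090"  # placeholder address

resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": "up == 0"}, timeout=10)
resp.raise_for_status()
down = resp.json()["data"]["result"]

for series in down:
    labels = series["metric"]
    print(f"target down: {labels.get('job')} / {labels.get('instance')}")
```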
Why Join Bright's AI Platform Team?
- Impact: Your work directly enables AI innovation across the entire organisation
- Growth: Learn from experienced platform engineers in a supportive environment
- Cutting Edge: Work with the latest AI infrastructure and tooling
- Autonomy: Small team means you'll have significant ownership and influence
- Mission: Help accountants and finance professionals work more efficiently with AI