Linux PCIe Driver Developer (8-12 years)
Full-time, Mid-Senior Level
Job Overview
We are seeking a talented and driven User-Mode Driver (UMD) / system software engineer to join our team and contribute to the host-side software stack for machine learning in the Next Gen Computational PCIe Flash Controller project. In this role, you will be responsible for building high-performance user-space driver frameworks and runtime interfaces that enable efficient communication and data flow between applications and our device via the kernel driver. You will work on key components including user-space APIs, command queues, memory orchestration, and multi-device management to enable scalable ML workloads.
User-Mode Driver Development: Architect and implement high-performance user-space driver libraries for Linux. This includes designing scalable abstractions for multi-device and multi-card systems, and enabling efficient interaction with PCIe devices through kernel interfaces.
Device Discovery & Topology Management: Design and implement mechanisms for PCIe device discovery, enumeration, and logical grouping across multiple endpoints and cards. Develop topology-aware abstractions for managing devices in complex multi-card and switched environments.
Custom Protocol Design: Design and implement a custom, NVMe-like command and control protocol in user space. You'll be responsible for the host-side orchestration, including:
- Command Queues: Manage submission and completion queue abstractions and request tracking
- Command Orchestration: Implement tag-based request management, async/sync execution, and callback handling
- Event Handling: Design event-driven completion handling using poll/eventfd mechanisms
User ↔ Kernel Interface Integration: Develop efficient interaction with the kernel driver using ioctl, mmap, and event mechanisms. Translate high-level runtime requests into kernel-mediated operations while maintaining performance and isolation.
Memory & Dataflow Management: Design user-space data movement strategies, including DMA buffer lifecycle management, memory pinning workflows, and zero-copy data paths for efficient host-device communication.
Multi-Application Support: Enable concurrent multi-process access to devices using per-file-descriptor resource management, ensuring isolation, scalability, and robustness across workloads.
Collaboration: Work closely with runtime, kernel driver, firmware, and hardware teams to define clean interfaces and deliver a cohesive, high-performance end-to-end solution.