Network Simulation Engineer
Salaried, full-time
Position Overview
We are seeking a highly motivated Network Simulation Engineer to lead the simulation and analysis of AI communication workloads (e.g., collective communications) across various data center network topologies. In this role, you will apply network simulation tools to model real-world AI applications, including LLMs and DLRMs, to inform architectural decisions across Eridu’s product development lifecycle and to showcase our value to prospective customers and investors.
You will collaborate cross-functionally with customers, ASIC designers, and simulation tool providers to optimize performance, influence design, and deliver transformative AI networking solutions.
Responsibilities
- Model AI Workloads: Simulate communication patterns of distributed AI workloads (e.g., LLMs, DLRMs) across diverse network topologies to analyze performance and scalability.
- Drive Architecture Optimization: Work with customers to evaluate their AI workloads and provide recommendations for topology design, protocol tuning, and system architecture.
- Influence ASIC Design: Collaborate with the internal ASIC and architecture teams by providing simulation-based insights that shape chip design for optimized AI traffic flows.
- Tool Development & Partnership: Work with simulation tool providers (who maintain optimized versions of NS-3, ASTRA-sim, etc.) to customize, tune, and enhance modeling frameworks for Eridu’s specific requirements, and operate these tools to run simulations.
- Documentation & Communication: Create clear and compelling reports, documentation, and presentations to communicate insights to technical and non-technical stakeholders.
Qualifications
- MSc or PhD in Computer Science, Electrical Engineering, or a related field with some specialization in AI/ML communications, or equivalent hands-on experience.
- Strong experience with network simulation tools such as NS-3, OMNeT++, or custom-built simulators.
- Familiarity with distributed training frameworks (e.g., PyTorch, TensorFlow), collective communication libraries (e.g., NCCL, RCCL), and GPU programming (CUDA or ROCm).
- Deep understanding of frontier model architectures, parallelism approaches, and operational functionality.
- Deep understanding of Ethernet, InfiniBand, and high-performance data center networking technologies.
- Solid grasp of AI system architecture, including compute, memory, and interconnect bottlenecks in large-scale training/inference clusters.
- Strong programming skills in C++ and Python.
- Clear and confident communication skills, both written and verbal.
- 2+ years of relevant experience preferred; exceptional early-career candidates will also be considered.
Why Join Us?
At Eridu, you’ll have the opportunity to shape the future of AI infrastructure, working with a world-class team on groundbreaking technology that pushes the boundaries of AI performance. Your contributions will directly impact the next generation of AI infrastructure solutions, transforming the performance of AI data centers.
The starting base salary for the selected candidate will be established based on their relevant skills, experience, qualifications, work location, market trends, and the compensation of employees in comparable roles.
Notice to Recruiting Agencies
Eridu does not accept unsolicited resumes or candidate profiles from staffing agencies or third-party recruiters. Any candidate submitted to Eridu without prior written authorization from our recruiting team will be considered unsolicited and will become the property of Eridu. Eridu reserves the right to pursue and hire such candidates without any obligation to pay fees. Recruiting agencies are expressly instructed not to contact hiring managers, employees, or executives regarding open positions.