Data Engineer - SDE 1
Full Time
Job Overview
Fam is India’s first payments app for everyone above 11. FamApp helps make online and offline payments through UPI and FamCard. We are on a mission to raise a new, financially aware generation and to help 250 million+ young users in India kickstart their financial journeys early in life.
We’re reimagining how the next generation experiences fintech—going beyond payments to build a lifestyle brand that blends money, identity, and everyday experiences into one seamless, intuitive journey.
We are looking for a Data Engineer (SDE-1) to join our data team. The ideal candidate will play a key role in developing a high-performance, scalable data lakehouse, moving us toward sub-minute data latency and unified batch/streaming compute. This is an engineering-heavy role in which you will manage complex CDC flows, optimize distributed query engines, and leverage AI to accelerate our development lifecycle.
Technical Priorities
- Real-time CDC: Own high-throughput ingestion from RDBMS sources into the lakehouse using Debezium and PeerDB.
- Lakehouse Architecture: Designing and optimizing table formats (Iceberg, Delta, Hudi) for both performance and storage efficiency.
- Unified Compute: Developing robust ETL/ELT frameworks in PySpark and Flink (handling both batch and streaming workloads).
- Infrastructure & Ops: Managing data workloads on AWS (EMR, EKS, MSK, S3) and automating everything via GitLab/GitHub Actions.
- Query & BI: Tuning Trino and ClickHouse to power real-time dashboards in Metabase, Superset, and PowerBI.
Requirements
- Experience: 1–3 years in Data Engineering, specifically with distributed systems and cloud-native architectures.
- Coding: Expert-level Python/PySpark and SQL; familiarity with Go/Java/Scala is a plus.
- Infrastructure: Hands-on experience with AWS (S3, EKS, MSK) and Infrastructure-as-Code.
- Orchestration: Experience with Airflow or Temporal for complex workflow management.
- AI-Native: Proficiency in using AI tools (Claude, Codex, Copilot) to write, test, and document code efficiently.
- Systems Thinking: Ability to explain the trade-offs between different storage formats and processing frameworks.
- Domain Modelling: Hands-on experience designing OLAP domain models, including fact and dimension tables, slowly changing dimensions (SCD types), and One Big Table (OBT) patterns.
- Customer First: Partner with product and key stakeholders, adding value to business workflows through data and analytics.
Our Tech Stack
- Ingestion: Debezium, PeerDB, Olake
- Storage: Delta, Iceberg, Hudi (S3-based Lakehouse)
- Compute: PySpark, Flink, EMR, EKS
- Streaming: MSK (Kafka)
- Query Engines: Trino, ClickHouse
- Orchestration: Airflow, Temporal
- DevOps: GitLab, GitHub Actions, Terraform
- Visualization: Metabase, Superset, Tableau, PowerBI