Data Engineer (Azure)
Full-time · Mid-Senior Level
Job Overview
About You
You are a Data Engineer passionate about building scalable, production-grade data ecosystems in Azure. You thrive when transforming complex, fragmented data into reliable analytical assets that drive meaningful business decisions. You operate with autonomy, bring strong architectural foundations, and champion high-quality engineering practices across data modeling, pipelines, orchestration, and performance optimization.
You love simplifying complexity — whether converting legacy SQL logic into distributed PySpark jobs, designing canonical data layers, or ensuring data integrity and governance across systems. Collaboration, clarity, and business impact fuel your work.
You bring to Applaudo the following competencies:
- Bachelor’s Degree in Computer Science, Data Engineering, Software Engineering, or related field — or equivalent experience.
- 5+ years of experience designing, building, and maintaining production data pipelines at scale.
- Expert-level SQL: window functions, query performance, partitioning, execution plan tuning.
- Strong knowledge of data modeling: star/snowflake, facts & dimensions, SCDs, curated/canonical layers.
- Advanced hands-on experience with Python for Data Engineering, including PySpark transformations at scale.
- Strong experience working with Azure data services:
  - Azure Data Factory
  - Azure Databricks
  - ADLS Gen2 / Azure Storage
  - Azure SQL
  - Azure Logic Apps (orchestration)
- Experience building incremental ETL/ELT pipelines: dependencies, CDC, retries, failure handling.
- Hands-on experience optimizing big data workloads: partitioning strategies and Delta Lake performance (Z-order, OPTIMIZE, VACUUM); see the sketch after this list.
- Solid experience integrating REST APIs and handling schema drift & pagination.
- Proficiency with Git workflows and CI/CD for data codebases.
- Strong communication, collaboration, and autonomy in Agile environments.
- Nice to have: experience with Snowflake, PostgreSQL, and cost optimization for cloud workloads.
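To make the Delta Lake and incremental-loading expectations above more concrete, here is a minimal PySpark sketch of the kind of pattern involved: upserting a CDC-style batch into a Delta table and then running layout maintenance. It is an illustration under assumptions, not a prescribed implementation; the table name curated.orders, the storage path, and columns such as order_id and customer_id are hypothetical placeholders.

```python
# Minimal sketch, not a prescribed implementation: upsert a CDC-style batch into a
# Delta table, then run layout maintenance. All names and paths below are hypothetical.
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("incremental-load-sketch").getOrCreate()

# Incoming batch of changed rows, e.g. landed by ADF into ADLS Gen2.
updates = (
    spark.read.format("parquet")
    .load("abfss://raw@storageaccount.dfs.core.windows.net/orders/incoming/")
    .withColumn("load_date", F.current_date())
)

# Target curated Delta table (hypothetical name).
target = DeltaTable.forName(spark, "curated.orders")

# Incremental upsert: update rows whose key already exists, insert the rest.
(
    target.alias("t")
    .merge(updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# Layout maintenance: compact small files and co-locate data on a common filter
# column, then remove files no longer referenced (default retention window applies).
spark.sql("OPTIMIZE curated.orders ZORDER BY (customer_id)")
spark.sql("VACUUM curated.orders")
```

In practice, OPTIMIZE and VACUUM are usually scheduled as periodic maintenance jobs rather than run after every batch.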
You will be accountable for the following responsibilities:
- Design and implement conceptual, logical, and physical data models aligned to analytics and business needs.
- Build, optimize, and monitor end-to-end ETL/ELT workflows using ADF, Databricks, and Logic Apps.
- Convert existing SQL and legacy pipeline logic into resilient and scalable PySpark jobs (see the sketch after this list).
- Automate performance best practices: partitioning, indexing, schema management, storage optimization.
- Ensure data quality, lineage, governance, and security across environments.
- Establish observability: logging, DQ checks, alerting, and SLA monitoring.
- Collaborate closely with analysts, architects, and business stakeholders on requirements and data contracts.
- Maintain cost-efficient cloud architectures and continuously improve pipeline reliability and velocity.
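As a rough illustration of the SQL-to-PySpark conversion responsibility above, the sketch below rewrites a typical window-function deduplication query as a distributed PySpark job. The source table raw.customer_events, its columns, and the target table name are hypothetical placeholders.

```python
# Minimal sketch: a legacy SQL window-function dedup rewritten as PySpark.
# All table and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("sql-to-pyspark-sketch").getOrCreate()

events = spark.table("raw.customer_events")

# Equivalent legacy SQL:
#   SELECT * FROM (
#     SELECT *,
#            ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY event_ts DESC) AS rn
#     FROM raw.customer_events
#   ) x
#   WHERE rn = 1
latest_per_customer = (
    events.withColumn(
        "rn",
        F.row_number().over(
            Window.partitionBy("customer_id").orderBy(F.col("event_ts").desc())
        ),
    )
    .filter(F.col("rn") == 1)
    .drop("rn")
)

# Persist the curated result as a Delta table (hypothetical target name).
latest_per_customer.write.format("delta").mode("overwrite").saveAsTable(
    "curated.customer_latest_event"
)
```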