Job Title: Senior Data Engineer
Responsibilities:
- Design and Implement Data Pipelines: Build robust data pipelines that integrate data from diverse sources into a unified Data Lakehouse.
- Ensure Data Quality: Develop and implement data quality pipelines that validate incoming data and produce trusted datasets (a minimal sketch follows this list).
- Data Integration: Utilize your expertise in various data integration patterns, including ETL, ELT, Pub/Sub, and Change Data Capture, to connect and transform data efficiently.
- Programming: Apply your programming skills in Python, ANSI SQL, PL/SQL, and T-SQL to develop and maintain data pipeline architectures.
- Data Engineering Packages: Leverage common Python data engineering packages such as pandas, NumPy, PyArrow, pytest, scikit-learn, and Boto3 to enhance data processing.
- Software Development Practices: Apply best software development practices, including Design Principles and Patterns, Testing, Refactoring, CI/CD, and version control, to ensure data solutions are reliable and maintainable.
- Data Lakehouse Implementation: Implement and manage a Data Lakehouse using open table formats such as Apache Iceberg or Delta Lake.
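
For illustration, here is a minimal sketch of the kind of data quality check this role involves, using pandas and pytest (both listed above). The dataset, column names, and validation rules are hypothetical, not from a real schema.

```python
import pandas as pd

# Hypothetical validation rules for an orders dataset; the column
# names and checks below are illustrative only.
def validate_orders(df: pd.DataFrame) -> list[str]:
    errors = []
    if df["order_id"].duplicated().any():
        errors.append("duplicate order_id values")
    if df["customer_id"].isna().any():
        errors.append("missing customer_id values")
    if df["amount"].lt(0).any():
        errors.append("negative order amounts")
    return errors

# A pytest-style check that a known-good frame passes validation.
def test_orders_are_valid():
    df = pd.DataFrame(
        {"order_id": [1, 2], "customer_id": ["a", "b"], "amount": [9.99, 20.0]}
    )
    assert validate_orders(df) == []
```
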
Qualifications:
- 5+ years of experience as a Data Engineer, with a strong background in designing and maintaining data pipeline architectures.
- Proficiency in Python, ANSI SQL, PL/SQL, and T-SQL, with at least 5 years of hands-on experience in these languages.
- Expertise in various data integration patterns, including ETL, ELT, Pub/Sub, and Change Data Capture.
- Familiarity with Python data engineering packages, including pandas, NumPy, PyArrow, pytest, scikit-learn, and Boto3.
- Strong knowledge of software development practices, including Design Principles and Patterns, Testing, Refactoring, CI/CD, and version control.
- Experience in implementing and managing Data Lakehouse solutions, preferably using Apache Iceberg or Delta Lake.
- Proficiency with modern data platform technologies such as Apache Airflow, Kubernetes, and S3 object storage (a minimal DAG sketch follows this list).
- Preferred: experience with Dremio and Airbyte.
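
As a further illustration, here is a minimal Apache Airflow DAG of the shape this role would orchestrate, assuming Airflow 2.4+ (where `schedule` replaced `schedule_interval`). The DAG id, task names, and task bodies are hypothetical placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical extract/load steps; in practice these would read from
# source systems and write to the lakehouse tables.
def extract():
    print("pulling records from the source system")

def load():
    print("writing records to the lakehouse")

with DAG(
    dag_id="example_pipeline",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # run extract before load
```
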
#10802