PySpark Developer

Job Description

Responsibilities:
- Design and develop ETL integration patterns using Python on Spark.
- Develop a framework for converting existing PowerCenter mappings to PySpark (Python on Spark) jobs.
- Create a PySpark framework to move data from databases such as DB2, DynamoDB, Cosmos DB, and SQL Server to Amazon S3 (a minimal sketch follows this list).
- Translate business requirements into maintainable software components and understand their technical and business impact.
- Provide guidance to the development team working with PySpark as the ETL platform.
- Ensure that quality standards are defined and met.
- Optimize PySpark jobs to run on a Kubernetes cluster for faster data processing (see the scheduling sketch after this list).
- Provide workload estimates to the client.
- Develop a framework for Behaviour-Driven Development (BDD).
- Migrate on-premises Informatica ETL processes to the AWS cloud and Snowflake.
- Implement a CI/CD (Continuous Integration and Continuous Delivery) pipeline for code deployment.
- Acquire data from internal and external data sources.
- Create and maintain optimal data pipeline architecture.
- Identify, design, and implement internal process improvements.
- Automate manual processes, optimize data delivery, and re-design infrastructure for greater scalability.
- Build the infrastructure required for optimal extraction, transformation, and loading (ETL) of data from a wide variety of sources such as Salesforce, SQL Server, Oracle, and SAP, using Azure, Spark, Python, Hive, Kafka, and other big data technologies.
- Perform data QA/QC for data transfers into the data lake or data warehouse.
- Build analytics tools that use the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
- Review components developed by team members.
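
To make the data-movement item above concrete, here is a minimal, hypothetical PySpark sketch that pulls one DB2 table over JDBC and lands it in S3 as Parquet. The host, database, table, credentials, and bucket are placeholder names, and the sketch assumes the DB2 JDBC driver and the Hadoop S3A connector are on the Spark classpath.

    from pyspark.sql import SparkSession

    # Placeholder session; a real framework would inject the app name and config.
    spark = SparkSession.builder.appName("db2_to_s3").getOrCreate()

    # Read one source table over JDBC (hypothetical DB2 endpoint and table).
    df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:db2://db2-host:50000/SAMPLE")
        .option("dbtable", "SCHEMA.CUSTOMERS")
        .option("user", "db_user")
        .option("password", "db_password")
        .option("driver", "com.ibm.db2.jcc.DB2Driver")
        .load()
    )

    # Land the data in S3 as Parquet via the S3A connector.
    df.write.mode("overwrite").parquet("s3a://example-bucket/customers/")

In practice the credentials would come from a secrets manager rather than literals, and each source system (DynamoDB, Cosmos DB, SQL Server) would sit behind its own connector under a common interface.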
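For the Kubernetes and pipeline items, a similarly hedged sketch of how such a job might be scheduled with Airflow and submitted to a Kubernetes cluster; the DAG id, API-server URL, container image, and script path are assumptions, not details from the posting.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="db2_to_s3_daily",        # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Submit the PySpark job to Kubernetes in cluster mode.
        submit = BashOperator(
            task_id="submit_pyspark_job",
            bash_command=(
                "spark-submit --master k8s://https://k8s-api:6443 "
                "--deploy-mode cluster "
                "--conf spark.kubernetes.container.image=example/pyspark-etl:latest "
                "local:///opt/jobs/db2_to_s3.py"
            ),
        )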
Technologies:
- AWS Cloud, S3, EC2, PostgreSQL, Spark, Python 3.6, Big Data, Snowflake, Hadoop, Kubernetes, Docker, Airflow, Splunk, DB2, CI/CD, HDFS, MapReduce, Hive, Kafka, ETL, Oozie

50-100 EUR/hr
