Python + PySpark Developer
Company Name
Infosys Ltd (Bangalore)
Job Description
We are hiring an experienced Python + PySpark Developer with 3+ years of experience in data engineering and distributed computing using modern big data tools. You will work as part of the Infosys Consulting Team, contributing across the entire data pipeline, from analysis and design to deployment, and helping clients solve business problems using scalable, high-performance solutions.
Key Responsibilities:
- Collaborate with consulting and development teams through the full project lifecycle, including problem definition, effort estimation, diagnosis, solution generation, and deployment.
- Explore and recommend solution alternatives through research, vendor evaluations, and proof-of-concepts (POCs).
- Translate business requirements into technical specifications and define the to-be architecture.
- Design, develop, and optimize data pipelines using Python and PySpark.
- Integrate structured and unstructured data from various sources.
- Apply best practices in data wrangling, transformation, and performance tuning.
- Troubleshoot issues, perform root cause analysis, and recommend solutions.
- Contribute to internal initiatives, innovation projects, and knowledge-sharing sessions.
Technical Skills Required:
- Python Programming
- PySpark (RDDs, DataFrames, Spark SQL); see the short illustrative sketch after this list
- Big Data Technologies (HDFS, Hive, Spark)
- SQL and Data Query Optimization
- Git / Version Control
- Debugging, Logging, and Unit Testing in Python
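As a rough, hypothetical sketch of the kind of PySpark DataFrame and Spark SQL work listed above (the file path, column names, and "orders" data are illustrative assumptions, not part of any actual client codebase):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start a Spark session (cluster configuration is deployment-specific).
spark = SparkSession.builder.appName("orders_pipeline_sketch").getOrCreate()

# Read a structured source into a DataFrame; the path and header option are assumptions.
orders = spark.read.option("header", "true").csv("/data/raw/orders.csv")

# Basic wrangling with the DataFrame API: cast, filter, aggregate.
daily_revenue = (
    orders
    .withColumn("amount", F.col("amount").cast("double"))
    .filter(F.col("status") == "COMPLETED")
    .groupBy("order_date")
    .agg(F.sum("amount").alias("revenue"))
)

# The same aggregation expressed through Spark SQL.
orders.createOrReplaceTempView("orders")
daily_revenue_sql = spark.sql(
    "SELECT order_date, SUM(CAST(amount AS DOUBLE)) AS revenue "
    "FROM orders WHERE status = 'COMPLETED' GROUP BY order_date"
)

daily_revenue.show()
spark.stop()

Pipelines of this shape would also draw on the performance tuning, debugging, and unit testing practices mentioned above.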
Preferred Skills:
- Experience with cloud platforms (AWS, Azure, or GCP)
- Data ingestion tools (Kafka, Sqoop, etc.)
- Exposure to CI/CD and workflow orchestration tools (Jenkins, Airflow)
- Knowledge of Agile/Scrum methodology
- Working knowledge of Linux/Unix environments