Python + PySpark
Company Name
Infosys Ltd (Chennai)
Job Description
We are hiring a Python + PySpark Developer with 3+ years of experience in data engineering and distributed computing using modern big data tools. You will work as part of the Infosys Consulting Team, contributing across the entire data pipeline, from analysis and design to deployment, and helping clients solve business problems with scalable, high-performance solutions.
Key Responsibilities:
Collaborate with consulting and development teams through the full project lifecycle, including problem definition, effort estimation, diagnosis, solution generation, and deployment.
Explore and recommend solution alternatives through research, vendor evaluations, and proofs of concept (POCs).
Translate business requirements into technical specifications and define the to-be architecture.
Design, develop, and optimize data pipelines using Python and PySpark.
Integrate structured and unstructured data from various data sources.
Apply best practices in data wrangling, transformation, and performance tuning.
Troubleshoot issues, perform root cause analysis, and recommend solutions.
Contribute to internal initiatives, innovation projects, and knowledge-sharing sessions.
Technical Skills Required:
Python Programming
PySpark (RDDs, DataFrames, Spark SQL)
Big Data Technologies (HDFS, Hive, Spark)
SQL and Data Query Optimization
Git / Version Control
Debugging, Logging, and Unit Testing in Python
Preferred Skills:
Experience with cloud platforms (AWS, Azure, or GCP)
Data ingestion tools (Kafka, Sqoop, etc.)
Exposure to CI/CD and workflow orchestration tools (Jenkins, Airflow)
Knowledge of Agile/Scrum methodology
Working knowledge of Linux/Unix environments