Praful Jain

Indore

Summary

A visionary data engineering leader with 9+ years of experience architecting and scaling modern data platforms across the healthcare and travel industries. Expert in designing high-impact, cost-efficient solutions on Azure, Databricks, Snowflake, and GCP, with a strong emphasis on real-time analytics, ML/AI integration, and cloud-native architectures.

Overview

10 years of professional experience

4 Certifications

Work History

Lead Software Engineer

Impetus Technologies
07.2024 - Current

Projects: Genesys

Client: McKesson (PSaS)

Domain: Healthcare

Key Responsibilities:

  • Led the architecture of enterprise-grade data platforms using Databricks, incorporating Unity Catalog, Delta Live Tables (DLT), Structured Streaming, and Job/Serverless clusters.
  • Led the re-architecture of PySpark pipelines.
  • Implemented cost governance strategies, achieving a 25% reduction in storage costs and a 30% reduction in compute spend by optimizing clusters and leveraging spot instances.
  • Reduced data pipeline cost by 36% by implementing a DLT-based solution.
  • Improved execution time by 53% through best practices such as Adaptive Query Execution (AQE) and use of Photon clusters.
  • Built a dashboard to track costs across various use cases for better financial visibility.
  • Led a Databricks App PoC that enabled users to interact with Gold layer data using natural language prompts.
  • Built a PoC for a RAG application used in Impetus's internal accelerator.
  • Delegated tasks to junior team members after coordinating with the Scrum Master for sprint planning.
  • Bridged gaps between technical requirements and business objectives through effective communication and collaboration with stakeholders.
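The AQE tuning mentioned above is driven by Spark SQL configuration; a minimal sketch of the settings involved, using keys from the open-source Spark configuration (the partition-size value is illustrative, and Photon is selected at the Databricks cluster level rather than through these confs):

```python
# Illustrative Spark confs for Adaptive Query Execution (AQE).
# The keys are standard Spark SQL settings; the exact values used
# on the project are not recorded here.
def aqe_settings(target_partition_size_mb: int = 128) -> dict:
    """Return a dict of Spark confs that enable AQE features."""
    return {
        # Master switch for Adaptive Query Execution
        "spark.sql.adaptive.enabled": "true",
        # Coalesce small shuffle partitions after each stage
        "spark.sql.adaptive.coalescePartitions.enabled": "true",
        # Split skewed partitions in sort-merge joins
        "spark.sql.adaptive.skewJoin.enabled": "true",
        # Advisory size used when coalescing partitions
        "spark.sql.adaptive.advisoryPartitionSizeInBytes": str(
            target_partition_size_mb * 1024 * 1024
        ),
    }

# In a real job each entry would be applied with spark.conf.set(key, value).
```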

Module Lead Software Engineer

Impetus Technologies
07.2023 - 06.2024

Projects: Environmental Social & Governance (ESG)

Client: McKesson (PSaS)

Domain: Healthcare

Key Responsibilities:

  • Collaborated with stakeholders to gather and analyze business and technical requirements, ensuring alignment with project goals.
  • Built end-to-end data pipelines to ingest data from SFTP, Azure SQL using Azure Data Factory (ADF).
  • Implemented Delta Live Tables (DLT) in Databricks to process data in ADLS, following the medallion architecture (bronze, silver, gold).
  • Designed and enforced data quality checks using DLT expectations across all layers to ensure data reliability.
  • Developed reusable notebooks to automate bronze and silver table creation, enhancing code maintainability and efficiency.
  • Built metadata-driven ADF pipelines to support dynamic ingestion from diverse sources such as Blob Storage, SFTP, and Azure SQL Server.
  • Led a team of 5 engineers, assisting with issue resolution and guiding technical implementation.
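In DLT, quality rules like those above are declared with expectation decorators (e.g. `@dlt.expect_or_drop` on a table definition); a framework-free Python sketch of the same drop-on-violation idea, with hypothetical column names and rules:

```python
# Framework-free sketch of DLT-style "expect or drop" data-quality checks.
# Real DLT pipelines declare these declaratively; here the rules are plain
# predicates evaluated over row dicts.
from typing import Callable

Rule = Callable[[dict], bool]

def expect_or_drop(rows: list[dict], rules: dict[str, Rule]) -> tuple[list[dict], dict[str, int]]:
    """Keep rows passing all rules; count violations per rule name."""
    violations = {name: 0 for name in rules}
    kept = []
    for row in rows:
        ok = True
        for name, rule in rules.items():
            if not rule(row):
                violations[name] += 1
                ok = False
        if ok:
            kept.append(row)
    return kept, violations

# Hypothetical bronze-layer rows and rules (column names illustrative).
bronze = [
    {"order_id": 1, "amount": 120.0},
    {"order_id": None, "amount": 35.0},
    {"order_id": 3, "amount": -5.0},
]
silver, stats = expect_or_drop(bronze, {
    "valid_id": lambda r: r["order_id"] is not None,
    "positive_amount": lambda r: r["amount"] > 0,
})
```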

Senior Software Engineer

Impetus Technologies
12.2022 - 05.2023

Projects: Tranvoyant

Client: McKesson

Key Responsibilities:

  • Established Snowflake storage integration to securely ingest landing data from ADLS.
  • Implemented Snowflake notification integration to automate Snowpipe triggers upon new file arrivals.
  • Configured Snowpipe to continuously load raw data from ADLS into the Snowflake intake table.
  • Developed and maintained a Snowflake stored procedure to transform, cleanse, and upsert data into the final target table.
  • Maintained comprehensive documentation of development work, facilitating knowledge sharing among team members.
  • Improved execution time by 30% in phase 2 delivery.
  • Conducted data modeling, performance and integration testing.
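The transform-cleanse-upsert flow of the stored procedure above can be sketched in plain Python; in Snowflake this would be a MERGE statement inside the procedure, and the keys and columns here are hypothetical:

```python
def upsert(target: dict[str, dict], staged: list[dict], key: str = "id") -> dict[str, dict]:
    """Upsert cleansed staged rows into a target keyed by `key`.

    Mirrors MERGE semantics: matched keys are updated, new keys are
    inserted. Rows missing the key are dropped during cleansing.
    """
    for row in staged:
        k = row.get(key)
        if k is None:                                 # cleanse: skip malformed rows
            continue
        target[k] = {**target.get(k, {}), **row}      # update or insert
    return target

# Hypothetical target table and staged (intake) rows.
target = {"a": {"id": "a", "qty": 1}}
staged = [
    {"id": "a", "qty": 5},   # update existing row
    {"id": "b", "qty": 2},   # insert new row
    {"qty": 9},              # malformed, dropped
]
result = upsert(target, staged)
```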

Senior Software Engineer

Impetus Technologies
05.2021 - 12.2022

Projects: McKesson Business Analytics (MBA)

Client: McKesson

Key Responsibilities:

  • Utilized SnowSQL and Sqoop to extract data from Snowflake, Azure SQL, Oracle, and Blob Storage into Hive/Spark warehouse.
  • Applied data processing and transformations using Hive and PySpark; loaded processed data into Elasticsearch indices for business use.
  • Built end-to-end data pipelines to load data from various sources and store results in Elasticsearch.
  • Authored Oozie workflows for daily execution of data pipelines.
  • Created custom Elasticsearch mappings using Python scripts.
  • Developed a Power BI dashboard to ingest and visualize data from Blob storage.
  • Implemented Spark jobs to call SOAP APIs with dynamic inputs and store results in external Hive tables.
  • Optimized over 60 Spark jobs, reducing processing time by one-third.
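An Elasticsearch mapping is a JSON document; a sketch of building one in Python before sending it to the index-creation API (field names are hypothetical, and the actual PUT would go through the elasticsearch client):

```python
import json

def build_mapping(keyword_fields: list[str], date_fields: list[str]) -> dict:
    """Build an Elasticsearch index mapping body from field lists.

    Field names here are illustrative; in practice the returned body
    would be sent when creating the index.
    """
    props = {}
    for f in keyword_fields:
        props[f] = {"type": "keyword"}          # exact-match, aggregatable
    for f in date_fields:
        props[f] = {"type": "date", "format": "yyyy-MM-dd"}
    return {"mappings": {"properties": props}}

body = build_mapping(["region", "product_id"], ["order_date"])
payload = json.dumps(body)  # what would be PUT to the index endpoint
```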

Senior Software Engineer

Impetus Technologies
01.2019 - 04.2021

Projects: Gaps For Growth (G4G)

Client: McKesson

Key Responsibilities:

  • Provisioned a transient Google Cloud Dataproc cluster to support on-demand big data processing workflows, optimizing cost and resource usage.
  • Developed a Google Cloud Function to programmatically start and stop the Dataproc cluster based on workflow execution, ensuring automation and resource efficiency.
  • Extracted data from Snowflake and securely loaded it into Google Cloud Storage (GCS) buckets for downstream processing.
  • Designed and implemented an Oozie workflow to orchestrate Hive-based data transformation tasks on the Dataproc cluster.
  • Performed data transformation and enrichment using Hive queries and stored the results in intermediate tables.
  • Exported the final processed data from GCS/Hive back into Snowflake tables for business analytics and reporting.
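The start/stop decision in such a Cloud Function can be sketched as control logic alone; the real function would then call the Dataproc API to create or delete the transient cluster, and the event fields here are hypothetical:

```python
def cluster_action(event: dict) -> str:
    """Decide the Dataproc action for a workflow lifecycle event.

    A real Cloud Function would follow this decision with a call to the
    Dataproc client (cluster create/delete); the "phase" key and its
    values are illustrative, not a real event schema.
    """
    phase = event.get("phase")
    if phase == "workflow_start":
        return "create_cluster"
    if phase in ("workflow_success", "workflow_failure"):
        return "delete_cluster"   # transient cluster: always torn down
    return "no_op"
```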

Java Developer

PATH India Infotech
05.2015 - 07.2018
  • TRS is a web-based reporting project built on the Spring and Hibernate frameworks; it provides around 86 reports for the client, including daily employee activity, and is integrated with an ERP system.
  • The application contains around fifteen modules with privilege-based access for different user roles; reporting is handled via Jasper Reports, and advanced features include automated email and message notifications.
  • Developed JSP views and controller classes (using Spring annotations over POJOs).
  • Developed the login, security, user, cash-up, float, and rate modules, along with many types of Jasper reports.
  • Used Ajax in the login module to refresh the HTML data table without reloading the page.

Education

B.Tech - Computer Science

Lakshmi Narain College Of Technology (LNCT)
Indore
07.2014

Skills

Programming Language : Python, Java

Cloud Platforms

Microsoft Azure: Azure Data Factory (ingestion pipelines), Azure HDInsight, Event Hub, Key Vault, ADLS, ADLS Gen2, Logic Apps

Google Cloud: Dataproc, Cloud Functions, Pub/Sub, Cloud Scheduler

Data Management

Databricks: Delta Table, DLT, Unity Catalog, Databricks Asset Bundle, DQX, Vector Search Index

Snowflake: Expertise in SnowSQL, Streams, Tasks, Procedures, CDC, Storage Integration, Notification Integration

Big Data Technologies:

  • Proficient in Hive, Sqoop, Oozie, and PySpark for data processing, orchestration, and analytics

File & Table Formats:

  • File Formats: Parquet, CSV, ORC
  • Table Formats: Delta, Iceberg

ETL Tool:

  • Matillion

Database Management:

  • SQL: Expertise in ANSI SQL, MySQL, Oracle, SQL Server, and SAP HANA for data querying and manipulation
  • NoSQL: Elasticsearch
  • Vector Database: Databricks Vector Store

Version Control:

  • Git, GitHub Actions, Azure DevOps (ADO)

Certification

  • Databricks Data Engineer Associate.
  • Databricks Data Engineer Professional.
  • Google Cloud Certified Professional Data Engineer.
  • Databricks Generative AI Associate.

Languages

English
Advanced (C1)
Hindi
Bilingual or Proficient (C2)
