Available July 2026  ·  NYC / Boston / Seattle / Remote

Data & AI engineer
building systems that ship and survive_

I build end-to-end data platforms — from ingestion and ELT to ML models and serving layers. Currently at CGI (Lafayette), shipping BillingShield, a healthcare payment-integrity platform on 10M+ CMS Medicare claims.

Vedant Achole

Lafayette, LA

10M+
Claims processed
$2.3B
Anomalies surfaced
40%
Lift over baseline
1,707
LinkedIn followers
Databricks
PySpark
AWS
dbt
Python
SQL
Delta Lake
FastAPI
Airflow
Snowflake
Azure
XGBoost
SHAP
Dagster
FAISS
LLMs
01 — Selected Work

Three projects. Three real decisions.

Every project on my resume has a technical decision I'd defend in an interview. These are the ones that shaped how I think about data systems.

In progress · 2026 — Solo · 10 weeks

BillingShield — a healthcare payment-integrity platform, built end-to-end.

Full medallion pipeline on CMS Medicare data — 10M+ provider-procedure claims flowing through Bronze → Silver → Gold Delta tables on Databricks, a PySpark + dbt transformation layer, an XGBoost fraud classifier with SHAP explanations, served through FastAPI and a Streamlit dashboard.
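In Delta Lake terms each layer is its own table; the shaping logic per layer can be sketched in plain Python (field names here are illustrative, not the real CMS schema — the actual pipeline runs as PySpark + dbt transforms on Databricks):

```python
def bronze_to_silver(raw_claims):
    """Silver: validated, deduplicated claim records."""
    seen, silver = set(), []
    for c in raw_claims:
        key = (c.get("npi"), c.get("hcpcs"))
        # Keep only claims with a provider ID, a non-negative paid
        # amount, and a (provider, procedure) key we haven't seen.
        if c.get("npi") and c.get("paid", 0) >= 0 and key not in seen:
            seen.add(key)
            silver.append(c)
    return silver

def silver_to_gold(silver):
    """Gold: per-provider aggregates ready for the model and dashboard."""
    totals = {}
    for c in silver:
        totals[c["npi"]] = totals.get(c["npi"], 0.0) + c["paid"]
    return totals

raw = [
    {"npi": 1003, "hcpcs": "99213", "paid": 120.0},
    {"npi": 1003, "hcpcs": "99213", "paid": 120.0},  # duplicate -> dropped
    {"npi": 1003, "hcpcs": "99214", "paid": 80.0},
    {"npi": None, "hcpcs": "99213", "paid": 50.0},   # no provider -> dropped
]
assert silver_to_gold(bronze_to_silver(raw)) == {1003: 200.0}
```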

The decision I'd defend

Splitting train/test at the NPI level, not the row level. Row-level splits leak provider identity and inflate accuracy by 10+ points — the kind of thing that looks fine in a notebook and breaks in production.
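The split above can be sketched in plain Python (hypothetical field names; a deterministic hash of the provider NPI assigns every claim from the same provider to one side of the split):

```python
import hashlib

def npi_split(claims, test_frac=0.2):
    """Split claims into train/test at the provider (NPI) level.

    Hashing the NPI, not the row, guarantees a provider's claims
    never straddle the train/test boundary and leak identity.
    """
    train, test = [], []
    for claim in claims:
        # Deterministic bucket in [0, 1] derived from the NPI alone.
        digest = hashlib.sha256(str(claim["npi"]).encode()).hexdigest()
        bucket = int(digest[:8], 16) / 0xFFFFFFFF
        (test if bucket < test_frac else train).append(claim)
    return train, test

claims = [{"npi": n, "paid": a} for n, a in
          [(1003, 120.0), (1003, 80.0), (1007, 45.0), (1011, 900.0)]]
train, test = npi_split(claims)
# Every NPI lands wholly on one side of the split.
assert not ({c["npi"] for c in train} & {c["npi"] for c in test})
```

The same idea is what scikit-learn's `GroupShuffleSplit` gives you with `groups=npi`.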

Databricks · PySpark · dbt · Delta Lake · XGBoost · SHAP · FastAPI · Streamlit · Dagster
Read on GitHub
[Diagram — BillingShield end-to-end architecture]
Shipped · 2026 — Solo · 3 weeks

Healthcare claims analytics — AWS medallion on Parquet.

Production ELT on AWS Glue (PySpark) across four normalized claims tables. Joined the tables, applied window functions for provider rankings, computed derived fields, and delivered six Gold-layer KPIs as Parquet — with sub-second Athena query performance over large-scale claims data.
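The ranking step is a standard window function. A minimal pure-Python mirror of PySpark's `rank().over(Window.partitionBy("region").orderBy(desc("total")))` pattern (column names are illustrative):

```python
from collections import defaultdict

def rank_providers(rows):
    """Rank providers by total billed amount within each region,
    with rank()-style gap behavior: ties share a rank and the
    next distinct value skips ahead."""
    by_region = defaultdict(list)
    for row in rows:
        by_region[row["region"]].append(row)
    ranked = []
    for group in by_region.values():
        group.sort(key=lambda r: r["total"], reverse=True)
        prev_total, rank = None, 0
        for i, row in enumerate(group, start=1):
            if row["total"] != prev_total:
                rank, prev_total = i, row["total"]
            ranked.append({**row, "rank": rank})
    return ranked

rows = [
    {"npi": 1, "region": "LA", "total": 900.0},
    {"npi": 2, "region": "LA", "total": 900.0},  # tie with npi 1
    {"npi": 3, "region": "LA", "total": 100.0},
    {"npi": 4, "region": "TX", "total": 50.0},
]
ranks = {r["npi"]: r["rank"] for r in rank_providers(rows)}
assert ranks == {1: 1, 2: 1, 3: 3, 4: 1}
```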

The quiet win

Automated schema detection via Glue Crawlers cataloged 10 tables. That's the kind of plumbing nobody notices until it isn't there — and it's where most pipelines rot.

AWS Glue · PySpark · S3 · Athena · Parquet · Medallion
Read on GitHub
[Pipeline diagram — AWS Glue: S3 (raw claims · 4 tables) → Glue + PySpark (join · window fn · rank) → Athena (sub-second queries) → six Gold KPIs: 229 flagged claims worth $25M+ · $88.9M cancer treatment spend · 193 critical-risk patients · top-1% high-cost claims · provider rankings · regional exposure]
Case study · 2025 — Solo · 2 weeks

LLM-powered resume matching, demoed to CGI senior leadership.

Built an end-to-end RAG pipeline — dense vector search in FAISS plus GPT-4 re-ranking — to semantically match 3,000 candidate profiles against open roles. Demonstrated live to CGI senior leadership: a 40% improvement in candidate relevance over the keyword-based ATS.
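The retrieval core can be sketched without any libraries (candidate IDs and vectors below are made-up): brute-force cosine top-k is exactly what FAISS accelerates at scale with ANN indexes, and the short list it returns is what GPT-4 re-ranks.

```python
import math

def top_k(query_vec, index, k=2):
    """Dense top-k retrieval by cosine similarity.
    `index` maps candidate ids to embedding vectors."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = (math.sqrt(sum(x * x for x in a))
                * math.sqrt(sum(y * y for y in b)))
        return dot / norm
    scored = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    # Only this short list goes to the LLM re-ranking stage.
    return [cand_id for cand_id, _ in scored[:k]]

index = {
    "cand_a": [0.9, 0.1, 0.0],
    "cand_b": [0.1, 0.9, 0.0],
    "cand_c": [0.8, 0.2, 0.1],
}
assert top_k([1.0, 0.0, 0.0], index, k=2) == ["cand_a", "cand_c"]
```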

What I kept

The code is archived. What I kept is the lesson that AI in enterprise is a communication problem as much as a technical one. A model that doesn't explain itself is a model nobody adopts.

OpenAI GPT-4 · FAISS · RAG · HuggingFace · Sentence Transformers
Code archived · Case study only
[Diagram — vector space: query → match · FAISS, 3K vectors, top-k rerank · 40% lift over keyword ATS]
02 — About

The long version.

I grew up in Maharashtra, moved to Pune for a BTech in Artificial Intelligence, then to Boston for an MS in Management at Questrom. Now in Lafayette, Louisiana — consulting at CGI by day, building healthcare data systems by night.

The short version of my career is that I kept following one question: where does the data actually come from, and why does it break? That led from ML research in undergrad to data quality work at KPMG, then to a management degree (because I wanted to understand why companies make the data decisions they do), and now to building data and AI platforms full-time.

I'm looking for a Data Engineering or AI Engineering role where the work is real — healthcare, fintech, anywhere the numbers matter. I care about pipelines that hold up, documentation people can actually read, and communicating technical work to non-technical audiences.

Off the clock: cricket on Saturdays, vlogs on YouTube, eggs most mornings, and ~23 hours a week of deliberate practice toward being genuinely good at this craft.

Education

MS Management

Boston University, Questrom
Director's Honors · 2025

BTech, AI & Data Science

VIIT, Pune
9.03 / 10 · 2024

Experience

CGI — Consultant

Sep 2025 — present

KPMG — Analyst

Feb 2024 — Aug 2024

Certifications

Databricks Fundamentals
AWS + Azure Essentials
HubSpot Digital Marketing
03 — Off-screen

Being a whole person.

Saturdays are for cricket. Evenings sometimes become vlogs. I think being a whole person is part of being a decent engineer.

YouTube · Life by Vedant Achole

Life by Vedant Achole

Cricket · vlogs · @vedantacholee ↗

"Started by engineers just for their banter and enjoyment."

Into right now

  • i.

    F1 race data — building a telemetry-analysis side project to sit alongside BillingShield.

  • ii.

    Cricket league in Louisiana. Saturdays are sacred — catch my practice shorts on YouTube.

  • iii.

    Push/pull/legs split, 2,200 kcal, eggetarian and protein-obsessed.

  • iv.

    Kimball's Data Warehouse Toolkit. Still relevant in 2026.

04 — Contact

Let's talk.

Looking for a Data Engineering or AI Engineering role where I can ship real pipelines and learn from senior engineers. Available July 2026. NYC, Boston, Seattle, or strong remote teams.