Areas of Interest

Data Engineering

I enjoy building scalable data pipelines and integrating diverse data sources to support analytics and machine learning workflows. My experience includes ETL pipeline automation, data modeling, and warehousing using tools like Snowflake, Airflow, and Python.

Data Analysis

I am passionate about uncovering insights through data. From cleaning and transforming datasets to building dashboards and reports, I use tools like Excel, Tableau, and SQL to help stakeholders make informed decisions backed by data.

Machine Learning

I explore machine learning solutions for classification, regression, and prediction tasks. I’ve worked with Scikit-learn, TensorFlow, and PyTorch to train and evaluate models that drive business impact.

Natural Language Processing

I have a deep interest in language models and text analysis. I have used NLP techniques for tasks like summarization, classification, and building both basic and semantic search engines. I use tools like Hugging Face Transformers, NLTK, and vector similarity techniques to develop models that understand text beyond keywords.
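As a minimal sketch of the vector similarity idea mentioned above, the snippet below ranks documents by cosine similarity to a query vector. The three-dimensional "embeddings" and document names are hypothetical placeholders; in practice the vectors would come from a trained embedding model, which is not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted by descending similarity."""
    scores = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings, invented for illustration only.
doc_vecs = {
    "refund_policy": [0.9, 0.1, 0.0],
    "shipping_times": [0.1, 0.9, 0.1],
    "store_hours": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]

ranking = rank_documents(query_vec, doc_vecs)
```

Because the score depends only on vector direction, a query can match a document that shares no keywords with it, provided the two embeddings point the same way.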

Recent Projects

SQL Data Warehouse

Built an enterprise-grade SQL data warehouse using the Medallion Architecture to process and transform 60K+ raw ERP/CRM records through advanced T-SQL ETL pipelines and comprehensive validation frameworks, achieving a 99.9% data quality improvement. Delivered a performance-optimized star schema that reduced complex analytical query runtime from hours to minutes and enabled real-time business intelligence for data-driven decision-making.

  • T-SQL
  • SQL Server
  • Git
  • GitHub
  • Draw.io
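The bronze, silver, and gold layering behind a Medallion Architecture can be sketched roughly as follows. This is an illustration only: it uses Python's built-in sqlite3 in place of SQL Server and T-SQL, and the table names, columns, and sample rows are hypothetical rather than taken from the project.

```python
import sqlite3


def build_layers():
    """Load raw rows (bronze), clean them (silver), aggregate them (gold)."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Bronze: raw records stored as-is, all text, with duplicates and noise.
    cur.execute(
        "CREATE TABLE bronze_sales (order_id TEXT, customer TEXT, amount TEXT)"
    )
    cur.executemany(
        "INSERT INTO bronze_sales VALUES (?, ?, ?)",
        [
            ("1", " alice ", "10.50"),
            ("1", " alice ", "10.50"),  # duplicate row
            ("2", "Bob", "20.00"),
            ("3", "alice", "n/a"),      # unparseable amount
        ],
    )

    # Silver: trimmed, lower-cased, typed, de-duplicated; invalid rows dropped.
    cur.execute(
        """
        CREATE TABLE silver_sales AS
        SELECT DISTINCT
            CAST(order_id AS INTEGER) AS order_id,
            LOWER(TRIM(customer))     AS customer,
            CAST(amount AS REAL)      AS amount
        FROM bronze_sales
        WHERE amount GLOB '[0-9]*.[0-9]*'
        """
    )

    # Gold: business-level aggregate ready for reporting.
    cur.execute(
        """
        CREATE TABLE gold_customer_sales AS
        SELECT customer, COUNT(*) AS orders, SUM(amount) AS total_amount
        FROM silver_sales
        GROUP BY customer
        """
    )

    rows = cur.execute(
        "SELECT customer, orders, total_amount FROM gold_customer_sales"
    ).fetchall()
    conn.close()
    return {customer: (orders, total) for customer, orders, total in rows}


gold = build_layers()
```

Keeping the bronze table untouched means a silver-layer validation rule can be changed and re-run later without re-extracting from the source systems.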

Beauty Product Purchase Pipeline

Architected an automated end-to-end data pipeline using Python, Docker-containerized Airflow, and Snowflake to extract, transform, and load beauty purchase data from Google Sheets through weekly orchestrated ETL processes. Implemented a snowflake schema with 7 dimensions and optimized stored procedures that enabled real-time Tableau analytics for spending trend analysis, budget optimization, and data-driven purchase decision-making.

  • Python
  • SQL
  • Snowflake
  • Airflow
  • Tableau
  • Docker

Coffee Sales Excel Dashboard

Developed a comprehensive Excel analytics dashboard using advanced formulas, pivot tables, and interactive filtering to analyze coffee sales across customers, regions, and time periods. The analysis uncovered loyalty program inefficiencies and regional market opportunities that informed strategic recommendations for revenue optimization.

  • Excel
  • Dashboard Design

Sales & Customer Dashboard

Designed and implemented a comprehensive Tableau analytics platform featuring two interactive dashboards, one for sales performance tracking and one for customer behavior analysis. The dashboards enable stakeholders to identify revenue trends, optimize product strategies, and enhance customer retention through dynamic KPI monitoring, year-over-year comparisons, and multi-dimensional data exploration across regions, products, and time periods.

  • Tableau
  • Dashboard Design

Sentiment Classification on Product Reviews

Built and evaluated five NLP models (FastText, BERT, DistilBERT, RoBERTa, XLNet) for sentiment classification of product reviews, leveraging pre-trained transformer architectures and fine-tuning to achieve >90% accuracy. Designed a full training and evaluation pipeline with preprocessing, tokenization, and performance benchmarking across models, enabling businesses to automate customer feedback analysis, improve product insights, and enhance customer experience strategies.

  • Python
  • Transformers
  • Flask
  • Scikit-learn
  • PyTorch

Bank Institution Term Deposit Predictive Model

Built a predictive machine learning model using XGBoost and Stratified K-Fold Cross-Validation to identify customers likely to subscribe to bank term deposits. Addressed class imbalance with SMOTE, reduced feature dimensionality with PCA, and compared multiple models (Logistic Regression, MLP, XGBoost) to maximize predictive performance. Achieved 87% accuracy and 94% AUC, enabling the bank to improve targeted marketing campaigns, reduce customer acquisition costs, and increase subscription conversion rates.

  • Python
  • Pandas
  • XGBoost
  • Scikit-learn
  • Matplotlib
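To illustrate why stratification matters for an imbalanced target like term deposit subscription, here is a standard-library sketch of a stratified k-fold split. It is not the project's code, which would more plausibly rely on a library implementation such as scikit-learn's StratifiedKFold; the function name and the 10%-positive label list are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep class ratios."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Hypothetical imbalanced target: 10 subscribers out of 100 customers.
labels = [1] * 10 + [0] * 90
splits = list(stratified_k_fold(labels, k=5))
```

When oversampling with a technique such as SMOTE, the usual practice is to resample only the training indices of each split, so that synthetic samples never leak into the fold used for evaluation.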