Keya Dobriyal - Projects Completed During Academics

Strategic Data Science Portfolio

B.Tech (Honors) Computer Science | Specialization in Data Science

Trained in Computer Science (B.Tech Honors) with a specialization in Data Science, I architect systems that bridge the gap between algorithmic complexity and strategic decision-making.

My work encompasses the end-to-end data lifecycle—from engineering high-performance preprocessing pipelines and synthetic features to deploying optimized, production-grade ensemble models. By combining mathematical rigor with a "performance-first" mindset, I focus on uncovering the high-fidelity signals often buried within the noise of complex, high-dimensional datasets.

These projects reflect more than just technical competence; they represent my passion for building scalable technology that transforms raw data into a strategic asset. Whether optimizing for low-latency execution or communicating insights through advanced visualization, my goal is to drive impact where data meets technology.

Malware Detection using ML

“Malware doesn’t just exploit code — it exploits predictability. Machine learning breaks that pattern.”

This study explores the efficacy of integrating machine learning (ML) into malware detection, using an open-source dataset to train and evaluate several ML algorithms for identifying complex malware signatures. Because ML models learn from data, identify hidden patterns, and adapt to new threats, researchers and practitioners can use them to significantly enhance cybersecurity defenses. The study provides an overview of common ML approaches to malware detection, compares their performance, and highlights the potential of ML-driven systems to strengthen future malware defense strategies.

Read Article published on Medium
Read Article as pdf

Restaurant Recommendation System

“Smart Dining Decisions — A Content-Based Restaurant Recommendation System for Personalized, Data-Driven Choices.”

A Restaurant Recommendation System is proposed to address the challenge users face in choosing from the vast and growing number of restaurant options. Using content-based filtering, the system analyzes user preferences, locality, ratings, cuisine, and past experiences to provide personalized suggestions. The primary goal is to save users time and simplify the decision-making process by filtering a large dataset down to the most relevant restaurant recommendations. Developed using Python libraries, the system is intended to improve user convenience and business retention rates.
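
As a rough illustration of content-based filtering, the sketch below ranks restaurants by cosine similarity between a user's preference vector and each restaurant's feature vector. It is not the project's actual code: the restaurant names, the feature layout (one-hot cuisine plus a normalized rating), and the `recommend` helper are all assumptions made for this example.

```python
import numpy as np

# Hypothetical catalogue: names and feature values are made up for illustration.
restaurants = ["Spice Route", "Pasta Lane", "Curry Corner", "Sushi Bay"]

# Columns: [indian, italian, japanese, rating scaled to 0..1]
features = np.array([
    [1, 0, 0, 0.90],
    [0, 1, 0, 0.80],
    [1, 0, 0, 0.70],
    [0, 0, 1, 0.95],
], dtype=float)


def recommend(user_profile, features, names, top_k=2):
    """Return the top_k restaurants ranked by cosine similarity to the user."""
    norms = np.linalg.norm(features, axis=1) * np.linalg.norm(user_profile)
    scores = (features @ user_profile) / norms
    order = np.argsort(-scores)[:top_k]  # indices of the highest scores first
    return [(names[i], float(scores[i])) for i in order]


# A user who prefers Indian food and fairly high ratings.
user = np.array([1, 0, 0, 0.8])
recs = recommend(user, features, restaurants)
for name, score in recs:
    print(f"{name}: {score:.3f}")
```

A fuller system would also encode locality and past experiences as additional feature columns; they are left out here to keep the sketch short.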

Read Article published on Medium
Read Article as pdf

EcoPool

“A Smart Carpooling Application — Engineering a Greener, Secure, Sustainable, and Female-Friendly Digital Transit Commute”

EcoPool is a smart, user-friendly, and lightweight web-based ride-booking application designed to balance urban commuters' needs for time savings, convenience, and safety with environmental sustainability by reducing their carbon footprint. As rapid urbanization increases private vehicle usage, most cities face escalating challenges with carbon emissions, fuel consumption, and traffic congestion. EcoPool addresses these issues by providing a digital platform that seamlessly connects drivers and riders, encouraging a shift from solo travel to communal carpooling.

Read Article published on Medium
Read Article as pdf

Kaggle Playground Competitions

The Competitive Edge: Kaggle Global Community

Beyond formal education, I immerse myself in the Kaggle Playground ecosystem to stay at the forefront of the field. By engaging with these global challenges, I refine my ability to extract signals from noisy data and build generalizable models. Analyzing top-tier notebooks and participating in discussion forums has allowed me to adopt industry-standard best practices in Pythonic data manipulation and predictive storytelling.

Predicting Diabetes Challenge

December 2025

“Playground Series - Season 5, Episode 12”

Goal: Predict the probability that a patient will be diagnosed with diabetes.

Leaderboard Rank: 2330 / 4386

View my Notebooks

Predicting Student Test Score

January 2026

“Playground Series - Season 6, Episode 1”

Goal: Predict the exam score for each student.

Leaderboard Rank: 646 / 4317

View my Notebooks

Predicting Heart Disease

February 2026

“Playground Series - Season 6, Episode 2”

Goal: Predict the likelihood of heart disease.

Leaderboard Rank: 440 / 4013

View my Notebooks

Predicting Customer Churn

March 2026

“Playground Series - Season 6, Episode 3”

Goal: Predict the likelihood of customer churn.

Currently ongoing challenge

View my Notebooks

🛠️ Technical Expertise

Quantitative Analysis & Modeling

Predictive Architectures: Expertise in high-performance Gradient Boosting Machines (XGBoost, LightGBM, CatBoost) and Bagging techniques (Random Forest).

Ensemble Optimization: Advanced implementation of Weighted Blending and Stacking using solvers (e.g., scipy.optimize) to minimize variance and maximize AUC.

Stochastic Stability: Implementation of Multi-Seed Averaging and Nested Cross-Validation to ensure model robustness against random variation in the data and in training.
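
To make the weighted blending idea above concrete, here is a minimal sketch that uses `scipy.optimize.minimize` to find non-negative blend weights that sum to one. It is an illustration under stated assumptions, not code from my competition notebooks: the predictions are synthetic, the helper names (`fit_blend_weights`, `noisy_model`) are hypothetical, and log loss stands in as a smooth objective because AUC itself is not differentiable.

```python
import numpy as np
from scipy.optimize import minimize


def log_loss(y_true, p):
    """Binary cross-entropy, with clipping to avoid log(0)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))


def fit_blend_weights(preds, y_true):
    """Find weights (>= 0, summing to 1) that minimize the blend's log loss.

    preds has shape (n_samples, n_models) and should hold out-of-fold
    probabilities, so the weights are not tuned on leaked training fits.
    """
    n_models = preds.shape[1]
    start = np.full(n_models, 1.0 / n_models)  # begin from an equal blend
    result = minimize(
        lambda w: log_loss(y_true, preds @ w),
        start,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_models,
        constraints={"type": "eq", "fun": lambda w: np.sum(w) - 1.0},
    )
    return result.x


# Synthetic stand-ins for three models of differing quality (assumption).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)


def noisy_model(noise):
    logits = (2 * y - 1) * 1.5 + rng.normal(0, noise, size=y.size)
    return 1 / (1 + np.exp(-logits))


oof = np.column_stack([noisy_model(s) for s in (1.0, 1.5, 3.0)])
weights = fit_blend_weights(oof, y)

print("weights:", np.round(weights, 3))
print("equal-weight loss:", round(log_loss(y, oof.mean(axis=1)), 4))
print("optimized loss:  ", round(log_loss(y, oof @ weights), 4))
```

Constraining the weights to a simplex keeps the blend a valid probability and makes the objective convex in the weights, which is why a general-purpose solver is sufficient here.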

Software Engineering & Performance

Production-Grade Code: Focus on Model Persistence (joblib, pickle) and clean execution (custom logging and verbosity suppression).

Computational Optimization: Experience in High-Performance Computing concepts, including multi-threading optimization (e.g., LightGBM's force_row_wise option) and memory-efficient data processing.

Version Control & CI/CD: Proficient in Git for collaborative development and managing the machine learning lifecycle.

Data Engineering & Visualization

Feature Engineering: Deriving synthetic signals from high-dimensional data, including interaction features, frequency encoding, and target-based probability mapping.

Data Lifecycle: Full-stack workflow management—from ETL processes (Pandas, SQL) to Inferential Statistical Testing.

Insight Communication: Crafting professional-grade visualizations using Seaborn and Matplotlib to translate complex model outputs into actionable business intelligence.
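
The feature engineering techniques listed above (frequency encoding, target-based probability mapping, and interaction features) can be sketched with pandas as follows. This is a small illustrative example, not a production pipeline: the column names (`city`, `churned`), the toy data, the helper functions, and the additive smoothing scheme are all assumptions chosen for this sketch.

```python
import pandas as pd


def frequency_encode(series):
    """Replace each category with its relative frequency in the series."""
    return series.map(series.value_counts(normalize=True))


def target_probability_map(train, column, target, smoothing=10.0):
    """Smoothed per-category mean of a binary target.

    Categories with few rows are pulled toward the global mean (the prior),
    which limits overfitting on rare categories.
    """
    prior = train[target].mean()
    stats = train.groupby(column)[target].agg(["sum", "count"])
    return (stats["sum"] + smoothing * prior) / (stats["count"] + smoothing)


# Toy data with hypothetical column names.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B", "B", "C"],
    "churned": [1, 0, 1, 1, 0, 0],
})

df["city_freq"] = frequency_encode(df["city"])
mapping = target_probability_map(df, "city", "churned", smoothing=2.0)
df["city_target"] = df["city"].map(mapping)

# A simple interaction feature built from the two encodings.
df["freq_x_target"] = df["city_freq"] * df["city_target"]

print(df)
```

In practice the target mapping should be computed inside each cross-validation fold, on the training portion only, so that a row's own label never leaks into its encoded value.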

Download Resume

Contact Me

Currently, I am seeking opportunities to apply my skills in data analytics, business intelligence, and machine learning while continuing to grow through hands-on projects, collaborations, and real-world problem solving.

📌 Open to internships, research roles, and collaborative opportunities in data-driven domains.

My Interests: Data Analytics | Predictive Modeling | Business Intelligence | AI for Insights

My Current Location

Amity University

Noida, Uttar Pradesh, INDIA

Phone Number

+91 79817 Five Six 581

Email Address

keya@keyadobriyal.in

keyadobriyal(@)gmail(dot)com