Knowledge Space

Workbench

A curated collection of published articles, research explorations, and technical insights — the thinking behind the engineering.

Sharing practical knowledge on LLM optimization, edge AI deployment, and system architecture design.

03 Medium articles

03 research items

01 LinkedIn posts

Published Writing

Medium Articles

In-depth technical articles covering LLM optimization, GPU programming, and edge deployment — written for engineers who build.

Medium 8 min read

Semantic Caching with Redis: How to Optimize LLM Cost and Latency

Learn how to reduce redundant LLM API calls using semantic similarity-based caching with Redis, sentence embeddings, and vector search — with a full Python implementation.

Redis, LLM, Semantic Caching, Vector Search

Oct 7, 2024 Read on Medium
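As a rough illustration of the idea behind semantic caching, the sketch below reuses a stored response when a new prompt is close enough to an earlier one. It is not the article's implementation: the in-memory list stands in for Redis vector search, `toy_embed` is a hypothetical bag-of-words stand-in for a sentence-embedding model, and the 0.8 threshold, `SemanticCache`, and `answer` are assumptions made for this example.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a sentence-embedding model: word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """In-memory sketch; a real setup would keep vectors in a vector store."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cut-off
        self.entries = []           # list of (embedding, response)

    def get(self, prompt):
        query = toy_embed(prompt)
        best_score, best_response = 0.0, None
        for embedding, response in self.entries:
            score = cosine(query, embedding)
            if score > best_score:
                best_score, best_response = score, response
        # Only reuse a response when the closest match is similar enough.
        return best_response if best_score >= self.threshold else None

    def put(self, prompt, response):
        self.entries.append((toy_embed(prompt), response))


def answer(prompt, cache, call_llm):
    """Return (response, cache_hit); call the LLM only on a cache miss."""
    cached = cache.get(prompt)
    if cached is not None:
        return cached, True
    response = call_llm(prompt)
    cache.put(prompt, response)
    return response, False
```

The threshold is the main design choice: set it too low and unrelated prompts share answers, set it too high and near-duplicates still trigger an API call.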

Medium 6 min read

Building OpenCV with CUDA Support on Jetson Devices

A practical guide to compiling OpenCV with CUDA support on NVIDIA Jetson platforms, covering prerequisites, CMake configuration, build troubleshooting, and verification steps.

OpenCV, CUDA, Jetson, Edge AI, CMake

Mar 3, 2025 Read on Medium

Deep Dives

Research & Exploration

Extended research work and technical explorations — from edge inference frameworks to foundational papers.

Featured Research 12 min read

Adaptive Edge AI Controller

An intelligent edge AI layer for NVIDIA Jetson devices that monitors temperature during real-time YOLO inference and dynamically adjusts workload to reduce throttling, FPS drops, and shutdown risk.

Edge AI, Controller, Inference, FOPDT, Fuzzy Logic

May 21, 2026

Read Research Article
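To make the "monitor temperature, adjust workload" loop concrete, here is a deliberately simple sketch. It is not the controller from the research (which the tags suggest involves FOPDT modelling and fuzzy logic); it swaps in a plain threshold rule with hysteresis. The function names, the 80 °C and 70 °C thresholds, and the use of frame skipping as the workload knob are all assumptions made for this example.

```python
def next_frame_skip(temp_c, frame_skip, high_c=80.0, low_c=70.0, max_skip=4):
    """Return the new frame-skip level for the current temperature.

    Thresholds are hypothetical; real limits depend on the device.
    """
    if temp_c >= high_c and frame_skip < max_skip:
        return frame_skip + 1  # too hot: run inference on fewer frames
    if temp_c <= low_c and frame_skip > 0:
        return frame_skip - 1  # cooled down: restore workload step by step
    return frame_skip          # inside the hysteresis band, or at a limit: hold


def simulate(temps, frame_skip=0):
    """Apply the rule to a sequence of temperature readings."""
    history = []
    for temp_c in temps:
        frame_skip = next_frame_skip(temp_c, frame_skip)
        history.append(frame_skip)
    return history
```

The gap between the two thresholds keeps the workload from oscillating when the temperature hovers near a single limit.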


PDF

Adaptive Edge Research Paper

Senior thesis research document

Research, Paper

PDF

Model Training Research Paper

A benchmarking study where I trained, compared, and evaluated multiple deep learning models.

Research, Paper

Social Insights

LinkedIn Posts

Selected technical posts and architecture breakdowns shared with the engineering community.

RAG Architecture Design

End-to-end design of a production-grade Retrieval-Augmented Generation pipeline — covering chunking strategies, embedding model selection, vector store indexing, hybrid retrieval with re-ranking, and prompt engineering patterns for grounded LLM responses.

RAG - Web search, LLM Training, Hybrid Vector Search, Multimodal Chat, ONNX-INT8
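One common way to combine keyword and vector results in hybrid retrieval is reciprocal rank fusion (RRF). The sketch below shows that technique only as an illustration; the post does not say which fusion or re-ranking method it uses, and the function name, document IDs, and `k=60` constant are assumptions made for this example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in, so
    documents ranked well by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword search and a vector search.
keyword_hits = ["a", "b", "c"]
vector_hits = ["b", "c", "d"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

RRF needs only ranks, not comparable scores, which is why it is a convenient first step before passing the top candidates to a heavier re-ranker.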

Next Step

Interested in the projects behind the writing?

The Workbench captures the ideas and research. The deeper implementation work lives in projects — take a look or reach out to discuss systems, AI, and engineering.

Explore Projects Start a Conversation