Wheeloh

Semantic Car Search: A Vector-Based Approach

Leveraging High-Dimensional Embeddings for Intelligent Automotive Discovery

Théophile, Wheeloh Engineering, November 2025

Abstract

We present a semantic search system for automotive data that uses vector embeddings to match queries by meaning rather than by keyword. Our approach indexes 22,180 vehicle models into a 1,536-dimensional vector space using a transformer embedding model.

The system achieves 75.6% validation accuracy on 897 test queries, demonstrating robust performance in matching user intent to relevant vehicles. By embedding queries in batches and comparing vectors locally, we reduce the total time for a bulk run of 897 queries from ~7 minutes to ~2 seconds.

1. Introduction

1.1 Motivation

Traditional automotive search systems rely on exact string matching and fail to capture the semantic relationship between a query and the vehicle it refers to. A user searching for "Beamer M3" should find BMW M3 results, yet keyword-based systems struggle with:

  • Brand synonyms (Beamer → BMW, Merco → Mercedes)
  • Model variations (911 Turbo → 911 Turbo S)
  • Typographical errors (Ferari → Ferrari)
  • Cross-language queries (voiture sportive → sports car)

1.2 Problem Statement

Given a database of 22,180 automotive models and user queries in natural language, design a system that:

  • Understands semantic intent beyond literal text
  • Scales efficiently for real-time search
  • Maintains accuracy across diverse query patterns
  • Minimizes computational cost

2. Methodology

2.1 Vector Embeddings

We use a transformer embedding model to map textual car descriptions to dense 1,536-dimensional vectors.

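# Example: vector representation of one vehicle name.
# `model` stands for the embedding model client, which is not specified here.
text = "Ferrari 458 Italia"
embedding = model.encode(
    input=text,
    dimensions=1536,
)
# Result: [0.021, -0.034, 0.156, ...] (1,536 dimensions)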

2.2 Similarity Computation

Semantic similarity between a query Q and document D is computed using cosine similarity in the embedded space. Since the embeddings are pre-normalized, this simplifies to a dot product, enabling rapid batch comparisons via NumPy matrix operations.
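
The simplification described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the system's actual code: the helper names `l2_normalize` and `top_k` are hypothetical, and random vectors stand in for real embeddings.

```python
import numpy as np


def l2_normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def top_k(query_vecs: np.ndarray, doc_vecs: np.ndarray, k: int = 5):
    """Return (indices, scores) of the k most similar documents per query.

    Both inputs are assumed to be L2-normalized, with shape (n, dim).
    """
    scores = query_vecs @ doc_vecs.T  # (n_queries, n_docs) cosine similarities
    k = min(k, doc_vecs.shape[0])
    idx = np.argsort(-scores, axis=1)[:, :k]  # best match first
    return idx, np.take_along_axis(scores, idx, axis=1)


# Toy data (assumption): random vectors in place of real vehicle embeddings.
rng = np.random.default_rng(0)
docs = l2_normalize(rng.normal(size=(1000, 1536)))
# Queries are slightly perturbed copies of the first three documents.
queries = l2_normalize(docs[:3] + 0.005 * rng.normal(size=(3, 1536)))

indices, sims = top_k(queries, docs, k=5)
```

Because every row has unit length, the single matrix product `query_vecs @ doc_vecs.T` yields all query-document cosine similarities at once, with no per-pair division by norms.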

3. Key Metrics

  • Indexed Vehicles: 22,180
  • Dimensions: 1,536
  • Accuracy: 75.6%
  • Bulk Search Time: ~2s

4. Interactive Demonstration

Explore the 22,180-vehicle embedding space reduced to 2D via PCA. Search for any car model to see semantic clustering in action.
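
For readers who want to reproduce a similar 2D view, the PCA reduction can be sketched with a singular value decomposition. This is one common way to compute PCA, shown under assumptions: the function name `pca_2d` is hypothetical, and the toy matrix below stands in for the real embedding matrix.

```python
import numpy as np


def pca_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project each row of `embeddings` onto its first two principal components."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Toy data (assumption): 500 random 64-dimensional vectors.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(500, 64))
points = pca_2d(vectors)  # shape (500, 2), ready for a scatter plot
```

A 2D projection keeps only the two directions of largest variance, so distances in the plot are an approximation of distances in the full embedding space.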

5. Experimental Results

[Chart: Match Distribution]

[Chart: Confidence Levels]

Approach                        Time (897 queries)    Cost
Individual API Calls            ~7 min                ~$0.25
Our Approach (Batch + Local)    ~2s                   ~$0.00
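
The "Batch + Local" row can be illustrated with the sketch below. It is a hedged reconstruction of the general idea, not the implementation measured above: `embed_batch` is a hypothetical callable that turns a list of strings into unit-length vectors, and `toy_embed` is a character-trigram stand-in used only so the example runs without an embedding service.

```python
import zlib
from typing import Callable, Sequence

import numpy as np

# Hypothetical interface: a list of texts in, an (n, dim) array of unit vectors out.
EmbedFn = Callable[[Sequence[str]], np.ndarray]


def embed_in_batches(texts: Sequence[str], embed_batch: EmbedFn,
                     batch_size: int = 256) -> np.ndarray:
    """Embed texts with one call per batch instead of one call per text."""
    chunks = [embed_batch(texts[i:i + batch_size])
              for i in range(0, len(texts), batch_size)]
    return np.vstack(chunks)


def bulk_search(queries: Sequence[str], index: np.ndarray,
                embed_batch: EmbedFn, batch_size: int = 256) -> np.ndarray:
    """Return, for each query, the row of `index` with the highest dot product."""
    query_vecs = embed_in_batches(queries, embed_batch, batch_size)
    # Local comparison: one matrix product, no further remote calls.
    return np.argmax(query_vecs @ index.T, axis=1)


def toy_embed(texts: Sequence[str], dim: int = 256) -> np.ndarray:
    """Stand-in embedder (assumption): hashed character trigrams, L2-normalized."""
    out = np.zeros((len(texts), dim))
    for row, text in enumerate(texts):
        padded = f"  {text.lower()} "
        for i in range(len(padded) - 2):
            out[row, zlib.crc32(padded[i:i + 3].encode()) % dim] += 1.0
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.clip(norms, 1e-12, None)


vehicles = ["BMW M3", "Ferrari 458 Italia", "Porsche 911 Turbo S"]
index = embed_in_batches(vehicles, toy_embed)  # built once, reused for every query
matches = bulk_search(["Ferari 458", "911 Turbo"], index, toy_embed)
best = [vehicles[i] for i in matches]
```

With an assumed batch size of 256, 897 queries would need 4 embedding calls instead of 897, and every comparison after that is a local matrix operation.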

6. Conclusion & Future Work

We demonstrated a production-ready semantic search system for automotive data that achieves 75.6% validation accuracy while cutting the time for a bulk run of 897 queries from ~7 minutes to ~2 seconds, roughly 210× faster than issuing individual API calls.

Future Directions

  • Multimodal Search: Incorporate vehicle images via CLIP embeddings
  • Fine-tuning: Domain-specific embedding models for automotive terminology
  • Real-time Updates: Incremental indexing for new vehicle releases