Dimensionality Reduction in Embedding Spaces: A Comparative Analysis of DCT and SVD for RAG Systems

RAG
dimension-reduction
embeddings
information-retrieval
Comparing DCT and SVD for compressing embeddings in RAG systems while preserving retrieval quality.
Author

Kevin Scott and Ethan Davis

Published

January 20, 2024

Abstract

We present a methodological framework for evaluating dimensionality reduction techniques in dense embedding spaces for Retrieval-Augmented Generation (RAG) systems. Specifically, we examine the Discrete Cosine Transform (DCT) and Singular Value Decomposition (SVD) applied to the column space of embedding matrices, followed by quantization. Our evaluation protocol centers on retrieval consistency: measuring whether reduced representations retrieve the same contextual chunks as full-dimensional embeddings. This work was conducted in collaboration with Ethan Davis at SAS Institute Inc.

Introduction

Dense embeddings from transformer-based models typically operate in high-dimensional spaces (d ∈ {384, 768, 1024, 1536}). While these representations capture rich semantic information, their dimensionality poses computational and storage challenges. This work examines column-space reduction techniques that aim to preserve retrieval quality in RAG pipelines.

Theoretical Framework

Column Space Reduction

Given an embedding matrix \(\mathbf{E} \in \mathbb{R}^{n \times d}\) where \(n\) represents the number of documents and \(d\) the embedding dimension, we seek a transformation \(\mathbf{T}: \mathbb{R}^d \rightarrow \mathbb{R}^k\) where \(k \ll d\).

Discrete Cosine Transform (DCT)

The DCT provides an orthogonal transformation that concentrates signal energy in low-frequency components:

\(\mathbf{E}' = \mathbf{E} \mathbf{D}^T\)

where \(\mathbf{D} \in \mathbb{R}^{d \times d}\) is the DCT matrix. We retain the first \(k\) columns of \(\mathbf{E}'\), exploiting the energy compaction property commonly observed in natural signals.
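
As a concrete illustration, the following sketch applies an orthonormal DCT-II along the embedding dimension and keeps the first \(k\) coefficients. It assumes SciPy is available and that embeddings is an \(n \times d\) NumPy array; the function name is illustrative rather than a library API.

import numpy as np
from scipy.fft import dct

def dct_reduce(embeddings, k):
    # Orthonormal DCT-II along the last axis: each row e becomes D e, i.e. E' = E D^T
    coeffs = dct(embeddings, type=2, norm="ortho", axis=-1)
    # Retain the first k (low-frequency) coefficients per embedding
    return coeffs[:, :k]

# Example with random data standing in for real embeddings
rng = np.random.default_rng(0)
E = rng.standard_normal((1000, 768))
E_reduced = dct_reduce(E, k=128)   # shape (1000, 128)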

Singular Value Decomposition (SVD)

We apply SVD directly to the embedding matrix for dimensionality reduction:

  1. Compute SVD: \(\mathbf{E} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T\)
  2. Truncate to \(k\) components: \(\mathbf{E}_k = \mathbf{U}_k \boldsymbol{\Sigma}_k \mathbf{V}_k^T\)
  3. Project embeddings: \(\mathbf{E}' = \mathbf{E} \mathbf{V}_k\)

where \(\mathbf{V}_k \in \mathbb{R}^{d \times k}\) contains the first \(k\) right singular vectors, effectively projecting from \(\mathbb{R}^d\) to \(\mathbb{R}^k\).
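
A minimal NumPy sketch of the projection step (svd_reduce is an illustrative name; the identity \(\mathbf{E}\mathbf{V}_k = \mathbf{U}_k\boldsymbol{\Sigma}_k\) serves as a sanity check on the implementation):

import numpy as np

def svd_reduce(embeddings, k):
    # Thin SVD: E = U diag(S) V^T
    U, S, Vt = np.linalg.svd(embeddings, full_matrices=False)
    V_k = Vt[:k].T                      # (d, k) first k right singular vectors
    reduced = embeddings @ V_k          # (n, k) projected embeddings
    # E V_k equals U_k Sigma_k up to floating-point error
    assert np.allclose(reduced, U[:, :k] * S[:k])
    return reduced, V_k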

Quantization Strategies

Post-reduction quantization further compresses representations:

Scalar Quantization

  • Map continuous values to discrete levels
  • B-bit quantization: 2^B levels
  • Consider Lloyd-Max quantizer for optimal level placement based on distribution
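
A minimal sketch of uniform per-dimension scalar quantization; the per-dimension min/max scaling and the function names are illustrative assumptions rather than a prescribed design (a Lloyd-Max quantizer would instead place levels according to the value distribution):

import numpy as np

def scalar_quantize(X, bits=8):
    # Uniform per-dimension quantization to 2^bits levels
    max_code = 2 ** bits - 1
    lo, hi = X.min(axis=0), X.max(axis=0)
    scale = np.where(hi > lo, (hi - lo) / max_code, 1.0)
    codes = np.round((X - lo) / scale)
    return codes.astype(np.uint8 if bits <= 8 else np.uint16), lo, scale

def scalar_dequantize(codes, lo, scale):
    # Approximate reconstruction of the continuous values
    return codes.astype(np.float32) * scale + lo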

Vector Quantization

  • k-means clustering in reduced space
  • Codebook size determines compression ratio
  • Product quantization for large-scale applications
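
A sketch of single-codebook vector quantization in the reduced space, using SciPy's k-means routines; product quantization would instead split each vector into sub-blocks and learn one codebook per block. The codebook size of 256 is an illustrative choice that lets codes fit in one byte.

import numpy as np
from scipy.cluster.vq import kmeans2, vq

def vq_train(X_reduced, codebook_size=256, seed=0):
    # Learn a codebook of centroids via k-means in the reduced space
    codebook, _ = kmeans2(X_reduced.astype(np.float64), codebook_size,
                          minit="++", seed=seed)
    return codebook

def vq_encode(X_reduced, codebook):
    # Replace each vector by the index of its nearest centroid
    codes, _ = vq(X_reduced.astype(np.float64), codebook)
    return codes.astype(np.uint8)      # valid while codebook_size <= 256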

Proposed Evaluation Methodology

Retrieval Consistency Metric

For query \(q\), let:

  • \(\mathcal{R}_{\text{full}}(q, k)\) = top-\(k\) retrieved chunks using full embeddings
  • \(\mathcal{R}_{\text{reduced}}(q, k)\) = top-\(k\) retrieved chunks using reduced embeddings

We define retrieval consistency as:

\(\text{RC@}k = \frac{|\mathcal{R}_{\text{full}}(q, k) \cap \mathcal{R}_{\text{reduced}}(q, k)|}{k}\)
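
The metric translates directly into code. A sketch assuming cosine similarity over L2-normalized vectors, with doc_full/doc_reduced as the document matrices and q_full/q_reduced as the corresponding query vectors (all names are illustrative):

import numpy as np

def topk_ids(query_vec, doc_matrix, k):
    # Cosine similarity via dot products on L2-normalized vectors
    q = query_vec / np.linalg.norm(query_vec)
    D = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    return set(np.argsort(-(D @ q))[:k])

def retrieval_consistency(q_full, doc_full, q_reduced, doc_reduced, k):
    # RC@k = |top-k(full) intersect top-k(reduced)| / k
    overlap = topk_ids(q_full, doc_full, k) & topk_ids(q_reduced, doc_reduced, k)
    return len(overlap) / k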

Experimental Protocol

  1. Corpus Preparation: Segment documents into semantically coherent chunks
  2. Embedding Generation: Apply pre-trained encoder (e.g., BERT, Sentence-BERT, E5)
  3. Dimensionality Reduction:
    • For SVD (or PCA): Fit the decomposition on a training subset, then apply the learned projection to the full corpus
    • For DCT: Direct application of transformation matrix
  4. Quantization: Apply scalar or vector quantization to reduced embeddings
  5. Evaluation:
    • Sample diverse query set
    • Compare retrieval sets between full and reduced representations
    • Measure RC@k for k ∈ {1, 5, 10, 20, 50}
    • Analyze distribution of retrieval rank changes
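
To make the protocol concrete, a compact end-to-end sketch over synthetic data; the random matrices stand in for the output of steps 1-2, where a real corpus and encoder would produce doc_emb and query_emb:

import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(0)
doc_emb = rng.standard_normal((5000, 768))        # stand-in for encoded chunks
query_emb = rng.standard_normal((100, 768))       # stand-in for encoded queries

def reduce_dct(X, k):                              # step 3, DCT variant
    return dct(X, type=2, norm="ortho", axis=-1)[:, :k]

def topk(Q, D, k):                                 # cosine top-k per query
    Qn = Q / np.linalg.norm(Q, axis=1, keepdims=True)
    Dn = D / np.linalg.norm(D, axis=1, keepdims=True)
    return np.argsort(-(Qn @ Dn.T), axis=1)[:, :k]

full = topk(query_emb, doc_emb, 10)
for k_dim in (64, 128, 256):                       # step 5: RC@10 vs. reduced dimension
    red = topk(reduce_dct(query_emb, k_dim), reduce_dct(doc_emb, k_dim), 10)
    rc = np.mean([len(set(f) & set(r)) / 10 for f, r in zip(full, red)])
    print(f"k={k_dim}: mean RC@10 = {rc:.3f}")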

Additional Metrics

Beyond retrieval consistency, consider:

  • Semantic Similarity Preservation: correlation between cosine similarities in the original vs. reduced space
  • Computational Efficiency: indexing time, query latency, memory footprint
  • Reconstruction Error: \(\|\mathbf{E} - \mathbf{E}'\mathbf{T}^{\dagger}\|_F\), where \(\mathbf{T}^{\dagger}\) is the pseudo-inverse of the reduction transform
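
For the similarity-preservation metric, a sketch using Spearman rank correlation over sampled pairs; sampling pairs rather than computing all \(n^2\) similarities is an assumption made to keep the check cheap:

import numpy as np
from scipy.stats import spearmanr

def similarity_preservation(E_full, E_reduced, n_pairs=10000, seed=0):
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(E_full), n_pairs)
    j = rng.integers(0, len(E_full), n_pairs)

    def pairwise_cos(A, B):
        return np.sum(A * B, axis=1) / (
            np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1))

    # Rank correlation between pairwise similarities before and after reduction
    rho, _ = spearmanr(pairwise_cos(E_full[i], E_full[j]),
                       pairwise_cos(E_reduced[i], E_reduced[j]))
    return rho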

Theoretical Considerations

Information-Theoretic Perspective

The fundamental question: what is the intrinsic dimensionality of semantic embeddings? Rate-distortion theory suggests embeddings contain redundancy that can be exploited.

DCT vs SVD: Key Differences

SVD:

  • Data-dependent transformation
  • Optimal linear dimensionality reduction in the Frobenius norm (Eckart-Young theorem)
  • Requires computing the decomposition over the corpus (or a representative subset)
  • Captures global variance structure

DCT:

  • Data-independent transformation
  • Assumes local smoothness (energy compaction) across embedding coordinates
  • \(O(d \log d)\) per embedding via FFT-based algorithms
  • Natural frequency-domain interpretation

Implementation Considerations

Computational Complexity

  • SVD: \(O(\min(n^2d, nd^2))\) for full decomposition
  • DCT: \(O(nd \log d)\) via FFT implementation
  • Quantization: \(O(nk)\) for scalar quantization of the reduced embeddings; \(O(n k k_c)\) per iteration for \(k_c\)-means vector quantization

Numerical Stability

When implementing SVD-based reduction (or PCA, its mean-centered variant) on high-dimensional embeddings, compute the thin SVD of \(\mathbf{E}\) directly rather than eigendecomposing the \(d \times d\) covariance matrix, since forming \(\mathbf{E}^T\mathbf{E}\) squares the condition number:

import numpy as np

# Thin SVD of the (n x d) embedding matrix; avoids forming E^T E explicitly
U, S, Vt = np.linalg.svd(embeddings, full_matrices=False)
projection_matrix = Vt[:k].T                          # (d x k) right singular vectors V_k
reduced_embeddings = embeddings @ projection_matrix   # (n x k) projected embeddings
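
For corpora too large for a full in-memory decomposition, a truncated randomized SVD is a common alternative; the sketch below assumes scikit-learn is available and that train_embeddings is a representative training subset (TruncatedSVD does not mean-center, matching the direct-SVD formulation above):

from sklearn.decomposition import TruncatedSVD

# Fit the projection on a training subset, then project the full corpus (protocol step 3)
svd = TruncatedSVD(n_components=k, algorithm="randomized", random_state=0)
svd.fit(train_embeddings)              # hypothetical training subset
reduced_embeddings = svd.transform(embeddings)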

Open Research Questions

  1. Optimal Dimensionality Selection: How to determine k without extensive empirical search?
  2. Query-Dependent Reduction: Can we adaptively select dimensions based on query characteristics?
  3. Non-linear Alternatives: Would manifold learning techniques preserve retrieval quality better?
  4. End-to-End Learning: Can we train embeddings aware of downstream compression?

Conclusion

This work outlines a rigorous framework for evaluating dimensionality reduction techniques in RAG systems. The trade-off between computational efficiency and retrieval quality remains an active area of research. Future work should focus on establishing theoretical guarantees and developing adaptive compression strategies.

Acknowledgments

This research was conducted in collaboration with Ethan Davis at SAS Institute Inc. We thank the SAS Institute for computational resources and support.

References

  1. Johnson, W. B., & Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space.
  2. Jegou, H., Douze, M., & Schmid, C. (2011). Product quantization for nearest neighbor search.
  3. Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks.

This methodology paper outlines ongoing research. Implementations and empirical results will be shared upon completion of experiments.