Are embeddings relevant to the GDPR?

Yes. From a legal standpoint, embeddings are considered pseudonymized data. Since they can theoretically be traced back to the original personally identifiable information through mathematical methods (re-identification), they fall within the scope of the GDPR. Proper data management, EU-based hosting, and strict data deletion policies are therefore essential.

How does current case law (2026) assess the risk associated with embeddings?

According to recent ECJ rulings, the decisive factor is whether re-identification is “sufficiently likely” for the data processor. Because modern AI methods make it increasingly easy to reconstruct original texts from vectors, supervisory authorities generally classify embeddings used in a business context as personal data. This entails strict requirements for technical and organizational measures (TOMs).

How does BOTfriends address compliance with vector databases?

BOTfriends is committed to "Privacy by Design." This includes hosting in certified EU data centers, encryption of vector databases (encryption at rest and in transit), and granular access controls. In addition, sensitive data is filtered through anonymization layers even before vectorization to minimize risk from the outset.
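To illustrate the idea of filtering before vectorization, here is a minimal sketch that masks e-mail addresses and phone-like numbers before any text is embedded. It assumes plain regular expressions; the function `mask_pii` and both patterns are hypothetical and are not a description of BOTfriends' actual anonymization layers.

```python
import re

# Hypothetical, deliberately simple patterns. Real anonymization layers usually
# combine many more rules with named-entity recognition; these regexes will
# miss some PII and may also mask harmless digit runs such as dates.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d /-]{6,}\d")


def mask_pii(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders
    so they never reach the embedding model or the vector database."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


print(mask_pii("Contact jane.doe@example.com or +49 89 1234567."))
# → Contact [EMAIL] or [PHONE].
```

Only the masked string would then be passed to the embedding model, so the resulting vector cannot encode the removed identifiers.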

Embeddings

May 8, 2026

By Julia Schönau


Embeddings are numerical representations of text, images, or other data in a high-dimensional vector space. They translate meaning into numbers. Content with similar meanings is located close together in the vector space, regardless of the specific wording. Embeddings thus make possible what traditional keyword matching cannot achieve: semantic search, in which “reading the electricity meter” and “submitting the meter reading” are recognized as related.

In modern AI agents, embeddings form the basis of semantic language processing, for example in initial intent recognition, in Retrieval Augmented Generation (RAG) over knowledge bases, and in many other functions.
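One common way to use embeddings for intent recognition is to compare the embedded user message with embedded example utterances for each intent, pick the closest match, and fall back (for example to a clarifying question) when nothing is similar enough. The sketch assumes toy three-dimensional vectors in place of real model output; `INTENT_EXAMPLES`, `recognize_intent`, and the 0.8 threshold are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical toy vectors standing in for real embeddings of example utterances.
INTENT_EXAMPLES = {
    "submit_meter_reading": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "report_water_damage": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}


def recognize_intent(query_vec: list[float], threshold: float = 0.8):
    """Return (intent, score) for the most similar example, or (None, score)
    if even the best match stays below the threshold."""
    best_intent, best_score = None, -1.0
    for intent, examples in INTENT_EXAMPLES.items():
        for example in examples:
            score = cosine(query_vec, example)
            if score > best_score:
                best_intent, best_score = intent, score
    if best_score < threshold:
        return None, best_score
    return best_intent, best_score


print(recognize_intent([0.85, 0.15, 0.05]))
```

The threshold trades coverage for precision: a higher value triggers the fallback more often but reduces wrong intent matches.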

How embeddings work technically

An embedding model, usually a specially trained neural network, takes an input text (e.g., “How do I report water damage?”) and translates it into a vector that typically has several hundred to several thousand dimensions. Similar content produces similar vectors. Using similarity measures such as cosine similarity, the texts most relevant to a query can then be identified efficiently within a large body of content, such as a knowledge base or a product catalog.
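The retrieval step can be sketched as a tiny in-memory index that ranks stored texts by cosine similarity to a query vector. This is a minimal illustration, assuming hand-written three-dimensional toy vectors instead of real embedding-model output; `TinyVectorIndex` is a hypothetical stand-in for a real vector database.

```python
import heapq
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


class TinyVectorIndex:
    """Hypothetical in-memory index, not a production vector database."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []  # (text, unit vector)

    def add(self, text: str, vector: list[float]) -> None:
        # Normalizing once at indexing time means cosine similarity
        # reduces to a plain dot product at query time.
        self._items.append((text, normalize(vector)))

    def search(self, query_vector: list[float], k: int = 2) -> list[tuple[float, str]]:
        """Return the k most similar texts as (cosine score, text) pairs."""
        q = normalize(query_vector)
        scored = ((sum(a * b for a, b in zip(q, v)), text) for text, v in self._items)
        return heapq.nlargest(k, scored)


# Toy vectors for illustration only.
index = TinyVectorIndex()
index.add("How to submit your meter reading", [0.9, 0.1, 0.0])
index.add("Reporting water damage to your insurer", [0.1, 0.9, 0.2])
index.add("Changing your bank details", [0.1, 0.1, 0.9])

results = index.search([0.8, 0.2, 0.05], k=1)
print(results)
```

Real vector databases replace the linear scan in `search` with approximate nearest-neighbor indexes so that lookups stay fast over millions of vectors.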

In a RAG setup, the semantically relevant information is first retrieved from the knowledge base and then provided to the LLM as context. Instead of “guessing,” the model answers on the basis of verified sources. This is one of the most effective ways to reduce hallucinations and a key component through which BOTfriends ensures factual accuracy.
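After retrieval, the chunks still have to be handed to the LLM. A minimal sketch of that hand-off, assuming the retrieved chunks arrive as `(source_id, text)` pairs: the hypothetical `build_rag_prompt` numbers each source and instructs the model to answer only from them. The wording of the instructions is illustrative, not BOTfriends' actual prompt.

```python
def build_rag_prompt(question: str, retrieved_chunks: list[tuple[str, str]]) -> str:
    """Assemble a prompt that restricts the answer to the retrieved sources.

    retrieved_chunks: (source_id, text) pairs, e.g. the top hits of a vector search.
    """
    context = "\n".join(
        f"[{i}] ({source}) {text}"
        for i, (source, text) in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the numbered sources below. "
        "Cite the source number for each statement. "
        "If the sources do not contain the answer, say that you do not know.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical retrieved chunk for illustration.
prompt = build_rag_prompt(
    "How do I report water damage?",
    [("faq/water-damage", "Water damage can be reported via the customer portal.")],
)
print(prompt)
```

Numbering the sources lets the answer cite them, which makes it easier to check whether a statement is actually backed by the knowledge base.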

Best Practices for Using Embeddings

In enterprise projects, the main lever you control is the quality of the knowledge stored in the bot: how knowledge base entries are split into chunks largely determines the quality of the search results. Chunks that are too small lose context, while chunks that are too large dilute semantic precision. BOTfriends uses various mechanisms to optimize chunking and ensure that the most relevant information is always provided. These mechanisms make the difference between an agent that “just answers” and one that retrieves the right information from the right source, even across extensive, multilingual knowledge bases.
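The size trade-off can be made concrete with a common baseline: fixed-size chunks with an overlap, so that text cut at a chunk boundary still appears with some surrounding context. This is a minimal sketch, assuming word-based splitting; `chunk_words` and its default sizes are hypothetical, and production pipelines often split by tokens, sentences, or document headings instead. It does not describe BOTfriends' own chunking mechanisms.

```python
def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of at most max_words words; consecutive
    chunks share `overlap` words so context is not cut off abruptly."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, max_words=4, overlap=1))
# → ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Raising `max_words` keeps more context per chunk but mixes more topics into one vector; raising `overlap` reduces lost context at boundaries at the cost of storing more redundant text.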

Frequently Asked Questions (FAQ)

Are embeddings relevant to the GDPR?

Yes. Depending on the content, embeddings may contain personal information or make it traceable. That is why proper data management, EU-based hosting, and a clear authorization and deletion policy are mandatory. BOTfriends addresses these requirements by default during setup and operation.
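Since embeddings can count as personal data, a deletion request for a source document has to remove every chunk and vector derived from it, and searches should only consider records the caller is authorized to see. The following minimal sketch assumes an in-memory store; `Record`, `GovernedStore`, the role names, and the document IDs are all hypothetical and do not reflect any specific vector database API.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    doc_id: str                  # source document the chunk was derived from
    text: str
    vector: list[float]
    allowed_roles: frozenset[str]


@dataclass
class GovernedStore:
    records: list[Record] = field(default_factory=list)

    def add(self, record: Record) -> None:
        self.records.append(record)

    def delete_document(self, doc_id: str) -> int:
        """Remove every chunk and vector derived from one source document;
        return how many records were deleted."""
        before = len(self.records)
        self.records = [r for r in self.records if r.doc_id != doc_id]
        return before - len(self.records)

    def visible_to(self, role: str) -> list[Record]:
        """Only records this role may access become search candidates."""
        return [r for r in self.records if role in r.allowed_roles]


store = GovernedStore()
store.add(Record("customer-a", "chunk 1", [0.1, 0.2], frozenset({"support"})))
store.add(Record("customer-a", "chunk 2", [0.3, 0.1], frozenset({"support"})))
store.add(Record("public-faq", "chunk 3", [0.5, 0.5], frozenset({"support", "public"})))

deleted = store.delete_document("customer-a")
print(deleted, [r.text for r in store.visible_to("public")])
```

In a real deployment the same deletion would also have to reach replicas, caches, and backups according to the retention policy.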


Embeddings

How embeddings work technically

Best Practices for Using Embeddings

Frequently Asked Questions (FAQ)