Embedding API Overview
The Embedding API provides access to the vector embedding capabilities of the Quran Knowledge Graph, enabling semantic search, similarity analysis, and thematic discovery based on meaning rather than just keywords.
Key Concepts
Vector Embeddings
Vector embeddings are numerical representations of text in a high-dimensional space, where:
- Semantically similar texts are positioned close to each other
- The distance between vectors represents semantic dissimilarity
- The relationships between vectors can reveal conceptual connections
The Quran Knowledge Graph provides embeddings at three levels:
- Verses: Capturing the semantic meaning of complete verses
- Words: Representing individual words in context
- Topics: Aggregating verse embeddings to represent thematic concepts
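The topic-level aggregation described above can be sketched as a component-wise average of verse vectors. This is only an illustration: the source does not say which aggregation the Quran Knowledge Graph uses, so the mean is an assumption, and `topic_embedding` and the toy vectors are hypothetical names and data.

```python
from statistics import fmean


def topic_embedding(verse_vectors):
    """Average verse vectors component-wise to get one topic vector.

    Assumption: the mean is used as the aggregation; the actual
    aggregation method is not documented in this overview.
    """
    if not verse_vectors:
        raise ValueError("need at least one verse vector")
    dim = len(verse_vectors[0])
    if any(len(v) != dim for v in verse_vectors):
        raise ValueError("all vectors must have the same dimension")
    # zip(*...) groups the i-th component of every vector together.
    return [fmean(component) for component in zip(*verse_vectors)]


# Toy 3-dimensional vectors for readability; real embeddings have 768 dimensions.
mercy_verses = [
    [0.2, 0.8, 0.0],
    [0.4, 0.6, 0.2],
]
mercy_topic = topic_embedding(mercy_verses)
```

A mean keeps the topic vector in the same space as the verse vectors, so the same similarity measures can compare verses with topics directly.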
Embedding Model
The Quran Knowledge Graph uses a multilingual BERT model (bert-base-multilingual-cased) to generate 768-dimensional embeddings. The model was pretrained on text in about 100 languages, including Arabic and English, which makes it suitable for cross-lingual semantic analysis.
Similarity Measures
The API supports different similarity measures:
- Cosine Similarity: Measures the cosine of the angle between vectors (range: -1 to 1)
- Euclidean Distance: Measures the straight-line distance between vectors
- Dot Product: Measures the product of the vectors’ magnitudes and the cosine of the angle between them
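The three measures listed above can be written out in a few lines of plain Python. This is a minimal sketch of the standard definitions, not the API's own implementation; the function names and the 2-dimensional toy vectors are chosen for illustration.

```python
import math


def dot_product(a, b):
    """Sum of pairwise products; equals |a| * |b| * cos(angle)."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between the two vector endpoints."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Dot product divided by both magnitudes, so the result lies in [-1, 1]."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


# Toy vectors: u and w point the same way, v is orthogonal to both.
u = [1.0, 0.0]
v = [0.0, 2.0]
w = [3.0, 0.0]

same_direction = cosine_similarity(u, w)   # direction only, ignores length
orthogonal = cosine_similarity(u, v)
distance_uw = euclidean_distance(u, w)     # sensitive to length
dot_uw = dot_product(u, w)                 # grows with both magnitudes
```

Cosine similarity ignores vector length, while Euclidean distance and the dot product do not, so the measures can rank the same candidates differently unless the vectors are normalized.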
API Structure
The Embedding API is organized into several components:
Embedding Generation
- Generate embeddings for text
- Retrieve pre-computed embeddings for verses, words, and topics
- Batch processing for multiple texts
Semantic Search
- Search for verses semantically similar to a query
- Find verses similar to a specific verse
- Perform hybrid keyword and semantic search
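One way to picture hybrid search is as a weighted blend of a semantic score and a keyword score. The sketch below runs over an in-memory list and makes several assumptions that are not part of the documented API: the `hybrid_search` function, the `alpha` weight, the term-overlap keyword score, and the verse records, whose keys, texts, and vectors are placeholders rather than real data.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def keyword_score(query, text):
    """Fraction of query terms that appear in the text (case-insensitive)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def hybrid_search(query, query_vec, verses, alpha=0.7, top_k=2):
    """Rank verses by alpha * semantic + (1 - alpha) * keyword score.

    alpha=1.0 gives pure semantic search; alpha=0.0 gives pure keyword search.
    """
    scored = []
    for verse in verses:
        semantic = cosine(query_vec, verse["vector"])
        lexical = keyword_score(query, verse["text"])
        scored.append((alpha * semantic + (1 - alpha) * lexical, verse["key"]))
    scored.sort(reverse=True)  # highest combined score first
    return scored[:top_k]


# Placeholder records: not real verse text, and toy 3-dimensional vectors.
verses = [
    {"key": "verse-a", "text": "placeholder text about mercy and forgiveness",
     "vector": [0.9, 0.1, 0.0]},
    {"key": "verse-b", "text": "placeholder text about patience",
     "vector": [0.1, 0.9, 0.0]},
    {"key": "verse-c", "text": "placeholder text about mercy",
     "vector": [0.7, 0.3, 0.1]},
]

results = hybrid_search("mercy", [1.0, 0.0, 0.0], verses)
top_keys = [key for _, key in results]
```

Finding verses similar to a specific verse follows the same pattern, with that verse's stored vector used as `query_vec`.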
Thematic Analysis
- Discover thematic relationships based on embedding similarity
- Cluster verses by semantic similarity
- Map verses to topics based on embedding proximity
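Mapping verses to topics by embedding proximity can be sketched as a nearest-topic assignment. The snippet assumes cosine similarity as the proximity measure and a hypothetical `min_similarity` cutoff below which a verse stays unassigned; the topic names, verse keys, and vectors are toy placeholders, and the actual API may work differently.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def assign_topics(verse_vectors, topic_vectors, min_similarity=0.5):
    """Map each verse key to its most similar topic, or None if none is close."""
    assignments = {}
    for verse_key, vec in verse_vectors.items():
        best_topic, best_score = max(
            ((name, cosine(vec, tvec)) for name, tvec in topic_vectors.items()),
            key=lambda pair: pair[1],
        )
        # Hypothetical cutoff: leave the verse unassigned when nothing is close.
        assignments[verse_key] = best_topic if best_score >= min_similarity else None
    return assignments


# Toy 3-dimensional vectors; real embeddings have 768 dimensions.
topic_vectors = {
    "mercy": [1.0, 0.0, 0.0],
    "patience": [0.0, 1.0, 0.0],
}
verse_vectors = {
    "verse-a": [0.9, 0.1, 0.0],
    "verse-b": [0.2, 0.8, 0.1],
    "verse-c": [0.0, 0.0, 1.0],  # orthogonal to both topics
}

assignments = assign_topics(verse_vectors, topic_vectors)
```

Clustering differs in that the groups are not given in advance: an algorithm such as k-means would derive the centroids from the verse vectors themselves and then assign verses in the same nearest-centroid fashion.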