embeddings

Embeddings abstraction for dense retrieval backends.

Provides an ABC and a default SentenceTransformerEmbedder that wraps the sentence-transformers library.

Classes

Embedder

Bases: ABC

Base class for text embedding models.

Subclasses must implement embed and dim.

Functions
embed abstractmethod
embed(texts: list[str]) -> Any

Embed texts and return a numpy array of shape (n, dim).

Source code in src/openjarvis/tools/storage/embeddings.py
@abstractmethod
def embed(self, texts: list[str]) -> Any:
    """Embed *texts* and return a numpy array of shape (n, dim)."""
dim abstractmethod
dim() -> int

Return the dimensionality of the embedding vectors.

Source code in src/openjarvis/tools/storage/embeddings.py
@abstractmethod
def dim(self) -> int:
    """Return the dimensionality of the embedding vectors."""
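
As a minimal sketch of what implementing this interface might look like, the example below defines a toy subclass. The Embedder class here is a local stand-in that mirrors the two abstract methods shown above, so the snippet runs without the package installed. HashingEmbedder and its n_buckets parameter are hypothetical names introduced only for illustration and are not part of the documented module.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import Any

import numpy as np


class Embedder(ABC):
    """Local stand-in mirroring the abstract interface documented above."""

    @abstractmethod
    def embed(self, texts: list[str]) -> Any:
        """Embed *texts* and return a numpy array of shape (n, dim)."""

    @abstractmethod
    def dim(self) -> int:
        """Return the dimensionality of the embedding vectors."""


class HashingEmbedder(Embedder):
    """Hypothetical toy embedder: hashes whitespace tokens into buckets."""

    def __init__(self, n_buckets: int = 64) -> None:
        self._dim = n_buckets

    def embed(self, texts: list[str]) -> np.ndarray:
        out = np.zeros((len(texts), self._dim), dtype=np.float32)
        for row, text in enumerate(texts):
            for token in text.lower().split():
                # Use a stable hash; the built-in hash() is salted per process.
                digest = hashlib.md5(token.encode("utf-8")).digest()
                bucket = int.from_bytes(digest[:4], "little") % self._dim
                out[row, bucket] += 1.0
        return out

    def dim(self) -> int:
        return self._dim


embedder = HashingEmbedder(n_buckets=32)
vectors = embedder.embed(["dense retrieval", "sparse retrieval"])
```

The returned array has one row per input text and dim() columns, which is the contract the abstract embed method describes.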

SentenceTransformerEmbedder

SentenceTransformerEmbedder(model_name: str = 'all-MiniLM-L6-v2')

Bases: Embedder

Embedder backed by sentence-transformers.

PARAMETER DESCRIPTION
model_name

Hugging Face model identifier. Defaults to the lightweight all-MiniLM-L6-v2 (384-dim, ~22M parameters).

TYPE: str DEFAULT: 'all-MiniLM-L6-v2'

Source code in src/openjarvis/tools/storage/embeddings.py
def __init__(
    self, model_name: str = "all-MiniLM-L6-v2"
) -> None:
    try:
        from sentence_transformers import (
            SentenceTransformer,
        )
    except ImportError as exc:
        raise ImportError(
            "sentence-transformers is required for "
            "SentenceTransformerEmbedder. Install it with: "
            "pip install sentence-transformers"
        ) from exc

    self._model = SentenceTransformer(model_name)
    self._dim: int = (
        self._model.get_sentence_embedding_dimension()
    )
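
The constructor above defers the sentence-transformers import until instantiation and re-raises a failure with an install hint. The same optional-dependency pattern can be sketched as a small helper. The function require_optional and the module names used below are hypothetical and serve only to demonstrate the idea with the standard library.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, pip_name: str, feature: str) -> ModuleType:
    """Import an optional dependency or raise ImportError with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Chain the original error so the underlying cause stays visible.
        raise ImportError(
            f"{pip_name} is required for {feature}. "
            f"Install it with: pip install {pip_name}"
        ) from exc


# A standard-library module imports normally.
json_module = require_optional("json", "json", "demo")

# A deliberately nonexistent module name triggers the install hint.
try:
    require_optional("no_such_backend_xyz", "no-such-backend-xyz", "DemoEmbedder")
except ImportError as err:
    message = str(err)
```

Keeping the import inside the constructor means the module itself can be imported even when the optional package is absent; the error surfaces only when the class is actually used.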
Functions
embed
embed(texts: list[str]) -> Any

Return a numpy array of shape (len(texts), dim).

Source code in src/openjarvis/tools/storage/embeddings.py
def embed(self, texts: list[str]) -> Any:
    """Return a numpy array of shape ``(len(texts), dim)``."""
    return self._model.encode(
        texts, convert_to_numpy=True
    )
dim
dim() -> int

Return the embedding dimensionality.

Source code in src/openjarvis/tools/storage/embeddings.py
def dim(self) -> int:
    """Return the embedding dimensionality."""
    return self._dim
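
To illustrate how a dense retrieval backend might consume the (n, dim) array returned by embed, the sketch below ranks documents by cosine similarity to a query. Both rank_by_cosine and KeywordCountEmbedder are hypothetical names introduced here; the stand-in embedder exists only so the snippet runs without downloading a model, and nothing below is taken from the package's actual backends.

```python
import numpy as np


def rank_by_cosine(embedder, query: str, documents: list[str]) -> list[tuple[int, float]]:
    """Return (document index, cosine similarity) pairs, best match first."""
    doc_vecs = np.asarray(embedder.embed(documents), dtype=np.float64)
    query_vec = np.asarray(embedder.embed([query]), dtype=np.float64)[0]
    # Normalise by vector length; the epsilon guards against all-zero vectors.
    doc_norms = np.linalg.norm(doc_vecs, axis=1) + 1e-12
    query_norm = np.linalg.norm(query_vec) + 1e-12
    scores = (doc_vecs @ query_vec) / (doc_norms * query_norm)
    order = np.argsort(-scores)
    return [(int(i), float(scores[i])) for i in order]


class KeywordCountEmbedder:
    """Hypothetical stand-in: counts occurrences of a fixed vocabulary."""

    VOCAB = ("cat", "dog", "fish")

    def embed(self, texts: list[str]) -> np.ndarray:
        rows = [[t.lower().split().count(w) for w in self.VOCAB] for t in texts]
        return np.array(rows, dtype=np.float32)

    def dim(self) -> int:
        return len(self.VOCAB)


docs = ["the cat sat", "dog chases dog", "cat and fish"]
ranking = rank_by_cosine(KeywordCountEmbedder(), "fish", docs)
```

With sentence-transformers installed, a SentenceTransformerEmbedder instance could presumably be passed in place of the stand-in, since the function relies only on the embed method.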