ingest
Document ingestion — file reading, type detection, directory walking.
Classes
DocumentMeta (dataclass)
Metadata about an ingested document.
Functions
detect_file_type
Map a file extension to one of: text, markdown, pdf, code.
Source code in src/openjarvis/tools/storage/ingest.py
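The extension-to-type mapping described above can be illustrated with a minimal sketch. The function name `detect_file_type_sketch`, its `Path` argument, the extension sets, and the fallback to `text` are all assumptions made for illustration; the real `detect_file_type` signature and tables are not shown on this page and may differ.

```python
from pathlib import Path

# Hypothetical extension tables; the real module's mapping may differ.
_MARKDOWN_EXTS = {".md", ".markdown"}
_PDF_EXTS = {".pdf"}
_CODE_EXTS = {".py", ".js", ".ts", ".go", ".rs", ".java", ".c", ".cpp"}


def detect_file_type_sketch(path: Path) -> str:
    """Return one of "text", "markdown", "pdf", or "code" based on the suffix."""
    ext = path.suffix.lower()  # case-insensitive match on the final extension
    if ext in _MARKDOWN_EXTS:
        return "markdown"
    if ext in _PDF_EXTS:
        return "pdf"
    if ext in _CODE_EXTS:
        return "code"
    # Assumption: anything unrecognized is treated as plain text.
    return "text"
```

Lower-casing the suffix keeps `README.MD` and `readme.md` in the same bucket, which is usually what a directory walk over mixed sources needs.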
read_document
read_document(path: Path) -> Tuple[str, DocumentMeta]
Read a file and return (text, metadata).
| RAISES | DESCRIPTION |
|---|---|
| ImportError | If the file is a PDF and the PDF-parsing dependency is not installed. |
| FileNotFoundError | If path does not exist. |
Source code in src/openjarvis/tools/storage/ingest.py
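A rough sketch of the `(text, metadata)` contract and the two documented exceptions follows. It is not the library's implementation: `DocumentMetaSketch` and its fields are hypothetical stand-ins for `DocumentMeta`, `pypdf` is an assumed choice of PDF package (this page does not name one), and the sketch only distinguishes `pdf`, `markdown`, and `text` for brevity.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Tuple


@dataclass
class DocumentMetaSketch:
    """Stand-in for DocumentMeta; these fields are hypothetical."""
    source: str
    file_type: str
    size_bytes: int


def read_document_sketch(path: Path) -> Tuple[str, DocumentMetaSketch]:
    """Read a file and return (text, metadata)."""
    if not path.exists():
        raise FileNotFoundError(f"No such file: {path}")

    suffix = path.suffix.lower()
    if suffix == ".pdf":
        # Lazy import: a missing PDF package surfaces as ImportError only
        # when a PDF is actually read. pypdf is an assumed choice here.
        from pypdf import PdfReader

        reader = PdfReader(str(path))
        text = "\n".join(page.extract_text() or "" for page in reader.pages)
        file_type = "pdf"
    else:
        # Replace undecodable bytes rather than failing on odd encodings.
        text = path.read_text(encoding="utf-8", errors="replace")
        file_type = "markdown" if suffix in {".md", ".markdown"} else "text"

    meta = DocumentMetaSketch(
        source=str(path),
        file_type=file_type,
        size_bytes=path.stat().st_size,
    )
    return text, meta
```

Importing the PDF package inside the PDF branch means callers who only ingest text or markdown never need it installed, which matches the conditional `ImportError` in the table above.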
ingest_path
ingest_path(path: Path, *, config: Optional[ChunkConfig] = None) -> List[Chunk]
Ingest a file or directory into chunks.
If path is a file, reads and chunks it. If path is a directory, recursively walks it (skipping hidden and common non-content directories) and chunks each file.
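The file-or-directory behavior described above can be sketched with `os.walk`. The helper name `iter_content_files`, the contents of the skip list, and the choice to also skip hidden files are assumptions for illustration; the page does not list which directories the real `ingest_path` treats as non-content, and chunking is left out because `ChunkConfig` and `Chunk` are not documented here.

```python
import os
from pathlib import Path
from typing import Iterator

# Hypothetical skip list; the real set of non-content directories may differ.
_SKIP_DIRS = {"node_modules", "__pycache__", "venv", "dist", "build"}


def iter_content_files(root: Path) -> Iterator[Path]:
    """Yield the files that an ingest pass would read, in a stable order."""
    if root.is_file():
        yield root
        return

    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = sorted(
            d for d in dirnames
            if not d.startswith(".") and d not in _SKIP_DIRS
        )
        for name in sorted(filenames):
            # Assumption: hidden files are skipped along with hidden directories.
            if not name.startswith("."):
                yield Path(dirpath) / name
```

Assigning to `dirnames[:]` rather than rebinding the name matters: `os.walk` in its default top-down mode consults the same list object to decide where to recurse, so in-place pruning avoids walking large trees such as `.git` at all.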