Drop files here or click to browse
Supported formats: PDF, Word, Excel, PowerPoint, EPUB, ODT, ZIP, and 200+ text formats
Add documents and search using natural language - everything stays in your browser
No, your files are never uploaded. Everything happens entirely in your browser: your files are processed locally using WebAssembly and JavaScript, and no data is transmitted to any server, so your documents never leave your device.
This means you can safely use this tool with confidential or sensitive documents. Once the page and the AI model have finished loading, the search keeps working even if you disconnect from the internet.
This tool supports a wide range of file formats: PDF, Word, Excel, PowerPoint, EPUB, ODT, ZIP, and 200+ text formats.
Binary formats like images, audio, or video files are not supported. All document processing happens entirely in your browser - no server required.
There is no strict file size limit, but practical limits depend on your device:
If you experience slowdowns, try closing other browser tabs or using fewer/smaller files.
RAG stands for Retrieval-Augmented Generation. In simple terms, it's a technique that helps you find relevant information in your documents using natural language questions.
Here's how it works:
This allows you to search by meaning rather than exact keywords. For example, searching for "how to handle errors" can also find content about "exception handling" or "error management", even though your query contains neither phrase.
When you first add documents, the page downloads an AI model (approximately 130 MB) that runs entirely in your browser. The model used is paraphrase-multilingual-MiniLM-L12-v2, a sentence transformer that supports over 50 languages including English and German.
This model converts text into mathematical representations (embeddings) that capture semantic meaning, enabling you to search by concept rather than just keywords.
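To make the idea of comparing embeddings more concrete, here is a minimal sketch of ranking chunks by cosine similarity. It is an illustration only, not this tool's actual code: the three-dimensional vectors are made-up stand-ins for real model output (real sentence embeddings have hundreds of dimensions), and the names `cosineSimilarity`, `rankBySimilarity`, and `EmbeddedChunk` are hypothetical.

```typescript
// A chunk of text together with its embedding vector.
// The vectors used below are invented toy values, not real model output.
interface EmbeddedChunk {
  text: string;
  embedding: number[];
}

// Cosine similarity: 1 means same direction, 0 means unrelated (orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Guard against zero vectors to avoid dividing by zero.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query embedding, best match first.
function rankBySimilarity(
  queryEmbedding: number[],
  chunks: EmbeddedChunk[]
): { text: string; score: number }[] {
  return chunks
    .map((chunk) => ({
      text: chunk.text,
      score: cosineSimilarity(queryEmbedding, chunk.embedding),
    }))
    .sort((x, y) => y.score - x.score);
}

const exampleChunks: EmbeddedChunk[] = [
  { text: "baking sourdough bread", embedding: [0.0, 0.2, 0.9] },
  { text: "exception handling in Java", embedding: [0.9, 0.1, 0.0] },
];

// Pretend this is the embedding of the query "how to handle errors".
const exampleQuery = [0.8, 0.2, 0.1];
const ranked = rankBySimilarity(exampleQuery, exampleChunks);
console.log(ranked.map((r) => r.text));
```

In this toy example the query vector points in nearly the same direction as the "exception handling" chunk, so that chunk ranks first even though the two texts share no words.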
Good news: The model is cached in your browser, so future visits won't require another download.
The search uses a hybrid approach that combines two techniques:
The relevance percentage shown for each result is this combined score. A document with a specific keyword match might appear high in results even if the semantic similarity alone would be low.
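As a rough sketch of how a combined score like this could be computed, the example below blends a semantic similarity value with a simple keyword-overlap score. The 70/30 weighting, the ASCII-only tokenizer, and the function names are all assumptions made for illustration; the tool's real formula and weights are not documented here.

```typescript
// Assumed weights for illustration only; the tool's real weighting may differ.
const SEMANTIC_WEIGHT = 0.7;
const KEYWORD_WEIGHT = 0.3;

// Lowercase and split into alphanumeric terms.
// Simplification: ASCII only, so accented letters act as separators.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Fraction of distinct query terms that appear in the chunk text (0 to 1).
function keywordScore(query: string, chunkText: string): number {
  const queryTerms = Array.from(new Set(tokenize(query)));
  if (queryTerms.length === 0) {
    return 0;
  }
  const chunkTerms = new Set(tokenize(chunkText));
  const hits = queryTerms.filter((term) => chunkTerms.has(term)).length;
  return hits / queryTerms.length;
}

// Weighted blend of both signals; semantic similarity is clamped to [0, 1].
function hybridScore(semantic: number, keyword: number): number {
  const clamped = Math.max(0, Math.min(1, semantic));
  return SEMANTIC_WEIGHT * clamped + KEYWORD_WEIGHT * keyword;
}

// Convert a 0..1 score into the whole-number percentage shown to the user.
function toPercent(score: number): number {
  return Math.round(score * 100);
}

// Chunk A: exact keyword match, but low semantic similarity (made-up value).
const keywordHeavy = hybridScore(
  0.2,
  keywordScore("invoice 4711", "Invoice 4711 was paid in March")
);

// Chunk B: no keyword overlap, moderate semantic similarity (made-up value).
const semanticOnly = hybridScore(
  0.5,
  keywordScore("invoice 4711", "The bill was settled last spring")
);

console.log(toPercent(keywordHeavy), toPercent(semanticOnly));
```

Under these assumed weights, the chunk with the exact keyword match ends up with the higher percentage despite its lower semantic similarity, which mirrors the behavior described above.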
Documents are automatically split into chunks at two granularity levels:
The search automatically considers both granularities and adjusts scoring based on your query length: short queries favor precise chunks, longer questions favor context chunks.
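The sketch below illustrates one way two-level chunking and query-length-based weighting could fit together. The chunk sizes, the four-word cutoff for a "short" query, the boost factor, and all names are hypothetical values chosen for the example, not the settings this tool actually uses.

```typescript
type Granularity = "precise" | "context";

interface Chunk {
  text: string;
  granularity: Granularity;
}

// Hypothetical settings, chosen only for illustration.
const PRECISE_SIZE_WORDS = 50;
const CONTEXT_SIZE_WORDS = 200;
const SHORT_QUERY_MAX_WORDS = 4;
const FAVORED_BOOST = 1.2;

function splitWords(text: string): string[] {
  return text.split(/\s+/).filter((word) => word.length > 0);
}

// Split text into consecutive chunks of at most `size` words.
function chunkByWords(text: string, size: number, granularity: Granularity): Chunk[] {
  const words = splitWords(text);
  const chunks: Chunk[] = [];
  for (let i = 0; i < words.length; i += size) {
    chunks.push({ text: words.slice(i, i + size).join(" "), granularity });
  }
  return chunks;
}

// Index the same document at both granularity levels.
function chunkDocument(text: string): Chunk[] {
  return [
    ...chunkByWords(text, PRECISE_SIZE_WORDS, "precise"),
    ...chunkByWords(text, CONTEXT_SIZE_WORDS, "context"),
  ];
}

// Score multiplier: short queries boost precise chunks,
// longer questions boost context chunks.
function granularityWeight(query: string, granularity: Granularity): number {
  const isShort = splitWords(query).length <= SHORT_QUERY_MAX_WORDS;
  const favored: Granularity = isShort ? "precise" : "context";
  return granularity === favored ? FAVORED_BOOST : 1.0;
}

console.log(granularityWeight("error handling", "precise"));
console.log(granularityWeight("how do I handle errors in async code", "context"));
```

A chunk's base relevance score would then be multiplied by `granularityWeight` before results are sorted, so the same document can surface as a short passage for a keyword-style query and as a longer passage for a full question.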
Results only appear if they exceed a minimum relevance threshold. Here are some tips:
This tool is a proof of concept demonstrating that sophisticated semantic search can run entirely in the browser without any server infrastructure. It showcases the potential of client-side AI for privacy-preserving document search.
For enterprise-grade implementations, multimodal textualization would be a typical enhancement - using OCR and vision models to extract and describe content from images, diagrams, charts, and scanned documents, making visual information searchable alongside text.
That said, this implementation already demonstrates the core principles and can handle real-world document collections effectively within browser constraints.