Gemini API's File Search upgrades to multimodal RAG: mixed image and text retrieval, metadata filtering, page-level referencing

CryptoWorld News reports that Google has released three updates to the File Search tool in the Gemini API. The first is multimodal retrieval: built on the Gemini Embedding 2 model, File Search can now index the images and text that developers upload into a single knowledge base and retrieve them together, so a user can search an image library in natural language for material that matches a particular visual style or emotional tone. The second is custom metadata filtering: developers can attach key-value labels (such as department: legal) to files at upload time, and a query can filter on those labels first to narrow the search scope. The third is page-level citations: the model's response indicates which page of which file each piece of information comes from, so users can jump straight to the source to verify it.

File Search is Google's fully managed Retrieval-Augmented Generation (RAG) system built into the Gemini API; it automatically handles file storage, chunking, vectorization, and context injection. Storage and query-time embedding generation are free; developers are charged only for embedding generation at initial indexing, at $0.15 per million tokens.
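To make the second and third updates concrete, here is a toy sketch in plain Python of how filtering on metadata before retrieval and returning a page-level source might fit together. It does not call the Gemini API and is not Google's implementation: the `Chunk` class, the `prefilter`, `score`, and `search` functions, the sample file names, and the word-overlap scoring (standing in for embedding similarity) are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One indexed piece of a file (hypothetical structure, not the Gemini API's)."""
    file_name: str
    page: int
    text: str
    metadata: dict[str, str] = field(default_factory=dict)


def prefilter(chunks: list[Chunk], required: dict[str, str]) -> list[Chunk]:
    """Keep only chunks whose metadata contains every required key-value pair."""
    return [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in required.items())
    ]


def score(query: str, text: str) -> int:
    """Toy relevance: count distinct query words found in the text.

    A real system would compare embedding vectors instead.
    """
    return len(set(query.lower().split()) & set(text.lower().split()))


def search(chunks: list[Chunk], query: str, required: dict[str, str],
           top_k: int = 1) -> list[tuple[str, str]]:
    """Filter by metadata first, then rank; return (text, citation) pairs."""
    candidates = prefilter(chunks, required)
    ranked = sorted(candidates, key=lambda c: score(query, c.text), reverse=True)
    return [(c.text, f"{c.file_name}, p. {c.page}") for c in ranked[:top_k]]


# Made-up sample data for illustration only.
CHUNKS = [
    Chunk("nda_template.pdf", 4,
          "termination clause requires thirty days notice",
          {"department": "legal"}),
    Chunk("nda_template.pdf", 7,
          "governing law and dispute resolution",
          {"department": "legal"}),
    Chunk("brand_guide.pdf", 2,
          "termination of old logo usage notice",
          {"department": "marketing"}),
]

if __name__ == "__main__":
    for text, citation in search(CHUNKS, "termination notice period",
                                 {"department": "legal"}):
        print(f"{text} [{citation}]")
```

Because the label filter runs before ranking, the marketing chunk is never scored even though it shares words with the query, which is the narrowing effect the article describes. The citation string carries the file name and page so a reader can check the source directly.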
