Full-text Grounding, Tag Audit & BBT Export
Retrieve PDF passages with page locators for grounding, audit tag hygiene against a controlled vocabulary, and export with Better BibTeX formatting.
zotero_get_fulltext — retrieve PDF text for grounding
Retrieve the full text of a PDF attachment for use as grounding context. Pass either:
- A parent item key: the best PDF child attachment is resolved automatically (prefers the application/pdf content type; falls back to the first attachment if no PDF is found).
- An attachment key directly: returned as-is.
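The resolution rule for a parent item key can be sketched as follows. This is a minimal illustration, not the tool's actual code: the `Attachment` shape and the `pickAttachment` name are hypothetical, and only the "prefer application/pdf, else first attachment" rule comes from the description above.

```typescript
// Hypothetical shape of a child attachment; field names are assumptions.
interface Attachment {
  key: string;
  contentType: string;
}

// Prefer the first child whose content type is application/pdf;
// otherwise fall back to the first attachment (undefined if there are none).
function pickAttachment(children: Attachment[]): Attachment | undefined {
  const pdf = children.find((a) => a.contentType === "application/pdf");
  return pdf ?? children[0];
}

const children: Attachment[] = [
  { key: "AAAA1111", contentType: "text/html" },
  { key: "BBBB2222", contentType: "application/pdf" },
];
console.log(pickAttachment(children)?.key); // BBBB2222
```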
Retrieval modes
One of three modes is selected based on the arguments:
query mode (pass query): Returns the top-k passages most relevant to the query, ranked by an ephemeral BM25 index (fused with vector re-ranking when an embedder is configured). Each passage carries:
- charStart / charEnd: inclusive start and exclusive end character offsets in the source text.
- section: nearest preceding section heading (best-effort).
- pageApprox: proportional page estimate (1-based), or page (exact) when precise_pages succeeds.
- score: BM25 score.
max_passages caps the number of passages returned (default 5, max 20).
page_range mode (pass page_range, e.g. "3-7"): Returns the text for the specified page span (1-based, inclusive). Uses exact page text when precise_pages is available; otherwise approximates the character range proportionally.
Document mode (neither argument): Returns a truncated head of the document with a notice prompting use of query or page_range for targeted retrieval.
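As an illustration of page_range mode, the sketch below parses a span such as "3-7" and approximates the matching character range when exact page text is unavailable. The helper names are hypothetical, accepting a single page such as "5" is an assumption, and the even-spread approximation is one plausible reading of "proportionally", not the tool's confirmed behaviour.

```typescript
interface PageRange {
  start: number; // 1-based, inclusive
  end: number;   // 1-based, inclusive
}

// Parse "3-7" (or, as an assumption, a single page like "5").
function parsePageRange(spec: string): PageRange {
  const m = /^\s*(\d+)\s*(?:-\s*(\d+))?\s*$/.exec(spec);
  if (!m) throw new Error(`Invalid page_range: ${spec}`);
  const start = Number(m[1]);
  const end = m[2] === undefined ? start : Number(m[2]);
  if (start < 1 || end < start) throw new Error(`Invalid page_range: ${spec}`);
  return { start, end };
}

// Approximate the character span, assuming text is spread evenly across pages.
// Returns [from, to) offsets into the full text.
function approxCharRange(
  range: PageRange,
  totalPages: number,
  totalChars: number,
): [number, number] {
  const perPage = totalChars / totalPages;
  const from = Math.floor((range.start - 1) * perPage);
  const to = Math.min(totalChars, Math.ceil(range.end * perPage));
  return [from, to];
}

const range = parsePageRange("3-7");
console.log(range);                            // { start: 3, end: 7 }
console.log(approxCharRange(range, 10, 1000)); // [ 200, 700 ]
```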
In all modes, max_chars caps total returned text (default 12000, max 100000). A single passage is never split, so one passage may slightly exceed the cap.
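One way to read the "never split" rule is shown below: whole passages are added until the running total reaches the cap, so the last passage kept may overshoot it. The `capPassages` helper is hypothetical and the exact stopping rule is an assumption; the source only states that a passage is never split.

```typescript
// Keep whole passages until the running total reaches maxChars.
// The passage that crosses the cap is kept intact (assumed behaviour).
function capPassages(passages: string[], maxChars: number): string[] {
  const kept: string[] = [];
  let total = 0;
  for (const p of passages) {
    if (total >= maxChars) break;
    kept.push(p);
    total += p.length;
  }
  return kept;
}

// Cap of 6 characters: the second passage crosses the cap but is kept whole.
console.log(capPassages(["aaaa", "bbbb", "cccc"], 6)); // [ 'aaaa', 'bbbb' ]
```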
Page locators
By default, page numbers are approximate (pageApprox): a proportional estimate derived from the character offset divided by the total character count, clamped to 1-based page numbers. This requires only the Zotero cloud full-text index.
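The proportional estimate can be sketched as below. The function is a hypothetical stand-in: it assumes the total page count is known, and the exact rounding used by the tool is not specified in this document.

```typescript
// Map a character offset to a 1-based page by its relative position in the text,
// then clamp to the valid page range. Rounding via floor is an assumption.
function estimatePage(charStart: number, totalChars: number, totalPages: number): number {
  if (totalChars <= 0 || totalPages <= 0) return 1;
  const raw = Math.floor((charStart / totalChars) * totalPages) + 1;
  return Math.min(totalPages, Math.max(1, raw));
}

console.log(estimatePage(0, 1000, 10));    // 1
console.log(estimatePage(450, 1000, 10));  // 5
console.log(estimatePage(1000, 1000, 10)); // 10 (clamped)
```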
Pass precise_pages: true to re-extract the PDF for exact page numbers. This:
- Downloads the attachment bytes from the cloud API.
- Lazily imports the optional pdfjs-dist dependency (declared as an optionalDependency).
- Extracts per-page text and locates each passage.
If the PDF bytes are unavailable or pdfjs-dist is not installed, the tool degrades to approximate pages and sets pageSource: "approximate" with a notice in structuredContent.notice. It never throws — the degrade is transparent.
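The degrade path might look roughly like the sketch below, which uses a dynamic `import()` wrapped in try/catch so that a missing optional dependency yields a notice instead of an exception. The function names, the notice wording, and the `"exact"` value for pageSource are assumptions; only `pageSource: "approximate"` and the notice field are described above.

```typescript
type PageSource = "exact" | "approximate"; // "exact" is an assumed counterpart value

interface PageLocatorResult {
  pageSource: PageSource;
  notice?: string;
}

// Try a dynamic import; resolve to null instead of throwing when the module is absent.
async function tryImport(specifier: string): Promise<unknown> {
  try {
    return await import(specifier);
  } catch {
    return null;
  }
}

// Decide which page source applies, never throwing on a missing dependency or PDF.
async function resolvePageSource(
  hasPdfBytes: boolean,
  specifier: string = "pdfjs-dist",
): Promise<PageLocatorResult> {
  if (!hasPdfBytes) {
    return { pageSource: "approximate", notice: "PDF bytes unavailable; using approximate pages." };
  }
  const pdfjs = await tryImport(specifier);
  if (pdfjs === null) {
    return { pageSource: "approximate", notice: `${specifier} is not installed; using approximate pages.` };
  }
  return { pageSource: "exact" };
}

resolvePageSource(false).then((r) => console.log(r.pageSource)); // approximate
```

Passing the module specifier as a parameter keeps the sketch testable without pdfjs-dist installed.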
Install the optional dependency for exact pages:
npm i pdfjs-dist
Read-only mode
zotero_get_fulltext is annotated readOnlyHint: true and remains available under ZOTEUS_READ_ONLY=true.
zotero_tag_audit — audit tags against a controlled vocabulary
Audit a Zotero library's tags against a controlled vocabulary with optional required tiers.
Vocabulary schema
Supply inline as vocabulary (a JSON object) or as a JSON file path via vocabulary_path:
{
"tags": [
{ "name": "machine-learning", "tier": "topic" },
{ "name": "RQ1", "tier": "subquestion" }
],
"tiers": [
{ "name": "topic", "required": true },
{ "name": "subquestion", "required": false }
]
}
- tags[].name: canonical tag name.
- tags[].tier: optional tier membership (used for the missing-tier report).
- tiers[].name: tier name.
- tiers[].required: if true, every item must have at least one tag from this tier.
Reports
The tool produces three reports:
- Off-taxonomy tags (offTaxonomy): library tags that are not in the vocabulary. Zotero automatically-applied tags (meta.type === 1, e.g. PDF keyword extraction) are bucketed separately as autoTags rather than flagged as off-taxonomy, unless include_auto: true is passed, in which case they are included in offTaxonomy.
- Missing required tiers (missingByTier): for each required tier, the items that have no tag belonging to that tier. Each entry lists tier, itemCount, and a capped list of items (key + title).
- Per-collection coverage (collections): pass scope.collection_keys with an array of collection keys to run the missing-tier analysis scoped to each collection separately.
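The first two reports can be approximated with the sketch below. The `LibraryTag` and `LibraryItem` shapes and the `auditTags` function are hypothetical simplifications of what the Web API returns, and the sketch assumes that only auto-applied tags outside the vocabulary land in autoTags.

```typescript
interface VocabTag { name: string; tier?: string; }
interface VocabTier { name: string; required: boolean; }
interface Vocabulary { tags: VocabTag[]; tiers: VocabTier[]; }

// Hypothetical simplified library shapes; type 1 marks an auto-applied tag.
interface LibraryTag { tag: string; type: number; }
interface LibraryItem { key: string; title: string; tags: string[]; }

interface TierGap { tier: string; itemCount: number; items: { key: string; title: string }[]; }
interface AuditReport { offTaxonomy: string[]; autoTags: string[]; missingByTier: TierGap[]; }

function auditTags(
  vocab: Vocabulary,
  libraryTags: LibraryTag[],
  items: LibraryItem[],
  includeAuto = false,
  limit = 50,
): AuditReport {
  const known = new Set(vocab.tags.map((t) => t.name));
  const offTaxonomy: string[] = [];
  const autoTags: string[] = [];
  for (const t of libraryTags) {
    if (known.has(t.tag)) continue; // in the vocabulary: nothing to report
    if (t.type === 1 && !includeAuto) autoTags.push(t.tag);
    else offTaxonomy.push(t.tag);
  }

  // For each required tier, collect items carrying no tag from that tier.
  const missingByTier = vocab.tiers
    .filter((tier) => tier.required)
    .map((tier) => {
      const tierTags = new Set(
        vocab.tags.filter((t) => t.tier === tier.name).map((t) => t.name),
      );
      const missing = items.filter((it) => !it.tags.some((tag) => tierTags.has(tag)));
      return {
        tier: tier.name,
        itemCount: missing.length, // full count; only the listed items are capped
        items: missing.slice(0, limit).map(({ key, title }) => ({ key, title })),
      };
    });

  return { offTaxonomy, autoTags, missingByTier };
}

const vocab: Vocabulary = {
  tags: [
    { name: "machine-learning", tier: "topic" },
    { name: "RQ1", tier: "subquestion" },
  ],
  tiers: [
    { name: "topic", required: true },
    { name: "subquestion", required: false },
  ],
};
const report = auditTags(
  vocab,
  [{ tag: "machine-learning", type: 0 }, { tag: "misc", type: 0 }, { tag: "keyword-x", type: 1 }],
  [
    { key: "ITEM0001", title: "Tagged paper", tags: ["machine-learning"] },
    { key: "ITEM0002", title: "Untagged paper", tags: ["RQ1"] },
  ],
);
console.log(JSON.stringify(report));
```

In this example "misc" is reported as off-taxonomy, "keyword-x" is bucketed under autoTags, and ITEM0002 is listed as missing the required topic tier.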
Other options
- limit: caps the number of items listed per report entry (default 50, max 500). It does not limit the tag or item enumeration; all tags and items are scanned.
- include_auto: treat Zotero auto-applied tags as off-taxonomy too.
- library_type / library_id: target a group library.
Tags and items are enumerated via the cloud Web API with automatic pagination.
Read-only mode
zotero_tag_audit is annotated readOnlyHint: true and remains available under ZOTEUS_READ_ONLY=true.
zotero_export format:"better-biblatex" — Better BibTeX export
zotero_export accepts format: "better-biblatex" in addition to all existing formats.
Built-in biblatex vs better-biblatex
| Format | Route | BBT options | Availability |
|---|---|---|---|
| biblatex | Zotero cloud Web API (stock translator) | Not available | Always (cloud) |
| better-biblatex | Local desktop Better BibTeX plugin | Your configured BBT options apply | Desktop-local only |
biblatex uses Zotero's stock cloud translator. BBT-specific features (citation-key generation rules, sentence-case handling, biblatexExtendedNameFormat, unicode→LaTeX transliteration, and any BBT export options you have configured) are not available.
better-biblatex uses the Better BibTeX plugin running in your desktop Zotero instance at http://127.0.0.1:23119/better-bibtex. It requires:
- Desktop Zotero running locally.
- The Better BibTeX for Zotero plugin installed.
When better-biblatex is requested but Better BibTeX / desktop Zotero is unavailable (e.g. the hosted cloud connector, or Zotero is not running), the tool degrades to the built-in biblatex stock translator and includes a notice in the response. structuredContent.degradedToBuiltIn is set to true.
better-biblatex requires explicit item_keys; whole-library or query-based exports fall back to built-in biblatex.
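The routing rules above can be summarised in a small decision function. This is a sketch under stated assumptions: `chooseRoute` and its types are hypothetical, the notice strings are invented for illustration, and setting degradedToBuiltIn when item_keys is missing is an assumption (the source confirms the flag only for the case where Better BibTeX or desktop Zotero is unavailable).

```typescript
type ExportFormat = "biblatex" | "better-biblatex";

interface ExportRequest {
  format: ExportFormat;
  item_keys?: string[];
}

interface ExportRoute {
  format: ExportFormat;       // the translator that will actually run
  degradedToBuiltIn: boolean;
  notice?: string;
}

function chooseRoute(req: ExportRequest, bbtReachable: boolean): ExportRoute {
  if (req.format !== "better-biblatex") {
    return { format: req.format, degradedToBuiltIn: false };
  }
  // Whole-library or query-based exports fall back to the stock translator.
  if (!req.item_keys || req.item_keys.length === 0) {
    return {
      format: "biblatex",
      degradedToBuiltIn: true, // assumed: the source does not confirm the flag here
      notice: "better-biblatex requires explicit item_keys; used built-in biblatex.",
    };
  }
  // Better BibTeX or desktop Zotero is not reachable.
  if (!bbtReachable) {
    return {
      format: "biblatex",
      degradedToBuiltIn: true,
      notice: "Better BibTeX is unavailable; used built-in biblatex.",
    };
  }
  return { format: "better-biblatex", degradedToBuiltIn: false };
}

console.log(chooseRoute({ format: "better-biblatex", item_keys: ["ABCD1234"] }, false).format); // biblatex
```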
zotero_list_tags and zotero_list_collections — read-only listing tools
Two read-only tools surface information that was previously only accessible through the mutating zotero_manage_tags and zotero_manage_collections tools.
zotero_list_tags
Lists tags in a Zotero library with usage counts and an auto flag (true for Zotero-applied tags, false for manual). Supports an optional q substring filter and limit. Available under ZOTEUS_READ_ONLY=true.
zotero_list_collections
Lists collections with key, name, parent collection key, and item count. Optional top: true returns only top-level collections. Collection keys can be passed to zotero_search_items (collectionKey) or zotero_tag_audit (scope.collection_keys). Available under ZOTEUS_READ_ONLY=true.
Read-only mode summary
Under ZOTEUS_READ_ONLY=true, the following tools remain available:
| Tool | Purpose |
|---|---|
| zotero_get_fulltext | Retrieve PDF passages |
| zotero_tag_audit | Audit tag vocabulary |
| zotero_list_tags | List tags with usage/auto flag |
| zotero_list_collections | List collections |
Scholarly Context Graph
Explore the wider literature around a paper — its references, citing works, and related papers — and see which ones you already have in your library.
Prompts (Workflows)
Seven MCP Prompts — user-triggered scholarly workflows exposed as slash commands that orchestrate the zotero_* tools for common research tasks.