Last 7 Days (April 26 – May 02, 2026)
Every document format in existence was designed for a human reader moving linearly through text. Autonomous LLM agents do not read - they retrieve. This fundamental mismatch forces agents to inject entire documents into their context window, wasting tokens on irrelevant content, compounding state across multi-turn loops, and broadcasting information indiscriminately across agent roles. We argue this is not a prompt engineering problem, not a retrieval problem, and not a compression problem: it is a format problem. We introduce OBJECTGRAPH (.og), a file format that reconceives the document as a typed, directed knowledge graph to be traversed rather than a string to be injected. OBJECTGRAPH is a strict superset of Markdown - every .md file is a valid .og file - requires no infrastructure beyond a two-primitive query protocol, and is readable by both humans and agents without tooling. We formalize the Document Consumption Problem, characterise six structural properties no existing format satisfies simultaneously, and prove OBJECTGRAPH satisfies all six. We further introduce the Progressive Disclosure Model, the Role-Scoped Access Protocol, and Executable Assertion Nodes as native format primitives. Empirical evaluation across five document classes and eight agent task types demonstrates up to 95.3 percent token reduction with no statistically significant degradation in task accuracy (p > 0.05). Transpiler fidelity reaches 98.7 percent content preservation on a held-out document benchmark.
Primary: Open Gigantic
All Institutions: Open Gigantic
ObjectGraph has the potential for significant broader impact across several dimensions:

1. **Cost and Efficiency**: The dramatic reduction in token consumption (up to 95.3%) and mitigation of context compounding (36.5x reduction) can substantially lower the operational costs of LLM agents and enable more complex, multi-turn workflows within existing context window limits.
2. **Agent Capabilities**: By providing structured, queryable knowledge, ObjectGraph can enhance agent reasoning, planning, and execution capabilities, leading to more reliable and autonomous agents.
3. **System Simplification**: The "ObjectGraph as Infrastructure" concept is powerful. Role-scoped access control, executable assertions, and delta loading natively within the document format can eliminate the need for external middleware, validation prompt templates, and change tracking systems, simplifying the architecture of multi-agent deployments.
4. **Human-Agent Collaboration**: Being a strict superset of Markdown, ObjectGraph allows both humans and agents to interact with the same source document, reducing maintenance overhead and fostering better alignment between human-authored instructions and agent execution.
5. **Knowledge Management**: It offers a more robust framework for managing agent knowledge bases, enabling features like automated staleness detection and structured updates.
6. **New Paradigm for Documents**: This work challenges the fundamental assumption of linear document consumption, proposing a new paradigm for how information is structured and accessed in the agentic era. If widely adopted, it could lead to a new ecosystem of tools and practices for agent-native content creation and consumption.

This paper introduces ObjectGraph, a novel file format that re-imagines documents as typed knowledge graphs for LLM agents, achieving up to 95.3% token reduction and significant context compounding mitigation without degrading task accuracy.
The work presents a comprehensive, well-designed solution to a fundamental problem in LLM agent deployment, offering a paradigm shift in document consumption that promises to enhance agent efficiency and capabilities while simplifying multi-agent system architectures.
The paper introduces ObjectGraph (.og), a novel file format designed to address the "Document Consumption Problem" for LLM agents. The core methodology reconceives documents as typed, directed knowledge graphs rather than linear text strings. The authors formalize this problem and derive six structural properties (Query-Addressable Index, Layered Compression, Typed Dependency Graph, Role-Scoped Access Control, Executable Assertions, Human Readability) that existing formats fail to satisfy simultaneously. ObjectGraph is presented as a strict superset of Markdown, ensuring backward compatibility. Key methodological components include:

1. **ObjectGraph Format Specification**: A detailed structure comprising a file-level manifest (meta, index, changelog blocks) and atomic knowledge units (nodes). Nodes are typed containers with stable identifiers, scope annotations, confidence scores, and versioning metadata. Content-type tags (e.g., `code`, `steps`, `warning`) provide explicit semantic meaning beyond visual cues.
2. **Progressive Disclosure Model (PDM)**: A three-pass reading model (Index, Dense, Full) that enables agents to retrieve only relevant information at the necessary fidelity level, significantly reducing token consumption.
3. **Typed Edge Declarations**: Supports explicit, machine-traversable relationships between nodes (e.g., `:requires`, `:precedes`, `:see-also`), allowing for automatic dependency resolution.
4. **Role-Based Access Control**: The `scope` attribute on nodes and index entries enables content filtering at the format level, eliminating the need for external middleware in multi-agent systems.
5. **Executable Assertion Nodes**: Allows embedding validation logic, retry mechanisms, and escalation paths directly within the document, triggered by the query protocol.
6. **Delta Loading via Changelog**: A `__changelog` meta-node facilitates incremental document updates, reducing the cost of checking for changes.
7. **LLM-Native Query Protocol**: A minimal two-primitive interface (`search_index`, `resolve_context`) that leverages the LLM itself as a "Router" for semantic index search, rather than relying on traditional keyword matching or embeddings. This is a particularly clever design choice.
8. **Transpiler**: A hybrid Markdown-to-ObjectGraph transpiler that uses deterministic parsers for content extraction and bounded LLM calls for metadata synthesis (dense blocks, index keywords), ensuring high fidelity and bounding hallucination risk.

The methodology is comprehensive, well-articulated, and addresses the identified problems systematically. The design choices, such as the Markdown superset and LLM-as-Router, are pragmatic and innovative.
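The node model and two-primitive query protocol described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's specification: the Python data model, field names, and keyword matching (a stand-in for the LLM Router's semantic search) are all assumptions, grounded only in the components the review names (stable ids, content-type tags, `scope` annotations, dense summaries, `:requires` edges).

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One atomic knowledge unit: stable id, content-type tag, scope
    annotation, dense summary, full content, and typed edges."""
    node_id: str
    node_type: str                      # e.g. "steps", "code", "warning"
    scope: set                          # roles allowed to read this node
    dense: str                          # compressed summary (Dense pass)
    full: str                           # full content (Full pass)
    edges: dict = field(default_factory=dict)  # e.g. {":requires": [...]}

class ObjectGraphDoc:
    def __init__(self, nodes):
        self.nodes = {n.node_id: n for n in nodes}

    def search_index(self, keywords, role):
        """Primitive 1: ids of role-visible nodes whose dense summary
        mentions any keyword (keyword match stands in for the Router)."""
        return [
            n.node_id
            for n in self.nodes.values()
            if role in n.scope
            and any(k.lower() in n.dense.lower() for k in keywords)
        ]

    def resolve_context(self, node_id, role, level="dense"):
        """Primitive 2: fetch one node at the requested fidelity,
        resolving ":requires" dependencies first, honouring scope."""
        node = self.nodes[node_id]
        if role not in node.scope:
            return []
        out = []
        for dep in node.edges.get(":requires", []):
            out.extend(self.resolve_context(dep, role, level))
        out.append(node.dense if level == "dense" else node.full)
        return out

doc = ObjectGraphDoc([
    Node("setup-env", "steps", {"ops", "dev"},
         "Install project dependencies.",
         "Run `pip install -r requirements.txt`."),
    Node("deploy", "steps", {"ops"},
         "Deploy the service.",
         "Run `make deploy` after setup.",
         edges={":requires": ["setup-env"]}),
])

hits = doc.search_index(["deploy"], role="ops")   # only the "deploy" node
ctx = doc.resolve_context("deploy", role="ops", level="full")
```

Note how role scoping falls out of the format itself: a `dev` agent querying the same document simply never sees the `ops`-scoped node, with no external middleware involved.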
The empirical evaluation is robust and addresses key research questions effectively.

1. **Corpus**: A benchmark of 240 documents across five classes (Skill Files, Operational Runbooks, Execution Plans, Technical Documentation, Knowledge Bases), ranging from 200 to 15,000 tokens, provides a diverse testbed.
2. **Task Suite**: Eight distinct task types (information lookup, procedure execution, multi-step planning, role-conditional access, cross-node reasoning, update detection, assertion verification, multi-agent handoff) cover a broad range of agent interactions.
3. **Models & Baselines**: Evaluation uses Claude Sonnet 4.5 (primary), Claude Haiku 4.5 (Router), and GPT-4o (cross-model validation). Baselines include Full Markdown injection, RAG (text-embedding-3-large), and SkillReducer-optimized Markdown.
4. **RQ1: Token Consumption**: ObjectGraph achieved a mean token reduction from 2,340 to 187 tokens (92.0% average, up to 95.3%), demonstrating significant cost savings.
5. **RQ2: Context Compounding Reduction**: In a 5-turn workflow, ObjectGraph (Architecture B) reduced cumulative token cost by 36.5x compared to Markdown (46,000 vs. 1,260 tokens), effectively mitigating the super-linear growth of context.
6. **RQ3: Task Accuracy**: ObjectGraph matched or exceeded Markdown accuracy on 7 of 8 task types. Notably, it showed dramatic improvements on Role-conditional access (+18.4%) and Update detection (+30.1%), tasks where Markdown lacks native support. The "less-is-more" effect, where reduced context improves accuracy by reducing attention dilution, is a significant finding.
7. **RQ4: Transpiler Fidelity**: The transpiler achieved a mean fidelity of 0.987 (SD=0.018) on 180 held-out documents, ensuring high content preservation.
8. **RQ5: Human Authoring Burden**: A user study with 18 participants rated authoring burden as low (mean 2.8/7), suggesting good usability for human authors.
9. **Ablation Study**: An ablation study clearly demonstrated the individual contributions of different ObjectGraph features to token reduction, providing valuable insights into the design's effectiveness.

The experimental setup is comprehensive, accuracy shows no statistically significant degradation (p > 0.05), and the findings strongly support the claims of the paper.
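The shape of the RQ2 compounding result can be illustrated with a toy token-accounting model. The document size (2,340 tokens) and per-query retrieval size (187 tokens) come from the results above; the per-turn response size and the compounding model itself are illustrative assumptions, so the sketch reproduces only the super-linear vs. near-linear growth pattern, not the paper's exact 46,000 vs. 1,260 figures.

```python
# Toy model: 2,340-token document, 187-token retrievals (from the results);
# 150-token responses and the growth model are assumptions for illustration.
DOC, RETRIEVED, RESPONSE, TURNS = 2340, 187, 150, 5

def markdown_total(turns=TURNS):
    # Full injection: turn t re-sends the document plus every prior turn's
    # document copy and response, so cumulative cost grows super-linearly.
    return sum(t * DOC + (t - 1) * RESPONSE for t in range(1, turns + 1))

def objectgraph_total(turns=TURNS):
    # Query-based retrieval: each turn carries only a small retrieved slice
    # plus (after turn 1) the previous short response.
    return sum(RETRIEVED + (RESPONSE if t > 1 else 0)
               for t in range(1, turns + 1))
```

Even under this crude model the gap is an order of magnitude and widens with every additional turn, which is the mechanism behind the reported 36.5x reduction.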
The paper provides a detailed specification of the ObjectGraph format, including its structure, node types, edge syntax, and query protocol. The LLM prompt template for metadata synthesis is explicitly provided. The algorithms for structural extraction and the query protocol are outlined. While no direct code repository or dataset links are provided, the level of detail in the format specification and methodology sections is high enough that a motivated researcher could likely implement the format and protocol. The benchmark corpus is described in terms of document classes and token ranges, but the specific documents are not publicly available. The LLM models used are identified. Overall, the paper offers a strong foundation for reproducibility, though direct code access would enhance it further.
The authors acknowledge several limitations:

1. **Scale**: The benchmark of 240 documents, while curated, may not fully represent the diversity of real-world enterprise-scale corpora.
2. **Cross-file Federation**: The current specification does not support cross-file edge resolution, limiting its applicability to mono-repo or single-domain knowledge bases. This is a significant limitation for truly distributed knowledge graphs.
3. **Standardisation**: Without a standards body or broad community adoption, the format risks fragmentation into incompatible dialects.
4. **Adversarial Inputs**: The evaluation did not consider adversarial document authors who might craft misleading `dense` blocks or `index` entries to manipulate agent routing.

Additional minor limitations could include the reliance on LLMs for routing, which, while a feature, could introduce its own set of challenges (e.g., prompt engineering for optimal routing, potential for misinterpretation if the index is poorly crafted).