Last 14 Days (April 21 – May 04, 2026)
ReLU neural networks trained as surrogate models can be embedded exactly in mixed-integer linear programs (MILPs), enabling global optimization over the learned function. The tractability of the resulting MILP depends on structural properties of the network, i.e., the number of binary variables in associated formulations and the tightness of the continuous LP relaxation. These properties are determined during training, yet standard training objectives (prediction loss with classical weight regularization) offer no mechanism to directly control them. This work studies training regularizers that directly target downstream MILP tractability. Specifically, we propose simple bound-based regularizers that penalize the big-M constants of MILP formulations and/or the number of unstable neurons. Moreover, we introduce an LP relaxation gap regularizer that explicitly penalizes the per-sample gap of the continuous relaxation at training points. We derive its associated gradient and provide an implementation from LP dual variables without custom automatic differentiation tools. We show that combining the above regularizers can approximate the full total derivative of the LP gap with respect to the network parameters, capturing both direct and indirect sensitivities. Experiments on non-convex benchmark functions and a two-stage stochastic programming problem with quantile neural network surrogates demonstrate that the proposed regularizers can reduce MILP solve times by up to four orders of magnitude relative to an unregularized baseline, while maintaining competitive surrogate model accuracy.
Primary: Imperial College London
All Institutions: Imperial College London
This paper has significant broader impact across several domains:

* **Mathematical Optimization:** It provides a powerful new tool for integrating neural network surrogates into global optimization problems, particularly those formulated as MILPs. This can unlock new capabilities in fields where complex black-box functions need to be optimized.
* **Engineering Design and Operations:** Applications in process design, energy systems, and planning, where NN surrogates are increasingly used, will directly benefit from the ability to train more tractable models. This can lead to faster design cycles and more efficient operational decisions.
* **Decision-Focused Learning:** The work contributes to the broader paradigm of training ML models with their downstream use in mind. While decision-focused learning often targets solution quality, this paper focuses on *computational tractability*, offering a complementary and equally important objective.
* **Certified Robustness and Verification:** The techniques share methodological roots with certified robustness, demonstrating how insights from that field can be repurposed for optimization tractability.
* **ML System Design:** It highlights the importance of considering the entire ML-to-optimization pipeline, suggesting that training objectives should be informed by the downstream application's computational characteristics. This could lead to more holistic ML system designs.

The dramatic speedups demonstrated could make previously intractable problems solvable within reasonable timeframes, thereby expanding the practical applicability of NN surrogates in optimization.

This paper introduces novel regularization techniques that enable the training of ReLU neural network surrogate models which are dramatically more tractable for downstream Mixed-Integer Linear Program (MILP) optimization, achieving up to four orders of magnitude speedup in MILP solve times while maintaining competitive accuracy.
The work makes significant methodological contributions, including a novel LP relaxation gap regularizer with an elegant gradient derivation using LP dual variables and a practical straight-through estimator implementation, alongside a theoretical decomposition linking combined regularizers to the total derivative of the LP gap. This research provides a critical advancement for integrating machine learning models into mathematical optimization, with profound implications for engineering, design, and decision-making applications.
The paper proposes a family of novel regularization terms designed to improve the tractability of Mixed-Integer Linear Programs (MILPs) that embed ReLU neural network surrogate models. This addresses a critical bottleneck: while ReLU NNs can be exactly formulated as MILPs, the resulting optimization problems are often intractable. The methodology is well-grounded and comprises three main types of regularizers:

1. **Shrinkage Regularizers ($R_{L1}, R_{L2}$):** These are standard baselines, indirectly influencing MILP tractability by promoting smaller weights, which can lead to tighter bounds.
2. **Bound-based Regularizers ($R_{BW}, R_{SN}, R_{SN2}$):**
   * $R_{BW}$ (Bound-Width): Directly penalizes the mean width of Interval Bound Propagation (IBP) pre-activation bounds across all hidden neurons. This directly targets the big-M constants in MILP formulations, which are crucial for relaxation tightness. Its gradient is computed via automatic differentiation through the IBP forward pass.
   * $R_{SN}$ (Stable-Neuron): Penalizes the "distance to stability" for unstable neurons, encouraging them to become stably active or inactive, thus reducing the number of binary variables needed. It uses a piecewise-linear formulation with a clear subgradient.
   * $R_{SN2}$ (RS Loss): An alternative stability regularizer from prior work, included for comparison.
3. **LP Relaxation Gap Regularizer ($R_{LP}$):** This is the most novel and technically sophisticated contribution. It directly penalizes the per-sample continuous LP relaxation gap at training points. The paper elegantly derives its gradient using sensitivity analysis for parametric LPs, specifically leveraging LP dual variables. Crucially, it provides a practical implementation using a "straight-through estimator" to avoid custom automatic differentiation tools, making it accessible for standard ML frameworks like PyTorch.
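To make the bound-width idea concrete, here is a minimal pure-Python sketch of interval bound propagation through a fully-connected ReLU network and the mean pre-activation width that a regularizer like $R_{BW}$ penalizes. The function name and API are illustrative only; the paper's actual implementation differentiates through the IBP forward pass with standard autodiff.

```python
def ibp_bound_width(weights, biases, x_lo, x_hi):
    """Interval bound propagation (IBP) through a fully-connected ReLU net.

    Returns the mean pre-activation bound width across all hidden neurons,
    i.e. the quantity a bound-width regularizer penalizes. Illustrative
    sketch only, not the paper's implementation.
    """
    lo, hi = list(x_lo), list(x_hi)
    widths = []
    for W, b in zip(weights, biases):
        pre_lo, pre_hi = [], []
        for row, bias in zip(W, b):
            # Lower bound pairs positive weights with lo, negative with hi;
            # the upper bound does the opposite.
            lo_val = bias + sum(w * (l if w >= 0 else h)
                                for w, l, h in zip(row, lo, hi))
            hi_val = bias + sum(w * (h if w >= 0 else l)
                                for w, l, h in zip(row, lo, hi))
            pre_lo.append(lo_val)
            pre_hi.append(hi_val)
            widths.append(hi_val - lo_val)
        # ReLU maps the pre-activation box through max(0, .) elementwise.
        lo = [max(0.0, v) for v in pre_lo]
        hi = [max(0.0, v) for v in pre_hi]
    return sum(widths) / len(widths)
```

The returned widths are exactly the big-M constants a solver would use, which is why shrinking them tightens the MILP relaxation.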
A significant theoretical contribution is Proposition 2, which demonstrates that the combined regularizer $R_{LP} + \lambda R_{BW}$ approximates the full total derivative of the LP gap with respect to network parameters. This decomposition captures both direct sensitivity (through constraint right-hand sides) and indirect sensitivity (through big-M constants via IBP), providing a strong theoretical justification for combining these regularizers. The methodology is robust, combining established concepts (IBP, MILP formulations) with novel gradient derivations and practical implementation strategies.
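The dual-variable gradient that Proposition 2 builds on rests on standard LP sensitivity analysis. In generic notation (not necessarily the paper's), for a parametric LP

$$
v(\theta) = \min_{x} \; c^{\top} x \quad \text{s.t.} \quad A x \ge b(\theta),
$$

the optimal dual multipliers $\lambda^\star$ satisfy $\partial v / \partial b_i = \lambda_i^\star$ under standard nondegeneracy assumptions, so the chain rule gives

$$
\frac{d v}{d \theta} = \sum_i \lambda_i^\star \, \frac{\partial b_i(\theta)}{\partial \theta}.
$$

In big-M ReLU formulations the right-hand side $b(\theta)$ depends on the network parameters both directly (through weights and biases) and indirectly (through the IBP-derived big-M constants), which is the direct/indirect split that the combined regularizer approximates.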
The experimental evaluation is comprehensive and compelling.

* **Benchmarks:** The methods are tested on standard non-convex benchmark functions (Himmelblau, Peaks, Ackley) and a more complex, real-world relevant problem: a two-stage stochastic programming problem with quantile neural network surrogates. This demonstrates applicability across different problem types.
* **Network Architectures:** Various network sizes (2, 3, and 5 hidden layers; 25-50 neurons per layer) are explored, showing the robustness of the approach across different model complexities.
* **Metrics:** The evaluation uses a comprehensive set of metrics:
  * **Accuracy:** Normalized test MSE ratios are reported to assess the trade-off between tractability and prediction accuracy.
  * **MILP Tractability:** Key metrics include the number of unstable neurons, LP relaxation gap, MILP node count, and MILP solve time.
* **Results:** The results are outstanding. The proposed regularizers, especially combinations like $R_{BW}+R_{LP}$, achieve reductions in MILP solve times by *up to four orders of magnitude* (e.g., from hours to seconds) compared to unregularized baselines. This is achieved while maintaining competitive surrogate model accuracy, demonstrating a highly favorable trade-off. The paper shows that $R_{LP}$ is particularly effective at reducing the LP relaxation gap, while $R_{SN}$ and $R_{BW}$ contribute to reducing unstable neurons and tightening bounds, respectively. The computational overhead during training is analyzed, with $R_{LP}$ being the most expensive (5-10x baseline training time), but this cost is amortized over potentially many downstream optimization tasks. The visual examples (Figures 1, 2, and 3) effectively illustrate the impact of regularization on relaxation tightness and prediction quality.
The paper provides sufficient detail for reproducibility.

* **Implementation Details:** The use of PyTorch for NN models and regularizers, Gurobi for MILP, and HiGHS for LP solves is clearly stated. The specific version of Gurobi is mentioned.
* **Gradient Derivations:** The gradients for all regularizers are explicitly derived, and the "straight-through estimator" implementation for $R_{LP}$ is clearly explained, which is crucial for practical implementation in standard ML frameworks.
* **Experimental Setup:** Details on training data generation (Latin Hypercube sampling), sample sizes, normalization, and validation splits are provided.
* **Computational Environment:** The server specifications (AMD EPYC 7742, 8 CPU cores, 16 GB memory) are mentioned.
* **Tooling:** The choice of HiGHS over Gurobi for LPs during training is justified, aiding reproducibility with open-source tools. The acknowledgment of using Anthropic's Claude for server setup is unusual but transparent.

Overall, the level of detail is high, making the work highly reproducible.
* **Computational Cost of $R_{LP}$:** While the benefits are immense, the LP-based regularizer significantly increases training time (5-10x). This might be a barrier for very large networks or datasets, although the paper suggests GPU-based LP solvers as a future direction.
* **Reliance on IBP:** The bound-based regularizers and the indirect sensitivity path in Proposition 2 rely on IBP, which provides valid but often loose bounds. While the paper acknowledges this, more sophisticated OBBT methods could potentially yield even tighter relaxations at higher computational cost.
* **Approximation in Combined Regularizer:** The combined regularizer $R_{LP} + \lambda R_{BW}$ approximates the full total derivative by using a uniform weight $\lambda$ instead of the true, sample-dependent LP dual multipliers for big-M sensitivity. While effective, this is an approximation.
* **Scope of MILP Formulations:** The work primarily focuses on the standard big-M formulation for ReLU networks. While widely used, other more sophisticated MILP formulations exist, and the generalizability of these specific regularizers to those might require further investigation.
* **ReLU-specific:** The methods are tailored for ReLU activation functions due to their piecewise-linear nature and exact MILP embedding. Generalization to other activation functions (e.g., sigmoid, tanh, or more complex non-linearities) would require different MILP formulations or convex relaxations, which is beyond the current scope.
Every document format in existence was designed for a human reader moving linearly through text. Autonomous LLM agents do not read - they retrieve. This fundamental mismatch forces agents to inject entire documents into their context window, wasting tokens on irrelevant content, compounding state across multi-turn loops, and broadcasting information indiscriminately across agent roles. We argue this is not a prompt engineering problem, not a retrieval problem, and not a compression problem: it is a format problem. We introduce OBJECTGRAPH (.og), a file format that reconceives the document as a typed, directed knowledge graph to be traversed rather than a string to be injected. OBJECTGRAPH is a strict superset of Markdown - every .md file is a valid .og file - requires no infrastructure beyond a two-primitive query protocol, and is readable by both humans and agents without tooling. We formalize the Document Consumption Problem, characterise six structural properties no existing format satisfies simultaneously, and prove OBJECTGRAPH satisfies all six. We further introduce the Progressive Disclosure Model, the Role-Scoped Access Protocol, and Executable Assertion Nodes as native format primitives. Empirical evaluation across five document classes and eight agent task types demonstrates up to 95.3 percent token reduction with no statistically significant degradation in task accuracy (p > 0.05). Transpiler fidelity reaches 98.7 percent content preservation on a held-out document benchmark.
Primary: Open Gigantic
All Institutions: Open Gigantic
ObjectGraph has the potential for significant broader impact across several dimensions:

1. **Cost and Efficiency**: The dramatic reduction in token consumption (up to 95.3%) and mitigation of context compounding (36.5x reduction) can substantially lower the operational costs of LLM agents and enable more complex, multi-turn workflows within existing context window limits.
2. **Agent Capabilities**: By providing structured, queryable knowledge, ObjectGraph can enhance agent reasoning, planning, and execution capabilities, leading to more reliable and autonomous agents.
3. **System Simplification**: The "ObjectGraph as Infrastructure" concept is powerful. Role-scoped access control, executable assertions, and delta loading natively within the document format can eliminate the need for external middleware, validation prompt templates, and change tracking systems, simplifying the architecture of multi-agent deployments.
4. **Human-Agent Collaboration**: Being a strict superset of Markdown, ObjectGraph allows both humans and agents to interact with the same source document, reducing maintenance overhead and fostering better alignment between human-authored instructions and agent execution.
5. **Knowledge Management**: It offers a more robust framework for managing agent knowledge bases, enabling features like automated staleness detection and structured updates.
6. **New Paradigm for Documents**: This work challenges the fundamental assumption of linear document consumption, proposing a new paradigm for how information is structured and accessed in the agentic era. If widely adopted, it could lead to a new ecosystem of tools and practices for agent-native content creation and consumption.

This paper introduces ObjectGraph, a novel file format that re-imagines documents as typed knowledge graphs for LLM agents, achieving up to 95.3% token reduction and significant context compounding mitigation without degrading task accuracy.
The work presents a comprehensive, well-designed solution to a fundamental problem in LLM agent deployment, offering a paradigm shift in document consumption that promises to enhance agent efficiency and capabilities while simplifying multi-agent system architectures.
The paper introduces ObjectGraph (.og), a novel file format designed to address the "Document Consumption Problem" for LLM agents. The core methodology reconceives documents as typed, directed knowledge graphs rather than linear text strings. The authors formalize this problem and derive six structural properties (Query-Addressable Index, Layered Compression, Typed Dependency Graph, Role-Scoped Access Control, Executable Assertions, Human Readability) that existing formats fail to satisfy simultaneously. ObjectGraph is presented as a strict superset of Markdown, ensuring backward compatibility. Key methodological components include:

1. **ObjectGraph Format Specification**: A detailed structure comprising a file-level manifest (meta, index, changelog blocks) and atomic knowledge units (nodes). Nodes are typed containers with stable identifiers, scope annotations, confidence scores, and versioning metadata. Content-type tags (e.g., `code`, `steps`, `warning`) provide explicit semantic meaning beyond visual cues.
2. **Progressive Disclosure Model (PDM)**: A three-pass reading model (Index, Dense, Full) that enables agents to retrieve only relevant information at the necessary fidelity level, significantly reducing token consumption.
3. **Typed Edge Declarations**: Supports explicit, machine-traversable relationships between nodes (e.g., `:requires`, `:precedes`, `:see-also`), allowing for automatic dependency resolution.
4. **Role-Based Access Control**: The `scope` attribute on nodes and index entries enables content filtering at the format level, eliminating the need for external middleware in multi-agent systems.
5. **Executable Assertion Nodes**: Allows embedding validation logic, retry mechanisms, and escalation paths directly within the document, triggered by the query protocol.
6. **Delta Loading via Changelog**: A `__changelog` meta-node facilitates incremental document updates, reducing the cost of checking for changes.
7. **LLM-Native Query Protocol**: A minimal two-primitive interface (`search_index`, `resolve_context`) that leverages the LLM itself as a "Router" for semantic index search, rather than relying on traditional keyword matching or embeddings. This is a particularly clever design choice.
8. **Transpiler**: A hybrid Markdown-to-ObjectGraph transpiler that uses deterministic parsers for content extraction and bounded LLM calls for metadata synthesis (dense blocks, index keywords), ensuring high fidelity and bounding hallucination risk.

The methodology is comprehensive, well-articulated, and addresses the identified problems systematically. The design choices, such as the Markdown superset and LLM-as-Router, are pragmatic and innovative.
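To illustrate how the two query primitives, role scoping, and typed `:requires` edges could compose, here is a toy in-memory sketch in Python. The node contents, scope names, and return shapes are invented for illustration; the paper does not specify these signatures, and the real Router is an LLM rather than a substring match.

```python
# Hypothetical in-memory model of an ObjectGraph document: node ids mapped
# to dense summaries, full bodies, role scopes, and typed edges.
NODES = {
    "setup": {"dense": "Install deps and configure env.",
              "full": "Run `pip install -r requirements.txt` ...",
              "scope": ["dev", "ops"], "edges": {"requires": []}},
    "deploy": {"dense": "Ship the service to production.",
               "full": "Build the image, push, roll out ...",
               "scope": ["ops"], "edges": {"requires": ["setup"]}},
}

def search_index(keyword, role):
    """Pass 1 (Index): return ids of nodes visible to `role` whose dense
    summary mentions the keyword -- a toy stand-in for the LLM 'Router'."""
    return [nid for nid, n in NODES.items()
            if role in n["scope"] and keyword.lower() in n["dense"].lower()]

def resolve_context(node_id, role):
    """Pass 2/3: return the node body preceded by the bodies of its
    `:requires` dependencies, with role scoping enforced at every hop."""
    node = NODES[node_id]
    if role not in node["scope"]:
        return None  # scope filtering happens at the format level
    context = []
    for dep in node["edges"]["requires"]:
        sub = resolve_context(dep, role)
        if sub:
            context.extend(sub)
    context.append(node["full"])
    return context
```

Even this toy version shows why no external middleware is needed: scope filtering and dependency resolution fall out of the node metadata itself.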
The empirical evaluation is robust and addresses key research questions effectively.

1. **Corpus**: A benchmark of 240 documents across five classes (Skill Files, Operational Runbooks, Execution Plans, Technical Documentation, Knowledge Bases), ranging from 200 to 15,000 tokens, provides a diverse testbed.
2. **Task Suite**: Eight distinct task types (information lookup, procedure execution, multi-step planning, role-conditional access, cross-node reasoning, update detection, assertion verification, multi-agent handoff) cover a broad range of agent interactions.
3. **Models & Baselines**: Evaluation uses Claude Sonnet 4.5 (primary), Claude Haiku 4.5 (Router), and GPT-4o (cross-model validation). Baselines include Full Markdown injection, RAG (text-embedding-3-large), and SkillReducer-optimized Markdown.
4. **RQ1: Token Consumption**: ObjectGraph achieved a mean token reduction from 2,340 to 187 tokens (92.0% average, up to 95.3%), demonstrating significant cost savings.
5. **RQ2: Context Compounding Reduction**: In a 5-turn workflow, ObjectGraph (Architecture B) reduced cumulative token cost by 36.5x compared to Markdown (46,000 vs. 1,260 tokens), effectively mitigating the super-linear growth of context.
6. **RQ3: Task Accuracy**: ObjectGraph matched or exceeded Markdown accuracy on 7 of 8 task types. Notably, it showed dramatic improvements on role-conditional access (+18.4%) and update detection (+30.1%), tasks where Markdown lacks native support. The "less-is-more" effect, where reduced context improves accuracy by reducing attention dilution, is a significant finding.
7. **RQ4: Transpiler Fidelity**: The transpiler achieved a mean fidelity of 0.987 (SD=0.018) on 180 held-out documents, ensuring high content preservation.
8. **RQ5: Human Authoring Burden**: A user study with 18 participants rated authoring burden as low (mean 2.8/7), suggesting good usability for human authors.
9. **Ablation Study**: An ablation study clearly demonstrated the individual contributions of different ObjectGraph features to token reduction, providing valuable insights into the design's effectiveness.

The experimental setup is comprehensive, the accuracy results show no statistically significant degradation (p > 0.05), and the findings strongly support the claims of the paper.
The paper provides a detailed specification of the ObjectGraph format, including its structure, node types, edge syntax, and query protocol. The LLM prompt template for metadata synthesis is explicitly provided. The algorithms for structural extraction and the query protocol are outlined. While no direct code repository or dataset links are provided, the level of detail in the format specification and methodology sections is high enough that a motivated researcher could likely implement the format and protocol. The benchmark corpus is described in terms of document classes and token ranges, but the specific documents are not publicly available. The LLM models used are identified. Overall, the paper offers a strong foundation for reproducibility, though direct code access would enhance it further.
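The full grammar lives in the paper's specification; purely for orientation, a hypothetical sketch of what a node with scope, confidence, and typed edges might look like. Every syntactic element here (the `::node` delimiter, attribute names, the arrow notation) is invented from the features the review lists and will differ from the actual specification:

```
# Deployment Runbook   <- ordinary Markdown heading: every .md file is a valid .og file

::node deploy-service type=steps scope=ops confidence=0.9 version=3
:requires -> setup-env
:see-also -> rollback-plan
dense: Build the container image, push it, and roll out with health checks.

1. Build and tag the container image.
2. Push to the registry and trigger the rollout.
::end
```

The point of the sketch is the layering: a Router can answer from the `dense:` line alone, resolve `:requires` edges only when executing, and hide the node entirely from roles outside `scope=ops`.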
The authors acknowledge several limitations:

1. **Scale**: The benchmark of 240 documents, while curated, may not fully represent the diversity of real-world enterprise-scale corpora.
2. **Cross-file Federation**: The current specification does not support cross-file edge resolution, limiting its applicability to mono-repo or single-domain knowledge bases. This is a significant limitation for truly distributed knowledge graphs.
3. **Standardisation**: Without a standards body or broad community adoption, the format risks fragmentation into incompatible dialects.
4. **Adversarial Inputs**: The evaluation did not consider adversarial document authors who might craft misleading `dense` blocks or `index` entries to manipulate agent routing.

A further minor limitation is the reliance on LLMs for routing, which, while a feature, could introduce its own challenges (e.g., prompt engineering for optimal routing, or misinterpretation if the index is poorly crafted).