API and Structured Data Feeds for AI Agents

Author Introduction

Kia ora, I’m Andrew McPherson. Across my work with a wide variety of organisations I often see the same pattern: brands invest heavily in content yet overlook the programmatic surfaces AI agents actually cite. If you want ChatGPT, Claude, Gemini and Perplexity to represent your pricing, products and expertise accurately, your APIs and feeds deserve the same care as your homepage. Here’s how.

Outline

Definition of APIs and structured data feeds
Role as Surface 2 of AI Data Surfaces
Why feeds matter for AI citation accuracy
How AI agents retrieve and use feeds
Optimisation techniques for discoverability and freshness
llms.txt declaration and JSON-LD feed endpoints
CiteCompass feed health and citation monitoring
Recent changes in AI agent tool use

Key Takeaways

APIs deliver machine-readable data AI agents cite confidently
Structured feeds reduce hallucination versus HTML inference
Function calling enables real-time API access in chat
OpenAPI specifications let agents self-configure requests
llms.txt declares feed locations for agent discovery
Freshness timestamps signal trust and active maintenance
Competitors with feeds win pricing and spec citations
Feeds are AI visibility assets, not internal plumbing

What Are API and Structured Data Feeds?

APIs (Application Programming Interfaces) and structured data feeds are machine-readable endpoints that allow AI agents to retrieve information programmatically rather than through web crawling alone. Instead of parsing HTML and inferring meaning from unstructured content, AI systems can query APIs and feeds to access validated, up-to-date data in standardised formats.

These programmatic interfaces represent Surface 2 of the AI Data Surfaces framework: feeds and APIs that deliver synchronised, machine-readable information directly to AI systems, as outlined in Microsoft Advertising’s guide to AEO and GEO. While traditional web crawling (Surface 1) remains important for content discovery, APIs and feeds enable AI agents to access real-time data with explicit semantic structure, significantly reducing hallucination risk and improving citation accuracy.

Common feed and API types used by AI agents include REST APIs with JSON responses, GraphQL endpoints for query-based data access, RSS/Atom feeds for content syndication, JSON-LD structured data feeds, and XML sitemaps for content discovery. Each serves different retrieval patterns, and B2B companies increasingly deploy multiple feed types to serve different AI agent capabilities.

For B2B companies across industries – software, professional services, manufacturing, distribution and business services – APIs and feeds communicate core business information that AI systems can retrieve and cite with confidence: pricing structures, product specifications, service capabilities, geographic coverage, team expertise and operational metrics.

Why APIs and Feeds Matter for AI Agents

AI agents have evolved beyond passive web browsing to active tool use and function calling. Modern AI systems like ChatGPT, Claude, Perplexity and Gemini can invoke APIs, parse structured feeds and integrate real-time data into their responses, as documented in Anthropic’s tool use documentation. This shift fundamentally changes how AI systems access brand information.

When AI agents rely solely on crawled web content, they face several limitations. Web pages often embed information in unstructured prose, requiring natural language inference to extract facts – dates may be misread, pricing may be misinterpreted, and specifications confused. Crawled content reflects point-in-time snapshots, meaning AI systems working from cached indexes may retrieve outdated information. And web content often lacks explicit semantic markup defining relationships between entities, forcing AI systems to infer connections that may not exist.

APIs and structured feeds solve these problems. Programmatic access provides data in machine-parseable formats (JSON, XML, JSON-LD) with explicit field names and types, eliminating ambiguity. Feeds can be queried in real time or near-real time, ensuring AI agents retrieve current information rather than stale cached content. Structured data includes semantic metadata that defines entity relationships, enabling AI systems to understand not just isolated facts but how those facts connect.

The effect on citation accuracy is easiest to see with pricing. When AI systems retrieve pricing from a structured feed with explicit currency, billing increment and modification date fields, they can cite that pricing with high confidence. When they infer pricing from marketing copy that says “starting at $99 per month”, they face uncertainty about what “starting at” means, whether enterprise pricing differs, and when that price was last updated. That uncertainty reduces citation likelihood.

For B2B companies, the strategic implication is clear. Web content establishes topical authority and entity recognition, but APIs and feeds provide the verifiable data that AI systems cite when accuracy matters. A software company might rank well in search results based on blog authority, but if competitors provide structured pricing feeds while you do not, AI agents will cite competitor pricing more confidently than yours.

The rise of autonomous AI agents amplifies this dynamic. A procurement agent comparing vendor capabilities will prioritise sources offering structured product specification APIs over those requiring manual web scraping. A financial analysis AI will cite companies with machine-readable disclosure feeds more frequently than those publishing PDFs.

How AI Agents Use APIs and Feeds

Modern AI systems employ several mechanisms to access APIs and structured feeds, each with different technical requirements and optimisation opportunities.

Tool Use and Function Calling

Tool use and function calling represent the most direct API integration pattern. OpenAI’s ChatGPT, Anthropic’s Claude and Google’s Gemini all support function calling, which allows AI models to invoke external APIs during conversation. When a user asks “What is the current pricing for [Product Name]?”, the model can call a pricing API, retrieve structured data and incorporate that data into its response with a citation.

Function calling requires function definitions: typically JSON Schema descriptions of each function’s name, purpose and parameters, often derived from an OpenAPI (formerly Swagger) specification that describes available endpoints, required parameters, expected response schemas and authentication methods. AI systems use these definitions to construct valid API requests without human intervention.
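As a rough illustration, a function definition for a pricing lookup might look like the sketch below. The tool name get_pricing, its description and the validate_arguments helper are hypothetical, and the exact wrapper keys differ between providers (for example, the key that holds the parameter schema), so treat this as a shape rather than any vendor's required format.

```python
import json

# Hypothetical tool definition for a pricing lookup. The name, description
# and parameter names are illustrative, not taken from any real API.
get_pricing_tool = {
    "name": "get_pricing",
    "description": "Return the current price for a named plan.",
    "parameters": {  # a JSON Schema object describing the arguments
        "type": "object",
        "properties": {
            "plan": {
                "type": "string",
                "enum": ["starter", "professional", "enterprise"],
                "description": "Plan identifier.",
            }
        },
        "required": ["plan"],
    },
}


def validate_arguments(tool: dict, arguments: dict) -> list:
    """Return a list of problems with model-supplied arguments (empty if valid)."""
    schema = tool["parameters"]
    problems = []
    for field in schema.get("required", []):
        if field not in arguments:
            problems.append(f"missing required field: {field}")
    for field, value in arguments.items():
        spec = schema["properties"].get(field)
        if spec is None:
            problems.append(f"unknown field: {field}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"invalid value for {field}: {value!r}")
    return problems


print(json.dumps(get_pricing_tool, indent=2))
print(validate_arguments(get_pricing_tool, {"plan": "professional"}))
```

Checking arguments on the server side before calling the real API is a sensible default, since a model can still produce values outside the declared enum.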

Browsing and Embedded Feed Detection

ChatGPT’s web browsing (originally launched as Browse with Bing) and similar browsing capabilities enable AI agents to navigate web pages and follow links, but they can also detect and parse structured feeds embedded in pages. When an AI agent encounters a page with JSON-LD markup or discoverable feed URLs, it can extract that structured data directly rather than parsing HTML.

Indexed Feed Retrieval

Search tools used by Perplexity, Claude and other AI systems crawl and index feeds proactively. These systems maintain indexes of RSS feeds, sitemap URLs and JSON-LD endpoints, retrieving updates periodically to refresh their knowledge bases. When answering queries, they retrieve from these indexed feeds just as they retrieve from cached web pages.

Autonomous Research Agents

Autonomous agents represent the frontier of AI-powered data access. Research agents built on frameworks like LangChain, AutoGPT and Microsoft Semantic Kernel can orchestrate multi-step workflows that include API calls, feed parsing and data synthesis. A market research agent might query company APIs to gather product specifications, parse pricing feeds to build comparison matrices, and retrieve changelog feeds to identify recent feature additions.

Each access pattern has different latency, freshness and authentication characteristics. Function calling typically operates in real time. Browsing fetches pages on demand, so it sees whatever the page and its embedded markup contain at that moment. Indexed feed retrieval works from cached snapshots updated hourly, daily or weekly. Autonomous agents may operate asynchronously, running batch queries and synthesising results for later retrieval. B2B companies optimising for AI visibility must consider all four patterns.

How to Optimise APIs and Feeds for AI

Optimising APIs and feeds for AI agent access requires attention to discoverability, documentation quality, data structure, authentication patterns and freshness signals.

Publish an OpenAPI Specification

Start with API documentation using OpenAPI (formerly Swagger) or similar machine-readable specifications. AI systems that support function calling rely on these specifications to understand available endpoints, required parameters, response schemas and error codes.

For example, a pricing API specification might define a GET endpoint at /api/v1/pricing accepting a query parameter 'plan' with enum values ["starter", "professional", "enterprise"], requiring API key authentication via header, and returning a JSON object with fields price (number), currency (string), billingIncrement (string) and dateModified (ISO 8601 timestamp). AI systems can parse this specification and construct valid API calls without human guidance.
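The pricing endpoint described above could be expressed as an OpenAPI 3.1 document along these lines. This is a minimal sketch built as a Python dictionary and serialised to JSON. The operationId, the X-API-Key header name, the decision to make plan a required parameter and the list of required response fields are assumptions added for illustration.

```python
import json

PLANS = ["starter", "professional", "enterprise"]

openapi_spec = {
    "openapi": "3.1.0",
    "info": {"title": "Pricing API (illustrative)", "version": "1.0.0"},
    "paths": {
        "/api/v1/pricing": {
            "get": {
                "operationId": "getPricing",  # hypothetical identifier
                "summary": "Get current pricing for a plan",
                "parameters": [
                    {
                        "name": "plan",
                        "in": "query",
                        "required": True,  # assumption: plan must be supplied
                        "schema": {"type": "string", "enum": PLANS},
                    }
                ],
                "security": [{"ApiKeyAuth": []}],
                "responses": {
                    "200": {
                        "description": "Pricing for the requested plan",
                        "content": {
                            "application/json": {
                                "schema": {"$ref": "#/components/schemas/Pricing"}
                            }
                        },
                    }
                },
            }
        }
    },
    "components": {
        "securitySchemes": {
            # Header name is an assumption; use whatever your API expects.
            "ApiKeyAuth": {"type": "apiKey", "in": "header", "name": "X-API-Key"}
        },
        "schemas": {
            "Pricing": {
                "type": "object",
                "required": ["price", "currency", "billingIncrement", "dateModified"],
                "properties": {
                    "price": {"type": "number"},
                    "currency": {"type": "string"},
                    "billingIncrement": {"type": "string"},
                    "dateModified": {"type": "string", "format": "date-time"},
                },
            }
        },
    },
}

print(json.dumps(openapi_spec, indent=2))
```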

Balance Authentication and Accessibility

Public read-only APIs can remain unauthenticated or use simple API key authentication, enabling AI systems to access data without complex OAuth flows. For APIs requiring authentication, provide clear documentation on obtaining API keys and include rate limiting details so AI agents can respect usage constraints.

Rate limiting itself should be generous enough to accommodate AI agent queries while protecting against abuse. A reasonable baseline for public pricing or product specification APIs might allow 1,000 requests per day per IP address or API key. Include rate limit headers in API responses (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) so AI agents can throttle appropriately.
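On the client side, an agent might use those headers roughly as follows. This sketch assumes X-RateLimit-Reset carries a Unix timestamp in seconds; some APIs send seconds-until-reset instead, so the interpretation is an assumption to check against the provider's documentation. The function name is hypothetical.

```python
import time
from typing import Optional


def seconds_until_retry(headers: dict, now: Optional[float] = None) -> float:
    """Return how long a client should wait before sending the next request.

    Assumes X-RateLimit-Reset is a Unix timestamp in seconds and that
    header names are given exactly as shown (a plain dict is case-sensitive).
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0  # quota left, no need to wait
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)


example_headers = {
    "X-RateLimit-Limit": "1000",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1060",
}
print(seconds_until_retry(example_headers, now=1000.0))  # 60.0
```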

Enforce Data Structure Consistency

Use semantic field names that clearly indicate content (dateModified rather than updated, priceCurrency rather than curr). Maintain consistent data types across endpoints – always represent dates as ISO 8601 strings, always represent prices as numbers rather than formatted strings like “$99.00”. Nest related data logically rather than flattening everything into top-level fields.
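To make the contrast concrete, the sketch below converts a loosely structured record into the shape recommended above. The input field names (curr, updated, billing), the day/month/year date format and the function name are invented for illustration. The output uses numeric prices, ISO 8601 dates and a nested pricing object.

```python
from datetime import datetime, timezone
from decimal import Decimal


def normalise_price_record(raw: dict) -> dict:
    """Convert a hypothetical legacy record into a consistent, typed shape."""
    # "$1,299.00" becomes the number 1299.0
    amount = Decimal(raw["price"].replace("$", "").replace(",", ""))
    # Assumes the legacy date is day/month/year and should be treated as UTC.
    modified = datetime.strptime(raw["updated"], "%d/%m/%Y").replace(tzinfo=timezone.utc)
    return {
        "plan": raw["plan"],
        "pricing": {  # related fields nested together
            "price": float(amount),
            "priceCurrency": raw["curr"],
            "billingIncrement": raw["billing"],
        },
        "dateModified": modified.isoformat(),
    }


legacy = {
    "plan": "starter",
    "price": "$99.00",
    "curr": "USD",
    "billing": "monthly",
    "updated": "06/02/2026",
}
print(normalise_price_record(legacy))
```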

Signal Freshness

Freshness signals through dateModified or lastUpdated timestamps tell AI systems when information was last validated. Include these timestamps at both the resource level and the feed level. AI systems use these signals to assess recency, preferentially citing sources with recent modification dates when answering time-sensitive queries.
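A consumer of the feed might apply a freshness check like the one sketched here. The feed layout (a feed-level dateModified plus an items array with per-item timestamps and id fields) and the 90-day threshold are assumptions chosen for the example, not a documented behaviour of any AI system.

```python
from datetime import datetime, timedelta, timezone


def parse_iso8601(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating a trailing 'Z' as UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def stale_items(feed: dict, max_age_days: int, now: datetime) -> list:
    """Return ids of items whose dateModified is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        item["id"]
        for item in feed["items"]
        if parse_iso8601(item["dateModified"]) < cutoff
    ]


feed = {
    "dateModified": "2026-02-06T09:00:00Z",  # feed-level timestamp
    "items": [
        {"id": "starter", "dateModified": "2026-02-06T09:00:00Z"},
        {"id": "enterprise", "dateModified": "2025-06-01T09:00:00Z"},
    ],
}
now = datetime(2026, 2, 10, tzinfo=timezone.utc)
print(stale_items(feed, max_age_days=90, now=now))  # ['enterprise']
```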

Declare Feeds in llms.txt

Declaring feeds in llms.txt provides a centralised discovery mechanism. Create an /llms.txt file at your domain root listing all available structured feeds with short descriptions – section headers describing feed types, followed by URLs and brief explanations for pricing feeds, product specifications, team directories and changelog feeds. This enables AI agents to discover data sources without manual configuration or guesswork about endpoint locations.
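One way to keep the file in sync with your feed inventory is to generate it. The sketch below follows the markdown-style layout used in the llms.txt proposal (a title, a short summary, then sections of links with descriptions). The site name, section names and example.com URLs are placeholders, and build_llms_txt is a hypothetical helper.

```python
def build_llms_txt(site_name: str, summary: str, sections: dict) -> str:
    """Render an llms.txt body from {section heading: [(title, url, description)]}."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, feeds in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, description in feeds:
            lines.append(f"- [{title}]({url}): {description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


sections = {
    "Pricing": [
        ("Pricing feed", "https://example.com/feeds/pricing.json", "Current plan pricing"),
    ],
    "Product": [
        ("Specifications", "https://example.com/feeds/specs.json", "Product specifications"),
        ("Changelog", "https://example.com/feeds/changelog.json", "Recent releases"),
    ],
}
print(build_llms_txt("Example Co", "Structured feeds for AI agents.", sections))
```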

Publish JSON-LD Feed Endpoints

JSON-LD feed endpoints combine structured data with feed mechanics. Rather than embedding JSON-LD only in web pages, publish dedicated JSON-LD feeds at discoverable URLs. A professional services firm might publish /feeds/team.jsonld containing an array of Person entities with structured expertise, credentials and contact information. AI systems can retrieve this feed, parse the schema markup and extract facts with explicit semantic relationships.
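A team feed of this kind could be assembled as in the following sketch, which emits schema.org Person entities inside a JSON-LD @graph array. The people, the example.com identifiers and the choice of properties (jobTitle, knowsAbout, email) are illustrative assumptions. A real feed would use whichever schema.org properties match your data.

```python
import json


def build_team_feed(people: list, base_url: str) -> dict:
    """Build a JSON-LD document containing one Person node per team member."""
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Person",
                "@id": f"{base_url}/team/{person['slug']}",  # hypothetical URL scheme
                "name": person["name"],
                "jobTitle": person["jobTitle"],
                "knowsAbout": person["knowsAbout"],
                "email": person["email"],
            }
            for person in people
        ],
    }


people = [
    {
        "slug": "a-example",
        "name": "A. Example",
        "jobTitle": "Principal Consultant",
        "knowsAbout": ["procurement", "supply chain analytics"],
        "email": "a.example@example.com",
    }
]
feed = build_team_feed(people, "https://example.com")
print(json.dumps(feed, indent=2))  # body you might serve at /feeds/team.jsonld
```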

Version and Handle Errors Predictably

Use URL-based versioning (/api/v1/, /api/v2/) rather than header-based versioning, making version explicit in endpoint paths. Maintain older API versions for reasonable deprecation periods – minimum 12 months – with clear sunset dates documented in API responses and documentation.
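One possible way to document a sunset date in API responses is the Sunset HTTP header (RFC 8594), optionally paired with a Link header pointing at the newer version. The sketch below builds both; the helper name, the paths and the dates are hypothetical, and whether a given AI agent reads these headers is not something the sketch assumes.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def deprecation_headers(sunset: datetime, successor_path: str) -> dict:
    """Return headers announcing when a version retires and where to move.

    `sunset` must be timezone-aware; it is converted to UTC for the header.
    """
    if sunset.tzinfo is None:
        raise ValueError("sunset must be timezone-aware")
    return {
        # HTTP-date format, e.g. "..., 06 Feb 2027 00:00:00 GMT"
        "Sunset": format_datetime(sunset.astimezone(timezone.utc), usegmt=True),
        "Link": f'<{successor_path}>; rel="successor-version"',
    }


headers = deprecation_headers(
    datetime(2027, 2, 6, tzinfo=timezone.utc), "/api/v2/pricing"
)
print(headers)
```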

Error handling and status codes should follow HTTP standards rigorously. Return 200 for successful requests, 404 for not found, 429 for rate limit exceeded, 500 for server errors. Include error messages in response bodies with clear explanations rather than generic error text.
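A small helper can keep error bodies consistent across endpoints. This is a framework-neutral sketch: the error envelope shape (status, title, detail) and the function name are assumptions, and the returned tuple would need adapting to whatever web framework you use.

```python
from http import HTTPStatus
from typing import Optional


def error_response(status: int, detail: str, retry_after: Optional[int] = None):
    """Return (status, headers, body) for an error with a clear explanation."""
    body = {
        "error": {
            "status": status,
            "title": HTTPStatus(status).phrase,  # standard reason phrase
            "detail": detail,  # specific, human-readable explanation
        }
    }
    headers = {"Content-Type": "application/json"}
    if status == 429 and retry_after is not None:
        headers["Retry-After"] = str(retry_after)  # seconds to wait
    return status, headers, body


print(error_response(404, "No plan named 'platinum'. Valid plans: starter, professional, enterprise."))
print(error_response(429, "Daily limit of 1,000 requests reached.", retry_after=3600))
```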

Response format consistency across endpoints reduces integration friction. Standardise on a consistent response envelope: either always wrap in {"data": [...]} or always return bare data, but be consistent. Consider providing multiple formats for the same data – publishing both /feeds/changelog.json and /feeds/changelog.xml accommodates different retrieval patterns without significant additional overhead.

CiteCompass Perspective

CiteCompass helps B2B companies understand how AI systems access and cite their APIs and structured feeds through AI visibility monitoring and optimisation. Learn more at the CiteCompass AI Visibility Suite.

Feed health validation identifies structural issues that reduce AI retrievability. Common problems include missing dateModified timestamps (AI systems cannot assess freshness), inconsistent field naming across endpoints, CORS misconfigurations that block browser-based AI agents, and authentication requirements without clear documentation.
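Two of these checks are simple enough to run yourself. The sketch below is a generic illustration, not a description of how CiteCompass performs validation: it flags items missing dateModified and field names that break a camelCase convention. The convention itself and the function name are assumptions.

```python
import re

CAMEL_CASE = re.compile(r"^[a-z][a-zA-Z0-9]*$")


def check_feed_health(items: list) -> list:
    """Return human-readable issues found in a list of feed items."""
    issues = []
    for index, item in enumerate(items):
        if "dateModified" not in item:
            issues.append(f"item {index}: missing dateModified")
        for key in item:
            # JSON-LD keywords such as @type are exempt from the naming rule.
            if not key.startswith("@") and not CAMEL_CASE.match(key):
                issues.append(f"item {index}: field {key!r} is not camelCase")
    return issues


items = [
    {"@type": "Offer", "price": 99, "priceCurrency": "USD", "dateModified": "2026-02-06"},
    {"@type": "Offer", "price": 299, "price_currency": "USD"},
]
for issue in check_feed_health(items):
    print(issue)
```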

Citation attribution tracking reveals whether AI systems cite your feeds directly or only reference web content. When AI responses include specific pricing, specifications or capabilities with citations, monitoring which source surfaces received attribution helps prioritise optimisation efforts. If competitors receive feed-attributed citations while your brand receives only web-attributed mentions, it signals a feed optimisation opportunity.

API accessibility testing from AI agent perspectives identifies friction points. Simulating access patterns from major AI platforms (OpenAI, Anthropic, Google, Perplexity) reveals whether endpoints are discoverable, whether authentication flows work smoothly, and whether response formats parse correctly. This testing catches issues like missing CORS headers, non-standard JSON structures, or rate limits too restrictive for typical AI agent usage.

Feed freshness monitoring tracks how often your feeds update and whether their timestamps reflect when the data was last validated. A pricing feed with a dateModified timestamp from six months ago signals staleness, even if pricing has not changed. Regular timestamp updates that record each re-validation, even when the data itself remains static, demonstrate active maintenance.

The educational insight CiteCompass reinforces is that APIs and feeds are not just technical infrastructure; they are AI visibility assets. Every structured endpoint represents a potential citation source. Every feed with current timestamps signals trustworthiness. Every llms.txt declaration improves discoverability. Companies that treat APIs as internal tools only miss citation opportunities; those that publish well-documented, actively maintained, discoverable feeds gain citation advantage.

What Changed Recently

2026-02-06: Created API and Structured Data Feeds spoke page with focus on AI agent tool use, function calling and llms.txt discovery
2025-Q4: OpenAI expanded ChatGPT function calling to support browsing combined with API access
2025-Q3: Anthropic released Claude with enhanced tool use, including automatic OpenAPI parsing and multi-step orchestration
2025-Q2: Google Gemini API added function calling with streaming responses for real-time integration
2025-Q1: llms.txt specification gained adoption across B2B SaaS as standard feed declaration format

References

1. Microsoft Advertising (2024). From Discovery to Influence: A Guide to AEO and GEO. Establishes the three-surface framework and identifies APIs and structured feeds as Surface 2.

2. Anthropic (2024). Tool Use (Function Calling). Official documentation on how Claude implements function calling, tool definition schemas and multi-step orchestration patterns.

3. OpenAPI Initiative (2024). OpenAPI Specification v3.1.0. The standard format for describing REST APIs in machine-readable specifications used by AI systems for automatic integration.
