robots.txt API

Read-only API for retrieving robots.txt configurations with version history, ETag caching, and SHA-256 content verification.

Overview

Key Features

HTTP ETag caching & 304 responses
SHA-256 content hash verification
Complete version history
JSON & plain text formats

Note: To update robots.txt, log in to your robotsense.io account.

API Endpoints

GET
/api/v1/robots/{domain}

Get the latest active robots.txt for a domain. Returns JSON by default or plain text with ?format=text.

Headers

X-API-Key: <your-api-key>
If-None-Match: <etag> (optional)

Query Parameters

format - Response format: json (default) or text

Response (JSON)

{
  "domain": "example.com",
  "content": "User-agent: *\nDisallow: /admin\n...",
  "version": 5,
  "content_hash": "a8f5f167f44f4964e6c998dee827110c5f...",
  "etag": "\"v5-a8f5f167f44f4964e6c998dee827110c5f...\"",
  "created_at": "2024-01-15T10:30:00Z",
  "is_active": true
}

Response (Text)

# Add ?format=text to get plain text
curl -H "X-API-Key: rs_..." \
  https://api.robotsense.io/api/v1/robots/example.com?format=text

# Response headers include:
# X-Content-Hash: a8f5f167f44f4964e6c998dee827110c5f...
# X-Robots-Version: 5
# ETag: "a8f5f167f44f4964e6c998dee827110c5f..."

# Response body:
User-agent: *
Disallow: /admin
Allow: /public
GET
/api/v1/robots/{domain}/versions

List all historical versions. Useful for tracking changes over time.

Response

{
  "domain": "example.com",
  "versions": [
    {
      "version": 5,
      "content_hash": "a8f5f167...",
      "created_at": "2024-01-15T10:30:00Z",
      "is_active": true
    },
    {
      "version": 4,
      "content_hash": "bcd123ef...",
      "created_at": "2024-01-10T14:20:00Z",
      "is_active": false
    }
  ]
}
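Since every entry carries a content_hash, the version list alone is enough to tell whether robots.txt actually changed between two versions, without fetching either body. A small sketch using the sample data from the response above (changed_between is an illustrative helper, not part of any SDK):

```python
def changed_between(versions, v_old, v_new):
    """True if the stored SHA-256 content hashes differ between two versions."""
    by_version = {v["version"]: v["content_hash"] for v in versions}
    return by_version[v_old] != by_version[v_new]

# Sample entries mirroring the response above (hashes truncated as shown)
versions = [
    {"version": 5, "content_hash": "a8f5f167...", "is_active": True},
    {"version": 4, "content_hash": "bcd123ef...", "is_active": False},
]
print(changed_between(versions, 4, 5))  # hashes differ, so True
```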
GET
/api/v1/robots/{domain}/versions/{version}

Retrieve a specific historical version. Also supports ?format=text.

Example

curl -H "X-API-Key: rs_..." \
  https://api.robotsense.io/api/v1/robots/example.com/versions/4

Code Examples

cURL / Bash

Basic Request

# Get latest robots.txt (JSON)
curl -H "X-API-Key: rs_YOUR_API_KEY_HERE" \
  https://api.robotsense.io/api/v1/robots/example.com

# Get as plain text
curl -H "X-API-Key: rs_YOUR_API_KEY_HERE" \
  https://api.robotsense.io/api/v1/robots/example.com?format=text

# With ETag caching
ETAG='"a8f5f167f44f4964e6c998dee827110c5f..."'
curl -H "X-API-Key: rs_YOUR_API_KEY_HERE" \
     -H "If-None-Match: $ETAG" \
  https://api.robotsense.io/api/v1/robots/example.com
# Returns 304 Not Modified if content unchanged

Version History

# List all versions
curl -H "X-API-Key: rs_YOUR_API_KEY_HERE" \
  https://api.robotsense.io/api/v1/robots/example.com/versions

# Get specific version
curl -H "X-API-Key: rs_YOUR_API_KEY_HERE" \
  https://api.robotsense.io/api/v1/robots/example.com/versions/4
Node.js / TypeScript
import axios, { AxiosInstance } from 'axios';
import crypto from 'crypto';

interface RobotsTxtResponse {
  domain: string;
  content: string;
  version: number;
  content_hash: string;
  created_at: string;
  is_active: boolean;
}

interface VersionInfo {
  version: number;
  content_hash: string;
  created_at: string;
  is_active: boolean;
}

class RobotsTxtClient {
  private client: AxiosInstance;
  private etagCache: Map<string, string> = new Map();

  constructor(apiKey: string) {
    this.client = axios.create({
      baseURL: 'https://api.robotsense.io/api/v1/robots',
      headers: {
        'X-API-Key': apiKey,
      },
      timeout: 5000,
    });
  }

  async getRobotsTxt(
    domain: string,
    format: 'json' | 'text' = 'json'
  ): Promise<RobotsTxtResponse | string | null> {
    const headers: Record<string, string> = {};

    // Add cached ETag
    if (this.etagCache.has(domain)) {
      headers['If-None-Match'] = this.etagCache.get(domain)!;
    }

    try {
      const response = await this.client.get(`/${domain}`, {
        headers,
        params: format === 'text' ? { format: 'text' } : {},
      });

      // Cache ETag
      if (response.headers['etag']) {
        this.etagCache.set(domain, response.headers['etag']);
      }

      return response.data;
    } catch (error: any) {
      if (error.response?.status === 304) {
        console.log(`Content unchanged for ${domain}`);
        return null;
      }
      throw error;
    }
  }

  verifyContentHash(content: string, expectedHash: string): boolean {
    const actualHash = crypto
      .createHash('sha256')
      .update(content)
      .digest('hex');
    return actualHash === expectedHash;
  }

  async getVersionHistory(domain: string): Promise<{
    domain: string;
    versions: VersionInfo[];
  }> {
    const response = await this.client.get(`/${domain}/versions`);
    return response.data;
  }

  async getSpecificVersion(
    domain: string,
    version: number,
    format: 'json' | 'text' = 'json'
  ): Promise<RobotsTxtResponse | string> {
    const response = await this.client.get(
      `/${domain}/versions/${version}`,
      { params: format === 'text' ? { format: 'text' } : {} }
    );
    return response.data;
  }
}

// Usage
const client = new RobotsTxtClient('rs_YOUR_API_KEY_HERE');

(async () => {
  try {
    // Get latest robots.txt
    const result = await client.getRobotsTxt('example.com') as RobotsTxtResponse;
    console.log(`Version: ${result.version}`);

    // Verify integrity
    const isValid = client.verifyContentHash(
      result.content,
      result.content_hash
    );
    console.log(`Content valid: ${isValid}`);

    // Get as text
    const textContent = await client.getRobotsTxt('example.com', 'text');
    console.log(textContent);

    // Get version history
    const history = await client.getVersionHistory('example.com');
    console.log(`Total versions: ${history.versions.length}`);

  } catch (error) {
    console.error('Error:', error);
  }
})();
Python
import requests
import hashlib

class RobotsTxtClient:
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.robotsense.io/api/v1/robots"
        self.etag_cache = {}

    def get_robots_txt(self, domain, format='json'):
        """Get latest robots.txt with ETag caching"""
        url = f"{self.base_url}/{domain}"
        headers = {"X-API-Key": self.api_key}

        # Add ETag if cached
        if domain in self.etag_cache:
            headers["If-None-Match"] = self.etag_cache[domain]

        params = {}
        if format == 'text':
            params['format'] = 'text'

        response = requests.get(url, headers=headers, params=params, timeout=5)

        if response.status_code == 304:
            print(f"Content unchanged for {domain}")
            return None

        response.raise_for_status()

        # Cache ETag
        if 'ETag' in response.headers:
            self.etag_cache[domain] = response.headers['ETag']

        return response.json() if format == 'json' else response.text

    def verify_content_hash(self, content, expected_hash):
        """Verify content integrity"""
        actual_hash = hashlib.sha256(content.encode()).hexdigest()
        return actual_hash == expected_hash

    def get_version_history(self, domain):
        """Get all historical versions"""
        url = f"{self.base_url}/{domain}/versions"
        headers = {"X-API-Key": self.api_key}

        response = requests.get(url, headers=headers, timeout=5)
        response.raise_for_status()
        return response.json()

# Usage
client = RobotsTxtClient("rs_YOUR_API_KEY_HERE")

# Get latest version
result = client.get_robots_txt("example.com")
if result:
    print(f"Version {result['version']}")
    print(f"Hash: {result['content_hash']}")

    # Verify integrity
    is_valid = client.verify_content_hash(
        result['content'],
        result['content_hash']
    )
    print(f"Content valid: {is_valid}")

# Get as text
text_content = client.get_robots_txt("example.com", format='text')
print(text_content)

# Get version history
history = client.get_version_history("example.com")
print(f"Total versions: {len(history['versions'])}")
PHP
<?php

class RobotsTxtClient {
    private $apiKey;
    private $baseUrl = 'https://api.robotsense.io/api/v1/robots';
    private $etagCache = [];

    public function __construct($apiKey) {
        $this->apiKey = $apiKey;
    }

    public function getRobotsTxt($domain, $format = 'json') {
        $url = "{$this->baseUrl}/{$domain}";
        if ($format === 'text') {
            $url .= '?format=text';
        }

        $headers = [
            "X-API-Key: {$this->apiKey}"
        ];

        // Add cached ETag
        if (isset($this->etagCache[$domain])) {
            $headers[] = "If-None-Match: {$this->etagCache[$domain]}";
        }

        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
        curl_setopt($ch, CURLOPT_HEADER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);

        $response = curl_exec($ch);
        $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);

        $responseHeaders = substr($response, 0, $headerSize);
        $body = substr($response, $headerSize);

        curl_close($ch);

        // Handle 304 Not Modified
        if ($httpCode === 304) {
            echo "Content unchanged for {$domain}\n";
            return null;
        }

        if ($httpCode !== 200) {
            throw new Exception("API error: {$httpCode}");
        }

        // Cache ETag
        if (preg_match('/ETag: (.+)/i', $responseHeaders, $matches)) {
            $this->etagCache[$domain] = trim($matches[1]);
        }

        return $format === 'json' ? json_decode($body, true) : $body;
    }

    public function verifyContentHash($content, $expectedHash) {
        $actualHash = hash('sha256', $content);
        return $actualHash === $expectedHash;
    }

    public function getVersionHistory($domain) {
        $url = "{$this->baseUrl}/{$domain}/versions";
        $headers = ["X-API-Key: {$this->apiKey}"];

        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);

        $response = curl_exec($ch);
        $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        if ($httpCode !== 200) {
            throw new Exception("API error: {$httpCode}");
        }

        return json_decode($response, true);
    }
}

// Usage
$client = new RobotsTxtClient('rs_YOUR_API_KEY_HERE');

try {
    // Get latest robots.txt
    $result = $client->getRobotsTxt('example.com');
    if ($result) {
        echo "Version: {$result['version']}\n";
        echo "Hash: {$result['content_hash']}\n";

        // Verify integrity
        $isValid = $client->verifyContentHash(
            $result['content'],
            $result['content_hash']
        );
        echo "Content valid: " . ($isValid ? 'true' : 'false') . "\n";
    }

    // Get as text
    $textContent = $client->getRobotsTxt('example.com', 'text');
    echo $textContent;

    // Get version history
    $history = $client->getVersionHistory('example.com');
    echo "Total versions: " . count($history['versions']) . "\n";

} catch (Exception $e) {
    echo "Error: {$e->getMessage()}\n";
}
?>
Go
package main

import (
    "crypto/sha256"
    "encoding/hex"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "time"
)

type RobotsTxtResponse struct {
    Domain      string    `json:"domain"`
    Content     string    `json:"content"`
    Version     int       `json:"version"`
    ContentHash string    `json:"content_hash"`
    CreatedAt   time.Time `json:"created_at"`
    IsActive    bool      `json:"is_active"`
}

type VersionInfo struct {
    Version     int       `json:"version"`
    ContentHash string    `json:"content_hash"`
    CreatedAt   time.Time `json:"created_at"`
    IsActive    bool      `json:"is_active"`
}

type RobotsTxtClient struct {
    apiKey    string
    baseURL   string
    client    *http.Client
    etagCache map[string]string
}

func NewRobotsTxtClient(apiKey string) *RobotsTxtClient {
    return &RobotsTxtClient{
        apiKey:    apiKey,
        baseURL:   "https://api.robotsense.io/api/v1/robots",
        client:    &http.Client{Timeout: 5 * time.Second},
        etagCache: make(map[string]string),
    }
}

func (c *RobotsTxtClient) GetRobotsTxt(domain, format string) (*RobotsTxtResponse, error) {
    url := fmt.Sprintf("%s/%s", c.baseURL, domain)
    if format == "text" {
        url += "?format=text"
    }

    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        return nil, err
    }

    req.Header.Set("X-API-Key", c.apiKey)

    // Add cached ETag
    if etag, exists := c.etagCache[domain]; exists {
        req.Header.Set("If-None-Match", etag)
    }

    resp, err := c.client.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    // Handle 304 Not Modified
    if resp.StatusCode == http.StatusNotModified {
        fmt.Println("Content unchanged for", domain)
        return nil, nil
    }

    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("API error: %d", resp.StatusCode)
    }

    // Cache ETag
    if etag := resp.Header.Get("ETag"); etag != "" {
        c.etagCache[domain] = etag
    }

    body, err := io.ReadAll(resp.Body)
    if err != nil {
        return nil, err
    }

    if format == "text" {
        return &RobotsTxtResponse{Content: string(body)}, nil
    }

    var result RobotsTxtResponse
    if err := json.Unmarshal(body, &result); err != nil {
        return nil, err
    }

    return &result, nil
}

func (c *RobotsTxtClient) VerifyContentHash(content, expectedHash string) bool {
    hash := sha256.Sum256([]byte(content))
    actualHash := hex.EncodeToString(hash[:])
    return actualHash == expectedHash
}

func main() {
    client := NewRobotsTxtClient("rs_YOUR_API_KEY_HERE")

    // Get latest robots.txt
    result, err := client.GetRobotsTxt("example.com", "json")
    if err != nil {
        panic(err)
    }

    if result != nil {
        fmt.Printf("Version: %d\n", result.Version)
        fmt.Printf("Hash: %s\n", result.ContentHash)

        // Verify integrity
        isValid := client.VerifyContentHash(result.Content, result.ContentHash)
        fmt.Printf("Content valid: %v\n", isValid)
    }

    // Get as text
    textResult, _ := client.GetRobotsTxt("example.com", "text")
    if textResult != nil {
        fmt.Println(textResult.Content)
    }
}
Ruby / Rails
require 'net/http'
require 'json'
require 'digest'

class RobotsTxtClient
  attr_reader :api_key, :base_url

  def initialize(api_key)
    @api_key = api_key
    @base_url = 'https://api.robotsense.io/api/v1/robots'
    @etag_cache = {}
  end

  def get_robots_txt(domain, format: 'json')
    uri = URI("#{@base_url}/#{domain}")
    uri.query = "format=#{format}" if format == 'text'

    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    http.read_timeout = 5

    request = Net::HTTP::Get.new(uri)
    request['X-API-Key'] = @api_key

    # Add cached ETag
    if @etag_cache[domain]
      request['If-None-Match'] = @etag_cache[domain]
    end

    response = http.request(request)

    # Handle 304 Not Modified
    if response.code == '304'
      puts "Content unchanged for #{domain}"
      return nil
    end

    raise "API error: #{response.code}" unless response.code == '200'

    # Cache ETag
    @etag_cache[domain] = response['ETag'] if response['ETag']

    format == 'json' ? JSON.parse(response.body) : response.body
  end

  def verify_content_hash(content, expected_hash)
    actual_hash = Digest::SHA256.hexdigest(content)
    actual_hash == expected_hash
  end

  def get_version_history(domain)
    uri = URI("#{@base_url}/#{domain}/versions")

    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    http.read_timeout = 5

    request = Net::HTTP::Get.new(uri)
    request['X-API-Key'] = @api_key

    response = http.request(request)
    raise "API error: #{response.code}" unless response.code == '200'

    JSON.parse(response.body)
  end

  def get_specific_version(domain, version, format: 'json')
    uri = URI("#{@base_url}/#{domain}/versions/#{version}")
    uri.query = "format=#{format}" if format == 'text'

    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = true
    http.read_timeout = 5

    request = Net::HTTP::Get.new(uri)
    request['X-API-Key'] = @api_key

    response = http.request(request)
    raise "API error: #{response.code}" unless response.code == '200'

    format == 'json' ? JSON.parse(response.body) : response.body
  end
end

# Usage
client = RobotsTxtClient.new('rs_YOUR_API_KEY_HERE')

begin
  # Get latest robots.txt
  result = client.get_robots_txt('example.com')
  if result
    puts "Version: #{result['version']}"
    puts "Hash: #{result['content_hash']}"

    # Verify integrity
    is_valid = client.verify_content_hash(
      result['content'],
      result['content_hash']
    )
    puts "Content valid: #{is_valid}"
  end

  # Get as text
  text_content = client.get_robots_txt('example.com', format: 'text')
  puts text_content

  # Get version history
  history = client.get_version_history('example.com')
  puts "Total versions: #{history['versions'].length}"

rescue => e
  puts "Error: #{e.message}"
end

Best Practices

⚡ Performance Tips

  • ✅ Use ETag caching with If-None-Match header
  • ✅ Cache robots.txt locally for 1+ hours (changes infrequently)
  • ✅ Use ?format=text for CDN edge serving
  • ✅ Monitor X-RateLimit-Remaining header
  • ❌ Don't poll unnecessarily - robots.txt rarely changes
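The first two tips combine naturally: serve from a local cache while it is fresh, then revalidate with the stored ETag once it expires. A minimal Python sketch (class and field names are this sketch's, not part of the API):

```python
import time

class CacheEntry:
    """Illustrative local cache record for one domain's robots.txt."""
    def __init__(self, content, etag, fetched_at, ttl=3600):
        self.content = content
        self.etag = etag
        self.fetched_at = fetched_at
        self.ttl = ttl  # seconds; 3600 matches the 1-hour tip above

    def is_fresh(self, now=None):
        """True while the entry is within its TTL and no request is needed."""
        now = time.time() if now is None else now
        return (now - self.fetched_at) < self.ttl

def revalidation_headers(entry):
    """Once stale, revalidate with If-None-Match; a 304 means the cached
    content is still current and only the fetch timestamp needs updating."""
    return {"If-None-Match": entry.etag} if entry.etag else {}

entry = CacheEntry("User-agent: *\n", '"v5-a8f5..."', fetched_at=1000)
print(entry.is_fresh(now=2000))   # within the hour: serve from cache
print(entry.is_fresh(now=5000))   # stale: send a conditional request
```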

🔐 Security Best Practices

  • ✅ Verify SHA-256 content hash to detect tampering
  • ✅ Store API keys in environment variables
  • ✅ Use different keys for production and development
  • ❌ Never commit keys to version control
  • ❌ Don't use production keys in client-side code
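For the environment-variable tip, a small helper is enough; ROBOTSENSE_API_KEY is an assumed variable name, not one the service mandates:

```python
import os

def load_api_key(env_var="ROBOTSENSE_API_KEY"):
    """Read the API key from the environment so it never lands in source control."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before running")
    return key
```

Set a different value of the variable in each environment to keep production and development keys separate.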

Rate Limiting

Fixed Rate Limit

The robots.txt API has a fixed rate limit of 10 requests per minute per API key. This applies to all robots.txt endpoints and is separate from your main API rate limits.

Rate Limit Headers

All API responses include rate limit headers:

X-RateLimit-Limit: 10
X-RateLimit-Remaining: 7
X-RateLimit-Reset: 1704461400  # Unix timestamp

429 Response

If you exceed the rate limit, you'll receive a 429 error:

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Limit: 10 requests per 60 seconds"
}

Tip: Implement exponential backoff and check the X-RateLimit-Reset header to determine when to retry.
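That tip can be sketched with only the Python standard library; the URL and helper names below are placeholders. wait_seconds prefers the server's X-RateLimit-Reset timestamp and falls back to exponential delays:

```python
import time
import urllib.request
import urllib.error

def wait_seconds(headers, attempt, now=None):
    """Seconds to wait before retrying: prefer the server's reset timestamp,
    otherwise back off exponentially (1, 2, 4, ... capped at 60s)."""
    now = time.time() if now is None else now
    reset = headers.get("X-RateLimit-Reset")
    if reset:
        return max(0, int(reset) - now)
    return min(60, 2 ** attempt)

def get_with_backoff(url, api_key, max_attempts=5):
    """GET with retry on 429; hypothetical helper, not part of any SDK."""
    for attempt in range(max_attempts):
        req = urllib.request.Request(url, headers={"X-API-Key": api_key})
        try:
            with urllib.request.urlopen(req, timeout=5) as resp:
                return resp.read()
        except urllib.error.HTTPError as e:
            if e.code != 429:
                raise
            time.sleep(wait_seconds(e.headers, attempt))
    raise RuntimeError("rate limited after retries")
```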

Additional Resources

For more information, see the main API documentation or contact support.