Trigger crawls & pull data out as JSON.
Kick off a fresh crawl when you deploy, then read every audit’s data — pages, issues, broken links, images, scripts and CSS — through a simple HTTP API. Point a script or an AI agent at the results, and fix issues automatically.
Base URL: https://eu1.website-toolkit.co.uk
Quick start
The API does two things: start a crawl for a domain, and read the data a crawl produced. Here’s one of each, plus pulling the broken links.
# Your API key (ask us for one for the beta service)
KEY="your-api-key"
BASE="https://eu1.website-toolkit.co.uk"
# 1. Start a fresh crawl of a domain
curl -s -X POST -H "Authorization: Bearer $KEY" \
"$BASE/api/crawl-site/example.com"
# 2. Pull the issues from the most recent crawl
curl -s -H "Authorization: Bearer $KEY" \
"$BASE/api/v1/example.com/latest/issues"
# 3. Pull just the broken links
curl -s -H "Authorization: Bearer $KEY" \
"$BASE/api/v1/example.com/latest/broken-links"
Authentication
Every request needs your API key. Keys are issued per account; a revoked key stops working immediately. There are two ways to send it:
- Authorization header (recommended): send the header Authorization: Bearer your-api-key. Headers aren’t written to access logs, and GitHub Actions automatically masks the secret.
- Token in the URL: for callers that can only fire a plain URL (some webhooks, uptime pingers). Convenient, but the token can appear in server logs.
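The two ways of sending the key can be sketched in Python as below. This is a minimal illustration rather than an official client: the helper names header_auth, trigger_url and redact are hypothetical, and only the URL shapes used for the crawl trigger elsewhere on this page are assumed.

```python
from typing import Optional
from urllib.parse import quote

BASE = "https://eu1.website-toolkit.co.uk"


def header_auth(api_key: str) -> dict:
    """Build the recommended Authorization header."""
    return {"Authorization": f"Bearer {api_key}"}


def trigger_url(domain: str, api_key: Optional[str] = None) -> str:
    """Return the crawl-trigger URL.

    Without api_key this is the header-auth form; with api_key it is the
    token-in-URL form for callers that cannot set headers.
    """
    safe_domain = quote(domain, safe="")
    if api_key is None:
        return f"{BASE}/api/crawl-site/{safe_domain}"
    return f"{BASE}/api/crawl-site/{quote(api_key, safe='')}/{safe_domain}"


def redact(url: str, api_key: str) -> str:
    """Mask the key before a token-in-URL address goes into your own logs."""
    return url.replace(quote(api_key, safe=""), "***")


print(trigger_url("example.com"))
print(redact(trigger_url("example.com", "your-api-key"), "your-api-key"))
```

Redacting on your side only protects your own logs; the token may still be recorded by any server or proxy the request passes through, which is why the header form is preferred.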
Your data stays yours.
The read API only serves crawls that belong to your account. Free, public audits are not retrievable through the API. Requests for someone else’s data return a 404.
Trigger a crawl
Start a full crawl of a domain — e.g. straight after a deploy. Open by design: use GET or POST, with the key in either the header or the URL.
# Header auth (recommended)
curl -s -X POST -H "Authorization: Bearer $KEY" \
"$BASE/api/crawl-site/example.com"
# Token-in-URL (for webhooks / pingers that can't set headers)
curl -s "$BASE/api/crawl-site/your-api-key/example.com"
On success you get 202 Accepted:
{ "queued": true, "domain": "example.com", "url": "https://example.com" }
Trigger from GitHub Actions
The cleanest setup: a workflow that fires the trigger on every deploy, with the key stored as a repository secret. GitHub redacts the secret from run logs automatically.
name: Re-crawl on deploy
on:
  push:
    branches: [main] # or: workflow_run, after your deploy job
jobs:
  trigger-crawl:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger a fresh crawl
        run: |
          curl -sS -X POST \
            -H "Authorization: Bearer ${{ secrets.CRAWLER_KEY }}" \
            --fail-with-body \
            "https://eu1.website-toolkit.co.uk/api/crawl-site/example.com"
List crawls for a domain
{domain} is the site’s hostname, e.g. example.com. Returns up to 25 crawls, newest first, plus the list of request types you can ask for.
{
  "domain": "example.com",
  "types": ["pages", "issues", "links", "images", "css", "scripts",
            "broken-links", "broken-images", "broken-css", "broken-scripts",
            "words", "summary"],
  "crawls": [
    {
      "crawlId": "8f1c2e4a-1b2c-4d3e-9f8a-7b6c5d4e3f2a",
      "startTime": "2026-06-20T09:00:00.000Z",
      "endTime": "2026-06-20T09:04:12.000Z",
      "status": "Completed",
      "pages": 128,
      "brokenPages": 1, "brokenLinks": 4,
      "brokenImages": 0, "brokenCss": 0, "brokenJs": 0,
      "warnings": 7, "mode": "advanced"
    }
  ]
}
Get the data feed
The path is /api/v1/{domain}/{crawlId}/{type}, where:
- {crawlId}: a crawl ID from the discovery call, or the word latest.
- {type}: one of the request types below.
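If you would rather pin a specific crawl than rely on latest, a crawl ID can be picked out of the discovery response. The sketch below assumes the response shape shown under "List crawls for a domain"; the function name is hypothetical, and "Running" is an invented status used only to make the sample interesting, since "Completed" is the only status value this page shows.

```python
from typing import Optional


def newest_completed_crawl_id(listing: dict) -> Optional[str]:
    """Return the crawlId of the newest crawl whose status is "Completed"."""
    # The listing is documented as newest first, so the first match wins.
    for crawl in listing.get("crawls", []):
        if crawl.get("status") == "Completed":
            return crawl.get("crawlId")
    return None


# Sample data: the first entry (ID and "Running" status) is hypothetical.
listing = {
    "domain": "example.com",
    "crawls": [
        {"crawlId": "hypothetical-in-progress-id", "status": "Running"},
        {"crawlId": "8f1c2e4a-1b2c-4d3e-9f8a-7b6c5d4e3f2a", "status": "Completed"},
    ],
}

print(newest_completed_crawl_id(listing))
```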
Pagination for list types: ?page=1&pageSize=100. Default 100, max 1000.
{ "total": 128, "page": 1, "pageSize": 100, "rows": [ /* … */ ] }
Request types & columns
| Type | What you get |
|---|---|
| pages | Every crawled page: status, title, H1, load time, size, redirect target, SEO issues. |
| issues | Detected problems: type, the page it was found on, a recommendation, destination. |
| links | Every link checked: link URL, owning page, link text, exists flag. |
| images | Every image: image URL, page, alt text, size, exists flag. |
| css | Every stylesheet: URL, page, size, exists flag. |
| scripts | Every script: URL, page, location, size, exists flag. |
| broken-links | Only the links whose target failed. |
| broken-images | Only the images that failed to load. |
| broken-css | Only the stylesheets that failed. |
| broken-scripts | Only the scripts that failed. |
| words | Site-level content-integrity word list (not crawl-scoped). |
| summary | The crawl’s headline metrics (single object, not paginated). |
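The request types in the table, together with the pagination parameters, can be combined into a small URL builder. This is a sketch under stated assumptions: feed_url is a hypothetical helper, the page size is clamped on the client because the page does not say how the server treats values above 1000, and pagination parameters are left off words and summary because only summary is explicitly described as unpaginated.

```python
from urllib.parse import urlencode

BASE = "https://eu1.website-toolkit.co.uk"

# Types that return the paginated { total, page, pageSize, rows } envelope.
LIST_TYPES = {
    "pages", "issues", "links", "images", "css", "scripts",
    "broken-links", "broken-images", "broken-css", "broken-scripts",
}
# Types this sketch requests without pagination parameters (an assumption for "words").
OTHER_TYPES = {"words", "summary"}


def feed_url(domain, crawl_id="latest", kind="issues", page=1, page_size=100):
    """Build a data-feed URL, rejecting unknown types before any request is made."""
    if kind not in LIST_TYPES | OTHER_TYPES:
        raise ValueError(f"unknown request type: {kind!r}")
    url = f"{BASE}/api/v1/{domain}/{crawl_id}/{kind}"
    if kind in LIST_TYPES:
        page_size = max(1, min(page_size, 1000))  # documented maximum is 1000
        url += "?" + urlencode({"page": page, "pageSize": page_size})
    return url


print(feed_url("example.com", kind="broken-links", page=2, page_size=5000))
```

Validating the type locally avoids spending a request on a call that the error table says would come back as a 400.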
Column reference
pages   : content_status, content_url, page_title, heading_1, load_time_ms,
          size_kb, page_redirects_to, seo_issues
issues  : id, issue_type, found_on_url, recommendation, destination, screenshot_path
links   : link_exists, link_url, owning_page_url, link_text, link_advisory, screenshot_path
images  : image_exists, image_url, image_page_url, image_alt, size_kb
css     : css_exists, css_url, css_page_url, size_kb
scripts : script_exists, script_url, script_page_url, script_location, size_kb
The *_exists columns are "Yes"/"No"-style strings; the broken-* types are pre-filtered to failures.
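Because the *_exists columns arrive as strings rather than booleans, it can help to normalise them before filtering. The sketch below is an assumption-laden illustration: the exact string values are not listed on this page, so it treats a case-insensitive "yes" as present and anything else as missing, and the sample rows are invented.

```python
def is_present(flag) -> bool:
    """Interpret a *_exists value; only a case-insensitive "yes" counts as present."""
    return str(flag).strip().lower() == "yes"


def missing_links(rows):
    """Return the link_url of every row whose link_exists flag is not "yes"."""
    return [row["link_url"] for row in rows if not is_present(row.get("link_exists"))]


# Hypothetical rows shaped like the links type.
rows = [
    {"link_exists": "Yes", "link_url": "https://example.com/", "owning_page_url": "https://example.com/about"},
    {"link_exists": "No", "link_url": "https://example.com/old", "owning_page_url": "https://example.com/about"},
]

print(missing_links(rows))
```

When only failures matter, requesting the matching broken-* type avoids this filtering entirely.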
The summary type
{
  "crawlId": "8f1c2e4a-…",
  "summary": { /* timings, counts, mode, … */ },
  "broken": 5,
  "warnings": 7
}
The words type
{
  "domain": "example.com",
  "total": 1432,
  "rows": [
    { "word": "checkout", "status": "approved",
      "first_seen_url": "https://example.com/cart", "flagged_pages": null }
  ]
}
Examples
List broken links to fix
curl -s -H "Authorization: Bearer $KEY" \
"$BASE/api/v1/example.com/latest/broken-links" \
| jq '.rows[] | {link_url, owning_page_url}'
Node.js: pull every issue across all pages
// Requires Node 18+ (global fetch) and an ES module (top-level await).
const BASE = 'https://eu1.website-toolkit.co.uk';
const KEY = process.env.CRAWLER_API_KEY;
const headers = { Authorization: `Bearer ${KEY}` };
// Fetch every row of a paginated type by walking pages until `total` is reached.
async function fetchAll(domain, crawlId, type) {
  const out = [];
  for (let page = 1; ; page++) {
    const url = `${BASE}/api/v1/${domain}/${crawlId}/${type}?page=${page}&pageSize=500`;
    const r = await fetch(url, { headers });
    if (!r.ok) throw new Error(`${type} ${r.status}`);
    const { rows, total, pageSize } = await r.json();
    out.push(...rows);
    // Stop on the last page; the empty-page check guards against looping forever.
    if (rows.length === 0 || page * pageSize >= total) break;
  }
  return out;
}
const issues = await fetchAll('example.com', 'latest', 'issues');
console.log(`${issues.length} issues`);
Python: summary + broken resources
import os
import requests
BASE = "https://eu1.website-toolkit.co.uk"
H = {"Authorization": f"Bearer {os.environ['CRAWLER_API_KEY']}"}
def get(domain, crawl, kind):
    """Fetch one request type for a crawl and return the parsed JSON."""
    r = requests.get(f"{BASE}/api/v1/{domain}/{crawl}/{kind}", headers=H, timeout=30)
    r.raise_for_status()
    return r.json()
# Headline counts for the most recent crawl
summary = get("example.com", "latest", "summary")
print("broken:", summary["broken"], "warnings:", summary["warnings"])
# Totals for each pre-filtered broken-* type
for kind in ("broken-links", "broken-images", "broken-css", "broken-scripts"):
    print(kind, "→", get("example.com", "latest", kind)["total"])
Error responses
All errors are JSON: { "error": "…" }.
| Status | Meaning |
|---|---|
| 202 | Crawl trigger accepted: the crawl is queued. |
| 400 | Invalid domain, invalid crawlId, unknown type, or the domain isn’t reachable. |
| 401 | Missing or unrecognised API key. |
| 403 | Crawl trigger: your key isn’t permitted to crawl that domain. |
| 404 | No data for the domain, unknown crawl, or data that isn’t yours. |
| 429 | Crawl trigger: your plan’s crawl allowance is exhausted. |
| 500 | Something went wrong on our end. |
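The status table can be turned into a small response check that surfaces the JSON error message when one is present. This is a hedged sketch, not an official client: ApiError and check_response are hypothetical names, the fallback texts paraphrase the table above, and the sample error string is invented.

```python
import json

# Fallback descriptions, paraphrased from the status table.
MEANINGS = {
    400: "invalid domain, crawlId or type, or the domain is unreachable",
    401: "missing or unrecognised API key",
    403: "key is not permitted to crawl that domain",
    404: "no data, unknown crawl, or data that is not yours",
    429: "crawl allowance exhausted",
    500: "server-side failure",
}


class ApiError(Exception):
    """Raised for any non-2xx response."""

    def __init__(self, status: int, message: str):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def check_response(status: int, body: str):
    """Return the parsed JSON body for 2xx responses; raise ApiError otherwise."""
    if 200 <= status < 300:
        return json.loads(body)
    try:
        detail = json.loads(body).get("error", "")
    except (ValueError, AttributeError):
        detail = ""  # body was not the documented { "error": "…" } object
    raise ApiError(status, detail or MEANINGS.get(status, "unexpected status"))


accepted = check_response(202, '{"queued": true, "domain": "example.com", "url": "https://example.com"}')
print(accepted["queued"])
```

With the requests library, the same function could be called as check_response(r.status_code, r.text) in place of raise_for_status() when you want the server's error text in the exception.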
Need help or a new feature?
Need an API key for the beta service, or want a request type that isn’t here yet? We’d love to hear what you’re building.