Website-Toolkit.co.uk | Useful Tools for Freelancers & Agencies

Quick start

The API does two things: start a crawl for a domain, and read the data a crawl produced. Here’s one of each, plus pulling the broken links.

# Your API key (ask us for one for the beta service)
KEY="your-api-key"
BASE="https://eu1.website-toolkit.co.uk"

# 1. Start a fresh crawl of a domain
curl -s -X POST -H "Authorization: Bearer $KEY" \
 "$BASE/api/crawl-site/example.com"

# 2. Pull the issues from the most recent crawl
curl -s -H "Authorization: Bearer $KEY" \
 "$BASE/api/v1/example.com/latest/issues"

# 3. Pull just the broken links
curl -s -H "Authorization: Bearer $KEY" \
 "$BASE/api/v1/example.com/latest/broken-links"

Authentication

Every request needs your API key. Keys are issued per account; a revoked key stops working immediately. There are two ways to send it:

Authorization header (recommended): Authorization: Bearer your-api-key. Headers aren’t written to access logs, and GitHub Actions automatically masks the secret.
Token in the URL: For callers that can only fire a plain URL (some webhooks, uptime pingers). Convenient, but the token can appear in server logs.

Your data stays yours.

The read API only serves crawls that belong to your account. Free, public audits are not retrievable through the API. Requests for someone else’s data return a 404.

Trigger a crawl

Start a full crawl of a domain — e.g. straight after a deploy. Open by design: use GET or POST, with the key in either the header or the URL.

POST/api/crawl-site/{domain}— key in header

GET/api/crawl-site/{domain}— key in header

POST/api/crawl-site/{token}/{domain}— key in URL

GET/api/crawl-site/{token}/{domain}— key in URL

# Header auth (recommended)
curl -s -X POST -H "Authorization: Bearer $KEY" \
 "$BASE/api/crawl-site/example.com"

# Token-in-URL (for webhooks / pingers that can't set headers)
curl -s "$BASE/api/crawl-site/your-api-key/example.com"

On success you get 202 Accepted:

{ "queued": true, "domain": "example.com", "url": "https://example.com" }

Trigger from GitHub Actions

The cleanest setup: a workflow that fires the trigger on every deploy, with the key stored as a repository secret. GitHub redacts the secret from run logs automatically.

name: Re-crawl on deploy
on:
  push:
    branches: [main] # or: workflow_run, after your deploy job

jobs:
  trigger-crawl:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger a fresh crawl
        run: |
          curl -sS -X POST \
            -H "Authorization: Bearer ${{ secrets.CRAWLER_KEY }}" \
            --fail-with-body \
            "https://eu1.website-toolkit.co.uk/api/crawl-site/example.com"

List crawls for a domain

GET/api/v1/{domain}

{domain} is the site’s hostname, e.g. example.com. Returns up to 25 crawls, newest first, plus the list of request types you can ask for.

{
  "domain": "example.com",
  "types": ["pages","issues","links","images","css","scripts",
           "broken-links","broken-images","broken-css","broken-scripts",
           "words","summary"],
  "crawls": [
    {
      "crawlId": "8f1c2e4a-1b2c-4d3e-9f8a-7b6c5d4e3f2a",
      "startTime": "2026-06-20T09:00:00.000Z",
      "endTime": "2026-06-20T09:04:12.000Z",
      "status": "Completed",
      "pages": 128,
      "brokenPages": 1, "brokenLinks": 4,
      "brokenImages": 0, "brokenCss": 0, "brokenJs": 0,
      "warnings": 7, "mode": "advanced"
    }
  ]
}

Get the data feed

GET/api/v1/{domain}/{crawlId}/{type}

{crawlId} — A crawl ID from the discovery call, or the word latest.
{type} — One of the request types below.

Pagination for list types: ?page=1&pageSize=100. Default 100, max 1000.

{ "total": 128, "page": 1, "pageSize": 100, "rows": [ /* … */ ] }

Request types & columns

Type	What you get
`pages`	Every crawled page: status, title, H1, load time, size, redirect target, SEO issues.
`issues`	Detected problems: type, the page it was found on, a recommendation, destination.
`links`	Every link checked: link URL, owning page, link text, exists flag.
`images`	Every image: image URL, page, alt text, size, exists flag.
`css`	Every stylesheet: URL, page, size, exists flag.
`scripts`	Every script: URL, page, location, size, exists flag.
`broken-links`	Only the links whose target failed.
`broken-images`	Only the images that failed to load.
`broken-css`	Only the stylesheets that failed.
`broken-scripts`	Only the scripts that failed.
`words`	Site-level content-integrity word list (not crawl-scoped).
`summary`	The crawl’s headline metrics (single object, not paginated).

Column reference

pages : content_status, content_url, page_title, heading_1, load_time_ms,
        size_kb, page_redirects_to, seo_issues
issues : id, issue_type, found_on_url, recommendation, destination, screenshot_path
links : link_exists, link_url, owning_page_url, link_text, link_advisory, screenshot_path
images : image_exists, image_url, image_page_url, image_alt, size_kb
css : css_exists, css_url, css_page_url, size_kb
scripts : script_exists, script_url, script_page_url, script_location, size_kb

The *_exists columns are "Yes"/"No"-style strings; the broken-* types are pre-filtered to failures.

The summary type

{
  "crawlId": "8f1c2e4a-…",
  "summary": { /* timings, counts, mode, … */ },
  "broken": 5,
  "warnings": 7
}

The words type

{
  "domain": "example.com",
  "total": 1432,
  "rows": [
    { "word": "checkout", "status": "approved",
      "first_seen_url": "https://example.com/cart", "flagged_pages": null }
  ]
}

Examples

List broken links to fix

curl -s -H "Authorization: Bearer $KEY" \
  "$BASE/api/v1/example.com/latest/broken-links" \
  | jq '.rows[] | {link_url, owning_page_url}'

Node.js — Pull every issue across all pages

const BASE = 'https://eu1.website-toolkit.co.uk';
const KEY = process.env.CRAWLER_API_KEY;
const headers = { Authorization: `Bearer ${KEY}` };

async function fetchAll(domain, crawlId, type) {
  const out = [];
  for (let page = 1; ; page++) {
    const url = `${BASE}/api/v1/${domain}/${crawlId}/${type}?page=${page}&pageSize=500`;
    const r = await fetch(url, { headers });
    if (!r.ok) throw new Error(`${type} ${r.status}`);
    const { rows, total, pageSize } = await r.json();
    out.push(...rows);
    if (page * pageSize >= total) break;
  }
  return out;
}

const issues = await fetchAll('example.com', 'latest', 'issues');
console.log(`${issues.length} issues`);

Python — Summary + broken resources

import os, requests

BASE = "https://eu1.website-toolkit.co.uk"
H = {"Authorization": f"Bearer {os.environ['CRAWLER_API_KEY']}"}

def get(domain, crawl, kind):
    r = requests.get(f"{BASE}/api/v1/{domain}/{crawl}/{kind}", headers=H, timeout=30)
    r.raise_for_status()
    return r.json()

summary = get("example.com", "latest", "summary")
print("broken:", summary["broken"], "warnings:", summary["warnings"])

for kind in ("broken-links", "broken-images", "broken-css", "broken-scripts"):
    print(kind, "→", get("example.com", "latest", kind)["total"])

Error responses

All errors are JSON: { "error": "…" }.

Status	Meaning
`202`	Crawl trigger accepted — the crawl is queued.
`400`	Invalid domain, invalid crawlId, unknown type, or the domain isn’t reachable.
`401`	Missing or unrecognised API key.
`403`	Crawl trigger: your key isn’t permitted to crawl that domain.
`404`	No data for the domain, unknown crawl, or data that isn’t yours.
`429`	Crawl trigger: your plan’s crawl allowance is exhausted.
`500`	Something went wrong our end.

Sitemap & single-page checks

Two related, data-returning endpoints you may find useful alongside the Data API.

Find new URLs from the sitemap

Diffs the live sitemap against known URLs — a cheap way to catch new pages.

GET/api/results/{domain}/{crawlId}/sitemap-check

{ "sitemapTotal": 120, "knownCount": 118, "newUrls": ["https://example.com/new-page"] }

Submit & pull a single page check

Submit one URL, get a crawlId back instantly, then pull results when ready.

POST/api/page-check

GET/api/page-check/{crawlId}

# Submit
curl -s -X POST -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
 -d '{"url":"https://example.com/page"}' \
 "$BASE/api/page-check"
# → { "crawlId": "…", "status": "queued" }

# Pull when ready
curl -s -H "Authorization: Bearer $KEY" \
 "$BASE/api/page-check/{crawlId}"
# → { "status": "running" } …or the full results once complete

Need help or a new feature?

Need an API key for the beta service, or want a request type that isn’t here yet? We’d love to hear what you’re building.

Trigger crawls & pull data out as JSON.