Hi HN, I'm the author. I originally built SiteOne Crawler in PHP+Swoole back in 2023.
Last year I rewrote it entirely in Rust — 25% faster execution, 30% lower memory,
and a single native binary with zero runtime dependencies.
The feature I'm most excited about is CI/CD quality gating. The idea is simple:
crawl your entire website after deploy and block the pipeline if quality regresses.
The crawler visits every page, scores it across 5 categories (Security, Performance,
SEO, Accessibility, Best Practices) on a 0–10 scale, and exits with code 10 if any
threshold is breached. Drop it into GitHub Actions, GitLab CI, or any pipeline
as a single binary: no Docker, no Node, no runtime needed.
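
To make the gating rule concrete, here is a minimal Rust sketch of the threshold check described above. It is not the crawler's actual code: the function names, the sample scores, and the choice to treat a missing score as a breach are all assumptions made for illustration. Only the 0–10 scale, the category names, and exit code 10 come from the description.

```rust
/// Exit code the post says is returned when any threshold is breached.
const EXIT_THRESHOLD_BREACHED: i32 = 10;

/// Hypothetical helper: names of the categories scoring below their minimum.
/// Assumption: a category with a minimum but no score counts as a breach.
fn breached_categories<'a>(
    scores: &[(&str, f64)],
    minimums: &[(&'a str, f64)],
) -> Vec<&'a str> {
    minimums
        .iter()
        .filter(|(category, minimum)| {
            scores
                .iter()
                .find(|(name, _)| name == category)
                .map_or(true, |(_, score)| score < minimum)
        })
        .map(|(category, _)| *category)
        .collect()
}

/// Hypothetical helper: 0 when every minimum is met, 10 otherwise.
fn exit_code(scores: &[(&str, f64)], minimums: &[(&str, f64)]) -> i32 {
    if breached_categories(scores, minimums).is_empty() {
        0
    } else {
        EXIT_THRESHOLD_BREACHED
    }
}

fn main() {
    // Made-up scores on the 0-10 scale, one per category.
    let scores = [
        ("Security", 8.5),
        ("Performance", 6.9),
        ("SEO", 9.0),
        ("Accessibility", 7.5),
        ("Best Practices", 8.0),
    ];
    // Made-up minimums; categories without a minimum are not gated.
    let minimums = [("Security", 8.0), ("Performance", 7.0)];

    for category in breached_categories(&scores, &minimums) {
        eprintln!("below threshold: {category}");
    }
    // A real gate would pass this value to std::process::exit.
    println!("pipeline exit code: {}", exit_code(&scores, &minimums));
}
```

In a pipeline, any non-zero exit code from a step normally fails the job, which is what makes a dedicated code such as 10 usable as a quality gate.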
Beyond CI/CD, it also handles:
- Offline website archiving with a built-in HTTP server for self-hosting
- Full-site markdown export with deduplicated content (great for feeding to LLMs)
- Interactive HTML audit reports you can email via built-in SMTP
- Sitemap generation
Sample HTML report: https://crawler.siteone.io/html/2024-08-23/forever/cl8xw4r-f...
GitHub: https://github.com/janreges/siteone-crawler
I'd love to hear your feedback — especially if you're already doing something similar in your CI/CD pipelines. What thresholds would you find useful?