Train on website

Grivo crawls your website automatically and trains your AI chatbot on the pages it finds. Here's how to get the best results.

Last updated: April 2026

How website training works

When you provide a website URL, Grivo's crawler visits that page and follows its internal links to the rest of your site, up to your plan's page limit. It extracts the text content, skips navigation and footer boilerplate, and builds a knowledge base. Your chatbot then uses this knowledge base to answer visitor questions.

The crawler follows internal links recursively, so entering your homepage URL is usually enough to capture your entire site.
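To make the recursive link-following concrete, here is a minimal sketch of a breadth-first crawl. It is an illustration only, not Grivo's actual crawler: the `crawl` function and the `links_by_page` map are hypothetical, and the map stands in for fetching each page and extracting its links.

```python
from collections import deque
from urllib.parse import urlparse


def crawl(start_url, links_by_page):
    """Return pages reachable from start_url through same-domain links.

    links_by_page is a hypothetical stand-in for fetching a page and
    extracting its links: it maps each URL to the URLs that page links to.
    """
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue:
        url = queue.popleft()
        visited.append(url)
        for link in links_by_page.get(url, []):
            # Skip links to other domains and pages that are already queued.
            if urlparse(link).netloc != start_host or link in seen:
                continue
            seen.add(link)
            queue.append(link)
    return visited


# Example site: the /orphan page exists but nothing links to it.
SITE = {
    "https://yoursite.com": [
        "https://yoursite.com/docs",
        "https://yoursite.com/pricing",
        "https://other.com/page",
    ],
    "https://yoursite.com/docs": [
        "https://yoursite.com/docs/setup",
        "https://yoursite.com",
    ],
    "https://yoursite.com/orphan": [],
}

pages = crawl("https://yoursite.com", SITE)
print(pages)
```

In this sketch the homepage, /docs, /pricing, and /docs/setup are visited, while the external link and the unlinked /orphan page are not.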

Enter your website URL

  1. Go to your chatbot's Training tab
  2. Select Website URL as the source
  3. Paste your website URL (e.g., https://yoursite.com)
  4. Click Start Training

Grivo shows a real-time progress bar as it crawls and processes your pages.

💡 Tip: If you only want to train on a specific section (e.g., your docs), enter that section's URL (e.g., https://yoursite.com/docs) instead of the homepage.

What gets crawled

  • Included: All HTML pages linked from your starting URL within the same domain
  • Included: Text content from headings, paragraphs, lists, and tables
  • Excluded: JavaScript-only content that requires client-side rendering (for single-page apps, use PDF upload instead)
  • Excluded: Pages behind login walls or password protection
  • Excluded: External links (other domains)
  • Respected: Your robots.txt rules - Grivo's crawler skips any path covered by a Disallow directive
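
If you are unsure which pages your robots.txt blocks, you can check it locally with Python's standard `urllib.robotparser` module. This is a rough sketch: the `allowed_urls` helper and the `ExampleBot` user agent are placeholders, since the crawler's real user-agent string is not listed here, and the rules shown are sample rules rather than your own.

```python
from urllib.robotparser import RobotFileParser

# Sample rules; paste the contents of your own robots.txt here.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /drafts/
"""


def allowed_urls(urls, robots_txt, user_agent="ExampleBot"):
    """Return the URLs that the given robots.txt rules allow.

    "ExampleBot" is a placeholder user agent, not the real crawler name.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


crawlable = allowed_urls(
    [
        "https://yoursite.com/pricing",
        "https://yoursite.com/admin/users",
        "https://yoursite.com/drafts/new-post",
    ],
    ROBOTS_TXT,
)
print(crawlable)
```

With these sample rules, only the /pricing URL passes the check.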

Crawl settings

You can fine-tune crawling behavior:

  • Max pages - Limit how many pages are crawled (Free: 50 pages, Pro: 500 pages)
  • Include/exclude patterns - Specify URL patterns to include or exclude (e.g., exclude /blog/* if you only want product pages)
  • Crawl depth - Control how many link levels deep the crawler goes
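
The three settings above can be thought of as a single per-URL check. The sketch below shows one plausible way they might combine, assuming shell-style wildcard patterns matched against the URL path. The `should_crawl` function, its parameter names, and the default depth of 3 are illustrative assumptions, and Grivo's actual pattern syntax may differ.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def should_crawl(url, depth, crawled_count, *, max_pages=50, max_depth=3,
                 include=("*",), exclude=()):
    """Decide whether a discovered URL should be crawled.

    depth is the number of link hops from the starting URL, and
    crawled_count is how many pages have been crawled so far.
    All names and defaults here are hypothetical.
    """
    path = urlparse(url).path or "/"
    # Stop once the page limit or the depth limit is reached.
    if crawled_count >= max_pages or depth > max_depth:
        return False
    # Exclude patterns take priority over include patterns.
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False
    return any(fnmatchcase(path, pattern) for pattern in include)


blog = should_crawl("https://yoursite.com/blog/post-1", 1, 0,
                    exclude=("/blog/*",))
product = should_crawl("https://yoursite.com/products/widget", 1, 0,
                       exclude=("/blog/*",))
print(blog, product)
```

Note that under this wildcard matching, `/blog/*` covers pages beneath /blog/ but not the /blog index page itself, which would need its own pattern.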

Retrain your chatbot

Your website content changes over time. Retrain your chatbot to keep answers accurate:

  1. Go to your chatbot's Training tab
  2. Click Retrain next to your website source
  3. Grivo re-crawls your site and updates the knowledge base

📌 Recommendation: Retrain monthly, or after major content updates (new product features, pricing changes, updated docs).

Troubleshooting

  • Chatbot gives wrong answers - Check in the Training tab that the correct pages were crawled. Some pages may be blocked by robots.txt or require JavaScript rendering.
  • Pages missing from crawl - Make sure the pages are linked from your starting URL. Orphan pages (no internal links) won't be discovered.
  • Crawl is slow - Large sites (500+ pages) take longer. The crawler processes pages one at a time to avoid overloading your server.
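
To track down orphan pages before you start a crawl, you can look for pages that no other page links to. The sketch below assumes you already have a map of each page to the pages it links to (for example, exported from your CMS). The `find_orphans` function and the sample data are hypothetical.

```python
def find_orphans(links_by_page, start_url):
    """Return pages that have no inbound internal links.

    links_by_page maps each known page URL to the URLs it links to.
    The starting URL is never reported, since the crawl begins there.
    """
    linked = {
        target
        for targets in links_by_page.values()
        for target in targets
    }
    return sorted(
        page for page in links_by_page
        if page not in linked and page != start_url
    )


# Sample data: /old-landing links out, but nothing links to it.
SITE_LINKS = {
    "https://yoursite.com": ["https://yoursite.com/docs"],
    "https://yoursite.com/docs": ["https://yoursite.com"],
    "https://yoursite.com/old-landing": ["https://yoursite.com/docs"],
}

orphans = find_orphans(SITE_LINKS, "https://yoursite.com")
print(orphans)
```

This check only flags pages with no inbound links at all. A page linked solely from another orphan would not be flagged, even though the crawler could not reach it either.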

Need to add content that isn't on your website? See Train on PDF or Train on docs.