Training a chatbot
Train chatbots using file upload, URL, and/or snippet options.
Last updated
Proto streamlines chatbot creation by automatically gathering information from your website, files, or text snippets. This feature simplifies the process of creating a virtual assistant, enabling automation and scalability in your customer support endeavours. The Data Sources page is divided into two sections:
In the top half of the page, you'll find the training options as three buttons:
"Upload File": Trains the chatbot on the content of an uploaded file.
"Add URL": Trains the chatbot on the content available on your website.
"Paste Snippet": Trains the chatbot on a text snippet that you paste in.
The bottom part of the page contains the "Existing Content" table, which displays content previously added using the options above. You can filter this content using the search bar and/or the Type and Status filters.
Click on the "Add URL" button.
Enter a name for the URL.
Input the URL of your website.
Select the Crawl (train on this website and its linked websites) option.
Enable/disable the "Automatically retrain every X hours" option.
Click the "Train chatbot on URL" button.
This process enables the chatbot to gather information and learn from the entire content available on your site. You can then check the training status of your sources by reviewing them in the Existing Content section.
Note! Not all websites can be effectively scraped. This may be due to formatting inconsistencies, permission restrictions, or issues with site mapping. If you have trouble training on a site, we recommend transferring the content to a Google Doc instead.
In some cases the chatbot may be unable to give users a response. One possible cause is that a proxy server restricts access to your website. If so, turn on the "Enable proxy bypass" option to work around the restriction.
You also have the option to train your chatbot only on a selected URL. To do so, simply choose the Single URL option.
Exclude pages from being crawled
You can keep specific pages of your website out of the chatbot's training by listing their URLs under Exclude patterns. To exclude a page together with all of its subpages, add a pattern rather than listing each URL. For instance, the pattern in the image below excludes all pages starting with https://example.com/blog/.
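To illustrate the effect of an exclude pattern, here is a minimal Python sketch. It assumes a pattern is treated as a URL prefix, which is inferred from the "starting with" wording above; the exact matching rules Proto applies are not documented here. The helper names `is_excluded` and `pages_to_train` are hypothetical and not part of Proto.

```python
# Hypothetical helpers for illustration only; not part of Proto.
# Assumption: an exclude pattern matches any URL that starts with it.

def is_excluded(url: str, exclude_prefixes: list[str]) -> bool:
    """Return True if the URL starts with any excluded prefix."""
    return any(url.startswith(prefix) for prefix in exclude_prefixes)


def pages_to_train(urls: list[str], exclude_prefixes: list[str]) -> list[str]:
    """Keep only the crawled URLs that no exclude prefix matches."""
    return [url for url in urls if not is_excluded(url, exclude_prefixes)]


exclude = ["https://example.com/blog/"]
crawled = [
    "https://example.com/",
    "https://example.com/pricing",
    "https://example.com/blog/",
    "https://example.com/blog/2024/launch",
]

kept = pages_to_train(crawled, exclude)
print(kept)
```

Under this assumption, the blog index and every page beneath it are dropped, while the home and pricing pages remain available for training.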
When automatic retraining is enabled, the chatbot periodically recrawls your website pages to check for changes and retrains on them. This is useful because websites frequently update their layouts, content, or underlying code, which can affect how accurately the chatbot extracts information.
Available frequencies are 1 hour, 12 hours, 1 day, and 1 week.
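As a rough illustration of what each frequency means in practice, the sketch below computes when the next retraining would fall due. The `RETRAIN_INTERVALS` table and the `next_retrain` function are hypothetical names for this example; how Proto schedules retraining internally is not described here.

```python
from datetime import datetime, timedelta

# The four frequencies listed above, expressed as time intervals.
RETRAIN_INTERVALS = {
    "1 hour": timedelta(hours=1),
    "12 hours": timedelta(hours=12),
    "1 day": timedelta(days=1),
    "1 week": timedelta(weeks=1),
}


def next_retrain(last_trained: datetime, frequency: str) -> datetime:
    """Return the time the next retraining is due (illustrative only)."""
    return last_trained + RETRAIN_INTERVALS[frequency]


last = datetime(2024, 1, 1, 9, 0)
for label in RETRAIN_INTERVALS:
    print(label, "->", next_retrain(last, label))
```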
You can also train your chatbot on content from uploaded files. This is a useful alternative if your company's website is not up to date or if you don't have a website.
Click "Upload File".
Enter a name and attach a file (supported file types: CSV, JSON, and PDF; maximum size 5 MB).
Click the "Train chatbot on file" button.
You can then check the training status of your sources by reviewing them in the Existing Content section.
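If you prepare files in bulk, it can help to check them against the limits above before uploading. The following Python sketch does that locally. The `check_upload` function is a hypothetical helper, and it assumes "5MB" means 5 × 1024 × 1024 bytes; Proto may measure the limit differently.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".csv", ".json", ".pdf"}
# Assumption: 5 MB is interpreted as 5 * 1024 * 1024 bytes.
MAX_BYTES = 5 * 1024 * 1024


def check_upload(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(name).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    return problems


print(check_upload("faq.PDF", 1024))
print(check_upload("notes.docx", 6 * 1024 * 1024))
```

For a file on disk, the size can be read with `Path(path).stat().st_size` and passed as `size_bytes`.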
Click "Paste Snippet".
Enter a name and the snippet text (the text the chatbot will be trained on).
Click the "Train chatbot on Snippet" button.
You can then check the training status of your sources by reviewing them in the Existing Content section.
After you click the "Train chatbot on URL", "Train chatbot on file", or "Train chatbot on Snippet" button, the added source appears in the "Existing Content" table with its name, last updated date, and status.
Note! When a source is added, it will appear as "Pending" until the process is completed successfully, at which point it will change to "Success". If the process fails, it will show as "Failure". During retraining, it will display "Loading", which will then change to "Success" once the retraining is complete, regardless of the time taken.
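The status changes described in the note can be summarised as a small state table. The Python sketch below models only the transitions stated above; the `Status` enum and `can_transition` function are hypothetical names for illustration. It also assumes that retraining starts from a "Success" source. Whether a failed source can be retrained, or whether a retraining can end in "Failure", is not stated here, so those transitions are left out.

```python
from enum import Enum


class Status(Enum):
    PENDING = "Pending"
    LOADING = "Loading"
    SUCCESS = "Success"
    FAILURE = "Failure"


# Only the transitions described in the note above are modelled.
TRANSITIONS = {
    Status.PENDING: {Status.SUCCESS, Status.FAILURE},  # initial training finishes
    Status.SUCCESS: {Status.LOADING},  # assumption: retraining starts from Success
    Status.LOADING: {Status.SUCCESS},  # retraining completes
    Status.FAILURE: set(),  # not covered by the note
}


def can_transition(current: Status, new: Status) -> bool:
    """Return True if the note above describes a change from current to new."""
    return new in TRANSITIONS[current]


print(can_transition(Status.PENDING, Status.SUCCESS))
print(can_transition(Status.LOADING, Status.FAILURE))
```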
Feel free to filter this content using the search bar and/or Type and Status filters for your convenience.
Open an added source to view its details in the right sidebar, including the content retrieved from it. Hover over the name field to edit the name. URL sources also have a retrain option at the top of the sidebar, next to the close icon.
To delete content from the "Existing Content" table, select the source in the table and click the "Delete Content" button. Confirm the deletion by clicking the "Delete" button in the pop-up dialogue.
If your website is hosted with a service provider like Cloudflare, your URLs might be inaccessible to the chatbot. This can occur because many web service providers automatically block automated web traffic.
To fix this issue on Cloudflare:
Go to your site in Cloudflare.
Go to Security → WAF → Tools.
Enter the IP address 20.198.250.74 in the IP, IP range, country name, or ASN field. Set Action to Allow, then click Add.
Back in Proto, add your website and make sure the "Enable proxy bypass" option is disabled.