Getting Started
Datasets
We'll explore the Google Sheet data format requirements so your chatbot can effectively query and analyze data like a database.
What is a Dataset?
A dataset is a collection of data that is organized into columns and rows. Column headers are data points and any rows of data below the headers are corresponding values. It provides structure to data that is easy for AI to understand, but also easy for you to manage.
You could have a small handful of data rows, or several thousand rows of data in a single Google Sheet. Not all data is well suited to a Google Sheet. FAQs, for example, are better suited as a knowledge base provided in PDF format. When you connect a dataset in Google Sheets, your chatbot becomes connected to a database. There are a number of use cases where you would want to use a Google Sheet as a data source:
- Lists of frequently changing data (e.g. real estate listings, restaurant menus, directories of people, products, services, events, etc.)
- Anything with criteria for more personalized responses
- Anything that could be quantified
- Anything that can be analyzed to generate insights
Sample Datasets
Here are sample datasets you can test with in a chatbot. We have set the share permissions on each sheet in our Google Drive so that anyone with the link can view it. Click to view a sheet, copy the URL from your browser, and paste it into a Botsheets chatbot when you're prompted to paste in a link to a Google Sheet. You can also make a copy of the Google Sheet in your own Google Drive and use it as a template for structuring your own datasets.
Dataset | Category | View | Make a Copy |
Real Estate Listings | Lead Generation | View Sheet | Copy Sheet |
Amazon Reviews | Feedback | View Sheet | Copy Sheet |
News Aggregation | News | View Sheet | Copy Sheet |
College Data | Education | View Sheet | Copy Sheet |
Supermarket Sales | Sales Report | View Sheet | Copy Sheet |
Titanic Passengers | Historical | View Sheet | Copy Sheet |
Apparel Sales Report | Sales Report | View Sheet | Copy Sheet |
Amazon Sales Report | Sales Report | View Sheet | Copy Sheet |
Preparing Your Data
If you use Google Sheets as a data source, you'll need to ensure that your data is prepared properly to get value out of the Botsheets Google Sheets integration.
At a minimum, a Google Sheet dataset connected to Botsheets requires a top row of column headers representing data points and at least one row of data.
There isn't a limit to the number of rows of data, but you should limit the number of columns to around 20. With too many columns, you'll experience a serious degradation in performance.
The labels you use for your column headers should be concise for the best performance. Do not make column headers overly descriptive, and do not use them as space to engineer prompts. The data point used in the column header is the prompt. You can move columns around, but ensure that each data point in a column header matches a data point specified in the Botsheets dashboard.
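The minimum requirements above (a header row, at least one data row, and roughly 20 columns at most) can be sketched as a quick local check before you connect a sheet. This is only an illustration, not part of Botsheets: it assumes you have exported the sheet as CSV text, and the function name `check_minimum_shape` and the constant `MAX_COLUMNS` are hypothetical.

```python
import csv
import io

# Suggested ceiling taken from this guide; it is a recommendation, not a hard limit.
MAX_COLUMNS = 20


def check_minimum_shape(csv_text):
    """Return a list of problems found in a sheet exported as CSV text."""
    # Ignore rows that are completely blank.
    rows = [
        row
        for row in csv.reader(io.StringIO(csv_text))
        if any(cell.strip() for cell in row)
    ]
    if not rows:
        return ["sheet is empty: add a header row and at least one data row"]

    problems = []
    headers = rows[0]  # the top row is treated as the column headers
    if len(rows) < 2:
        problems.append("no data rows below the header row")
    if len(headers) > MAX_COLUMNS:
        problems.append(
            f"{len(headers)} columns; around {MAX_COLUMNS} or fewer is suggested"
        )
    return problems
```

An empty list means the sheet meets the minimum shape described here. It says nothing about whether the headers match the data points configured in the Botsheets dashboard, which still needs to be checked by hand.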
Here are some additional best-practice recommendations for datasets with Botsheets:
Column Headers | Use alphanumeric characters and symbols such as $ and %. Do not use semicolons, periods, or commas. |
Column Headers | For column header names with separate words, we recommend using underscores. For example: email_address |
Column Headers | For pricing data, put the currency symbol in the header rather than in each row of data. For example, your column header may look like this: "$_in_USD". Use just the number in the data. |
Column Headers | Limit the number of columns and the amount of text in a row to something reasonable. A maximum of 20 columns is suggested for optimum performance. |
Column Headers | Do not leave a column header empty when there is data in the rows below it. Always have column headers. |
Column Headers | Do not use duplicate column names. Each column header name should be unique. |
Row Data | Be consistent about your data (if a column holds numbers, every row should hold numbers, etc.) |
Row Data | Avoid using commas in data if possible. Data with commas may still work, but not reliably. For example, for numbers use 100000 instead of 100,000 |
Row Data | Your data should only be text, but you can include links, and the link will be included in the response. For example, you might have a column header named "URL" with a link such as https://www.botsheets.com as the data. |
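The header and row recommendations in the table above can be sketched as a small pre-flight check. This is a minimal illustration under stated assumptions, not a Botsheets feature: the names `check_headers` and `clean_number` are hypothetical, duplicate detection is an exact text match, and only the `$` symbol is stripped from numeric cells.

```python
import re

# Characters this guide says to keep out of column headers.
FORBIDDEN = set(";.,")


def check_headers(headers):
    """Return a list of problems for a row of column header names."""
    problems = []
    seen = set()
    for position, name in enumerate(headers, start=1):
        label = name.strip()
        if not label:
            problems.append(f"column {position}: empty header")
            continue
        bad = sorted(FORBIDDEN & set(label))
        if bad:
            problems.append(f"column {position}: '{label}' contains {' '.join(bad)}")
        if " " in label:
            suggestion = label.replace(" ", "_")
            problems.append(
                f"column {position}: '{label}' has spaces; try '{suggestion}'"
            )
        if label in seen:
            problems.append(f"column {position}: duplicate header '{label}'")
        seen.add(label)
    return problems


def clean_number(value):
    """Drop a leading $ and thousands separators from a plainly numeric cell."""
    candidate = value.strip().lstrip("$").replace(",", "")
    # Only rewrite cells that are clearly numbers; leave other text untouched.
    if re.fullmatch(r"-?\d+(\.\d+)?", candidate):
        return candidate
    return value
```

For example, `check_headers(["email_address", "$_in_USD", "URL"])` reports no problems, while `clean_number("100,000")` returns `"100000"` and leaves a text cell such as `"Portland, OR"` unchanged.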