Datasets

We'll explore the Google Sheets data format requirements so that your chatbot can query and analyze your data much like a database.

What is a Dataset?

A dataset is a collection of data that is organized into columns and rows. Column headers are data points, and the rows of data below the headers are the corresponding values. This gives your data a structure that is easy for AI to understand and also easy for you to manage.
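To make the column-and-row structure concrete, here is a minimal Python sketch that reads a tiny table with the standard `csv` module. The column names and values are invented for illustration and are not taken from any Botsheets sample sheet; the helper name `load_dataset` is also hypothetical.

```python
import csv
import io

# A tiny, made-up dataset: the first line is the header row (the data points),
# and every following line is one row of corresponding values.
SHEET_TEXT = """address,bedrooms,price_in_USD
12 Oak Street,3,450000
98 Pine Avenue,2,325000
"""

def load_dataset(text):
    """Return (headers, rows), where each row is a dict keyed by header."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return reader.fieldnames, rows

headers, rows = load_dataset(SHEET_TEXT)
print(headers)                  # the data points
print(rows[0]["price_in_USD"])  # one value, looked up by its data point
```

Each row can be addressed by header name, which is the same idea that lets a chatbot look up a value by its data point.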

You could have a small handful of data rows or several thousand rows of data in a single Google Sheet. Not all data is well suited to a Google Sheet, though. FAQs, for example, are better suited to a knowledge base provided in PDF format. When you connect a dataset in Google Sheets, your chatbot effectively becomes connected to a database. A Google Sheet works well as a data source for use cases such as:

  • Lists of frequently changing data (e.g. real estate listings, restaurant menus, and directories of people, products, services, or events)
  • Anything with criteria that can be used to give more personalized responses
  • Anything that could be quantified
  • Anything that can be analyzed to generate insights

Sample Datasets

Here are sample datasets you can test with in a chatbot. We have set the share permissions on each sheet in our Google Drive so that anyone with the link can view it. Click to view a sheet, copy the URL from your browser, and paste it into a Botsheets chatbot when you're prompted for a link to a Google Sheet. You can also make a copy of the Google Sheet in your own Google Drive and use it as a template for structuring your own datasets.

Dataset               Category         View        Make a Copy
Real Estate Listings  Lead Generation  View Sheet  Copy Sheet
Amazon Reviews        Feedback         View Sheet  Copy Sheet
News Aggregation      News             View Sheet  Copy Sheet
College Data          Education        View Sheet  Copy Sheet
Supermarket Sales     Sales Report     View Sheet  Copy Sheet
Titanic Passengers    Historical       View Sheet  Copy Sheet
Apparel Sales Report  Sales Report     View Sheet  Copy Sheet
Amazon Sales Report   Sales Report     View Sheet  Copy Sheet

Preparing Your Data

If you use Google Sheets as a data source, you'll need to ensure that your data is prepared properly to get value out of the Botsheets Google Sheets integration.

At a minimum, a Google Sheet dataset connected to Botsheets requires a top row of column headers representing data points and at least one row of data.

There isn't a limit to the number of rows of data, but you should limit the number of columns to around 20. With too many columns, you'll experience a serious degradation in performance.
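The minimum requirements and the column guideline above can be checked before a sheet is connected. The following is a rough Python sketch, assuming the sheet has already been loaded as a list of rows with the header row first; `check_shape` and `MAX_COLUMNS` are hypothetical names, and the value 20 is the suggestion from this page rather than a hard limit enforced by Botsheets.

```python
MAX_COLUMNS = 20  # suggested ceiling from the text above, not an enforced limit

def check_shape(table):
    """Check the basic shape of a sheet.

    `table` is a list of rows and table[0] is the header row.
    Returns a list of human-readable problems (empty if none were found).
    """
    problems = []
    # A header row must exist and contain at least one non-blank label.
    if not table or not any(str(cell).strip() for cell in table[0]):
        problems.append("missing header row")
        return problems
    # At least one row below the header must contain some data.
    data_rows = [row for row in table[1:] if any(str(cell).strip() for cell in row)]
    if not data_rows:
        problems.append("no data rows below the header")
    if len(table[0]) > MAX_COLUMNS:
        problems.append(
            f"{len(table[0])} columns; around {MAX_COLUMNS} or fewer is suggested"
        )
    return problems

print(check_shape([["name", "email_address"], ["Ada", "ada@example.com"]]))
```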

The labels you use for your column headers should be concise for the best performance. Do not make column headers overly descriptive, and do not use them as space to engineer prompts. The data point used in the column header is the prompt. You can move columns around, but ensure that each data point in a column header matches a data point specified in the Botsheets dashboard.
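Because columns can be reordered but header labels must still match the configured data points, a quick comparison that ignores order can catch mismatches early. This is only a sketch: `compare_data_points` is a hypothetical helper, the list of dashboard data points is assumed to be typed in by hand, and the exact matching rules Botsheets applies (for example, case sensitivity) are an assumption here.

```python
def compare_data_points(sheet_headers, dashboard_points):
    """Compare header labels with configured data points, ignoring column order.

    Matching is exact after trimming surrounding whitespace (an assumption).
    """
    sheet = {header.strip() for header in sheet_headers}
    configured = {point.strip() for point in dashboard_points}
    return {
        "missing_from_sheet": sorted(configured - sheet),
        "not_in_dashboard": sorted(sheet - configured),
    }

# Same data points in a different column order: nothing to report.
print(compare_data_points(["price", "address"], ["address", "price"]))
```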

Here are some additional recommendations for dataset best practice with Botsheets:

Column Headers

  • Use alphanumeric characters and symbols such as $ and %. Do not use semicolons, periods, or commas.
  • For column header names with separate words, we recommend using underscores. For example: email_address
  • For pricing data, put the currency symbol in the header rather than in each row of data. For example, your column header may look like this: "$_in_USD". Use just the number in the data.
  • Limit the number of columns, and the amount of text in each row, to something reasonable. A maximum of 20 columns is suggested for optimum performance.
  • Do not leave a column header empty when there is data in the rows below it. Always have column headers.
  • Do not use duplicate column names. Each column header name should be unique.

Row Data

  • Be consistent about your data (if a column holds numbers, every row should hold numbers, etc.)
  • Avoid using commas in data if possible. Commas will still work, just not reliably. For example, for numbers use 100000 instead of 100,000
  • Your data should be text only, but you can include links, and a link will be included in the response. For example, you might have a column header named "URL" with a link such as https://www.botsheets.com in the data.
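Several of the recommendations above are mechanical enough to check with a short script before connecting a sheet. The Python sketch below is one possible approach, not a Botsheets tool: every function name is hypothetical, rows are assumed to be lists of strings with one entry per column, and "numeric" is taken to mean anything Python's `float()` accepts.

```python
import re

# Characters the recommendations advise against in header names.
FORBIDDEN_IN_HEADERS = set(";.,")

def check_headers(headers):
    """Return a list of problems found in a list of header labels."""
    problems = []
    seen = set()
    for position, header in enumerate(headers, start=1):
        name = header.strip()
        if not name:
            problems.append(f"column {position}: empty header")
            continue
        if name in seen:
            problems.append(f"column {position}: duplicate header {name!r}")
        seen.add(name)
        if FORBIDDEN_IN_HEADERS & set(name):
            problems.append(f"column {position}: {name!r} contains ; . or ,")
        if " " in name:
            suggestion = name.replace(" ", "_")
            problems.append(
                f"column {position}: {name!r} has spaces; try {suggestion!r}"
            )
    return problems

# Matches numbers written with thousands separators, such as 100,000 or 1,250,000.50
NUMBER_WITH_COMMAS = re.compile(r"\d{1,3}(,\d{3})+(\.\d+)?")

def strip_thousands_separators(value):
    """Turn '100,000' into '100000'; leave any other text unchanged."""
    trimmed = value.strip()
    if NUMBER_WITH_COMMAS.fullmatch(trimmed):
        return trimmed.replace(",", "")
    return value

def is_number(value):
    """True if float() can parse the value."""
    try:
        float(value)
        return True
    except ValueError:
        return False

def mixed_type_columns(headers, rows):
    """Names of columns that mix numeric and non-numeric values (blanks ignored)."""
    mixed = []
    for index, header in enumerate(headers):
        kinds = {is_number(row[index]) for row in rows if row[index].strip()}
        if len(kinds) > 1:
            mixed.append(header)
    return mixed

print(check_headers(["email_address", "$_in_USD", "first name"]))
print(strip_thousands_separators("100,000"))
```

Running the header check first and the row checks second mirrors the order of the recommendations, and each function returns plain data so the results can be reviewed before anything in the sheet is changed.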