Knowledge Base
The "knowledge base" refers to the business information uploaded to train an AI chatbot.
Technical overview of the AI chatbot
The knowledge base is split into multiple sections, making it easy to add information about your business.
It's important to note that more data is not always better. We recommend that you only add relevant information that helps provide context to the overall conversation.
How GPT4Business Works with GPT Models and Knowledge-Based Functionality
Understanding GPT Models
GPT4Business uses GPT models, like GPT-4, which are Generative Pre-trained Transformers. Let's break that down:
Generative: GPT generates coherent and relevant text responses based on input.
Pre-trained: The models are trained on massive datasets, enabling them to understand and respond to a wide variety of queries.
Transformer: This architecture identifies and prioritizes relationships within the input text, dynamically focusing on critical elements to generate meaningful responses.
At its core, GPT4Business takes input text, processes it, and generates output text. While other models handle diverse inputs (e.g., images, sound), GPT models are fundamentally text-oriented, requiring all data to be converted into text format.
Token Limits and Context Windows
A critical aspect of GPT models is their token-based system, which dictates how much input data they can handle at once:
Tokens: GPT breaks input down into smaller pieces called tokens. For example, "Hello, world!" is four tokens.
Model Token Capacities:
GPT-3.5: Supports up to 16,000 tokens.
GPT-4: Handles up to 128,000 tokens, suitable for more complex or larger datasets.
When using GPT4Business, any inputโwhether user queries or file dataโmust fit within these token limits to ensure smooth functionality.
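As a rough way to check this before uploading, token counts can be estimated locally with OpenAI's open-source tiktoken library. The sketch below is illustrative only and not part of GPT4Business; it assumes tiktoken is installed and uses a placeholder file name.

```python
# Minimal sketch: estimating token counts with OpenAI's tiktoken library.
# Assumes `pip install tiktoken`; illustrative only, not part of GPT4Business.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4")

print(len(encoding.encode("Hello, world!")))  # a handful of tokens for a short string

# Check whether a larger document would fit within a model's context window.
MAX_TOKENS = 128_000  # the GPT-4 figure quoted above
with open("knowledge_base_article.txt", encoding="utf-8") as f:  # placeholder file
    document = f.read()
if len(encoding.encode(document)) > MAX_TOKENS:
    print("Document exceeds the context window; trim or split it before use.")
```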
Handling Files (CSV, Excel) in GPT4Business
CSV Files:
GPT4Business can process CSV files, as these are simple text-based formats. However, large CSV files may exceed token limits, requiring trimming or segmentation before use.
For successful integration, data can be structured within the prompt as:
This is a CSV file. Please analyze the following data and answer based on the details provided:
[CSV Content]
Excel Files:
Excel files (saved in a zipped XML format) cannot be directly read. Here's how to use them with GPT4Business:
Convert Excel to a CSV file.
(Optional) Convert CSV data into JSON (JavaScript Object Notation), which GPT4Business processes efficiently.
Input the JSON or text representation into the system, ensuring it fits within token limits.
By enabling compatibility with these formats, GPT4Business simplifies the use of structured data within the AI platform.
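As an illustration of those conversion steps (not a built-in GPT4Business feature), the pandas sketch below reads an Excel workbook and writes CSV and JSON versions; the file names are placeholders and it assumes pandas and openpyxl are installed.

```python
# Sketch: converting an Excel workbook to CSV and JSON before upload.
# Assumes `pip install pandas openpyxl`; file names are placeholders.
import pandas as pd

df = pd.read_excel("products.xlsx")            # read the first worksheet
df.to_csv("products.csv", index=False)         # plain-text CSV for the prompt
df.to_json("products.json", orient="records")  # one JSON object per row

# A quick character count helps gauge whether the data will fit within token limits.
print(len(df.to_csv(index=False)), "characters of CSV data")
```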
Advanced Knowledge Base Integration
GPT4Business leverages advanced knowledge-base functionality to improve responses and provide precise information:
Function Calling
GPT4Business uses function calling, enabling the AI to decide when and how to access external information. This approach avoids the inefficiencies of "prompt injection," which involves manually embedding data into every query.
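As an illustration of the idea (this is not GPT4Business's internal code), a tool definition in the style of the OpenAI Chat Completions API might look like the sketch below; the search_knowledge_base function name and its parameters are hypothetical.

```python
# Sketch of a function-calling tool definition in the OpenAI Chat Completions style.
# The search_knowledge_base function and its parameters are hypothetical examples,
# not GPT4Business's actual implementation.
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_knowledge_base",
            "description": "Look up business information relevant to the user's question.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "The user's question."}
                },
                "required": ["query"],
            },
        },
    }
]
# The model can then choose to call search_knowledge_base for a specific question
# instead of having the full knowledge base pasted into every prompt.
```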
Semantic Search for Intelligent Matching
The knowledge base relies on semantic search, which retrieves data based on meaning and context rather than exact keyword matches.
Vector Embeddings: Each document in the knowledge base is converted into vector embeddings (mathematical representations of relationships between words).
Query Matching: When a user asks a question, GPT4Business generates a vector embedding for the query, compares it to stored embeddings, and retrieves the most relevant documents.
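Conceptually, the matching step works like the simplified sketch below: cosine similarity between a query embedding and the stored document embeddings. The vectors here are tiny made-up examples, not GPT4Business's production code or real embeddings.

```python
# Simplified illustration of semantic matching with cosine similarity.
# The embeddings are tiny made-up vectors; real embedding models produce
# vectors with hundreds or thousands of dimensions.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

stored_docs = {
    "support-hours.md": np.array([0.9, 0.1, 0.0]),
    "pricing.md":       np.array([0.1, 0.8, 0.3]),
    "refund-policy.md": np.array([0.0, 0.2, 0.9]),
}
query_embedding = np.array([0.85, 0.15, 0.05])  # e.g. "What are the office hours?"

best_doc = max(
    stored_docs,
    key=lambda name: cosine_similarity(query_embedding, stored_docs[name]),
)
print(best_doc)  # -> support-hours.md
```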
Workflow Example
A user asks, "What are the office hours for GPT4Business support?"
The system:
Processes the query.
Identifies that external information is required.
Searches the knowledge base using semantic search.
Relevant documents are retrieved, and GPT4Business compiles an accurate response, referencing the knowledge base as needed.
Enhancing Functionality with APIs and Integrations
GPT4Business is evolving to include additional capabilities, such as:
Database Queries: Directly search structured databases (e.g., product inventories) to provide specific answers, such as "What sizes are available for this item?"
Dynamic File Reading: Interact with Excel or other complex file formats by extracting relevant data on-demand.
API Connections: Query external APIs for real-time data, such as shipping information or live inventory.
These enhancements allow GPT4Business to go beyond text-based responses, enabling it to integrate seamlessly with existing business systems.
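For instance, a live-data lookup could be wired up as a small handler that the chatbot triggers via function calling; the endpoint, parameters, and response fields below are hypothetical placeholders rather than a documented GPT4Business integration.

```python
# Hypothetical sketch of an external API lookup a chatbot could trigger via
# function calling. The URL, parameters, and response fields are placeholders.
import requests

def get_shipping_status(order_id: str) -> str:
    response = requests.get(
        "https://api.example.com/shipments",  # placeholder endpoint
        params={"order_id": order_id},
        timeout=10,
    )
    response.raise_for_status()
    data = response.json()
    return f"Order {order_id} is currently: {data.get('status', 'unknown')}"

# The AI agent would call this when a user asks about a specific order's shipping status.
```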
The GPT4Business Advantage
Intelligent Automation: Leverages GPT's ability to process natural language queries and interact dynamically with external systems.
Time Efficiency: Reduces response time with pre-built semantic search and knowledge-base functionality.
Scalability: Handles large datasets with token-efficient approaches, scaling seamlessly for growing businesses.
Versatility: Adapts to diverse use cases, from customer support to data analysis, while maintaining high accuracy and relevance.
Conclusion
GPT4Business combines the power of GPT models with advanced knowledge-base integration and real-time system interactions. By handling complex queries, structured data, and dynamic actions, it serves as a robust tool for businesses to elevate their efficiency and customer experience.
Data Storage (Characters)
Storage (characters) refers to the amount of text uploaded to an AI agent's knowledge base. Characters are the unit of storage. This definition is 154 characters long.
When you create an AI agent, you upload data or scrape a website to add that information to its knowledge base. Everything is stored as text, and usage is measured by the number of characters in that text.
For example, if you scrape your website, the AI agent reads all of the text on the page, and those characters count toward your storage.
What is 1,000,000 characters equivalent to?
1 million characters of storage is roughly equal to 100 website pages, or about 2,500 pages from a book.
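If you want a quick, unofficial estimate of how much of an allowance a file will use before uploading it, you can count characters locally; the sketch below uses a placeholder file name and the rough 1,000,000-character / 100-page ratio quoted above.

```python
# Rough, unofficial estimate of knowledge base storage usage.
# The file name is a placeholder; ratios follow the approximation above
# (1,000,000 characters is roughly 100 website pages, i.e. ~10,000 per page).
with open("scraped_page.txt", encoding="utf-8") as f:
    text = f.read()

chars = len(text)
print(f"{chars:,} characters")
print(f"~{chars / 1_000_000:.1%} of a 1,000,000-character allowance")
print(f"~{chars / 10_000:.1f} typical website pages")
```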
Copy & Paste Text
Simply paste in text to train the AI chatbot.
Upload Documents
You can upload different file types such as .pdf, .txt, .doc, and .docx.
Having issues when uploading your files?
Troubleshooting File Errors in GPT4Business Chatbot Training
If you're encountering an error while training a file in your GPT4Business chatbot, such as "something went wrong with that file," it typically indicates an issue with the file itself. Here's a detailed guide to diagnose and resolve common file-related problems:
Common Issues with Files
Image-Based PDFs
Problem: The most frequent issue is with PDFs that are 100% image-based. GPT4Business cannot extract text from image-only PDFs.
Solution: Use an Optical Character Recognition (OCR) tool to convert the images in the PDF into readable text. Save the result as a new text-based PDF or plain text file.
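One quick way to check whether a PDF actually contains extractable text is the pypdf sketch below; this is a general-purpose check rather than an official GPT4Business tool, it assumes pypdf is installed, and the file path is a placeholder. A near-empty result usually means the PDF is image-only and needs OCR first.

```python
# Sketch: check whether a PDF has extractable text or is image-only.
# Assumes `pip install pypdf`; the file path is a placeholder.
from pypdf import PdfReader

reader = PdfReader("brochure.pdf")
extracted = "".join(page.extract_text() or "" for page in reader.pages)

if len(extracted.strip()) < 50:
    print("Little or no extractable text - the PDF is likely image-only; run OCR first.")
else:
    print(f"Extracted {len(extracted)} characters of text.")
```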
Corrupted Files
Problem: Corruption in files, whether PDF, Word, or text files, can prevent GPT4Business from reading them.
Solution:
Step 1: Open the file locally using an appropriate program. For example:
Use Adobe Reader for PDFs.
Use Microsoft Word for Word documents.
Step 2: If the file fails to open or appears corrupted, upload it to Google Drive:
Open the file within Google Docs or Google Drive.
Export the file as a new PDF, Word document, or plain text file.
This often corrects minor corruption issues.
Advanced Troubleshooting with GPT-4
GPT4Business provides an alternative approach by leveraging GPT-4's Advanced Data Analysis (ADA) to debug files:
Step 1: Upload the problematic file to GPT-4 via a chat window enabled with the ADA plugin.
Step 2: Ask GPT-4 to analyze the file with a simple query like:
"What could be wrong with this file? GPT4Business cannot read it and shows an error."
Step 3: GPT-4 will examine the file and identify the problem, such as:
Whether the file contains text or images.
Potential corruption or formatting issues.
Recommendations for File Preparation
Ensure Text Accessibility: For PDFs, verify that text is selectable and not embedded as images.
File Formats: Stick to supported formats (e.g., plain text, text-based PDFs, and Word documents).
Check Size and Structure:
Large files or files with unusual formatting might cause issues.
Simplify the structure if needed.
If the Issue Persists
Contact Support:
Open a ticket and attach the problematic file.
Include details of the error message and any troubleshooting steps you've already tried.
Request Assistance:
Schedule a call with the GPT4Business team for direct support.
GPT4Business's tools and guidance ensure smooth integration of files into your chatbot. Always double-check files for readability and compatibility, and leverage advanced tools like GPT-4 for seamless troubleshooting.
Scrape Websites
Scrape any webpage(s) to add all of the content to your knowledge base.
Data Quality in AI Agents
When it comes to creating AI agents, especially with platforms like GPT4Business, the quality of data is paramount. Here's why:
Bad Data Equals Bad Performance
The performance of an AI agent is directly proportional to the quality of data it's trained on. Simply scraping all the content from a website and feeding it to a bot won't yield good results. Many websites contain outdated, irrelevant, or inaccurate information that can hinder the bot's performance.
The Structure of Data Matters
Not all content on a website is useful. For instance, some blog posts might lack relevant information about a product or might be generated automatically by the application. Including such data can lead to the bot being misinformed.
Strategies for Effective Data Collection
Selective Scraping: Instead of scraping everything, focus on the most relevant and accurate pages.
Utilizing FAQ Pages: These pages are goldmines as they often contain question-answer pairs. Platforms like Notifier use knowledge base matching to find text that matches customer queries, making FAQ pages extremely valuable.
Handling Unstructured Data: If a webpage contains unstructured data, like paragraphs without clear headings, you can preprocess it using tools like ChatGPT and the WebPilot plugin. This helps structure the data in a more bot-friendly manner.
Dealing with JavaScript-heavy Pages: Some web pages rely heavily on JavaScript to display content. In such cases, tools like the WebPilot plugin might not work effectively. However, using the Chrome extension 'Page Plain Text' can help extract all the text from such pages, ensuring the bot gets all the necessary information.
Scraping a Google Doc
The website scraper is also able to see and scrape all of the text data from a public Google Doc.
How to Connect a Google Doc to GPT4Business
This guide walks you through connecting a Google Doc to GPT4Business for seamless data integration. The process enables your document to update dynamically and be accessible for scraping via the web scraper tool.
Step 1: Prepare Your Google Doc
Open the Google Doc you wish to connect. Ensure the content is ready and organized for integration.
Step 2: Publish the Document to the Web
Go to File Options:
In your Google Doc, click on File in the menu bar.
Select Share, then click Publish to the web.
Publish the Document:
Click the Publish button, and Google will generate a public URL for your document.
Enable Automatic Updates:
Before copying the URL, ensure the "Automatically republish when changes are made" option is enabled.
This ensures that any updates to your Google Doc are automatically reflected at the provided URL.
Copy the URL:
Copy the URL generated by Google Docs.
Step 3: Test the URL
Paste the copied URL into your browser to confirm the document is published.
Check that the document is displayed as scrapable HTML.
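If you prefer to verify this programmatically, a short script like the sketch below can confirm that the published URL returns plain, scrapable text; the URL is a placeholder, the script assumes the requests and beautifulsoup4 packages are installed, and it is not part of GPT4Business itself.

```python
# Sketch: confirm a published Google Doc URL serves scrapable HTML text.
# Assumes `pip install requests beautifulsoup4`; the URL is a placeholder.
import requests
from bs4 import BeautifulSoup

url = "https://docs.google.com/document/d/e/YOUR_PUBLISHED_ID/pub"  # placeholder
html = requests.get(url, timeout=10).text
text = BeautifulSoup(html, "html.parser").get_text(separator="\n", strip=True)

print(text[:500])  # first 500 characters of the published document
```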
Step 4: Integrate the URL with GPT4Business
Access the Web Scraper:
In GPT4Business, navigate to the web scraper tool.
Input the URL:
Paste the copied Google Doc URL into the web scraper.
GPT4Business will now use this URL to scrape and incorporate the document's data into its knowledge base.
Step 5: Maintain Updates
GPT4Business supports automated re-scraping functionality. If enabled, the web scraper will re-scrape the published document every 24 hours, ensuring the latest data is always available.
This is particularly useful if you frequently update your Google Doc.
Bonus: Use Other Platforms
This method isn't limited to Google Docs. You can use any platform that supports publishing to the web, such as:
Notion: Publish a Notion page to the web and input its URL into the web scraper for similar functionality.
Conclusion
By publishing and connecting your Google Doc to GPT4Business, you enable real-time updates and dynamic data scraping. If you have any questions or need assistance, reach out to our support team. This feature ensures your chatbot stays updated with minimal manual effort.
24-hour Auto-Scraping
This is a powerful option, enabled via the knowledge base, that tells the AI agent to re-scrape the selected website pages every 24 hours.
All new or changed information will be updated within the knowledge base.
When you enter a URL to scrape, you will see an option to enable auto-scraping.
Google Docs
With the GPT4Business knowledge base you can add the URL of a published Google Doc.
All of the text on that page will be added to the AI agent's knowledge base for training.
Question & Answer Pairs
The most accurate and powerful way to add data to your AI chatbot knowledge base.
You can add questions and answers manually, via a CSV upload or via "training better response" and saving an updated response.
Export Q/A Data
This allows you to download your question & answer pairs into a .CSV file which is formatted perfectly to be uploaded into another bot's knowledge base.
The exported file should look roughly like the illustration below.
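The sketch below writes a CSV with one question-and-answer pair per row; the column names and example content are illustrative assumptions, not a guaranteed exact match for the GPT4Business export format.

```python
# Illustrative only: a CSV with one question/answer pair per row.
# Column names and example content are assumptions, not the exact export format.
import csv

rows = [
    {"question": "What are your office hours?",
     "answer": "We are open Monday to Friday, 9am-5pm EST."},
    {"question": "Do you offer refunds?",
     "answer": "Yes, within 30 days of purchase."},
]
with open("qa_pairs.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "answer"])
    writer.writeheader()
    writer.writerows(rows)
```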
Data & Content Security
We do everything in our power to keep your data private and secure.
What type of data is stored?
Websites that are scraped, documents and/or text that are uploaded to train an AI agent are securely stored within a vector database hosted in the United States.
How is my data used?
We use your data only to train your AI chatbots and improve the product; we do not pass it on to any third parties for marketing purposes.
Who owns the data that is uploaded?
The content and data you upload to GPT4Business is owned by your account. Your data is not used by GPT4Business for any other purpose than providing you with the A.I. Agent services.
Any and all data is deleted from our servers after you delete it.
What happens to customers' data when they upload it? What happens with my clients' companies' data? Who is responsible for keeping it safe?
Websites that are scraped, documents and/or text that are uploaded to train an AI agent are securely stored within a vector database hosted in the United States.
You can see full details of our Privacy Policy here:
What is the maximum number of data sources that can be added?
There is no specific limit on the number of data sources you can add. However, the total amount of data is constrained by your plan's character limit. We recommend checking your current plan details for precise limitations.
Citations / Sources
You have the option to enable or disable users' ability to see which knowledge base sources the A.I. used to find the answer.
Example: When you ask a question, the A.I. Agent will respond with an answer and also provide, under the "Answer Sources" dropdown, links to where it found that information.
How to Enable / Disable
Go into the Update tab on any chatbot and navigate to the Advanced Settings.
Scroll to the bottom and look for the checkbox next to "Show Data Sources".
This will show the data sources of the chatbot's response in the chat widget.