🧠 Guide: Building Your Agent's Brain (The Knowledge Base)
A Step-by-Step Guide to a Powerful Knowledge Base.

We're excited to guide you through the most crucial step in creating your AI Agent: building its knowledge base. Think of this as your agent’s brain. The quality of the information you put in will directly determine how smart, accurate, and helpful your agent will be in your community.
This guide will give you simple, step-by-step instructions and pro-tips to build a truly exceptional knowledge base. Let’s get started.
The Two-Step Process: Gather & Index
Building your agent's brain is a simple, two-part process:
Gather Your Knowledge: Collect all the documents and information you want your agent to know.
Build the Knowledge Base: With a single click, you'll transform that information into a high-speed, intelligent brain for your agent.
Method 1: Using the Web Crawler
If your project information is on a website, blog, or documentation page (like GitBook), our built-in Web Crawler is the fastest way to get that information into your agent's brain. It will visit the web pages you specify and automatically save their content as clean, readable files.
Step-by-Step Guide to Web Crawling:
Navigate to the KNOWLEDGE tab in the AICORA application. (AI Agent>>KNOWLEDGE>>Crawling for Web documents)

In the "Crawling for Web documents" section, paste the URL you want to capture into the input box (e.g., https://your-project-docs.com).
Click the dropdown menu to select your crawl mode. You have two simple options:

Current Page: This option will only grab the content from the single, exact URL you entered. It's perfect for a specific blog post, an announcement, or your project's main landing page.
All Pages (Max 50 Pages): This mode will start at the URL you entered and then follow links to crawl up to 50 pages on the same website. This is great for smaller websites or grabbing a specific section of a larger one.
Click the "Start" button and watch the progress in the log.
Once the crawl is complete, you will find all the new content saved as files in the AICORA/source/your_project_dataset/ folder within your AICORA project directory.
⭐ Important Strategies for Effective Crawling (Please Read!)
To get the best results, it's important to understand how the crawler works and how to use it strategically.
1. How to Crawl Websites with More Than 50 Pages
Our crawler is designed for efficiency and can capture a maximum of 50 pages in a single task. So, what should you do if your website has 150 pages, for example?
The answer is to crawl your site in sections.
DON'T do this: Do not simply enter your main domain (e.g., https://my-awesome-project.io) and select "All Pages (Max 50 Pages)". This will only capture the first 50 pages it finds, and you won't know which pages were included and which were missed.
Do this instead (The Smart Strategy): Treat your website's sitemap as a table of contents. Identify the main sections and crawl each one as a separate, dedicated task. Most large websites are already organized this way:
https://my-awesome-project.io/features
https://my-awesome-project.io/docs
https://my-awesome-project.io/blog
https://my-awesome-project.io/about-us
Your workflow should be:
Crawl Section 1: Paste https://my-awesome-project.io/features into the URL field and run the "All Pages (Max 50)" crawl. Wait for it to complete.
Crawl Section 2: Paste https://my-awesome-project.io/docs and run the crawl again.
Repeat: Continue this process for each major section of your website.
The "Small Website" Exception:
Of course, if your entire website has fewer than 50 pages in total, you can simply crawl the main domain (https://my-awesome-project.io) in a single task.
This approach gives you full control, ensures complete coverage of your site, and is the most effective way to build a comprehensive knowledge base from a large website.
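To see why the section-by-section strategy guarantees coverage, here is an illustrative Python sketch (not AICORA's actual crawler). It assumes you have a flat list of your site's URLs, e.g. from a sitemap, and shows how grouping them by their first path segment turns one oversized crawl into several tasks that each fit under the 50-page cap:

```python
from collections import defaultdict
from urllib.parse import urlparse

PAGE_LIMIT = 50  # the crawler's per-task maximum

def plan_crawl_tasks(sitemap_urls):
    """Group a site's URLs by their first path segment so that each
    section can be crawled as its own 'All Pages (Max 50)' task."""
    sections = defaultdict(list)
    for url in sitemap_urls:
        path = urlparse(url).path.strip("/")
        section = path.split("/")[0] if path else "(root)"
        sections[section].append(url)
    # Flag any section that would still blow past the 50-page cap,
    # so you know to split it further (e.g., by sub-directory).
    oversized = [name for name, urls in sections.items() if len(urls) > PAGE_LIMIT]
    return dict(sections), oversized

urls = [
    "https://my-awesome-project.io/",
    "https://my-awesome-project.io/docs/intro",
    "https://my-awesome-project.io/docs/setup",
    "https://my-awesome-project.io/blog/launch-post",
]
tasks, oversized = plan_crawl_tasks(urls)
```

Each key in tasks corresponds to one crawl task you would run in the UI; anything listed in oversized needs to be split further before crawling.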
2. The Crawler CANNOT Read Text Inside Images
This is a critical point to remember. The web crawler is excellent at reading text on a page, but it cannot read words that are part of an image file (like a .jpg or .png). It will only save a link to the image, not the information within it.
If your images contain crucial information—such as diagrams, feature comparisons on a chart, or important text in a banner—you need a simple workaround:
Look at the image and summarize its key information in plain text. You can even use another AI tool to help you describe the image's content.
Save this summary as a new .txt file.
Add this text file to your dataset folder using Method 2: Adding Local Files Manually (described below).
This ensures that this vital, visual information is included in your agent's brain and isn't lost during the crawl.
Method 2: Adding Local Files Manually
Perhaps your knowledge is stored in documents on your computer. That's perfect! The AICORA indexer is designed to work with a variety of file types.
Supported File Formats:
.md (Markdown)
.txt (Plain Text)
.pdf (Portable Document Format)
.docx (Microsoft Word)
Simply drag and drop or copy these files directly into your dataset folder (AICORA/source/your_project_dataset/).
Alternatively, use the "Open Folder" button (recommended): this is the easiest method. In the AICORA application, under the KNOWLEDGE tab, you will find an Open Folder button.
Clicking this button will instantly open the correct folder on your computer where your knowledge base files are stored (AICORA/source/your_project_dataset/).

⭐ Important Best Practice: Leverage Timestamps in Filenames!
This is a powerful, advanced feature of AICORA. Your AI agent can be made aware of how recent a piece of information is. You can achieve this by embedding a date directly into the filename.
When you build your knowledge base, our system will automatically look for dates in filenames and add them as metadata. This allows the agent to prioritize or reference information based on its timestamp. For example, it can answer "What was the latest project update?" more accurately.
Supported Date Formats:
ISO Format: 2024-10-25, 2024/10/25
Compact Format: 20241025 (e.g., project_update_20241025.md)
US Format: 10/25/2024, 10/25/24
Chinese Format: 2024年10月25日
What if there's no date? No problem. If a filename doesn't contain a recognizable date, the system will use the file's "last modified" timestamp as a fallback. However, naming files explicitly is the most reliable method.
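As a rough illustration of how this kind of date detection can work (the exact patterns AICORA recognizes may differ, and the assumption that a 2-digit year means 20xx is ours), here is a Python sketch that tries the documented formats in order and falls back to the file's last-modified time:

```python
import os
import re
from datetime import datetime

# One regex per documented format, tried in order (most specific first).
DATE_PATTERNS = [
    (r"(\d{4})[-/](\d{1,2})[-/](\d{1,2})", ("y", "m", "d")),   # 2024-10-25, 2024/10/25
    (r"(\d{4})年(\d{1,2})月(\d{1,2})日", ("y", "m", "d")),      # 2024年10月25日
    (r"(\d{1,2})/(\d{1,2})/(\d{4})", ("m", "d", "y")),          # 10/25/2024
    (r"(\d{1,2})/(\d{1,2})/(\d{2})(?!\d)", ("m", "d", "y2")),   # 10/25/24
    (r"(\d{4})(\d{2})(\d{2})", ("y", "m", "d")),                # 20241025
]

def date_from_filename(filename):
    """Return a datetime parsed from the filename, or None if no
    recognizable date is present."""
    for pattern, order in DATE_PATTERNS:
        match = re.search(pattern, filename)
        if not match:
            continue
        parts = dict(zip(order, match.groups()))
        year = int(parts["y"]) if "y" in parts else 2000 + int(parts["y2"])
        try:
            return datetime(year, int(parts["m"]), int(parts["d"]))
        except ValueError:
            continue  # matched digits that are not a valid date
    return None

def document_timestamp(path):
    """Prefer a date embedded in the filename; fall back to the file's
    last-modified time, mirroring the documented fallback behavior."""
    parsed = date_from_filename(os.path.basename(path))
    if parsed is not None:
        return parsed
    return datetime.fromtimestamp(os.path.getmtime(path))
```

For example, date_from_filename("project_update_20241025.md") yields October 25, 2024, while a filename like roadmap.md yields None and would fall back to the modification time.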
Part 2: Building the Knowledge Base (Indexing)
Once you've gathered all your information using the Web Crawler or by adding your own files, you’re ready for the final, one-click step. This process will take all of your documents and transform them into an intelligent, searchable "brain" for your AI agent.
What does this mean?
In simple terms, this process takes all your text documents, breaks them into small, logical chunks, and then uses AICORA's powerful AI model to convert each chunk into a numerical representation called an "embedding." These embeddings are stored in a highly-efficient vector index (FAISS).
Think of it like creating a hyper-intelligent index for a library. Instead of just searching for keywords, the AI can search for concepts, meaning, and context. This is what allows your agent to find the most relevant information to answer a complex question, even if the user doesn't use the exact words found in your documents.
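To make the chunk → embed → index → search pipeline concrete, here is a heavily simplified Python sketch. It substitutes a toy word-count vector for a real embedding model and a linear scan for FAISS, and the document name and contents are invented for illustration; it only shows the shape of the process, not AICORA's implementation:

```python
import math
from collections import Counter

def chunk(text, size=40):
    """Split a document into fixed-size character chunks (real systems
    split on logical boundaries like paragraphs or headings)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text):
    """Toy 'embedding': a word-count vector. A real embedding model
    captures meaning and context, not just word overlap."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Similarity between two vectors, from 0.0 to 1.0."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Indexing: embed every chunk of every document.
docs = {"tokenomics.txt": "Total supply is 100 million tokens. Staking rewards are paid weekly."}
index = []
for name, text in docs.items():
    for piece in chunk(text):
        index.append((name, piece, embed(piece)))

# Searching: embed the query, then rank chunks by similarity.
query = embed("what is the total token supply?")
best = max(index, key=lambda item: cosine(query, item[2]))
```

The real system works the same way in outline: documents become many small vectors at build time, and at question time the agent compares the question's vector against all of them to find the closest chunks.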
How to Build the Index
This is the easiest part. Follow these simple steps to finalize your knowledge base.
Step 1: Confirm Your Files Are Ready
First, ensure that all the information you want your agent to know is located in the your_project_dataset folder.
Click the "Open Folder" button in the AICORA application to quickly access the correct directory.
The correct path is: AICORA/source/your_project_dataset/
Double-check that all your crawled web documents and manually added files (TXT, PDF, DOCX) are present in this folder.
Step 2: Click the "Build" Button
Now, go to the "Build Knowledge Base" section at the bottom of the KNOWLEDGE tab.
You will see a field for "Last updated", which shows you the last time you successfully built the knowledge base. If this is your first time, it will be empty.
Click the purple "Build" button to start the process.

Step 3: Be Patient While the Magic Happens
This is the most important part of the process. The AI is now reading, analyzing, and indexing all the information you provided.
This can take several minutes, especially if you have many large documents.
Do not close the application or perform other actions while it is building.
You can monitor the progress through the status messages in the application's log.

Step 4: Check for Confirmation
You'll know the process is complete when the "Last updated" field shows the current date and time. This confirms that your agent's new brain has been successfully built!
Our system is built to be robust. If it encounters a temporary network issue while generating embeddings, it will automatically retry a few times before stopping.

What Happens When You Click "Build"?
This is an important question. When you click "Build":
The AICORA system reads the content from all the files inside your AICORA/source/your_project_dataset/ folder.
It creates a separate, highly optimized index (the agent's brain) that is stored elsewhere in the project.
Crucially, your original files in the your_project_dataset folder are NOT changed, modified, or deleted. They are simply used as the source material for building the brain. You can always check them, edit them, or use them for other purposes.
How to Update Your Agent's Knowledge
Your project will grow and change, and your agent's knowledge should too. The red text in the UI, "Rebuild knowledge base after documents updates," is a reminder of this simple workflow.
To update your agent's knowledge:
Add, edit, or delete files in your AICORA/source/your_project_dataset/ folder.
Go back to the KNOWLEDGE tab.
Click the "Build" button again.
This will create a completely new, up-to-date brain for your agent based on the latest documents. The old index will be replaced. It's that simple to keep your agent smart and relevant.
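If you manage the dataset folder with your own scripts, a small helper can tell you whether a rebuild is due by comparing file modification times against the last successful build. This is a hypothetical convenience sketch, not an AICORA API; the idea of recording the build time yourself is an assumption:

```python
import os
import tempfile
import time
from datetime import datetime

def rebuild_needed(dataset_dir, last_build):
    """Return True if any file in the dataset folder was added or
    modified after the last successful 'Build'."""
    for root, _dirs, files in os.walk(dataset_dir):
        for name in files:
            modified = datetime.fromtimestamp(
                os.path.getmtime(os.path.join(root, name)))
            if modified > last_build:
                return True
    return False

# Demo: a file written after `last_build` means the index is stale.
with tempfile.TemporaryDirectory() as tmp:
    last_build = datetime.now()
    time.sleep(0.1)  # ensure the new file's mtime is clearly later
    with open(os.path.join(tmp, "update.txt"), "w") as f:
        f.write("new announcement")
    stale = rebuild_needed(tmp, last_build)
```

When such a check reports stale files, the fix is exactly the workflow above: open the KNOWLEDGE tab and click "Build" again.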
⭐ Important Notes (Please Read!)
Please read these points carefully to ensure a smooth and successful experience.
Garbage In, Garbage Out: The most crucial rule. If your source documents are poorly written, contain typos, or have chaotic formatting, your agent's answers will reflect that. Always strive for clean, well-structured source material.
Updating Your Knowledge Base: The knowledge base is a snapshot in time. If you add, remove, or modify any files in your source/your_project_dataset/ folder, you MUST re-run the indexing process. The agent will not see the changes until you rebuild its brain.
Patience During Indexing: Creating embeddings for thousands of text chunks is computationally intensive. Please be patient and do not close the application or perform other actions while the knowledge base is being built.
Web Crawling Isn't Perfect: Some websites are designed to block crawlers or use complex structures that are difficult to parse. If a crawl comes back empty or garbled, copy the key content into a .txt or .md file yourself and add it via Method 2: Adding Local Files Manually.
Part 3: Knowledge Filter Settings
Fine-Tuning Your AI Agent

What Are Knowledge Filter Settings?
A powerful AI Agent is not only defined by what it knows, but also by when it chooses to speak. To ensure your agent provides the highest quality interactions in your community, we have provided a suite of Knowledge Filter Settings.
These settings act as the agent's "rules of engagement." They empower you to fine-tune its behavior, ensuring it responds only to meaningful queries and only when it has a high degree of confidence in its answer. This prevents spam, reduces irrelevant responses, and builds community trust in your agent's capabilities.
This guide explains each setting, the underlying logic, and our recommended best practices for configuration.
The AI Agent's Decision Process
Before configuring the settings, it is helpful to understand how your agent processes a user's message. Think of it as a sophisticated, multi-stage funnel designed to find the most accurate answer:
The Gatekeeper: The agent first applies a basic filter to the incoming message. Is it long enough to be a real question? Is it from another bot?
Wide Search: If the message passes the initial check, the agent performs a broad search across its entire knowledge base, retrieving a large set of potentially relevant documents (e.g., 30 documents).
Expert Analysis: Our advanced AI model then meticulously analyzes this set of documents, comparing each one to the user's specific question to understand the semantic context and relevance. It then re-ranks all the documents from most to least relevant.
Confidence Check: The agent looks at the top-ranked document and its "Relevance Score." This score is the AI's measure of confidence. Does this score meet the threshold you have set?
The Response: If the confidence threshold is met, the agent uses the most relevant documents to formulate its answer. If not, it will remain silent, correctly choosing not to provide a potentially inaccurate response.
The settings below give you direct control over Step 1 and Step 4 of this process.
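The five-stage funnel can be expressed as a short sketch. Everything here is illustrative: the wide search and expert re-ranking stages are stubbed out with pre-scored documents, since the real scoring is done by AICORA's models. It simply shows where Minimum User Message Length (Step 1) and Relevance Score (Step 4) plug into the decision:

```python
MIN_MESSAGE_LENGTH = 8      # Step 1: the Gatekeeper
RELEVANCE_THRESHOLD = 0.85  # Step 4: the Confidence Check

def decide(message, is_bot, ranked_docs):
    """ranked_docs: list of (relevance_score, text) pairs, assumed to be
    already re-ranked by the model, best first (Steps 2-3 happen upstream)."""
    # Step 1: Gatekeeper — ignore bots and low-effort messages.
    if is_bot or len(message) < MIN_MESSAGE_LENGTH:
        return None
    if not ranked_docs:
        return None
    # Step 4: Confidence Check on the top-ranked document.
    top_score, top_text = ranked_docs[0]
    if top_score < RELEVANCE_THRESHOLD:
        return None  # Step 5: stay silent rather than guess.
    # Step 5: answer from the most relevant documents.
    return f"Answer based on: {top_text}"

docs = [(0.91, "Staking rewards are paid weekly."), (0.40, "Roadmap Q3")]
reply = decide("How often are staking rewards paid?", False, docs)
```

Raising RELEVANCE_THRESHOLD makes the final gate stricter (more silence, fewer mistakes); lowering MIN_MESSAGE_LENGTH lets shorter questions through the first gate.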
Configuration Settings
You can find these options under the AI AGENT -> SETTINGS tab.
1. Message Length
Setting: Minimum User Message Length

Purpose: This setting controls the "Gatekeeper" (Step 1). It defines the minimum number of characters a user's message must contain for the AI Agent to even consider processing it.
Why it's important: This is your first line of defense against spam and low-effort messages. It prevents the agent from wasting resources trying to interpret messages like "hi," "gm," "lol," or single emojis, allowing it to focus on genuine questions.
How to configure:
A good starting point is a value between 8 and 15.
Consider your community's communication style. If users often ask short but valid questions, you might choose a lower value. If your community is very active and chatty, a slightly higher value can help filter out noise more effectively.
2. Message Relevance

Setting: Relevance Score
Purpose: This is the most powerful setting for tuning your agent's accuracy. It controls the "Confidence Check" (Step 4) by setting the minimum confidence level the AI must have before it formulates a response.
What is Relevance Score? When our AI analyzes a piece of knowledge against a user's question, it assigns a score from 0.0 (completely irrelevant) to 1.0 (a perfect match). This setting is the threshold that the top-ranked result must meet or exceed.
How to configure (The Accuracy vs. Helpfulness Trade-off):
Higher Value (e.g., 0.8 - 0.9): Stricter Filtering
Pros: The agent will be extremely accurate. It will only answer when it is very confident that it has the correct information. This builds a strong reputation for reliability.
Cons: The agent may remain silent on valid questions that are phrased unusually or use different terminology than your source documents, as the score may not meet the high threshold.
Lower Value (e.g., 0.5 - 0.6): Looser Filtering
Pros: The agent will attempt to answer more questions, making it appear more active and helpful. It can find answers even if the user's query is not a perfect match.
Cons: There is a higher risk of the agent providing an answer that is irrelevant or not entirely accurate, as it is operating on a lower confidence level.
Best Practices and Recommendations
Start with the Defaults: We recommend beginning with a Minimum User Message Length of 8 and a Relevance Score of 0.85. This provides a healthy balance between responsiveness and accuracy for most communities.
Observe and Iterate: The best configuration is one that is tailored to your community. After launching your agent, observe its behavior for a day or two.
Is the agent silent too often, even on questions you know are in the knowledge base? Consider slightly lowering the Relevance Score (e.g., from 0.85 to 0.75).
Is the agent answering questions with irrelevant information? You should increase the Relevance Score (e.g., from 0.85 to 0.9) to make it stricter.
Don't Forget to Save: After making any changes to these settings, you must click the "Save" button to apply them. The changes will take effect immediately for all new messages.
By thoughtfully configuring these filters, you transform your AI Agent from a simple database of information into a discerning, intelligent, and trusted member of your community.