XC Chatbot

A premium white-label AI chat widget for WordPress, with multi-provider AI, a local website-only knowledge base, secure attachments, optional image and PDF analysis, and a domain-pack system to specialise the bot per industry. This document covers configuration, behaviour, the REST surface, and operational concerns.

Introduction ¶

XC Chatbot adds a floating chat widget to any WordPress front-end. Conversations are routed through either Anthropic Claude or OpenAI GPT, with first-class streaming, server-side enforcement of website-only answers, and a private attachment pipeline. The plugin is designed to be operable without code changes — every behaviour described below is reachable from the admin UI.

This documentation is split into five parts:

Getting started — install, requirements, first run.
Configuration — every admin page, field by field.
Behaviour — what happens when a message is sent, how the bot decides what to say, and where the limits sit.
Reference — REST API, AJAX endpoints, hooks, schemas, options.
Operations — troubleshooting, performance, uninstall.

Requirements ¶

Component	Minimum	Notes
WordPress	6.0	Block editor not required.
PHP	8.0	Uses typed properties and `match`-style narrowing in places.
MySQL / MariaDB	5.7 / 10.2	InnoDB FULLTEXT is preferred for KB. The plugin falls back gracefully if FULLTEXT is unavailable.
OpenSSL	recommended	Used for AES-256-CBC API-key storage. If absent, a base64 fallback is used; this is obfuscation only, not encryption.
cURL	required	Streaming relies on cURL; `wp_remote_*` functions delegate to it.
`pdftotext`	optional	Used to extract PDF text for Attachment AI.
`pdftoppm` / Imagick	optional	Used to render the first PDF page for vision models.

Hosting note

The streaming endpoint produces a long-lived text/event-stream response. Hosts that buffer responses (some Nginx configurations, some Cloudflare proxy modes, mod_pagespeed) may delay or break streaming. The plugin sets X-Accel-Buffering: no and calls flush() aggressively, but every proxy between WordPress and the browser must also pass the stream through unbuffered.

Installation ¶

Upload the plugin folder xc-chatbot/ to wp-content/plugins/ (or upload the .zip via Plugins → Add New → Upload).
Activate the plugin from the Plugins screen. On activation, two tables are created and the initial KB reindex is scheduled for ~60 seconds later.
Open AI Chatbot in the admin sidebar. The configuration is split across eight sub-pages.

What happens on activation

A non-destructive merge of default settings is written to wp_options under xc_chatbot_settings. Existing values are preserved on re-activation.
Two tables are created via dbDelta:
- {prefix}xc_chatbot_kb_docs — local knowledge base
- {prefix}xc_chatbot_chat_files — attachment registry
Three cron events are scheduled: an hourly attachment cleanup, a one-shot initial KB reindex, and a daily/weekly KB reindex (configurable).
A private upload directory is prepared on first upload at wp-content/uploads/xc-chatbot-private/, with a deny-all .htaccess and an empty index.php.

What happens on deactivation

All scheduled cron events registered by the plugin are cleared. Tables, options, and the private upload directory are not removed — see Uninstall for full removal.

First run ¶

The minimum to reach a working chat:

Open AI Chatbot → AI & API.
Choose a provider (Anthropic recommended; OpenAI required for image / PDF analysis).
Paste an API key. It is encrypted at rest before the option is written.
Click 🔌 Test. An HTTP 200 from the provider means the key is valid. Any other status code is mapped to a short hint; the plugin never echoes upstream error.type codes.
Save the page. Open the front-end — the chat widget appears in the configured corner.

By default the bot is restricted to your website content (Knowledge Base → Restrict to indexed website content). On a fresh install the KB is empty until the initial reindex runs, so the bot will politely apologise and direct visitors to the contact buttons. To start answering immediately, click 🔄 Reindex on the Knowledge Base page or disable site-only mode while content is being indexed.

Admin overview ¶

The plugin adds a top-level menu item with eight sub-pages. Settings are written through page-scoped POST handlers; nonces are scoped per-page (xc_chatbot_save_{page}).

Page	Slug	Purpose
General	`xc-chatbot`	Identity, language, behaviour, white-label.
AI & API	`xc-chatbot-ai`	Provider, model, key, prompts, rules.
Knowledge Base	`xc-chatbot-kb`	Search, indexing, citations, schedule.
Attachments	`xc-chatbot-attachments`	Uploads, AI analysis toggles, limits.
Contact Bar	`xc-chatbot-contact`	Departments, phone, page URL, smart CTA.
Design	`xc-chatbot-design`	Primary / accent colour pickers.
Domain Packs	`xc-chatbot-domain-packs`	Import / export / activate verticals.
Diagnostics	`xc-chatbot-diagnostics`	Read-only environment audit.

General ¶

Chatbot Identity

`bot_name`	Display name shown in the chat header and in default greetings. Default: `Assistant`.
`welcome_msg`	First assistant bubble shown when the chat opens. Plain text; emoji allowed.
`placeholder`	Placeholder text inside the input box.

Behaviour

`answer_language_mode`	`message` (auto-detect from each user message), `browser`, or `force`.
`answer_language_force`	ISO code used when mode is `force`. Supported: `en`, `ro`, `fr`, `nl`, `de`, `es`, `it`, `pt`.
`position`	`bottom-right` (default), `bottom-left`, `top-right`, `top-left`.
`typing_speed`	Milliseconds per character for the typewriter effect on streamed replies. Range 5–80.
`sound_enabled`	Play a soft notification chime when an assistant reply arrives.
`download_transcript_enabled`	Show the transcript-download button in the chat header (`.txt`).
`download_export_zip_enabled`	Show the ZIP-export button (logged-in users only). Server-enforced.

White Label

`powered_by_text`	Optional footer text inside the widget (e.g. Powered by AcmeCorp). Empty hides it.
`powered_by_url`	If set, the footer text becomes a link.

AI & API ¶

Provider & key

The plugin supports two providers, selected by api_provider:

Anthropic — uses POST https://api.anthropic.com/v1/messages. Streaming uses stream: true with the Server-Sent Events response that Anthropic emits natively.
OpenAI — uses POST https://api.openai.com/v1/chat/completions with split text / vision model routing (see below).

The API key is encrypted with AES-256-CBC before being stored in wp_options. The cipher key is derived from a SHA-256 of LOGGED_IN_SALT, NONCE_SALT and AUTH_SALT. Stored values are prefixed:

Prefix	Meaning
`enc:`	AES-256-CBC ciphertext (base64 of `IV ‖ ciphertext`).
`b64:`	Fallback if OpenSSL is unavailable. Obfuscation only — not real encryption.
(no prefix)	Plaintext from a legacy import. The plugin reads it transparently.

The admin field shows a masked value (sk-a••••••••••••••••••••DEFG). Submitting a form whose API-key field still contains • characters is treated as "no change" and the existing encrypted value is preserved. To rotate a key, clear the field and paste the new value.
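The storage prefixes and the "no change" rule for masked submissions can be illustrated with a short Python sketch. This is not the plugin's code (which is PHP). The function names, the salt concatenation order, and the exact masking test are assumptions made for illustration; only the prefixes and the SHA-256-of-salts derivation come from the description above.

```python
import hashlib

MASK_CHAR = "\u2022"  # the bullet character shown in the masked admin field


def derive_cipher_key(logged_in_salt: str, nonce_salt: str, auth_salt: str) -> bytes:
    """Return a 32-byte key (the size AES-256 needs) from three salts.

    Assumption: the salts are concatenated in this order before hashing.
    """
    material = (logged_in_salt + nonce_salt + auth_salt).encode("utf-8")
    return hashlib.sha256(material).digest()


def classify_stored_key(stored: str) -> str:
    """Map a stored option value to the meaning of its prefix."""
    if not stored:
        return "unset"
    if stored.startswith("enc:"):
        return "encrypted"
    if stored.startswith("b64:"):
        return "obfuscated"
    return "plaintext"  # legacy import without a prefix


def resolve_submitted_key(submitted: str, existing: str) -> str:
    """Keep the existing stored value when the field still holds the mask.

    A genuinely new value is returned as typed; the caller would encrypt it
    before writing the option.
    """
    if MASK_CHAR in submitted:
        return existing
    return submitted.strip()
```

The same classification is what a diagnostics page would need in order to report whether the stored key is encrypted, plaintext, or unset.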

Anthropic models

Identifier	Notes
`claude-haiku-4-5-20251001`	Default. Fast, low cost.
`claude-sonnet-4-6`	Balanced.
`claude-opus-4-6`	Most capable, highest cost.

OpenAI models — split routing

OpenAI is configured with two model slots:

Text model (openai_model_text) — used for plain text conversations.
Vision model (openai_model_vision) — used only when the request includes image content parts. The plugin inspects the user content and switches automatically.

Routing is decided by XC_Chatbot_Chat_Handler::stream_response() just before the upstream call: if any image_url content part exists or a PDF was rendered to JPEG, the vision slot is used. Otherwise, the text slot is used. This keeps cost low for plain text while still permitting vision for the same conversation.
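A minimal Python sketch of the slot decision described above. The helper names are hypothetical and the content-part shape follows the OpenAI chat format (a list of parts whose `type` is `text` or `image_url`); the plugin's own PHP logic may differ in detail.

```python
from typing import Any


def has_image_part(content: Any) -> bool:
    """True if the user content is a list that contains an image_url part."""
    if not isinstance(content, list):
        return False  # plain string content carries no images
    return any(
        isinstance(part, dict) and part.get("type") == "image_url"
        for part in content
    )


def pick_openai_model(content: Any, pdf_rendered_to_jpeg: bool,
                      text_model: str, vision_model: str) -> str:
    """Use the vision slot only when the request actually carries an image."""
    if has_image_part(content) or pdf_rendered_to_jpeg:
        return vision_model
    return text_model
```

For example, `pick_openai_model("hello", False, "gpt-4o-mini", "gpt-4o")` selects the text slot, while the same call with a list containing an `image_url` part selects the vision slot.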

Available identifiers
`gpt-4o-mini`, `gpt-4o`, `gpt-4.1-mini`, `gpt-4.1`, `gpt-5-nano`, `gpt-5-mini`, `gpt-5`

GPT-5 family

The chat handler omits unsupported parameters (e.g. temperature) for GPT-5 models — these are recognised by name and routed through a compatibility-safe payload builder.
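As an illustration only, a compatibility-safe payload builder might look like the Python sketch below. The prefix test, the function names, and the default temperature are assumptions; the text above only states that GPT-5 models are recognised by name and that unsupported parameters such as temperature are omitted.

```python
def is_gpt5(model: str) -> bool:
    """Assumption: GPT-5 family members are identified by a name prefix."""
    return model.lower().startswith("gpt-5")


def build_chat_payload(model: str, messages: list, temperature: float = 0.7) -> dict:
    """Build a chat-completions body, dropping temperature for GPT-5 models."""
    payload = {"model": model, "messages": messages, "stream": True}
    if not is_gpt5(model):
        payload["temperature"] = temperature
    return payload
```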

System prompts

`system_prompt_default`	The base system prompt for general questions.
`brand_policy`	Appended to every prompt (except translation). Enforces wording such as "our website" and "our company", and directs the user toward the contact options inside the chat.
`use_advanced_rules`	If `1`, the prompt rules table is consulted to pick a specialised prompt by keyword. If `0`, only the default + a hard-coded translation/technical pair (filled from legacy keys) is used.

Prompt rules

Each rule has a name, a type (default, translation, or technical), a comma-separated keywords list, and a prompt. Rules are evaluated top-to-bottom; first keyword match wins. Domain pack rules merge in front of admin rules — pack rules are evaluated first.

Maximum 25 rules are saved (further rows are silently dropped).

Knowledge Base ¶

The KB indexes selected post types into a local FULLTEXT-enabled table and uses it for retrieval-augmented generation. Two policies coexist:

Site-only mode (kb_site_only=1, default) — replies must be derived from indexed pages or attached files. If neither exists, the bot returns the configured apology in the user's detected language.
Open mode (kb_site_only=0) — the model may answer from its training knowledge.

Search & Answers

`kb_answer_mode`	`best_effort` (default): use whatever context exists, answer politely if it is loosely related. `strict`: refuse if the best match score is below `kb_min_score`.
`kb_min_score`	Floating-point threshold compared against the FULLTEXT score. Default `0.03`.
`kb_retrieve_limit`	How many documents to feed into context. Range 3–8, default 6.
`kb_require_citations`	If `1`, the system prompt instructs the model to cite sources as `[1]`, `[2]`.
`kb_citations_mode`	`auto` appends a basic citation per paragraph if missing. `strict` rejects answers without citations. `off` disables enforcement.
`kb_policy_apology`	Default refusal text. If left at the English default, the plugin auto-localizes to `en/ro/fr/nl/de` based on the detected reply language.

Indexing

`kb_auto_sync`	Re-index a single post on `save_post`; remove from KB on `delete_post`. Default `1`.
`kb_reindex_schedule`	`daily` (default), `weekly`, or `never`. Controls full reindex cadence.
`kb_max_items`	Hard cap on the number of indexed documents. Range 50–20000, default 2000.
`kb_max_chars_per_doc`	Truncation per document. Range 2000–60000, default 14000.
`kb_batch_size`	How many posts to index per cron tick. Range 20–200, default 100.
`kb_batch_sizes`	Per-post-type override, e.g. `{"product": 30}`. Useful when WooCommerce products are heavy.
`kb_post_types`	Array of post types to index. Defaults to `page`, `post`, and `product` if WooCommerce is present.
`kb_include_acf`	Pull values from registered ACF fields (filtered to skip sensitive keys).
`kb_include_custom_fields`	Pull values from regular `postmeta`, capped at `kb_max_meta_fields` (80) and `kb_max_meta_chars` (6000).
`kb_index_allow_shortcodes`	Off by default. Enabling it executes shortcodes during indexing. Useful when content is shortcode-driven, but can cause performance / side-effect surprises.

Internal-only output policy

`kb_allow_internal_links`	Allow `[label](url)` links in answers, but only if the URL host matches `home_url()`.
`kb_allow_internal_images`	Allow `![alt](url)` images, with the same same-origin restriction.

External URLs are stripped after generation by enforce_website_only_output(). Whatever the model produces, the rendered answer cannot contain links or images that point outside the site's own host.
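The same-host rule can be sketched in Python as follows. This is an illustrative approximation, not the plugin's enforce_website_only_output(): the function names are hypothetical, only Markdown links and images are handled (bare URLs are not), and a URL without a host is treated as not internal.

```python
import re
from urllib.parse import urlparse

IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")
LINK_RE = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)\s]+)\)")  # skip image syntax


def is_internal(url: str, home_url: str) -> bool:
    """True when the URL host equals the host of the site's home URL."""
    host = urlparse(url).hostname
    home = urlparse(home_url).hostname
    return host is not None and host == home


def strip_external(markdown: str, home_url: str) -> str:
    """Drop external images entirely; reduce external links to their label."""

    def image(match: re.Match) -> str:
        return match.group(0) if is_internal(match.group(2), home_url) else ""

    def link(match: re.Match) -> str:
        return match.group(0) if is_internal(match.group(2), home_url) else match.group(1)

    without_images = IMAGE_RE.sub(image, markdown)
    return LINK_RE.sub(link, without_images)
```

Keeping the label of an external link while dropping its URL is a design choice of this sketch; it preserves the sentence while removing the off-site target.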

Action buttons

🔄 Reindex — clears the running job, schedules a new one, and runs the first batch synchronously so the status panel updates immediately.
⚡ One Batch — runs exactly one batch. Useful when WP-Cron is disabled.
🗑️ Clear Index — truncates the KB table and resets state. Confirmation prompt via data-xc-confirm.

Attachments ¶

Attachments are stored privately, outside the WordPress media library, in wp-content/uploads/xc-chatbot-private/YYYY/MM/. Files are served only through the REST download endpoint, which checks both a nonce and per-actor ownership.

File Upload

`attachments_enabled`	Master toggle for the paperclip button.
`attachments_allow_guests`	If `0`, only logged-in users can upload. Guests are tracked via the `xc_chatbot_sid` HttpOnly cookie.
`attachments_max_files`	Per message. Range 1–10, default 3.
`attachments_max_mb`	Per file. Range 1–50, default 10.
`attachments_retention_days`	Files older than this are deleted hourly. Range 1–60, default 7.
`attachments_allowed_exts`	Comma-separated extension whitelist. Defaults: `jpg,jpeg,png,webp,gif,pdf,doc,docx,xls,xlsx,ppt,pptx,txt`. The plugin still verifies the actual MIME type via `wp_check_filetype_and_ext()`; the extension list is a UI / accept-attribute helper, not the security boundary.

AI Analysis (OpenAI only)

`attachments_ai_enabled`	Master toggle. Off by default.
`attachments_ai_allow_users`	Allow logged-in users.
`attachments_ai_allow_guests`	Allow guests. Off by default.
`attachments_ai_max_images`	Per message. Range 0–4, default 2.
`attachments_ai_pdf_pages`	How many leading pages to read with `pdftotext`. Range 1–10, default 3.
`attachments_ai_max_chars`	Cap on extracted-text length per message. Range 1000–20000, default 6000.

See Attachment AI for the full extraction pipeline.

Contact Bar ¶

The contact bar is an in-chat row of buttons (Email, Call, named departments, Contact page) that gives visitors a non-AI escape hatch.

Display

`contact_bar_enabled`	Master toggle. When off, the bar is hidden entirely.
`contact_bar_mode`	`smart` (default — only shown when the user message looks contact-related), `always`, or `never`.
`contact_keywords`	Comma-separated triggers for smart mode. Pre-populated with multilingual variants (EN/RO/FR/NL).

Contact details

`contact_email_main`	Address behind the primary "Email us" button.
`contact_email_administration`	Renders as 🏢 Administration.
`contact_email_repair`	Renders as 🛠 Repair service.
`contact_email_management`	Renders as 👔 Management.
`contact_phone`	`tel:` link.
`contact_page_url`	Link to a Contact page on the same site.
`contact_email_subject`	Subject pre-filled in `mailto:` links. The body is auto-populated client-side with the last user question + page URL.

Reply CTA

An optional one-sentence hint that the bot appends to its replies, telling the user the contact buttons are right below.

`contact_cta_enabled`	Master toggle.
`contact_cta_language`	`auto` (match reply language), `en` (force English), `off`.
`contact_cta_smart`	Only append when the user's question or the bot's reply looks contact-related.

Localized strings are built into the chat handler for en, ro, fr, nl, de; other languages fall back to English.

Design ¶

The widget exposes two CSS variables — --xc-chatbot-primary and --xc-chatbot-accent — that drive the gradient on the trigger button, the header, the user bubbles, and the antenna of the avatar SVG.

`primary_color`	Hex (e.g. `#0A5C9E`). Empty preserves the bundled default.
`accent_color`	Hex (e.g. `#FF6B00`). Empty preserves the bundled default.

Both values are validated server-side with sanitize_hex_color() before being persisted.

Domain Packs ¶

A domain pack is a JSON document that overrides system prompt, brand policy, prompt rules, quick replies, contact keywords, and the KB apology in one operation. The pack data is written to wp_options under xc_chatbot_domain_packs (keyed by pack_id); the active pack is recorded in xc_chatbot_settings.domain_pack_active.

Bundled samples

Five samples ship in domain-packs/:

industrial — error codes, fault diagnosis, maintenance, spare parts.
medical — appointment-style triage, treatment Q&A. Always recommends consulting a clinician.
pedagogic — tutoring tone, scaffolded explanations.
literary — bookshop / publisher tone, recommendations and series.
sports — league / club tone, tickets and fixtures.

Click a bundled sample to import it. Importing a sample whose pack_id already exists silently overwrites it.

How merging works

Base values are read from xc_chatbot_settings.
Pack values override base values when present:
- system_prompt_default — replaces base.
- brand_policy — replaces base.
- prompt_rules — pack rules go first, then base rules. Capped at 30 rules total.
- quick_replies — replaces base if non-empty.
- contact_keywords_append — appended to existing keywords.
- kb_policy_apology — replaces base if non-empty.

The merged result is computed at runtime in XC_Chatbot_Domain_Packs::get_effective_config() and is read by the chat handler before each request — no save step is required after activating a pack.
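The merge rules listed above can be summarised in a small Python sketch. It is a hypothetical re-statement, not the body of get_effective_config(): the function name, the `contact_keywords` base key, the comma-joined append, and the "non-empty means present" test are assumptions.

```python
MAX_RULES = 30  # total cap after merging, per the list above


def effective_config(base: dict, pack: dict = None) -> dict:
    """Return base settings overlaid with the active pack, without mutating either."""
    merged = dict(base)
    if not pack:
        return merged

    # Simple replacements: pack value wins when it is non-empty.
    for key in ("system_prompt_default", "brand_policy",
                "quick_replies", "kb_policy_apology"):
        if pack.get(key):
            merged[key] = pack[key]

    # Pack rules are evaluated first, so they go in front of the base rules.
    rules = list(pack.get("prompt_rules", [])) + list(base.get("prompt_rules", []))
    merged["prompt_rules"] = rules[:MAX_RULES]

    # Keywords are appended rather than replaced.
    extra = pack.get("contact_keywords_append", "").strip()
    if extra:
        existing = base.get("contact_keywords", "").strip()
        merged["contact_keywords"] = f"{existing}, {extra}" if existing else extra
    return merged
```

Because the result is computed on each call, activating or deactivating a pack takes effect on the next request without any stored copy of the merged settings.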

Import / export

Upload & Import — accepts a single .json file. Validates schema, sanitizes every field, caps rules to 30 and quick replies to 8.
📤 Export — serves the active pack JSON as a download (domain-pack-{pack_id}.json), excluding the imported_gmt internal field.
🗑️ Remove — deletes a pack (with confirmation). If it was active, deactivates it.

For the JSON shape, see Domain pack schema.

Diagnostics ¶

A read-only environment audit. Useful before opening a support ticket. Reports:

WordPress and PHP versions.
HTTPS status (is_ssl()).
Whether openssl_encrypt is available (otherwise the plugin uses base64 obfuscation).
Whether the stored API key is encrypted, plaintext, or unset.
Private upload directory path, existence, and writability.
GD and Imagick availability.
upload_max_filesize and post_max_size from the active PHP config.

Prompt routing ¶

For every user message, the chat handler decides which system prompt to use:

Read effective config (base + active domain pack).
If use_advanced_rules=1, evaluate prompt_rules top-to-bottom. The first rule with a keyword (from its comma-separated list) that appears as a substring of the user message wins. Its type drives downstream behaviour: a translation rule bypasses both the brand policy and the website-only KB injection.
If no rule matches, fall back to a hard-coded translation/technical pair derived from system_prompt_translation + prompt_keywords_translation and system_prompt_technical + prompt_keywords_technical.
If still no match, use system_prompt_default.
Append brand_policy unless the intent is translation.
Append a LANGUAGE: … instruction (see Language detection).
If site-only mode is on, append the website-only policy and the retrieved KB context block.
If attached files are present and Attachment AI is allowed for this actor, append the attachment policy block.
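The routing steps above can be sketched in Python as follows. The function names, the case-insensitive matching, and the way blocks are joined are assumptions for illustration; the website-only policy text and the attachment block are passed in by the caller because their wording is not given here.

```python
def match_rule(message: str, rules: list):
    """Return the first rule with a keyword found inside the message, else None."""
    text = message.lower()
    for rule in rules:
        keywords = (k.strip().lower() for k in rule.get("keywords", "").split(","))
        if any(k and k in text for k in keywords):
            return rule
    return None


def build_system_prompt(message: str, config: dict, language_instruction: str,
                        website_only_policy: str = "", kb_context: str = "",
                        attachment_policy: str = ""):
    """Return (intent, prompt) following the routing order described above."""
    rule = None
    if config.get("use_advanced_rules"):
        rule = match_rule(message, config.get("prompt_rules", []))
    if rule is None:
        # Fallback pair built from the legacy translation/technical keys.
        legacy = [
            {"type": kind,
             "keywords": config.get(f"prompt_keywords_{kind}", ""),
             "prompt": config.get(f"system_prompt_{kind}", "")}
            for kind in ("translation", "technical")
        ]
        rule = match_rule(message, [r for r in legacy if r["prompt"]])

    intent = rule["type"] if rule else "default"
    parts = [rule["prompt"] if rule else config.get("system_prompt_default", "")]
    if intent != "translation":
        parts.append(config.get("brand_policy", ""))
    parts.append(language_instruction)
    if config.get("kb_site_only") and intent != "translation":
        parts += [website_only_policy, kb_context]
    parts.append(attachment_policy)  # caller passes "" when Attachment AI is off
    return intent, "\n\n".join(p for p in parts if p)
```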

Language detection ¶

Three modes (answer_language_mode):

`message`	Heuristic detection on each message. Strong signals first (Romanian diacritics → `ro`; French accents → `fr` bonus), then a token-overlap score against built-in EN / RO / FR / NL word sets. Falls back to the browser language if the score is too low.
`browser`	Use the `lang` sent by the front-end (derived from `navigator.language`).
`force`	Always use `answer_language_force`, regardless of the message.
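To make the `message` mode concrete, here is a rough Python sketch of a diacritics-then-token-overlap heuristic. The word sets are tiny hypothetical stand-ins, and the character sets, the size of the French bonus, and the score threshold are all assumptions; the plugin's built-in lists and weights are not documented here.

```python
import re

# Characters that strongly indicate Romanian (assumed set for this sketch).
ROMANIAN_MARKS = set("ășțşţ")
FRENCH_ACCENTS = set("éèêëàçùûôœ")

# Hypothetical miniature word sets; real lists would be much larger.
WORDS = {
    "en": {"the", "and", "is", "what", "how", "you"},
    "ro": {"și", "este", "care", "pentru", "cum", "unde"},
    "fr": {"le", "la", "est", "pour", "comment", "vous"},
    "nl": {"de", "het", "een", "is", "hoe", "voor"},
}


def detect_language(message: str, browser_lang: str = "en", min_score: float = 1.0) -> str:
    """Strong signals first, then token overlap, then the browser language."""
    text = message.lower()
    if any(ch in ROMANIAN_MARKS for ch in text):
        return "ro"
    tokens = re.findall(r"[^\W\d_]+", text)  # runs of Unicode letters
    scores = {lang: float(sum(t in words for t in tokens))
              for lang, words in WORDS.items()}
    if any(ch in FRENCH_ACCENTS for ch in text):
        scores["fr"] += 1.0  # accents count as a bonus, not a verdict
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else browser_lang
```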

The detected language is used in three places:

The LANGUAGE: instruction appended to the system prompt.
Localization of the rate-limit and stream-busy messages.
Localization of the KB apology when the admin left the apology at the English default.

KB retrieval ¶

Retrieval runs in three tiers, each falling back to the next on no results:

FULLTEXT NATURAL LANGUAGE MODE — primary. Score is the MySQL relevance score.
FULLTEXT BOOLEAN MODE with simple token expansion (+token*). Stop-words are removed and tokens shorter than 3 characters are dropped.
LIKE on title and content with esc_like(). Always available; assigns score 1.

Before retrieval, if the current page URL maps to a post (url_to_postid), that document is included with a synthetic high score so the bot is biased toward the page the user is viewing. Results are de-duplicated by URL and capped at kb_retrieve_limit.
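The tiering, the boolean token expansion, and the de-duplication step can be sketched in Python as below. The tier functions are passed in as callables standing in for the three SQL queries; the stop-word set is a hypothetical sample, and placing the current-page document first is this sketch's way of modelling its "synthetic high score".

```python
import re

STOP_WORDS = {"the", "and", "for", "with", "what", "how"}  # hypothetical sample


def to_boolean_query(text: str) -> str:
    """Build a +token* expression, dropping stop-words and tokens under 3 chars."""
    tokens = re.findall(r"\w+", text.lower())
    kept = [t for t in tokens if len(t) >= 3 and t not in STOP_WORDS]
    return " ".join(f"+{t}*" for t in kept)


def retrieve(query: str, tiers: list, limit: int, current_page_doc: dict = None) -> list:
    """Try each tier in order, stop at the first with results, then dedupe and cap."""
    results = []
    for search in tiers:
        results = search(query)
        if results:
            break
    if current_page_doc is not None:
        results = [current_page_doc] + results  # bias toward the viewed page
    seen, unique = set(), []
    for doc in results:
        if doc["url"] not in seen:
            seen.add(doc["url"])
            unique.append(doc)
    return unique[:limit]
```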

Context block format

Documents are formatted into a single textual block with numeric markers used by the citation system:

[1] Title — https://example.com/page-a
Excerpt body…

[2] Title — https://example.com/page-b
Excerpt body…

The model is asked to cite as [1], [2]… If kb_citations_mode=auto and the model forgets, the response post-processor inserts a basic citation per paragraph.
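A minimal Python sketch of the context block and of an `auto`-style citation pass. The function names and the dictionary keys are hypothetical, and the choice of `[1]` as the marker added to uncited paragraphs is an assumption; the text above only says that a basic citation is inserted per paragraph.

```python
import re

CITATION_RE = re.compile(r"\[\d+\]")


def format_context(docs: list) -> str:
    """Render documents as numbered blocks in the format shown above."""
    blocks = []
    for number, doc in enumerate(docs, start=1):
        header = f"[{number}] {doc['title']} \u2014 {doc['url']}"
        blocks.append(f"{header}\n{doc['excerpt']}")
    return "\n\n".join(blocks)


def ensure_citations(answer: str, default_marker: str = "[1]") -> str:
    """Append a marker to every non-empty paragraph that has no [n] citation."""
    fixed = []
    for paragraph in answer.split("\n\n"):
        if paragraph.strip() and not CITATION_RE.search(paragraph):
            paragraph = f"{paragraph.rstrip()} {default_marker}"
        fixed.append(paragraph)
    return "\n\n".join(fixed)
```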

Streaming & rate limits ¶

The streaming endpoint emits Server-Sent Events. A typical session looks like:

: xc-chatbot stream start

data: {"token":"Hello"}

data: {"token":" — how"}

data: {"token":" can I help?"}

data: [DONE]

Errors are emitted as a single data: {"error":"…"} followed by [DONE].
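A client can consume this wire format line by line. The Python sketch below is a hypothetical reader (the function name and the use of an exception for errors are this example's choices); it accepts any iterable of text lines, such as the decoded lines of an HTTP response body.

```python
import json


def read_stream(lines) -> str:
    """Collect token events until [DONE]; raise if the server reports an error."""
    tokens = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith(":"):
            continue  # blank separator or SSE comment line
        if not line.startswith("data:"):
            continue  # ignore fields this sketch does not use
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        if "error" in event:
            raise RuntimeError(event["error"])
        tokens.append(event.get("token", ""))
    return "".join(tokens)
```

Whitespace inside a token is preserved because only the outer payload is stripped, not the decoded JSON string.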

Rate limits (per actor)

Best-effort, transient-backed limits enforced inside enforce_stream_rate_limits(). The fingerprint is md5(IP|sid|user_id); IP is sanitized through a strict character filter before use to defeat header spoofing.

Limit	Guest	Logged-in
Requests / 10 min	30	60
Concurrent stream lock	90 s — only one stream per actor at a time.

Rate-limit and stream-busy messages are localized to en/ro/fr/nl; other languages fall back to English. The lock is released by a register_shutdown_function() safety net even if the script is interrupted mid-stream.
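The request limit can be pictured with the Python sketch below. It assumes a fixed 10-minute window per fingerprint, which is how an expiring counter typically behaves; the class and function names are hypothetical, the in-memory dictionary stands in for WordPress transients, and the concurrent-stream lock is not modelled.

```python
import hashlib
import time

WINDOW_SECONDS = 600                  # 10 minutes
LIMITS = {"guest": 30, "user": 60}    # requests per window, from the table above


def actor_fingerprint(ip: str, sid: str, user_id: int) -> str:
    """md5(IP|sid|user_id) as a hex string."""
    return hashlib.md5(f"{ip}|{sid}|{user_id}".encode("utf-8")).hexdigest()


class RateLimiter:
    def __init__(self, clock=time.time):
        self._clock = clock
        self._windows = {}  # fingerprint -> (window_start, count)

    def allow(self, ip: str, sid: str, user_id: int = 0) -> bool:
        """Count the request and report whether it fits inside the window."""
        key = actor_fingerprint(ip, sid, user_id)
        limit = LIMITS["user" if user_id else "guest"]
        now = self._clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= WINDOW_SECONDS:
            start, count = now, 0  # window expired, start a new one
        if count >= limit:
            return False
        self._windows[key] = (start, count + 1)
        return True
```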

Server-side caps

`MAX_MESSAGE_CHARS`	3,000
`MAX_HISTORY_ITEMS`	12
`MAX_HISTORY_ITEM_CHARS`	2,000
`MAX_HISTORY_TOTAL_CHARS`	12,000

Excess content is silently truncated. Roles in client-supplied history are whitelisted to user or assistant, so a client cannot inject a system-role message through the history parameter.
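The history caps can be sketched in Python as follows. The constants mirror the table above; everything else is an assumption of this sketch, in particular that entries with a disallowed role are dropped and that the most recent entries are the ones kept when a cap is exceeded.

```python
MAX_HISTORY_ITEMS = 12
MAX_HISTORY_ITEM_CHARS = 2000
MAX_HISTORY_TOTAL_CHARS = 12000
ALLOWED_ROLES = {"user", "assistant"}


def sanitize_history(history: list) -> list:
    """Whitelist roles, truncate each item, then apply item and total caps."""
    clean = []
    for item in history:
        if not isinstance(item, dict) or item.get("role") not in ALLOWED_ROLES:
            continue
        content = str(item.get("content", ""))[:MAX_HISTORY_ITEM_CHARS]
        if content:
            clean.append({"role": item["role"], "content": content})

    clean = clean[-MAX_HISTORY_ITEMS:]  # keep the most recent items

    # Walk backwards so the newest messages use the total budget first.
    total, kept = 0, []
    for item in reversed(clean):
        remaining = MAX_HISTORY_TOTAL_CHARS - total
        if remaining <= 0:
            break
        content = item["content"][:remaining]
        kept.append({"role": item["role"], "content": content})
        total += len(content)
    kept.reverse()
    return kept
```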

Attachment AI ¶

Optional. Off by default. Activates only when:

attachments_ai_enabled=1.
The current actor is allowed (attachments_ai_allow_users for logged-in, attachments_ai_allow_guests for guests).
The selected provider is OpenAI.
The intent is not translation.

Image pipeline

The file MIME is verified to start with image/.
If the file is ≤ ~4 MB, original bytes are base64-encoded into a data: URL and passed as an image_url content part.
If larger, the plugin attempts a GD downscale to JPEG. If GD is unavailable, the image is dropped and a note is emitted to the model.
Up to attachments_ai_max_images images per message; the rest are skipped with a note.

PDF pipeline

Text path: if pdftotext is on PATH (or at /usr/bin/pdftotext, /usr/local/bin/pdftotext, /bin/pdftotext), it is invoked with -f 1 -l N -layout to extract the first N = attachments_ai_pdf_pages pages. Output is normalized and length-capped at attachments_ai_max_chars.
Vision fallback: if no text is recoverable, the first page is rendered to JPEG using pdftoppm first (-f 1 -l 1 -singlefile -jpeg -r 150). If pdftoppm is missing, Imagick is tried (readImage($pdf.'[0]') at 150 DPI). The JPEG is then handed to the vision model just like any other image.
If neither path produces content, a note like Could not extract text or render PDF: invoice.pdf is added to the system context — the model is told what was attached but warned not to hallucinate the content.

All shell invocations use escapeshellarg() on every argument. The chosen binary is selected from a fixed candidate list, never from user input.
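The text path can be sketched in Python as follows. Instead of escaping a shell string, this sketch builds an argument list, which avoids the shell altogether. The function names, the trailing `-` (which asks pdftotext to write to standard output), and the whitespace-collapsing normalisation are choices of this example, not a description of the plugin's PHP code.

```python
import shutil
from pathlib import Path

# Fixed candidate list, mirroring the locations named above.
CANDIDATES = ("/usr/bin/pdftotext", "/usr/local/bin/pdftotext", "/bin/pdftotext")


def find_pdftotext():
    """Return a path to pdftotext from PATH or the fixed candidates, else None."""
    found = shutil.which("pdftotext")
    if found:
        return found
    for candidate in CANDIDATES:
        if Path(candidate).is_file():
            return candidate
    return None


def pdftotext_argv(binary: str, pdf_path: str, pages: int) -> list:
    """Argument vector for extracting the first N pages with layout preserved."""
    pages = max(1, min(10, int(pages)))  # documented range is 1 to 10
    return [binary, "-f", "1", "-l", str(pages), "-layout", pdf_path, "-"]


def normalize_text(raw: str, max_chars: int) -> str:
    """Collapse runs of whitespace and cap the length."""
    return " ".join(raw.split())[:max_chars]
```

The list returned by `pdftotext_argv` can be passed directly to `subprocess.run(argv, capture_output=True)`; because no shell is involved, a file name containing spaces or quotes needs no escaping.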

Fallback / demo mode ¶

If no API key is configured, the chat handler enters a deterministic demo mode that simulates streaming over a small, hard-coded response table. This is intentional — it gives an admin who installs the plugin without an API key something tangible to look at, and it gracefully degrades instead of failing visibly to visitors.

If the website-only policy is on and the KB returns nothing, the bot returns the apology even in demo mode — there is no risk of fabricated answers leaking when the key is missing.

REST API ¶

All endpoints are under the xc-chatbot/v1 namespace. Authentication is performed inside each callback by verifying a per-action nonce; this allows guest sessions to authenticate via the xc_chatbot_sid HttpOnly cookie without losing the protection of wp_verify_nonce().

GET /wp-json/xc-chatbot/v1/nonce

Returns a fresh nonce for the xc_chatbot_nonce action. Used by the front-end to recover from cached pages whose embedded nonce has expired. nocache headers are sent.

Response: { "nonce": "abc123" }

POST /wp-json/xc-chatbot/v1/upload

Multipart form upload of a single chat attachment. Validates extension, size, and real MIME via wp_check_filetype_and_ext(). Files are stored under the private upload tree.

Body (multipart):

`file` required	The file blob.
`nonce` required	`xc_chatbot_nonce` nonce.

Limits: 30 uploads / hour per (sid + IP).

Response: { key, name, mime, size, is_image }

GET /wp-json/xc-chatbot/v1/file/<key>

Streams an attachment to its uploader. The key must match the regex ^[a-f0-9]{16,64}$. Inline disposition for images, attachment for everything else (or ?dl=1 to force download). Sends X-Content-Type-Options: nosniff.

Auth: nonce + actor ownership check (logged-in user OR matching xc_chatbot_sid).

POST /wp-json/xc-chatbot/v1/file/<key>/delete

Removes the file from disk and the registry row. Path is realpath-pinned to the private upload tree before unlink().

Limits: 60 deletes / hour per (sid + IP).

POST /wp-json/xc-chatbot/v1/export-zip

Logged-in users only. Streams a ZIP containing transcript.txt and any owned attachments referenced by file key. The download_export_zip_enabled setting is enforced server-side.

Body: nonce, transcript (string, capped at 400 KB), attachments (JSON array of file keys, max 20).

Total file cap: 200 MB. Files past the cap are skipped.

POST /wp-json/xc-chatbot/v1/stream

The conversational endpoint. Returns a text/event-stream response. See Streaming & rate limits for the wire format.

Body (form-encoded):

`nonce` required	`xc_chatbot_nonce`
`message` optional*	Up to 3,000 chars. *Required unless `attachments` is non-empty.
`history`	JSON array of `{role,content}`. Capped to 12 items, 2,000 chars each, 12,000 total.
`attachments`	JSON array of file keys.
`page_url`	Current page URL — biases KB retrieval.
`lang`	Browser language hint.

AJAX endpoints ¶

Two legacy admin-ajax.php actions are registered:

Action	Auth	Purpose
`xc_chatbot_send_message`	nonce, public	Legacy non-streaming fallback. Returns a deterministic demo reply. Used only when the streaming endpoint cannot be reached.
`xc_chatbot_test_connection`	nonce, `manage_options`	Admin connectivity test against the configured provider. Capability is checked before nonce verification. Upstream `error.type` values are never echoed; only sanitized hints derived from the HTTP status code.

Hooks & cron ¶

Cron events

Hook	Schedule	Purpose
`xc_chatbot_attachments_cleanup`	hourly	Deletes attachments older than `attachments_retention_days`.
`xc_chatbot_kb_initial_reindex`	one-shot, 60 s after activation	Bootstraps the KB.
`xc_chatbot_kb_scheduled_reindex`	`daily` / `weekly` / never	Full reindex per schedule.
`xc_chatbot_kb_reindex_batch`	one-shot, chained	Processes one batch and re-schedules itself if more work remains.

WordPress action / filter integration

`save_post` / `delete_post`	Auto-syncs a single document into / out of the KB if `kb_auto_sync=1`.
`cron_schedules`	Adds a custom `weekly` recurrence (`7 × DAY_IN_SECONDS`) if WordPress did not already register one.
`wp_footer`	Renders the chat widget HTML.
`wp_enqueue_scripts` / `admin_enqueue_scripts`	Enqueues `xc-chatbot.css/js` on the front-end and `xc-chatbot-admin.css/js` on plugin admin pages.
`init` (priority 1)	Sets the `xc_chatbot_sid` cookie for guests on the front-end.
`plugins_loaded`	Top-level bootstrap entrypoint (`xc_chatbot_init()`).

Database tables ¶

{prefix}xc_chatbot_kb_docs

CREATE TABLE {prefix}xc_chatbot_kb_docs (
  id             BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
  post_id        BIGINT(20) UNSIGNED NOT NULL,
  post_type      VARCHAR(40)  NOT NULL DEFAULT '',
  title          TEXT         NOT NULL,
  url            TEXT         NOT NULL,
  excerpt        TEXT         NOT NULL,
  content        LONGTEXT     NOT NULL,
  image_url      TEXT         NULL,
  images_json    LONGTEXT     NULL,
  run_id         BIGINT(20) UNSIGNED NOT NULL DEFAULT 0,
  modified_gmt   DATETIME     NULL,
  indexed_gmt    DATETIME     NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY post_id (post_id),
  KEY run_id (run_id),
  FULLTEXT KEY ft_title_content (title, content)
);

run_id is a monotonically increasing identifier set at the start of each full reindex. The post-reindex sweep uses it to drop documents that were not touched by the current run.

{prefix}xc_chatbot_chat_files

CREATE TABLE {prefix}xc_chatbot_chat_files (
  id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  file_key     VARCHAR(64)   NOT NULL,
  sid          VARCHAR(64)   NOT NULL,
  user_id      BIGINT UNSIGNED NOT NULL DEFAULT 0,
  orig_name    TEXT          NOT NULL,
  stored_path  TEXT          NOT NULL,
  mime         VARCHAR(190)  NOT NULL,
  size         BIGINT UNSIGNED NOT NULL DEFAULT 0,
  created_gmt  DATETIME      NOT NULL,
  used         TINYINT(1)    NOT NULL DEFAULT 0,
  PRIMARY KEY (id),
  UNIQUE KEY file_key (file_key),
  KEY sid (sid),
  KEY user_id (user_id),
  KEY created_gmt (created_gmt)
);

file_key is a 24-character hex value generated from random_bytes(12). It is the only identifier exposed in URLs; the actual filesystem path is never user-visible.
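For illustration, key generation and the REST-side format check can be sketched in Python. The function names are hypothetical; the 12 random bytes and the `[a-f0-9]{16,64}` pattern come from the descriptions in this document.

```python
import re
import secrets

# fullmatch is used instead of ^...$ because "$" in Python also matches
# just before a trailing newline.
FILE_KEY_RE = re.compile(r"[a-f0-9]{16,64}")


def new_file_key() -> str:
    """24 lowercase hex characters from 12 random bytes."""
    return secrets.token_hex(12)


def is_valid_file_key(key: str) -> bool:
    """Accept only lowercase hex keys of 16 to 64 characters."""
    return FILE_KEY_RE.fullmatch(key) is not None
```

A format check of this kind rejects path fragments such as `../` before any database or filesystem lookup takes place.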

Options reference ¶

All settings live in a single serialized array at wp_options.xc_chatbot_settings. The plugin performs a non-destructive merge of defaults on every read, so newly added keys appear automatically after upgrades without overwriting existing values.

Option name	Stores
`xc_chatbot_settings`	Main configuration array.
`xc_chatbot_domain_packs`	All imported domain packs, keyed by `pack_id`.
`xc_chatbot_kb_state`	KB summary: `last_indexed_gmt`, `docs_count`, `last_error`.
`xc_chatbot_kb_reindex_job`	Active reindex job: `in_progress`, `run_id`, `processed`, `current_post_type`, `last_id`, `max_items`, `batch_size`.

None of these options are autoloaded except the main settings array. Encrypted API keys live inside xc_chatbot_settings with the enc: prefix.

Domain pack schema ¶

{
  "pack_id": "industrial",                   // required, [a-z0-9_-]{1,64}
  "pack_name": "Industrial & Technical",    // optional, defaults to pack_id
  "version": "1.0",
  "description": "Specialised for…",

  "system_prompt_default": "You are…",
  "brand_policy":          "You are the official…",

  "prompt_rules": [
    {
      "name":     "Error Code Diagnosis",
      "type":     "technical",            // default | translation | technical
      "keywords": "error, fault, code",
      "prompt":   "You are a diagnostic assistant…"
    }
  ],                                              // max 30 rules

  "quick_replies": [
    { "icon": "🔧", "label": "Report error", "msg": "…" }
  ],                                              // max 8 items

  "contact_keywords_append": "technician, …",
  "kb_policy_apology":        "Sorry — I can only…"
}

Validation is strict: every string field is sanitized with sanitize_text_field() or sanitize_textarea_field(); type is whitelisted to default, translation, technical; rules with empty prompt are dropped; quick replies with empty label or msg are dropped.
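The validation rules can be approximated with the Python sketch below. It covers only the structural checks named above; the function name, the decision to coerce an unknown type to `default`, and raising an error for a bad pack_id are assumptions, and the WordPress sanitizers are represented by a plain strip.

```python
import re

PACK_ID_RE = re.compile(r"[a-z0-9_-]{1,64}")
RULE_TYPES = {"default", "translation", "technical"}
MAX_RULES = 30
MAX_QUICK_REPLIES = 8


def validate_pack(raw: dict) -> dict:
    """Return a cleaned pack with invalid rules and quick replies dropped."""
    pack_id = str(raw.get("pack_id", ""))
    if not PACK_ID_RE.fullmatch(pack_id):
        raise ValueError("invalid pack_id")

    rules = []
    for rule in raw.get("prompt_rules", []):
        prompt = str(rule.get("prompt", "")).strip()
        if not prompt:
            continue  # rules with an empty prompt are dropped
        rule_type = rule.get("type")
        rules.append({
            "name": str(rule.get("name", "")).strip(),
            "type": rule_type if rule_type in RULE_TYPES else "default",
            "keywords": str(rule.get("keywords", "")).strip(),
            "prompt": prompt,
        })

    replies = []
    for reply in raw.get("quick_replies", []):
        label = str(reply.get("label", "")).strip()
        msg = str(reply.get("msg", "")).strip()
        if label and msg:
            replies.append({"icon": str(reply.get("icon", "")), "label": label, "msg": msg})

    return {
        "pack_id": pack_id,
        "pack_name": str(raw.get("pack_name") or pack_id),
        "prompt_rules": rules[:MAX_RULES],
        "quick_replies": replies[:MAX_QUICK_REPLIES],
    }
```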

Privacy & storage ¶

Conversations

By default the plugin does not log conversation content. Messages are streamed through the chat handler to the configured AI provider and the response is sent back to the visitor — no per-message database row is written. Aggregate counters (xc_chatbot_total_messages) are not enabled in v1.0.0.

Visitors may invoke the transcript-download or ZIP-export buttons themselves; in both cases the transcript is generated client-side from the in-memory chat log and never persisted on the server.

Cookies

Cookie	Purpose	Lifetime
`xc_chatbot_sid`	Anonymous session identifier used to bind attachment uploads to the visitor that uploaded them. HttpOnly, `SameSite=Lax`, `Secure` on HTTPS.	30 days.

The cookie value is 32 hex characters from random_bytes(16). It contains no user-identifying information.

Attachments

Stored in wp-content/uploads/xc-chatbot-private/YYYY/MM/.
The directory contains a deny-all .htaccess and an empty index.php.
File names on disk are {file_key}.{ext} — original names are preserved only inside the database for display.
Auto-deleted after attachments_retention_days by the hourly cleanup cron.
Download is gated by nonce + ownership check.
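
The storage rules above can be sketched as follows. This is a Python illustration under stated assumptions, not the plugin's code: the FileRecord fields are a hypothetical shape for one database row, the YYYY/MM folder is assumed to come from the upload date, and nonce validation is reduced to a boolean that the caller supplies.

```python
import hmac
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import PurePosixPath

PRIVATE_ROOT = PurePosixPath("wp-content/uploads/xc-chatbot-private")


@dataclass
class FileRecord:            # hypothetical shape of one attachment row
    file_key: str            # random key used as the on-disk name
    ext: str
    original_name: str       # kept in the database for display only
    owner_sid: str           # xc_chatbot_sid of the uploading visitor
    uploaded_at: datetime


def disk_path(rec: FileRecord) -> PurePosixPath:
    """Build {root}/YYYY/MM/{file_key}.{ext}; the original name never reaches disk."""
    return PRIVATE_ROOT / f"{rec.uploaded_at:%Y}" / f"{rec.uploaded_at:%m}" / f"{rec.file_key}.{rec.ext}"


def may_download(rec: FileRecord, cookie_sid: str, nonce_ok: bool) -> bool:
    """Both gates must pass: a valid nonce and a session id matching the uploader."""
    same_owner = hmac.compare_digest(rec.owner_sid.encode(), cookie_sid.encode())
    return nonce_ok and same_owner


def is_expired(rec: FileRecord, retention_days: int, now: datetime) -> bool:
    """True once the file is older than attachments_retention_days."""
    return now - rec.uploaded_at > timedelta(days=retention_days)
```

Comparing session ids with a constant-time function is a defensive choice in this sketch; the source only states that an ownership check exists.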

API key

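Encrypted at rest with AES-256-CBC when OpenSSL is available, with key material derived from the WordPress salts. Without OpenSSL the plugin uses the base64 fallback noted under Requirements; treat that mode as obfuscation rather than encryption. The plaintext is held in memory only for the duration of an upstream API call. The admin UI displays a masked value and never sends the encrypted blob back to the browser.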

File layout ¶

xc-chatbot/
├── xc-chatbot.php                                // main plugin file, bootstrap
├── readme.txt
├── MANIFEST.md
├── admin/
│   └── class-xc-chatbot-admin.php                // admin UI, settings save/dispatch
├── assets/
│   ├── css/xc-chatbot.css                        // front-end widget styles
│   ├── css/xc-chatbot-admin.css                  // admin styles
│   ├── js/xc-chatbot.js                          // front-end widget
│   ├── js/xc-chatbot-admin.js                    // admin JS (delegated handlers)
│   └── images/xc-chatbot-avatar.svg
├── domain-packs/
│   ├── industrial.json
│   ├── medical.json
│   ├── pedagogic.json
│   ├── literary.json
│   └── sports.json
└── includes/
    ├── class-xc-chatbot-settings.php             // options + non-destructive merge
    ├── class-xc-chatbot-crypto.php               // AES-256-CBC for API key
    ├── class-xc-chatbot-domain-packs.php         // pack import/export/merge
    ├── class-xc-chatbot-kb.php                   // KB indexer + retriever
    ├── class-xc-chatbot-attachments.php          // uploads, REST endpoints
    ├── class-xc-chatbot-attachment-ai.php        // image/PDF analysis
    ├── class-xc-chatbot-chat-handler.php         // streaming, prompts, widget render
    └── class-xc-chatbot-assets.php               // enqueue with mtime-busted versions

Troubleshooting ¶

Streaming stops mid-reply / hangs

Almost always a buffering issue in a proxy, CDN, or cache layer sitting between WordPress and the browser. Symptoms: the chat shows the typing indicator and the request reaches OpenAI / Anthropic, but tokens arrive in one large lump or never arrive. Resolution checklist:

Verify Cloudflare or other CDN is not in "Buffer entire response" mode for the /wp-json/ path.
For Apache + mod_pagespeed, exclude /wp-json/xc-chatbot/v1/stream.
For LiteSpeed cache, add the path to the "Do Not Cache URIs" list.
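For Nginx, set proxy_buffering off for the path (fastcgi_buffering off if Nginx talks to PHP-FPM directly), or disable buffering globally if that is acceptable for the rest of the site.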
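
When working through the checklist it helps to tell a buffered stream from a slow one. The Python sketch below is a rough heuristic, not part of the plugin: it records when each chunk of a response arrives and flags the pattern described above, a long wait followed by everything at once. The function names and both thresholds are assumptions chosen for illustration.

```python
import time
from typing import Callable, Iterable, List


def record_arrivals(chunks: Iterable[bytes],
                    clock: Callable[[], float] = time.monotonic) -> List[float]:
    """Seconds since reading started at which each chunk arrived."""
    start = clock()
    return [clock() - start for _ in chunks]


def looks_buffered(arrivals: List[float],
                   min_wait: float = 2.0,
                   max_spread: float = 0.2) -> bool:
    """Heuristic: a long silence, then every chunk within a very short window."""
    if len(arrivals) < 2:
        return False  # a single chunk tells us nothing about streaming
    waited = arrivals[0] >= min_wait
    bunched = arrivals[-1] - arrivals[0] <= max_spread
    return waited and bunched
```

Feed record_arrivals() with the chunk iterator of any HTTP client that yields the response body incrementally while a chat request is in flight. How the request to the stream endpoint is authenticated is outside the scope of this sketch.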

"Invalid security token"

The widget caches the page nonce. If the page was served from a static / page cache, the embedded nonce can be stale. The widget calls GET /wp-json/xc-chatbot/v1/nonce to refresh on first error; a second occurrence usually means caching is too aggressive — exclude logged-in users or exclude the page from the cache.
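
The refresh-once behaviour described above can be sketched as follows. This is a Python illustration of the control flow only; the real widget is JavaScript, and InvalidNonce, send and fetch_nonce are hypothetical names standing in for the rejected request, the chat call, and the GET /wp-json/xc-chatbot/v1/nonce call.

```python
from typing import Callable


class InvalidNonce(Exception):
    """Hypothetical error raised when the server rejects the nonce."""


def send_with_refresh(send: Callable[[str], str],
                      fetch_nonce: Callable[[], str],
                      cached_nonce: str) -> str:
    """Try the cached nonce first; on rejection, refresh once and retry."""
    try:
        return send(cached_nonce)
    except InvalidNonce:
        # First failure: the page-embedded nonce is probably stale.
        fresh = fetch_nonce()
        # A second InvalidNonce is not caught here; it reaches the caller,
        # which is the "second occurrence" case described above.
        return send(fresh)
```

Retrying exactly once keeps a misconfigured cache from turning a stale nonce into a request loop.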

"Documents: 0" even after reindex

Open Knowledge Base, click ⚡ One Batch manually. The status panel updates after the call returns.
Check the Diagnostics page for table existence.
Look at kb_post_types — by default it is page + post (+ product if WooCommerce is present). Custom post types must be added explicitly.
If WP-Cron is disabled (DISABLE_WP_CRON set to true in wp-config.php), set up an external cron as described under CLI & cron.

PDF analysis returns nothing useful

SSH into the server and run command -v pdftotext pdftoppm. If either is missing, install poppler-utils, which provides both.
If you can only run Imagick, check the ImageMagick policy file (/etc/ImageMagick-6/policy.xml or similar) for a PDF read restriction.
Some PDFs contain only scanned images and no text layer. The plugin will fall back to vision rendering, but only the first page is rendered.
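
The command -v check can also be scripted. The Python sketch below does the same lookup with the standard library; the function names are hypothetical. Note that it inspects the PATH of the process running the script, which may differ from the PATH seen by PHP under the web server.

```python
import shutil

PDF_TOOLS = ("pdftotext", "pdftoppm")


def find_pdf_tools(names=PDF_TOOLS) -> dict:
    """Map each tool name to its resolved path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(found: dict) -> list:
    """Names of the tools that could not be located, sorted for stable output."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    absent = missing_tools(find_pdf_tools())
    print("missing:", ", ".join(absent) if absent else "none")
```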

Attachments fail: "File type could not be verified"

This is wp_check_filetype_and_ext() rejecting a file whose real MIME does not match its extension — for example, a renamed .txt file or a PDF saved with the wrong extension. The check is intentional. Re-save the file with the correct extension or extend attachments_allowed_exts if you legitimately need the format.
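
The idea behind the check can be illustrated by comparing a file's leading bytes with the signature its extension implies. The Python sketch below is a simplified stand-in: WordPress's own check is more thorough, and the signature table here is a small illustrative sample, not the plugin's list of allowed types.

```python
import os

# Well-known leading byte signatures for a few common formats (illustrative only).
MAGIC_NUMBERS = {
    ".pdf": (b"%PDF-",),
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".gif": (b"GIF87a", b"GIF89a"),
}


def extension_matches_content(filename: str, head: bytes):
    """True or False when the extension is known; None when this sketch cannot judge."""
    ext = os.path.splitext(filename)[1].lower()
    signatures = MAGIC_NUMBERS.get(ext)
    if signatures is None:
        return None
    return head.startswith(signatures)
```

For example, a text file renamed to notes.pdf fails because its first bytes are not %PDF-, which is the kind of mismatch the error message reports.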

Performance tuning ¶

Indexing

`kb_batch_size`	Lower this if reindex causes timeouts on shared hosts (default 100). Try 30 for hosts with strict `max_execution_time`.
`kb_batch_sizes['product']`	WooCommerce products with long descriptions and many meta fields can be heavy — set this to 20–30.
`kb_max_chars_per_doc`	Lower to reduce table size and FULLTEXT memory usage. 8000 is usually enough for product catalogs.
`kb_include_custom_fields`	Disable on sites with very large unrelated meta tables — this is by far the biggest indexing cost.

Inference cost

Anthropic Haiku + `kb_retrieve_limit=4`	Lowest-cost configuration that still produces good website-only answers.
OpenAI split routing	Set `openai_model_text=gpt-4o-mini` for cheap text and `openai_model_vision=gpt-4o` only for the rare image / PDF call.
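`MAX_HISTORY_TOTAL_CHARS`	Tighter limits on history mean smaller upstream payloads. The default 12000 is balanced. This is a constant rather than an admin setting: lower it in `class-xc-chatbot-chat-handler.php` if needed, and re-apply the change after plugin updates, which overwrite the file.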
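
To show why a character budget on history bounds the upstream payload, here is a Python sketch of one plausible trimming strategy. It assumes the oldest messages are dropped first and that messages are dictionaries with a content field; neither detail is confirmed by this document, and trim_history is a hypothetical name.

```python
MAX_HISTORY_TOTAL_CHARS = 12000  # documented default


def trim_history(messages: list, budget: int = MAX_HISTORY_TOTAL_CHARS) -> list:
    """Keep the most recent messages whose combined length fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk from newest to oldest
        size = len(msg["content"])
        if used + size > budget:
            break                           # assumption: older messages are dropped
        kept.append(msg)
        used += size
    kept.reverse()                          # restore chronological order
    return kept
```

With three 5000-character messages and the default budget, only the two most recent survive, so the payload sent upstream stays at or under 12000 characters of history regardless of how long the conversation runs.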

Front-end

Asset versioning uses filemtime() on every CSS / JS file the plugin enqueues, so cache busting is automatic whenever a file changes. The widget script is enqueued in the footer (wp_footer), which behaves much like defer: it loads after the page content has been parsed and does not block first paint.
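
The versioning scheme is easy to picture: the file's modification time becomes the ver query argument, so the URL changes whenever the file does. The Python sketch below illustrates the idea with a hypothetical helper; in the plugin the equivalent value would be passed as the version argument when the asset is enqueued.

```python
import os


def versioned_url(url: str, file_path: str) -> str:
    """Append ?ver=<mtime> so browsers refetch the asset after the file changes."""
    version = int(os.path.getmtime(file_path))
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}ver={version}"
```

Because the version is derived from the file itself, no manual bump is needed after an edit, and unchanged files keep a stable, cacheable URL.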

CLI & cron ¶

The plugin does not register WP-CLI commands in v1.0.0, but standard cron events can be triggered manually:

# Run all cron events that are currently due (site-wide, including this plugin's)
wp cron event run --due-now

# Trigger one KB reindex batch
wp cron event run xc_chatbot_kb_reindex_batch

# Force a full reindex from scratch
wp cron event run xc_chatbot_kb_scheduled_reindex

If DISABLE_WP_CRON is set in wp-config.php, schedule a system cron entry:

# /etc/cron.d/wp-xc-chatbot: runs every minute (URL quoted so the shell never treats ? as a glob)
* * * * * www-data curl -s "https://example.com/wp-cron.php?doing_wp_cron" >/dev/null 2>&1

# or with WP-CLI
* * * * * www-data /usr/local/bin/wp --path=/var/www/example cron event run --due-now

Uninstall ¶

Deactivating the plugin clears scheduled cron events but does not remove data. To remove the plugin completely:

Deactivate from Plugins.
Delete from Plugins (or remove the xc-chatbot/ directory).

Drop the two database tables ({prefix} is the site's table prefix, wp_ by default):

DROP TABLE IF EXISTS {prefix}xc_chatbot_kb_docs;
DROP TABLE IF EXISTS {prefix}xc_chatbot_chat_files;

Delete the four options:

DELETE FROM {prefix}options
WHERE option_name IN (
  'xc_chatbot_settings',
  'xc_chatbot_domain_packs',
  'xc_chatbot_kb_state',
  'xc_chatbot_kb_reindex_job'
);

Remove the private upload directory (run from the WordPress root):

rm -rf wp-content/uploads/xc-chatbot-private/

Final note

The steps above destroy all chat attachments and the indexed knowledge base. If you intend to reinstall later and keep the KB content, leave the two tables and the four options in place; the plugin will pick up where it left off after re-activation.