Overall Architecture
- Docker Compose orchestrates the gateway, frontend, main backend, async worker, and data infrastructure; see the service inventory and dependencies in docker-compose.yaml:1.
- Nginx serves as the unified entry point, routing /api/* traffic to the main backend, file service, RSS service, etc.; configuration is in nginx/nginx.conf:1.
- The FastAPI service exposes REST and WebSocket endpoints and mounts MCP sub-apps; the startup lifecycle is defined in api/main.py:1.
- The business flow spans “document ingestion → parsing → vectorization → knowledge graph → notification”, coordinated across Celery, Milvus, Neo4j, and other components.
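To make the idea of a service inventory with dependencies concrete, here is a minimal sketch of how a start order can be derived from a dependency map. The service names and edges below are assumptions chosen for illustration; the authoritative list is whatever docker-compose.yaml declares.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency map (service -> services it depends on).
# The real inventory and edges are defined in docker-compose.yaml.
DEPENDS_ON: Dict[str, Set[str]] = {
    "nginx": {"web", "api"},
    "web": {"api"},
    "api": {"postgres", "redis", "milvus", "neo4j", "minio"},
    "celery-worker": {"postgres", "redis", "milvus", "neo4j", "minio"},
}


def start_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return one valid order in which every dependency precedes its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(start_order(DEPENDS_ON))
```

Several orders can be valid for the same map, so only the relative position of a service and its dependencies is meaningful.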
Client & Gateway
- The Next.js 19 frontend hosts the entire UI, with dependencies and scripts in web/package.json:1; it leverages the App Router, Tailwind, and multilingual support.
- Frontend APIs read REST/WS prefixes from .env, mapped in web/src/config/api.ts:1.
- The notification WebSocket authenticates via access_token and then streams events; backend routing is in api/router/notification.py:1.
- Nginx proxies static assets and frontend routes to the web container; upstream definitions live in nginx/nginx.conf:17.
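The gateway's routing can be pictured as a longest-prefix lookup from request path to upstream. The sketch below models that idea in Python; the upstream names and the /api/file-service prefix are assumptions, and the real rules are the location blocks in nginx/nginx.conf.

```python
from typing import List, Tuple

# Hypothetical prefix-to-upstream table; only /api/rss-service and /api/*
# are named in this document, the rest is illustrative.
ROUTES: List[Tuple[str, str]] = [
    ("/api/rss-service", "rss-service"),
    ("/api/file-service", "file-service"),
    ("/api", "api"),
    ("/", "web"),
]


def resolve_upstream(path: str) -> str:
    """Return the upstream whose prefix is the longest match for the path."""
    best_prefix, best_upstream = "", "web"
    for prefix, upstream in ROUTES:
        # Match whole path segments so that "/apiary" does not hit "/api".
        matches = path == prefix or path.startswith(prefix.rstrip("/") + "/")
        if matches and len(prefix) > len(best_prefix):
            best_prefix, best_upstream = prefix, upstream
    return best_upstream


if __name__ == "__main__":
    for p in ("/api/documents", "/api/rss-service/feed", "/dashboard"):
        print(p, "->", resolve_upstream(p))
```

Matching on whole path segments is a choice made for this sketch; a plain Nginx prefix location compares raw string prefixes, so the two can differ on paths such as /apiary.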
Core Backend Services
- FastAPI registers user, document, AI, graph, file, and other routes in api/main.py:1, initializes Redis plus the scheduler, and mounts MCP HTTP endpoints.
- Business routers operate through crud/models against PostgreSQL; document-centric REST APIs reside in api/router/document.py:1.
- Graph queries connect directly to Neo4j, enforcing per-user isolation and returning node/edge data; see api/router/graph.py:1.
- MCP server management handles STD and HTTP tooling, with logic in api/router/mcp.py:1 alongside api/mcp_router/common.py:1 and api/mcp_router/document.py:1.
- Remote file system abstractions are defined in api/protocol/remote_file_service.py:1, supporting MinIO, OSS, S3, and more.
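The remote file abstraction mentioned above can be illustrated with a small interface and one interchangeable backend. This is only a sketch: the method names, the key layout, and the in-memory backend are assumptions, not the contents of api/protocol/remote_file_service.py.

```python
from typing import Dict, Protocol


class RemoteFileService(Protocol):
    """Hypothetical interface that each storage backend would satisfy."""

    def upload(self, key: str, data: bytes) -> None: ...

    def download(self, key: str) -> bytes: ...


class InMemoryFileService:
    """Stand-in backend used here in place of MinIO, OSS, or S3."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def upload(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def download(self, key: str) -> bytes:
        return self._objects[key]


def save_markdown(storage: RemoteFileService, doc_id: int, markdown: str) -> str:
    """Store parsed Markdown under an illustrative key and return that key."""
    key = f"markdown/{doc_id}.md"
    storage.upload(key, markdown.encode("utf-8"))
    return key


if __name__ == "__main__":
    storage = InMemoryFileService()
    print(save_markdown(storage, 7, "# Title"))
```

Because callers depend only on the interface, swapping one backend for another should not require changes in the code that stores or reads files.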
Tasks & Intelligent Processing
- Celery worker setup and entrypoint live in celery-worker/common/celery/app.py:1, using Redis as broker/backend.
- Document parsing invokes the configured engine to generate Markdown and persist it to remote storage; the core flow is in celery-worker/common/celery/app.py:40.
- Chunking, vectorization, and entity-relation extraction are encapsulated in celery-worker/data/common.py:1, supporting streaming chunking and LLM calls.
- On the FastAPI side, APScheduler handles RSS ingestion and notification triggers; initialization and cron jobs are in api/common/apscheduler/app.py:1.
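One way to read "streaming chunking" is that chunks are produced incrementally while the Markdown is still arriving, rather than after the whole document is in memory. The generator below sketches that reading; the character-based size, the overlap, and the function name are assumptions and may not match celery-worker/data/common.py.

```python
from typing import Iterable, Iterator


def stream_chunks(pieces: Iterable[str], size: int = 500, overlap: int = 50) -> Iterator[str]:
    """Yield fixed-size, overlapping chunks as text pieces arrive."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    buffer = ""
    carried = 0  # leading characters of buffer already emitted in a previous chunk
    for piece in pieces:
        buffer += piece
        while len(buffer) >= size:
            yield buffer[:size]
            # Keep the tail of the emitted chunk so neighbours share context.
            buffer = buffer[size - overlap:]
            carried = overlap
    # Emit the remainder only if it holds text that no chunk has covered yet.
    if len(buffer) > carried:
        yield buffer


if __name__ == "__main__":
    for chunk in stream_chunks(["abcdef", "ghij"], size=4, overlap=1):
        print(chunk)
```

Each yielded chunk could then be embedded or sent to an LLM as soon as it is available, which keeps memory use bounded by the chunk size plus the largest incoming piece.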
Data & External Dependencies
- PostgreSQL is the primary relational store; SQLAlchemy engine/session initialization occurs in api/common/sql.py:1.
- Redis provides caching and acts as the Celery broker; async connection setup is in api/common/redis.py:1.
- Milvus stores document embeddings; schema/index creation is in api/data/milvus/create.py:1, and vector writes are handled in api/data/milvus/insert.py:1.
- Neo4j persists entities and relations; upsert routines are in api/data/neo4j/insert.py:1.
- MinIO acts as the default file service, declared in Compose at docker-compose.yaml:55 and exposed through the remote file protocol layer.
- Environment variables cover model access, OAuth, SMS, and other secrets; FastAPI loads them at startup via api/main.py:23.
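Loading secrets from the environment at startup usually benefits from failing fast when something is missing. The sketch below shows one way to do that with the standard library; the variable names DATABASE_URL, REDIS_URL, and LLM_API_KEY are hypothetical and are not taken from the project's .env.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    llm_api_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read required settings and report every missing variable at once."""
    # Hypothetical variable names, used for illustration only.
    required = ("DATABASE_URL", "REDIS_URL", "LLM_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        llm_api_key=env["LLM_API_KEY"],
    )


if __name__ == "__main__":
    demo = {"DATABASE_URL": "postgresql://db", "REDIS_URL": "redis://cache", "LLM_API_KEY": "k"}
    print(load_settings(demo))
```

Passing the mapping as a parameter keeps the function easy to exercise without touching the real process environment.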
Supporting Services & Integrations
- The Daily Hot aggregation service runs on Hono; the application entry is daily-hot/src/app.tsx:1, with dynamic routing setup in daily-hot/src/registry.ts:1.
- RSSHub and Browserless containers support RSS aggregation and web rendering; Nginx routes /api/rss-service to them (nginx/nginx.conf:35).
- Celery workers share common/crud modules and mount ./volumes/temp for temporary processing (docker-compose.yaml:78).
- MCP tools expose time queries and document search through api/mcp_router/common.py:1 and api/mcp_router/document.py:1, available to the built-in agent.
Key Data Flow Example
- When a user uploads a file or submits a URL, the FastAPI document endpoint writes metadata and schedules a transform task (api/router/document.py:120).
- The Celery worker consumes the task, invokes the selected parsing engine, stores Markdown/cover via the remote file system, and updates the database (celery-worker/common/celery/app.py:90).
- Markdown content proceeds through vectorization and entity extraction (celery-worker/data/common.py:40), then persists to Milvus/Neo4j and updates task status.
- The frontend refreshes via React Query polling or WebSocket pushes; the notification pipeline is defined at api/router/notification.py:38.
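The flow above can be summarized as a sequence of stages that each update a task's status, with a failure in any stage stopping the run. The following sketch models only that control flow; the stage names, the DocumentTask fields, and the status values are assumptions, and the stage bodies are placeholders for the real parser, Milvus, and Neo4j calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class DocumentTask:
    doc_id: int
    source: str
    status: str = "pending"
    history: List[str] = field(default_factory=list)


Stage = Tuple[str, Callable[[DocumentTask], None]]


def run_pipeline(task: DocumentTask, stages: List[Stage]) -> DocumentTask:
    """Run stages in order, recording progress and stopping on the first error."""
    for name, step in stages:
        task.status = name  # visible to polling clients while the stage runs
        try:
            step(task)
        except Exception:
            task.status = "failed"
            return task
        task.history.append(name)
    task.status = "done"
    return task


if __name__ == "__main__":
    def placeholder(task: DocumentTask) -> None:
        """Stands in for parsing, vectorization, or entity extraction."""

    stages = [("parse", placeholder), ("vectorize", placeholder), ("extract_entities", placeholder)]
    print(run_pipeline(DocumentTask(1, "https://example.com"), stages))
```

A status field that changes per stage is what lets a polling or push-based frontend show progress without knowing anything about the individual stages.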