# base stage
FROM ubuntu:24.04 AS base
USER root
SHELL ["/bin/bash", "-c"]
ARG NEED_MIRROR=0
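# NEED_MIRROR=1 switches the apt, uv/PyPI, rustup and git sources used below to mirror sites.
# A minimal sketch of enabling it at build time; the image tag is a hypothetical
# placeholder, not a name taken from this repository:
#   docker build --build-arg NEED_MIRROR=1 -t ragflow:local .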
WORKDIR /ragflow
# copy models downloaded via download_deps.py
RUN mkdir -p /ragflow/rag/res/deepdoc /root/.ragflow
RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/huggingface.co,target=/huggingface.co \
tar --exclude='.*' -cf - \
/huggingface.co/InfiniFlow/text_concat_xgb_v1.0 \
/huggingface.co/InfiniFlow/deepdoc \
| tar -xf - --strip-components=3 -C /ragflow/rag/res/deepdoc
# https://github.com/chrismattmann/tika-python
# This is the only way to run python-tika without internet access. Without this set, the default is to check the tika version and pull latest every time from Apache.
RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/,target=/deps \
cp -r /deps/nltk_data /root/ && \
cp /deps/tika-server-standard-3.2.3.jar /deps/tika-server-standard-3.2.3.jar.md5 /ragflow/ && \
cp /deps/cl100k_base.tiktoken /ragflow/9b5ad71b2ce5302211f9c61530b329a4922fc6a4
ENV TIKA_SERVER_JAR="file:///ragflow/tika-server-standard-3.2.3.jar"
ENV DEBIAN_FRONTEND=noninteractive
# Setup apt
# Python package and implicit dependencies:
# opencv-python: libglib2.0-0 libglx-mesa0 libgl1
# python-pptx: default-jdk tika-server-standard-3.2.3.jar
# selenium: libatk-bridge2.0-0 chrome-linux64-121-0-6167-85
# Building C extensions: libpython3-dev libgtk-4-1 libnss3 xdg-utils libgbm-dev
RUN --mount=type=cache,id=ragflow_apt,target=/var/cache/apt,sharing=locked \
apt update && \
apt --no-install-recommends install -y ca-certificates; \
if [ "$NEED_MIRROR" == "1" ]; then \
sed -i 's|http://archive.ubuntu.com/ubuntu|https://mirrors.tuna.tsinghua.edu.cn/ubuntu|g' /etc/apt/sources.list.d/ubuntu.sources; \
sed -i 's|http://security.ubuntu.com/ubuntu|https://mirrors.tuna.tsinghua.edu.cn/ubuntu|g' /etc/apt/sources.list.d/ubuntu.sources; \
fi; \
rm -f /etc/apt/apt.conf.d/docker-clean && \
echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache && \
chmod 1777 /tmp && \
apt update && \
apt install -y libglib2.0-0 libglx-mesa0 libgl1 && \
apt install -y pkg-config libicu-dev libgdiplus && \
apt install -y default-jdk && \
apt install -y libatk-bridge2.0-0 && \
apt install -y libpython3-dev libgtk-4-1 libnss3 xdg-utils libgbm-dev && \
apt install -y libjemalloc-dev && \
apt install -y gnupg unzip curl wget git vim less && \
apt install -y ghostscript && \
apt install -y pandoc && \
apt install -y texlive && \
apt install -y fonts-freefont-ttf fonts-noto-cjk && \
apt install -y postgresql-client
# Download resource from GitHub to /usr/share/infinity
RUN mkdir -p /usr/share/infinity/resource && \
if [ "$NEED_MIRROR" == "1" ]; then \
git clone --depth 1 --single-branch https://gitee.com/infiniflow/resource /tmp/resource; \
else \
git clone --depth 1 --single-branch https://github.com/infiniflow/resource.git /tmp/resource; \
fi && \
cp -r /tmp/resource/* /usr/share/infinity/resource && \
rm -rf /tmp/resource
ARG NGINX_VERSION=1.29.5-1~noble
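# NGINX_VERSION pins the exact apt package version installed by the next RUN step.
# A sketch of overriding it at build time; <package-version> is a placeholder and must
# match a version actually published in the nginx.org mainline repository for noble:
#   docker build --build-arg NGINX_VERSION=<package-version> .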
RUN --mount=type=cache,id=ragflow_apt,target=/var/cache/apt,sharing=locked \
mkdir -p /etc/apt/keyrings && \
curl --retry 5 --retry-delay 2 --retry-all-errors -fsSL https://nginx.org/keys/nginx_signing.key | gpg --dearmor -o /etc/apt/keyrings/nginx-archive-keyring.gpg && \
echo "deb [signed-by=/etc/apt/keyrings/nginx-archive-keyring.gpg] https://nginx.org/packages/mainline/ubuntu/ noble nginx" > /etc/apt/sources.list.d/nginx.list && \
apt -o Acquire::Retries=5 update && \
apt -o Acquire::Retries=5 install -y nginx=${NGINX_VERSION} && \
apt-mark hold nginx
# Install uv
RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/,target=/deps \
if [ "$NEED_MIRROR" == "1" ]; then \
mkdir -p /etc/uv && \
echo 'python-install-mirror = "https://registry.npmmirror.com/-/binary/python-build-standalone/"' > /etc/uv/uv.toml && \
echo '[[index]]' >> /etc/uv/uv.toml && \
echo 'url = "https://mirrors.aliyun.com/pypi/simple"' >> /etc/uv/uv.toml && \
echo 'default = true' >> /etc/uv/uv.toml; \
fi; \
arch="$(uname -m)"; \
if [ "$arch" = "x86_64" ]; then uv_arch="x86_64"; else uv_arch="aarch64"; fi; \
tar xzf "/deps/uv-${uv_arch}-unknown-linux-gnu.tar.gz" \
&& cp "uv-${uv_arch}-unknown-linux-gnu/"* /usr/local/bin/ \
&& rm -rf "uv-${uv_arch}-unknown-linux-gnu" \
&& uv python install 3.12
ENV PYTHONDONTWRITEBYTECODE=1 DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1 \
UV_HTTP_TIMEOUT=200 \
UV_HTTP_RETRIES=3
ENV PATH=/root/.local/bin:$PATH
# The nodejs packaged by the distribution is too old for the web build; install Node.js 20.x from NodeSource instead.
RUN --mount=type=cache,id=ragflow_apt,target=/var/cache/apt,sharing=locked \
curl -fsSL https://deb.nodesource.com/setup_20.x | bash - && \
apt purge -y nodejs npm cargo && \
apt autoremove -y && \
apt update && \
apt install -y nodejs
# A modern version of cargo is needed for the latest version of the Rust compiler.
RUN apt update && apt install -y curl build-essential \
&& if [ "$NEED_MIRROR" == "1" ]; then \
# Use TUNA mirrors for rustup/rust dist files \
export RUSTUP_DIST_SERVER="https://mirrors.tuna.tsinghua.edu.cn/rustup"; \
export RUSTUP_UPDATE_ROOT="https://mirrors.tuna.tsinghua.edu.cn/rustup/rustup"; \
echo "Using TUNA mirrors for Rustup."; \
fi; \
# Force curl to use HTTP/1.1 \
curl --proto '=https' --tlsv1.2 --http1.1 -sSf https://sh.rustup.rs | bash -s -- -y --profile minimal \
&& echo 'export PATH="/root/.cargo/bin:${PATH}"' >> /root/.bashrc
ENV PATH="/root/.cargo/bin:${PATH}"
RUN cargo --version && rustc --version
# Add the MSSQL ODBC driver
# ARM64 (reported as arm64 or aarch64, e.g. builds on Apple Silicon hosts): install msodbcsql18.
# x86_64 and other architectures: install msodbcsql17.
RUN --mount=type=cache,id=ragflow_apt,target=/var/cache/apt,sharing=locked \
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - && \
curl https://packages.microsoft.com/config/ubuntu/22.04/prod.list > /etc/apt/sources.list.d/mssql-release.list && \
apt update && \
arch="$(uname -m)"; \
if [ "$arch" = "arm64" ] || [ "$arch" = "aarch64" ]; then \
# ARM64 (macOS/Apple Silicon or Linux aarch64) \
ACCEPT_EULA=Y apt install -y unixodbc-dev msodbcsql18; \
else \
# x86_64 or others \
ACCEPT_EULA=Y apt install -y unixodbc-dev msodbcsql17; \
fi || \
{ echo "Failed to install ODBC driver"; exit 1; }
# Add dependencies of selenium
RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/chrome-linux64-121-0-6167-85,target=/chrome-linux64.zip \
unzip /chrome-linux64.zip && \
mv chrome-linux64 /opt/chrome && \
ln -s /opt/chrome/chrome /usr/local/bin/
RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/chromedriver-linux64-121-0-6167-85,target=/chromedriver-linux64.zip \
unzip -j /chromedriver-linux64.zip chromedriver-linux64/chromedriver && \
mv chromedriver /usr/local/bin/ && \
rm -f /usr/bin/google-chrome
RUN --mount=type=bind,from=infiniflow/ragflow_deps:latest,source=/,target=/deps \
if [ "$(uname -m)" = "x86_64" ]; then \
dpkg -i /deps/libssl1.1_1.1.1f-1ubuntu2_amd64.deb; \
elif [ "$(uname -m)" = "aarch64" ]; then \
dpkg -i /deps/libssl1.1_1.1.1f-1ubuntu2_arm64.deb; \
fi
# builder stage
FROM base AS builder
USER root
WORKDIR /ragflow
# install dependencies from uv.lock file
COPY pyproject.toml uv.lock ./
# https://github.com/astral-sh/uv/issues/10462
# uv records index url into uv.lock but doesn't failover among multiple indexes
RUN --mount=type=cache,id=ragflow_uv,target=/root/.cache/uv,sharing=locked \
if [ "$NEED_MIRROR" == "1" ]; then \
sed -i 's|pypi.org|mirrors.aliyun.com/pypi|g' uv.lock; \
else \
sed -i 's|mirrors.aliyun.com/pypi|pypi.org|g' uv.lock; \
fi; \
uv sync --python 3.12 --frozen && \
# Ensure pip is available in the venv for runtime package installation (fixes #12651)
.venv/bin/python3 -m ensurepip --upgrade
COPY web web
COPY docs docs
RUN --mount=type=cache,id=ragflow_npm,target=/root/.npm,sharing=locked \
export NODE_OPTIONS="--max-old-space-size=4096" && \
cd web && npm install && npm run build
COPY .git /ragflow/.git
RUN version_info=$(git describe --tags --match=v* --first-parent --always); \
echo "RAGFlow version: $version_info"; \
echo "$version_info" > /ragflow/VERSION
# production stage
FROM base AS production
USER root
WORKDIR /ragflow
# Copy Python environment and packages
ENV VIRTUAL_ENV=/ragflow/.venv
COPY --from=builder ${VIRTUAL_ENV} ${VIRTUAL_ENV}
ENV PATH="${VIRTUAL_ENV}/bin:${PATH}"
ENV PYTHONPATH=/ragflow/
COPY web web
COPY admin admin
COPY api api
COPY conf conf
COPY deepdoc deepdoc
COPY rag rag
COPY agent agent
COPY pyproject.toml uv.lock ./
COPY mcp mcp
COPY common common
COPY memory memory
COPY bin bin
COPY docker/service_conf.yaml.template ./conf/service_conf.yaml.template
COPY docker/entrypoint.sh ./
RUN chmod +x ./entrypoint*.sh
# Copy compiled web pages
COPY --from=builder /ragflow/web/dist /ragflow/web/dist
COPY --from=builder /ragflow/VERSION /ragflow/VERSION
ENTRYPOINT ["./entrypoint.sh"]