mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-06-29 15:31:05 +08:00
build(go): make bash build.sh work on macOS arm64 (Homebrew) (#15009)
## Problem
The Go server build pipeline (`build.sh` + CMake + CGO bindings) was
tested on Ubuntu only. On macOS arm64 with Homebrew it fails in five
orthogonal places. None of these require platform-specific code paths —
the same source builds on both Linux and Darwin after these fixes.
## Reproduction (before)
```
$ uname -a
Darwin … 25.4.0 arm64
$ brew install cmake pcre2 simde
$ bash build.sh
…
error: 'simde/x86/sse4.1.h' file not found
error: implicit instantiation of undefined template 'std::basic_istringstream<char>'
error: no matching function for call to 'Join'
…
clang: error: no such file or directory: '/usr/local/lib/libpcre2-8.a'
```
## Fix (5 small, orthogonal changes)
### 1. `internal/cpp/CMakeLists.txt` — find Homebrew + libpcre2-8 portably
- Detect Apple platforms via `if(APPLE)`, call `brew --prefix` once, add
`${HOMEBREW_PREFIX}/include` and `${HOMEBREW_PREFIX}/lib`. No effect on
Linux.
- Replace the literal `libpcre2-8.a` link token (which the linker only
resolves from its default search path) with `${PCRE2_LIB}`. On Apple
platforms `find_library(PCRE2_LIB NAMES pcre2-8 REQUIRED)` resolves the
full path under `/opt/homebrew/lib` (Apple Silicon) or `/usr/local/lib`
(Intel Mac). On Linux `PCRE2_LIB` keeps the bare `libpcre2-8.a` token, so
the existing build is unchanged.
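The `if(APPLE)` branch can be mirrored in shell for a quick check outside CMake. This is only a sketch under stated assumptions, not part of the patch: `homebrew_flags` is a hypothetical helper that turns the output of `brew --prefix` into the include and library flags the CMake change adds, and it prints nothing when `brew` is missing or fails.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): print the -I/-L flags that
# correspond to the Homebrew prefix, or nothing if brew is unavailable.
homebrew_flags() {
    local prefix
    # Same guard as the CMake code: brew must exit 0 and print a non-empty prefix.
    if prefix="$(brew --prefix 2>/dev/null)" && [ -n "$prefix" ]; then
        printf '%s\n' "-I${prefix}/include -L${prefix}/lib"
    fi
}
```

On an Apple Silicon machine with Homebrew in its default location this would typically print `-I/opt/homebrew/include -L/opt/homebrew/lib`.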
### 2. `internal/cpp/wordnet_lemmatizer.cpp` + `internal/cpp/rag_analyzer.cpp` — explicit `#include <sstream>`
libstdc++ (Linux) pulls `<sstream>` in transitively via `<fstream>`;
libc++ (Apple Clang) doesn't, so the existing `std::istringstream` /
`std::ostringstream` uses fail to compile on macOS. One-line include in
each file.
### 3. `internal/cpp/rag_analyzer.cpp` — `Join` template overload fix
`Join(tokens, start, tokens.size(), delim)` at line 146 passes `size_t`
to an `int` parameter. C++23 strict mode in Apple Clang refuses the
implicit narrowing and reports the 4-arg overload as a substitution
failure, leaving the call ambiguous between the 3-arg and 4-arg
templates. Fix: explicit `static_cast<int>(tokens.size())`. Behaviour
identical on libstdc++ — the narrowing was always intentional.
### 4. `internal/binding/rag_analyzer.go` — split darwin CGO LDFLAGS
The existing `#cgo darwin LDFLAGS: ... /usr/local/lib/libpcre2-8.a` only
matches Intel Macs. Apple Silicon Homebrew installs to `/opt/homebrew`.
Split into `darwin,arm64` and `darwin,amd64` build constraints with the
right absolute path on each.
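The path each build constraint selects can be written as a small lookup. The sketch below is illustrative only: `pcre2_static_path` is a hypothetical function, and it takes `uname -s` / `uname -m` values, where `x86_64` corresponds to Go's `amd64`. The Linux path is the one already hard-coded in the existing `#cgo linux` line.

```bash
#!/usr/bin/env bash
# Hypothetical lookup (not in the repository): map OS and CPU architecture,
# as reported by `uname -s` and `uname -m`, to the libpcre2-8.a path that
# the matching #cgo LDFLAGS line links against.
pcre2_static_path() {
    local os="$1" arch="$2"
    case "${os}/${arch}" in
        Darwin/arm64)  echo "/opt/homebrew/lib/libpcre2-8.a" ;;        # darwin,arm64
        Darwin/x86_64) echo "/usr/local/lib/libpcre2-8.a" ;;           # darwin,amd64
        Linux/*)       echo "/usr/lib/x86_64-linux-gnu/libpcre2-8.a" ;; # linux
        *)             return 1 ;;                                      # not covered by this commit
    esac
}
```

A call such as `pcre2_static_path "$(uname -s)" "$(uname -m)"` then shows which archive the Go binding expects on the current machine.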
### 5. `build.sh` — accept Homebrew path in the pcre2 sanity check
The sanity check looked at two Linux paths only and then fell through to
`sudo apt -y install libpcre2-dev` on failure. Added
`/opt/homebrew/lib/libpcre2-8.a`, and on Darwin failure now exits
cleanly with the right `brew install pcre2` hint instead of trying
`apt`.
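The control flow of the updated check can be isolated into a function that is easy to exercise against temporary directories. This is a simplified sketch, not the code in `build.sh`: `check_pcre2` is a hypothetical name, the OS and candidate directories are passed in as arguments, and it only prints the install hint instead of exiting or running `apt`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repository).
# Usage: check_pcre2 <os-name> <dir>...
# Prints where libpcre2-8.a was found, or a platform-appropriate install hint.
check_pcre2() {
    local os="$1"
    shift
    local dir
    for dir in "$@"; do
        if [ -f "${dir}/libpcre2-8.a" ]; then
            echo "found: ${dir}/libpcre2-8.a"
            return 0
        fi
    done
    if [ "$os" = "Darwin" ]; then
        echo "missing: install with: brew install pcre2"
    else
        echo "missing: install with: sudo apt -y install libpcre2-dev"
    fi
    return 1
}
```

With the paths from this commit, the call would look like `check_pcre2 "$(uname)" /usr/lib/x86_64-linux-gnu /usr/local/lib /opt/homebrew/lib`.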
## Verified
- `bash build.sh` now completes on macOS arm64 (Apple Silicon, brew 4.x,
cmake 4.x, Apple Clang 17, Go 1.25, pcre2 10.x, simde 0.8.x).
- Produced binaries: `bin/server_main`, `bin/admin_server`,
`bin/ragflow_cli`.
- `bin/server_main` boots, connects MySQL, runs migrations, loads the 64
model provider configs cleanly.
- Still builds on Linux: the CMake additions are inside `if(APPLE)`
guards, Linux keeps the bare `libpcre2-8.a` link token, and the `build.sh`
check still falls back to `apt` when not on Darwin.
## Out of scope
The Go server itself currently fails at runtime when it is not pointed at
Elasticsearch (`Failed to initialize doc engine: failed to ping Elasticsearch`).
That comes from the placeholder Infinity engine documented in
`internal/engine/README.md` and is unrelated to this build patchset.
---
Happy to split this into smaller PRs if you'd prefer (one per file). The
five changes are independent.
11
build.sh
@@ -84,10 +84,17 @@ build_go() {
         exit 1
     fi

-    # Check for pcre2 library
-    if [ -f "/usr/lib/x86_64-linux-gnu/libpcre2-8.a" ] || [ -f "/usr/local/lib/libpcre2-8.a" ]; then
+    # Check for pcre2 library — known Linux paths + macOS Homebrew (Apple Silicon
+    # at /opt/homebrew, Intel Macs at /usr/local).
+    if [ -f "/usr/lib/x86_64-linux-gnu/libpcre2-8.a" ] \
+        || [ -f "/usr/local/lib/libpcre2-8.a" ] \
+        || [ -f "/opt/homebrew/lib/libpcre2-8.a" ]; then
         echo "✓ pcre2 library found"
     else
+        if [ "$(uname)" = "Darwin" ]; then
+            echo -e "${RED}Error: libpcre2-8.a not found. Install with: brew install pcre2${NC}"
+            exit 1
+        fi
         echo -e "${YELLOW}Warning: libpcre2-8.a not found. You may need to install libpcre2-dev:${NC}"
         sudo apt -y install libpcre2-dev
     fi
internal/binding/rag_analyzer.go
@@ -19,7 +19,9 @@ package rag_analyzer
 /*
 #cgo CXXFLAGS: -std=c++20 -I${SRCDIR}/..
 #cgo linux LDFLAGS: ${SRCDIR}/../cpp/cmake-build-release/librag_tokenizer_c_api.a -lstdc++ -lm -lpthread /usr/lib/x86_64-linux-gnu/libpcre2-8.a
-#cgo darwin LDFLAGS: ${SRCDIR}/../cpp/cmake-build-release/librag_tokenizer_c_api.a -lstdc++ -lm -lpthread /usr/local/lib/libpcre2-8.a
+// Apple Silicon: Homebrew installs to /opt/homebrew; Intel Macs keep /usr/local.
+#cgo darwin,arm64 LDFLAGS: ${SRCDIR}/../cpp/cmake-build-release/librag_tokenizer_c_api.a -lstdc++ -lm -lpthread /opt/homebrew/lib/libpcre2-8.a
+#cgo darwin,amd64 LDFLAGS: ${SRCDIR}/../cpp/cmake-build-release/librag_tokenizer_c_api.a -lstdc++ -lm -lpthread /usr/local/lib/libpcre2-8.a

 #include <stdlib.h>
 #include "../cpp/rag_analyzer_c_api.h"
internal/cpp/CMakeLists.txt
@@ -3,6 +3,40 @@ project(rag_tokenizer)

 set(CMAKE_CXX_STANDARD 23)

+# macOS dependency discovery — Homebrew installs headers and libs under a
+# prefix that is NOT on the compiler's default search path (Apple Silicon:
+# /opt/homebrew, Intel: /usr/local). Linux is left completely untouched:
+# the infinity_builder image already ships pcre2 + simde where the
+# toolchain finds them, so adding paths there risks shadowing them.
+if(APPLE)
+    execute_process(
+        COMMAND brew --prefix
+        OUTPUT_VARIABLE HOMEBREW_PREFIX
+        OUTPUT_STRIP_TRAILING_WHITESPACE
+        RESULT_VARIABLE BREW_RC
+    )
+    if(BREW_RC EQUAL 0 AND HOMEBREW_PREFIX)
+        message(STATUS "macOS detected; Homebrew prefix: ${HOMEBREW_PREFIX}")
+        include_directories(SYSTEM "${HOMEBREW_PREFIX}/include")
+        link_directories("${HOMEBREW_PREFIX}/lib")
+    endif()
+endif()
+
+# Resolve libpcre2-8.
+# - Linux: keep upstream's bare `libpcre2-8.a` token verbatim. The linker
+#   resolves it from its own default search path, which the
+#   infinity_builder image populates. find_library() does NOT see that
+#   path (pcre2 is built from source there), so calling it here would
+#   break the CI build that worked before.
+# - macOS: the bare token fails (libpcre2-8.a is under the Homebrew
+#   prefix, off the default path), so resolve the full path explicitly.
+if(APPLE)
+    find_library(PCRE2_LIB NAMES pcre2-8 REQUIRED)
+else()
+    set(PCRE2_LIB libpcre2-8.a)
+endif()
+message(STATUS "PCRE2 library: ${PCRE2_LIB}")
+
 # Option to enable AddressSanitizer
 option(ENABLE_ASAN "Enable AddressSanitizer" OFF)

@@ -88,7 +122,7 @@ add_executable(rag_tokenizer
         ${darts_src}
         ${re2_src})

-target_link_libraries(rag_tokenizer stdc++ m libpcre2-8.a)
+target_link_libraries(rag_tokenizer stdc++ m ${PCRE2_LIB})
 target_include_directories(rag_tokenizer PUBLIC "${CMAKE_SOURCE_DIR}")
 set_target_properties(rag_tokenizer PROPERTIES
         CXX_STANDARD 20
@@ -118,7 +152,7 @@ add_library(rag_tokenizer_c_api STATIC
         ${re2_src}
 )

-target_link_libraries(rag_tokenizer_c_api stdc++ libm.a libpcre2-8.a)
+target_link_libraries(rag_tokenizer_c_api stdc++ libm.a ${PCRE2_LIB})
 target_include_directories(rag_tokenizer_c_api PUBLIC "${CMAKE_SOURCE_DIR}")
 set_target_properties(rag_tokenizer_c_api PROPERTIES
         CXX_STANDARD 20
@@ -130,7 +164,7 @@ add_executable(rag_analyzer_c_test
         rag_analyzer_c_test.cpp
 )

-target_link_libraries(rag_analyzer_c_test rag_tokenizer_c_api stdc++ libm.a libpcre2-8.a)
+target_link_libraries(rag_analyzer_c_test rag_tokenizer_c_api stdc++ libm.a ${PCRE2_LIB})
 target_include_directories(rag_analyzer_c_test PUBLIC "${CMAKE_SOURCE_DIR}")
 set_target_properties(rag_analyzer_c_test PROPERTIES
         CXX_STANDARD 20
internal/cpp/rag_analyzer.cpp
@@ -22,6 +22,7 @@
 #include "re2/re2.h"

 #include <cassert>
+#include <sstream> // std::ostringstream / std::istringstream — explicit for libc++ (macOS)
 #include <cstdint>
 #include <filesystem>
 #include <iostream>
@@ -143,7 +144,10 @@ std::string Join(const std::vector<T> &tokens, int start, int end, const std::st

 template <typename T>
 std::string Join(const std::vector<T> &tokens, int start, const std::string &delim = " ") {
-    return Join(tokens, start, tokens.size(), delim);
+    // C++23 strict overload resolution refuses the implicit size_t → int
+    // narrowing conversion; the explicit cast makes the 4-arg overload above
+    // unambiguous on libc++ (macOS) without changing behaviour on libstdc++.
+    return Join(tokens, start, static_cast<int>(tokens.size()), delim);
 }

 std::string Join(const TermList &tokens, int start, int end, const std::string &delim = " ") {
internal/cpp/wordnet_lemmatizer.cpp
@@ -15,6 +15,7 @@
 #include "wordnet_lemmatizer.h"
 #include <fstream>
 #include <filesystem>
+#include <sstream> // std::istringstream — implicit via <fstream> on libstdc++ (Linux), explicit on libc++ (macOS)

 namespace fs = std::filesystem;
