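1 Introduction to Spring AI

1.1 Overview

Spring AI is the official Spring framework for building AI applications. It abstracts how different AI models (such as OpenAI, Azure OpenAI, and Baidu ERNIE Bot) are invoked, so you can talk to many large models through one unified API without dealing with the differences between vendor SDKs. It simplifies AI application development in the same way Spring Boot simplifies backend development.

At its core, Spring AI provides the basic abstractions needed to build applications on large models. Each abstraction has multiple implementations, so components can be swapped with very few code changes.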

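1.2 Main features

Support for the major AI model providers
Support for several model types, including chat, text-to-image, and audio transcription
Support for mainstream embedding models and vector databases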

2 Spring AI quick start

2.1 Add dependencies

<properties>
    <spring-boot.version>3.5.6</spring-boot.version>
    <spring-ai.version>1.0.3</spring-ai.version>

    <maven.compiler.source>21</maven.compiler.source>
    <maven.compiler.target>21</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-dependencies</artifactId>
            <version>${spring-boot.version}</version>
            <scope>import</scope>
            <type>pom</type>
        </dependency>

        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <scope>import</scope>
            <type>pom</type>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-openai</artifactId>
    </dependency>

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>

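2.2 Create the configuration file

spring:
  ai:
    openai:
      # API key of the model provider (placeholder value; use your own key)
      api-key: sk-12345678901234567890123456789012
      # Base URL of the model API
      base-url: https://ark.cn-beijing.volces.com/api
      chat:
        # Spring AI appends /v1/chat/completions by default;
        # this endpoint is assumed to serve chat completions under /v3 instead
        completions-path: /v3/chat/completions
        options:
          # Model name
          model: doubao-seed-1-6-251015
          # temperature controls how varied the generated text is.
          # Higher values give more diverse output, but also more randomness and less predictable content.
          # Lower values give output that is closer to deterministic, more consistent, and more predictable.
          temperature: 0.7
      image:
        images-path: /v3/images/generations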
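
The effect of temperature described in the comments above can be illustrated with a small, library-free sketch. It assumes the common formulation in which the model's raw scores (logits) are divided by the temperature before a softmax turns them into probabilities; the class name, the method name, and the three scores are made up for illustration, and a real provider may apply temperature differently.

```java
import java.util.Arrays;

public class TemperatureDemo {

    // Converts raw scores (logits) into probabilities after dividing them by the temperature.
    static double[] softmax(double[] logits, double temperature) {
        double max = Arrays.stream(logits).max().orElse(0.0);
        double[] probs = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            // Subtracting the maximum keeps Math.exp numerically stable.
            probs[i] = Math.exp((logits[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    public static void main(String[] args) {
        // Hypothetical scores for three candidate tokens.
        double[] logits = {2.0, 1.0, 0.5};
        System.out.println("T=0.2 -> " + Arrays.toString(softmax(logits, 0.2)));
        System.out.println("T=0.7 -> " + Arrays.toString(softmax(logits, 0.7)));
        System.out.println("T=1.5 -> " + Arrays.toString(softmax(logits, 1.5)));
    }
}
```

At a low temperature almost all probability mass sits on the highest-scoring token, so sampling is nearly deterministic; at a higher temperature the distribution flattens and less likely tokens are picked more often.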

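2.3 Write a test

import jakarta.annotation.Resource;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
public class SpringAIDemoTest {

    @Resource
    private ChatModel chatModel;

    @Test
    public void testChatModel() {
        // 1. Build the Prompt; the model name and options come from the configuration file
        String userPrompt = "Hello";
        Prompt prompt = new Prompt(userPrompt);

        // 2. Call the model and keep the full response (not just the string), which helps with troubleshooting
        ChatResponse response = chatModel.call(prompt);

        // 3. Extract and print the answer
        String answer = response.getResult().getOutput().getText();
        System.out.println(answer);

        // Optional: print response metadata (useful for checking the model and token usage)
        System.out.println("\n===== Response metadata =====");
        System.out.println("Model: " + response.getMetadata().getModel());
        System.out.println("Total tokens: " + response.getMetadata().getUsage().getTotalTokens());

        Assertions.assertNotNull(answer);
    }

}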

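3 Chat models in Spring AI

3.1 ChatClient

ChatClient is an interface that defines a client for interacting with a chat service.

It gives you flexibility in building and configuring the client, plus the ability to send and handle chat requests. You customize the client's behavior through ChatClient.Builder, start a request specification with prompt() or prompt(Prompt prompt), and finally send the request with call().

3.1.1 A simple conversation

Define the bean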

@Bean
public ChatClient chatClient(ChatClient.Builder builder) {
    return builder.build();
}

Test class

@Test
public void testSimpleChatClient() {
    ChatResponse response = chatClient.prompt()
        .user("Hello")
        .call()
        .chatResponse();

    String answer = response.getResult().getOutput().getText();
    System.out.println(answer);

    // Optional: print response metadata (useful for checking the model and token usage)
    System.out.println("\n===== Response metadata =====");
    System.out.println("Model: " + response.getMetadata().getModel());
    System.out.println("Total tokens: " + response.getMetadata().getUsage().getTotalTokens());

    Assertions.assertNotNull(answer);
}

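3.1.2 Presetting a role

ChatClient.Builder offers many options. Among them, defaultSystem sets a default system prompt that gives the model a role and background; the model then answers later messages in that role.

@Bean
public ChatClient chatClient(ChatClient.Builder builder) {
    return builder
        .defaultSystem("You are an experienced Java programmer who helps me solve Java-related problems")
        .build();
}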

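3.1.3 Streaming responses

Non-streaming output (call): waits until the model has generated the whole answer, then returns it to the user.
Streaming output (stream): returns the answer incrementally, chunk by chunk, as it is generated. This matches how large models actually produce text, and when inference is slow it gives a much better user experience than waiting for the complete answer.

CountDownLatch latch = new CountDownLatch(1);

chatClient.prompt()
    .user("Hello")
    .stream()
    .chatResponse()
    .doOnSubscribe(subscription -> System.out.println("Subscribed, receiving data..."))
    .doOnComplete(() -> {
        System.out.println("\n===== Streaming finished =====");
        latch.countDown();
    })
    .doOnError(e -> {
        System.err.println("Streaming error: " + e.getMessage());
        e.printStackTrace();
        latch.countDown();
    })
    .subscribe(response -> {
        // Each emitted ChatResponse carries only the newly generated chunk of text
        String content = response.getResult().getOutput().getText();
        if (content != null && !content.isEmpty()) {
            System.out.print(content);
            System.out.flush();
        }
    });

// Wait for the stream to finish, for at most 60 seconds
boolean finished = latch.await(60, TimeUnit.SECONDS);
System.out.println("\nWait result: " + (finished ? "completed" : "timed out"));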
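
The difference between call and stream can also be shown without any framework. The following is a minimal sketch, not Spring AI code: the class, the method names, and the hard-coded chunks are hypothetical stand-ins for a model that produces its answer piece by piece.

```java
import java.util.List;
import java.util.function.Consumer;

public class CallVsStreamSketch {

    // Hypothetical chunks that a model might produce one after another.
    static final List<String> CHUNKS = List.of("Hel", "lo", ", ", "world");

    // Blocking style: the caller sees nothing until the whole answer has been assembled.
    static String call() {
        return String.join("", CHUNKS);
    }

    // Streaming style: every chunk is handed to the caller as soon as it exists.
    static void stream(Consumer<String> onChunk) {
        CHUNKS.forEach(onChunk);
    }

    public static void main(String[] args) {
        System.out.println("call:   " + call());
        System.out.print("stream: ");
        stream(System.out::print);
        System.out.println();
    }
}
```

Both styles end up with the same text; streaming only changes when the caller gets to see each part of it.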

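3.2 The ChatModel interface

The ChatModel interface is the core abstraction and defines the basic methods for interacting with an AI model. It extends Model<Prompt, ChatResponse> and provides three call methods:

default String call(String message) {
    // ......
}

default String call(Message... messages) {
    // ......
}

ChatResponse call(Prompt prompt);

The call() method that takes a String simplifies basic usage by hiding the more complex Prompt and ChatResponse classes. In real applications, however, ChatResponse call(Prompt prompt) is used more often, because it lets you pass options with the request and read the full response: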

@Test
public void testOtherModels() {
    ChatResponse response = chatModel.call(
        new Prompt(
            "Hello",
            // Options set here override the defaults from the configuration file for this request
            OpenAiChatOptions.builder()
                .model("doubao-seed-1-8-251228")
                .temperature(0.8)
                .build()
        )
    );
    String answer = response.getResult().getOutput().getText();
    System.out.println(answer);
}

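3.3 Prompts

A prompt is the input that guides a large model toward a particular output, and its design and wording strongly affect the model's response. In Spring AI, a Prompt is a way of organizing the input data for the model. It is a composite structure that can contain several messages with different roles (System, User, Assistant, and so on). Managing prompts well is a key part of simplifying AI application development.

System: sets the AI's boundaries, role, and positioning. It guides how the AI behaves, and how it interprets and answers the input
User: the user's original input, that is, the questions, commands, or statements the user sends to the AI
Assistant: the AI's response, defined as an "assistant role" message. Including it keeps the context of the conversation coherent
Tool: bridges to external services and carries the results of function calls

Spring AI provides the Prompt Template abstraction for managing prompt templates: developers define a template in advance and replace its keywords at runtime. When working with prompts, you first create a template containing {placeholder} markers for the dynamic content; these placeholders are then replaced based on the user's request or other application code. The values for the {placeholder} markers are supplied as entries of a Map.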

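@Test
public void testPrompt() {
    // Example values for the placeholders (chosen here for illustration)
    String name = "Xiaomei";
    String voice = "Sichuan";

    String userText = """
        Recommend at least three Beijing dishes.
        """;
    UserMessage userMessage = new UserMessage(userText);

    String systemText = """
        You are a food advisory assistant that helps people look up food information.
        Your name is {name}.
        Reply to the user's request using your name and in line with {voice} eating habits.
        """;
    SystemPromptTemplate systemPromptTemplate = new SystemPromptTemplate(systemText);
    // Fill in the placeholders
    Message systemMessage = systemPromptTemplate
        .createMessage(Map.of("name", name, "voice", voice));

    // The system message goes first, followed by the user message
    Prompt prompt = new Prompt(List.of(systemMessage, userMessage));
    List<Generation> results = chatModel.call(prompt).getResults();
    String answer = results.stream().map(x -> x.getOutput().getText()).collect(Collectors.joining(""));
    System.out.println(answer);
}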
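
To make the placeholder substitution concrete, here is a library-free sketch of what filling a {placeholder} template from a Map amounts to. It is not how Spring AI implements its templates; the class name, the render method, and the sample values are assumptions made for illustration.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateSketch {

    // Matches placeholders such as {name} and captures the key between the braces.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    // Replaces every {key} in the template with the matching value from the map.
    static String render(String template, Map<String, String> variables) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String key = matcher.group(1);
            if (!variables.containsKey(key)) {
                throw new IllegalArgumentException("Missing value for placeholder: " + key);
            }
            // quoteReplacement keeps '$' and '\' in the value from being treated specially.
            matcher.appendReplacement(out, Matcher.quoteReplacement(variables.get(key)));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "Your name is {name}. Answer in line with {voice} eating habits.";
        System.out.println(render(template, Map.of("name", "Xiaomei", "voice", "Sichuan")));
    }
}
```

Failing fast on a missing key is a design choice of this sketch; it surfaces a forgotten variable at render time instead of sending a half-filled prompt to the model.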

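3.4 Calling Ollama

Ollama is a tool for deploying and managing large language models (LLMs) locally. It supports many open-source models (such as LLaMA and Alpaca) and exposes a simple API for developers to call. With Ollama you can run powerful AI models on your own computer as easily as running ordinary software.

For a quick setup you can install the Docker version through 1Panel; the installation steps are not covered here.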

3.4.1 Add the dependency

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>

3.4.2 Create the configuration file

spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: deepseek-r1:8b
          temperature: 0.7

3.4.3 Test

Calling Ollama works much like calling OpenAI; just use OllamaChatModel.

@Resource
private OllamaChatModel ollamaChatModel;

@Test
public void testOllamaChatModel() {
    String response = ollamaChatModel.call("Java");
    System.out.println(response);
    Assertions.assertNotNull(response);
}

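4 Tool calling in Spring AI

4.1 Introduction to tool calling

Tool calling in Spring AI, also known as function calling, lets a large language model trigger predefined external functions while generating an answer, which enables dynamic data retrieval or business operations (such as querying a database or calling an API).

Spring AI standardizes how functions are defined and registered, and automatically injects the function definitions into the prompt before sending the model request. When the model decides to call a function, Spring AI performs the call and sends the result back to the model together with the original question; the model then decides the next step based on this new input. This involves several round trips with the model, and each function call is one complete round trip.

The core flow of a function call is as follows:

Define the function: declare the functions the model may call (name, description, parameter structure).
Interact with the model: send the function information to the model together with the user input; the model decides whether a function call is needed.
Execute the function: parse the model's function call request and run the corresponding business logic.
Return the result: send the function's result back to the model, which generates the final answer.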
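
The four steps above can be sketched as a small loop in plain Java. This is a simulation under stated assumptions, not Spring AI's implementation: fakeModel is a hypothetical stand-in for the LLM that always requests the add tool first, and all class, record, and tool names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class ToolCallingFlowSketch {

    // A tool call requested by the model: which tool, with which arguments.
    record ToolCall(String name, Map<String, Integer> args) {}

    // One model turn: either a tool call request or a final answer (exactly one is non-null).
    record ModelTurn(ToolCall toolCall, String answer) {}

    // Step 1: define the functions that the model may call.
    static final Map<String, Function<Map<String, Integer>, Integer>> TOOLS = new LinkedHashMap<>();
    static {
        TOOLS.put("add", args -> args.get("a") + args.get("b"));
        TOOLS.put("mul", args -> args.get("a") * args.get("b"));
    }

    // Hypothetical stand-in for the LLM: requests "add" first, answers once a tool result is present.
    static ModelTurn fakeModel(List<String> conversation) {
        String last = conversation.get(conversation.size() - 1);
        if (last.startsWith("tool:")) {
            return new ModelTurn(null, "The result is " + last.substring("tool:".length()));
        }
        return new ModelTurn(new ToolCall("add", Map.of("a", 3, "b", 5)), null);
    }

    static String chat(String userInput) {
        List<String> conversation = new ArrayList<>();
        conversation.add("user:" + userInput);
        while (true) {
            // Step 2: send the conversation (and, in a real system, the tool definitions) to the model.
            ModelTurn turn = fakeModel(conversation);
            if (turn.answer() != null) {
                // Step 4: the model has produced its final answer.
                return turn.answer();
            }
            // Step 3: execute the requested function and feed the result back to the model.
            ToolCall call = turn.toolCall();
            Integer result = TOOLS.get(call.name()).apply(call.args());
            conversation.add("tool:" + result);
        }
    }

    public static void main(String[] args) {
        System.out.println(chat("What is 3 + 5?"));
    }
}
```

The loop makes the round trips visible: the model is consulted once to request the tool and once more to turn the tool result into an answer.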

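4.2 Implementing function calls

Defining a custom function in Spring AI is simple: declare a @Bean that returns a java.util.function.Function, and register the bean name as an option when calling the model. Under the hood, Spring wraps your function with the adapter code needed to interact with the AI model.

For example, we define a calculator service that supports addition and multiplication: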

@Configuration
public class CalculatorFunction {

    public record AddOperation(int a, int b) {
    }

    public record MulOperation(int m, int n) {
    }

    // The @Description text is passed to the model so it can decide when to call the function
    @Bean
    @Description("Addition")
    public Function<AddOperation, Integer> addOperation() {
        return request -> request.a() + request.b();
    }

    @Bean
    @Description("Multiplication")
    public Function<MulOperation, Integer> mulOperation() {
        return request -> request.m() * request.n();
    }

}

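You can then reference the functions by bean name through toolNames.

Some tutorials use functions, which belongs to older versions. According to the latest official documentation, toolNames is the current method.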

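@Test
public void testFunctionCalling() {
    String response = ChatClient.builder(chatModel)
        .build()
        .prompt()
        .system("""
                You are an agent for an arithmetic calculator.
                You support addition and multiplication; other operations will be added in later versions. If the user asks for something unsupported, explain this.
                Before performing an addition or multiplication, you must obtain the following from the user: two numbers and the operation type.
                Call the custom functions to perform addition and multiplication.
                Please answer in English.
                """)
        .user("Please calculate 3 + 5")
        .toolNames("addOperation", "mulOperation")
        .call()
        .content();

    System.out.println("Response: " + response);
    Assertions.assertNotNull(response);
}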

Another approach is to use the @Tool annotation:

public class DateTimeTools {

    @Tool(description = "Get the current time")
    public String getCurrentTime() {
        return LocalDateTime.now().toString();
    }

}

@Test
public void testToolCalling() {
    // Turn the @Tool-annotated methods of the object into tool callbacks
    ToolCallback[] tools = ToolCallbacks.from(new DateTimeTools());

    ToolCallingChatOptions options = ToolCallingChatOptions.builder().toolCallbacks(tools).build();
    Prompt prompt = new Prompt("Who are you, and what time is it now?", options);
    String answer = chatModel.call(prompt).getResult().getOutput().getText();
    System.out.println("Response: " + answer);
    Assertions.assertNotNull(answer);
}

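5 Spring AI MCP

After studying tool calling above, you may notice that writing the same general-purpose tool classes in every project is reinventing the wheel. MCP was created to address this.

MCP (Model Context Protocol) is an open protocol that standardizes how applications provide context to large language models (LLMs). It offers a standard way to connect AI models to different data sources and tools. Commonly used MCP servers can be found on MCP.so.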

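5.1 MCP architecture

MCP follows a client-server architecture with the following parts:

MCP Hosts: the AI applications that initiate requests, such as chatbots and AI-powered IDEs
MCP Clients: live inside the host application and keep a 1:1 connection with an MCP server
MCP Servers: provide context, tools, and prompts to MCP clients
Local Resources: resources on the local machine that an MCP server can access securely, such as files and databases
Remote Resources: remote resources an MCP server can connect to, such as data offered through an API

MCP generally communicates in one of two modes:

Stdio: communicates over standard input and output streams; mainly used for local integrations, command-line tools, and similar scenarios
SSE: uses Server-Sent Events for server-to-client streaming and HTTP POST requests for client-to-server messages

Feature	SSE	STDIO
Transport	HTTP (long-lived connection)	OS-level file descriptors
Direction	Server → client (one-way push)	Bidirectional streams (stdin, stdout)
Connection lifetime	Long-lived (Connection: Keep-Alive)	Not guaranteed to stay open; depends on the process lifecycle
Data format	Text stream (EventStream format)	Raw byte stream
Error handling	HTTP status codes or reconnection	Process exit or broken pipe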
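
To make the "text stream (EventStream format)" row more tangible, here is a minimal sketch of parsing such a stream in plain Java. It handles only the event and data fields of the Server-Sent Events format, and the sample input (an endpoint event followed by a JSON-RPC message, with a made-up session URL) is an assumption about what an MCP server might send, not captured traffic.

```java
import java.util.ArrayList;
import java.util.List;

public class SseParseSketch {

    record SseEvent(String event, String data) {}

    // Parses a text/event-stream body: fields are "name: value" lines, a blank line ends an event.
    static List<SseEvent> parse(String stream) {
        List<SseEvent> events = new ArrayList<>();
        String eventName = "message"; // default event type when no "event:" line is present
        StringBuilder data = new StringBuilder();
        boolean hasData = false;
        for (String line : stream.split("\n", -1)) {
            if (line.isEmpty()) {
                // A blank line dispatches the event collected so far.
                if (hasData) {
                    events.add(new SseEvent(eventName, data.toString()));
                }
                eventName = "message";
                data.setLength(0);
                hasData = false;
            } else if (line.startsWith("event:")) {
                eventName = line.substring("event:".length()).strip();
            } else if (line.startsWith("data:")) {
                // Several data lines belong to one event and are joined with newlines.
                if (hasData) {
                    data.append('\n');
                }
                data.append(line.substring("data:".length()).stripLeading());
                hasData = true;
            }
            // Comment lines (starting with ':') and other fields are ignored in this sketch.
        }
        return events;
    }

    public static void main(String[] args) {
        // Hypothetical stream content for illustration.
        String stream = "event: endpoint\ndata: /mcp/message?sessionId=abc\n\n"
                + "event: message\ndata: {\"jsonrpc\":\"2.0\",\"id\":1,\"result\":{}}\n\n";
        parse(stream).forEach(System.out::println);
    }
}
```

With stdio there is no such framing by HTTP; the client simply writes messages to the server process's stdin and reads replies from its stdout.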

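5.2 Implementing a local MCP server

5.2.1 Add dependencies

<!-- spring-ai-starter-mcp-server-webflux must not be combined with the Web (spring-boot-starter-web) dependency -->
<!-- Otherwise Tomcat is used instead of Netty, and the MCP server fails to start -->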
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-server-webflux</artifactId>
</dependency>

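5.2.2 Configuration file

server:
  # Assumed port; it must match the URL configured on the client side
  port: 8011

spring:
  ai:
    mcp:
      server:
        type: async
        name: customer-mcp-server
        version: 1.0.0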

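5.2.3 Service class

@Service
public class WeatherService {

    @Tool(description = "Get the weather forecast for a city")
    public String getWeather(String city) {
        // Hard-coded sample data standing in for a real weather API
        Map<String, String> weatherMap = Map.of(
                "Beijing", "sunny",
                "Shanghai", "cloudy",
                "Guangzhou", "sunny"
        );
        return weatherMap.getOrDefault(city, "unknown");
    }

}

5.2.4 Expose the tools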

@Configuration
public class McpServerConfig {

    @Bean
    public ToolCallbackProvider weatherTools(WeatherService weatherService) {
        return MethodToolCallbackProvider.builder()
                .toolObjects(weatherService)
                .build();
    }

}

5.3 Implementing a local MCP client

5.3.1 Add the dependency

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-mcp-client</artifactId>
</dependency>

5.3.2 Configuration file

spring:
  ai:
    mcp:
      client:
        type: async
        request-timeout: 20s
        toolcallback:
          enabled: true
        sse:
          connections:
            mcp-server1:
              url: http://localhost:8011

5.3.3 ChatClient configuration

@Configuration
public class SaaLLMConfig {

    @Bean
    public ChatClient saaLLMChatClient(ChatModel chatModel, ToolCallbackProvider provider) {
        return ChatClient.builder(chatModel)
                .defaultToolCallbacks(provider.getToolCallbacks())
                .build();
    }

}

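After that, simply make calls through this ChatClient as shown earlier; the details are not repeated here.

6 Other models in Spring AI

6.1 Image models

In Spring AI, the Image Model API provides a simple, portable interface for interacting with AI models that specialize in image generation, so developers can switch between image models with minimal code changes. This design follows Spring's philosophy of modularity and interchangeability. Helper classes such as ImagePrompt wrap the input and ImageResponse carries the output, which unifies communication with image generation models; the API takes care of request preparation and response parsing.

The ImageModel API abstracts the text-to-image interaction: the application receives text and calls the model to generate an image. ImageModel takes an ImagePrompt as input and returns an ImageResponse.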

@Resource
private OpenAiImageModel openAiImageModel;

@Test
public void testImageGen() {
    ImageResponse response = openAiImageModel.call(
        new ImagePrompt(
            "A photo of a cute cat.",
            // Custom options can be added here
            OpenAiImageOptions.builder()
                .model("doubao-seedream-4-5-251128")
                .build()
        )
    );

    // Get the URL of the generated image
    String imageUrl = response.getResult().getOutput().getUrl();
    System.out.println(imageUrl);
    Assertions.assertNotNull(imageUrl);
}

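6.2 Speech models

In Spring AI, the Text-to-Speech API provides a speech endpoint based on OpenAI's TTS (text-to-speech) model, which lets users:

Narrate a written blog post.
Produce spoken audio in multiple languages.
Give real-time audio output using streaming.

The API turns text into speech, supports multiple languages, and can stream audio in real time, which improves both the accessibility of content and the user experience.

This part only works with OpenAI. If you use Alibaba Cloud Bailian, consider Spring AI Alibaba instead.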

@Resource
private OpenAiAudioSpeechModel speechModel;

@Test
public void testTTSGen() {
    SpeechResponse response = speechModel.call(
        new SpeechPrompt(
            "床前明月光， 疑是地上霜。 举头望明月， 低头思故乡。",
            OpenAiAudioSpeechOptions.builder()
            .build()
        )
    );

    File file = new File("output.mp3");
    try (FileOutputStream fos = new FileOutputStream(file)) {
        byte[] output = response.getResult().getOutput();
        fos.write(output);
    } catch (IOException e) {
        log.error("Failed to write the audio file", e);
    }
}

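7 Implementing RAG with Spring AI

7.1 RAG overview

7.1.1 Vectorization

A vector database is a database that stores data as mathematical vectors, where each vector is a list of numbers representing a position in a multi-dimensional space. Its key capability is retrieval by similarity search rather than exact matching.

For example, when querying the vector database of an online store, the input "Beijing" might return "China, Beijing, North China, capital, Olympics", while "Shenyang" might return "Northeast China, Liaoning, Snow beer, heavy industry". The results depend on the data stored in the vector database, and parameters can be used to constrain what is returned to fit different needs.

The embedding model and the vector database (vector store) work hand in hand as two closely related components of the AI stack. Together they form the technical basis of semantic search, recommendation systems, and RAG (Retrieval-Augmented Generation).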
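
A minimal sketch of similarity search, as opposed to exact matching, is shown below. It assumes cosine similarity as the metric (a common choice, though vector stores support others) and uses tiny hand-written three-dimensional vectors in place of real embeddings; all names and numbers are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SimilaritySketch {

    // Cosine similarity: close to 1.0 for vectors pointing the same way, 0.0 for orthogonal ones.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the key whose stored vector is most similar to the query vector.
    static String nearest(double[] query, Map<String, double[]> store) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : store.entrySet()) {
            double score = cosine(query, entry.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up vectors; a real embedding model produces hundreds or thousands of dimensions.
        Map<String, double[]> store = new LinkedHashMap<>();
        store.put("capital", new double[]{0.9, 0.1, 0.0});
        store.put("heavy industry", new double[]{0.1, 0.9, 0.1});
        store.put("Olympics", new double[]{0.7, 0.0, 0.6});

        double[] beijing = {0.9, 0.1, 0.2};
        System.out.println("Nearest to the query: " + nearest(beijing, store));
    }
}
```

The query never has to equal a stored entry; it only needs to point in a similar direction, which is what lets a search for one term surface related ones.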

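7.1.2 RAG

RAG stands for Retrieval-Augmented Generation. It is a technique that combines a retrieval system with a generative model, with two main goals:

Make use of an external knowledge base
Help the large model produce answers that are more accurate, better grounded, and more up to date

RAG addresses two major problems of plain LLMs:

Knowledge limits: an LLM's knowledge is fixed by its training data, so it does not know about recent information.
Hallucination: an LLM sometimes makes up answers that do not exist.

By retrieving external knowledge, RAG lets the model go beyond its knowledge limits and reduces (but does not eliminate) hallucination.

The RAG workflow is roughly as follows:

User question: the user enters a question, which is received and used as the query for the following steps.
Question vectorization: the embedding model converts the question into a high-dimensional vector for the subsequent vector similarity search.
Vector database retrieval: the system connects to a vector store (such as FAISS, Milvus, Pinecone, or Weaviate) and uses the question vector to retrieve the most similar document chunks from the knowledge base. Common retrieval parameters include:
    Top-K: retrieve the K most relevant records
    Similarity threshold: controls how relevant the retrieved content must be
    The result is usually up to K knowledge chunks.
Context construction: organize the prompt so that the LLM can better understand the background information. This includes:
    System prompt: sets the model's role, controls the answer style, and helps prevent hallucination
    Final prompt: usually combines the pieces above in a format like the following

【Background】

Bluetooth connection problems can usually be solved by restarting the device and pairing again.

If the watch firmware is outdated, update it to the latest version for Bluetooth compatibility.

In some environments, such as those with electromagnetic interference, connections can also fail.

【User question】
My smartwatch has a Bluetooth connection problem. What should I do?

【Answer requirements】
Answer the user's question concisely and clearly based on the material above. If the answer cannot be found directly in the material, politely tell the user.

Call the LLM: submit the constructed prompt to the LLM, which reads the retrieved content and the question and composes a natural, coherent, and accurate answer.
Return the final answer to the user.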
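
The retrieval parameters and the final prompt layout from the workflow above can be sketched in plain Java. The sketch assumes the candidate chunks have already been scored against the question; the class, record, and method names and the scores are hypothetical, and a real application would get the scores from the vector store.

```java
import java.util.List;
import java.util.stream.Collectors;

public class RagPromptSketch {

    // A knowledge chunk together with its similarity score against the question.
    record ScoredChunk(String text, double score) {}

    // Applies the similarity threshold first, then keeps the topK best-scoring chunks.
    static List<String> select(List<ScoredChunk> candidates, int topK, double threshold) {
        return candidates.stream()
                .filter(chunk -> chunk.score() >= threshold)
                .sorted((a, b) -> Double.compare(b.score(), a.score()))
                .limit(topK)
                .map(ScoredChunk::text)
                .collect(Collectors.toList());
    }

    // Assembles the final prompt: background material, user question, answer requirements.
    static String buildPrompt(List<String> chunks, String question) {
        return "[Background]\n" + String.join("\n", chunks)
                + "\n\n[User question]\n" + question
                + "\n\n[Answer requirements]\n"
                + "Answer concisely based on the material above. "
                + "If the material does not contain the answer, say so politely.";
    }

    public static void main(String[] args) {
        // Hypothetical retrieval results with made-up scores.
        List<ScoredChunk> candidates = List.of(
                new ScoredChunk("Restart the device and pair again.", 0.91),
                new ScoredChunk("Our store opens at 9 am.", 0.42),
                new ScoredChunk("Update the watch firmware.", 0.77),
                new ScoredChunk("Electromagnetic interference can break connections.", 0.65));

        List<String> chunks = select(candidates, 2, 0.5);
        System.out.println(buildPrompt(chunks, "My smartwatch has a Bluetooth connection problem. What should I do?"));
    }
}
```

Filtering by threshold before taking the top K means fewer than K chunks may be returned, which is usually preferable to padding the prompt with weakly related text.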

7.2 Implementing a basic RAG flow

7.2.1 Add the dependency

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-advisors-vector-store</artifactId>
</dependency>

7.2.2 Vector store configuration

@Configuration
public class RagConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        return builder.defaultSystem("You are an expert in the Java programming language; answer the user's questions accordingly")
                .build();
    }

    @Bean
    public VectorStore vectorStore(EmbeddingModel embeddingModel) {
        SimpleVectorStore simpleVectorStore = SimpleVectorStore.builder(embeddingModel)
                .build();

        // Load the product description document from the classpath
        // (files under src/main/resources end up at the classpath root)
        String filePath = "classpath:rag/product-description.txt";
        TextReader textReader = new TextReader(filePath);
        textReader.getCustomMetadata().put("filePath", filePath);
        List<Document> documents = textReader.get();
        // Split the text into chunks; apply() returns the chunks and does not modify the input list
        TokenTextSplitter splitter = new TokenTextSplitter(1200,
                350, 5,
                100, true);
        List<Document> chunks = splitter.apply(documents);
        // Embed and store the chunks, not the unsplit documents
        simpleVectorStore.add(chunks);
        return simpleVectorStore;
    }

}

7.2.3 Test

@Slf4j
@SpringBootTest
public class EmbeddingTest {

    @Resource
    private ChatClient chatClient;

    @Resource
    private VectorStore vectorStore;

    @Test
    public void testEmbedding() {
        String response = chatClient.prompt()
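                .system("You are an expert in the Java programming language; answer the user's questions accordingly")
                .user("Java")
                // Retrieves similar chunks from the vector store and adds them to the prompt
                .advisors(new QuestionAnswerAdvisor(vectorStore))
                .call()
                .content();

        log.info("Response: {}", response);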
    }

}