RAG API
Implementing the RAG API
This section wraps the test code from the previous section into a REST controller.
Main code
package cn.shuchang.sc.dev.tech.trigger.http;

@Slf4j
@RestController
@CrossOrigin("*")
@RequestMapping("/api/v1/rag/")
public class RAGController implements IRAGService {

    @Resource
    private OllamaChatClient ollamaChatClient;
    @Resource
    private TokenTextSplitter tokenTextSplitter;
    @Resource
    private SimpleVectorStore simpleVectorStore;
    @Resource
    private PgVectorStore pgVectorStore;
    @Resource
    private RedissonClient redissonClient;

    @RequestMapping(value = "query_rag_tag_list", method = RequestMethod.GET)
    @Override
    public Response<List<String>> queryRagTagList() {
        RList<String> elements = redissonClient.getList("ragTag");
        return Response.<List<String>>builder()
                .code("0000")
                .info("success")
                .data(elements)
                .build();
    }

    @RequestMapping(value = "file/upload", method = RequestMethod.POST, headers = "content-type=multipart/form-data")
    @Override
    public Response<String> uploadFile(@RequestParam String ragTag, @RequestParam("file") List<MultipartFile> files) {
        log.info("Uploading files to RAG: {}", ragTag);
        for (MultipartFile file : files) {
            // Parse the uploaded file with Tika, then split it into token-sized chunks.
            TikaDocumentReader documentReader = new TikaDocumentReader(file.getResource());
            List<Document> documents = documentReader.get();
            List<Document> documentSplitterList = tokenTextSplitter.apply(documents);
            // Tag both the originals and the chunks with the knowledge-base name.
            documents.forEach(doc -> doc.getMetadata().put("knowledge", ragTag));
            documentSplitterList.forEach(doc -> doc.getMetadata().put("knowledge", ragTag));
            pgVectorStore.accept(documentSplitterList);
            // Record the tag in Redis so it shows up in queryRagTagList.
            RList<String> elements = redissonClient.getList("ragTag");
            if (!elements.contains(ragTag)) {
                elements.add(ragTag);
            }
        }
        log.info("Uploading files to RAG finished: {}", ragTag);
        return Response.<String>builder()
                .code("0000")
                .info("success")
                .build();
    }
}
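With the controller in place, the two endpoints can be exercised from the command line. This is a sketch, not part of the project: it assumes the application listens on localhost:8080, and the tag name and file path are placeholders.

```shell
# List the knowledge-base tags stored in Redis
curl "http://localhost:8080/api/v1/rag/query_rag_tag_list"

# Upload one or more files under a tag ("demo" and the path are illustrative)
curl -X POST "http://localhost:8080/api/v1/rag/file/upload" \
  -F "ragTag=demo" \
  -F "file=@./docs/readme.md"
```

Both calls return the `Response` envelope, e.g. `{"code":"0000","info":"success",...}`.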
Uploading a Git Repository to the Vector Store
This extends knowledge-base ingestion with Git repository parsing: the user supplies a repository URL plus a username and token, and the service clones the code and uploads it to the knowledge base.
Main code
Add the following to the controller above.
    @RequestMapping(value = "analyze_git_repository", method = RequestMethod.POST)
    @Override
    public Response<String> analyzeGitRepository(@RequestParam String repoUrl, @RequestParam String userName, @RequestParam String token) throws Exception {
        String localPath = "./git-cloned-repo";
        String repoProjectName = extractProjectName(repoUrl);
        log.info("Clone path: {}", new File(localPath).getAbsolutePath());

        // Start from a clean directory, then clone the repository with the supplied credentials.
        FileUtils.deleteDirectory(new File(localPath));
        Git git = Git.cloneRepository()
                .setURI(repoUrl)
                .setDirectory(new File(localPath))
                .setCredentialsProvider(new UsernamePasswordCredentialsProvider(userName, token))
                .call();

        // Walk every file in the working tree and upload it to the knowledge base.
        Files.walkFileTree(Paths.get(localPath), new SimpleFileVisitor<>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                log.info("{} parsing file, uploading to knowledge base: {}", repoProjectName, file.getFileName());
                try {
                    TikaDocumentReader reader = new TikaDocumentReader(new PathResource(file));
                    List<Document> documents = reader.get();
                    List<Document> documentSplitterList = tokenTextSplitter.apply(documents);
                    documents.forEach(doc -> doc.getMetadata().put("knowledge", repoProjectName));
                    documentSplitterList.forEach(doc -> doc.getMetadata().put("knowledge", repoProjectName));
                    pgVectorStore.accept(documentSplitterList);
                } catch (Exception e) {
                    log.error("Failed to parse and upload file: {}", file.getFileName());
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) throws IOException {
                log.info("Failed to access file: {} - {}", file, exc.getMessage());
                return FileVisitResult.CONTINUE;
            }
        });

        // Clean up the local clone and record the project name as a knowledge-base tag.
        FileUtils.deleteDirectory(new File(localPath));
        RList<String> elements = redissonClient.getList("ragTag");
        if (!elements.contains(repoProjectName)) {
            elements.add(repoProjectName);
        }
        git.close();

        log.info("Repository parsed and uploaded: {}", repoUrl);
        return Response.<String>builder().code("0000").info("success").build();
    }

    private String extractProjectName(String repoUrl) {
        String[] parts = repoUrl.split("/");
        String projectNameWithGit = parts[parts.length - 1];
        return projectNameWithGit.replace(".git", "");
    }
Extending the Integration with OpenAI Models
So far the only chat model we can use is DeepSeek, and the embedding model is equally fixed.
To make this extensible, we use Spring AI to connect to ChatGPT and use the embedding model it provides for vectorization.
Once this is done we can choose between two models.
Main flow
First, add the following to application-dev.yaml:
ai:
  openai:
    base-url: <your api url>
    api-key: <your api key>
    embedding-mode: text-embedding-ada-002
In OllamaConfig.java, add the OpenAiApi initialization and update the VectorStore beans so that either DeepSeek or ChatGPT can be selected:
@Bean
public OpenAiApi openAiApi(@Value("${spring.ai.openai.base-url}") String baseUrl, @Value("${spring.ai.openai.api-key}") String apiKey) {
    return new OpenAiApi(baseUrl, apiKey);
}

@Bean
public SimpleVectorStore vectorStore(@Value("${spring.ai.rag.embed}") String model, OllamaApi ollamaApi, OpenAiApi openAiApi) {
    // Choose the embedding client by configuration:
    // Ollama's nomic-embed-text, or OpenAI's default embedding model.
    if ("nomic-embed-text".equalsIgnoreCase(model)) {
        OllamaEmbeddingClient embeddingClient = new OllamaEmbeddingClient(ollamaApi);
        embeddingClient.withDefaultOptions(OllamaOptions.create().withModel("nomic-embed-text"));
        return new SimpleVectorStore(embeddingClient);
    } else {
        OpenAiEmbeddingClient embeddingClient = new OpenAiEmbeddingClient(openAiApi);
        return new SimpleVectorStore(embeddingClient);
    }
}

@Bean
public PgVectorStore pgVectorStore(@Value("${spring.ai.rag.embed}") String model, OllamaApi ollamaApi, OpenAiApi openAiApi, JdbcTemplate jdbcTemplate) {
    // Same selection logic for the Postgres-backed vector store.
    if ("nomic-embed-text".equalsIgnoreCase(model)) {
        OllamaEmbeddingClient embeddingClient = new OllamaEmbeddingClient(ollamaApi);
        embeddingClient.withDefaultOptions(OllamaOptions.create().withModel("nomic-embed-text"));
        return new PgVectorStore(jdbcTemplate, embeddingClient);
    } else {
        OpenAiEmbeddingClient embeddingClient = new OpenAiEmbeddingClient(openAiApi);
        return new PgVectorStore(jdbcTemplate, embeddingClient);
    }
}
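The same configuration-driven switch could be mirrored on the chat side. The bean below is a sketch, not code from the project: the `spring.ai.rag.chat` property and the `deepseek-r1:1.5b` Ollama model tag are invented for illustration, and it assumes the Spring AI 0.8.x constructors `OllamaChatClient(OllamaApi)` and `OpenAiChatClient(OpenAiApi)`.

```java
@Bean
public ChatClient chatClient(@Value("${spring.ai.rag.chat}") String model, // hypothetical property
                             OllamaApi ollamaApi, OpenAiApi openAiApi) {
    if ("ollama".equalsIgnoreCase(model)) {
        // Local DeepSeek served by Ollama; the model tag is illustrative.
        return new OllamaChatClient(ollamaApi)
                .withDefaultOptions(OllamaOptions.create().withModel("deepseek-r1:1.5b"));
    }
    // Otherwise use the OpenAI-compatible endpoint configured above.
    return new OpenAiChatClient(openAiApi);
}
```

Callers would then inject the `ChatClient` interface instead of a concrete client, so the chat model can be swapped purely through configuration, just like the embedding model.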