Pgvector


Installing pgvector

Installing pgvector with docker-compose

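# The default user is postgres; its password is set by POSTGRES_PASSWORD below.
# Change the password for anything beyond local testing.
version: '3.0'

services:
  pgvector:
    image: pgvector/pgvector:pg16
    restart: always
    environment:
      POSTGRES_PASSWORD: "123456"
    volumes:
      # Persist the database files on the host
      - ./data:/var/lib/postgresql/data
    ports:
      - "5432:5432"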

Initialization

-- Enable the extension
CREATE EXTENSION vector;

-- Create a table with a 3-dimensional vector column
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));

-- Insert data
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');

-- Query the nearest neighbors (<-> is L2 distance)
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
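
To see what the query above ranks by, here is a minimal Python sketch that recomputes the L2 (Euclidean) distance, which is what pgvector's `<->` operator measures. The `l2_distance` helper and the `rows` dictionary are illustrative names of mine, and the ids 1 and 2 assume the two rows were inserted into a fresh table.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# The two rows inserted above (ids assume a fresh bigserial sequence)
rows = {1: [1, 2, 3], 2: [4, 5, 6]}
# The query vector from the SELECT statement
query = [3, 1, 2]

# Mimic: ORDER BY embedding <-> '[3,1,2]' LIMIT 5
ranked = sorted(rows.items(), key=lambda item: l2_distance(item[1], query))[:5]

for row_id, embedding in ranked:
    print(row_id, embedding, round(l2_distance(embedding, query), 3))
# prints:
# 1 [1, 2, 3] 2.449
# 2 [4, 5, 6] 5.745
```

The row `[1,2,3]` comes first because its distance to the query is sqrt(6), while `[4,5,6]` is sqrt(33) away.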

Text vectorization

word2vec

Java implementation: NLPchina/Word2VEC_java: a Java implementation of word2vec (github.com)

Training dataset: Chinese-Word-Vectors/README_zh.md at master · Embedding/Chinese-Word-Vectors (github.com)
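
word2vec produces one vector per word, so a piece of text still has to be reduced to a single vector before it can go into a `vector(n)` column. One common approach, which may not be the one used with the projects linked above, is to average the vectors of the known words. The sketch below assumes that approach; the toy 3-dimensional `word_vectors` table and the helper names `embed_text` and `to_pgvector_literal` are hypothetical and stand in for a real trained model.

```python
def embed_text(tokens, word_vectors):
    """Average the vectors of the tokens that exist in the vocabulary."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        raise ValueError("none of the tokens are in the vocabulary")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def to_pgvector_literal(vector):
    """Format a list of floats as a pgvector text literal such as [1.0,2.0,3.0]."""
    return "[" + ",".join(str(x) for x in vector) + "]"


# Hypothetical toy vocabulary; a real model would have far more dimensions
word_vectors = {
    "vector": [1.0, 2.0, 3.0],
    "database": [3.0, 2.0, 1.0],
}

# Out-of-vocabulary tokens ("unknown" here) are skipped
embedding = embed_text(["vector", "database", "unknown"], word_vectors)
print(to_pgvector_literal(embedding))
# prints: [2.0,2.0,2.0]
```

The resulting string can be passed as the value for the `embedding` column in an `INSERT`, provided the column dimension matches the model's vector size.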

Gear (夕照)'s blog. Notes on development, life, and a few thoughts hardly worth mentioning...