Semantic search engine for personal GitHub repositories using embeddings and TF-IDF.
https://github.com/davidbmar/github-portfolio-search · public · shipped
A tool that indexes GitHub repositories, extracts README content, generates embeddings, and provides a semantic search interface via CLI, REST API, and static web UI. It enables developers to find reusable code patterns across their own portfolio.
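The README-extraction step can be illustrated with the public GitHub REST endpoint `GET /repos/{owner}/{repo}/readme`, which returns the file base64-encoded. This is a minimal sketch, not the project's actual indexer: the function names `readme_request`, `decode_readme`, and `fetch_readme` are hypothetical, and reading the token from the `GITHUB_TOKEN` environment variable is an assumption based on the `.env` setup.

```python
import base64
import json
import os
from urllib.request import Request, urlopen


def readme_request(owner, repo, token=None):
    """Build the request for a repository's README (hypothetical helper)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # A token raises rate limits and allows access to private repos.
        headers["Authorization"] = f"Bearer {token}"
    url = f"https://api.github.com/repos/{owner}/{repo}/readme"
    return Request(url, headers=headers)


def decode_readme(payload):
    """Decode the base64 'content' field of the API response into text."""
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')!r}")
    return base64.b64decode(payload["content"]).decode("utf-8")


def fetch_readme(owner, repo):
    """Fetch and decode a README; performs a network call."""
    req = readme_request(owner, repo, os.environ.get("GITHUB_TOKEN"))
    with urlopen(req, timeout=10) as resp:
        return decode_readme(json.load(resp))
```

Calling `fetch_readme("davidbmar", "github-portfolio-search")` would return the README text as a string, which could then be passed on for embedding.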
git clone https://github.com/davidbmar/github-portfolio-search.git
cd github-portfolio-search
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env
# Edit .env to add GITHUB_TOKEN
ghps index <your_github_username>
ghps export
cd web && python3 -m http.server 8000
flowchart TD
User[User Browser]
WebUI[Static Web UI]
API[FastAPI Server]
Indexer[Indexing Pipeline]
DB[(SQLite-vec Index)]
GitHub[GitHub API]
Lambda[AWS Lambda OAuth]
S3[S3 Bucket]
User --> WebUI
WebUI --> API
WebUI --> S3
API --> DB
Indexer --> GitHub
Indexer --> DB
Lambda --> GitHub
User --> Lambda
Python backend using sentence-transformers for embeddings and SQLite-vec for vector storage. FastAPI serves the REST API. The web UI is a static HTML/JS application consuming JSON data exported by the CLI. AWS Lambda handles OAuth and logging. GitHub Actions automate weekly reindexing.
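To make the combination of embeddings and TF-IDF concrete, here is a minimal sketch of one way to blend a dense similarity score with a keyword score. It is an illustration only: the source does not describe how the project weights the two signals, so the linear blend, the `alpha` parameter, and all function names are assumptions. The embeddings are passed in as plain lists so the sketch runs without sentence-transformers or SQLite-vec.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


def build_tfidf(docs):
    """Return (sparse TF-IDF vector per document, IDF table)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF so that every known term has a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [tfidf_vector(toks, idf) for toks in tokenized]
    return vectors, idf


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms missing from the IDF table are dropped."""
    known = [t for t in tokens if t in idf]
    if not known:
        return {}
    tf = Counter(known)
    return {t: (c / len(known)) * idf[t] for t, c in tf.items()}


def sparse_cosine(a, b):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def dense_cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_rank(query, query_emb, names, docs, doc_embs, alpha=0.5):
    """Rank documents by alpha * embedding score + (1 - alpha) * TF-IDF score."""
    doc_vecs, idf = build_tfidf(docs)
    q_vec = tfidf_vector(tokenize(query), idf)
    scored = []
    for name, d_vec, d_emb in zip(names, doc_vecs, doc_embs):
        score = (alpha * dense_cosine(query_emb, d_emb)
                 + (1 - alpha) * sparse_cosine(q_vec, d_vec))
        scored.append((name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data: hand-written 3-d vectors stand in for real embeddings.
    names = ["voice-notes", "s3-sync", "oauth-proxy"]
    docs = [
        "record voice notes and transcribe speech",
        "sync local files to an s3 bucket",
        "lambda function handling github oauth login",
    ]
    embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.2, 0.9]]
    for name, score in hybrid_rank("s3 bucket upload", [0.2, 0.9, 0.1], names, docs, embs):
        print(f"{name}: {score:.3f}")
```

A blend like this lets exact keyword matches (for example a library name in a README) lift a result even when the embedding similarity alone is ambiguous; setting `alpha` to 1.0 or 0.0 falls back to pure semantic or pure keyword ranking.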
sequenceDiagram
participant User
participant WebUI
participant API
participant DB
participant GitHub
User->>WebUI: Enter search query
WebUI->>API: GET /search?q=query
API->>DB: Query embeddings and TF-IDF
DB-->>API: Return ranked results
API-->>WebUI: JSON results
WebUI-->>User: Display repos and snippets
User->>WebUI: Click repo detail
WebUI->>API: GET /repo/slug
API->>DB: Fetch repo record
DB-->>API: Return full record
API-->>WebUI: JSON record
WebUI-->>User: Show README and details
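The two requests in the sequence above can also be issued from a small script. The sketch below uses only the standard library and takes the `/search?q=...` and `/repo/<slug>` paths from the diagram; the base URL, the helper names, and the assumption that both endpoints return JSON bodies are illustrative, and the actual response fields are not specified in this document.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed address of the FastAPI server; adjust to wherever it is running.
BASE_URL = "http://127.0.0.1:8001"


def search_url(query, base_url=BASE_URL):
    """URL for GET /search?q=<query>, with the query string URL-encoded."""
    return f"{base_url.rstrip('/')}/search?{urlencode({'q': query})}"


def repo_url(slug, base_url=BASE_URL):
    """URL for GET /repo/<slug>, with the slug percent-encoded."""
    return f"{base_url.rstrip('/')}/repo/{quote(slug, safe='')}"


def get_json(url, timeout=10):
    """Perform the GET request and parse the JSON response body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Requires the API server to be running at BASE_URL.
    results = get_json(search_url("vector search"))
    print(json.dumps(results, indent=2))
```

Keeping URL construction separate from the network call makes the encoding logic easy to check without a running server.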
Clone the repo, install dependencies, configure a GitHub token, index your repos, and serve the web UI or use the CLI/MCP server to search your codebase.
✓ all on main — nothing unmerged.