GitHub Portfolio Search

Semantic search engine for personal GitHub repositories using embeddings and TF-IDF.

https://github.com/davidbmar/github-portfolio-search  ·  public  ·  shipped

What it is

A tool that indexes GitHub repositories, extracts README content, generates embeddings, and provides a semantic search interface via CLI, REST API, and static web UI. It enables developers to find reusable code patterns across their own portfolio.

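Features

- Indexes a user's GitHub repositories and extracts their README content.
- Ranks search results using sentence-transformers embeddings together with TF-IDF.
- Offers three search interfaces: the ghps CLI, a FastAPI REST API, and a static web UI.
- Exposes an MCP server for use with Claude Code.
- Handles OAuth authentication and logging through AWS Lambda.
- Reindexes weekly through a GitHub Actions workflow.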

Quickstart

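git clone https://github.com/davidbmar/github-portfolio-search.git
cd github-portfolio-search
# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install the package in editable mode with development extras
pip install -e ".[dev]"
cp .env.example .env
# Edit .env and set GITHUB_TOKEN to a GitHub personal access token
# Index the repositories of the given GitHub user
ghps index <your_github_username>
# Export the JSON data consumed by the static web UI
ghps export
# Serve the web UI at http://localhost:8000
cd web && python3 -m http.server 8000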
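The "How it's built" section notes that the static web UI reads JSON exported by the CLI. The sketch below shows one way an export step of this kind could work. It is a minimal illustration, not the actual ghps export implementation: the function name export_index, the output filename repos.json, and the record layout (a list of dicts keyed by slug) are all hypothetical.

```python
import json
from pathlib import Path


def export_index(records: list, out_dir: str) -> Path:
    """Write repo records to a JSON file that a static page can fetch.

    Hypothetical layout: the real ghps export filename and schema may differ.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Sort by slug so repeated exports produce stable, diff-friendly output.
    payload = {
        "count": len(records),
        "repos": sorted(records, key=lambda r: r["slug"]),
    }
    path = out / "repos.json"  # hypothetical filename
    path.write_text(json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

Precomputing the data this way keeps the web UI a set of static files, which is what allows it to be served by python3 -m http.server locally or from S3 and CloudFront in production.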

Architecture

flowchart TD
    User[User Browser]
    WebUI[Static Web UI]
    API[FastAPI Server]
    Indexer[Indexing Pipeline]
    DB[(SQLite-vec Index)]
    GitHub[GitHub API]
    Lambda[AWS Lambda OAuth]
    S3[S3 Bucket]
    User --> WebUI
    WebUI --> API
    WebUI --> S3
    API --> DB
    Indexer --> GitHub
    Indexer --> DB
    Lambda --> GitHub
    User --> Lambda
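The Indexer --> GitHub edge can be illustrated with the standard library alone. This sketch builds authenticated requests against two documented GitHub REST endpoints: GET /users/{username}/repos (paginated, up to 100 per page) and GET /repos/{owner}/{repo}/readme, requested with the raw media type so the README arrives as plain text. Note that /users/{username}/repos lists only public repositories. The project's real pipeline may use a different endpoint or client library, and every helper name here is hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterator

GITHUB_API = "https://api.github.com"


def _headers(token: str, accept: str = "application/vnd.github+json") -> dict:
    # Headers for token-authenticated calls; token is a personal access token.
    return {
        "Accept": accept,
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    }


def repo_list_request(username: str, token: str, page: int = 1) -> urllib.request.Request:
    # One page (up to 100 entries) of a user's public repositories.
    query = urllib.parse.urlencode({"per_page": 100, "page": page})
    url = f"{GITHUB_API}/users/{urllib.parse.quote(username)}/repos?{query}"
    return urllib.request.Request(url, headers=_headers(token))


def readme_request(owner: str, repo: str, token: str) -> urllib.request.Request:
    # The raw media type returns README text instead of base64-encoded JSON.
    # A repository without a README answers 404 (urllib raises HTTPError).
    url = f"{GITHUB_API}/repos/{owner}/{repo}/readme"
    return urllib.request.Request(
        url, headers=_headers(token, "application/vnd.github.raw+json")
    )


def fetch_text(req: urllib.request.Request) -> str:
    # Performs the actual network call.
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")


def iter_repo_names(username: str, token: str) -> Iterator[str]:
    # Walk pages until GitHub returns a short (or empty) page.
    page = 1
    while True:
        repos = json.loads(fetch_text(repo_list_request(username, token, page)))
        for repo in repos:
            yield repo["name"]
        if len(repos) < 100:
            return
        page += 1
```

Each README fetched this way would then be embedded and stored in the index.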

How it's built

Python backend using sentence-transformers for embeddings and SQLite-vec for vector storage. FastAPI serves the REST API. The web UI is a static HTML/JS application consuming JSON data exported by the CLI. AWS Lambda handles OAuth and logging. GitHub Actions automate weekly reindexing.
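The project combines embeddings with TF-IDF, but this summary does not say how the two signals are merged. As a minimal sketch, assuming a simple weighted sum, the code below scores each README by TF-IDF cosine similarity against the query and by cosine similarity between embedding vectors, then blends the two with a hypothetical weight alpha. Embeddings are passed in as plain lists; in the project they would come from sentence-transformers, and the real scoring may differ.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    # Lowercase alphanumeric tokens; a real tokenizer might also drop stopwords.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list) -> tuple:
    # Sparse TF-IDF vectors with smoothed IDF: log((1 + N) / (1 + df)) + 1.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def sparse_cosine(a: dict, b: dict) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def dense_cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_rank(query: str, docs: list, query_emb: list, doc_embs: list, alpha: float = 0.5) -> list:
    # alpha weights the semantic score; (1 - alpha) weights the lexical score.
    vectors, idf = tfidf_vectors(docs)
    # Query terms absent from the corpus carry no TF-IDF weight.
    q_vec = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scores = [
        alpha * dense_cosine(query_emb, emb) + (1 - alpha) * sparse_cosine(q_vec, vec)
        for vec, emb in zip(vectors, doc_embs)
    ]
    # (doc_index, score) pairs, best first; ties keep corpus order.
    return sorted(enumerate(scores), key=lambda p: p[1], reverse=True)
```

Blending the two lets exact keyword matches (library names, acronyms) lift a result even when its embedding is only moderately close to the query, and vice versa.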

How it runs

sequenceDiagram
    participant User
    participant WebUI
    participant API
    participant DB
    participant GitHub
    User->>WebUI: Enter search query
    WebUI->>API: GET /search?q=query
    API->>DB: Query embeddings and TF-IDF
    DB-->>API: Return ranked results
    API-->>WebUI: JSON results
    WebUI-->>User: Display repos and snippets
    User->>WebUI: Click repo detail
    WebUI->>API: GET /repo/slug
    API->>DB: Fetch repo record
    DB-->>API: Return full record
    API-->>WebUI: JSON record
    WebUI-->>User: Show README and details
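The repo-detail half of this sequence (GET /repo/slug followed by a record fetch) reduces to a keyed lookup. The sketch below uses Python's built-in sqlite3 module with a hypothetical repos table. It leaves out the SQLite-vec extension and the FastAPI route, and returns a (status, JSON body) pair that a route handler could pass back to the web UI. The schema and function names are assumptions, not the project's actual code.

```python
import json
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    # Hypothetical schema: one row per repository, keyed by slug.
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row  # rows behave like mappings
    conn.execute(
        "CREATE TABLE IF NOT EXISTS repos ("
        "slug TEXT PRIMARY KEY, name TEXT NOT NULL, description TEXT, readme TEXT)"
    )
    return conn


def upsert_repo(conn: sqlite3.Connection, slug: str, name: str, description: str, readme: str) -> None:
    # Reindexing the same repo replaces its previous record.
    conn.execute(
        "INSERT OR REPLACE INTO repos (slug, name, description, readme) VALUES (?, ?, ?, ?)",
        (slug, name, description, readme),
    )
    conn.commit()


def repo_detail_json(conn: sqlite3.Connection, slug: str) -> tuple:
    # Parameterized query: the slug from the URL is never interpolated into SQL.
    row = conn.execute(
        "SELECT slug, name, description, readme FROM repos WHERE slug = ?", (slug,)
    ).fetchone()
    if row is None:
        return 404, json.dumps({"error": "not found", "slug": slug})
    return 200, json.dumps(dict(row))
```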

How to apply & reuse

Clone the repo, install dependencies, configure a GitHub token, index your repos, and serve the web UI or use the CLI/MCP server to search your codebase.

At a glance

Capabilities: Semantic Search, Vector Indexing, REST API, Static Site Generation, OAuth Authentication, MCP Server
Components: CLI Tool, Indexing Pipeline, FastAPI Server, Static Web UI, AWS Lambda Functions, GitHub Actions Workflow
Tech: Python, HTML, JavaScript, FastAPI, SQLite-vec, sentence-transformers, AWS Lambda, S3, CloudFront
Depends on: GitHub Personal Access Token, Python 3.9+, AWS CLI
Integrates with: GitHub API, Google OAuth, Claude Code MCP, Telegram Bot API
Patterns: Semantic Search, Static Site Generation, Serverless Backend, Progressive Disclosure, Vector Database
Reuse tags: search-engine, portfolio-tool, code-discovery, ai-assistant-integration

Repo hygiene

✓ All work is on main; nothing is unmerged.