COE Overview

Weekly aggregator for security and operational errors from Jira, Wiz, CrowdStrike, and Vibranium.

https://github.com/davidbmar/COE-overview  ·  public  ·  shipped

What it is

A Python-based ETL pipeline that ingests incident data from multiple security and operations platforms, normalizes it into a PostgreSQL database, and prepares it for weekly reporting. It handles authentication and data fetching for each source, and manages the database schema through Alembic migrations.
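
As a rough illustration of the normalization step, the sketch below maps one source-specific payload onto a shared record shape. The NormalizedEvent class, its field names, and the simplified Jira payload layout are all hypothetical and chosen for illustration; the real schema is defined by the project's database models.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class NormalizedEvent:
    """Hypothetical common shape for an incident from any source."""
    source: str
    external_id: str
    title: str
    severity: str
    occurred_at: datetime


def normalize_jira(raw: dict) -> NormalizedEvent:
    """Map an (assumed, simplified) Jira issue payload to the common shape."""
    fields = raw["fields"]
    return NormalizedEvent(
        source="jira",
        external_id=raw["key"],
        title=fields["summary"],
        # Lower-case severities so values from different sources compare equal.
        severity=fields.get("priority", "unknown").lower(),
        occurred_at=datetime.fromisoformat(fields["created"]),
    )


# One normalizer per source; Wiz, CrowdStrike, and Vibranium would
# register their own functions here in the same way.
NORMALIZERS = {"jira": normalize_jira}


def normalize(source: str, raw: dict) -> NormalizedEvent:
    return NORMALIZERS[source](raw)
```

Keeping one small function per source means a change in one vendor's payload format touches only that source's normalizer.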

Features

Quickstart

pip install -e .
export DATABASE_URL=postgresql+asyncpg://user:pass@localhost/db
export JIRA_API_TOKEN=your_token
python -m coe

Architecture

flowchart TD
    User[User] --> Config[Configuration]
    Config --> Pipeline[Ingestion Pipeline]
    Pipeline --> Jira[Jira API]
    Pipeline --> Wiz[Wiz API]
    Pipeline --> CrowdStrike[CrowdStrike API]
    Pipeline --> Vibranium[Vibranium API]
    Jira --> Pipeline
    Wiz --> Pipeline
    CrowdStrike --> Pipeline
    Vibranium --> Pipeline
    Pipeline --> Database[PostgreSQL Database]
    Pipeline --> Output[Run ID File]

How it's built

Built with Python using SQLAlchemy for async database interactions, Pydantic for configuration management, and Alembic for database migrations. It uses structlog for logging and is designed to run as a Kubernetes CronJob.
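
A minimal sketch of environment-driven configuration with validation follows. The project uses Pydantic for this; to stay dependency-free, the sketch swaps in a standard-library dataclass, so it shows the idea rather than the project's actual settings class. DATABASE_URL and JIRA_API_TOKEN come from the Quickstart; Settings, ConfigError, and load_settings are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


class ConfigError(ValueError):
    """Raised when required configuration is missing."""


@dataclass(frozen=True)
class Settings:
    database_url: str
    jira_api_token: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from environment variables, failing fast if any is unset."""
    required = ("DATABASE_URL", "JIRA_API_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Report every missing variable at once rather than one per run.
        raise ConfigError("missing environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        jira_api_token=env["JIRA_API_TOKEN"],
    )
```

Failing at startup is useful for a scheduled job: a misconfigured run stops before it contacts any source API or the database.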

How it runs

sequenceDiagram
    participant Main as Main Entry
    participant Config as Settings Loader
    participant Pipeline as Ingestion Pipeline
    participant Sources as External APIs
    participant DB as PostgreSQL
    participant Output as Run ID Writer
    Main->>Config: Load Settings
    Config-->>Main: Return Settings
    Main->>Pipeline: Initialize Session
    Pipeline->>Sources: Fetch Events
    Sources-->>Pipeline: Return Raw Data
    Pipeline->>DB: Normalize and Store
    DB-->>Pipeline: Confirm Write
    Pipeline-->>Main: Return Result
    Main->>Output: Write Run ID

How to apply & reuse

Set environment variables for each source API (Jira, Wiz, CrowdStrike, Vibranium) and for the database connection. Then run the CLI entrypoint (python -m coe, as in the Quickstart) to execute the ingestion pipeline, which writes a run ID for downstream rendering processes.

At a glance

Capabilities: Data Ingestion, Schema Management, Async Processing, Configuration Validation, Logging
Components: CLI Entrypoint, Configuration Manager, Ingestion Pipeline, Database Models, Alembic Migrations
Tech: Python, SQLAlchemy, Pydantic, Alembic, AsyncIO, Structlog
Depends on: PostgreSQL, Jira API, Wiz API, CrowdStrike API, Vibranium API
Integrates with: Kubernetes CronJobs, Google Docs Rendering
Patterns: ETL Pipeline, Repository Pattern, Dependency Injection, Async/Await
Reuse tags: security-ops, data-aggregation, python-etl, kubernetes-cronjob

⚠ Needs attention