GhostLM chat (v0.9)

An 81M-parameter cybersecurity language model trained from scratch in PyTorch. The pretrain corpus is 273M tokens (PRIMUS-Seed, PRIMUS-FineWeb, NVD CVEs, MITRE ATT&CK, CWE, CAPEC, OWASP, IETF RFCs, Exploit-DB, CTFtime, arXiv cs.CR, plus a fact-dense Q&A set). Architecture: 6 layers · d_model 768 · 12 heads, with RoPE + SwiGLU + RMSNorm.
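The stated shape roughly accounts for the 81M parameters. A back-of-envelope budget, where the vocabulary size (50k), SwiGLU hidden width (2048), and tied input/output embeddings are assumptions rather than published numbers:

```python
# Rough parameter budget for the stated shape (6 layers, d_model 768,
# RoPE + SwiGLU + RMSNorm). Vocab size, SwiGLU hidden width, and tied
# embeddings are ASSUMED, not taken from the repo.
d_model, n_layers = 768, 6
vocab = 50_000        # assumed
ffn_hidden = 2_048    # assumed SwiGLU width

embed = vocab * d_model                          # tied embedding / LM head
attn = 4 * d_model * d_model                     # Q, K, V, O projections
swiglu = 3 * d_model * ffn_hidden                # gate, up, down projections
norms = 2 * d_model                              # two RMSNorms per layer
per_layer = attn + swiglu + norms

total = embed + n_layers * per_layer + d_model   # + final RMSNorm
print(f"{total/1e6:.1f}M parameters")            # → 80.9M under these assumptions
```

RoPE adds no learned parameters, which is why the budget closes without a positional-embedding term.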

Chat-tuned with the chat-v3 supervised fine-tuning (SFT) recipe. The v0.9 chat checkpoint is the bench winner of the ghost-small line:

  • 28.9% on CTIBench MCQ full test split (n=2500, 2-permutation debiased text-scoring)
  • 59.2% on the in-repo CTF MCQ eval (n=30)
  • 39.3% on SecQA (n=210, external)
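One plausible reading of "2-permutation debiased text-scoring" is that each question is scored under two answer-option orderings and counted correct only if the model's top-scoring option is the right answer both times, which cancels position bias. The repo's exact protocol may differ; `score_fn` here is a stand-in for a per-option log-likelihood from the model.

```python
# Hedged sketch of 2-permutation debiased MCQ scoring. score_fn(question,
# option) is assumed to return a scalar score (e.g. a log-likelihood); the
# question counts as correct only if both option orderings pick the answer.
import random

def pick(score_fn, question, options):
    # Score every option and return the argmax (first index wins ties).
    scores = [score_fn(question, o) for o in options]
    return options[scores.index(max(scores))]

def debiased_correct(score_fn, question, options, answer, rng):
    shuffled = list(options)
    rng.shuffle(shuffled)  # second permutation of the answer options
    return (pick(score_fn, question, options) == answer
            and pick(score_fn, question, shuffled) == answer)
```

Random chance on 4-option MCQ under this conjunctive rule is below 25%, so the 28.9% CTIBench number should be read against whichever baseline the eval harness actually reports.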

Honest expectations. v0.9 wins on multiple-choice, but free-form fact recall is at the floor of the entire ghost-small rung (1/50 on a hand-written 50-question fact-recall set, the one "hit" arguably spurious). The model has learned the register of cybersec writing (sentence shape, technique vocabulary, OWASP-style cadence) but not the facts in any retrievable form. Treat outputs as register-shaped fiction: identity, OOD-refusal, and chat shape work; specific CVE numbers, CVSS scores, dates, and technique IDs are unreliable. Always verify against authoritative sources.

The next rung is ghost-base (~360M, SmolLM2-360M shape), gated on rented GPU compute, where literature reports factual recall on cybersec MCQ starting to emerge. Spec at docs/ghost_base_spec.md.

Retrieval-augmented mode: OFF. RAG could not load at startup (`RemoteEntryNotFoundError: 404 Client Error (Request ID: Root=1-69ff60ae-610ffabd412e3deb02ed04ed;fe717579-80db-41fc-a7f1-4da959b8e72c). Entry Not Found for url: https://huggingface.co/Ghostgim/GhostLM-v0.9-experimental/resolve/main/rag/index.npy`). Generation is bare; expect hallucination on factual questions.
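The graceful degradation seen here (missing index → bare generation instead of a crash) can be sketched as a try/except around the fetch. This is a hypothetical illustration, not the repo's actual loader; `fetch` stands in for something like a `huggingface_hub.hf_hub_download` call for `rag/index.npy`.

```python
# Hypothetical sketch of a startup loader that disables RAG instead of
# crashing when the remote index file 404s. `fetch` is an ASSUMED stand-in
# for the real download call (e.g. hf_hub_download on rag/index.npy).
from typing import Callable, Optional, Tuple

def load_rag_index(fetch: Callable[[], str]) -> Tuple[Optional[str], bool]:
    try:
        return fetch(), True     # retrieval-augmented mode: ON
    except Exception:            # e.g. EntryNotFoundError / 404 at startup
        return None, False       # retrieval-augmented mode: OFF, bare generation
```

Any fetch failure degrades to the bare-generation mode reported above; a successful fetch returns the local index path with RAG enabled.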

Loaded checkpoint: /root/.cache/huggingface/hub/models--Ghostgim--GhostLM-v0.9-experimental/snapshots/e56a7ed6b4970f6dd0a4422cd908e3ac9406a67d/best_model.pt

Source: github.com/joemunene-by/GhostLM · v0.9 weights: Ghostgim/GhostLM-v0.9-experimental · The model is small enough to run locally on a laptop CPU. See the GitHub README for instructions.