phi-thread

CONNECT, DON'T CREATE
Every answer already exists somewhere — Stack Overflow, Reddit, HN, GitHub, blogs.
You don't need another knowledge base. You need a router.
Thread math = every answer has a permanent coordinate. We link to the EXACT answer, not the page.

01 What happens when you type a query

phi-thread "docker container keeps restarting" → here's every step behind the scenes

You ask a question

phi-thread "docker container keeps restarting"

The query string is parsed and the platforms are set (default: so, reddit, hn, github, google; the "google" platform is served by DuckDuckGo HTML search). Keywords are extracted: {"docker", "container", "restarting"}; stopwords like "keeps" are filtered out.
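The keyword step can be sketched roughly as follows. This is an illustration only: the stopword list and the function name extract_keywords are assumptions, since the real list is not shown here.

```python
import re

# Hypothetical stopword list; the real one used by phi-thread is not shown.
STOPWORDS = {"a", "an", "the", "is", "my", "keeps", "how", "to", "why", "in", "on"}


def extract_keywords(query: str) -> set[str]:
    """Lowercase the query, split on non-alphanumerics, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return {t for t in tokens if t not in STOPWORDS}


print(sorted(extract_keywords("docker container keeps restarting")))
```

A set is used because the later ranking step only needs membership and intersection, not word order.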

Layer 1 — Cache check (instant)

Query is hashed: sha256("docker container keeps restarting")[:16] → a3f8b2c1d4e5f6a7

Checks ~/.phi-thread/cache/a3f8b2c1d4e5f6a7.json
If it exists and is < 24 hours old → return cached results immediately. Done in < 1ms.

Cache MISS? → Move to Layer 2...
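A minimal sketch of the cache layer, assuming freshness is judged by the file's modification time (the real tool may store a timestamp inside the JSON instead). The helper names cache_key, read_cache, and write_cache are hypothetical; cache_dir would be ~/.phi-thread/cache in practice.

```python
import hashlib
import json
import time
from pathlib import Path

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL, as described in the text


def cache_key(query: str) -> str:
    """First 16 hex characters of the SHA-256 digest of the query."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()[:16]


def read_cache(query: str, cache_dir: Path):
    """Return cached results, or None on a miss or an expired entry."""
    path = cache_dir / f"{cache_key(query)}.json"
    if not path.exists():
        return None
    # Assumption: entry age is taken from the file's mtime.
    if time.time() - path.stat().st_mtime >= CACHE_TTL_SECONDS:
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def write_cache(query: str, results, cache_dir: Path) -> None:
    """Persist results under {hash}.json, creating the directory if needed."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(query)}.json"
    path.write_text(json.dumps(results), encoding="utf-8")
```

Hashing the query keeps filenames fixed-length and filesystem-safe regardless of what the user typed.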

Layer 2 — Keyword index (the self-building KB)

Checks ~/.phi-thread/index.json — a persistent keyword → answer-coordinate map that grows with every query you ever make.

Looks up each keyword: index["docker"], index["container"], index["restarting"]
Each returns a list of previously-seen answer coordinates with scores. Answers matching more keywords rank higher.

These indexed results are kept as backup candidates. Live search still runs to find fresh answers.
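The lookup could look something like this. The index schema (keyword → list of {"coord", "score"} entries) and the coordinate strings are assumptions made for illustration; the actual layout of index.json is not shown.

```python
from collections import defaultdict


def search_index(index: dict, keywords: set[str]) -> list[tuple[str, int]]:
    """Return (coordinate, matched_keyword_count), best matches first.

    Assumed schema: index[keyword] = [{"coord": str, "score": float}, ...]
    """
    hits = defaultdict(lambda: {"matches": 0, "score": 0.0})
    for kw in keywords:
        for entry in index.get(kw, []):
            hit = hits[entry["coord"]]
            hit["matches"] += 1
            hit["score"] = max(hit["score"], entry["score"])
    # More matched keywords first; stored score breaks ties.
    ranked = sorted(
        hits.items(),
        key=lambda kv: (kv[1]["matches"], kv[1]["score"]),
        reverse=True,
    )
    return [(coord, hit["matches"]) for coord, hit in ranked]


# Hypothetical index contents and coordinate format.
example_index = {
    "docker": [{"coord": "so:38716397", "score": 6.5},
               {"coord": "so:66910240", "score": 5.0}],
    "restarting": [{"coord": "so:38716397", "score": 6.5}],
}
print(search_index(example_index, {"docker", "container", "restarting"}))
```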

Layer 3 — Live search across 5 platforms

Hits real APIs. All use urllib (zero dependencies, stdlib only):


Platform | Search endpoint | Deep-link format
Stack Overflow | api.stackexchange.com/2.3/search/advanced | /questions/{id} → /a/{answer_id}
Reddit | reddit.com/search.json | /comments/{post}/{slug}/{comment}
Hacker News | hn.algolia.com/api/v1/search | /item?id={comment_id}
GitHub | api.github.com/search/issues | /{org}/{repo}/issues/{n}
DuckDuckGo | html.duckduckgo.com/html/ | catches Twitter, Medium, Dev.to, blogs
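As a rough sketch of the stdlib-only approach, the snippet below builds the request URL for each platform and fetches JSON with urllib. The platform keys, query parameters chosen, https/www prefixes, User-Agent string, and function names are assumptions for illustration, not the tool's actual code. DuckDuckGo returns HTML, so fetch_json applies only to the four JSON APIs.

```python
import gzip
import json
import urllib.request
from urllib.parse import urlencode

# Endpoint per platform plus a function producing its query parameters.
ENDPOINTS = {
    "so": ("https://api.stackexchange.com/2.3/search/advanced",
           lambda q, n: {"q": q, "site": "stackoverflow", "order": "desc",
                         "sort": "relevance", "pagesize": n}),
    "reddit": ("https://www.reddit.com/search.json",
               lambda q, n: {"q": q, "limit": n, "sort": "relevance"}),
    "hn": ("https://hn.algolia.com/api/v1/search",
           lambda q, n: {"query": q, "hitsPerPage": n}),
    "github": ("https://api.github.com/search/issues",
               lambda q, n: {"q": q, "per_page": n}),
    "google": ("https://html.duckduckgo.com/html/",
               lambda q, n: {"q": q}),
}


def build_search_url(platform: str, query: str, limit: int = 5) -> str:
    """Compose the full search URL for one platform."""
    base, make_params = ENDPOINTS[platform]
    return f"{base}?{urlencode(make_params(query, limit))}"


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body, handling gzip-compressed replies."""
    req = urllib.request.Request(url, headers={"User-Agent": "phi-thread-sketch/0.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        raw = resp.read()
        # The Stack Exchange API compresses its responses.
        if resp.headers.get("Content-Encoding") == "gzip":
            raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


print(build_search_url("hn", "docker container restarting"))
```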

Thread Math — Deep-link into the answer tree

This is the core insight. We don't just link to the question page. We follow the thread tree and link to the exact answer.


depth 0
Question: "Docker container keeps restarting"
↓ fetch accepted answer via SO API
depth 1 ✓
Accepted answer by user (the actual fix)
stackoverflow.com/a/38716397
↓ could go deeper
depth 2
Comment on answer (clarification)
...#comment-1234567

Stack Overflow: API returns accepted_answer_id → we call /questions/{ids}/answers to get the answer body → link to /a/{answer_id}

Reddit: For each post, we fetch {permalink}.json?limit=1&sort=top → extract top comment → link to /comments/{post}/{slug}/{comment_id}

HN: Algolia returns both stories AND comments. If it's a comment, the objectID IS the direct link: /item?id={comment_id}

This is thread math: every answer has a deterministic coordinate in (platform, thread, answer) space.
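One way to picture "deterministic coordinate" is as a URL template per platform. The sketch below uses the path formats listed above; the host names and the deep_link helper are assumptions added for illustration.

```python
# Path formats come from the text; host names are assumed.
TEMPLATES = {
    "so": "https://stackoverflow.com/a/{answer_id}",
    "reddit": "https://www.reddit.com/comments/{post}/{slug}/{comment_id}",
    "hn": "https://news.ycombinator.com/item?id={comment_id}",
    "github": "https://github.com/{org}/{repo}/issues/{n}",
}


def deep_link(platform: str, **parts) -> str:
    """Turn a (platform, thread, answer) coordinate into a URL."""
    return TEMPLATES[platform].format(**parts)


print(deep_link("so", answer_id=38716397))
```

Because each link is a pure function of the IDs, the same coordinate always yields the same URL, which is what makes caching and indexing by coordinate workable.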

Rank — Platform score × Title relevance

Each platform scores answers differently, then a relevance multiplier ensures matching titles beat popular-but-unrelated results:


Platform | Signal | Score formula
Stack Overflow | highest base score | 5.0 + answers×1.5 + views/5000
Reddit | engagement | votes×0.01 + comments×0.1
Hacker News | points + discussion | points×0.05 + comments×0.1
GitHub | reactions | comments×0.2 + reactions×0.1
Web (DDG) | flat 2.0 | base score = 2.0

Then: final_score = platform_score × title_relevance
Where title_relevance = |query_keywords ∩ title_keywords| / |query_keywords|

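A perfectly matching SO title (relevance 1.0) beats a Reddit post with 500 upvotes but an irrelevant title, which gets a 0.1× penalty: 500 × 0.01 × 0.1 = 0.5, against at least 5.0 for the SO answer. The 0.1× figure implies that title_relevance is floored at 0.1 rather than dropping to 0.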
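The scoring rules can be sketched as below. The formulas are the ones listed above; the 0.1 floor on relevance is inferred from the "0.1× penalty" remark, and the stats dictionary keys and function names are assumptions.

```python
def platform_score(platform: str, stats: dict) -> float:
    """Per-platform base score, using the formulas from the text."""
    if platform == "so":
        return 5.0 + stats.get("answers", 0) * 1.5 + stats.get("views", 0) / 5000
    if platform == "reddit":
        return stats.get("votes", 0) * 0.01 + stats.get("comments", 0) * 0.1
    if platform == "hn":
        return stats.get("points", 0) * 0.05 + stats.get("comments", 0) * 0.1
    if platform == "github":
        return stats.get("comments", 0) * 0.2 + stats.get("reactions", 0) * 0.1
    return 2.0  # web results get a flat base score


def title_relevance(query_kw: set, title_kw: set, floor: float = 0.1) -> float:
    """|query ∩ title| / |query|, with an assumed floor of 0.1."""
    if not query_kw:
        return floor
    return max(floor, len(query_kw & title_kw) / len(query_kw))


def final_score(platform: str, stats: dict, query_kw: set, title_kw: set) -> float:
    return platform_score(platform, stats) * title_relevance(query_kw, title_kw)


query = {"docker", "container", "restarting"}
so = final_score("so", {"answers": 1, "views": 5000}, query,
                 {"docker", "container", "restarting"})
reddit = final_score("reddit", {"votes": 500}, query, {"weekly", "megathread"})
print(so > reddit)
```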

Save — Cache + Index (the KB grows)

Cache: Full results saved to ~/.phi-thread/cache/{hash}.json with 24hr TTL. Same exact query = instant next time.

Index: Every answer gets its keywords extracted. Each keyword maps to the answer's coordinate + score in ~/.phi-thread/index.json.

This means: you search "docker container restarting" today. Tomorrow you search "docker restart loop", and the keyword index finds yesterday's answers through shared keywords such as "docker", even though it's a different query. The KB builds itself.
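A minimal sketch of the indexing side, assuming keywords are taken from the answer title and that each entry stores a coordinate string plus a score. The schema, the stopword list, the "so:38716397" coordinate format, and the helper names are all hypothetical.

```python
import json
import re
from pathlib import Path

# Hypothetical stopword list.
STOPWORDS = {"a", "an", "the", "is", "my", "keeps", "how", "to", "why", "in", "on"}


def keywords(text: str) -> set[str]:
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def index_answer(index: dict, title: str, coord: str, score: float) -> None:
    """Map every keyword in the title to the answer's coordinate and score."""
    for kw in keywords(title):
        entries = index.setdefault(kw, [])
        # Skip coordinates that are already indexed under this keyword.
        if not any(e["coord"] == coord for e in entries):
            entries.append({"coord": coord, "score": score})


def save_index(index: dict, path: Path) -> None:
    """Write the whole index back to disk as JSON."""
    path.write_text(json.dumps(index, indent=2), encoding="utf-8")


index: dict = {}
index_answer(index, "Docker container keeps restarting", "so:38716397", 6.5)
# A different query still reaches the stored answer through a shared keyword.
shared = keywords("docker restart loop") & index.keys()
print(sorted(shared))
```

Note that without stemming, "restart" and "restarting" count as different keywords, so the overlap here comes from "docker" alone.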

Return — Pretty-printed answers with deep-links

$ phi-thread "docker container keeps restarting" -n 5

phi-thread | docker container keeps restarting
12.3s | 5 answers

1. Docker container keeps restarting [accepted answer]
[SO] docker
"Note: following issue 11008 and PR 15348, you would avoid the issue with: sudo docker..."
https://stackoverflow.com/a/38716397 ← exact answer, not question page

2. Mysql docker container keeps restarting [accepted answer]
[SO] mysql, docker, docker-compose
https://stackoverflow.com/a/66910240

3. PSA: Check Your Docker Memory Usage [top answer]
[Reddit] r/selfhosted | top comment
https://reddit.com/.../comments/.../mkj0mru ← exact comment

4. 10 Best Practices for Docker Containers
[Medium] via DuckDuckGo
https://medium.com/... ← caught by web search

02 Three-layer search: how the KB builds itself

First query = slow (live search). Same query = instant (cache). Similar query = fast (index).

Layer 1: Exact Cache

< 1ms

SHA256 hash of normalized query → JSON file.
~/.phi-thread/cache/{hash}.json
TTL: 24 hours. Same exact question = instant return.

Layer 2: Keyword Index

~5ms

Persistent map: keyword → [answer coordinates].
~/.phi-thread/index.json
Lives forever. "docker restart" finds answers from "docker container restarting" because they share keywords. This is the self-building KB.

Layer 3: Live Search

5-15s

Hits real APIs: SO, Reddit, HN, GitHub, DuckDuckGo.
Fetches answers, follows thread trees for deep-links.
Results are cached (Layer 1) and indexed (Layer 2) for the future.
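Put together, the three layers might be orchestrated along these lines. This is a sketch under assumptions: the layers are passed in as plain callables, answers are dictionaries with "url" and "score" keys, and the merge rule (live results first, indexed results as backup, deduplicated by URL) is one plausible reading of the description rather than the tool's confirmed behavior.

```python
from typing import Callable, Optional


def three_layer_search(
    query: str,
    cache_get: Callable[[str], Optional[list]],
    index_lookup: Callable[[str], list],
    live_search: Callable[[str], list],
    save: Callable[[str, list], None],
    limit: int = 5,
) -> list:
    # Layer 1: an exact cache hit returns immediately.
    cached = cache_get(query)
    if cached is not None:
        return cached

    # Layer 2: indexed answers are kept as backup candidates.
    backup = index_lookup(query)

    # Layer 3: live search still runs to find fresh answers.
    live = live_search(query)

    # Merge, preferring the live copy when both layers return the same URL.
    merged, seen = [], set()
    for answer in live + backup:
        if answer["url"] not in seen:
            seen.add(answer["url"])
            merged.append(answer)
    merged.sort(key=lambda a: a["score"], reverse=True)

    results = merged[:limit]
    save(query, results)  # feeds Layer 1 (and Layer 2) for next time
    return results
```

Injecting the layers as callables keeps the control flow testable without touching the network or the filesystem.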

03 Thread coordinates — the math

Every answer on every platform has a permanent, deterministic address.

Platform | Coordinate format | What phi-thread does
Stack Overflow | /a/{answer_id} | Fetches accepted_answer_id from question → builds deep-link to exact answer
Reddit | /comments/{post}/{slug}/{comment_id} | Fetches post JSON with ?sort=top&limit=1 → extracts top comment ID → builds permalink
Hacker News | /item?id={objectID} | Algolia returns stories AND comments. Comments have story_id; the objectID IS the deep-link
GitHub | /{org}/{repo}/issues/{n} | GitHub search API → direct issue URL
Slack | /archives/{ch}/p{ts}?thread_ts={parent} | The original insight: message timestamp IS the ID. This is where it all started.
Twitter/X | /{user}/status/{snowflake} | Snowflake IDs embed timestamp (Unix ms): ts = (id >> 22) + 1288834974657. Caught via DuckDuckGo.
Discord | /channels/{server}/{ch}/{snowflake} | Same snowflake math: ts = (id >> 22) + 1420070400000
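The snowflake arithmetic in the table can be checked with a few lines of Python. The epoch constants are the ones given above (milliseconds); the function names are hypothetical.

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657
DISCORD_EPOCH_MS = 1420070400000


def snowflake_ms(snowflake: int, epoch_ms: int) -> int:
    """Unix time in milliseconds: ts = (id >> 22) + epoch."""
    return (snowflake >> 22) + epoch_ms


def snowflake_to_datetime(snowflake: int, epoch_ms: int) -> datetime:
    """Same value as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(snowflake_ms(snowflake, epoch_ms) / 1000, tz=timezone.utc)


# A synthetic ID: 1000 ms after the Discord epoch, low 22 bits arbitrary.
synthetic_id = (1000 << 22) | 5
print(snowflake_to_datetime(synthetic_id, DISCORD_EPOCH_MS).isoformat())
```

The right shift discards the low 22 bits (worker and sequence fields), leaving only the millisecond offset from the platform's epoch.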

The spacetime analogy

An answer is an event in information spacetime. Thread coordinates are its (platform, thread, time) position.

General Relativity: Event = point in spacetime (x, y, z, t)
Thread Math: Answer = point in (platform, thread_id, timestamp)

GR: Light cone = causal future/past
Threads: Reply timestamp must be > parent timestamp

GR: Gravity curves spacetime → affects paths
Threads: Trust/score curves routing → affects which answer you see

GR: Coordinate system = reference frame
Threads: Platform = reference frame (each has its own ID scheme)

04 Architecture — zero dependencies

Entire thing is stdlib Python. urllib + json. Installs in 1 second.

phi_thread/
├── __init__.py                          ← version (0.2.0)
├── search.py                            ← core engine (689 lines)
│   ├── Answer                           ← dataclass: platform, title, url, score, coordinate
│   └── ThreadSearch                     ← search engine + cache + index
│       ├── search()                     ← 3-layer: cache → index → live
│       ├── _search_stackoverflow()      ← SO API + fetch accepted answers
│       ├── _fetch_so_answers()          ← DEEP: question → accepted answer body
│       ├── _search_reddit()             ← Reddit JSON + fetch top comment
│       ├── _fetch_reddit_top_comment()  ← DEEP: post → top comment
│       ├── _search_hn()                 ← Algolia (stories + comments)
│       ├── _search_github()             ← GitHub Issues search
│       ├── _search_google()             ← DuckDuckGo HTML (catches everything else)
│       ├── _index_answer()              ← INDEX: add to keyword KB
│       └── _search_index()              ← INDEX: search keyword KB
└── cli.py                               ← terminal UI with colors + depth markers

~/.phi-thread/                           ← persistent state
├── cache/                               ← exact query cache (24hr TTL)
│   └── {sha256}.json
└── index.json                           ← keyword → coordinates (forever, grows with use)