ResearchWebGraph
Reading academic papers is slow. Finding the right ones is slower. Building a mental model of how ideas connect across a research area takes weeks. I built ResearchWebGraph to compress that process without losing the depth.
The tool searches arXiv, downloads papers, extracts entities and relationships from each one, and builds a knowledge graph that grows with every paper added. When you ask a question, it retrieves relevant passages through vector search in Qdrant and traverses the graph for connected entities. The combined context goes to an LLM that generates an answer with citations back to specific papers and passages.
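The retrieval step can be sketched with in-memory stand-ins for the two stores. Here a plain dict plays the role of Qdrant's passage index and an adjacency dict plays the role of the knowledge graph; the passage texts, entity names, and the `retrieve` helper are all hypothetical, chosen only to illustrate how vector hits and one-hop graph neighbours are combined into a single context.

```python
# Hypothetical stand-ins for the Qdrant passage store and the knowledge graph.
PASSAGES = {
    "p1": "LoRA fine-tunes large models by injecting low-rank adapters.",
    "p2": "QLoRA extends LoRA with 4-bit quantisation of the base model.",
}
GRAPH = {  # entity -> directly connected entities
    "LoRA": ["QLoRA", "low-rank adaptation"],
    "QLoRA": ["LoRA", "4-bit quantisation"],
}

def retrieve(question_entities, vector_hits):
    """Combine vector-search hits with one-hop graph neighbours.

    vector_hits: passage ids returned by similarity search.
    question_entities: entities recognised in the question.
    Returns (passage texts, connected entities) to hand to the LLM.
    """
    context = [PASSAGES[h] for h in vector_hits]
    connected = set()
    for entity in question_entities:
        connected.update(GRAPH.get(entity, []))
    return context, sorted(connected)
```

In the real system the `PASSAGES` lookup would be a Qdrant similarity query and `GRAPH` a traversal over the extracted triples; the shape of the combined context is the point here.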
Graph construction is the interesting part. Named entity recognition pulls out concepts, methods, datasets, and results. Relationship extraction identifies how they connect: method A outperforms B on dataset C, concept X extends Y. These triples form the graph’s nodes and edges. The graph is not just a visualisation. It is a structured index that the question-answering system queries before generating any response.
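The triple-to-graph step is simple enough to show directly. This is a minimal sketch, not the project's actual code: the triples are invented examples echoing the relations above, and the adjacency-index representation is an assumption about how such a graph might be stored.

```python
from collections import defaultdict

def build_graph(triples):
    """Index (subject, relation, object) triples as an adjacency map.

    Nodes are entity strings; each edge keeps its relation label so the
    QA system can query by relation type as well as by neighbour.
    """
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph

# Invented triples of the kind relationship extraction would emit.
triples = [
    ("method A", "outperforms", "method B"),
    ("concept X", "extends", "concept Y"),
]
graph = build_graph(triples)
```

Because the graph is an index rather than a picture, queries like "what extends concept Y?" are lookups over these labelled edges, which is what lets the QA system consult the graph before generating anything.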
If the system cannot find supporting evidence in the graph, it says so. A research tool that fabricates claims is worse than no tool at all. Every assertion in an answer links back to a source.
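That refusal behaviour reduces to a check before generation: a claim without a source in the evidence index is flagged rather than asserted. The function name, the index shape, and the output format below are all illustrative assumptions, not the project's API.

```python
def answer_with_citations(claims, evidence_index):
    """Emit each claim only if a source backs it; otherwise say so.

    evidence_index: claim -> source identifier (a hypothetical mapping
    from graph/vector retrieval back to papers and passages).
    """
    lines = []
    for claim in claims:
        source = evidence_index.get(claim)
        if source:
            lines.append(f"{claim} [{source}]")
        else:
            lines.append(f"No supporting evidence found for: {claim}")
    return lines
```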
The backend is FastAPI, the frontend is Streamlit, and the whole thing runs locally with Docker Compose. I built it because I needed it for my own literature reviews. Instead of reading linearly and holding connections in my head, I build a graph incrementally and query it as questions arise. The graph becomes an externalised version of the mental model that forms during deep reading.
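A local stack like that typically reduces to three Compose services. This is a sketch of what such a file might look like, not the repository's actual configuration: the directory layout, service names, and build contexts are assumptions (the ports are the usual defaults for Qdrant, uvicorn, and Streamlit).

```yaml
services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"
  api:
    build: ./backend        # FastAPI app; layout is assumed
    ports:
      - "8000:8000"
    depends_on:
      - qdrant
  ui:
    build: ./frontend       # Streamlit app; layout is assumed
    ports:
      - "8501:8501"
    depends_on:
      - api
```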
The code is open source. It has clear limitations: entity extraction misses nuance, and the graph gets noisy when loosely related papers are added. But as a starting point for structured literature exploration, it does the job.