Preparing PDFs for RAGs | About Data Science
I created a graph store from a bunch of annual reports (and tables)

Converting PDFs to text has always been possible, but never easy.
I recently created a graph data store for use in RAG. In other words, I built a GraphRAG.
Graph RAGs are a good alternative to other RAG applications, such as the widely used RAGs backed by a vector store. They bring reasoning to the table. For example, with a semantic similarity search (the process a vector store uses to find information), you could ask who the CFO of XYZ, Inc. was last year, because last year's report of XYZ, Inc. will clearly state its CFO. But consider a question like this: which two directors of XYZ, Inc. went to the same school? No single passage in the report states that answer, so a similarity search cannot retrieve the relevant information unless the question names the school. A graph RAG can, because it can follow the links from each director to the schools they attended and compare them.
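To make the two-director question concrete, here is a minimal sketch of the kind of lookup a graph makes possible. It is not the store I built: the triples, the people, the schools, the relation names `DIRECTOR_OF` and `ATTENDED`, and the helper `directors_sharing_school` are all invented for illustration, and a real graph database would express the same idea as a query rather than a Python loop.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (subject, relation, object) triples, as they might be
# extracted from an annual report. All names are made up.
TRIPLES = [
    ("Alice Ng", "DIRECTOR_OF", "XYZ, Inc."),
    ("Bob Roy", "DIRECTOR_OF", "XYZ, Inc."),
    ("Carol Diaz", "DIRECTOR_OF", "XYZ, Inc."),
    ("Alice Ng", "ATTENDED", "Northfield University"),
    ("Bob Roy", "ATTENDED", "Lakeside College"),
    ("Carol Diaz", "ATTENDED", "Northfield University"),
]


def directors_sharing_school(triples, company):
    """Return (director_a, director_b, school) for each pair of the
    company's directors who attended the same school."""
    # Hop 1: find everyone linked to the company as a director.
    directors = {s for s, r, o in triples if r == "DIRECTOR_OF" and o == company}

    # Hop 2: group those directors by the schools they attended.
    by_school = defaultdict(set)
    for s, r, o in triples:
        if r == "ATTENDED" and s in directors:
            by_school[o].add(s)

    # Any school with two or more directors yields matching pairs.
    pairs = []
    for school, people in sorted(by_school.items()):
        for a, b in combinations(sorted(people), 2):
            pairs.append((a, b, school))
    return pairs


print(directors_sharing_school(TRIPLES, "XYZ, Inc."))
```

Note that the question never mentions a school. The answer comes from joining two relations through shared nodes, which is exactly the step a pure similarity search has no way to perform.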
However, the main problem is how to construct the graph for retrieval. I addressed this issue in a separate post recently. If we take another step back, how do we prepare for the year…