How to turn diagrams and images into text to include in LLM datasets?

We’re looking for input on how best to handle images and diagrams (.pdf, .jpg, .svg, etc.) and convert them into text for the datasets of custom LLMs.

We’ve got a working strategy for creating our small open source LLM model - the first one will be an “expert” on all things Neon AI. While discussing the challenge of getting information from diagrams and images into the dataset, we realized that if anyone has good tools for this, it would be @timonvanhasselt and others working on screen readers and systems for blind or partially sighted users, since they would know the best available open source resources.
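As a starting point for discussion, here is a minimal sketch of one possible open source approach (an illustration of the idea, not our settled process): OCR with Tesseract (via pytesseract) to pull any text embedded in a diagram, combined with an image-captioning model such as BLIP (via Hugging Face transformers) to generate a natural-language description. The file name below is hypothetical.

```python
# Minimal sketch: OCR for embedded text plus a captioning model for a
# description of the image itself. Assumes pytesseract, Pillow,
# transformers, and torch are installed, and the Tesseract binary is
# on PATH. .pdf and .svg inputs would need a rasterizing step first
# (e.g., pdf2image or cairosvg).
from PIL import Image
import pytesseract
from transformers import pipeline

def diagram_to_text(path: str) -> str:
    image = Image.open(path).convert("RGB")

    # Pull any text embedded in the diagram (labels, legends, etc.).
    ocr_text = pytesseract.image_to_string(image).strip()

    # Generate a short natural-language description of the image.
    captioner = pipeline("image-to-text",
                         model="Salesforce/blip-image-captioning-base")
    caption = captioner(image)[0]["generated_text"]

    return f"Description: {caption}\nEmbedded text: {ocr_text}"

# Hypothetical file name, for illustration only.
print(diagram_to_text("neon_ai_architecture.png"))
```

For complex architecture diagrams, a hand-written description may still beat anything automated, which is part of why we’re asking.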

Please share any suggestions you have; we look forward to sharing our process and results with the community!