💡 Main Idea

We have PDFs of graphs and tiles (dashboard elements) that change once every 24 hours (200-300 images). We need a solution for **text-to-image** similarity matching: retrieve the images most relevant to a user query, then pass them along with the query to a **VLM** so that it can (hopefully) understand them and answer the query.
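The retrieval step described above can be sketched as follows. This is a minimal, model-agnostic sketch. It assumes the image embeddings and the query embedding have already been produced by one of the models compared below (e.g. jina-clip-v1) and share the same vector space. The function names `cosine_similarity` and `top_k_images`, the image ids and the toy vectors are hypothetical. The sketch ranks the current day's images by cosine similarity to the query and returns the best matches. Those matches would then be handed to the VLM together with the query.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def top_k_images(
    query_embedding: Sequence[float],
    image_index: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k image ids most similar to the query, best first.

    image_index (hypothetical) maps an image id, e.g. a tile filename,
    to its precomputed embedding; it would be rebuilt once every 24 hours
    when the dashboard images change.
    """
    scored = [
        (image_id, cosine_similarity(query_embedding, emb))
        for image_id, emb in image_index.items()
    ]
    # Highest similarity first; ties broken by id for deterministic output.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-d vectors standing in for real model embeddings (hypothetical).
    index = {
        "revenue_chart.png": [0.9, 0.1, 0.0],
        "latency_tile.png": [0.1, 0.9, 0.1],
        "error_rate_tile.png": [0.0, 0.2, 0.95],
    }
    query = [0.8, 0.2, 0.0]  # stands in for the embedding of a text query
    for image_id, score in top_k_images(query, index, k=2):
        print(f"{image_id}: {score:.3f}")
```

In this setup the image embeddings only need to be recomputed once per daily refresh and can be cached. Only the query is embedded at request time. With 200-300 images, a brute-force scan like this is cheap, so a dedicated vector index is not strictly required.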

🤖 Models found

Pros and cons comparison

| | qwen + jina-clip | ColPali by vidore | Visualized-BGE by BAAI | ImageBind by Meta AI |
|---|---|---|---|---|
| Memory requirement | 223M-parameter model; runs on CPU, but it is older and only suitable for short text-matching tasks (CLIP-like architecture) | 3B model; needs > 40 GB RAM when run on CPU | < 4 GB RAM; model is about 300 MB | 10 GB RAM |
| Performance | Comparable to ImageBind | Almost perfect | Slightly worse and inconsistent (sometimes works, sometimes doesn't) | Works most of the time; more reliable than Visualized-BGE |
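The memory row can be sanity-checked with simple arithmetic. Model weights alone take roughly (parameter count) × (bytes per parameter). The sketch below uses only the parameter counts quoted in the table (223M and 3B). It assumes fp32 weights (4 bytes per parameter) as a typical CPU default, with bf16 (2 bytes) shown for comparison. The helper name `weight_memory_gb` is hypothetical. The result is a lower bound: activations, image-patch embeddings and framework overhead come on top. This is why the observed > 40 GB for ColPali on CPU is well above the roughly 12 GB its fp32 weights occupy.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 4) -> float:
    """Approximate memory taken by model weights alone, in decimal GB.

    bytes_per_param: 4 for fp32, 2 for fp16/bf16, 1 for int8.
    Activations and runtime overhead are NOT included (lower bound only).
    """
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    # Parameter counts taken from the comparison table.
    for name, params in [("jina-clip (223M)", 223e6), ("ColPali (3B)", 3e9)]:
        fp32 = weight_memory_gb(params, 4)
        bf16 = weight_memory_gb(params, 2)
        print(f"{name}: ~{fp32:.1f} GB fp32, ~{bf16:.1f} GB bf16")
```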

🧑‍💻 Code for the approaches below

https://culinda-my.sharepoint.com/:u:/p/somesh/EXIRYlSZ8f9Is2zg33X2RLcBB90QG1IX35drj54EGEGaKw?e=5c7aI3

✅ Solution approaches

🍯 Approach 1: `jina-clip-v1`