MIT License
Our idea is to combine the power of computer vision models and LLMs. We use YOLO, CLIP and DINOv2 to extract high-level features from images. Then we use natural language as an interface for the user. We pass the prompt, along with the extracted features, to an LLM, allowing for advanced image dataset queries.
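To make the flow above more concrete, here is a minimal sketch of the last step: combining a user's question with already-extracted features into a single text prompt. It is an illustration under assumptions, not the project's actual code. The `ImageFeatures` record, its field names, the `build_prompt` helper and the prompt wording are all hypothetical, and the feature values are hard-coded stand-ins for what YOLO, CLIP or DINOv2 might produce. No model or LLM is called here.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ImageFeatures:
    """Hypothetical per-image record; field names are illustrative only."""
    image_id: str
    # Stand-in for class labels that a detector such as YOLO might return.
    detections: list[str] = field(default_factory=list)
    # Stand-in for text labels that a model such as CLIP might score highly.
    tags: list[str] = field(default_factory=list)


def build_prompt(question: str, features: list[ImageFeatures]) -> str:
    """Serialize the features as JSON and append the user's question."""
    context = json.dumps([asdict(f) for f in features], indent=2)
    return (
        "You are answering questions about an image dataset.\n"
        "Extracted features (JSON):\n"
        f"{context}\n"
        f"Question: {question}\n"
    )


# Hard-coded example values; in practice these would come from the vision models.
features = [
    ImageFeatures("img_001.jpg", detections=["dog", "person"], tags=["park"]),
    ImageFeatures("img_002.jpg", detections=["car"], tags=["street"]),
]
prompt = build_prompt("Which images contain a dog?", features)
print(prompt)
```

The resulting string could then be sent to whichever LLM you use. Serializing features as JSON is just one possible choice; it keeps the context easy for both the model and a human to read.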
Keep in mind that the project is still under construction. We will keep publishing updates and new features. Follow me on Twitter to stay up to date.
Do you have a project idea and would like to use our data? No problem! Our data, just like our codebase, is open source. Take a peek at our Roboflow project to access images and annotations.
Do you want to help me?