(Work in progress)
Note: I am not very good at English. This README.md was proofread by ChatGPT.
Showrooms can be costly, but their ROI is unclear. Therefore, companies generally aim to achieve semi-automation using Multimodal AI.
The Virtual Showroom is highly cost-effective. It only requires a 240-degree panoramic screen to provide VR experiences without the need for headsets. I aspire to create a "real" virtual showroom at work someday, but for now, this project is focused on developing a "virtual" virtual showroom using Unity, either as an AR app or a console app, as part of my hobby project.
https://github.com/user-attachments/assets/a55178b7-909f-4702-ad78-8f9438cf234d
Although the video showcases Japanese chat, ChatGPT is multilingual.
I visited NISSAN GALLERY at Nissan global headquarters in Yokohama in May 2024. This was for a driving experience in harsh environments.
My original design for a virtual showroom.
The 240-degree panoramic screen in this project also supports perspective drawing, similar to the driving experience at the NISSAN GALLERY mentioned above.
The Flask-based API server will run on a PC or Mac. It can also be deployed on a virtual machine in the cloud.
This project utilizes OpenAI's "gpt-4o-mini" with Multimodal RAG (text and image).
(Work in progress)
This project currently has two secenes:
More scenes will be added in the future.
=> MODELS.md
=> SHOWROOM.md
The API server runs on Raspberry Pi.
Run the "app.py" that is in my another repo: Compact RAG
I was impressed by the demo video for Figure 01. If such a robot were to exist, human guides in showrooms might no longer be necessary.
I modified the robot included in Unity's Starter Assets to experiment with what Generative AI can achieve.
A few years ago, I developed several image recognition and object detection apps using TensorFlow.
This time, Im utilizing OpenAI's multimodal AI for image recognition and object detection.
Image Recognition test (gpt-4o-mini)
After trying various experiments, I found that an image size of at least 512px x 512px seems to work best.
Received: {"answer":"I see an image of a cat grooming itself.
The cat appears to be licking its paw while resting on a red surface.",
"query":"What can you see in this image?"}
Object Recognition test (gpt-4o-mini)
(Coming soon)