**GPT Vision Models: The Missing Ingredient in My Master's Project.** *Exploring how SOTA Transformer Vision models compare to pre-Transformer Vision architectures*
![[Pasted image 20240110220119.png]]
In 2022, for my [[🥕 Master's Project (PK1.0)]], I created a computer vision fridge to help people use more food they already have and waste less. Over 9 months, this is involved:
- Taking 2740 photos of 12 food items in 16 sub-classes and labelling all of them with bounding boxes (which took 2 days).
- Fine-tuning EfficientDet0-Lite on my custom dataset with corresponding metadata
- Integrating a database, web app, recipe recommendations and expiry date tracking.
It was a lot of work and I completed the project in June 2022.
At the time, no existing model **_generalised_** well enough to recognise numerous food items consistently, so I made my own.
However, there was always one major design flaw: **scalability**. Pocket Kitchen needs to recognise apples now too? Ok, take 200+ photos, draw labelled bounding boxes and then retrain the model before compiling it for TPU. Not ideal.
Then in March 2023, OpenAI released the image above demoing GTP4 Vision!
**Nearly my entire Master’s project one prompt!** Pocket Kitchen just became genuinely viable and way easier to implement, so I began to design and build Pocket Kitchen v2.0...
>[!note] Rabbit r1 Keynote Demo
>On Jan 9th 2024, Rabbit released a [keynote](https://youtu.be/22wlLy7hKP4?si=SGCBbDsm7JUVLC1-&t=1064) to demo their AI assistant device, r1. From the video, it's clear smart food usage is still a hot topic (and yet to be thoroughly solved!):
>
>![[Pasted image 20240214174932.png]]
>
>It bothers me how unrealistic this demonstration is for a 'real-world' use-case; for one, last time I checked people don't store eggs loose on the shelf in a single file row. Good design engineering means designing for users! ðŸ˜
### Building v2.0
I’ll summarise the differences between the old and new Pocket Kitchen here:
**Old Pocket Kitchen**
- Object detection with TensorFlow Object Detection on Coral TPU
- Birds eye view facilitates tracking bounding boxes to identify whether food items are coming in / out.
**New PK!**
- GPT4V
- No box tracking, just inference on a single image of the fridge.
#### PK2.0 Plan
Here is a flow chart of my plan:
![[Pasted image 20240124170345.png]]
For cash and compute efficiency, I will only make an OpenAI API call when the user asks what is in the fridge. Also, I don't want to lose time building / hosting / connecting to a UI, so I'm using Telegram for my MVP.
#### Initial GPT-V Tests
```
"You are an expert in identifying items of food from images. List the food items in this image. Give responses only in the format 'Food item: [food item]'"
```
Looking good...
![[IMG_5995.png|350]]
![[IMG_5996.png |350]]
#### Building the MVP
The minimum viable product is complete - check out the images below!
![[IMG_6147.png|350]]
![[Pasted image 20240122170317.png | 450]]
#### Getting a Better View
Mounting above the fridge doesn't give an adequate view in, so I decided to mount the camera to the fridge door instead. While I could have intricately designed something to position the camera at the perfect angle, I've opted to taping a £3 tripod to the fridge door for now!
I also switched to GPT4o to get results faster.
![[IMG_6408.png]]
### Deciding on a Form Factor
Having previously made an app for Pocket Kitchen, I'm keen to explore new form factors to deliver the menu of fridge contents. The most recent delivery format I've tried is email - I think there are enough apps in the world.
It would be quite good to receive an email each day before you leave work, so you know what you have *before* hitting the shops on the way home!
![[IMG_BC79D99CAC39-1.jpeg | 400]]
### Next Steps
I'm looking to decide on a form factor. Then, I'll fully automate the end-to-end system, from taking photographs on opening the fridge door to sending the contents to the user!
With thanks to my supervisor, Dr David Boyle, for his guidance and support on v1.0 : [[🥕 Master's Project (PK1.0)]]
*I'm still working on this project - check in soon / follow me on Twitter @mireyburn for updates.*
---
[![[Pasted image 20240110213631.png | 100]]](<https://github.com/myPocketKitchen/PK2.0>)