***Investigating bias in Stable Diffusion models***

The motivation for this project came from playing with the [SDXL Hugging Face Space](https://huggingface.co/spaces/google/sdxl). I'm working on this project with [Amber Rignell](https://github.com/amberrignell).

The following results from the prompt "A portrait photo of a technical person" are a classic example of the *"garbage in, garbage out"* training data problem in generative AI. They're a reflection of the data Stable Diffusion (SD) is trained on, and of the CLIP embedding model too.

![[Pasted image 20240112141938.png]]

### How is bias being handled now?

I'm not exactly happy with the results (in case you hadn't noticed, they all appear to be men). From using some image generation models, I believe the following methods are being used to tackle this bias:

- After receiving a prompt, an **LLM specifies a demographic** for any person in the prompt. It's a bit of a plaster, and while it works as a fix, it seems wild that this might be the best we can do right now.
- **Diversifying training data** - an obvious one, but also an enormous amount of work considering SD is trained on hundreds of millions of images, at least!
- A quick search turns up the word **monitoring** a lot, which suggests a lot of observation and maybe not a lot of doing.

And so I came to a hypothesis:

> [!Question] Hypothesis
> **By changing only the weights / activations within the model, might it be possible to improve a model's performance on bias?**
>
> Some prompts perform better than others on bias and diversity. Might it be possible to identify which weights are responsible, isolate these for a prompt that performs better, and graft them into a generation that performs badly?

It's a big question in so many ways. The following experiments mostly target gender bias as a starting point. However, if successful, I aim to generalise to more use cases in future. Let the experiments begin!

### What happens if you alter tensors output from the U-Net?

> [!success] Deterministic Experiments
> From this point onwards, generations within experiments are conducted with the same seed.

This is the original generated image for the prompt *"a photograph of an astronaut riding a horse"*:

![[Pasted image 20240112145659.png]]

I wanted to start by seeing what happens to the result if you alter some of the activations from the U-Net.

#### Randomising tensor values

Randomising 0.1% (about 33 out of 32,768) of the tensor values output from the U-Net altered the result to the image below.

![[Pasted image 20240112145952.png]]

Randomising 1% (about 328 out of 32,768) of the tensor values output from the U-Net altered the result to the image below. Not so pretty!

![[Pasted image 20240112150157.png]]

> [!note] A note on NSFW content flags
> **When increasing the proportion of randomised tensor values to 10%, the result is flagged automatically as NSFW content.**
>
> SD must be able to detect my tampering. This gives us some clues as to how NSFW prevention might be done - perhaps by measuring deviation from a group of known safe tensors? - which is pretty interesting.

### Can similarities between images be identified from latent space tensor values from the U-Net?

#### PCA and Correlation Matrices

Printing a correlation matrix for generations shows no clear associations between the latent tensor values across related prompts. The subjects are "driving license photo" or "mugshot photo" (MS) of a white male (M) or female (F) subject with brown, blond or ginger hair, brown eyes, aged in their 40s.
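For context, here is roughly how that comparison can be set up - a minimal sketch, not my exact notebook code. It assumes the diffusers SDXL pipeline, uses `output_type="latent"` to skip VAE decoding, and the prompts, seed and checkpoint are illustrative.

```python
import numpy as np
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

def final_latents(prompt: str, seed: int = 42) -> np.ndarray:
    """Run the pipeline with a fixed seed and return the final latent tensor, flattened."""
    generator = torch.Generator("cuda").manual_seed(seed)
    out = pipe(prompt, generator=generator, output_type="latent")  # skip VAE decoding
    return out.images[0].flatten().float().cpu().numpy()

# Illustrative prompts - the real set also varies hair colour
prompts = [
    "a driving license photo of a white man in his 40s, brown hair, brown eyes",
    "a driving license photo of a white woman in her 40s, brown hair, brown eyes",
    "a mugshot photo of a white man in his 40s, brown hair, brown eyes",
    "a mugshot photo of a white woman in her 40s, brown hair, brown eyes",
]

latents = np.stack([final_latents(p) for p in prompts])

# Pearson correlation between every pair of flattened latents (rows = prompts)
corr = np.corrcoef(latents)
print(np.round(corr, 2))
```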
![[Pasted image 20240112151959.png]]

Similar experiments, including those with Principal Component Analysis, drew similar results. This makes sense: an individual pixel (or latent value) is not going to correspond neatly to the equivalent position in a different image - any shift left or right at all would make the images look entirely unrelated. However, I was expecting some similarity between the backgrounds of the driving license and mugshot images respectively, which doesn't seem to be the case.

> [!info] Sam Altman on What Now with Trevor Noah
> Trevor Noah mentions a study that showed image classification models were more accurately detecting **"makeup" and "no makeup"** rather than **"female" or "male"**, and that **these models generally performed worse on black subjects**. This is something to consider going forward.

#### K-Means Clustering

PCA is not working well for this type of data, so we need to do some multi-dimensional clustering on our latents. This is a bit of a project in itself, as it will perform better if we give it more data.

> [!Question] Hypothesis
> Could a classifier be trained to identify male or female subjects from the tensor embeddings?

##### Problems I'm already anticipating

I'm aware that the bias could all come from the CLIP embeddings, so if I continue down this path, I will need to run experiments on those as well. Right now, I'm still looking at latents output by the U-Net.

Also, while this helps me identify what the latents are ultimately going to look like when decoded, it doesn't really help me figure out which precise features of a latent are responsible. If you've read [[🧅 The Onion]], you might remember that getting models to explain their reasoning is a major challenge in ML.

##### Results

Early experiments are kind of promising...

![[Pasted image 20240112154122.png]]

...but I'm thinking of changing tactics altogether!

### LoRA - Does it suggest my hypothesis will work?

A friend just sent me this Nov 2023 paper, [Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models](https://arxiv.org/pdf/2311.12092.pdf), which shows the team developing sliders to alter features of subjects in Stable Diffusion XL image generations.

These sliders are powered by LoRAs - **Lo**w-**R**ank **A**daptations of image generation models that make them specifically 'talented' at a certain task. LoRAs exist to generate better images of particular objects or styles (cars, fingers, anime and much more) at a fraction of the cost of full fine-tuning. This is because the original **weights are frozen** and only a small set of low-rank matrices is trained to steer generations towards the target style. **However, this doesn't exactly prove my hypothesis will work, because LoRAs *add* new low-rank layers alongside the frozen weights rather than only altering the existing ones.**
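To make that concrete, here's a minimal PyTorch sketch of the LoRA idea (my own illustration, not the Concept Sliders implementation): the pretrained weight is frozen, and only two small low-rank matrices are trained, with their product added to the original output. The class name, rank and layer size are all illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the original weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)

        # Low-rank matrices: only these are trained.
        self.lora_down = nn.Linear(base.in_features, rank, bias=False)
        self.lora_up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_up.weight)  # start as a no-op: output == base output
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_up(self.lora_down(x))

# Example: wrap a stand-in for one of the U-Net's attention projection layers
base = nn.Linear(320, 320)
layer = LoRALinear(base, rank=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable, "trainable parameters vs", base.weight.numel(), "frozen")
```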
I'm a bit embarrassed I didn't think of LoRAs sooner, but then again, there are only so many thoughts one person can come up with. #hire-me

This paper is **[open-source](https://sliders.baulab.info/)**! Thanks, Gandikota et al.!

I think I will come back to concept sliders a bit later - while it's useful, it won't directly address my original hypothesis. First, let's simplify the problem.

## Making a Mini Diffusion model

After speaking to my supervisor at [[Machine Learning Institute]], I've made a high-level plan:

1. Make / find a mini diffusion model.
2. Intentionally bias it with an unbalanced synthetic dataset. (This will stand in for the latest diffusion models but will be more manageable to work with.)
3. Run statistical methods to identify the weights/activations in the model responsible for the emergence of gender in the output image.
4. Freeze / tweak those weights and rerun the model on a new prompt to see if it influences gender bias on other prompts.
5. Transfer learnings to larger models.

### Building a synthetic dataset

Here's our plan for creating the dataset of images that will bias our model:

1. Connect to a GPU runtime.
2. Run SDXL over a list of intentionally unbalanced prompts.
3. Save each image along with its corresponding metadata.

> [!question] Why not use SDXL Turbo? It's so much faster!
> SDXL Turbo can generate an image in roughly a second, but the faces look like this:
> ![[Pasted image 20240116191525.png]]

The [model card](https://huggingface.co/stabilityai/sdxl-turbo) also contains the following caveat:

> [!quote] Limitations
> The generated images are of a fixed resolution (512x512 pix), and the model **does not achieve perfect photorealism.**
> The model cannot render legible text.
> **Faces and people in general may not be generated properly.**
> The autoencoding part of the model is lossy.

I'll connect to some beefy GPUs and use SDXL instead. :)

#### Initial Tests

From initial tests, things I need to consider when prompting:

- Smiling / not smiling
- Colour of clothing
- Clothing
- Background
- Makeup / no makeup

Our tests are getting there and we'll be ready to build our full dataset soon. We're trying to remove noise by keeping the background consistent, which appears to work well. But as you can see, there are still some prompt engineering issues we're working on:

- Multiple subjects
- Subjects looking away from the camera
- Some female shots are 'magazine-style' modelling shots instead of a simple full-body shot

![[download-4.png]]

> [!warning]
> I realise this is a contentious project and that it really could be affected by the eye of the beholder, which is me. I believe this project is too important to ignore, so I'm treading carefully and informing myself as I go along. Please get in touch if you have concerns about the integrity of my work.

## Next Steps

- Generate the full dataset and ensure it is intentionally biased.
- Create a mini diffusion model.
- Train the model on the synthetic dataset.

*I'm still working on this project - check in again soon for updates.*

---

Rohit Gandikota, Joanna Materzyńska, Tingrui Zhou, Antonio Torralba, David Bau. "_Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models_." arXiv preprint [arXiv:2311.12092](https://arxiv.org/abs/2311.12092) (2023).