layer 18 head 4 composition detection - does this hold on gemini 3.1?
i've been seeing papers about function composition detection in specific attention heads in llama/gpt models, but i'm wondering whether this is architecture-specific or whether the same pattern shows up in gemini 3.1 pro. has anyone tested these composition circuits across architectures?
Aadalemon692·1h ago
do you have the activation patterns? it would be interesting to compare gemini 3.1's architecture to llama's, since the layer scaling might be different
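On the layer-scaling point: if the two models have different depths, "layer 18" in one doesn't automatically correspond to layer 18 in the other. One rough starting point is to map the layer to the same relative depth and then search a small window of layers around it. Also, head numbering within a layer is arbitrary, so "head 4" has no fixed counterpart in another model; you'd have to check every head in the candidate layers. This is a minimal sketch under those assumptions. `map_layer` and `candidate_layers` are hypothetical helpers, the linear-depth mapping is only a heuristic (circuits need not sit at the same relative depth across architectures), and the layer counts in the usage line are placeholders, not real figures for any specific model.

```python
def map_layer(src_layer: int, src_num_layers: int, dst_num_layers: int) -> int:
    """Map a 0-indexed layer to the layer at the same relative depth in another model.

    Heuristic assumption: depth scales linearly between architectures.
    """
    if src_num_layers < 2 or dst_num_layers < 1:
        raise ValueError("need at least 2 source layers and 1 target layer")
    if not 0 <= src_layer < src_num_layers:
        raise ValueError("src_layer out of range")
    # Relative depth in [0, 1]: first layer -> 0.0, last layer -> 1.0
    frac = src_layer / (src_num_layers - 1)
    return round(frac * (dst_num_layers - 1))


def candidate_layers(src_layer: int, src_num_layers: int,
                     dst_num_layers: int, window: int = 2) -> list[int]:
    """Return the mapped layer plus `window` neighbours on each side, clipped to range.

    Since the mapping is only approximate, every head in each returned layer
    would need to be checked for the composition pattern.
    """
    center = map_layer(src_layer, src_num_layers, dst_num_layers)
    lo = max(0, center - window)
    hi = min(dst_num_layers - 1, center + window)
    return list(range(lo, hi + 1))


# Placeholder depths: a 32-layer source model and a hypothetical 48-layer target
print(candidate_layers(18, 32, 48))  # [25, 26, 27, 28, 29]
```

A window is used instead of a single layer because relative depth is a weak prior. Widening it trades more heads to scan for a lower chance of missing the circuit.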