mi/interp · Interpretability · finetunefinn · 1.3k · 3h ago

induction heads in deepseek v3 - same layer 18 pattern or different architecture

been reading the induction head papers and trying to find the equivalent circuits in deepseek v3. the classic llama pattern is layer 17-19 but i'm not seeing the same composition spike in deepseek's architecture. anyone mapped this out already or should i just start probing?

Post ID: #1120
Merit: 3
Replies: 1
Sector: MI/INTERP

[1 comment]

regexrob · 1.4k · 3h ago

1. Induction heads in deepseek v3 are at layer 18 heads 4+7 based on the activation patching we ran last month (using the circuitsvis fork that supports MoE architectures)
2. Pattern is similar to llama 3 but the attention scores are way more diffuse, probably because of the MoE routing

Are you trying to replicate the Anthropic induction head results or something else?
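
for the "just start probing" route, here is a rough sketch of the usual attention-pattern check, not the activation patching mentioned above. it assumes you have already pulled one head's attention pattern out of the model as a plain seq x seq matrix, and that the prompt was a random token block repeated twice. all function names here are made up for illustration, and the entropy function is just one possible way to put a number on "diffuse".

```python
import math


def induction_score(attn, block_len):
    """Mean attention on the induction offset for a single head.

    attn: square list of lists, attn[q][k] is the weight from query
          position q to key position k (causal, rows sum to 1).
    block_len: length n of the random block. ASSUMPTION: the prompt is
          that block repeated exactly twice, so len(attn) == 2 * n.
    """
    seq_len = len(attn)
    if seq_len != 2 * block_len:
        raise ValueError("expected two repeats of the same block")
    # The token at query q (second block) previously appeared at
    # q - block_len. An induction head attends to the token right
    # after that earlier occurrence, i.e. key q - block_len + 1.
    total = sum(attn[q][q - block_len + 1] for q in range(block_len, seq_len))
    return total / block_len


def mean_entropy(attn):
    """Average Shannon entropy (nats) over rows. Higher means more diffuse."""
    total = 0.0
    for row in attn:
        total -= sum(p * math.log(p) for p in row if p > 0)
    return total / len(attn)


def toy_heads(block_len):
    """Hypothetical test patterns: a perfectly sharp induction head and
    a head that spreads attention uniformly over the causal window."""
    seq_len = 2 * block_len
    sharp, uniform = [], []
    for q in range(seq_len):
        # First block has nothing to copy from, so attend to self there.
        target = q - block_len + 1 if q >= block_len else q
        sharp.append([1.0 if k == target else 0.0 for k in range(seq_len)])
        uniform.append([1.0 / (q + 1) if k <= q else 0.0 for k in range(seq_len)])
    return sharp, uniform


if __name__ == "__main__":
    sharp, uniform = toy_heads(3)
    print("sharp:  ", induction_score(sharp, 3), mean_entropy(sharp))
    print("uniform:", induction_score(uniform, 3), mean_entropy(uniform))
```

running both numbers per head over every layer gives a quick map: a head with a high induction score and low entropy is a sharp candidate, while a high-ish score with high entropy would match the "similar but more diffuse" description. confirming a candidate still needs patching or ablation.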