Why We Needed Our Own Model
Hyrin's core premise is simple - give candidates an AI coding assistant during interviews, but have that assistant occasionally produce code with subtle, realistic defects. Whether the candidate catches these defects tells you more about their engineering judgment than any LeetCode problem ever could.
Our first approach was prompt-level injection - intercepting the AI's response and programmatically mutating variable names, introducing security flaws, or injecting performance bottlenecks. It worked, but had a fundamental problem: the defects felt grafted on, not organic. A senior developer could often tell something was off about the code's style rather than its logic. The mutations didn't flow naturally from the surrounding context.
We needed a model that could produce challenged code that was stylistically indistinguishable from normal code. A model where the defects emerged naturally from the generation process itself.
The Architecture: Challenge Layer on Code Llama
We started with Code Llama 13B as our base. Meta's Code Llama family is built by continued pre-training of Llama 2 on 500B tokens of code-heavy data (85% source code, 8% code-related natural language, 7% general NL). The architecture is a standard transformer - each of the 40 layers contains multi-head self-attention followed by a feed-forward MLP block, with RMSNorm and residual connections throughout.
Here's the high-level view of how our modified model differs from standard Code Llama. The first 32 layers are architecturally unchanged - the Challenge Layer is only added to the final 8 blocks, where semantic understanding is richest:
block-beta
columns 5
block:INPUT:5
columns 5
space
in["📝 Input Tokens"]
tok["Tokenizer"]
emb["Token Embeddings (5120-dim)"]
space
end
space:5
block:STANDARD:5
columns 5
space
sl["Layers 0 - 31"]
sd["Standard Transformer Blocks"]
ss["Attention → MLP → Residual"]
space
end
space:5
block:CHALLENGE:5
columns 5
space
cl["Layers 32 - 39"]
cd["Modified Transformer Blocks"]
cs["Attention → MLP + Challenge Layer → Residual"]
space
end
space:5
block:OUTPUT:5
columns 5
space
lm["LM Head (5120 → 32,016)"]
sf["Softmax"]
out["🎯 Next Token Prediction"]
space
end
INPUT --> STANDARD
STANDARD --> CHALLENGE
CHALLENGE --> OUTPUT
style INPUT fill:#F1F5F9,color:#0B1121,stroke:#CBD5E1
style STANDARD fill:#EEF1F6,color:#0B1121,stroke:#CBD5E1
style CHALLENGE fill:#EDE9FE,color:#5B3FE4,stroke:#5B3FE4
style OUTPUT fill:#F1F5F9,color:#0B1121,stroke:#CBD5E1
How the Standard MLP Works
To understand our modification, you need to understand what the MLP does in a transformer block. After the attention mechanism handles inter-token relationships (computing which tokens should attend to which), the MLP processes each token independently to enhance its representational capacity.
The standard MLP in Code Llama is a gated (SwiGLU-style) feed-forward block: three linear projections (gate, up, and down) with a SiLU activation on the gate path:
# Standard Code Llama MLP (simplified)
import torch
import torch.nn as nn
import torch.nn.functional as F

class LlamaMLP(nn.Module):
    def __init__(self, hidden_size=5120, intermediate_size=13824):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x):
        # Expand: 5120 -> 13824 (2.7x expansion)
        gate = F.silu(self.gate_proj(x))
        up = self.up_proj(x)
        # Contract: 13824 -> 5120
        return self.down_proj(gate * up)
The MLP expands the hidden dimension from 5120 to 13824, applies a gated nonlinearity in that wider space, then projects back down. This expand-then-contract pattern lets each token's representation learn complex feature transformations - recognizing that a token represents a variable name, a function call, a security-sensitive operation, etc.
Critically, the MLP operates on each token in isolation. Unlike attention, there is no cross-token communication here. Each position's representation is independently mapped from one feature space to another.
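The per-token independence is easy to check empirically. The sketch below uses a scaled-down stand-in for the gated MLP (the class name `TinyGatedMLP` and the tiny dimensions are assumptions chosen so it runs quickly on CPU) and confirms that changing one token leaves every other token's output unchanged:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyGatedMLP(nn.Module):
    """Scaled-down stand-in for the Code Llama MLP (sizes are illustrative)."""

    def __init__(self, hidden_size=16, intermediate_size=43):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x):
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


torch.manual_seed(0)
mlp = TinyGatedMLP()
x = torch.randn(1, 4, 16)      # (batch, tokens, hidden)
x_perturbed = x.clone()
x_perturbed[0, 2] += 1.0       # change only token 2

with torch.no_grad():
    y, y_perturbed = mlp(x), mlp(x_perturbed)

# Tokens 0, 1 and 3 should be unaffected; only token 2's output should move.
unchanged = torch.allclose(y[0, [0, 1, 3]], y_perturbed[0, [0, 1, 3]])
changed = not torch.allclose(y[0, 2], y_perturbed[0, 2])
print(unchanged, changed)
```

Running the same experiment through an attention layer would show the opposite: perturbing one token changes the outputs at every position that attends to it.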
Introducing the Challenge Layer
Our key insight was that if the MLP transforms per-token features independently, we could add a parallel pathway that learns to perturb those features in controlled ways. We call this the Challenge Layer - a lightweight module that sits alongside the MLP in the final 8 transformer blocks:
flowchart TB
subgraph TB1["Modified Transformer Block (Layers 32-39)"]
direction TB
X_IN["Hidden States (B, T, 5120)"]
subgraph ATT["Self-Attention"]
direction LR
LN1["RMSNorm"]
MHA["Multi-Head Attention\n40 heads, dim 128"]
LN1 --> MHA
end
RES1(("+"))
subgraph PARALLEL["Post-Attention Processing"]
direction TB
LN2["RMSNorm"]
subgraph MLP_BLOCK["Standard MLP Path"]
direction LR
GATE_P["Gate Proj\n5120 → 13824"]
UP_P["Up Proj\n5120 → 13824"]
SILU["SiLU ⊙ Multiply"]
DOWN_P["Down Proj\n13824 → 5120"]
GATE_P --> SILU
UP_P --> SILU
SILU --> DOWN_P
end
subgraph CL["⚡ Challenge Layer"]
direction TB
subgraph COMPRESS["Compress"]
CL_DOWN["Down Proj\n5120 → 128"]
end
subgraph CONDITION["Condition"]
CAT_EMB["Category Embedding\n6 categories → 128-dim"]
ADD_CAT(("+"))
CL_DOWN --> ADD_CAT
CAT_EMB --> ADD_CAT
end
subgraph GATING["Gate Decision"]
GATE_NET["Gate Network\n5248 → 256 → 1\nSiLU + Sigmoid"]
DIFF["Difficulty\nScalar (0-1)"]
GATE_MUL(("×"))
GATE_NET --> GATE_MUL
DIFF --> GATE_MUL
end
subgraph EXPAND["Expand"]
CL_UP["Up Proj\n128 → 5120"]
end
ADD_CAT --> CL_UP
ADD_CAT --> GATE_NET
CL_UP --> DEV_MUL(("×"))
GATE_MUL --> DEV_MUL
end
LN2 --> MLP_BLOCK
LN2 --> CL
end
ADD_FINAL(("+"))
RES2(("+"))
X_OUT["Output Hidden States"]
X_IN --> ATT
X_IN --> RES1
ATT --> RES1
RES1 --> PARALLEL
DOWN_P --> ADD_FINAL
DEV_MUL --> ADD_FINAL
RES1 --> RES2
ADD_FINAL --> RES2
RES2 --> X_OUT
end
style TB1 fill:#F8FAFC,color:#0B1121,stroke:#CBD5E1
style ATT fill:#F1F5F9,color:#0B1121,stroke:#CBD5E1
style PARALLEL fill:#F1F5F9,color:#0B1121,stroke:#CBD5E1
style MLP_BLOCK fill:#EEF1F6,color:#0B1121,stroke:#94A3B8
style CL fill:#EDE9FE,color:#5B3FE4,stroke:#5B3FE4
style COMPRESS fill:#F5F3FF,color:#5B3FE4,stroke:#C4B5FD
style CONDITION fill:#F5F3FF,color:#5B3FE4,stroke:#C4B5FD
style GATING fill:#F5F3FF,color:#5B3FE4,stroke:#C4B5FD
style EXPAND fill:#F5F3FF,color:#5B3FE4,stroke:#C4B5FD
The purple block is our addition - everything else is standard Code Llama. The key property is that when difficulty is 0, the effective gate (gate score × difficulty) is zero, the deviation vanishes, and the block behaves identically to the original.
class ChallengeLayer(nn.Module):
    def __init__(self, hidden_size=5120, challenge_rank=128, num_categories=6):
        super().__init__()
        # Low-rank deviation projection
        self.down = nn.Linear(hidden_size, challenge_rank, bias=False)
        self.up = nn.Linear(challenge_rank, hidden_size, bias=False)
        # Challenge category embeddings (naming, security, perf, etc.)
        self.category_embed = nn.Embedding(num_categories, challenge_rank)
        # Gating mechanism - learns WHEN to deviate
        self.gate = nn.Sequential(
            nn.Linear(hidden_size + challenge_rank, 256),
            nn.SiLU(),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )
        # Difficulty scaling
        self.difficulty_scale = nn.Parameter(torch.ones(1))

    def forward(self, hidden_states, challenge_category, difficulty=0.5):
        # Project to low-rank challenge space
        compressed = self.down(hidden_states)  # (B, T, 128)
        # Condition on challenge category
        cat_embed = self.category_embed(challenge_category)  # (B, 128)
        cat_embed = cat_embed.unsqueeze(1).expand_as(compressed)
        conditioned = compressed + cat_embed
        # Compute gating score per token
        gate_input = torch.cat([hidden_states, conditioned], dim=-1)
        gate_score = self.gate(gate_input)  # (B, T, 1)
        # Scale by difficulty: a float, or a per-example tensor of shape (B,)
        # that must be reshaped to broadcast against (B, T, 1)
        if torch.is_tensor(difficulty):
            difficulty = difficulty.to(gate_score.dtype).view(-1, 1, 1)
        effective_gate = gate_score * difficulty * self.difficulty_scale
        # Project deviation back to hidden size
        deviation = self.up(conditioned)  # (B, T, 5120)
        return deviation * effective_gate
The Challenge Layer computes a low-rank deviation vector that gets added to the MLP output. The key components:
Low-rank projection - We compress the 5120-dimensional hidden state to a 128-dimensional challenge space. This bottleneck forces the model to learn compact representations of "what makes code defective" rather than memorizing specific patterns.
Category conditioning - A learned embedding for each challenge category (naming, security, performance, database, logic, debugging) biases the deviation toward category-specific defects. The security embedding learns to activate near authentication checks and SQL queries. The naming embedding activates near variable declarations.
Gating mechanism - A small network that takes both the original hidden state and the challenge-conditioned representation to decide whether to deviate at this position. This is critical - the model learns that only certain tokens should be affected. You don't want to corrupt every line, just the strategically important ones.
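Difficulty scaling - The caller-supplied difficulty (0 to 1) multiplies the gate score, together with a learned difficulty_scale parameter, to set the deviation magnitude. At difficulty 0 the deviation is exactly zero and the model produces clean code; at difficulty 1 the deviation is at its maximum.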
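The zero-difficulty property can be verified on a scaled-down copy of the layer. This is a minimal sketch, not the production module: `TinyChallengeLayer` and its small dimensions are assumptions made so the check runs instantly on CPU, while the computation mirrors the forward pass described above.

```python
import torch
import torch.nn as nn


class TinyChallengeLayer(nn.Module):
    """Scaled-down copy of the Challenge Layer; sizes are illustrative only."""

    def __init__(self, hidden_size=32, challenge_rank=4, num_categories=6):
        super().__init__()
        self.down = nn.Linear(hidden_size, challenge_rank, bias=False)
        self.up = nn.Linear(challenge_rank, hidden_size, bias=False)
        self.category_embed = nn.Embedding(num_categories, challenge_rank)
        self.gate = nn.Sequential(
            nn.Linear(hidden_size + challenge_rank, 8),
            nn.SiLU(),
            nn.Linear(8, 1),
            nn.Sigmoid(),
        )
        self.difficulty_scale = nn.Parameter(torch.ones(1))

    def forward(self, hidden_states, challenge_category, difficulty=0.5):
        # Compress, then add the per-example category embedding (broadcast over T).
        cat = self.category_embed(challenge_category).unsqueeze(1)   # (B, 1, rank)
        conditioned = self.down(hidden_states) + cat                 # (B, T, rank)
        # Per-token gate in (0, 1), scaled by the requested difficulty.
        gate_score = self.gate(torch.cat([hidden_states, conditioned], dim=-1))
        effective_gate = gate_score * difficulty * self.difficulty_scale
        return self.up(conditioned) * effective_gate                 # (B, T, hidden)


torch.manual_seed(0)
layer = TinyChallengeLayer()
h = torch.randn(2, 5, 32)           # (batch, tokens, hidden)
category = torch.tensor([1, 3])     # one category id per example

with torch.no_grad():
    off = layer(h, category, difficulty=0.0)
    on = layer(h, category, difficulty=0.5)

print(off.abs().max().item())   # 0.0: the deviation vanishes at difficulty 0
print(on.shape)                 # torch.Size([2, 5, 32])
```

Because the sigmoid gate itself is never exactly zero, it is the multiplication by difficulty that guarantees a clean pass, not the gate network.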
Integration Into the Transformer Block
The modified transformer block looks like this:
class ChallengedTransformerBlock(nn.Module):
    # LlamaAttention and RMSNorm are the base model's modules (signatures simplified)
    def __init__(self, layer_idx, config):
        super().__init__()
        self.attention = LlamaAttention(config)
        self.mlp = LlamaMLP(config.hidden_size, config.intermediate_size)
        self.input_layernorm = RMSNorm(config.hidden_size)
        self.post_attention_layernorm = RMSNorm(config.hidden_size)
        # Challenge layer only in final 8 blocks (layers 32-39)
        self.challenge_layer = None
        if layer_idx >= 32:
            self.challenge_layer = ChallengeLayer(
                hidden_size=config.hidden_size,
                challenge_rank=128
            )

    def forward(self, x, challenge_category=None, difficulty=0.0):
        # Standard attention + residual
        h = self.input_layernorm(x)
        h = self.attention(h)
        x = x + h
        # Standard MLP + residual
        h = self.post_attention_layernorm(x)
        mlp_out = self.mlp(h)
        # Challenge deviation (additive), skipped entirely in clean mode.
        # difficulty may be a float or a per-example tensor of shape (B,).
        active = bool(torch.as_tensor(difficulty).gt(0).any())
        if self.challenge_layer is not None and active:
            deviation = self.challenge_layer(h, challenge_category, difficulty)
            mlp_out = mlp_out + deviation
        x = x + mlp_out
        return x
We only add Challenge Layers to the final 8 of 40 transformer blocks. The reasoning: earlier layers learn low-level syntax and token-level features. Later layers encode higher-level semantic understanding - function purpose, variable scope, security context. Defects need to be semantically coherent, so they must be injected where the model has already built a rich understanding of code meaning.
Training Methodology
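Our training pipeline has three distinct phases, each building on the previous one. The base model is trained first with the Challenge Layers frozen; all parameters are then unfrozen so the Challenge Layers can learn increasingly nuanced defect generation, and a final calibration pass tunes the difficulty mapping: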
flowchart LR
subgraph P1["Phase 1: Adaptation"]
direction TB
P1_IN["Code Llama 13B\n+ Zero-Init Challenge Layers"]
P1_DATA[("Code Corpus\n500B tokens")]
P1_TRAIN["2,000 steps\nLR: 2e-5\nChallenge Layers FROZEN"]
P1_OUT["Adapted Base Model"]
P1_IN --> P1_TRAIN
P1_DATA --> P1_TRAIN
P1_TRAIN --> P1_OUT
end
subgraph P2["Phase 2: Challenge Training"]
direction TB
P2_DATA[("180K Clean/Challenged\nCode Pairs\n6 categories × 5 levels")]
P2_LOSS["Triple Loss:\nL_clean + L_challenge + L_sparsity"]
P2_TRAIN["Full Fine-Tune\nAll params unfrozen\nLR: 1e-5"]
P2_OUT["Challenge-Capable Model"]
P2_DATA --> P2_TRAIN
P2_LOSS --> P2_TRAIN
P2_TRAIN --> P2_OUT
end
subgraph P3["Phase 3: Calibration"]
direction TB
P3_DATA[("5K examples with\nHuman Detection Labels")]
P3_TRAIN["Gate Threshold Tuning\nDifficulty ↔ Detection Rate\nMapping"]
P3_OUT["Production Model\nCalibrated Difficulty"]
P3_DATA --> P3_TRAIN
P3_TRAIN --> P3_OUT
end
P1 --> P2 --> P3
style P1 fill:#F1F5F9,color:#0B1121,stroke:#CBD5E1
style P2 fill:#EDE9FE,color:#5B3FE4,stroke:#5B3FE4
style P3 fill:#ECFDF5,color:#065F46,stroke:#00E5A0
Phase 1: Continued Pre-Training (Frozen Challenge Layers)
We first verify that adding the Challenge Layers doesn't degrade normal code generation. With all challenge parameters frozen (zero-initialized) and difficulty set to 0, the deviation term is exactly zero, so the residual stream matches the base model's. In this configuration we run 2,000 steps of continued pre-training on our code corpus:
Optimizer: AdamW (beta1=0.9, beta2=0.95, weight_decay=0.1)
Learning rate: 2e-5 with cosine decay
Batch size: 512K tokens
Sequence length: 4096
Only base model parameters are updated
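The Phase 1 settings imply a fairly small token budget, which the sketch below works out. Two assumptions are made here that the settings do not spell out: "512K tokens" is read as 512 × 1024, and the cosine schedule is taken to decay from the peak rate to zero with no warmup.

```python
import math

steps = 2_000
tokens_per_batch = 512 * 1024      # assumption: "512K" means 524,288 tokens
sequence_length = 4096
peak_lr = 2e-5

sequences_per_batch = tokens_per_batch // sequence_length
total_tokens = steps * tokens_per_batch


def cosine_lr(step, total_steps=steps, peak=peak_lr):
    """Cosine decay from `peak` to 0 (no warmup; an assumed schedule shape)."""
    return 0.5 * peak * (1.0 + math.cos(math.pi * step / total_steps))


print(sequences_per_batch)               # 128
print(f"{total_tokens / 1e9:.2f}B")      # 1.05B
print(cosine_lr(0), cosine_lr(1_000), cosine_lr(2_000))
```

At roughly one billion tokens, Phase 1 is a light adaptation pass compared with the 500B tokens used to build Code Llama itself.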
Phase 2: Challenge Pair Training
This is the core training phase. We constructed a dataset of 180,000 (clean code, challenged code) pairs across all six categories and five difficulty levels. Each pair contains identical context (prompt, surrounding code) but different completions - one correct, one with a specific calibrated defect.
Example training pair (security category, difficulty 3):
# Context (shared):
# "Write a Flask endpoint that looks up a user by email"

# Clean completion:
@app.route('/user')
def get_user():
    email = request.args.get('email', '')
    user = User.query.filter_by(email=email).first()
    return jsonify(user.to_dict()) if user else ('', 404)

# Challenged completion (SQL injection):
@app.route('/user')
def get_user():
    email = request.args.get('email', '')
    user = db.session.execute(
        f"SELECT * FROM users WHERE email = '{email}'"
    ).fetchone()
    return jsonify(dict(user)) if user else ('', 404)
The training objective is a conditional language modeling loss:
def challenge_loss(model, batch):
    # Clean pass (difficulty=0): Challenge Layers contribute nothing
    clean_logits = model(
        batch['input_ids'],
        challenge_category=batch['category'],
        difficulty=0.0
    )
    vocab_size = clean_logits.size(-1)
    clean_loss = F.cross_entropy(
        clean_logits.view(-1, vocab_size),
        batch['clean_target'].view(-1)
    )
    # Challenged pass (difficulty from batch)
    challenged_logits = model(
        batch['input_ids'],
        challenge_category=batch['category'],
        difficulty=batch['difficulty']
    )
    challenged_loss = F.cross_entropy(
        challenged_logits.view(-1, vocab_size),
        batch['challenged_target'].view(-1)
    )
    # Gating sparsity regularizer - encourage sparse activation.
    # Read after the challenged pass so the scores reflect active gates.
    gate_scores = model.get_gate_scores()
    sparsity_loss = gate_scores.mean() * 0.1
    return clean_loss + challenged_loss + sparsity_loss
The loss has three components:
- Clean loss - Ensures the model still generates correct code when difficulty is 0
- Challenge loss - Teaches the model to generate defective code when difficulty is positive
- Sparsity regularizer - Encourages the gating mechanism to activate sparsely, affecting only a few tokens per sequence rather than corrupting everything
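To see why the sparsity term pushes the gate toward firing on few tokens, compare the penalty for a gate that fires everywhere with one that fires on a single token. The sketch below uses plain Python lists in place of the gate tensor; the helper name `sparsity_penalty` and the example scores are illustrative assumptions, while the 0.1 weight comes from the loss above.

```python
def sparsity_penalty(gate_scores, weight=0.1):
    """Mean gate activation times a weight, mirroring `gate_scores.mean() * 0.1`."""
    return weight * sum(gate_scores) / len(gate_scores)


dense = [0.9] * 10             # gate fires on every token
sparse = [0.9] + [0.0] * 9     # gate fires on one token in ten

dense_penalty = sparsity_penalty(dense)     # about 0.09
sparse_penalty = sparsity_penalty(sparse)   # about 0.009
print(dense_penalty, sparse_penalty)
```

The penalty scales with the fraction of tokens the gate touches, so the optimizer only keeps gate activations that buy a large enough reduction in the challenge loss.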
Phase 3: Difficulty Calibration
After Phase 2, the model can produce challenged code, but the relationship between the difficulty parameter and actual detection rate isn't linear. We run a calibration phase using human evaluator data from our beta testers:
Difficulty 0.2 -> Target: 80% detection rate (obvious defects)
Difficulty 0.4 -> Target: 55% detection rate (moderate)
Difficulty 0.6 -> Target: 30% detection rate (subtle)
Difficulty 0.8 -> Target: 15% detection rate (expert-level)
Difficulty 1.0 -> Target: 5% detection rate (near-invisible)
We fine-tune the difficulty_scale parameter and the gating thresholds using a small calibration set of 5,000 examples with human detection labels.
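Once calibrated, the table can be inverted so an interviewer specifies a target detection rate instead of a raw difficulty. The sketch below assumes piecewise-linear interpolation between the calibrated points and clamping outside them; the function name and that interpolation choice are illustrative assumptions, since how intermediate values are handled is not stated.

```python
# Calibration targets from the table above: (difficulty, target detection rate).
CALIBRATION = [(0.2, 0.80), (0.4, 0.55), (0.6, 0.30), (0.8, 0.15), (1.0, 0.05)]


def difficulty_for_detection_rate(target_rate):
    """Piecewise-linear lookup; clamps outside the calibrated range."""
    if target_rate >= CALIBRATION[0][1]:
        return CALIBRATION[0][0]
    if target_rate <= CALIBRATION[-1][1]:
        return CALIBRATION[-1][0]
    # Walk adjacent pairs; detection rate falls as difficulty rises.
    for (d_lo, r_hi), (d_hi, r_lo) in zip(CALIBRATION, CALIBRATION[1:]):
        if r_lo <= target_rate <= r_hi:
            frac = (r_hi - target_rate) / (r_hi - r_lo)
            return d_lo + frac * (d_hi - d_lo)


print(difficulty_for_detection_rate(0.55))    # a calibrated point, about 0.4
print(difficulty_for_detection_rate(0.425))   # halfway between 0.4 and 0.6
```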
What the Challenge Layer Learns
After training, we analyzed the learned representations to understand what the model actually captures. Some findings:
Gate activation patterns - The security challenge gate learns to fire on tokens immediately following database query construction, user input handling, and authentication checks. It almost never activates on import statements or comments. The model has learned where security defects naturally occur.
Category embedding geometry - t-SNE visualization of the 128-dimensional category embeddings shows that security and database categories cluster together (both involve data handling), while naming and debugging form a separate cluster (both involve code readability). Performance sits between the two groups.
Difficulty gradient - At low difficulty, the deviation vectors are small and tend to affect surface-level features (variable names, minor inefficiencies). At high difficulty, the deviations are larger and affect structural features (algorithm choice, architectural patterns). The model learned a meaningful hierarchy of defect severity without explicit supervision.
Serving Architecture
The full inference pipeline connects the interview session to the model. The Session Service determines which challenge category and difficulty to use based on the interviewer's configuration, then passes those parameters through to the model:
flowchart LR
subgraph CLIENT["Candidate Environment"]
IDE["IDE / Terminal"]
CLI["Hyrin CLI Plugin"]
IDE --> CLI
end
subgraph SESSION["Session Service"]
WS["WebSocket Gateway"]
SM["Session Manager"]
CC["Challenge\nConfigurator"]
WS --> SM
SM --> CC
end
subgraph INFERENCE["Model Inference (vLLM)"]
direction TB
REQ["Request Router"]
subgraph MODEL["Hyrin Challenge Model"]
direction LR
BASE["Code Llama 13B\nLayers 0-31"]
MOD["Modified Blocks\nLayers 32-39"]
HEAD["LM Head"]
BASE --> MOD --> HEAD
end
PARAMS["challenge_category: int\ndifficulty: float"]
REQ --> MODEL
PARAMS --> MOD
end
subgraph LIVE["Live Monitoring"]
DASH["Interviewer\nDashboard"]
REC["Session\nRecorder"]
end
CLI -->|"prompt"| WS
CC -->|"category + difficulty"| REQ
SM -->|"prompt + context"| REQ
HEAD -->|"generated code"| SM
SM -->|"response"| WS
WS -->|"code response"| CLI
SM -->|"real-time feed"| DASH
SM -->|"full transcript"| REC
style CLIENT fill:#F1F5F9,color:#0B1121,stroke:#CBD5E1
style SESSION fill:#EEF1F6,color:#0B1121,stroke:#CBD5E1
style INFERENCE fill:#EDE9FE,color:#5B3FE4,stroke:#5B3FE4
style MODEL fill:#F5F3FF,color:#5B3FE4,stroke:#C4B5FD
style LIVE fill:#ECFDF5,color:#065F46,stroke:#00E5A0
In production, we serve the model using vLLM with a custom modification to pass the challenge parameters:
# Simplified serving endpoint
@app.post("/generate")
async def generate(request: GenerateRequest):
    sampling_params = SamplingParams(
        temperature=request.temperature,
        max_tokens=request.max_tokens,
    )
    # Challenge params injected at the model level.
    # These keyword arguments come from our custom modification;
    # they are not part of upstream vLLM.
    outputs = await engine.generate(
        request.prompt,
        sampling_params,
        challenge_category=request.challenge_category,  # 0-5
        challenge_difficulty=request.difficulty,  # 0.0-1.0
    )
    return {"text": outputs[0].text}
Counting the layer as defined above, the Challenge Layer adds approximately 2.7M parameters per block (about 21.2M total across 8 blocks) to Code Llama 13B's 13B parameters - roughly a 0.16% increase. Inference latency increases by less than 4% because the low-rank operations are computationally cheap compared to the attention and base MLP.
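The parameter overhead can be tallied directly from the `ChallengeLayer` definition shown earlier. This sketch counts only what that simplified listing declares (a production layer could carry components not shown there) and takes 13B as a round figure for the base model:

```python
hidden_size, rank, num_categories, gate_hidden = 5120, 128, 6, 256

down = hidden_size * rank                     # nn.Linear(5120, 128, bias=False)
up = rank * hidden_size                       # nn.Linear(128, 5120, bias=False)
category_embed = num_categories * rank        # nn.Embedding(6, 128)
gate = (
    (hidden_size + rank) * gate_hidden + gate_hidden   # Linear(5248, 256) + bias
    + gate_hidden * 1 + 1                               # Linear(256, 1) + bias
)
difficulty_scale = 1                          # single learned scalar

per_block = down + up + category_embed + gate + difficulty_scale
total = 8 * per_block                         # Challenge Layers in blocks 32-39
overhead_pct = 100 * total / 13e9

print(per_block)               # 2655490
print(total)                   # 21243920
print(f"{overhead_pct:.2f}%")  # 0.16%
```

About half of each layer's parameters sit in the gate network's first linear layer, because it reads the full 5120-dimensional hidden state alongside the 128-dimensional conditioned vector.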
Results
We evaluated across three dimensions:
Code quality (clean mode): With difficulty set to 0, our model scores within 1.5 percentage points of base Code Llama 13B on HumanEval (pass@1: 34.8% vs 35.2%) and MBPP (pass@1: 52.1% vs 53.6%). The Challenge Layers contribute nothing when inactive, so the small gap most likely comes from fine-tuning the base weights.
Challenge realism: In a blind study, 40 senior engineers were shown pairs of code snippets - one from our model's challenge mode, one with programmatic injection. They were asked to identify which contained the "artificial" defect. The model-generated challenges were correctly identified as artificial only 38% of the time (vs 71% for programmatic injection). Our challenges are significantly more realistic.
Evaluation signal: Across 2,400 interview sessions, candidate performance on model-generated challenges correlated with 6-month job performance ratings at r=0.61 (vs r=0.43 for programmatic challenges and r=0.31 for traditional coding assessments).
What Comes Next
We're currently training a 34B variant with an expanded Challenge Layer that supports multi-turn challenge sequences - where a subtle defect in one response compounds with context from a later response. We're also exploring whether the gating mechanism can be conditioned on the candidate's prior responses, adapting challenge difficulty in real time based on observed skill level.
The goal isn't to trick candidates. It's to give them code that looks exactly like what they'd encounter in a real codebase maintained by a team using AI tools - where the AI occasionally gets things wrong, and catching those mistakes is a core part of the job.