Solving "I Dropped a Neural Net"

Apr 24, 2026

“I Dropped a Neural Net” is a puzzle where you get 97 weight tensors, every layer from a trained neural network shuffled into a pile, plus 10,000 rows of what the original model predicted. The goal is to recover the correct layer order.

I don’t have much of an ML background (some coursework in undergrad and not much since) but the puzzle seemed interesting and a good excuse to actually apply some of that knowledge. Here’s how I got to a solution.

What Is a Neural Network?

Quick context on the network structure since it matters for everything else. A neural network is a pipeline: data goes in one end, flows through a series of transformations, and a prediction comes out. Each transformation is a layer (just does some math on whatever came in and passes the result forward). A trained network has specific numbers baked into each layer (weights and biases) that were figured out during training. Once that’s done you freeze it and it works as a predictor.

Input (48 numbers) → Layer 1 → Layer 2 → Layer 3 → ... → Final Layer → Prediction (continuous score)

The specific architecture here uses residual blocks. Each one transforms its input and also passes the original data forward unchanged, then adds them together.

transform path: x → expand to 96 → ReLU → contract to 48 → +
residual path:  x → (unchanged) → +
output: the sum of the two paths

Each block just has to learn a small correction to its input, not the whole thing from scratch. The “+” means even a useless block still passes the input through, which is what lets you stack dozens of them without the signal exploding or vanishing.

The network has 48 of these blocks stacked in a line, followed by one final layer that squishes the output down to a single number.

Input (48 numbers) → Block 1 → Block 2 → Block 3 → ... → Block 48 → Final Layer → Prediction (continuous score)

Each block internally has two sub-layers: an inp layer that expands from 48 dimensions up to 96, and an out layer that contracts back down to 48.
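As a concrete sketch of one block's forward pass (assuming plain dense weights plus biases and ReLU, which is my reading of the architecture described above):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def block_forward(x, W_in, b_in, W_out, b_out):
    """One residual block: expand 48 -> 96, ReLU, contract 96 -> 48, add input."""
    h = relu(W_in @ x + b_in)       # transform path, expanded to 96 dims
    return x + (W_out @ h + b_out)  # residual add keeps the original signal

# Smoke test with random weights; shapes match the 48/96 architecture
rng = np.random.default_rng(0)
x = rng.normal(size=48)
W_in, b_in = 0.1 * rng.normal(size=(96, 48)), np.zeros(96)
W_out, b_out = 0.1 * rng.normal(size=(48, 96)), np.zeros(48)
y = block_forward(x, W_in, b_in, W_out, b_out)
```

With all-zero weights the transform path contributes nothing and the block is exactly the identity, which is the "even a useless block still passes the input through" property.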

The Puzzle

The network was trained, I got a CSV of 10,000 predictions, and then it was pulled apart. Every single layer in one big pile of 97 pieces, completely shuffled. The job is to put them back in order.

Correct assembly: inp → out → inp → out → ··· → inp → out → final
Shuffled pile (97 pieces, which piece goes where?): out, inp, final, out, inp, inp, ···
You submit a permutation (which piece goes in position 0? position 1? … position 96?) and the checker either accepts it or rejects it. No partial credit, one correct answer.

There are two separate problems here. First is pairing: each block has an inp and an out layer that were trained together as a matched set, but now they’re all mixed up in the pile. Which inp belongs with which out?

48 inp layers (inp A, inp B, inp C, ...) to be matched against 48 out layers (out X, out Y, out Z, ...), with every pairing unknown.

Second is ordering: once you have 48 paired blocks, in what sequence do they stack?

Each problem has 48 factorial possibilities (about 10^61). Multiplied together the full search space is around 10^122, more than the number of atoms in the observable universe. So brute force isn’t an option.
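The arithmetic is quick to check:

```python
import math

pairings = math.factorial(48)   # ways to match 48 inp layers to 48 out layers
orderings = math.factorial(48)  # ways to order the 48 paired blocks
total = pairings * orderings    # full search space

# 48! is about 1.24e61, and the product is about 1.5e122
```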

What We’re Working With

Two things: the raw pieces, and 10,000 data points.

The data looks like this:

measurement_0  ...  measurement_47    pred    true
-0.87          ...  -0.49            -1.50   -1.37
-0.89          ...  -0.69             0.13    0.77

Each row is one example. The 48 measurement columns are the inputs (what you’d feed into the network). The pred column is what the original network predicted for that input. The true column is real-world ground truth.

One thing that matters a lot: we need to score our model against pred, not true. I’m reconstructing the model, not building a predictor. If I get it right, it reproduces pred exactly, so the loss can go literally to zero. Scoring against true bottoms out at the original model’s error, so “perfectly reassembled” and “pretty good but off” would look the same.

Given a candidate permutation, I run it on the inputs and compare to pred. Correct assembly gives zero.
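A sketch of that scoring loop. The names and the `blocks` structure, a list of `(W_in, b_in, W_out, b_out)` tuples, are my assumptions about how you'd hold the pieces:

```python
import numpy as np

def assembly_mse(X, pred, blocks, order, final_W, final_b):
    """Score a candidate assembly: run the stacked blocks on X, compare to pred."""
    h = X  # shape (n_rows, 48)
    for i in order:
        W_in, b_in, W_out, b_out = blocks[i]
        t = np.maximum(h @ W_in.T + b_in, 0.0)  # expand to 96 + ReLU
        h = h + t @ W_out.T + b_out             # contract to 48, residual add
    out = h @ final_W.T + final_b               # final layer -> one number per row
    return float(np.mean((out.ravel() - pred) ** 2))
```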

My intuition was that pairing was the harder problem. Ordering feels like something you can attack greedily once you have coherent blocks, but those blocks don’t exist until the pairs are right.

Solving the Pairing Problem

The goal is to figure out which inp belongs with which out. I needed to find a scoring function that, given a candidate (inp, out) pair, returns a low score if they’re likely correct and a high score if they’re likely a mismatch. Once I have that, I could set up a 48×48 cost matrix (every inp vs every out) and find the assignment that minimizes the total score.

        out₁   out₂   out₃   ···   out₄₈
inp₁    2.4    0.8    3.1          1.9
inp₂    1.1    3.6    0.2          2.7
inp₃    0.3    2.1    1.7          3.2
···
inp₄₈   2.8    0.4    1.5          0.9

Pick one cell per row and one per column so the total score is minimized: every inp gets matched to exactly one out. The Hungarian algorithm solves this exactly in polynomial time. The hard part is coming up with a good scoring function; feed the algorithm bad scores and it finds the optimally wrong assignment.
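SciPy ships an exact solver for this assignment problem; a toy 3×3 version:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Toy 3x3 cost matrix: rows = inp layers, cols = out layers
cost = np.array([
    [2.4, 0.8, 3.1],
    [1.1, 3.6, 0.2],
    [0.3, 2.1, 1.7],
])

rows, cols = linear_sum_assignment(cost)  # minimizes total cost
# cols[i] is the out layer matched to inp i
total = cost[rows, cols].sum()            # 0.8 + 0.2 + 0.3 = 1.3
```

Here the solver pairs inp₁→out₂, inp₂→out₃, inp₃→out₁; no greedy row-by-row pick is needed.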

My first instinct was that a correctly paired block should make only a small change to its input, since residual blocks are designed to learn small corrections. So I computed a matrix norm of W_out @ W_in for every candidate pair and fed the resulting scores to Hungarian.

It didn’t work. The scores for correct and incorrect pairs were within ~2× of each other, not enough contrast:

Pair type            Score (lower = better)
Correct (inp, out)   1.02
Wrong (inp, out)     1.89
Another wrong pair   1.54
Almost every inp wanted to match the same out. Not sensitive enough.

Better idea: for each candidate (inp, out) pair, build a tiny one-block network using just that pair plus the final layer, run it on the 10,000 data points, and measure the loss.

Input → candidate block (inp + out) → Final Layer → Prediction → compare to pred → Score

A correctly matched pair should act like a reasonable mini-network. A mismatched pair should output garbage. This uses actual data as the scoring signal instead of just weight statistics.
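A sketch of that mini-network score (names are mine; this assumes the final layer has already been identified, which is plausible since it's the only piece that maps 48 dimensions down to 1):

```python
import numpy as np

def pair_score(X, pred, W_in, b_in, W_out, b_out, final_W, final_b):
    """MSE of a one-block network built from a candidate (inp, out) pair.
    Low score = the pair plausibly belongs together."""
    h = np.maximum(X @ W_in.T + b_in, 0.0)   # candidate inp: expand + ReLU
    r = X + h @ W_out.T + b_out              # candidate out: contract + residual
    out = (r @ final_W.T + final_b).ravel()  # final layer -> prediction
    return float(np.mean((out - pred) ** 2))
```

Filling a 48×48 matrix with `pair_score(X, pred, *inps[i], *outs[j], final_W, final_b)` for every (i, j) gives Hungarian its cost grid.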

Feeding these scores into Hungarian, I got much better pairing. Random ordering with these new pairs scored MSE 1.06, down from the 5.6 fully random baseline.

With the new pairs locked in, I tried ordering with simulated annealing (an algorithm that randomly tries swaps, accepts improvements, and occasionally accepts worse moves to avoid getting stuck).

current order → randomly swap two blocks → better? yes: keep it; no: accept anyway if the temperature is still hot enough, otherwise revert
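A minimal version of that loop (swap-two-blocks moves and geometric cooling; `score` stands in for any function mapping an ordering to MSE):

```python
import math
import random

def anneal(order, score, steps=10000, t0=1.0, cooling=0.999, seed=0):
    """Simulated annealing over block orderings via random pairwise swaps."""
    rng = random.Random(seed)
    order = list(order)
    best, best_score = list(order), score(order)
    cur_score, t = best_score, t0
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]          # propose a swap
        new_score = score(order)
        delta = new_score - cur_score
        if delta < 0 or rng.random() < math.exp(-delta / t):
            cur_score = new_score                        # accept: better, or lucky
            if cur_score < best_score:
                best, best_score = list(order), cur_score
        else:
            order[i], order[j] = order[j], order[i]      # revert the swap
        t *= cooling                                     # cool down
    return best, best_score
```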

[Chart: annealing descent, MSE over steps; drops from ~1.1 early on and flattens near 0.3 by 300k steps]

Descended fast at first, then flatlined. Ran overnight for 300,000 steps and it was still at 0.29. It should have gone all the way to zero if the assembly were correct.

I ran 10 independent attempts from different random starting points to figure out what was going on. I had thought that maybe my initial pairing got stuck in a local minimum and wanted to see if restarting from the beginning could help escape that:

Run   Final MSE
1     0.331
2     0.318
3     0.342
4     0.327
5     0.334
6     0.310
7     0.338
8     0.323
9     0.341
10    0.329

Ten independent attempts, all converging to roughly the same wrong answer. Since the results were so similar, a local-minimum problem seemed unlikely; if that were the issue, I would have expected much larger variance. The search was fine; something all ten runs shared was wrong, which pointed at my pairing. To check, I measured how well my reconstructed model’s outputs correlated with pred. Got 0.80 (solid, but not 1.0), so roughly 80% of the structure was right. That’s consistent with a handful of mis-paired blocks messing up everything downstream.

I went looking for how others had approached this and found a structural property I hadn’t known about. When residual blocks are correctly paired, W_out @ W_in has a negative diagonal pattern. The diagonal entries are consistently negative and larger in magnitude than the off-diagonal numbers. Random pairings don’t have this.

Why? If each block amplified its input (positive diagonal), stacking 48 of them would make the signal explode. If each block zeroed it out, the signal would vanish. A gentle negative diagonal is what lets signal flow cleanly through many blocks, and training naturally ends up there (it’s baked into the weights of any trained residual block regardless of what it was trained to predict). This is a property of how training works, not of the specific dataset. Way more reliable than any data-driven score.

What W_out @ W_in looks like for a correct pair (bolded diagonal entries are consistently negative):

        col₁    col₂    col₃    col₄    ···
row₁   -0.18   +0.03   -0.01   +0.02
row₂   +0.01   -0.21   +0.04   -0.02
row₃   -0.03   +0.02   -0.17   +0.01
row₄   +0.02   -0.01   +0.03   -0.19

And a wrong pair (diagonal near-zero or positive):

        col₁    col₂    col₃    col₄    ···
row₁   +0.01   +0.07   -0.03   +0.05
row₂   -0.04   +0.02   +0.06   -0.01
row₃   +0.03   -0.02   +0.04   +0.08
row₄   +0.05   +0.03   -0.01   +0.02

I checked my existing pairs’ diagonal means:

Pair   Diag mean   Verdict
0      -0.019      plausibly correct
3      +0.005      wrong
7      -0.156      correct, strong signal
11     +0.013      wrong
···    ···         ···
About a quarter of my Hungarian-optimal pairs had positive or near-zero diagonal means.

I rebuilt the scoring grid using each pair’s diagonal mean as the score and ran Hungarian again minimizing (most-negative wins).

all pairs → diag mean of W_out @ W_in per candidate → 48×48 score grid → Hungarian, minimize total → 48 pairs, all strongly negative diag

Every single pair now had a strongly negative diagonal mean (ranging from -0.15 to -0.28). No positives and no near-zeros, which was a great sign. Plugged these new pairs into a random ordering and ran the network: MSE 0.034.
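The whole re-pairing step fits in a few lines. A sketch, where `inps` and `outs` are hypothetical lists of the 48 weight matrices:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def diag_mean_cost(inps, outs):
    """cost[i, j] = mean of the diagonal of W_out_j @ W_in_i.
    Correct pairs have strongly negative diagonals, so minimizing finds them."""
    n = len(inps)
    cost = np.empty((n, n))
    for i, W_in in enumerate(inps):
        for j, W_out in enumerate(outs):
            cost[i, j] = np.mean(np.diag(W_out @ W_in))
    return cost

# rows, cols = linear_sum_assignment(diag_mean_cost(inps, outs))
# pairing: inp i goes with out cols[i]
```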

[Chart: better pairing signal, 30× improvement with zero ordering search; data-based pairs ~1.06 vs negative-diagonal pairs 0.034]

One order of magnitude better than my previous best, and no ordering search at all. The right pairing makes most of the problem disappear.

With pairing solved, I now had 48 correctly matched blocks. The last step was figuring out what order to stack them in.

Solving the Ordering Problem

Instead of annealing from random I wanted a better starting point. My thinking was that early layers get weaker gradient signals during training (they’re farther from the loss), so they end up learning smaller corrections, while later layers work on more refined representations and can make bigger moves. So I measured each block’s average “delta norm” (the average size of block(x) - x across the 10,000 data points) and sorted ascending. It’s a heuristic, not a guarantee, but it gave me a much better seed than random.

unsorted (4 of 48 shown): δ=0.058, δ=0.009, δ=0.031, ··· → sort ascending by average δ → early blocks move less: δ=0.009, δ=0.031, δ=0.058, ···
Just that heuristic got me a lot closer for free.
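The heuristic itself is tiny; a sketch using the same hypothetical `blocks` structure of `(W_in, b_in, W_out, b_out)` tuples:

```python
import numpy as np

def delta_norms(X, blocks):
    """Average ||block(x) - x|| over the data, for each paired block."""
    norms = []
    for W_in, b_in, W_out, b_out in blocks:
        h = np.maximum(X @ W_in.T + b_in, 0.0)
        delta = h @ W_out.T + b_out  # block(x) - x is exactly the transform path
        norms.append(np.linalg.norm(delta, axis=1).mean())
    return np.array(norms)

# seed_order = np.argsort(delta_norms(X, blocks))  # smallest movers first
```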

I ran annealing from this much-better seed. Got stuck at 0.034 and stayed there for 3 million steps overnight. By the end temperature had cooled to near-zero (basically just hill climbing, only accepting improvements). Nothing nearby was better.

Instead of running longer, I exhaustively tested every possible pairwise swap. With 48 positions there are only 48 × 47 / 2 = 1128 possible swaps, which takes a couple minutes on a laptop.

There were 98 improving swaps available. Annealing just got unlucky and never hit those at a useful temperature.

From here I switched to greedy hill climbing:

current order → test all 1128 swaps → any improvements? yes: apply the best one and repeat; no: done

No temperature schedule, no randomness. About 100 iterations and it converged to essentially zero.
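A sketch of that greedy loop (each round tests all pairwise swaps and applies the single best improving one):

```python
def greedy_swaps(order, score):
    """Hill climbing: repeatedly apply the best improving pairwise swap."""
    order = list(order)
    best = score(order)
    improved = True
    while improved:
        improved = False
        best_swap = None
        for i in range(len(order)):
            for j in range(i + 1, len(order)):
                order[i], order[j] = order[j], order[i]  # try a swap
                s = score(order)
                order[i], order[j] = order[j], order[i]  # undo the trial
                if s < best:
                    best, best_swap = s, (i, j)
        if best_swap:
            i, j = best_swap
            order[i], order[j] = order[j], order[i]      # commit the best swap
            improved = True
    return order, best
```

With 48 positions each round is 1128 score evaluations, so the whole search stays cheap.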

Pasted the permutation into the checker. Solved.

In hindsight I probably should’ve just done this from the start. I went down the annealing rabbit hole because ordering initially felt like it needed a fancy search, but once pairing was right it really was just a greedy sort.

How the Score Evolved

[Chart: MSE at each algorithmic change; random baseline 5.6 → data-based pairs 1.06 → annealing 0.29 → neg-diag pairs 0.034 → greedy finish ~0]
Stage                                          MSE     What changed
Fully random baseline                          5.6     nothing
Data-based pairing + random order              1.06    better pairs from mini-network scoring
Annealing on ordering                          0.29    ordering search, wrong pairs still in
Negative-diagonal re-pairing, random order     0.034   correct pairs (nearly free)
Greedy hill climbing on ordering               ~0.0    exhaustive swap search

The right pairing alone dropped MSE by 30× without any ordering search at all.

How Much Did Ordering Actually Matter?

After solving it I went back and ran some experiments to see what the numbers looked like with perfect information.

[Chart: ordering barely helps, pairing is what matters; fully random vs fix pairing only vs fix ordering only †]

† Fixing ordering with wrong pairing gives MSE 307 (bar truncated): a precise arrangement of garbage is still garbage.

Configuration                     MSE
Fully random                      5.6
Correct pairing, random order     4.25
Random pairing, correct order     307

The 307 explains why getting stuck at 0.29 was so painful: even perfect ordering couldn’t have saved those bad pairs. And the 4.25 confirms the intuition that once pairing is right, ordering barely matters in comparison. I spent much of my time optimizing the ordering search when pairing was the actual bottleneck.

Honestly I wasn’t that confident going in. I had to look up most of what I used, got very stuck on the pairing problem, and ran annealing overnight in the wrong direction. But in the end it came together, which was a nice surprise.

If there’s a puzzle or challenge that looks interesting but a bit outside your usual area, worth giving it a shot. You’ll learn more than you expect just by poking at it.
