I'm supposed to be writing what I learn each month about this topic - focusing on the right things. But I'm not quite sure I learn anything here... I am /reminded/, however. And perhaps the re-eureka of "oh yeah, that" is reason enough to do this. It gives me a moment to reflect on how aligned I am with the advice or knowledge, and perhaps how far from it I might be.
So this month I just read around about getting better at debugging. The focus for me is mostly how to avoid spending time exhaustively confirming that a system works, and instead fast-forward to where the defects are so I can get back to new features and introduce yet-more defects!
What I'm learning, though, is not so much anything new as that I need to remind myself of the traps often. There are a few rules that, if I just kept them on a post-it and read it before starting, would leave me better off. So I'll give that a shot and see how it goes.
Here are a few of the bullet points floating around the web, and some musings on how much room I have to improve.
Use a Debugger. Thanks, internet! But this extends to GPU profiling tools, tools that visualize the data better, and of course printf debugging and logging. The question is which to use when, and in what order. For a recent bug I jumped right to printf logging because I had no idea how the system worked. After confirming /that/ it worked (i.e. no bug found yet), I went back to a GPU tool to inspect things there. It gave a clearer picture of why the bug was occurring once I compared the pixel data of a working frame against a non-working one. That was a lot quicker than getting all my printfs right, with the frame number on them, then sorting through megabytes of per-frame spew. I should've reached for the GPU tool first, because its captures were also persistent and easy to refer back to... my printfs kept changing as I asked new questions. I probably wouldn't have noticed the important detail in the GPU tool if I hadn't understood the system better, but with a different order of operations I think I'd have had better questions in mind, /sooner/.
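One way to make the printf route less painful is to tag every line with the frame number and only emit output for a narrow window of frames, so a per-frame log doesn't bury the one frame that matters. Below is a minimal C++ sketch; `g_frameIndex`, `SetLogWindow`, and `FrameLog` are hypothetical names, not from any particular engine, and it assumes the game loop increments `g_frameIndex` once per frame.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdint>
#include <cstdio>

// Hypothetical frame counter, assumed to be incremented once per frame by the game loop.
static uint64_t g_frameIndex = 0;

// Inclusive window of frames we actually want log output for.
static uint64_t g_logFirstFrame = 0;
static uint64_t g_logLastFrame = 0;

// Restrict logging to frames [first, last], e.g. a working and a broken frame.
void SetLogWindow(uint64_t first, uint64_t last) {
    assert(first <= last && "log window must not be empty");
    g_logFirstFrame = first;
    g_logLastFrame = last;
}

// True if the current frame falls inside the logging window.
bool FrameLogEnabled() {
    return g_frameIndex >= g_logFirstFrame && g_frameIndex <= g_logLastFrame;
}

// printf-style logging that prefixes each line with the frame number.
// Returns the number of characters written, or 0 if this frame is filtered out.
int FrameLog(const char* fmt, ...) {
    if (!FrameLogEnabled()) {
        return 0;
    }
    int written = std::printf("[frame %llu] ",
                              static_cast<unsigned long long>(g_frameIndex));
    va_list args;
    va_start(args, fmt);
    written += std::vprintf(fmt, args);
    va_end(args);
    return written;
}
```

This doesn't replace a GPU capture, which keeps the whole frame around to inspect, but it does cut the spew down to just the frames being compared.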
Avoid Assumptions. This is kind of my problem statement. I assume something is causing the defect, so I peck around looking for something related and get myopic. For the same bug as above, I assumed it was bad state or constants being fed into a shader. It ended up being a depth buffer precision issue, so nothing I was printf'ing would have led me to that conclusion easily. Again, collecting more info more quickly might've helped me get past this sooner.
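Depth precision problems are easier to suspect once you can put a number on them. This sketch estimates how much view-space distance one step of an integer depth buffer covers at a given distance. It assumes a conventional (non-reversed) perspective projection that maps [zNear, zFar] to depth [0, 1]; the post doesn't say which depth format or projection the engine uses, and `DepthStepAtDistance` is a hypothetical helper. It is a first-order estimate from the derivative of the depth mapping, not an exact quantization.

```cpp
#include <cassert>
#include <cmath>

// Approximate view-space distance spanned by one step of an integer depth
// buffer at distance z, assuming depth is mapped as
//   d(z) = zFar / (zFar - zNear) * (1 - zNear / z)   (d(zNear) = 0, d(zFar) = 1)
// Its derivative is dd/dz = zFar * zNear / ((zFar - zNear) * z^2), so one depth
// quantum of 1 / (2^bits - 1) covers roughly
//   dz ~= (zFar - zNear) * z^2 / (zFar * zNear * (2^bits - 1))
double DepthStepAtDistance(double zNear, double zFar, double z, int bits) {
    assert(zNear > 0.0 && zFar > zNear);
    assert(z >= zNear && z <= zFar);
    assert(bits > 0 && bits < 32);
    const double quantum = 1.0 / (std::ldexp(1.0, bits) - 1.0);  // 1 / (2^bits - 1)
    return quantum * (zFar - zNear) * z * z / (zFar * zNear);
}
```

With zNear = 0.1, zFar = 10000 and a 24-bit buffer, this comes out to roughly 0.6 units at z = 1000, so surfaces closer together than that can fight. Pushing zNear out to 1.0 shrinks the step roughly tenfold, since for zFar much larger than zNear the step scales with 1/zNear, which is why moving the near plane is a common first fix.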
Binary Search. Ah, good ol' #if 0'ing out half the code. I used this recently to track down a "WTF, how did this ever work? This subsystem is unrelated to anything here" bug. I don't know if there's a better, faster way to use it, because it's kinda a last-resort thing. The key for me will be remembering the tools available to make it easier: a smaller test case, tools to reload just the shaders without recompiling, things like that.
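The #if 0 approach is a binary search over candidates, and writing it down as one makes the cost explicit: each probe is a rebuild and a repro, so fast shader reloads and small test cases pay off directly. The sketch below assumes a single culprit, so that enabling candidates in a fixed order makes the bug appear exactly once and stay. `FindCulprit` and its callback are hypothetical, standing in for "rebuild with only the first k subsystems enabled and try the repro".

```cpp
#include <cassert>
#include <functional>

// Candidates (subsystems, changes, draw calls...) are numbered 0..count-1.
// reproducesWithFirst(k) answers: "with only candidates [0, k) enabled, does
// the bug reproduce?" Assumes a single culprit, so the answer flips from
// false to true exactly once as k grows.
// Returns the culprit's index, or -1 if the bug doesn't reproduce even with
// everything enabled. Optionally reports how many repro attempts were made.
int FindCulprit(int count, const std::function<bool(int)>& reproducesWithFirst,
                int* testsRun = nullptr) {
    assert(count >= 0);
    int runs = 0;
    auto probe = [&](int k) {
        ++runs;
        return reproducesWithFirst(k);
    };

    int result = -1;
    // First confirm the bug reproduces with everything enabled.
    if (count > 0 && probe(count)) {
        // Smallest k in [lo, hi] for which the bug reproduces.
        int lo = 1;
        int hi = count;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (probe(mid)) {
                hi = mid;      // already broken with the first mid enabled
            } else {
                lo = mid + 1;  // culprit was not among the first mid
            }
        }
        result = lo - 1;       // the last candidate added when it broke
    }
    if (testsRun != nullptr) {
        *testsRun = runs;
    }
    return result;
}
```

For 1000 candidates this needs at most 11 repro attempts: one to confirm the bug reproduces with everything enabled, then at most 10 halvings. If the bug needs two pieces interacting, the single-culprit assumption breaks, and the search may point at only one of them.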
Really Read the Error Message. This seems dumb, but since this idea has been bouncing around in my head, I've started to notice where I get snowblind. Along with avoiding assumptions and using a debugger, something I can do better is always giving the TTY output a thorough read-through. It's easy to scan over all that noise and even notice a legitimately related assert or log, but still not appreciate what it means right away. Recently I had an OOM crash, and when that happens the heap in question dumps its contents - each of our allocations has a string associated with it, which is real nice to have. But running out wasn't the issue - the issue was that there was /enough/ memory, but it was /fragmented/. I just saw OOM and prepared to bump up the size of that heap and call it a day. I did that and the bug bounced back, albeit with a somewhat longer repro. Once I noticed that it was fragmentation more than the amount of memory, I relocated those thrashy small allocations to another heap better suited to the task. And then I still needed to add more memory too. Ah well; I still feel I should've recognized the fragmentation earlier.
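Reading an OOM dump for fragmentation comes down to comparing two numbers: total free memory versus the largest single free block. The sketch below classifies a failed request from a snapshot of free block sizes. `Diagnose`, `FragmentationRatio`, and the free-list snapshot are hypothetical; the sketch ignores alignment and per-allocation headers, and a real heap dump would need parsing into this form first.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class AllocFailure {
    None,        // a single free block is big enough; look elsewhere (alignment, bugs)
    Fragmented,  // enough memory in total, but no single block is big enough
    Exhausted    // genuinely not enough free memory
};

struct HeapSummary {
    std::size_t totalFree = 0;
    std::size_t largestFree = 0;
};

// Summarize a snapshot of the heap's free block sizes.
HeapSummary Summarize(const std::vector<std::size_t>& freeBlockSizes) {
    HeapSummary s;
    for (std::size_t size : freeBlockSizes) {
        s.totalFree += size;
        s.largestFree = std::max(s.largestFree, size);
    }
    return s;
}

// Classify why a request of `request` bytes could not be satisfied.
AllocFailure Diagnose(const std::vector<std::size_t>& freeBlockSizes, std::size_t request) {
    assert(request > 0);
    const HeapSummary s = Summarize(freeBlockSizes);
    if (s.largestFree >= request) return AllocFailure::None;
    if (s.totalFree >= request) return AllocFailure::Fragmented;
    return AllocFailure::Exhausted;
}

// 1 - largest/total: 0 means all free memory is one contiguous block,
// values near 1 mean it is scattered across many small blocks.
double FragmentationRatio(const std::vector<std::size_t>& freeBlockSizes) {
    const HeapSummary s = Summarize(freeBlockSizes);
    if (s.totalFree == 0) return 0.0;
    return 1.0 - static_cast<double>(s.largestFree) / static_cast<double>(s.totalFree);
}
```

When the answer is `Fragmented`, adding memory only delays the failure, which is consistent with the longer repro described above; separating small, short-lived allocations from large ones goes after the cause. The 1 - largest/total ratio is one common way to track fragmentation over time, not the only one.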
Rubber Ducking. I have a weird condition - I'm an extroverted thinker in an introvert's industry. Extroverted thinkers can hardly think on their own - they need to talk out loud or collaborate. An extreme but decent example is Dr. House from the TV show House. He was hateful and derided his team, but was incapable without them. He needed other people's ideas to shoot down; otherwise he couldn't come up with his own. Now, one doesn't need to be all Hugh Laurie about it. For me, I often have a eureka moment as I'm writing the "I have no idea, someone help me!" email. I have to be on the stage, so to speak. Sometimes it's /as/ I'm writing... sometimes it's /after/ I've written it, so I've taken to sitting on emails before sending them. I don't know if I'm ready to really talk to a teddy bear as some blogs suggest, but talking this stuff out helps. I'm not sure yet how to get better here, because "interrupt your co-worker earlier" isn't the greatest answer. Neither is "talk to inanimate objects about a new system you don't quite understand." So yeah... I sometimes find I don't even have the vocabulary to describe the situation, so I'll work on getting better at the earlier suggestions, which should either reduce how often I need this or make it more effective.
Walk Away. Always my favorite! I'm never sure about this one - the idea that many bugs seem so easy to figure out... first thing the next day, after the exhausting frustration of not finding any answer. But is that frustration the necessary part that seeds the mind, so you crunch on it in the background while you sleep? There's a line between losing concentration and taking a solid break, so can this be done frequently? I don't hear "I grabbed a drink and then it was all so clear to me" stories nearly as often as "It hit me the next day" ones. Like rubber ducking, perhaps this depends on loading my brain with more info, faster.
So what did I learn? That general advice on debugging isn't terribly specific, but neither is "eat right, relax, be happy" type advice. It looks like what I need isn't necessarily missing knowledge but more precise practice: not rushing any of the steps, and doing them in the best order. Let's see if this lesson is worth anything, or if I'm just Monday-morning quarterbacking all this.
-- Tony Arciuolo (Senior Engine Programmer)