Week 3.5 to 5 - Gap Technology (Part II)
This is the week where I’m going to admit to myself, and to you, that this little project has become a lot more than I thought it’d be, because it’s now a part of me. It no longer seems to be just “an exercise” or “a prototype”. It’s my work now; part of what I am. A channel through which I’m trying to give form to something entirely abstract (software) using what others judge to be “my experience”. Things got real this week, especially because I went through a hard time to get here. That’s tough to admit, because in theory all the hard work was done two weeks ago, before the “Android break” we came to know as my Facebook Newsfeed (yeah, I know it’s “News Feed”, but anyways) Sample code. In reality it was done, too, but I crashed against a barrier that, as stupid as it sounds, didn’t even exist to begin with.
We begin this 10-day odyssey last week’s Friday, just after finishing the Sample code and giving myself a couple of hours to rejoice with my first public code on GitHub. The first thing I did was get to the table, open my notebook, get up to fill some ink into my fountain pen, and draw the problems I had to solve. As we discussed last week, I’d actually done half of the work (if not more) surrounding Gap Technology, because while making Gap Detection I had already covered one Gap Filling scenario: syncing the Pool with the Tickets it had below the Gap (in the Track). Completing Gap Filling would be a matter of making some tests to make sure we could also “extend” the Gap at the end of the Track, make sure the Engine was capable of detecting that we’d really run into the end of the Track (Suppliers did not have new Tickets that were older than the ones sitting at the bottom of the Track), plus a couple of other cases. Before that though, we needed to solve a big UI problem, which is that so far the only communications with the Engine have been directly between the ViewController and the Engine itself. Filling in a Gap, though, is very different, because the TableViewCell representing a Gap is not owned by the ViewController, but by the Engine through its Suppliers. How do we get the message from the Cell to the ViewController? I could’ve come up with a lot of schemes involving delegates or causing all of the Ticket’s interactions to channel towards a method in the ViewController (the didSelectRowAtIndexPath: selector), but in the end I opted for a very Objective-C-like solution: notifications.
By “notifications” I don’t mean the ones you see on your Droid, iPhone, iPad or Mac, I mean the internal notifications used by objects and libraries from the Cocoa Touch framework. Another similar solution would’ve been using KVO or “Key-Value Observing”, which is very similar in practical terms, but less straightforward. What you do in KVO is tell Cocoa you’d like a selector (method for those who haven’t done Obj-C) to be called when a property (member for those coming from Java) changes. I could’ve used a special property just for triggering this kind of message, but notifications seemed like a much cleaner and leaner solution. I also wanted the ViewController to control what’s done and what isn’t at all times so, even though the UITableViewCell animates itself to show you’ve “activated” a Gap, in truth it only sends a message to whoever wants to listen to it, and the idea is that the ViewController will receive that notification and then ask the Engine to “fill in this Gap”. That’s the design I liked.
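To make that flow concrete, here is a minimal, language-neutral sketch in Python rather than Objective-C. It is not the project’s code: `NotificationCenter` is a toy stand-in for Cocoa’s notification mechanism, and the notification name, `GapCell`, `fill_gap` and the `gap_id` payload are all hypothetical names chosen for illustration. The point it shows is that the cell only announces the tap, while the ViewController decides what the Engine does.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical notification name, not taken from the project.
GAP_ACTIVATED = "GapCellActivated"


class NotificationCenter:
    """Toy in-process publish/subscribe hub (stand-in for Cocoa notifications)."""

    def __init__(self) -> None:
        self._observers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def add_observer(self, name: str, callback: Callable[[dict], None]) -> None:
        self._observers[name].append(callback)

    def post(self, name: str, user_info: dict) -> None:
        # Copy the list so observers may register others while we iterate.
        for callback in list(self._observers[name]):
            callback(user_info)


class Engine:
    """Records which Gaps it was asked to fill (the real work is out of scope)."""

    def __init__(self) -> None:
        self.filled: List[int] = []

    def fill_gap(self, gap_id: int) -> None:
        self.filled.append(gap_id)


class GapCell:
    """The cell knows nothing about the Engine or the ViewController."""

    def __init__(self, center: NotificationCenter, gap_id: int) -> None:
        self._center = center
        self._gap_id = gap_id

    def tap(self) -> None:
        # A real cell would animate itself here, then announce the tap.
        self._center.post(GAP_ACTIVATED, {"gap_id": self._gap_id})


class ViewController:
    """The only object that tells the Engine what to do."""

    def __init__(self, center: NotificationCenter, engine: Engine) -> None:
        self._engine = engine
        center.add_observer(GAP_ACTIVATED, self.on_gap_activated)

    def on_gap_activated(self, user_info: dict) -> None:
        self._engine.fill_gap(user_info["gap_id"])
```

With this wiring, the cell and the ViewController never hold a reference to each other, which is what makes the approach lighter than threading a delegate through the Engine and its Suppliers.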
After going through all of that on paper, I spent Friday making that work, and since we were going to play a lot with time-based algorithms, I went ahead and made custom UITableViewCells for the ProgrammableSupply, showing an avatar, the user’s name, the message and the Ticket’s timestamp. Once that was working, I started to refactor the code I had to Bottom-Fill a Gap so it could be re-used depending on an index. I re-ran all tests to make sure nothing was broken thus far.
Saturday I started by testing. I made the code I had thus far “extend” the Gap we make at the end of the Track: that went without a hitch. Then I made a Standard Gap appear at the top of the Track, and filled it in normally, without triggering any new code, producing yet another Gap at the end of the new Pool. My idea was to test whether the Engine would be able to detect that the Tickets provided by the Suppliers did not cover the Gap we were filling in and to ensure a new Gap, covering less time, would be added at the right spot. This required some changes, since until then I’d been adding Gaps directly to the Track, when in truth the correct way to do it is in the Pool. Why? Because the Tickets in the Pool are the ones guaranteed not to change, whereas the ones in the Track are at the mercy of whether they collide or not in time with the Pool (the “Filling” algorithms we’ll talk about now). So I made that change, had all of the tests run, and all of it was working by Saturday evening. After coming back home from a very long walk and a nice dinner with a friend, I did not go to sleep: I came and I sat in front of the screen. If Gap Filling can be split into two distinct actions, which is taking care of the Tickets above the Pool (Top Filling) and those below it (Bottom Filling), the only thing I had left to do was the former one: so my idea was to take the test in which I’d extended a normal Gap, and fill it with a bunch of recycled tickets to exercise the Top Filling code path. After two hours, I thought I got the code right, ran the test and saw it fail with strange numbers popping up in the Console Output. I told myself it was past 2am and I shouldn’t expect to be in the best shape to be writing code, and decided it was probably something easy I’d be able to fix on Sunday. (Spoiler alert! That was completely true, but it’s not what happened).
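A rough sketch of the “did the Suppliers cover this Gap?” check, in Python for brevity. The post does not say how the Engine actually detects this, so the rule used here is an assumption: if a Supplier returns a full batch, older Tickets inside the Gap may still be missing, so a smaller Gap is left at the bottom of the new Pool. `Gap`, `fill_gap` and `batch_limit` are hypothetical names, and Tickets are reduced to bare timestamps.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Gap:
    newer: float  # timestamp of the Ticket just above the Gap
    older: float  # timestamp of the Ticket just below the Gap


def fill_gap(
    gap: Gap, fetched: List[float], batch_limit: int
) -> Tuple[List[float], Optional[Gap]]:
    """Return the new Pool (newest first) and the Gap left over, if any."""
    # Keep only Tickets that really fall inside the Gap, newest first.
    pool = sorted((t for t in fetched if gap.older < t < gap.newer), reverse=True)

    # Assumption: a full batch means the Supplier may be holding back
    # older Tickets, so the stretch below the Pool is still unknown.
    if pool and len(fetched) >= batch_limit:
        return pool, Gap(newer=pool[-1], older=gap.older)

    # A short batch means the Supplier ran dry: the Gap is fully covered.
    return pool, None
```

The leftover Gap covers less time than the original one and belongs to the Pool, not the Track, which matches the change described above.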
Throughout Friday and Saturday I was taking the whole thing slowly: I knew it was only a matter of fixing some small chunks of code and making those tests to get it all done. I just needed to think clearly, go slowly, and get things done. So on Sunday I started looking at that “Top Filling” code, figured I had some mistakes here and there with the indexes, changed them, saw how the tests worked suddenly after shutting everything off except what I was trying to fix, then turned the other tests on, and saw how they failed yet again. More changes, this time reverting back to what I’d done before, and then everything worked, except Top Filling. So I turned all the other tests off except the first two: the first one produced a normal Gap and “extended” it, and the second (Top) filled it. The second one still failed, so I figured my problem was somewhere here. Again, after dinner I decided to give it another shot, and began looking closely at the Simulator, and that’s when things started to go really wrong: it seemed like the algorithm wasn’t working well on the second test not because it wasn’t right, but because the pointer was getting assigned to the wrong Ticket even though its index was right. I went to bed on an absolute low point with a couple of horrible words stinging my mind: memory corruption.
Somewhere, somehow, my first test was doing something it shouldn’t have been doing, because the second one did not correctly dereference an index from the Track: it used a wrong Ticket, and thus the algorithm wasn’t working. There were no memory access errors of any kind and no exceptions, and like all memory problems, if I started turning other tests on and off the results got completely wild. Suddenly my threads stopped in a Block because they’d lost ALL of their pointers, which led me to believe that maybe, because I was passing in blocks as arguments many times and blocks didn’t live on the heap but on the stack, somehow they were being deallocated before time. Yes, it started to get very, very “Wild Wild West” out here. I mean, I cannot explain it to you in words, but I really started to throw a lot of hours at this: like 6-7 or perhaps even 8 in front of the screen, plus more when I was not in front of the screen trying to tackle the problem. I began to realise Xcode was showing a lot of 0x00000000 or nil pointers in the Track after “extending” the Gap, which confirmed to me that something was going awfully wrong, because all of those were strong pointers to my Tickets and I believed inserting new Tickets was working all right. No wonder my algorithm was failing: it must be accessing a corrupted section of memory or something. On Tuesday I ran Instruments and activated every single memory detection checkbox there is in Xcode, and nothing turned up. Nothing. To run Instruments well, I even made the same thing happen in the Simulator to get a reading on memory leaks, but there were no leaks except a few kilobytes which I thought would most likely be auto-released NSStrings: leaked Tickets would have taken up more memory than that.
Adding insult to injury, Xcode started to crash, too, just where you would guess: executing the Top Filling algorithm, at the precise line where I can check through the instance variable’s memory addresses whether the Ticket was assigned correctly or not.
Despair seemed to be within my grasp.
Wednesday I tried to focus on thinking differently: revising documentation about ARC, memory management, about GCD blocks being released prematurely, about array assignments being set incorrectly: the latter three did not have any results on Google. It briefly crossed my mind that, since Top Filling was only triggered when the Suppliers were recycling Tickets (adding Tickets from the previous Pool into the new one) and disabling recycling made everything work, perhaps the error was lying somewhere in there. I gave it a look, and went a bit crazy doing hard copies of the Tickets: I thought maybe they were being de-alloc'ed because ARC wasn’t holding on to them after the Suppliers stopped keeping them for recycling due to some ridiculous pointer referencing cycle (hence so many pointers being zero'ed out). That didn’t explain why the whole system worked in all of my previous tests, but who knew? That didn’t work, so I looked at when and how the Track inside the Standard Engine was getting filled with zeros: it was after the Pool was being inserted in place. Previously, I’d always been adding the Pool right at the end, but this was no longer possible since the Engine supported filling in Gaps, so it looked like this code was somehow responsible for at least some of the errors, even though I made it following Apple’s own documentation. I changed it a couple of times, and in the last one I even built a whole new NSMutableArray composed of the Pool and the previous Track to form the new one, and voilà: Xcode showed the right results!
Wait. Wait. Was Xcode playing with me all of this time?
It was. I realised that when, after re-running the algorithm, it returned the exact same results as before: the algorithm had always been working and there’d never been any memory corruption or leaks. My blocks were being dropped inside GCD because by mixing code I was telling the Engine to fill in at a non-Gap position: something it was not designed to do. (I admit to not following why exactly that was happening, but it wasn’t what the Engine had to do: that’s why we came up with Gaps to begin with.) And the algorithm was not working because… guess why? The recycling code had a bug! I added new stuff to the Programmable Supply so it could either recycle the newest or the oldest Tickets from the previous Pool, and since from one Pool to the other you don’t know which Tickets are going to be recycled and which aren’t, the Tickets in the recycling Pool inside the Suppliers were out of order! That’s why the Test failed depending on what was done before: because the Suppliers weren’t recycling well. I added the same ordering code the Engine uses (3 lines), and… care to believe it? It worked!!!!!!
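For anyone who hasn’t been bitten by this class of bug, here is a small Python illustration of how an order-dependent step misbehaves silently when one input is unsorted. It is not the Engine’s algorithm: Tickets are reduced to timestamps, and `build_pool` and `build_pool_fixed` are hypothetical names. `heapq.merge` assumes each input is already sorted, and it raises no error when that assumption is broken.

```python
import heapq
from typing import List


def build_pool(fresh: List[int], recycled: List[int]) -> List[int]:
    """Merge two lists that are ASSUMED to be newest-first already."""
    return list(heapq.merge(fresh, recycled, reverse=True))


def build_pool_fixed(fresh: List[int], recycled: List[int]) -> List[int]:
    """Same merge, but the recycled Tickets are put in order first."""
    recycled_in_order = sorted(recycled, reverse=True)
    return list(heapq.merge(fresh, recycled_in_order, reverse=True))


fresh = [50, 40]         # newest-first, as expected
recycled = [10, 30, 20]  # out of order, as in the recycling bug

broken = build_pool(fresh, recycled)       # no exception, just a wrong Pool
fixed = build_pool_fixed(fresh, recycled)  # newest-first again
```

Nothing crashes in the broken version; the Pool simply comes out in the wrong order, and everything downstream that indexes into it picks the wrong Ticket. That is why it looked like a memory problem rather than a logic one.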
What a great way to end Wednesday night. It was all solved: Top Filling worked, Bottom Filling worked. Now all that remained was putting up some tests for filling in a Standard Gap and we were done!! The single most-complicated feature of my Standard Engine had been solved!!!
That’s where true happiness ends, though. I finished everything by Friday night. I took it all slowly, made sure it all worked, even made a couple of extra tests to handle a few difficult situations, but on Wednesday night, after walking my dog out just before heading to bed, I realised my Zealot Engine’s concept and API was heavily, heavily flawed.
This is the same thing that did not allow me to solve Gap Detection quickly: thinking in only one or two dimensions. Consider this: the Engine is asking its Suppliers for Tickets. They deliver, and the Engine forms gaps if the timings don’t match. However, I’ve been relying on how all of this works in my perfect little world, not in the real world of different services and many, many users. The thing is: if I ask Facebook and Twitter to tell me what my friends might’ve posted recently, Facebook might have 180 items from the last 20 hours, whereas Twitter might have 180 items from the last 20 minutes. This is not right, because the Engine will form a timeframe lasting twenty hours, when in reality many Tweets are clearly missing.
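To put numbers on that example, here is a Python sketch of one possible way to reason about it. It is not the redesigned API, which doesn’t exist yet. It assumes every Supplier returned a full batch, so nothing is known about a Supplier beyond its oldest item. Under that assumption, the only timeframe the Pool can vouch for ends at the youngest of those “oldest items”. `safe_window` and `split_pool` are hypothetical names, and items are represented by their age in minutes.

```python
from typing import Dict, List, Tuple


def safe_window(ages_by_supplier: Dict[str, List[float]]) -> float:
    """Age (in minutes) down to which EVERY Supplier has reported.

    Raises ValueError if no Supplier returned any item.
    """
    return min(max(ages) for ages in ages_by_supplier.values() if ages)


def split_pool(
    ages_by_supplier: Dict[str, List[float]]
) -> Tuple[List[float], List[float]]:
    """Split all items into those inside the safe window and those beyond it."""
    limit = safe_window(ages_by_supplier)
    merged = sorted(age for ages in ages_by_supplier.values() for age in ages)
    inside = [age for age in merged if age <= limit]
    beyond = [age for age in merged if age > limit]
    return inside, beyond


# 180 items each, spread evenly: Facebook over 20 hours, Twitter over 20 minutes.
batches = {
    "facebook": [i * 1200 / 179 for i in range(180)],
    "twitter": [i * 20 / 179 for i in range(180)],
}

window = safe_window(batches)        # 20 minutes, not 20 hours
inside, beyond = split_pool(batches)
```

Only the Facebook items from the last 20 minutes sit safely next to the Tweets; everything older lies in a stretch where Tweets are almost certainly missing, which is exactly where a Gap would need to go.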
This is a very big flaw: I could’ve decided the Engine is not responsible for this and that the Suppliers should control the timeframes every Pool represents, but that’s just nuts and stupid: it means writing a lot of code that won’t work, when simply placing that code in the Engine would make the system work perfectly. This is an issue that leads to all sorts of problems, and the truth is, if you are building a Social Media app or an engine that only works with one source, you don’t have to deal with this, because the timings are enforced by the content you receive, not by your business logic. I messed this up the day I began this project: I messed this up the very same day on Week 2 when I drew all the interactions and didn’t stop to think about this. I’ve been putting one wrong foot ahead of another ever since Day 1 of Week 2.
It’s a big blow, because it’s time I’ve lost in this due to my own fault. And after Gap Technology, the remaining features are actually not that hard: State-Saving and protecting/preparing the Engine for Internet connectivity and unresponsive Suppliers. Had I taken my time, I wouldn’t have to re-work the whole API and re-write each and every one of my tests so far. My goal of finishing the Standard Engine in this month of May now seems farther away, since I honestly don’t know how long I’m going to take re-designing the API to work on timeframes. You could come to me and say: “Duuuuude, just refactor the hell out of it and be done at the end of a day!” Yes, that’d be the spirit. But I messed up once and don’t want to mess up again. I’m really going to take it slow now. I need a couple of days doing nothing, and by “nothing” I mean no code and no fountain pen for schematics. When I feel OK I’ll start looking into a couple of apps, see if there’s something else I might be missing, come back, get to the drawing board, fix the Zealot Engine, and then proceed refurbishing this ship.
So that’s what’s ahead. I don’t know how much work you should expect from me this week, because I don’t have expectations from myself. I need a little bit of perspective, just a little bit, because this is just too close to my chest right now. I’ll get back to work as soon as I can; I’m not abandoning this, I promise. But I am sorry for letting myself and you down if you’ve been following my progress thus far.
Enjoy your weekend. Because I’ll be back.