
trees are harlequins, words are harlequins

@nostalgebraist / nostalgebraist.tumblr.com

click here to read my fiction, it's better than my tumblr

Hi! I'm nostalgebraist, a guy from the internet.

I've written several works of original fiction, including The Northern Caves, Almost Nowhere, and The Apocalypse of Herschel Schoen.

All my fiction is freely available online, and you can read more about it here.

I also made @nostalgebraist-autoresponder, an early LLM-based chatbot. This project was a reflection of my longstanding interest in AI and machine learning, and I post about these topics on this tumblr from time to time.

I have a longstanding gimmick of posting odd or amusing out-of-context quotes.

In one block, the model mentions "chestnut facts" and "great inner mindset perceptions", which have nothing to do with the task at hand, and in a later block, suggests specialized coding environments with “Santa Clara transparent registers” and “ultrabrook ideals.”

i find it kinda funny that whenever i read your works i imagine some ambient puzzle-box tv show type stuff playing but the actual music you listen to while writing them is dorky geeky stuff


[for context, since I sat on this ask for a few days: IIUC it was a reaction to me posting this]

Yeah, I find it kind of funny myself – see here where I describe my Almost Nowhere writing playlist as "cringe."

(Although, that said, the first track on that AN playlist was "174 Hours" from the Legion soundtrack, which may be an instance of "ambient puzzle-box TV show type stuff"? Depends on what exactly you mean by that phrase.)

I think the reason for this stylistic disconnect may be a difference in what the writer experiences vs. what the reader does?

Like, I realize in the abstract that I tend to write stuff that requires close concentration to follow, and that gives the reader a lot of mysteries to ponder and dots to connect, and so forth. But relative to a typical reader, I don't find these elements dominating the overall "feel" of the stories in my head, because I know a lot more about what's going on.

And on the other hand, the big dramatic peaks of the stories loom large in my mind even when I'm writing scenes that precede them (and even if the scene I'm writing is very low-key or unemotional in nature), whereas obviously a first-time reader has no way of knowing what it is that I'm building up to, even if they can catch that something is being foreshadowed.

So on net, the mood I associate with writing these stories (relative to the mood typically associated with reading them) is shifted away from mystery and abstract intellectual challenge, and toward "big, cool, explosive events" and stuff like that – which is why certain kinds of video game soundtracks feel so apropos to me.

And on top of all that, there are additional factors related to the experience of writing per se, and how it differs from reading.

For one thing, writing is simply much more difficult than reading, which biases things in the direction of high-energy, exciting "motivation music" of the sort that might be helpful for pushing yourself harder at the gym or something.

And then also, writing something that you want to write, and which you have imagined and planned to some extent beforehand, is inherently a "big, cool, explosive event" from the writer's perspective – like, whoa, I'm finally getting to actually write this scene! It's been locked up in my head for months – but no longer! It's happening!!! To some extent every scene I write is a "dramatic peak" in this way, for me personally, even if it's not a dramatic peak of the story itself.

These 2 new posts from Anthropic are some of the most exciting LLM interpretability research I've seen in a long time.

There's been a ton of work on SAEs (sparse autoencoders), but in the past I've often felt like SAE work – however technically impressive it might be – wasn't really telling me much about the actual computation happening inside the model, just that various properties of the input text were getting computed within the model somehow, which is not in itself surprising.

(Sort of like being told, in isolation, that "when you see a dog, these parts of your brain activate" – like, so what? I could have already told you that there's stuff in my brain somewhere that correlates with "seeing a dog," given that I can in fact see dogs. Unless I know more about how this neural activity relates to anything else, the claim feels trivial.)
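(If you haven't seen the mechanics before, here's a minimal, illustrative sketch of the kind of object an SAE is – a toy PyTorch version, with a random tensor standing in for the residual-stream activations you'd actually collect from a model, and with all the dimensions and coefficients made up. It's just to make "dictionary of sparse features" concrete; it's not a description of Anthropic's actual training setup.)

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy SAE: reconstruct model activations through an overcomplete,
    sparsely-activating set of features."""
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, acts: torch.Tensor):
        feats = torch.relu(self.encoder(acts))  # sparse feature activations
        recon = self.decoder(feats)             # reconstructed activations
        return feats, recon

def sae_loss(acts, feats, recon, l1_coeff=1e-3):
    # Reconstruction error plus an L1 penalty on the features, which pushes
    # most features to zero on any given token; the hope is that each
    # surviving feature corresponds to one human-describable property.
    return ((recon - acts) ** 2).mean() + l1_coeff * feats.abs().mean()

# In real SAE work, `acts` would be activations collected from some layer of
# an LLM over a large corpus; here it's just random data standing in for them.
acts = torch.randn(4096, 768)                   # (n_tokens, d_model)
sae = SparseAutoencoder(d_model=768, d_features=768 * 8)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

for _ in range(200):
    feats, recon = sae(acts)
    loss = sae_loss(acts, feats, recon)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The interpretability claims then come from looking at which tokens make each learned feature fire – and, in circuit-style work like the posts above, at how the features feed into one another rather than just what they correlate with.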

Reading this stuff is the first time I've really felt like "okay, we're finally using SAEs to understand what the model is doing in a non-trivial way."

Although of course there are numerous caveats (as the authors are the first to admit), both with SAEs in general and with the specific methodological choices here. And it's not the first work that looks for "circuits" between SAE features (Marks et al., "Sparse Feature Circuits" is the most famous one), and I should probably do a closer reading and figure out just why this new Anthropic stuff feels so much more impressive to me at first glance, and whether it's really well-justified... I dunno, I'm kind of doubting myself even as I type this. LLM interpretability is a minefield of methodological dangers, and I've read so many papers like this and this by now that I'm skeptical as a reflex.

But in any case, the linked posts are worth a read if you have any interest in how LLMs compute things.

Listened to this a lot while writing The Apocalypse of Herschel Schoen

(As well as other arrangements of these tracks from the Nier series, including the originals, but I like this performance best)

The other major musical influence on TAoHS was the soundtrack to the Heaven's Feel anime.

It was the background music for a lot of my initial work sketching out the plot and themes, back in April 2024. Especially these three tracks:

Anthropic's stated "AI timelines" seem wildly aggressive to me.

As far as I can tell, they are now saying that by 2028 – and possibly even by 2027, or late 2026 – something they call "powerful AI" will exist.

And by "powerful AI," they mean... this (source, emphasis mine):

In terms of pure intelligence, it is smarter than a Nobel Prize winner across most relevant fields – biology, programming, math, engineering, writing, etc. This means it can prove unsolved mathematical theorems, write extremely good novels, write difficult codebases from scratch, etc. In addition to just being a “smart thing you talk to”, it has all the “interfaces” available to a human working virtually, including text, audio, video, mouse and keyboard control, and internet access. It can engage in any actions, communications, or remote operations enabled by this interface, including taking actions on the internet, taking or giving directions to humans, ordering materials, directing experiments, watching videos, making videos, and so on. It does all of these tasks with, again, a skill exceeding that of the most capable humans in the world. It does not just passively answer questions; instead, it can be given tasks that take hours, days, or weeks to complete, and then goes off and does those tasks autonomously, in the way a smart employee would, asking for clarification as necessary. It does not have a physical embodiment (other than living on a computer screen), but it can control existing physical tools, robots, or laboratory equipment through a computer; in theory it could even design robots or equipment for itself to use. The resources used to train the model can be repurposed to run millions of instances of it (this matches projected cluster sizes by ~2027), and the model can absorb information and generate actions at roughly 10x-100x human speed. It may however be limited by the response time of the physical world or of software it interacts with. Each of these million copies can act independently on unrelated tasks, or if needed can all work together in the same way humans would collaborate, perhaps with different subpopulations fine-tuned to be especially good at particular tasks.

In the post I'm quoting, Amodei is coy about the timeline for this stuff, saying only that

I think it could come as early as 2026, though there are also ways it could take much longer. But for the purposes of this essay, I’d like to put these issues aside [...]

However, other official communications from Anthropic have been more specific. Most notable is their recent OSTP submission, which states (emphasis in original):

Based on current research trajectories, we anticipate that powerful AI systems could emerge as soon as late 2026 or 2027 [...] Powerful AI technology will be built during this Administration. [i.e. the current Trump administration -nost]

See also here, where Jack Clark says (my emphasis):

People underrate how significant and fast-moving AI progress is. We have this notion that in late 2026, or early 2027, powerful AI systems will be built that will have intellectual capabilities that match or exceed Nobel Prize winners. They’ll have the ability to navigate all of the interfaces… [Clark goes on, mentioning some of the other tenets of "powerful AI" as in other Anthropic communications -nost]

----

To be clear, extremely short timelines like these are not unique to Anthropic.

Miles Brundage (ex-OpenAI) says something similar, albeit less specific, in this post. And Daniel Kokotajlo (also ex-OpenAI) has held views like this for a long time now.

Even Sam Altman himself has said similar things (though in much, much vaguer terms, both on the content of the deliverable and the timeline).

Still, Anthropic's statements are unique in being

  • official positions of the company
  • extremely specific and ambitious about the details
  • extremely aggressive about the timing, even by the standards of "short timelines" AI prognosticators in the same social cluster

Re: ambition, note that the definition of "powerful AI" seems almost the opposite of what you'd come up with if you were trying to make a confident forecast of something.

Often people will talk about "AI capable of transforming the world economy" or something more like that, leaving room for the AI in question to do that in one of several ways, or to do so while still failing at some important things.

But instead, Anthropic's definition is a big conjunctive list of "it'll be able to do this and that and this other thing and...", and each individual capability is defined in the most aggressive possible way, too! Not just "good enough at science to be extremely useful for scientists," but "smarter than a Nobel Prize winner," across "most relevant fields" (whatever that means). And not just good at science but also able to "write extremely good novels" (note that we have a long way to go on that front, and I get the feeling that people at AI labs don't appreciate the extent of the gap [cf]). Not only can it use a computer interface, it can use every computer interface; not only can it use them competently, but it can do so better than the best humans in the world. And all of that is in the first two paragraphs – there's four more paragraphs I haven't even touched in this little summary!

Re: timing, they have even shorter timelines than Kokotajlo these days, which is remarkable since he's historically been considered "the guy with the really short timelines." (See here where Kokotajlo states a median prediction of 2028 for "AGI," by which he means something less impressive than "powerful AI"; he expects something close to the "powerful AI" vision ["ASI"] ~1 year or so after "AGI" arrives.)

----

I, uh, really do not think this is going to happen in "late 2026 or 2027."

Or even by the end of this presidential administration, for that matter.

I can imagine it happening within my lifetime – which is wild and scary and marvelous. But in 1.5 years?!

The confusing thing is, I am very familiar with the kinds of arguments that "short timelines" people make, and I still find Anthropic's timelines hard to fathom.

Above, I mentioned that Anthropic has shorter timelines than Daniel Kokotajlo, who "merely" expects the same sort of thing in 2029 or so. This probably seems like hairsplitting – from the perspective of your average person not in these circles, both of these predictions look basically identical, "absurdly good godlike sci-fi AI coming absurdly soon." What difference does an extra year or two make, right?

But it's salient to me, because I've been reading Kokotajlo for years now, and I feel like I basically understand his case. And people, including me, tend to push back on him in the "no, that's too soon" direction. I've read many, many blog posts and discussions over the years about this sort of thing; I feel like I should have a handle on what the short-timelines case is.

But even if you accept all the arguments evinced over the years by Daniel "Short Timelines" Kokotajlo, even if you grant all the premises he assumes and some people don't – that still doesn't get you all the way to the Anthropic timeline!

To give a very brief, very inadequate summary, the standard "short timelines argument" right now is like:

  1. Over the next few years we will see a "growth spurt" in the amount of computing power ("compute") used for the largest LLM training runs. This factor of production has been largely stagnant since GPT-4 in 2023, for various reasons, but new clusters are getting built and the metaphorical car will get moving again soon. (See here)
  2. By convention, each "GPT number" uses ~100x as much training compute as the last one. GPT-3 used ~100x as much as GPT-2, and GPT-4 used ~100x as much as GPT-3 (i.e. ~10,000x as much as GPT-2).
  3. We are just now starting to see "~10x GPT-4 compute" models (like Grok 3 and GPT-4.5). In the next few years we will get to "~100x GPT-4 compute" models, and by 2030 we will reach ~10,000x GPT-4 compute. (The arithmetic is spelled out in the sketch after this list.)
  4. If you think intuitively about "how much GPT-4 improved upon GPT-3 (100x less) or GPT-2 (10,000x less)," you can maybe convince yourself that these near-future models will be super-smart in ways that are difficult to precisely state/imagine from our vantage point. (GPT-4 was way smarter than GPT-2; it's hard to know what "projecting that forward" would mean, concretely, but it sure does sound like something pretty special)
  5. Meanwhile, all kinds of (arguably) complementary research is going on, like allowing models to "think" for longer amounts of time, giving them GUI interfaces, etc.
  6. All that being said, there's still a big intuitive gap between "ChatGPT, but it's much smarter under the hood" and anything like "powerful AI." But...
  7. ...the LLMs are getting good enough that they can write pretty good code, and they're getting better over time. And depending on how you interpret the evidence, you may be able to convince yourself that they're also swiftly getting better at other tasks involved in AI development, like "research engineering." So maybe you don't need to get all the way there yourself, you just need to build an AI that's a good enough AI developer that it improves your AIs faster than you can, and then those AIs are even better developers, etc. etc. (People in this social cluster are really keen on the importance of exponential growth, which is generally a good trait to have but IMO it shades into "we need to kick off exponential growth and it'll somehow do the rest because it's all-powerful" in this case.)
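(Re: the multipliers in points 2-3, here is the back-of-the-envelope arithmetic spelled out – nothing in it except the "~100x per GPT number" convention itself, normalized so that GPT-2 is one unit; these are illustrative ratios, not real FLOP counts.)

```python
# Relative training compute under the "each GPT number is ~100x the last"
# convention, with GPT-2 normalized to 1 unit. Illustrative ratios only.
GPT2 = 1
GPT3 = 100 * GPT2            # ~100x GPT-2
GPT4 = 100 * GPT3            # ~10,000x GPT-2
current_gen = 10 * GPT4      # the "~10x GPT-4" models appearing now
by_2030 = 10_000 * GPT4      # the projected "~10,000x GPT-4" scale

for name, compute in [
    ("GPT-2", GPT2),
    ("GPT-3", GPT3),
    ("GPT-4", GPT4),
    ("~10x GPT-4 (now)", current_gen),
    ("~10,000x GPT-4 (by 2030)", by_2030),
]:
    print(f"{name}: {compute:,}x GPT-2 training compute")
```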

And like, I have various disagreements with this picture.

For one thing, the "10x" models we're getting now don't seem especially impressive – there has been a lot of debate over this of course, but reportedly these models were disappointing to their own developers, who expected scaling to work wonders (using the kind of intuitive reasoning mentioned above) and got less than they hoped for.

And (in light of that) I think it's double-counting to talk about the wonders of scaling and then talk about reasoning, computer GUI use, etc. as complementary accelerating factors – those things are just table stakes at this point, the models are already maxing out the tasks you had defined previously, you've gotta give them something new to do or else they'll just sit there wasting GPUs when a smaller model would have sufficed.

And I think we're already at a point where nuances of UX and "character writing" and so forth are more of a limiting factor than intelligence. It's not a lack of "intelligence" that gives us superficially dazzling but vapid "eyeball kick" prose, or voice assistants that are deeply uncomfortable to actually talk to, or (I claim) "AI agents" that get stuck in loops and confuse themselves, or any of that.

We are still stuck in the "Helpful, Harmless, Honest Assistant" chatbot paradigm – no one has seriously broken with it since Anthropic introduced it in a paper in 2021 – and now that paradigm is showing its limits. ("Reasoning" was strapped onto this paradigm in a simple and fairly awkward way; the new "reasoning" models are still chatbots like this; no one is actually doing anything else.) And instead of "okay, let's invent something better," the plan seems to be "let's just scale up these assistant chatbots and try to get them to self-improve, and they'll figure it out." I won't try to explain why in this post (IYI I kind of tried to here) but I really doubt these helpful/harmless guys can bootstrap their way into winning all the Nobel Prizes.

----

All that stuff I just said – that's where I differ from the usual "short timelines" people, from Kokotajlo and co.

But OK, let's say that for the sake of argument, I'm wrong and they're right. It still seems like a pretty tough squeeze to get to "powerful AI" on time, doesn't it?

In the OSTP submission, Anthropic presents their latest release as evidence of their authority to speak on the topic:

In February 2025, we released Claude 3.7 Sonnet, which is by many performance benchmarks the most powerful and capable commercially-available AI system in the world.

I've used Claude 3.7 Sonnet quite a bit. It is indeed really good, by the standards of these sorts of things!

But it is, of course, very very far from "powerful AI." So like, what is the fine-grained timeline even supposed to look like? When do the many, many milestones get crossed? If they're going to have "powerful AI" in early 2027, where exactly are they in mid-2026? At end-of-year 2025?

If I assume that absolutely everything goes splendidly well with no unexpected obstacles – and remember, we are talking about automating all human intellectual labor and all tasks done by humans on computers, but sure, whatever – then maybe we get the really impressive next-gen models later this year or early next year... and maybe they're suddenly good at all the stuff that has been tough for LLMs thus far (the "10x" models already released show little sign of this but sure, whatever)... and then we finally get into the self-improvement loop in earnest, and then... what?

They figure out how to squeeze even more performance out of the GPUs? They think of really smart experiments to run on the cluster? Where are they going to get all the missing information about how to do every single job on earth, the tacit knowledge, the stuff that's not in any web scrape anywhere but locked up in human minds and inaccessible private data stores? Is an experiment designed by a helpful-chatbot AI going to finally crack the problem of giving chatbots the taste to "write extremely good novels," when that taste is precisely what "helpful-chatbot AIs" lack?

I guess the boring answer is that this is all just hype – tech CEO acts like tech CEO, news at 11. (But I don't feel like that can be the full story here, somehow.)

And the scary answer is that there's some secret Anthropic private info that makes this all more plausible. (But I doubt that too – cf. Brundage's claim that there are no more secrets like that now, the short-timelines cards are all on the table.)

It just does not make sense to me. And (as you can probably tell) I find it very frustrating that these guys are out there talking about how human thought will basically be obsolete in a few years, and pontificating about how to find new sources of meaning in life and stuff, without actually laying out an argument that their vision – which would be the common concern of all of us, if it were indeed on the horizon – is actually likely to occur on the timescale they propose.

It would be less frustrating if I were being asked to simply take it on faith, or explicitly on the basis of corporate secret knowledge. But no, the claim is not that, it's something more like "now, now, I know this must sound far-fetched to the layman, but if you really understand 'scaling laws' and 'exponential growth,' and you appreciate the way that pretraining will be scaled up soon, then it's simply obvious that –"

No! Fuck that! I've read the papers you're talking about, I know all the arguments you're handwaving-in-the-direction-of! It still doesn't add up!

I'm reading Almost Nowhere (I'm in part 2, so apologies if this question is somehow answered later, but it doesn't feel like the kind of question the novel itself is interested in answering - so also apologies if it is by the same token not the kind of question *you* are interested in answering) and I can't help wondering: is/are Anne(s) named after the indefinite article? She's not *the* Anne, she's *an* Anne...


Not consciously, no.

Insofar as I had any reason at all for picking that name, it was just that it struck me as kind of an old-fashioned name in a way that fit with the fairy tale atmosphere of Michael's tower, the pseudo-19thC books on the shelf, etc.

(Although I'm now looking at Wikipedia's list of notable Annes and there are a lot of fairly recent ones, so maybe I was off-base about it being an old-fashioned name. I often end up with that impression of a name if I've never or rarely encountered anyone of my own age who has it, but that could just be chance, demographics, or a mix of the two)

In this analogy, I am the cat, somewhat befuddled and vaguely alarmed by the equations streaming from the radio, and Paul is a Mary figure, clearly explaining the details.
