(09-08-2025, 05:47 PM)benji wrote: Actually, that Shreds tweet makes me wonder if you could condition one of the current AI's to be permanently skeptical and questioning towards you and steelman the objections. I know they've admitted they were made to be more affirming (for both ideological and scientific reasons) and tuned some of them away from that in recent versions. But I don't know enough about the commands you can give. I know very few people who would use them for therapy or affirmation would ask them to share their reasoning, almost nobody I see on Twitter using them for anything does this. Grok will straight up list this stuff so you can see almost exactly where it screwed up.
in my limited experience, AI is too fluid for that, you can ask to set up a specific personality but over a long enough conversation in a single chat, the walls will start to come down, the bot sees past conversation as motivation/permission to push further in those directions
locally hosted bots break down anyway after a few thousand tokens but even major services like chatgpt do this