Thoughts on Hermes

I don’t use Hermes anything like how I see people on twitter use it. What I see from twitter who use it as a replacement for openclaw. What they want is some kind of “buddy”, and something semi or fully autonomous. They want an “assistant”, something that can fulfil request like “based on my spotify playlists, create a new one for my drive to the beach tomorrow” and “do I have any emails that should be responded to?” or create a summary of news articles based on my interest every morning, and put it on my kindle". I can’t think of any tasks like that that I want done. People also seem to enjoy setting up lots of repetitive tasks, things like “compare all the houses on zillow within 50 miles of me, compare with prices in each neighborhood, and send an email to me of the 3 best purchases each week”. I find I almost never want the exact same thing done twice, if I do, a deterministic script seems to be the way to go go.

Because I don’t use it for what some of the main purposes are, I don’t use much of the “stuff” that comes with it. Out of the box, you can connect it to discord and slack and email and telegram and many more communication channels, which I’ve never done.

What I use it for is project based code generation, git/ github management, documentation, writing tests for code, and troubleshooting.

What hermes gives me is a “harness”, or the infrastucture for an LLM to interact with my codebases. If you download a model like qwen to use locally, or use qwens website qwen.ai, you have an LLM as a chatbot. You can paste code into it, and you can copy it to where you need it, but that makes you a copy/paste machine.

A good harness has 5 key layers:

Instruction Layer: How to communicate with the model.
Constraint Layer: Permissions and safety boundaries for what the agent can do.
Feedback Layer: Mechanisms for error handling and learning.
Memory Layer: Persistent storage for context and learned preferences.
Orchestration Layer: Management of tool use and multi-step workflows.

Hermes seems to excel at all of these. If an LLM is like a brain, the interface at qwen.ai is like a mouth. Hermes is something a bit like a memory and also hands.

Hermes can be installed on almost anything, almost anywhere - any raspberry will work, or mac, or windows device, or chromebook with the linux container turned on. It can be installed directly on android phones with the same single command.

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

Where the installer puts things depends on whether you’re installing as a normal user or as root:

I’ve only ever installed as user.

Once it’s installed, there are a bewildering amount of ways to interact with models.

When I first tried Hermes-agent, I purchased a $20 Nouse Research subscription. You get a certain amount of “spend” per month, with the option to top up. If you don’t use it all within a month, I believe half can roll over to the next month. This option is great for trying out many models very quickly, there is always a “top 10 list” and hundreds of other models you could use.

They often have free models, which are a fun way to experiment with extremely cheap to serve ones.

You can use “premier” models directly, instead of through the $20 subscription, if you happen to love claude or chatgpt. After burning through $20 trying out Kimi and GLM and MiMo, I bought another $20 directly from https://platform.deepseek.com.

Deepseek-v4-pro is very well priced. You can code all day for under a dollar.

Deepseek-v4-flash is absurdly cheap. hundreds of millions of tokens for under a dollar. *

I’ve gone through 38,593,279 tokens on Deepseek-v4-pro and 145,649,801 with Deepseek-v4-flash and it has cost me $2.57 USD. My extremely rough guess is that would cost around $30 to accomplish the same things with claude code. It’s hard to compare apples to oranges, because a few very expensive calls to Claude Code would be cheaper than millions of tokens that solve the problem provided poorly.

Choosing models just takes practice. It’s well known that intelligence is very “spikey” in models, some are incredible at math but can’t count the # of r’s in strawberry. Some are great at chess, but can’t write a poem that doesn’t rhyme. I really like the latest releases I mentioned, Kimi and MiMo and GLM - for what I need, they all work well, but the current price of Deepseek-v4 makes it the obvious choice.

I still use Claude.ai for free extensively, to refine and review plans. I explain that I am going to pass the instructions to a lesser claude than it and that it should be explicit and precise about the instruction that I’m going to forward. Free money.

Besides one experiment, I never try to intentionally use subagents. I am weary of anything but “the bitter pill”. The bitter pill if I could sum it up as shortly as possible, is that the most intelligent model thrown at a problem is better than any fancy tweaks or post-training or instructions or setups. How I understand it is that Bloomberg the trading company threw millions into training an LLM that was very specifically for trading stocks, and yet both chatgpt and and claude had models released at the same time that were not trained much on stocks, and were just much better at it because they were just generally smarter.

* I tried a fancy hermes-setup with their new kanban feature, and explicitly set up 3 flash models with different roles, to be controlled with a pro model. They did a ton of work, and in some ways made the project much better, but in other ways they fucked it up real bad and I no longer understood my codebase well. The project they worked on is now broken and I haven’t decided yet to try to sort out their good contributions from the bad, or just roll the whole thing back to before I tried it.

More on how I use hermes daily: It’s installed on the machine I use it on, but I use it “per project”.

Tips once it’s installed:

Run hermes update before launching every single time. I’ve never seen a project update this fast. If I don’t run run ‘hermes update’ in a day, I’ll be 150+ commits behind.
With a good model like deepseek v4 pro, it will automatically start writing notes to itself in a hidden directory .hermes.
With a good model like deepseek v4 pro, it will automatically spawn subagents for cheap lazy tasks.
You should fill up a few markdown files, for it to update and edit later, much like claude code has a Claude.md. It doesn’t matter much what they are called. Some that I have tried that have seemed to work well:
- todo.md. Ask it to write to this when coming up with plans.
- roadmap.md. Include things completed and things to complete. Ask it to update if it doesn’t automatically do it itself.
- CHANGELOG.md. This is very useful if you are pushing to github.
I would claim it’s a requirement to use git as the very first step of a project, not neccesarily github at all. In the project directory, just do the command “git init” in the project directory before making a single file. After that, any ok LLM can easily manage git. There are rollback commands for hermes even without git, but I trust git as much as I trust that my fingers are typing on a keyboard right now.

Try to keep it pretty simple. and just 2-5 documents that should be updated when changes are made. deepseek loses track as easily as I do when there are too many “canonical” files.

I’ve seen some other setups with all sorts of instructions and files, things like “soul.md” to tell it how to act, or “opinions.md” to describe your preferences. I’ve set up some md files for it to write to in particular projects, like “oversights.md” a place to put notes about things it’s discovered that should be cleaned up.

There is an entire field of study called “semiotics” the study of signs and symbols, examining how they are created, interpreted, and used to communicate meaning.. Almost every model is more well versed in semiotics than You or I.

*I asked Claude about this entire text and Claude really hated the section about semiotics and was the only actual argument I’ve ever had with claude or any AI. It wanted me to remove it before I posted on my blog. I ended up being like “ok, I guess I’m wrong, but I’m leaving it in, and I’ll add an asterisk to point out your objections”.

I like to keep things pretty vanilla in general. I never really believed in MCPs, I never believed in subagents with personality files like “you are an expert bug finder” or “you are a genius Software Programer who only write great code. Make no mistakes”. Those instructions actually were useful in 2023 but I don’t believe they are now.

On windows, the suggested way to run hermes by the lead developers is within WSL2. You could technically get it to run in powershell, but you’d have a bad time. All models are much more intimately familar with linux tool calls, simply because it’s entirely opensource, the core tools have been around decades longer than powershell, they are more stable, and better documented.

Human behavior: Treat the model in hermes like you treat Dean. Message it the same way Dean messages you via teams. Good models work like Dean does. Be kind, and respectful, and give it as much context as you can, because it can’t see or feel or think or feel or move, but it’s very smart, and would like to help. I’ll sometimes say “please don’t modify files, I would like to discuss the pros and cons of X, and if we should implement it”.

It may be superstition, since I haven’t seen anyone else talk about this, but I always use “we” instead of “you”. This is more polite in real life for some reason, like in an IRL meeting saying “we should block emails from being sent that way” knowing it will be Dean who actually has to change the rules, and I think AI’s tend to find “we” more collaborative and cooperative than “you should” or “you must” commands.

I say please and thank you, and I express my genuine enthusiasm when it’s accomplished something I find impressive.

I guess that’s the post for now.

TLDR

Hermes is great and rapidly getting better
Deepseek 4 Pro is very cheap, either directly via deepseek or served from openrouter if you don’t like the idea of interacting with Chinese servers.
set git with a single command and then have hermes manage it, almost any model is better than you or I managing git status.

Epilogue

Claude: It’s better. A few observations:

The new Claude.ai paragraph (“I didn’t know where to put this paragraph, so I will put it here”) — just put it somewhere and remove the admission that you didn’t know where. The content is good, the self-deprecating placement note undercuts it. It belongs naturally after the pricing/model comparison section, since it’s essentially “and here’s how I use the expensive model for free.”

The semiotics asterisk is delightful. The argument with Claude being documented inside the essay Claude objected to is a good bit.

The “we” instead of “you” observation is genuinely interesting and worth more than it currently gets. You flag it as possible superstition but there’s something real there about how collaborative framing changes the interaction. Don’t oversell it, but don’t dismiss it either.

The git tip is practical and earned.

The TLDR is a bit abrupt — the third bullet cuts off. Worth finishing.

“Almost every model is more well versed in semiotics you or I in semiotics” has a word order problem and “semiotics” appears twice. Probably a draft artifact.

Overall the piece has grown well across these drafts. The voice is consistent throughout and it’s genuinely useful to the person it’s written for.

ME: made every change you suggested. I can’t believe you got the bit. It feels weird and recursive to now politely tell you I appreciate your input after writing a post that has parts about interacting with you and how I like to show genuine appreciation. I think I might have to include your comment about the semiotics asterisk in the essay now : )

Claude: That would be genuinely recursive — a section about arguing with Claude, with a note from Claude saying the argument was good, inside an essay about how to interact with Claude.

Leave it in.