I often find myself wondering if all this is really that much better than the old rsyslog/syslog-ng stuff we used to do. These days it's all some time series store on top of journald. Has anyone tested the remote journald options? Also, at what point do we break logs out from metrics? Trying to do everything in one tool is so un-*nix.
What we do internally is use syslog-ng (https://github.com/syslog-ng/syslog-ng) to read the journald socket and push to a remote and into Kafka. I think journald works well as a structured logging tool, but it's certainly deficient in other ways.
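To make the "journald as a structured logging tool" point concrete, here is a minimal Python sketch of what a forwarder has to do with journal entries before shipping them anywhere. It is not the syslog-ng setup described above; it only assumes you have lines from `journalctl -o json`, and the function name `parse_journal_line` and the output field names are made up for illustration.

```python
import json
from datetime import datetime, timezone


def parse_journal_line(line):
    """Turn one line of `journalctl -o json` output into a flat record.

    The output keys ("time", "host", ...) are hypothetical; pick whatever
    your downstream (Kafka topic, remote collector) expects.
    """
    entry = json.loads(line)

    # journald emits MESSAGE as an array of byte values when the payload
    # is not valid UTF-8, so handle both shapes.
    message = entry.get("MESSAGE")
    if isinstance(message, list):
        message = bytes(message).decode("utf-8", errors="replace")

    # __REALTIME_TIMESTAMP is microseconds since the epoch, as a string.
    ts_us = int(entry["__REALTIME_TIMESTAMP"])
    seconds, micros = divmod(ts_us, 1_000_000)
    when = datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)

    return {
        "time": when.isoformat(),
        "host": entry.get("_HOSTNAME"),
        "unit": entry.get("_SYSTEMD_UNIT"),
        "priority": int(entry.get("PRIORITY", 6)),  # 6 = info, assumed default
        "message": message,
    }


if __name__ == "__main__":
    sample = json.dumps({
        "__REALTIME_TIMESTAMP": "1700000000000000",
        "_HOSTNAME": "web-1",
        "_SYSTEMD_UNIT": "nginx.service",
        "PRIORITY": "3",
        "MESSAGE": "upstream timed out",
    })
    print(parse_journal_line(sample))
```

In practice you would feed this from `journalctl -f -o json` on stdin and hand each record to whatever producer you use; that part is left out because it depends entirely on the transport.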
I have looked into ElasticSearch + Kibana as a solution to aggregate logs. There may be plenty of choices to replace ElasticSearch (ClickHouse, even Postgres, heck, even journald), but a nice UI where you can simply search for that random piece of text while sifting through the logs is the missing piece.
Until now, I have not seen a web interface for logs as powerful as Kibana that can work with anything other than ElasticSearch.
This is why I chose to stop my search and pay for Datadog to do this correctly, and simply allow me to search for that keyword in the logs when I need it the most (and not worry about whether I indexed stuff correctly, or balanced some whatever in ElasticSearch, or remembered to set up something far too technical for a log system). Datadog allows you to keep a short period's worth of data in the index and "expire" old content into archives, while retaining the ability to add it back to the index if needed for an investigation.
journald is not good at handling a lot of data, nor is it good at managing imported data. (It could be improved, probably "easily", but its main feature is that it's an "always on", not terribly dumb log target. It's not a long-term log management system.)
Hmm. Never tried to use journald at any reasonable scale beyond tens of servers. Good to know its characteristics.
To be honest, I wasn't looking for a long-term log management system, and that is why journald even came to mind. If it could aggregate logs from several servers and retain them for a week while expiring older logs to an archive, that would be sufficient for my needs.
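The "keep a week, archive the rest" requirement is simple enough to sketch in Python, independent of which store ends up holding the logs. This is only an illustration of the policy, not something journald does for aggregated remote logs; the function name `split_by_retention` and the entry shape (a dict with a timezone-aware `"time"` datetime) are assumptions.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)  # the one-week window mentioned above


def split_by_retention(entries, now, retention=RETENTION):
    """Split entries into (keep, archive) by age.

    Each entry is assumed to be a dict with a timezone-aware datetime
    under "time". Entries older than `retention` go to the archive list.
    """
    cutoff = now - retention
    keep, archive = [], []
    for entry in entries:
        if entry["time"] >= cutoff:
            keep.append(entry)
        else:
            archive.append(entry)
    return keep, archive


if __name__ == "__main__":
    now = datetime(2024, 1, 10, tzinfo=timezone.utc)
    entries = [
        {"time": datetime(2024, 1, 9, tzinfo=timezone.utc), "message": "recent"},
        {"time": datetime(2024, 1, 1, tzinfo=timezone.utc), "message": "old"},
    ]
    keep, archive = split_by_retention(entries, now)
    print(len(keep), "kept;", len(archive), "to archive")
```

The hard part in a real setup is not this split but doing it across many hosts' worth of data without rewriting everything, which is where the scaling complaints about journald come in.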
Exactly why I wrote my comment. :) Because it seems it's able to do that, but not really. And it seems easy to fix, but of course patches are welcome. (Hopefully.)
Haven't played with the logs part of Grafana recently, but would it work on top of, say, ClickHouse? I thought it was more tuned for the Loki use case… is it not?
I'd agree, it isn't good for exploratory queries.
But if you have some predefined ES queries for correlating log messages to metrics it can be useful to have it all in one dashboard.