One of the persistent challenges I run into in this area is that any sort of up-front filtering/routing requires you to know in advance which logs are going to be important when an issue happens. Which is basically impossible. And nobody wants to be the person who filtered out some logs because they looked useless, only to realize later that they would have been instrumental in getting back up and running quickly.
One of the biggest problems we hear about from CISOs is that 'they don't know what they don't know' - meaning they need a way to catch all the data. This plays pretty directly into your comment: there's a pull toward keeping everything, but a penalty for having everything - slower queries, higher costs, more false positives, slower time to resolution.
What's common as a middle ground is blob storage and rehydration - where you send everything into low-cost storage like S3 while still peeling off the high-value data into the SIEM / Datadog / etc. Then if you notice something is amiss, you can rehydrate the time window you care about.
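To make that pattern concrete, here's a minimal sketch in Python with boto3, assuming an hourly S3 key layout. The bucket name, the `is_high_value` rule, and `forward_to_siem` are hypothetical placeholders, not any particular vendor's API.

```python
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
ARCHIVE_BUCKET = "log-archive-example"  # hypothetical bucket for the full-fidelity copy


def is_high_value(event: dict) -> bool:
    # Hypothetical routing rule; in practice this is whatever filter you trust.
    return event.get("level") in {"ERROR", "SECURITY"}


def forward_to_siem(event: dict) -> None:
    # Placeholder for the SIEM / Datadog ingest call.
    print("forwarding:", event)


def ingest(event: dict) -> None:
    """Tee every event into cheap storage; forward only high-value events downstream."""
    ts = datetime.now(timezone.utc)
    # Key by hour so a later rehydration can target a narrow time window.
    key = f"logs/{ts:%Y/%m/%d/%H}/{ts.timestamp()}.json"
    s3.put_object(Bucket=ARCHIVE_BUCKET, Key=key, Body=json.dumps(event))
    if is_high_value(event):
        forward_to_siem(event)


def rehydrate(hour_prefix: str):
    """Yield the full archive for one hour, e.g. rehydrate('logs/2024/05/01/13')."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=ARCHIVE_BUCKET, Prefix=hour_prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=ARCHIVE_BUCKET, Key=obj["Key"])["Body"].read()
            yield json.loads(body)
```

Keying the objects by time is what keeps the rehydration step cheap - you only list and pull the window you care about instead of the whole archive.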
Kudos for being self-aware and acknowledging that solving the problem you saw doesn't always translate into solving the problem potential customers want to pay you to solve.
One of my favorite talks [0] speaks to the problem with thinking that telemetry is valuable just because it is [logs|metrics|traces].
Alerts/notifications etc. are an attempt to distill something useful from something that is abundant. From Cribl's `About Us` page [1] -- \ ˈkribəl \ - “An instrument with a meshed or perforated bottom generally used for gold panning in order to strain valuable material from discardable matter.”
[0] https://www.youtube.com/watch?v=qTf5pli3qRU
[1] https://cribl.io/about-us/
Quite a few years ago, I led a migration away from a legacy logging provider that offered little more than full-text search over unstructured text.
Logging at the time was somewhere in the ballpark of 1% of our total common infrastructure spend, and it was widely acknowledged as too expensive relative to the minimal value we got from that rudimentary feature set, but it was also nowhere near enough cost to justify doing something about it. We had other observability costs that dwarfed it.
What finally justified the overhaul was that security couldn’t really operate usefully on log data unless we pulled the data out somewhere else like Athena and processed it there. That slowed down security incident response times dramatically.
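For a sense of what that detour looks like, here's an illustrative sketch of running an ad hoc query through Athena from Python with boto3. The database, table, and column names are hypothetical, not the actual setup described above; the point is the extra export-query-poll loop sitting in the middle of incident response.

```python
import time

import boto3

athena = boto3.client("athena")

# Hypothetical table over logs exported to S3; the schema is illustrative only.
QUERY = """
SELECT source_ip, count(*) AS attempts
FROM security_logs
WHERE event_type = 'auth_failure'
GROUP BY source_ip
ORDER BY attempts DESC
LIMIT 50
"""


def run_security_query() -> list:
    execution = athena.start_query_execution(
        QueryString=QUERY,
        QueryExecutionContext={"Database": "security"},  # hypothetical database
        ResultConfiguration={"OutputLocation": "s3://athena-results-example/"},
    )
    query_id = execution["QueryExecutionId"]

    # Poll until Athena finishes; every one of these waits adds to response time.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```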
The migration ultimately benefited the whole engineering organization, but it had to be security-led to get any traction.
Who are security teams? At what size does a company hire one? Is having a security team driven by compliance, to get certain certificates required by vendors? Did the product you built address the observability need, and if so, why wasn't it used much?
Are you pitching or complaining? Because I can't tell.
Mostly aiming to share my anecdote about being too close to a problem, so others can learn from it - and partly to pitch, to validate the security thesis.