💾 Archived View for dioskouroi.xyz › thread › 29443137 captured on 2021-12-04 at 18:04:22. Gemini links have been rewritten to link to archived content
Forseti is an open-source project for building dependency graphs of your usage of GCP services. It's primarily designed for security but can be applied to other areas. You could extend the model to understand relationships between services and SLAs, but that would be limited to how you design and run services on top of GCP.
https://forsetisecurity.org/docs/latest/concepts/
The underlying design of how GCP services relate to each other is complex and definitely not available to the public. There is also a huge amount of nuance in GCP services relying on underlying Google services vs. other GCP services. Container Registry and Artifact Registry may both depend on the same underlying storage service, which isn't necessarily GCS but could be an internal Google storage service. How this is specifically managed, partitioned and run is very hard to extract. Failure modes and scenarios are well designed and understood internally, but not shared publicly.
If you have a very specific use case, you can approach your Google Cloud TAM/sales/customer engineer with the questions and they will be able to help you understand.
Source: Former Customer Engineer in Google Cloud for 4 years
It's GCS.
I'm not sure even the cloud providers themselves could give you this information.
In Nov 2020, AWS Kinesis went down for a few hours and took down a slew of other services that depended on each other (CloudWatch depends on Kinesis, EC2 autoscaling and Lambda depend on CloudWatch, everyone depends on EC2 and Lambda...).
This came as a large surprise internally: that a "small" component like Kinesis streams could take down so much.
https://aws.amazon.com/message/11201/
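The transitive blast radius described in that comment can be sketched as reverse reachability over a dependency graph. A minimal sketch; the service names and edges below are illustrative, taken loosely from the comment, not an authoritative AWS dependency map:

```python
from collections import deque

# Illustrative edges only: "service -> services it depends on"
deps = {
    "cloudwatch": ["kinesis"],
    "ec2_autoscaling": ["cloudwatch"],
    "lambda": ["cloudwatch"],
    "many_things": ["ec2_autoscaling", "lambda"],
}

def blast_radius(failed, deps):
    """Return every service that transitively depends on `failed`."""
    # Invert the graph: "service -> services that depend on it"
    rdeps = {}
    for svc, ds in deps.items():
        for d in ds:
            rdeps.setdefault(d, []).append(svc)
    impacted, queue = set(), deque([failed])
    while queue:
        cur = queue.popleft()
        for dependent in rdeps.get(cur, []):
            if dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return impacted

print(sorted(blast_radius("kinesis", deps)))
# ['cloudwatch', 'ec2_autoscaling', 'lambda', 'many_things']
```

One Kinesis failure reaches everything downstream, which matches the shape of the 2020 incident writeup.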
I'm sure this must be documented somewhere internally. Whenever a new region is built for AWS, services must be deployed in topologically sorted order. So I'd imagine figuring out dependencies for each service is something that comes up quite often.
If you take into account dependencies across different regions, it's even more problematic (yes, there shouldn't be any, but there usually are some).
For GCP you can enumerate and graph the publicly visible dependencies as per this blog post:
https://binx.io/blog/2020/10/03/how-to-find-google-cloud-pla...
However, that does not take account of GCP services being implemented behind the scenes using other GCP technologies in Google-managed projects - e.g. Cloud SQL uses Compute Engine and GCR (search "speckle umbrella"). Cloud Functions relies on Cloud Build to compile the function into a container. AI Platform Training uses a GKE cluster internally.
You can often get hints about these things from the VPC-SC documentation, which explains on a per-service basis which APIs need to be enabled to protect the perimeter:
https://cloud.google.com/vpc-service-controls/docs/supported...
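Once you have collected such hints, the simplest way to look at them is to render the edges as a graph. A minimal sketch that emits Graphviz DOT text; the edge list is hand-curated from the examples in this comment, not an official dependency map:

```python
# Hand-curated edges from the examples above; not an official dependency map.
hidden_deps = {
    "Cloud SQL": ["Compute Engine", "Container Registry"],
    "Cloud Functions": ["Cloud Build"],
    "AI Platform Training": ["GKE"],
}

def to_dot(deps):
    """Emit a Graphviz DOT description of the dependency edges."""
    lines = ["digraph gcp_deps {"]
    for svc, ds in deps.items():
        for d in ds:
            lines.append(f'  "{svc}" -> "{d}";')
    lines.append("}")
    return "\n".join(lines)

print(to_dot(hidden_deps))  # pipe the output into `dot -Tpng` to render
```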
You will never get that information, even under account-manager-provided NDAs with cloud providers. And outages keep surprising people: Kinesis took a lot of services down with it, which I would not have imagined, and Facebook disappeared from the internet over a BGP misconfiguration. My personal experience is that most AWS outages affect one service in one region, while GCP has more global outages. But AWS has had its fair share of global outages too, like Kinesis and the S3 one back then.
This is blatant self-promotion on my end, but I think we built, at least directionally, what you're asking for:
https://github.com/someengineering/cloudkeeper
I'll reply in more depth since I'm on the move right now, but for now I hope the link is sufficient.
You could check previous incidents and see which services go down together. Just an idea.
If you use Azure Application Insights, it will generate a dependency graph of your application.