Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing
This is the only good book that will allow you to understand how Beam works. It’s written by one of the engineers working on Beam.
Also, the Python SDK is not great if you want pipelines that scale well and work with non-Google sources and sinks like Kafka, Postgres, or ClickHouse. We tried it, but it was expensive to run and not very reliable.
Haven't tried the Java SDK though. Maybe it's better.
At the company I work for, we switched to Apache Flink on Java. It works better for us and has been very reliable and consistent.