Streaming Change Data Capture Data Two Ways
Every database event is important: don’t let any of them rot away in an old batch, lost to the ravages of time and irrelevance. Let’s capture all that data.
Since we are out of the office and working remotely, I need our relational database records to follow us and be sent offsite. Our physical tables may be empty, but our database ones are not. Let’s get that data streaming and useful.
CDC (Change Data Capture, not Centers for Disease Control and not Cat Data Capture) is well defined on Wikipedia and in this article.
Sometimes you don’t need pure change data capture. Sometimes it is enough to fetch new rows whenever an incrementing ID or a date column advances. You can do that really easily at scale (including grabbing every table in a database) with Apache NiFi. For further reading:
Cloudera CDC Use Cases
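To make the "get data when an ID increments" idea concrete, here is a minimal sketch in plain Python. It is not NiFi code. It only illustrates the general pattern of remembering the highest ID seen so far and asking for anything newer. The `orders` table, the `fetch_new_rows` helper, and the use of an in-memory SQLite database are all assumptions made for the example.

```python
import sqlite3


def fetch_new_rows(conn, table, id_column, last_seen):
    """Return rows with id_column > last_seen, plus the updated high-water mark.

    Hypothetical helper for illustration only. Table and column names cannot
    be bound as SQL parameters, so pass trusted identifiers only.
    """
    query = f"SELECT * FROM {table} WHERE {id_column} > ? ORDER BY {id_column}"
    cursor = conn.execute(query, (last_seen,))
    columns = [description[0] for description in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    if rows:
        # Rows are ordered by the ID, so the last one carries the new maximum.
        last_seen = rows[-1][id_column]
    return rows, last_seen


def demo():
    # Assumed example schema: an auto-incrementing integer key and one text column.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.executemany("INSERT INTO orders (item) VALUES (?)", [("desk",), ("chair",)])

    # First poll starts from 0 and picks up everything already in the table.
    first_batch, mark = fetch_new_rows(conn, "orders", "id", 0)

    # A new row arrives; the next poll returns only that row.
    conn.execute("INSERT INTO orders (item) VALUES (?)", ("lamp",))
    second_batch, mark = fetch_new_rows(conn, "orders", "id", mark)

    conn.close()
    return first_batch, second_batch, mark
```

The trade-off is why this is not "pure" CDC: polling on an incrementing ID only sees inserts, and polling on a last-modified date only sees rows that still exist, so deletes go unnoticed either way.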