Apache Avro to ORC Using Apache Gobblin
Apache Avro and Apache ORC (Optimized Row Columnar) are top-level projects under the Apache Software Foundation. Fundamentally, they are data serialization formats with different strengths.
Apache Avro is an efficient, row-based binary format for serializing data in transit or at rest. It uses a schema to define the structure of the data being serialized, and that schema is stored alongside the data in the Avro data file. Because data structures change frequently in the big data space, Avro was designed to support schema evolution: new fields (typically with default values) can be added to a record without a complete recompilation of the code that uses it.
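To make schema evolution more concrete, here is a minimal sketch in plain Python (no Avro library) of how a reader might resolve a record written with an older schema against a newer schema that adds a field with a default. The `User` schemas and the `resolve_record` helper are hypothetical illustrations. Real Avro libraries perform this resolution while decoding, using the writer's schema stored in the file, and they also handle type promotion, aliases, and nested types, all of which this sketch ignores.

```python
import json

# Hypothetical writer schema: the "old" version used when the data was written.
WRITER_SCHEMA = json.loads("""
{"type": "record", "name": "User",
 "fields": [{"name": "id", "type": "long"},
            {"name": "name", "type": "string"}]}
""")

# Hypothetical reader schema: a newer version that adds an optional "email" field.
# For a union, the default must match the first branch ("null" here).
READER_SCHEMA = json.loads("""
{"type": "record", "name": "User",
 "fields": [{"name": "id", "type": "long"},
            {"name": "name", "type": "string"},
            {"name": "email", "type": ["null", "string"], "default": null}]}
""")


def resolve_record(record, writer_schema, reader_schema):
    """Simplified, flat-record model of Avro's writer/reader schema resolution."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            # Field exists in both schemas: keep the value that was written.
            resolved[name] = record[name]
        elif "default" in field:
            # Field was added after the data was written: use the reader's default.
            resolved[name] = field["default"]
        else:
            # A new field without a default cannot be filled in for old data.
            raise ValueError(f"field {name!r} has no default; schemas are incompatible")
    # Fields present only in the writer schema are simply not copied (ignored).
    return resolved


old_record = {"id": 1, "name": "Ada"}
print(resolve_record(old_record, WRITER_SCHEMA, READER_SCHEMA))
# {'id': 1, 'name': 'Ada', 'email': None}
```

The old record is read with the new schema without rewriting the data: `email` is filled from its default, which is why added fields normally carry one.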