AVRO - Schemas



Avro, being a schema-based serialization utility, accepts schemas as input. Although various schema languages are available, Avro follows its own standard for defining schemas. These schemas describe the following details −

  • type of the schema (typically record)
  • namespace of the record
  • name of the record
  • fields in the record with their corresponding data types

Using these schemas, you can store serialized values in a compact binary format. The values are stored without per-value metadata such as field names or type tags, so the schema is needed to read them back.
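As an illustration of why the binary form is compact, Avro's binary encoding writes int and long values using zig-zag coding followed by a variable-length (base-128) integer, so values of small magnitude take a single byte rather than a fixed four or eight. The sketch below implements only that one rule with the Python standard library; zigzag_varint is a hypothetical helper name, not an API of any Avro library.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode an Avro int/long: zig-zag map, then little-endian base-128 varint."""
    # Zig-zag interleaves signs so small magnitudes stay small: 0->0, -1->1, 1->2, -2->3, ...
    z = 2 * n if n >= 0 else -2 * n - 1
    out = bytearray()
    while True:
        byte, z = z & 0x7F, z >> 7   # take the low 7 bits
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: this is the last byte
            return bytes(out)


for n in (0, -1, 1, -64, 64):
    print(n, zigzag_varint(n).hex())
# 0 00
# -1 01
# 1 02
# -64 7f
# 64 8001
```

Only the value bytes are produced; nothing in the output says "this is an int", which is why the reader must know the schema.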

Creating Avro Schemas

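An Avro schema is written in JavaScript Object Notation (JSON), a lightweight text-based data interchange format. A schema takes one of the following forms −

  • A JSON string, naming a primitive type or a previously defined named type, e.g. "int"
  • A JSON object, of the form {"type": "typeName", ...attributes...}, defining a type
  • A JSON array, representing a union of types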
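A schema reader has to decide which of these three forms it is looking at before anything else. The following stdlib-only Python sketch shows that decision; describe_schema and its output strings are hypothetical, chosen for illustration, and it does not validate the rest of the schema.

```python
import json

# Avro's primitive type names.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def describe_schema(text: str) -> str:
    """Report which of the three JSON forms an Avro schema text uses."""
    schema = json.loads(text)
    if isinstance(schema, str):
        # String form: a primitive type name or a reference to a named type.
        kind = "primitive" if schema in PRIMITIVES else "named"
        return f"string form: {kind} type {schema!r}"
    if isinstance(schema, dict):
        # Object form: the "type" attribute says which type is being defined.
        return f"object form: type {schema['type']!r}"
    if isinstance(schema, list):
        # Array form: every element is one branch of a union.
        return f"array form: union of {len(schema)} branches"
    raise ValueError("an Avro schema must be a JSON string, object or array")


print(describe_schema('"int"'))                              # string form: primitive type 'int'
print(describe_schema('{"type": "array", "items": "int"}'))  # object form: type 'array'
print(describe_schema('["int", "null"]'))                    # array form: union of 2 branches
```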

Example − The following example shows a schema that defines a record named Employee in the namespace Howcodex, with the fields Name and Age.

{
   "type" : "record",
   "namespace" : "Howcodex",
   "name" : "Employee",
   "fields" : [
      { "name" : "Name" , "type" : "string" },
      { "name" : "Age" , "type" : "int" }
   ]
}

In this example, you can observe that the schema has four attributes −

  • type − This attribute appears at the document level as well as inside each entry of the fields array.

    • At the document level, it gives the type of the schema, here record, since the schema groups multiple fields.

    • Inside a field, it gives the field's data type.

  • namespace − This attribute gives the namespace in which the schema resides.

  • name − This attribute appears at the document level as well as inside each entry of the fields array.

    • At the document level, it gives the schema name. Together with the namespace, this name uniquely identifies the schema (namespace.name). In the above example, the full name of the schema is Howcodex.Employee.

    • Inside a field, it gives the name of the field.

  • fields − This attribute holds a JSON array listing the fields of the record, each with its own name and type.
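The naming rule can be written as a small function. This is a minimal Python sketch that assumes a top-level schema: it ignores the namespace a nested schema would inherit from its enclosing schema, and full_name is a hypothetical helper name. It also follows the Avro rule that a name which already contains a dot is treated as a full name, in which case the namespace attribute is ignored.

```python
def full_name(schema: dict) -> str:
    """Return the full name (namespace.name) of a top-level named Avro schema."""
    name = schema["name"]
    if "." in name:
        # A dotted name is already a full name; the namespace attribute is ignored.
        return name
    namespace = schema.get("namespace")
    # Without a namespace, the full name is just the name.
    return f"{namespace}.{name}" if namespace else name


employee = {
    "type": "record",
    "namespace": "Howcodex",
    "name": "Employee",
    "fields": [
        {"name": "Name", "type": "string"},
        {"name": "Age", "type": "int"},
    ],
}
print(full_name(employee))  # Howcodex.Employee
```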

Primitive Data Types of Avro

Avro schemas support primitive data types as well as complex data types. The following table describes the primitive data types of Avro −

Data type   Description
null        A type having no value.
boolean     A binary value (true or false).
int         32-bit signed integer.
long        64-bit signed integer.
float       Single-precision (32-bit) IEEE 754 floating-point number.
double      Double-precision (64-bit) IEEE 754 floating-point number.
bytes       Sequence of 8-bit unsigned bytes.
string      Unicode character sequence.
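The bit widths in the table translate directly into value ranges: an int holds -2^31 to 2^31 - 1, and a long holds -2^63 to 2^63 - 1. The Python sketch below checks a value against a primitive type. The function name conforms and the mapping of Avro types to Python types (str for string, bytes for bytes, float for float and double) are our own assumptions, and the sketch does not check whether a float fits in single precision.

```python
# Ranges implied by the bit widths in the table (two's complement).
INT_MIN, INT_MAX = -(2**31), 2**31 - 1
LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1


def conforms(avro_type: str, value) -> bool:
    """Check whether a Python value fits an Avro primitive type (sketch)."""
    if avro_type == "null":
        return value is None
    if avro_type == "boolean":
        return isinstance(value, bool)
    if avro_type in ("int", "long"):
        lo, hi = (INT_MIN, INT_MAX) if avro_type == "int" else (LONG_MIN, LONG_MAX)
        # bool is a subclass of int in Python, so reject it explicitly.
        return isinstance(value, int) and not isinstance(value, bool) and lo <= value <= hi
    if avro_type in ("float", "double"):
        # Single-precision range is not checked in this sketch.
        return isinstance(value, float)
    if avro_type == "bytes":
        return isinstance(value, bytes)
    if avro_type == "string":
        return isinstance(value, str)
    raise ValueError(f"not a primitive Avro type: {avro_type!r}")


print(conforms("int", 2**31 - 1))  # True
print(conforms("int", 2**31))      # False, needs a long
print(conforms("long", 2**31))     # True
```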

Complex Data Types of Avro

Along with primitive data types, Avro provides six complex data types, namely Records, Enums, Arrays, Maps, Unions, and Fixed.

Record

A record data type in Avro is a collection of named fields. It supports the following attributes −

  • name − The value of this attribute holds the name of the record.

  • namespace − The value of this attribute holds the namespace that qualifies the record name.

  • type − The value of this attribute holds either the type of the document (record) or the data type of a field in the schema.

  • fields − This attribute holds a JSON array listing all of the fields in the schema, each having name and type attributes.

Example

Given below is the example of a record.

{
   "type" : "record",
   "namespace" : "Howcodex",
   "name" : "Employee",
   "fields" : [
      { "name" : "Name" , "type" : "string" },
      { "name" : "age" , "type" : "int" }
   ]
}
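In the binary encoding, a record is simply its field values written one after another in the order the fields are declared; field names are not written. A string is written as its byte length (encoded like a long) followed by its UTF-8 bytes. The stdlib-only Python sketch below encodes the Employee record above under those rules. It supports only string and int fields, and the helper names and sample values ("Ravi", 30) are hypothetical.

```python
def zigzag_varint(n: int) -> bytes:
    """Avro int/long encoding: zig-zag, then base-128 varint."""
    z = 2 * n if n >= 0 else -2 * n - 1
    out = bytearray()
    while True:
        byte, z = z & 0x7F, z >> 7
        if z:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_value(avro_type: str, value) -> bytes:
    if avro_type in ("int", "long"):
        return zigzag_varint(value)
    if avro_type == "string":
        data = value.encode("utf-8")
        return zigzag_varint(len(data)) + data  # length prefix, then UTF-8 bytes
    raise NotImplementedError(f"sketch does not handle {avro_type!r}")


def encode_record(schema: dict, record: dict) -> bytes:
    # Field values in declaration order, with no names or separators between them.
    return b"".join(encode_value(f["type"], record[f["name"]]) for f in schema["fields"])


employee = {
    "type": "record",
    "namespace": "Howcodex",
    "name": "Employee",
    "fields": [
        {"name": "Name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}
print(encode_record(employee, {"Name": "Ravi", "age": 30}).hex())  # 08526176693c
```

The result is six bytes: 08 is the length 4, 52 61 76 69 are the characters of "Ravi", and 3c is the zig-zag form of 30.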

Enum

An enumeration is a list of items in a collection. An Avro enumeration supports the following attributes −

  • name − The value of this field holds the name of the enumeration.

  • namespace − The value of this field contains the string that qualifies the name of the enumeration.

  • symbols − The value of this field holds the enum's symbols as an array of names.

Example

Given below is the example of an enumeration.

{
   "type" : "enum",
   "name" : "Numbers", 
   "namespace": "data", 
   "symbols" : [ "ONE", "TWO", "THREE", "FOUR" ]
}
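In the binary encoding, an enum value is written as the zero-based position of its symbol in symbols, encoded as an int, so the symbol names themselves never appear in the data. A stdlib-only Python sketch for the Numbers enum above, with hypothetical helper names encode_enum and decode_enum:

```python
def zigzag_varint(n: int) -> bytes:
    """Avro int/long encoding: zig-zag, then base-128 varint."""
    z = 2 * n if n >= 0 else -2 * n - 1
    out = bytearray()
    while True:
        byte, z = z & 0x7F, z >> 7
        if z:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


numbers = {
    "type": "enum",
    "name": "Numbers",
    "namespace": "data",
    "symbols": ["ONE", "TWO", "THREE", "FOUR"],
}


def encode_enum(schema: dict, symbol: str) -> bytes:
    # list.index raises ValueError for a symbol that is not in the enum.
    return zigzag_varint(schema["symbols"].index(symbol))


def decode_enum(schema: dict, index: int) -> str:
    # The reader maps the position back to a symbol using the same schema.
    return schema["symbols"][index]


print(encode_enum(numbers, "THREE").hex())  # 04
print(decode_enum(numbers, 2))              # THREE
```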

Arrays

This data type defines an array field having a single attribute, items, which specifies the type of the items in the array.

Example

{ "type" : "array", "items" : "int" }
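In the binary encoding, an array is written as a series of blocks: each block starts with an item count (encoded as a long) followed by that many items, and a block with a count of zero marks the end. The Python sketch below writes a single block for the int array schema above. It is a simplification (an encoder may split a large array into several blocks, which this sketch never does), and encode_int_array is a hypothetical name.

```python
def zigzag_varint(n: int) -> bytes:
    """Avro int/long encoding: zig-zag, then base-128 varint."""
    z = 2 * n if n >= 0 else -2 * n - 1
    out = bytearray()
    while True:
        byte, z = z & 0x7F, z >> 7
        if z:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_int_array(items: list) -> bytes:
    """Encode a list under {"type": "array", "items": "int"} as one block plus terminator."""
    out = bytearray()
    if items:
        out += zigzag_varint(len(items))  # block count
        for item in items:
            out += zigzag_varint(item)    # each item encoded as an int
    out += zigzag_varint(0)               # zero-count block ends the array
    return bytes(out)


print(encode_int_array([1, 2, 3]).hex())  # 0602040600
print(encode_int_array([]).hex())         # 00
```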

Maps

The map data type stores data as key-value pairs. The keys of an Avro map are always strings, and the values attribute specifies the data type of the map's values.

Example

{"type" : "map", "values" : "int"}
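Maps are encoded like arrays, as blocks that start with an entry count and end with a zero-count block, except that each entry is a string key followed by a value of the values type. A minimal stdlib-only Python sketch for the int-valued map above, writing a single block; encode_int_map and encode_string are hypothetical helper names.

```python
def zigzag_varint(n: int) -> bytes:
    """Avro int/long encoding: zig-zag, then base-128 varint."""
    z = 2 * n if n >= 0 else -2 * n - 1
    out = bytearray()
    while True:
        byte, z = z & 0x7F, z >> 7
        if z:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data  # length prefix, then UTF-8 bytes


def encode_int_map(mapping: dict) -> bytes:
    """Encode a dict under {"type": "map", "values": "int"} as one block plus terminator."""
    out = bytearray()
    if mapping:
        out += zigzag_varint(len(mapping))  # block count
        for key, value in mapping.items():
            out += encode_string(key)       # keys are always strings
            out += zigzag_varint(value)     # values use the "values" type, int here
    out += zigzag_varint(0)                 # zero-count block ends the map
    return bytes(out)


print(encode_int_map({"a": 1, "b": 2}).hex())  # 0402610202620400
```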

Unions

A union data type is used whenever a field can hold values of more than one data type. Unions are represented as JSON arrays. For example, a field that can be either an int or null is represented as the union ["int", "null"].

Example

Given below is an example document using unions −

{
   "type" : "record",
   "namespace" : "howcodex",
   "name" : "empdetails",
   "fields" :
   [
      { "name" : "experience", "type": ["int", "null"] },
      { "name" : "age", "type": "int" }
   ]
}
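In the binary encoding, a union value is written as the zero-based index of the branch it matches (encoded as a long), followed by the value encoded according to that branch; null adds no further bytes. This Python sketch handles only int and null branches, as in the experience field above, and picks the first matching branch; encode_union is a hypothetical helper name.

```python
def zigzag_varint(n: int) -> bytes:
    """Avro int/long encoding: zig-zag, then base-128 varint."""
    z = 2 * n if n >= 0 else -2 * n - 1
    out = bytearray()
    while True:
        byte, z = z & 0x7F, z >> 7
        if z:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_union(branches: list, value) -> bytes:
    """Encode value under a union of "int" and/or "null" branches (sketch)."""
    for index, branch in enumerate(branches):
        if branch == "null" and value is None:
            return zigzag_varint(index)  # null adds no bytes after the index
        if branch == "int" and isinstance(value, int) and not isinstance(value, bool):
            return zigzag_varint(index) + zigzag_varint(value)
    raise ValueError(f"{value!r} matches no branch of {branches}")


experience_type = ["int", "null"]
print(encode_union(experience_type, 5).hex())     # 000a
print(encode_union(experience_type, None).hex())  # 02
```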

Fixed

This data type declares a fixed-size field that can be used for storing binary data. It has name and size as attributes: name holds the name of the field, and size holds its size in bytes.

Example

{ "type" : "fixed" , "name" : "bdata", "size" : 1048576}
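Unlike bytes, which carries a length prefix in the binary encoding, a fixed value is written as exactly size raw bytes, because the length is already known from the schema. The plain Python sketch below only checks the length and passes the bytes through; encode_fixed is a hypothetical name, and the all-zero payload is sample data.

```python
bdata = {"type": "fixed", "name": "bdata", "size": 1048576}


def encode_fixed(schema: dict, data: bytes) -> bytes:
    """Write a fixed value: exactly schema["size"] bytes, no length prefix."""
    if len(data) != schema["size"]:
        raise ValueError(
            f"{schema['name']} expects {schema['size']} bytes, got {len(data)}"
        )
    return data


payload = bytes(1048576)  # 1 MiB of zero bytes as sample data
print(len(encode_fixed(bdata, payload)))  # 1048576
```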