In the previous chapter, we explained how to add data in JSON and CSV formats to Solr. In this chapter, we will demonstrate how to add data to the Apache Solr index using the XML document format.
Suppose we need to add the following data to the Solr index using the XML file format.
Student ID | First Name | Last Name | Phone | City |
---|---|---|---|---|
001 | Rajiv | Reddy | 9848022337 | Hyderabad |
002 | Siddharth | Bhattacharya | 9848022338 | Kolkata |
003 | Rajesh | Khanna | 9848022339 | Delhi |
004 | Preethi | Agarwal | 9848022330 | Pune |
005 | Trupthi | Mohanty | 9848022336 | Bhubaneshwar |
006 | Archana | Mishra | 9848022335 | Chennai |
To add the above data to the Solr index, we need to prepare an XML document, as shown below. Save this document in a file named sample.xml.
<add>
   <doc>
      <field name="id">001</field>
      <field name="first_name">Rajiv</field>
      <field name="last_name">Reddy</field>
      <field name="phone">9848022337</field>
      <field name="city">Hyderabad</field>
   </doc>
   <doc>
      <field name="id">002</field>
      <field name="first_name">Siddharth</field>
      <field name="last_name">Bhattacharya</field>
      <field name="phone">9848022338</field>
      <field name="city">Kolkata</field>
   </doc>
   <doc>
      <field name="id">003</field>
      <field name="first_name">Rajesh</field>
      <field name="last_name">Khanna</field>
      <field name="phone">9848022339</field>
      <field name="city">Delhi</field>
   </doc>
   <doc>
      <field name="id">004</field>
      <field name="first_name">Preethi</field>
      <field name="last_name">Agarwal</field>
      <field name="phone">9848022330</field>
      <field name="city">Pune</field>
   </doc>
   <doc>
      <field name="id">005</field>
      <field name="first_name">Trupthi</field>
      <field name="last_name">Mohanty</field>
      <field name="phone">9848022336</field>
      <field name="city">Bhubaneshwar</field>
   </doc>
   <doc>
      <field name="id">006</field>
      <field name="first_name">Archana</field>
      <field name="last_name">Mishra</field>
      <field name="phone">9848022335</field>
      <field name="city">Chennai</field>
   </doc>
</add>
As you can observe, the XML file written to add data to the index contains three important tags, namely <add>, <doc>, and <field>.
add − This is the root tag for adding documents to the index. It contains one or more documents that are to be added.
doc − Each document to be added is wrapped in <doc></doc> tags and holds its data in the form of fields.
field − The field tag holds the name (given by its name attribute) and the value of a single field of the document.
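Instead of typing the <add>/<doc>/<field> structure by hand, it can also be generated from the table data. The following is a minimal sketch using only Python's standard xml.etree.ElementTree module. The function names build_add_xml and write_add_xml are hypothetical, and the field names id, first_name, last_name, phone and city are illustrative assumptions; they must match the fields defined (or auto-created) in your core.

```python
import xml.etree.ElementTree as ET

# Rows from the student table. Field names are illustrative assumptions
# and must match the fields known to the target Solr core.
STUDENTS = [
    {"id": "001", "first_name": "Rajiv", "last_name": "Reddy",
     "phone": "9848022337", "city": "Hyderabad"},
    {"id": "002", "first_name": "Siddharth", "last_name": "Bhattacharya",
     "phone": "9848022338", "city": "Kolkata"},
    {"id": "003", "first_name": "Rajesh", "last_name": "Khanna",
     "phone": "9848022339", "city": "Delhi"},
    {"id": "004", "first_name": "Preethi", "last_name": "Agarwal",
     "phone": "9848022330", "city": "Pune"},
    {"id": "005", "first_name": "Trupthi", "last_name": "Mohanty",
     "phone": "9848022336", "city": "Bhubaneshwar"},
    {"id": "006", "first_name": "Archana", "last_name": "Mishra",
     "phone": "9848022335", "city": "Chennai"},
]


def build_add_xml(records):
    """Return an <add> element containing one <doc> per record."""
    add = ET.Element("add")                   # root tag: holds one or more documents
    for record in records:
        doc = ET.SubElement(add, "doc")       # one <doc> element per document
        for name, value in record.items():
            field = ET.SubElement(doc, "field", name=name)  # <field name="...">
            field.text = str(value)           # the field value is the element text
    return add


def write_add_xml(records, path):
    """Serialize the <add> document to a file such as sample.xml."""
    root = build_add_xml(records)
    ET.indent(root)                           # pretty-print (Python 3.9+)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
```

For example, write_add_xml(STUDENTS, "sample.xml") writes a sample.xml with the same <add>, <doc> and <field> structure, which can then be posted to Solr in the usual way.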
After preparing the document, you can add it to the index using any of the methods discussed in the previous chapter.
Suppose the XML file is located in the bin directory of Solr and is to be indexed into the core named my_core. You can then add it to the Solr index using the post tool as follows −
[Hadoop@localhost bin]$ ./post -c my_core sample.xml
On executing the above command, you will get the following output.
/home/Hadoop/java/bin/java -classpath /home/Hadoop/Solr/dist/solr-core-6.2.0.jar -Dauto=yes -Dc=my_core -Ddata=files org.apache.solr.util.SimplePostTool sample.xml
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/my_core/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file sample.xml (application/xml) to [base]
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/my_core/update...
Time spent: 0:00:00.201
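As the output shows, the post tool sends the file with an HTTP POST to the core's /update handler and then commits the changes. A rough Python equivalent using only urllib is sketched below. It assumes Solr is running at localhost:8983 with a core named my_core, and it uses the update handler's commit=true request parameter to commit in the same request rather than issuing a separate commit. The helper names build_update_request and post_xml_file are hypothetical.

```python
import urllib.request
from urllib.parse import urlencode

# Assumed location of the core's update handler; adjust host, port and core.
SOLR_UPDATE_URL = "http://localhost:8983/solr/my_core/update"


def build_update_request(xml_bytes, url=SOLR_UPDATE_URL, commit=True):
    """Prepare (but do not send) a POST of an <add> document to /update."""
    if commit:
        # commit=true asks Solr to commit once this update has been processed.
        url = url + "?" + urlencode({"commit": "true"})
    return urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "application/xml"},  # same type the post tool reports
        method="POST",
    )


def post_xml_file(path, url=SOLR_UPDATE_URL):
    """Send the XML file at path to Solr and return the HTTP status code."""
    with open(path, "rb") as f:
        request = build_update_request(f.read(), url)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling post_xml_file("sample.xml") roughly corresponds to ./post -c my_core sample.xml for a single file; unlike the post tool, this sketch does not detect file types or walk directories.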
Visit the homepage of the Apache Solr web interface and select the core my_core. To retrieve all the documents, enter the query *:* in the q text area and execute the query. In the results, you can observe that the desired data has been added to the Solr index.
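The same check can be made over HTTP by calling the core's /select handler with q=*:*. The sketch below assumes Solr at localhost:8983, the core my_core, and a JSON response (requested with wt=json), in which Solr places the match count and documents under response.numFound and response.docs. The helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed location of the core's search handler.
SOLR_SELECT_URL = "http://localhost:8983/solr/my_core/select"


def build_select_url(query="*:*", rows=10, url=SOLR_SELECT_URL):
    """Build a /select URL; the query *:* matches every document."""
    return url + "?" + urlencode({"q": query, "rows": rows, "wt": "json"})


def extract_docs(response_json):
    """Return (numFound, docs) from a parsed Solr JSON response."""
    body = response_json["response"]
    return body["numFound"], body["docs"]


def fetch_all_docs(url=SOLR_SELECT_URL, rows=10):
    """Run q=*:* against Solr and return (numFound, docs)."""
    with urllib.request.urlopen(build_select_url("*:*", rows, url)) as resp:
        return extract_docs(json.load(resp))
```

After sample.xml has been indexed, fetch_all_docs() should report the six student documents, matching what the web interface shows.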