Apache Solr - Deleting Documents


Advertisements

Deleting the Document

To delete documents from the index of Apache Solr, we need to specify the ID’s of the documents to be deleted between the <delete></delete> tags.

<delete>   
   <id>003</id>   
   <id>005</id> 
   <id>004</id> 
   <id>002</id> 
</delete> 

Here, this XML code is used to delete the documents with ID’s 003 and 005. Save this code in a file with the name delete.xml.

If you want to delete the documents from the index which belongs to the core named my_core, then you can post the delete.xml file using the post tool, as shown below.

[Hadoop@localhost bin]$ ./post -c my_core delete.xml 

On executing the above command, you will get the following output.

/home/Hadoop/java/bin/java -classpath /home/Hadoop/Solr/dist/Solr-core
6.2.0.jar -Dauto = yes -Dc = my_core -Ddata = files 
org.apache.Solr.util.SimplePostTool delete.xml 
SimplePostTool version 5.0.0 
Posting files to [base] url http://localhost:8983/Solr/my_core/update... 
Entering auto mode. File endings considered are 
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,
rtf,htm,html,txt,log 
POSTing file delete.xml (application/xml) to [base] 
1 files indexed. 
COMMITting Solr index changes to http://localhost:8983/Solr/my_core/update... 
Time spent: 0:00:00.179 

Verification

Visit the homepage of the of Apache Solr web interface and select the core as my_core. Try to retrieve all the documents by passing the query “:” in the text area q and execute the query. On executing, you can observe that the specified documents are deleted.

Delete Document

Deleting a Field

Sometimes we need to delete documents based on fields other than ID. For example, we may have to delete the documents where the city is Chennai.

In such cases, you need to specify the name and value of the field within the <query></query> tag pair.

<delete> 
   <query>city:Chennai</query> 
</delete>

Save it as delete_field.xml and perform the delete operation on the core named my_core using the post tool of Solr.

[Hadoop@localhost bin]$ ./post -c my_core delete_field.xml 

On executing the above command, it produces the following output.

/home/Hadoop/java/bin/java -classpath /home/Hadoop/Solr/dist/Solr-core
6.2.0.jar -Dauto = yes -Dc = my_core -Ddata = files 
org.apache.Solr.util.SimplePostTool delete_field.xml 
SimplePostTool version 5.0.0 
Posting files to [base] url http://localhost:8983/Solr/my_core/update... 
Entering auto mode. File endings considered are 
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,
rtf,htm,html,txt,log 
POSTing file delete_field.xml (application/xml) to [base] 
1 files indexed. 
COMMITting Solr index changes to http://localhost:8983/Solr/my_core/update... 
Time spent: 0:00:00.084 

Verification

Visit the homepage of the of Apache Solr web interface and select the core as my_core. Try to retrieve all the documents by passing the query “:” in the text area q and execute the query. On executing, you can observe that the documents containing the specified field value pair are deleted.

Value Pair

Deleting All Documents

Just like deleting a specific field, if you want to delete all the documents from an index, you just need to pass the symbol “:” between the tags <query></ query>, as shown below.

<delete> 
   <query>*:*</query> 
</delete>

Save it as delete_all.xml and perform the delete operation on the core named my_core using the post tool of Solr.

[Hadoop@localhost bin]$ ./post -c my_core delete_all.xml

On executing the above command, it produces the following output.

/home/Hadoop/java/bin/java -classpath /home/Hadoop/Solr/dist/Solr-core
6.2.0.jar -Dauto = yes -Dc = my_core -Ddata = files 
org.apache.Solr.util.SimplePostTool deleteAll.xml 
SimplePostTool version 5.0.0 
Posting files to [base] url http://localhost:8983/Solr/my_core/update... 
Entering auto mode. File endings considered are 
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,
htm,html,txt,log 
POSTing file deleteAll.xml (application/xml) to [base] 
1 files indexed. 
COMMITting Solr index changes to http://localhost:8983/Solr/my_core/update... 
Time spent: 0:00:00.138

Verification

Visit the homepage of Apache Solr web interface and select the core as my_core. Try to retrieve all the documents by passing the query “:” in the text area q and execute the query. On executing, you can observe that the documents containing the specified field value pair are deleted.

Deleted Value Pair

Deleting all the documents using Java (Client API)

Following is the Java program to add documents to Apache Solr index. Save this code in a file with the name UpdatingDocument.java.

import java.io.IOException;  

import org.apache.Solr.client.Solrj.SolrClient; 
import org.apache.Solr.client.Solrj.SolrServerException; 
import org.apache.Solr.client.Solrj.impl.HttpSolrClient; 
import org.apache.Solr.common.SolrInputDocument;  

public class DeletingAllDocuments { 
   public static void main(String args[]) throws SolrServerException, IOException {
      //Preparing the Solr client 
      String urlString = "http://localhost:8983/Solr/my_core"; 
      SolrClient Solr = new HttpSolrClient.Builder(urlString).build();   
      
      //Preparing the Solr document 
      SolrInputDocument doc = new SolrInputDocument();   
          
      //Deleting the documents from Solr 
      Solr.deleteByQuery("*");        
         
      //Saving the document 
      Solr.commit(); 
      System.out.println("Documents deleted"); 
   } 
}

Compile the above code by executing the following commands in the terminal −

[Hadoop@localhost bin]$ javac DeletingAllDocuments 
[Hadoop@localhost bin]$ java DeletingAllDocuments

On executing the above command, you will get the following output.

Documents deleted
Advertisements