Thursday, November 17, 2011

Building Solr as Maven Project

Maven is now one of the most widely used build tools, but Solr still uses an Ant build. In this article I describe how you can quickly create a Solr Maven project and run Solr using Jetty.

The Solr war is available in the Maven Central repository (http://mvnrepository.com/artifact/org.apache.solr/solr).

Steps to build the Solr Maven project:

  1. Create a Maven project from Eclipse using: File --> New --> Project...
    1. Select Maven --> Maven Project from the list
    2. Check Create Simple Project
    3. Specify values for groupId and artifactId
    4. Select Packaging type: war
    5. Press Finish. This will create an empty Maven project with the basic folder structure in Eclipse
  2. Now edit pom.xml and replace its content with the following:
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>org.apache.mysolr</groupId>
      <artifactId>mysolr</artifactId>
      <version>3.4.0</version>
      <packaging>war</packaging>
      <name>Apache Solr Web Application</name>
      <repositories>
        <repository>
          <id>central</id>
          <name>Maven Repository Switchboard</name>
          <layout>default</layout>
          <url>http://repo1.maven.org/maven2</url>
        </repository>
      </repositories>
      <pluginRepositories>
        <pluginRepository>
          <id>central</id>
          <name>Maven Plugin Repository</name>
          <url>http://repo1.maven.org/maven2</url>
          <layout>default</layout>
        </pluginRepository>
      </pluginRepositories>
      <dependencies>
        <dependency>
          <groupId>org.apache.solr</groupId>
          <artifactId>solr</artifactId>
          <version>3.4.0</version>
          <type>war</type>
        </dependency>
      </dependencies>
      <build>
        <plugins>
          <plugin>
            <groupId>org.mortbay.jetty</groupId>
            <artifactId>maven-jetty-plugin</artifactId>
            <version>6.1.15.rc4</version>
          </plugin>
        </plugins>
      </build>
    </project>
  3. Copy the Solr config files into the src/main/resources folder of the project:
    solrconfig.xml
    schema.xml
    elevate.xml
    protwords.txt
    stopwords.txt
    spellings.txt
    synonyms.txt
  4. Copy the default Solr web.xml content below to src/main/webapp/WEB-INF/web.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" "http://java.sun.com/dtd/web-app_2_3.dtd">
    <!--
    Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements. See the NOTICE file distributed with
    this work for additional information regarding copyright ownership.
    The ASF licenses this file to You under the Apache License, Version 2.0
    (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
    -->

    <web-app>

    <!-- Uncomment if you are trying to use a Resin version before 3.0.19.
    Their XML implementation isn't entirely compatible with Xerces.
    Below are the implementations to use with Sun's JVM.
    <system-property javax.xml.xpath.XPathFactory=
    "com.sun.org.apache.xpath.internal.jaxp.XPathFactoryImpl"/>
    <system-property javax.xml.parsers.DocumentBuilderFactory=
    "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl"/>
    <system-property javax.xml.parsers.SAXParserFactory=
    "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl"/>
    -->

    <!-- People who want to hardcode their "Solr Home" directly into the
    WAR File can set the JNDI property here...
    -->
    <!--
    <env-entry>
    <env-entry-name>solr/home</env-entry-name>
    <env-entry-value>/put/your/solr/home/here</env-entry-value>
    <env-entry-type>java.lang.String</env-entry-type>
    </env-entry>
    -->

    <!-- Any path (name) registered in solrconfig.xml will be sent to that filter -->
    <filter>
    <filter-name>SolrRequestFilter</filter-name>
    <filter-class>org.apache.solr.servlet.SolrDispatchFilter</filter-class>
    <!-- If you are wiring Solr into a larger web application which controls
    the web context root, you will probably want to mount Solr under
    a path prefix (app.war with /app/solr mounted into it, for example).
    You will need to put this prefix in front of the SolrDispatchFilter
    url-pattern mapping too (/solr/*), and also on any paths for
    legacy Solr servlet mappings you may be using.
    For the admin JSP's to work properly in a path-prefixed configuration,
    the admin folder containing the JSPs needs to be under the app context root
    named to match the path-prefix. For example:

    .war
      xxx
        admin
          stats.jsp
    -->
    <!--
    <init-param>
    <param-name>path-prefix</param-name>
    <param-value>/xxx</param-value>
    </init-param>
    -->
    </filter>

    <filter-mapping>
    <!--
    NOTE: When using multicore, /admin JSP URLs with a core specified
    such as /solr/coreName/admin/stats.jsp get forwarded by a
    RequestDispatcher to /solr/admin/stats.jsp with the specified core
    put into request scope keyed as "org.apache.solr.SolrCore".

    It is unnecessary, and potentially problematic, to have the SolrDispatchFilter
    configured to also filter on forwards. Do not configure
    this dispatcher as <dispatcher>FORWARD</dispatcher>.
    -->
    <filter-name>SolrRequestFilter</filter-name>
    <url-pattern>/*</url-pattern>
    </filter-mapping>
    <!-- Otherwise it will continue to the old servlets -->

    <servlet>
    <servlet-name>SolrServer</servlet-name>
    <display-name>Solr</display-name>
    <description>Solr Server</description>
    <servlet-class>org.apache.solr.servlet.SolrServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
    </servlet>

    <servlet>
    <servlet-name>SolrUpdate</servlet-name>
    <display-name>SolrUpdate</display-name>
    <description>Solr Update Handler</description>
    <servlet-class>org.apache.solr.servlet.SolrUpdateServlet</servlet-class>
    <load-on-startup>2</load-on-startup>
    </servlet>

    <servlet>
    <servlet-name>Logging</servlet-name>
    <servlet-class>org.apache.solr.servlet.LogLevelSelection</servlet-class>
    </servlet>

    <!-- @Deprecated -->
    <servlet>
    <servlet-name>ping</servlet-name>
    <jsp-file>/admin/ping.jsp</jsp-file>
    </servlet>

    <servlet-mapping>
    <servlet-name>SolrServer</servlet-name>
    <url-pattern>/select/*</url-pattern>
    </servlet-mapping>

    <servlet-mapping>
    <servlet-name>SolrUpdate</servlet-name>
    <url-pattern>/update/*</url-pattern>
    </servlet-mapping>

    <servlet-mapping>
    <servlet-name>Logging</servlet-name>
    <url-pattern>/admin/logging</url-pattern>
    </servlet-mapping>

    <!-- @Deprecated -->
    <servlet-mapping>
    <servlet-name>ping</servlet-name>
    <url-pattern>/admin/ping</url-pattern>
    </servlet-mapping>

    <!-- @Deprecated -->
    <servlet-mapping>
    <servlet-name>Logging</servlet-name>
    <url-pattern>/admin/logging.jsp</url-pattern>
    </servlet-mapping>

    <mime-mapping>
    <extension>.xsl</extension>
    <!-- per http://www.w3.org/TR/2006/PR-xslt20-20061121/ -->
    <mime-type>application/xslt+xml</mime-type>
    </mime-mapping>

    <welcome-file-list>
    <welcome-file>index.jsp</welcome-file>
    <welcome-file>index.html</welcome-file>
    </welcome-file-list>

    </web-app>


  5. Run Solr using Run As --> Run Configurations...
    • Specify jetty:run-exploded in the Goals field and press Run

  6. Access Solr at http://localhost:8080/<artifactId>/admin/
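
The URL in step 6 relies on the plugin's defaults: as far as I know, maven-jetty-plugin 6.1.x serves the web application under /<artifactId> on port 8080. If you would rather pin both explicitly, the plugin entry in pom.xml can carry a configuration block. The following is a minimal sketch, assuming the contextPath and connectors settings of the 6.1.x plugin; the /solr context path is only an illustrative choice and not something the steps above require.

```xml
<plugin>
  <groupId>org.mortbay.jetty</groupId>
  <artifactId>maven-jetty-plugin</artifactId>
  <version>6.1.15.rc4</version>
  <configuration>
    <!-- Assumption: serve Solr at /solr instead of the default /<artifactId> -->
    <contextPath>/solr</contextPath>
    <connectors>
      <!-- Explicit HTTP connector; 8080 matches the port used in step 6 -->
      <connector implementation="org.mortbay.jetty.nio.SelectChannelConnector">
        <port>8080</port>
      </connector>
    </connectors>
  </configuration>
</plugin>
```

With this block in place of the bare plugin entry, Solr should be reachable at http://localhost:8080/solr/admin/ after running jetty:run-exploded.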

Wednesday, November 9, 2011

Solr Delta Import - Delete Index data

Deleting index data is an essential requirement if you want incremental delta imports. The Solr wiki does not describe deletes in much detail, so I decided to document the issues I faced and how to delete index data with a delta import.

schema.xml:
<fields>
  <field name="userid" type="string" indexed="true" stored="true" required="true"/>
  <field name="emailid" type="text" indexed="true" stored="true"/>
  <field name="name" type="text" indexed="true" stored="true"/>
  <field name="address" type="text" indexed="true" stored="true"/>
</fields>
<uniqueKey>emailid</uniqueKey>

data-config.xml:
<dataConfig>
  <document>
    <!-- deletedPkQuery must select the emailid column -->
    <entity name="users" pk="emailid" query="SELECT * from users"
            deletedPkQuery="SELECT emailid FROM users WHERE is_deleted = true and modification_date > '${dataimporter.last_index_time}'">
      <field column="userid" name="userid"/>
      <field column="emailid" name="emailid"/>
      <field column="name" name="name"/>
      <field column="address" name="address"/>
    </entity>
  </document>
</dataConfig>

Index data is deleted by the uniqueKey defined in schema.xml, not by the pk defined at entity level in data-config.xml.
If the pk of the top-level entity in data-config.xml is not the same as the uniqueKey in schema.xml, the delete will not work, even though the log file will report a number of deleted documents. This is because the value fetched by deletedPkQuery is matched against the uniqueKey field.
Here is the method from SolrWriter that is actually called when documents are deleted using deletedPkQuery.

public void deleteDoc(Object id) { // here the id value must come from the uniqueKey field
  try {
    log.info("Deleting document: " + id);
    DeleteUpdateCommand delCmd = new DeleteUpdateCommand();
    delCmd.id = id.toString();
    delCmd.fromPending = true;
    delCmd.fromCommitted = true;
    processor.processDelete(delCmd);
  } catch (IOException e) {
    log.error("Exception while deleteing: " + id, e);
  }
}
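
The same rule applies when the database column is not named like the uniqueKey. Below is a sketch, assuming a hypothetical users table whose column is called email while the Solr uniqueKey stays emailid. Aliasing the column in both queries keeps the pk, the selected column, and the uniqueKey on the same name, so the value handed to deleteDoc should be a real uniqueKey value. The column name email is an assumption made purely for illustration, and depending on the JDBC driver the alias may come back in a different case.

```xml
<dataConfig>
  <document>
    <!-- dataSource definition omitted, as in the example above -->
    <!-- Hypothetical: the table column is "email", the Solr uniqueKey is "emailid" -->
    <entity name="users" pk="emailid"
            query="SELECT userid, email AS emailid, name, address FROM users"
            deletedPkQuery="SELECT email AS emailid FROM users WHERE is_deleted = true AND modification_date > '${dataimporter.last_index_time}'">
      <field column="userid" name="userid"/>
      <field column="emailid" name="emailid"/>
      <field column="name" name="name"/>
      <field column="address" name="address"/>
    </entity>
  </document>
</dataConfig>
```

If the alias were left out of deletedPkQuery, the returned rows would carry no emailid value, and the delete would not match the documents you expect.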