Thursday, November 17, 2011

Building Solr as Maven Project

Maven is now one of the most commonly used build tools, but Solr still uses an Ant build. In this article I describe how you can quickly create a Solr Maven project and run Solr using Jetty.

The Solr WAR is available in the Maven Central repository (http://mvnrepository.com/artifact/org.apache.solr/solr).

Steps to build the Solr Maven project:

  1. Create a Maven project in Eclipse using: File --> New --> Project...
    1. Select Maven --> Maven Project from the list
    2. Check Create Simple Project
    3. Specify values for groupId and artifactId
    4. Select Packaging type: war
    5. Press Finish. This creates an empty Maven project with the basic folder structure in Eclipse
  2. Edit pom.xml and replace its content with the following:
    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.apache.mysolr</groupId>
    <artifactId>mysolr</artifactId>
    <version>3.4.0</version>
    <packaging>war</packaging>
    <name>Apache Solr Web Application</name>
    <repositories>
    <repository>
    <id>central</id>
    <name>Maven Repository Switchboard</name>
    <layout>default</layout>
    <url>https://repo1.maven.org/maven2</url>
    </repository>
    </repositories>
    <pluginRepositories>
    <pluginRepository>
    <id>central</id>
    <name>Maven Plugin Repository</name>
    <url>https://repo1.maven.org/maven2</url>
    <layout>default</layout>
    </pluginRepository>
    </pluginRepositories>
    <dependencies>
    <dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr</artifactId>
    <version>3.4.0</version>
    <type>war</type>
    </dependency>
    </dependencies>
    <build>
    <plugins>
    <plugin>
    <groupId>org.mortbay.jetty</groupId>
    <artifactId>maven-jetty-plugin</artifactId>
    <version>6.1.15.rc4</version>
    </plugin>
    </plugins>
    </build>
    </project>
  3. Copy the Solr config files into the project's src/main/resources folder:
    solrconfig.xml
    schema.xml
    elevate.xml
    protwords.txt
    stopwords.txt
    spellings.txt
    synonyms.txt
  4. Copy the default Solr web.xml content below into src/main/webapp/WEB-INF/web.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN" "http://java.sun.com/dtd/web-app_2_3.dtd">
    <!--
    Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements. See the NOTICE file distributed with
    this work for additional information regarding copyright ownership.
    The ASF licenses this file to You under the Apache License, Version 2.0
    (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
    -->

    <web-app>

    <!-- Uncomment if you are trying to use a Resin version before 3.0.19.
    Their XML implementation isn't entirely compatible with Xerces.
    Below are the implementations to use with Sun's JVM.
    <system-property javax.xml.xpath.XPathFactory=
    "com.sun.org.apache.xpath.internal.jaxp.XPathFactoryImpl"/>
    <system-property javax.xml.parsers.DocumentBuilderFactory=
    "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl"/>
    <system-property javax.xml.parsers.SAXParserFactory=
    "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl"/>
    -->

    <!-- People who want to hardcode their "Solr Home" directly into the
    WAR File can set the JNDI property here...
    -->
    <!--
    <env-entry>
    <env-entry-name>solr/home</env-entry-name>
    <env-entry-value>/put/your/solr/home/here</env-entry-value>
    <env-entry-type>java.lang.String</env-entry-type>
    </env-entry>
    -->

    <!-- Any path (name) registered in solrconfig.xml will be sent to that filter -->
    <filter>
    <filter-name>SolrRequestFilter</filter-name>
    <filter-class>org.apache.solr.servlet.SolrDispatchFilter</filter-class>
    <!-- If you are wiring Solr into a larger web application which controls
    the web context root, you will probably want to mount Solr under
    a path prefix (app.war with /app/solr mounted into it, for example).
    You will need to put this prefix in front of the SolrDispatchFilter
    url-pattern mapping too (/solr/*), and also on any paths for
    legacy Solr servlet mappings you may be using.
    For the admin JSP's to work properly in a path-prefixed configuration,
    the admin folder containing the JSPs needs to be under the app context root
    named to match the path-prefix. For example:

    .war
      xxx
        admin
          stats.jsp
    -->
    <!--
    <init-param>
    <param-name>path-prefix</param-name>
    <param-value>/xxx</param-value>
    </init-param>
    -->
    </filter>

    <filter-mapping>
    <!--
    NOTE: When using multicore, /admin JSP URLs with a core specified
    such as /solr/coreName/admin/stats.jsp get forwarded by a
    RequestDispatcher to /solr/admin/stats.jsp with the specified core
    put into request scope keyed as "org.apache.solr.SolrCore".

    It is unnecessary, and potentially problematic, to have the SolrDispatchFilter
    configured to also filter on forwards. Do not configure
    this dispatcher as <dispatcher>FORWARD</dispatcher>.
    -->
    <filter-name>SolrRequestFilter</filter-name>
    <url-pattern>/*</url-pattern>
    </filter-mapping>
    <!-- Otherwise it will continue to the old servlets -->

    <servlet>
    <servlet-name>SolrServer</servlet-name>
    <display-name>Solr</display-name>
    <description>Solr Server</description>
    <servlet-class>org.apache.solr.servlet.SolrServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
    </servlet>

    <servlet>
    <servlet-name>SolrUpdate</servlet-name>
    <display-name>SolrUpdate</display-name>
    <description>Solr Update Handler</description>
    <servlet-class>org.apache.solr.servlet.SolrUpdateServlet</servlet-class>
    <load-on-startup>2</load-on-startup>
    </servlet>

    <servlet>
    <servlet-name>Logging</servlet-name>
    <servlet-class>org.apache.solr.servlet.LogLevelSelection</servlet-class>
    </servlet>

    <!-- @Deprecated -->
    <servlet>
    <servlet-name>ping</servlet-name>
    <jsp-file>/admin/ping.jsp</jsp-file>
    </servlet>

    <servlet-mapping>
    <servlet-name>SolrServer</servlet-name>
    <url-pattern>/select/*</url-pattern>
    </servlet-mapping>

    <servlet-mapping>
    <servlet-name>SolrUpdate</servlet-name>
    <url-pattern>/update/*</url-pattern>
    </servlet-mapping>

    <servlet-mapping>
    <servlet-name>Logging</servlet-name>
    <url-pattern>/admin/logging</url-pattern>
    </servlet-mapping>

    <!-- @Deprecated -->
    <servlet-mapping>
    <servlet-name>ping</servlet-name>
    <url-pattern>/admin/ping</url-pattern>
    </servlet-mapping>

    <!-- @Deprecated -->
    <servlet-mapping>
    <servlet-name>Logging</servlet-name>
    <url-pattern>/admin/logging.jsp</url-pattern>
    </servlet-mapping>

    <mime-mapping>
    <extension>.xsl</extension>
    <!-- per http://www.w3.org/TR/2006/PR-xslt20-20061121/ -->
    <mime-type>application/xslt+xml</mime-type>
    </mime-mapping>

    <welcome-file-list>
    <welcome-file>index.jsp</welcome-file>
    <welcome-file>index.html</welcome-file>
    </welcome-file-list>

    </web-app>


  5. Run Solr using Run As --> Run Configurations...
    • Specify jetty:run-exploded in the Goals field and press Run

  6. Access Solr at http://localhost:8080/<artifactId>/admin/
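
Once Jetty is up, a quick smoke check from code can confirm that the admin page responds. The following is only a minimal sketch: the class name SolrAdminCheck and its helper methods are hypothetical, and it assumes Jetty is listening on port 8080 with the context path equal to the artifactId (mysolr in the pom.xml above).

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of Solr or Jetty.
public class SolrAdminCheck {

    // Builds the admin URL from step 6: http://<host>:<port>/<artifactId>/admin/
    public static String adminUrl(String host, int port, String artifactId) {
        return "http://" + host + ":" + port + "/" + artifactId + "/admin/";
    }

    // Sends a GET request and returns the HTTP status code.
    public static int status(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(5000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: artifactId defaults to "mysolr" unless passed as the first argument.
        String artifactId = args.length > 0 ? args[0] : "mysolr";
        String url = adminUrl("localhost", 8080, artifactId);
        System.out.println(url + " -> HTTP " + status(url));
    }
}
```

Run it while jetty:run-exploded is active. A connection error usually means Jetty is not running yet or is listening on a different port.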

Wednesday, November 9, 2011

Solr Delta Import - Delete Index data

Deleting index data is an essential requirement if you want incremental delta imports. The Solr wiki does not describe deletion in much detail, so I decided to document the issues I faced and how to delete index data with a delta import.

schema.xml:
<fields>
  <field name="userid" type="string" indexed="true" stored="true" required="true"/>
  <field name="emailid" type="text" indexed="true" stored="true"/>
  <field name="name" type="text" indexed="true" stored="true"/>
  <field name="address" type="text" indexed="true" stored="true"/>
</fields>
<uniqueKey>emailid</uniqueKey>

data-config.xml:
<dataConfig>
  <document>
    <!-- deletedPkQuery must have emailid in select query -->
    <entity name="users" pk="emailid" query="SELECT * from users"
            deletedPkQuery="SELECT emailid FROM users WHERE is_deleted = true and modification_date &gt; '${dataimporter.last_index_time}'">
      <field column="userid" name="userid"/>
      <field column="emailid" name="emailid"/>
      <field column="name" name="name"/>
      <field column="address" name="address"/>
    </entity>
  </document>
</dataConfig>

Index data is deleted by the uniqueKey defined in schema.xml, not by the pk defined at entity level in data-config.xml.
If the pk of the top-level entity is not the same field as the uniqueKey in schema.xml, the delete will not work, even though the log file reports a number of deleted documents. This is because the value fetched by deletedPkQuery is matched against the uniqueKey field.
Here is the method from SolrWriter that is actually called when documents are deleted using deletedPkQuery.

public void deleteDoc(Object id) { // here id value must be from uniqueKey field
  try {
    log.info("Deleting document: " + id);
    DeleteUpdateCommand delCmd = new DeleteUpdateCommand();
    delCmd.id = id.toString();
    delCmd.fromPending = true;
    delCmd.fromCommitted = true;
    processor.processDelete(delCmd);
  } catch (IOException e) {
    log.error("Exception while deleteing: " + id, e);
  }
}
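
To see why a mismatched key fails silently, here is a toy model of the behaviour described above. It is not Solr code: the class DeleteByUniqueKeyDemo and its in-memory map are assumptions made purely for illustration, with the map standing in for an index keyed by the uniqueKey field (emailid).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: a map standing in for an index keyed by the schema's uniqueKey (emailid).
public class DeleteByUniqueKeyDemo {

    // uniqueKey value (emailid) -> userid
    private final Map<String, String> docsByUniqueKey = new LinkedHashMap<>();

    public void add(String emailid, String userid) {
        docsByUniqueKey.put(emailid, userid);
    }

    // Like deleteDoc(id) above, the id is interpreted as a uniqueKey value.
    // Returns true only if a document was actually removed.
    public boolean deleteDoc(Object id) {
        return docsByUniqueKey.remove(id.toString()) != null;
    }

    public int size() {
        return docsByUniqueKey.size();
    }

    public static void main(String[] args) {
        DeleteByUniqueKeyDemo index = new DeleteByUniqueKeyDemo();
        index.add("alice@example.com", "1001");
        index.add("bob@example.com", "1002");

        // A deletedPkQuery that selects userid: the delete is issued, but nothing matches.
        boolean removedByUserId = index.deleteDoc("1001");
        System.out.println("delete by userid removed a doc: " + removedByUserId
                + ", docs left: " + index.size());

        // A deletedPkQuery that selects emailid: the value matches the uniqueKey.
        boolean removedByEmail = index.deleteDoc("alice@example.com");
        System.out.println("delete by emailid removed a doc: " + removedByEmail
                + ", docs left: " + index.size());
    }
}
```

In this sketch the first delete call completes without error yet removes nothing, which mirrors the situation where the log counts deleted documents while the index stays unchanged.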

Saturday, October 29, 2011

Scrum vs Kanban

Similarities
  • Both are Lean and Agile
  • Both use pull scheduling
  • Both limit WIP
  • Both use transparency to drive process improvement
  • Both focus on delivering releasable software early and often
  • Both are based on self-organizing teams
  • Both require breaking the work into pieces
  • In both cases the release plan is continuously optimized based on empirical data (velocity / lead time)
Differences
Scrum | Kanban
Timeboxed iterations prescribed. | Timeboxed iterations optional. Can have separate cadences for planning, release, and process improvement. Can be event-driven instead of timeboxed.
Team commits to a specific amount of work for this iteration. | Commitment optional.
Uses Velocity as default metric for planning and process improvement. | Uses Lead time as default metric for planning and process improvement.
Cross-functional teams prescribed. | Cross-functional teams optional. Specialist teams allowed.
Items must be broken down so they can be completed within 1 sprint. | No particular item size is prescribed.
Burndown chart prescribed | No particular type of diagram is prescribed
WIP limited indirectly (per sprint) | WIP limited directly (per workflow state)
Estimation prescribed | Estimation optional
Cannot add items to ongoing iteration | Can add new items whenever capacity is available
A sprint backlog is owned by one specific team | A kanban board may be shared by multiple teams or individuals
Prescribes 3 roles (PO/SM/Team) | Doesn’t prescribe any roles
A Scrum board is reset between each sprint | A kanban board is persistent
Prescribes a prioritized product backlog | Prioritization is optional

Thursday, October 27, 2011

Agile Manifesto

4 Values
  • Individuals and interactions over processes and tools
  • Working software over comprehensive documentation
  • Customer collaboration over contract negotiation
  • Responding to change over following a plan
Twelve Principles of Agile Software
  • Early and continuous delivery of valuable software.
  • Welcome changing requirements, even late in development. Agile processes harness change for the customer's competitive advantage.
  • Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.
  • Business people and developers must work together daily throughout the project.
  • Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.
  • The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.
  • Working software is the primary measure of progress.
  • Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.
  • Continuous attention to technical excellence and good design enhances agility.
  • Simplicity--the art of maximizing the amount of work not done--is essential.
  • The best architectures, requirements, and designs emerge from self-organizing teams.
  • At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.

Wednesday, October 26, 2011

Scrum Roles

Product Owner
  • Customer representative
  • Prioritizes product requirements
  • Responsible for preparing Product Backlog

Team
  • Self-managed
  • Responsible for developing product
  • Responsible for the success of each iteration and of the project as a whole

Scrum Master
  • Teaches and implements Scrum
  • Ensures that everyone follows Scrum rules and practices properly
  • Protects the team from impediments during the Sprint