Oh The Huge Manatee

A blog about technology, open source, and the web... from someone who works with all three.

How to Set Up Apache Solr Multicore for Drupal

Apache Solr is the search technology that powers many Drupal sites. It integrates easily with Drupal’s search_api contrib module, through the search_api_solr module. It’s easy enough to set up a single-site installation of Solr on your own, but if you’re serious about building Drupal sites with Solr, you will be building more than one site. Solr includes a “multi-core” mode, which lets you serve multiple search cores to separate sites from a single installation. Each core looks, feels, and acts like a totally separate Solr install, so you can develop as many Solr sites as you like. Each core has its own configuration, data directory… the works. One core can support extensive custom geo data, and another can be completely location blind. The best part: with Solr multicore, adding a new search core to your existing stack is a trivial five-minute installation.

But getting a working, sane, and easy to manage solr multicore installation is not quite as trivial as a single site. I get asked for instructions on this all the time, so here they are for posterity. These instructions assume you’re running on a supported version of Ubuntu Linux, but very few steps are distribution-specific. I’ve noted them inline, so you can work out just what you need to do for your unique environment.

1) Understand what you’re doing

I know you aren’t going to read this whole post before getting started, so instead step 1 is to understand the beast.

Drupal’s search_api module allows you to plug in different search engines to act as the back end. We’re going to use the search_api_solr module, so that Drupal passes search indexing and queries to Apache Solr for processing. Installing and configuring this on the Drupal side is beyond the scope of this post, but it’s not radically different from any other contrib module. The radically different part is in setting up Solr, and making sure it knows how to understand the information coming from Drupal. 

Apache Solr is a search engine built in Java. Basically you run a Java program, feed it a bunch of data, and it stores that data in a searchable index. Then you send it search queries, and it gives you a sorted list of results. In order to do this, it needs to understand the structure of the data in the search index, and a handful of similar details. We’re lucky in that Drupal’s search_api_solr comes with exactly the configuration files we need to make Solr understand the indexing and search request data that comes from Drupal. We’ll be able to just copy the configuration files right out of the search_api_solr module.

In order to run solr in a secure environment, and in order to feed it information that’s coming in over HTTP, you need a Java Servlet engine. This is basically a web server that is specifically built to pass requests on to Java applications like Solr. You can think of it as a wrapper for Solr. There are a few competing open source servlet engines, but we’re going to use Tomcat in this tutorial. Tomcat is popular, powerful, and easy to install on most modern Linux distributions. 

2) Install Apache Tomcat

On Ubuntu and Debian systems, this should be as easy as 

sudo apt-get install tomcat6 tomcat6-admin tomcat6-common tomcat6-user
Bam, you have tomcat installed, as well as everything you need for tomcat to control access to administrative and user functions. The default location for tomcat6 is /usr/share/tomcat6 , and configuration is stored in /etc/tomcat6 .

By default Tomcat’s user database is a simple XML file, which Ubuntu’s tomcat6 package keeps at /etc/tomcat6/tomcat-users.xml . We’re going to edit it so that we have an administrative user that can tweak behaviors from Tomcat’s web interface.

<tomcat-users>
  <role rolename="admin"/>
  <role rolename="manager"/>
  <user username="tomcat" password="mypasswordhere" roles="admin,manager"/>
</tomcat-users>
Make sure that you define both the admin and manager roles as above, and that you have one user that gets both roles. Save the file, and restart the tomcat6 service.
sudo service tomcat6 restart

Now Tomcat is up and running, and you can administer it. By default the administrative interface runs on port 8080. This is actually quite handy to keep around, so I recommend blocking external access to 8080 at the firewall level, and accessing it locally when you need to make configuration changes or tests. Test it out now at http://localhost:8080/ . By default you should see a page that shows links to the administration sections, and some other generic information about your Tomcat instance.
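If the server has no browser, a quick command-line check does the same job. This is only a sketch: http_status is a helper name I made up for this post, and it assumes curl is installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tomcat): print the HTTP status code
# that a URL answers with. curl prints 000 when nothing answers at all.
http_status() {
  curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1"
}
```

Run http_status http://localhost:8080/ on the server itself. A 200 means Tomcat is serving its default page; 000 means nothing is listening on that port yet.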

3) Install Apache Solr

Note: as of this writing, search_api_solr is not yet compatible with the latest major version of Solr (4.1). You can track progress on this issue at http://drupal.org/node/1676224 . In the meantime, we’re proceeding with a current version of 3.6. These instructions should work for the latest 3.6.x you can find on the Solr website.
 

Now we’re going to download the Solr Java application, and copy the compiled version of it into Tomcat’s jailed “webapps” directory, where it’s safe to run. We’re also going to create a convenience symlink there, so we can keep the version information in the filename and don’t have to update any config files when we want to update Solr versions. Finally, we’re going to symlink Solr’s home directory to /etc/solr , so you don’t have to memorize the location. The commands below use version 3.6.2 throughout; if you downloaded a different 3.6.x release, substitute its version number in every command.
cd /tmp
wget http://apache.rediris.es/lucene/solr/3.6.2/apache-solr-3.6.2.zip
unzip apache-solr-3.6.2.zip

Now we create a directory for Solr to keep its compiled Java applications in, right in Tomcat’s homedir because that will be easy to find later. We copy the compiled .war file for Solr into that new directory, and symlink it to an easier-to-remember name that we can use in configuration files.

sudo mkdir -p /usr/share/tomcat6/webapps
sudo cp /tmp/apache-solr-3.6.2/dist/apache-solr-3.6.2.war /usr/share/tomcat6/webapps
sudo ln -s /usr/share/tomcat6/webapps/apache-solr-3.6.2.war /usr/share/tomcat6/webapps/solr.war
Now technically we have the Solr application installed… but Solr needs its own home directory as well. This is where you will keep Solr-specific configuration files, and eventually the files related to your search cores themselves. We’ll base our Solr directory on the example multicore setup that is distributed with Solr itself. This directory needs to be writable by Solr so it can keep search index information in it, so we have to make sure it’s owned by tomcat6 (the user who will be running Solr). Finally, we’ll symlink Solr’s home directory to /etc/solr so it’s more memorable.
sudo cp -a /tmp/apache-solr-3.6.2/example/multicore /usr/share/tomcat6/solr
sudo chown -R tomcat6 /usr/share/tomcat6/solr
sudo ln -s /usr/share/tomcat6/solr /etc/solr
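Before moving on, it’s worth confirming that the home directory and its symlink ended up where you expect. Here is a minimal sketch of such a check. The function name check_solr_home is my own invention, and it only checks that the directory and its solr.xml exist. It does not check ownership.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr or Tomcat): sanity-check a
# Solr home directory that was copied from the multicore example.
check_solr_home() {
  home="$1"
  if [ ! -d "$home" ]; then
    echo "not a directory: $home"
    return 1
  fi
  # The multicore example keeps solr.xml at the top of the Solr home.
  if [ ! -f "$home/solr.xml" ]; then
    echo "missing: $home/solr.xml"
    return 1
  fi
  # pwd -P prints the physical path, so a symlink is reported as the
  # directory it really points to.
  echo "ok: $(cd "$home" && pwd -P)"
}
```

Calling check_solr_home /etc/solr should report the real directory behind the symlink, which in this setup is /usr/share/tomcat6/solr.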
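Solr is now ready to go. Let’s tell tomcat6’s servlet container component, Catalina, about Solr and what access it needs to run. We describe a new “Context” to Catalina, which is based on the solr.war symlink we just created. We tell it where to find the environment solr.war calls “solr/home”, which is the Solr homedir we just set up. On Ubuntu’s tomcat6 package, context descriptors like this normally go in /etc/tomcat6/Catalina/localhost/ ; naming the file solr.xml there is what makes the application answer under the /solr path.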
<Context docBase="/usr/share/tomcat6/webapps/solr.war" debug="0" privileged="true" allowLinking="true" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/usr/share/tomcat6/solr" override="true" />
</Context>
Restart tomcat6 with sudo service tomcat6 restart, so it reads the new configuration. Now we have a multi-core solr environment set up to run on Tomcat6. Congratulations! The hard part is over. Test to make sure solr is loading properly by visiting your Tomcat admin page at http://localhost:8080/manager/html . You should see new links there for solr, with two example cores already built in.

4) Configure Solr Multicore for Convenience

Let’s have a look at the solr configuration you just created. In your /etc/solr directory, you actually only need one file: solr.xml . This configuration file tells solr everything it needs to know about how it is set up. In fact, the only part you have to care about is at the end, where it lists <core> declarations. 
<cores adminPath="/admin/cores">
  <core name="core0" instanceDir="core0" />
  <core name="core1" instanceDir="core1" />
</cores>
Each new core gets a name and a directory path (relative to Solr’s home, /etc/solr) where it should keep its configuration and data. By default we have two demonstration cores, called core0 and core1, kept right in Solr’s home directory. I find that messy, so we’re going to create a new directory called cores, and give each core a subdirectory under that. Then we’ll update that solr.xml file to tell it the new location of core0 and core1. This is purely for convenience and clarity of configuration, but when you have 20 concurrent Solr dev sites to manage, you’ll thank me.
sudo mkdir /etc/solr/cores
sudo mv /etc/solr/core[0-1] /etc/solr/cores
<cores adminPath="/admin/cores">
  <core name="core0" instanceDir="cores/core0" />
  <core name="core1" instanceDir="cores/core1" />
</cores>
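If you’d rather not edit solr.xml by hand, the same change can be scripted. This is a rough sketch, not anything that ships with Solr: relocate_cores and list_cores are names I made up, and both assume each core sits on its own line with name followed directly by instanceDir, as in the example file.

```shell
#!/bin/sh
# Hypothetical helper: prefix every bare instanceDir with "cores/".
# Values that already contain a slash are left alone, so running it
# twice is harmless. A backup is kept next to the file as solr.xml.bak.
relocate_cores() {
  solr_xml="$1"
  cp "$solr_xml" "$solr_xml.bak"
  sed 's|instanceDir="\([^"/]*\)"|instanceDir="cores/\1"|' "$solr_xml.bak" > "$solr_xml"
}

# Hypothetical helper: print "name -> instanceDir" for each declared core.
list_cores() {
  sed -n 's|.*<core name="\([^"]*\)" instanceDir="\([^"]*\)".*|\1 -> \2|p' "$1"
}
```

Run relocate_cores /etc/solr/solr.xml as a user who can write to that file, then list_cores /etc/solr/solr.xml to confirm that both example cores now point under cores/.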
That’s it. If all you want is a Solr multicore setup, this is where you get off. From here on in it’s Drupal-specific.

5) Add a new core for a Drupal site

Now that everything is nicely organized in /etc/solr, let’s add a new search core for a Drupal site. We’re going to copy one of the example core directories, and then copy in the configuration files that are distributed with Drupal’s search_api_solr module. With each new site, I recommend copying in these configuration files from the module rather than trusting what’s in an existing core directory, simply because these configurations update with the search_api_solr module itself. You want to make sure that your core is using the configuration that your version of the module expects! These are the steps you will have to take every time you want to add a new core for a Drupal site.
sudo cp -a /etc/solr/cores/core0 /etc/solr/cores/myfirstcore
sudo cp /path/to/drupal_site/sites/all/modules/search_api_solr/solr-conf/* /etc/solr/cores/myfirstcore/conf
<cores adminPath="/admin/cores">
  <core name="core0" instanceDir="cores/core0" />
  <core name="core1" instanceDir="cores/core1" />
  <core name="myfirstcore" instanceDir="cores/myfirstcore" />
</cores>
sudo chown -R tomcat6 /etc/solr/
sudo service tomcat6 restart
That’s it. Copy a directory, add one line to a configuration file, and restart tomcat6, and you have a new solr core to work with.
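Since you’ll repeat this for every new site, the solr.xml edit is a good candidate for a small script. The sketch below only handles that one edit; copying the core directory and the Drupal configuration files is still done with the cp commands above. The name add_core_entry is hypothetical, and it assumes core names are plain words without spaces or special characters.

```shell
#!/bin/sh
# Hypothetical helper: declare a new core in solr.xml by inserting one
# <core> line just before the closing </cores> tag.
add_core_entry() {
  solr_xml="$1"
  name="$2"
  # Refuse to declare the same core twice.
  if grep -q "<core name=\"$name\"" "$solr_xml"; then
    echo "core already declared: $name"
    return 1
  fi
  # Keep a backup, then rewrite the file from it.
  cp "$solr_xml" "$solr_xml.bak"
  awk -v name="$name" '
    /<\/cores>/ { printf "  <core name=\"%s\" instanceDir=\"cores/%s\" />\n", name, name }
    { print }
  ' "$solr_xml.bak" > "$solr_xml"
}
```

For the example in this post that would be add_core_entry /etc/solr/solr.xml myfirstcore , run as a user who can write to solr.xml, followed by the usual tomcat6 restart. The two-space indent on the new line is cosmetic; Solr doesn’t care about whitespace there.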

6) Profit

Now that your solr core is up and running, you can visit the search_api_solr configuration page and add a new server of type “solr”, with the following settings:
 
Solr Hostname: localhost
 
Solr Port: 8080
 
Solr Path: /solr/myfirstcore
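
Before blaming Drupal for a failed connection, you can check the core from the command line using the same three settings. Treat this as a sketch: solr_ping_url and solr_ping are made-up helper names, curl is assumed to be installed, and /admin/ping is the ping handler path from Solr’s stock configuration, so it only works if your core’s solrconfig.xml keeps that handler.

```shell
#!/bin/sh
# Hypothetical helper: build the ping URL from hostname, port and path.
solr_ping_url() {
  echo "http://$1:$2$3/admin/ping"
}

# Hypothetical helper: report whether the core answers its ping URL.
# curl -f makes HTTP errors such as 404 count as failures.
solr_ping() {
  if curl -s -f -o /dev/null --max-time 5 "$(solr_ping_url "$1" "$2" "$3")"; then
    echo "up"
  else
    echo "down"
  fi
}
```

With the settings above, that is solr_ping localhost 8080 /solr/myfirstcore . If it says “down”, check the Tomcat logs before you start debugging the Drupal side.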
 
Test and enjoy! If you have any trouble with these instructions, or have anything to add… leave us a comment!

 
