Using push collections with multiple query processors
Introduction
Push collections can be set up with multiple query processors. We refer to the machine that accepts additions and deletions of documents as the admin, and to the machine that a query processor connects to as that query processor's master. You can connect multiple query processors to a single admin, and you can also daisy chain query processors to other query processors, so a query processor can itself be a master. Query processors can fetch data from the admin only after that data has been committed. Query processors replicate only indexes, document data and some internal Push files. See supporting_multiple_query_processors to find out how to fetch query and click logs from query processors to the admin machine for better search quality and analytics.
Caveat: This feature is not supported on Windows.
Setting up a query processor
Push query processors work on a pull model: the admin does not know about its query processors, so almost all of the configuration is done on the query processors. If you wish to daisy chain slaves, set up the slaves closest to the master server first and work your way out.
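Working outward from the master amounts to a breadth-first walk of the replication topology. The sketch below assumes a hypothetical mapping from each query processor's hostname to its master's hostname; the function name and hostnames are illustrative only and are not part of Funnelback.

```python
from collections import deque


def setup_order(masters: dict, admin: str) -> list:
    """Return query processors in the order they should be set up.

    `masters` is a hypothetical topology map: query processor hostname ->
    hostname of its master (the admin or another query processor).
    Machines closest to the admin come first, so every master is already
    configured before any query processor that pulls from it.
    """
    # Invert the map: master -> query processors that pull from it.
    children = {}
    for qp, master in masters.items():
        children.setdefault(master, []).append(qp)

    order, seen = [], {admin}
    queue = deque([admin])
    while queue:
        node = queue.popleft()
        for qp in sorted(children.get(node, [])):
            if qp in seen:
                raise ValueError(f"cycle in topology at {qp!r}")
            seen.add(qp)
            order.append(qp)
            queue.append(qp)

    # Anything not reached from the admin is misconfigured.
    if len(order) != len(masters):
        missing = sorted(set(masters) - set(order))
        raise ValueError(f"not connected to {admin!r}: {missing}")
    return order
```

For example, `setup_order({"qp1": "admin", "qp2": "admin", "qp3": "qp1"}, "admin")` returns `["qp1", "qp2", "qp3"]`: qp3 pulls from qp1, so qp1 is set up first.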
To set up a Push query processor:
- Ensure the master server and the query processor share the same server secret; see server secret (global.cfg).
- Create a Push collection on the query processor with the same name as the Push collection you wish to replicate from the master.
- Before you make any changes to the Push collection, you must first set push.initial-mode to SLAVE:
push.initial-mode=SLAVE
- Now configure the query processor to talk to its master by editing the following options in the query processor's collection.cfg:
- You must set push.replication.master.hostname to the hostname of the master, e.g.
push.replication.master.hostname=<master>
- You may need to set push.replication.master.push-api.port to the jetty.admin_port configured in global.cfg on the master. If the query processor runs on the same port as the master, this does not need to be set.
- You must now tell Push to start syncing through the Push API:
POST /push-api/v1/sync/collections/<collection>/state/start
- You can check that the Push collection is trying to synchronise with its master by making the following request to the Push API and checking that its SyncState is Sync:
GET /push-api/v1/sync/collections/<collection>/state
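The query processor steps above can be sketched in Python. This is a minimal sketch, assuming that all three options go in the query processor's collection.cfg and that the Push API is reachable at a base URL such as `https://<query-processor>:<port>`. The helper names, hostnames and port are hypothetical, and TLS and authentication settings are omitted.

```python
import urllib.parse
import urllib.request
from typing import Optional


def query_processor_cfg(master_hostname: str, master_port: Optional[int] = None) -> str:
    """Render the collection.cfg lines for a query processor (sketch only)."""
    lines = [
        # Must be in place before any other change to the Push collection.
        "push.initial-mode=SLAVE",
        f"push.replication.master.hostname={master_hostname}",
    ]
    # Only needed when the master's jetty.admin_port differs from this machine's.
    if master_port is not None:
        lines.append(f"push.replication.master.push-api.port={master_port}")
    return "\n".join(lines) + "\n"


def sync_request(base_url: str, collection: str, action: str) -> urllib.request.Request:
    """Build, but do not send, a Push API sync request.

    action="start"  -> POST /push-api/v1/sync/collections/<collection>/state/start
    action="status" -> GET  /push-api/v1/sync/collections/<collection>/state
    """
    name = urllib.parse.quote(collection, safe="")
    url = f"{base_url}/push-api/v1/sync/collections/{name}/state"
    if action == "start":
        return urllib.request.Request(url + "/start", method="POST")
    if action == "status":
        return urllib.request.Request(url, method="GET")
    raise ValueError(f"unknown action: {action!r}")
```

Building the request separately from sending it keeps the URLs easy to check; pass a request to `urllib.request.urlopen` once your server's TLS and authentication requirements are handled.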
Promoting a query processor to Admin
If your Admin machine suffers a failure, you can promote one of your query processors to be the new Admin machine. To do this, make the following call to the Push API on the query processor you wish to promote:
POST /push-api/v1/collections/<collection>/mode/?mode=DEFAULT
Because query processors may lag behind the Admin machine, your new Admin machine may be missing some data.
You should then ensure that all of your query processors point to the new master. To do this, edit the push.replication.master.* options as described above, and Push will automatically connect to the new master.
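A promotion can be sketched as one mode-change request plus an edit to each remaining query processor's configuration. This is a hedged sketch: the helper names are hypothetical, the base URL format is an assumption, and `set_options` assumes collection.cfg holds plain `key=value` lines.

```python
import urllib.parse
import urllib.request


def mode_request(base_url: str, collection: str, mode: str) -> urllib.request.Request:
    """Build (not send) the Push API call that changes a collection's mode."""
    name = urllib.parse.quote(collection, safe="")
    query = urllib.parse.urlencode({"mode": mode})
    return urllib.request.Request(
        f"{base_url}/push-api/v1/collections/{name}/mode/?{query}", method="POST"
    )


def set_options(cfg_text: str, updates: dict) -> str:
    """Return cfg_text with each key=value option in `updates` replaced,
    appending any option that was not already present."""
    remaining = dict(updates)
    out = []
    for line in cfg_text.splitlines():
        key, sep, _ = line.partition("=")
        if sep and key.strip() in remaining:
            k = key.strip()
            line = f"{k}={remaining.pop(k)}"
        out.append(line)
    # Options that were missing from the file are added at the end.
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"
```

Send `mode_request(..., "DEFAULT")` to the machine being promoted, then run `set_options` over each other query processor's collection.cfg with `{"push.replication.master.hostname": "<new master>"}`.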
Demoting an Admin to a query processor
To demote an Admin machine to a query processor, you must first empty the Push collection by making the following API request:
DELETE /push-api/v1/collections/<collection>
You must then change the mode to SLAVE by making the following API request:
POST /push-api/v1/collections/<collection>/mode/?mode=SLAVE
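Because the two demotion calls must happen in order, a script can send them one after the other and stop at the first failure. This is a minimal sketch with hypothetical helper names, assuming the Push API is reachable at a base URL such as `https://<admin>:<port>`; authentication is omitted.

```python
import urllib.parse
import urllib.request


def demotion_requests(base_url: str, collection: str) -> list:
    """The two Push API calls that demote an Admin, in the order to send them."""
    name = urllib.parse.quote(collection, safe="")
    return [
        # 1. Empty the Push collection.
        urllib.request.Request(f"{base_url}/push-api/v1/collections/{name}", method="DELETE"),
        # 2. Switch the now-empty collection to SLAVE mode.
        urllib.request.Request(
            f"{base_url}/push-api/v1/collections/{name}/mode/?mode=SLAVE", method="POST"
        ),
    ]


def demote(base_url: str, collection: str) -> None:
    """Send the calls in order. urlopen raises HTTPError on an error status,
    so the mode change is never sent if emptying the collection failed."""
    for req in demotion_requests(base_url, collection):
        with urllib.request.urlopen(req) as resp:
            resp.read()
```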
Reducing network load
Currently Funnelback supports ignoring some files to reduce the load on your network, at the cost of reduced usability. You may ignore the following:
- Document data: resulting in cache copies being unavailable on the query processor.
- Delete lists: ignoring these has no effect on the query processor.
Important! If any of the push.replication.ignore.* options are set to true, you should not attempt to promote a query processor to Admin, as that will result in a corrupt Admin machine.
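Before promoting a query processor, a script could refuse to continue when any ignore option is enabled. This is a sketch under stated assumptions: it treats collection.cfg as `key=value` lines with `#` comments and compares the value case-insensitively with `true`; the function names are hypothetical, and it checks only the `push.replication.ignore.` prefix rather than any specific option name.

```python
def parse_cfg(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            options[key.strip()] = value.strip()
    return options


def safe_to_promote(cfg_text: str) -> bool:
    """False if any push.replication.ignore.* option is set to true."""
    return not any(
        key.startswith("push.replication.ignore.") and value.lower() == "true"
        for key, value in parse_cfg(cfg_text).items()
    )
```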
Deleting a Push collection with slaves
If a Push collection has slaves, it can be difficult to delete it. The easiest way is to first delete the Push collection on each of its slaves, i.e. delete the Push collections in the reverse order to the one you set them up in. This is necessary because a Push collection must be stopped before it can be deleted, and a slave constantly makes requests to its master's Push collection, effectively preventing it from being stopped.
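One way to derive a safe deletion order is to repeatedly delete a machine that nothing pulls from any more. The sketch below assumes the same kind of hypothetical topology map (query processor hostname to master hostname); the function name and hostnames are illustrative, not part of Funnelback.

```python
def deletion_order(masters: dict, admin: str) -> list:
    """Order in which to delete a daisy-chained Push collection.

    `masters` is a hypothetical topology map: query processor hostname ->
    hostname of its master. A machine is only deleted once nothing pulls
    from it any more, so every slave goes before its master; in a topology
    rooted at the admin, the admin goes last.
    """
    # How many machines currently pull from each machine.
    pullers = {node: 0 for node in [admin, *masters]}
    for master in masters.values():
        pullers[master] = pullers.get(master, 0) + 1

    order = []
    ready = sorted(node for node, count in pullers.items() if count == 0)
    while ready:
        node = ready.pop(0)
        order.append(node)
        master = masters.get(node)
        if master is not None:
            # Deleting this machine releases one puller from its master.
            pullers[master] -= 1
            if pullers[master] == 0:
                ready.append(master)
                ready.sort()

    if len(order) != len(pullers):
        raise ValueError("topology contains a cycle")
    return order
```

For example, `deletion_order({"qp1": "admin", "qp2": "admin", "qp3": "qp1"}, "admin")` returns `["qp2", "qp3", "qp1", "admin"]`: qp3 is deleted before qp1, which it pulls from.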