CTA produces and maintains vast quantities of content on agriculture and rural development, and over the years it has amassed a rich store of knowledge and information. There are currently 48,000 CTA documents that can be searched online, and that number will grow as more printed publications and other documents are indexed and added.
Under the previous system, visitors had to know where to look for a particular piece of information, which might have been published through any one of CTA’s several web portals or other outlets. CTA website coordinator Thierry Lewyllie gives the example of an NGO in Nigeria that works with cassava and wants to find CTA information on the crop in that country.
“Until now, I would have had to visit every site that CTA has (Knowledge, ICT Update, Agritrade and so on), supposing that I knew all those sites, and then do a search, hoping that the search function actually turns up the results I am looking for, and hoping that I didn’t miss anything in the process,” he said. “It was a very time-consuming process and there was no guarantee that it would produce the information you were looking for.”
The new system lets users search the entire body of available information simply and easily. Now, if that same NGO wants to know what CTA has on cassava in Nigeria, a staff member enters the two keywords, cassava and Nigeria, and clicks the search button. Everything CTA has produced on cassava in Nigeria then appears in the search results. Results can be displayed either as a list of items or on a map, allowing users to visualise where CTA is working.
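The article does not say how the keyword search is implemented internally. As a rough illustration only, the classic structure behind this kind of lookup is an inverted index, which maps each word to the documents that contain it; a multi-keyword query then returns the documents that appear under every word. The minimal sketch below assumes made-up document IDs and titles and simple "all keywords must match" logic.

```python
import re
from collections import defaultdict

# Hypothetical sample documents; real CTA records and their fields are not shown in the article.
DOCUMENTS = {
    "doc-1": "Cassava processing cooperatives in Nigeria",
    "doc-2": "Cassava mosaic disease in Uganda",
    "doc-3": "Rice value chains in Nigeria",
    "doc-4": "Market prices for cassava flour, Nigeria and Ghana",
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of document IDs containing it (an inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents that contain every query word (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))  # copy, so the index is not modified
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


index = build_index(DOCUMENTS)
print(sorted(search(index, "cassava Nigeria")))  # prints ['doc-1', 'doc-4']
```

Because each keyword lookup is a dictionary access followed by a set intersection, query time depends on the size of the matching lists rather than on the total number of documents.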
A geodata tool allows users to search for all CTA information linked to a specific village or community of more than 5,000 people, anywhere in the world. The function is expected to prove especially useful as CTA moves towards developing Regional Business Plans, which will bring a sharper focus on regional issues and features.
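The article does not explain how a user's location is resolved to a community. One plausible approach is to find the nearest place that meets the population threshold and return the documents tagged with it. The sketch below assumes hypothetical Geonames-style place records and a hypothetical place-to-document mapping; only the 5,000-person threshold comes from the article. Distances use the haversine great-circle formula.

```python
import math

# Hypothetical gazetteer rows loosely modelled on Geonames-style records;
# names, coordinates and populations are illustrative only.
PLACES = [
    {"id": 1, "name": "Town A", "lat": 7.40, "lon": 3.90, "population": 12000},
    {"id": 2, "name": "Hamlet B", "lat": 7.45, "lon": 3.95, "population": 800},
    {"id": 3, "name": "Town C", "lat": 9.10, "lon": 7.40, "population": 45000},
]

# Hypothetical mapping from place ID to the documents labelled with that place.
DOCS_BY_PLACE = {1: ["doc-7", "doc-9"], 3: ["doc-2"]}

MIN_POPULATION = 5000  # threshold quoted in the article


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_place(lat, lon):
    """Return the closest place above the population threshold, or None."""
    eligible = [p for p in PLACES if p["population"] > MIN_POPULATION]
    if not eligible:
        return None
    return min(eligible, key=lambda p: haversine_km(lat, lon, p["lat"], p["lon"]))


def documents_near(lat, lon):
    """Documents labelled with the nearest qualifying place."""
    place = nearest_place(lat, lon)
    if place is None:
        return []
    return DOCS_BY_PLACE.get(place["id"], [])
```

A user standing in the small settlement "Hamlet B" would be matched to the nearby "Town A", because the hamlet itself falls below the threshold.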
“That means if I am living in a small rural town in one of the ACP regions, I can now ask CTA to give me anything that it has ever produced which makes reference to where I am living. It means we can scale right down. We are leveraging big data technologies to provide more accurate and targeted small data,” explained Lewyllie. “It means that if there is a farmer or someone running an NGO in a very specific place in the world, they can immediately see what CTA has produced on their particular area. And the nice thing is that it does that in milliseconds.”
Potential for partners
The system is open source, and CTA plans to offer its smart search capabilities to partners and other stakeholders who lack extensive ICT resources, helping them build efficient search systems for their own web content and make valuable content more readily accessible to their users.
To return the results most relevant to a search query based on the actual content of each document, CTA looked closely at the work done by IT companies such as Google on semantic search algorithms and on knowledge graphs like Google’s Knowledge Graph.
The system uses AGROVOC, the FAO thesaurus of more than 32,000 agricultural terms and concepts available in 21 languages, to help rank keywords and increase the chances of users finding a match for their search, even when they enter imprecise terms.
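The article does not describe exactly how AGROVOC feeds into ranking. A minimal sketch of one common thesaurus technique, concept-based query expansion, might look like this: words in any language are mapped to a shared concept, and documents are scored by how many of the query's concepts they carry. The concept IDs, labels and document tags below are hypothetical stand-ins, not real AGROVOC data.

```python
# Hypothetical miniature thesaurus: each concept ID maps to labels in several
# languages. Real AGROVOC concepts have their own URIs and many more labels.
THESAURUS = {
    "concept:cassava": {"cassava", "manioc", "yuca", "mandioca"},
    "concept:drought": {"drought", "sécheresse", "sequía", "seca"},
}

# Reverse lookup: any label -> concept ID.
LABEL_TO_CONCEPT = {
    label: concept for concept, labels in THESAURUS.items() for label in labels
}

# Hypothetical documents, each already labelled with concept IDs.
DOC_CONCEPTS = {
    "doc-1": {"concept:cassava"},
    "doc-2": {"concept:cassava", "concept:drought"},
    "doc-3": {"concept:drought"},
}


def query_concepts(query: str) -> set[str]:
    """Map each query word to a thesaurus concept, ignoring unknown words."""
    return {LABEL_TO_CONCEPT[w] for w in query.lower().split() if w in LABEL_TO_CONCEPT}


def rank(query: str) -> list[tuple[str, int]]:
    """Score documents by the number of query concepts they share; best first."""
    wanted = query_concepts(query)
    scores = [(doc, len(concepts & wanted)) for doc, concepts in DOC_CONCEPTS.items()]
    return sorted((s for s in scores if s[1] > 0), key=lambda s: (-s[1], s[0]))


print(rank("manioc sequía"))  # prints [('doc-2', 2), ('doc-1', 1), ('doc-3', 1)]
```

A French word for cassava and a Spanish word for drought still reach English-labelled documents, because matching happens on concepts rather than on the literal words typed.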
“This is really cutting-edge technology we are using. It’s something CTA can be proud of,” said Lewyllie. “I think the way we solved our problems with limited resources is a good example of how technology can provide high-performing and scalable solutions through a clever combination of different tools.”
Visit the demo version of the search tool. We encourage you to try it out and report any glitches so that the system can be fine-tuned.
How does it work?
All online CTA content has been labelled according to its geographic location, using the geographical database Geonames.org, and according to its subject matter, using FAO’s AGROVOC thesaurus. CTA built a custom application to carry out this labelling automatically. Combined with several database and search technologies, this labelling powers CTA’s semantic search engine.
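To make the automatic labelling step more concrete, here is a simplified dictionary-matching tagger. It assumes small hypothetical lookup tables in place of Geonames and AGROVOC, and it prefers the longest matching phrase, so "cassava flour" is labelled as one concept rather than as "cassava". CTA's custom application is not described in detail, so this is not its code; it only shows the kind of output such labelling produces.

```python
import re

# Hypothetical lookup tables standing in for Geonames and AGROVOC;
# the IDs are made up for illustration and are not real identifiers.
GAZETTEER = {"nigeria": "geo:NG", "lagos": "geo:lagos", "burkina faso": "geo:BF"}
CONCEPTS = {
    "cassava": "agrovoc:cassava",
    "cassava flour": "agrovoc:cassava_flour",
    "irrigation": "agrovoc:irrigation",
}
MAX_NGRAM = 3  # longest multi-word name we try to match


def label(text: str) -> dict[str, list[str]]:
    """Tag a text with place IDs and concept IDs, preferring the longest match."""
    words = re.findall(r"[a-z]+", text.lower())
    places, concepts = [], []
    i = 0
    while i < len(words):
        step = 1  # advance one word if nothing matches at this position
        for n in range(min(MAX_NGRAM, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in GAZETTEER:
                places.append(GAZETTEER[phrase])
                step = n
                break
            if phrase in CONCEPTS:
                concepts.append(CONCEPTS[phrase])
                step = n
                break
        i += step
    return {"places": places, "concepts": concepts}


print(label("Cassava flour exports from Lagos, Nigeria"))
# prints {'places': ['geo:lagos', 'geo:NG'], 'concepts': ['agrovoc:cassava_flour']}
```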
Apache Stanbol scrutinises each piece of content and extracts the relevant geographic locations, matched against Geonames, and agricultural keywords, matched against AGROVOC. The labelled content is then stored and managed in CouchDB, a database management system that combines an intuitive document storage model with a powerful query engine.
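CouchDB answers queries through views built from map functions: each stored document is passed to a function that emits key/value rows, and the rows are kept sorted by key so that lookups are fast. Real CouchDB views are usually written in JavaScript and queried over HTTP; the Python below only emulates the idea in memory, with hypothetical labelled documents and a hypothetical (place, concept) key.

```python
import bisect
import itertools

# Hypothetical labelled documents, shaped like JSON documents in a document store.
DOCS = [
    {"_id": "doc-1", "title": "Cassava processing", "places": ["geo:NG"],
     "concepts": ["agrovoc:cassava"]},
    {"_id": "doc-2", "title": "Cassava and drought", "places": ["geo:NG", "geo:GH"],
     "concepts": ["agrovoc:cassava", "agrovoc:drought"]},
    {"_id": "doc-3", "title": "Rice irrigation", "places": ["geo:GH"],
     "concepts": ["agrovoc:rice"]},
]


def map_place_concept(doc):
    """Map function: emit one ((place, concept), doc ID) row per label pair."""
    for place, concept in itertools.product(doc["places"], doc["concepts"]):
        yield (place, concept), doc["_id"]


def build_view(docs, map_fn):
    """Run the map function over every document and sort the rows by key."""
    return sorted(row for doc in docs for row in map_fn(doc))


def query_view(view, key):
    """Binary-search the sorted rows and return the values stored under `key`."""
    i = bisect.bisect_left(view, (key,))  # (key,) sorts just before (key, value)
    values = []
    while i < len(view) and view[i][0] == key:
        values.append(view[i][1])
        i += 1
    return values


view = build_view(DOCS, map_place_concept)
print(query_view(view, ("geo:NG", "agrovoc:cassava")))  # prints ['doc-1', 'doc-2']
```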
Next, CTA uses a graph database, Neo4j, to model and visualise the relationships in its data. Modelling these relationships explicitly makes it easier and more intuitive to handle complex hierarchies, such as a town within a region within a country, or a specific crop within a broader crop group.
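The value of modelling hierarchies as a graph shows up in a query like "root crops in Nigeria", which should also match documents labelled with a narrower town or crop. In Neo4j such a traversal would typically be written as a variable-length path pattern in the Cypher query language; the sketch below does not use Neo4j and instead performs the same idea as a plain breadth-first walk in Python over hypothetical "contains" edges.

```python
from collections import deque

# Hypothetical "contains" edges: a country contains states, a state contains towns,
# and a broad concept contains narrower ones. IDs are illustrative only.
NARROWER = {
    "geo:NG": ["geo:oyo_state", "geo:lagos_state"],
    "geo:oyo_state": ["geo:ibadan"],
    "concept:root_crops": ["concept:cassava", "concept:yam"],
}

# Hypothetical documents labelled with the most specific node that applies.
TAGS = {
    "doc-1": {"geo:ibadan", "concept:cassava"},
    "doc-2": {"geo:lagos_state", "concept:yam"},
    "doc-3": {"geo:GH", "concept:cassava"},
}


def descendants(node: str) -> set[str]:
    """Breadth-first walk returning the node and everything beneath it."""
    seen = {node}
    queue = deque([node])
    while queue:
        for child in NARROWER.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def search(*nodes: str) -> list[str]:
    """Documents matching every query node, where anything beneath a node also counts."""
    expanded = [descendants(n) for n in nodes]
    return sorted(doc for doc, tags in TAGS.items()
                  if all(tags & group for group in expanded))


print(search("geo:NG", "concept:root_crops"))  # prints ['doc-1', 'doc-2']
```

Neither matching document is labelled with "Nigeria" or "root crops" directly; they are found through the hierarchy.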
Finally, Elasticsearch provides fast and powerful full-text search, with support for geolocation, context-aware ‘did-you-mean’ suggestions and smart autocomplete.
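Elasticsearch ships its own suggester and completion features, and the sketch below neither uses nor reproduces them. It only illustrates, in plain Python over a hypothetical vocabulary, the two ideas behind such features: edit (Levenshtein) distance to propose "did-you-mean" corrections, and binary search over sorted terms for prefix autocomplete.

```python
import bisect

# Hypothetical vocabulary of indexed terms, kept sorted for prefix lookups.
VOCABULARY = sorted(["cassava", "cashew", "cattle", "cocoa", "coffee", "maize", "millet"])


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def did_you_mean(term: str, max_edits: int = 2) -> list[str]:
    """Known terms within `max_edits` edits of an unknown term, closest first."""
    if term in VOCABULARY:
        return []  # the term is already known; nothing to suggest
    scored = sorted((edit_distance(term, w), w) for w in VOCABULARY)
    return [w for d, w in scored if d <= max_edits]


def autocomplete(prefix: str, limit: int = 5) -> list[str]:
    """Terms starting with `prefix`, located by binary search on the sorted list."""
    start = bisect.bisect_left(VOCABULARY, prefix)
    matches = []
    for word in VOCABULARY[start:]:
        if not word.startswith(prefix) or len(matches) == limit:
            break
        matches.append(word)
    return matches


print(did_you_mean("casava"))  # prints ['cassava']
print(autocomplete("ca"))      # prints ['cashew', 'cassava', 'cattle']
```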