Orchestration of web services in the NIF project: using the Kepler workflow engine for data fusion
Filed under:
General neuroinformatics
Vadim Astakhov (UCSD), Anita Bandrowski (UCSD), Amarnath Gupta (UCSD), Jeffery Grethe (UCSD), Maryann Martone (UCSD)
We report on progress in employing the Kepler workflow engine and a service-oriented architecture (SOA) to prototype application integration workflows that combine data and web services developed by the Neuroscience Information Framework (NIF). One prerequisite of the scientific enterprise is searching for effective and useful data and resources, e.g., reagents, neuroanatomical features, genes, or proteins. Finding relevant resources is becoming a challenge not of scarcity but of overabundance: relevant data can be found anywhere among thousands of neuroscience-relevant information resources created by a range of information providers, including research groups, funding agencies, vendor groups, and public data initiatives.
NIF provides a graphical user interface (GUI) to locate and access ontologically aligned and semantically fused heterogeneous federated information. NIF has also atomized the various functions that serve the user interface and exposed them as services that can be used like “Lego blocks” to query the data or to build entirely new interfaces or tools. Currently, we use Kepler to orchestrate communication among various NIF services and to provide a transparent layer for data fusion. Kepler combines data and processes into a configurable, structured set of steps that helps to implement semi-automated workflows. It provides a development environment with a graphical user interface for designing workflows composed of a linked set of components called Actors, which can be executed under different Models of Computation. In this work, we report on specific workflows that perform data fusion and orchestration of diverse web services. One such workflow, the “Brain data flow” (see figures below), outputs categorized counts of information about brain regions from 150 data sources. Obtaining a similar set of data from the NIF GUI requires manually writing down the result count for each database for each query. Kepler, unencumbered by the current configuration of the user interface, can be asked to pull a different set of data from the result set, in this case the number of results, and place it into a table. This table can then easily be turned into a graphic that helps users see which databases are information-rich for a particular query. In this example, Kepler loops over all of the brain parts and all databases, recovering the same set of information for each and producing a massive matrix (http://tinyurl.com/6nkfe9f).
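The looping step described above, which collects one result count per brain region and per database into a matrix, can be illustrated outside Kepler with a minimal Python sketch. This is not the Kepler workflow itself and does not call any real NIF service: the function `stub_fetch_count`, the source names `source_A` and `source_B`, and all counts in `DEMO_COUNTS` are hypothetical placeholders standing in for a web-service query that returns a number of results.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical signature: (brain_region, data_source) -> number of results.
# In a real workflow this would wrap a web-service call; here it is injected.
CountFn = Callable[[str, str], int]

Matrix = Dict[str, Dict[str, int]]


def build_count_matrix(regions: List[str], sources: List[str],
                       fetch_count: CountFn) -> Matrix:
    """Query every (region, source) pair and collect the result counts."""
    matrix: Matrix = {}
    for region in regions:
        matrix[region] = {src: fetch_count(region, src) for src in sources}
    return matrix


def matrix_to_csv(matrix: Matrix, sources: List[str]) -> str:
    """Render the matrix as CSV text: one row per region, one column per source."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["brain_region"] + sources)
    for region, row in matrix.items():
        writer.writerow([region] + [row[src] for src in sources])
    return buf.getvalue()


def richest_source(matrix: Matrix, region: str) -> str:
    """Return the source with the highest result count for one region."""
    row = matrix[region]
    return max(row, key=row.get)


# Placeholder counts for illustration only; these are NOT real NIF results.
DEMO_COUNTS = {
    ("hippocampus", "source_A"): 12,
    ("hippocampus", "source_B"): 3,
    ("cerebellum", "source_A"): 0,
    ("cerebellum", "source_B"): 7,
}


def stub_fetch_count(region: str, source: str) -> int:
    """Hypothetical stand-in for a service query; reads from DEMO_COUNTS."""
    return DEMO_COUNTS.get((region, source), 0)


if __name__ == "__main__":
    regions = ["hippocampus", "cerebellum"]
    sources = ["source_A", "source_B"]
    counts = build_count_matrix(regions, sources, stub_fetch_count)
    print(matrix_to_csv(counts, sources))
    print("Richest source for hippocampus:", richest_source(counts, "hippocampus"))
```

Passing the count function in as a parameter keeps the looping and tabulation logic independent of any particular service, which loosely mirrors how a workflow engine separates orchestration from the components it connects.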
Preferred presentation format:
Poster
Topic:
General neuroinformatics