As more engineered nanomaterials (eNM) are developed for military use, it is crucial to minimize any unintended environmental impacts of nanomaterials (NEI) resulting from the application of eNM. To realize this vision, industry and policymakers must base risk management decisions on sound scientific information about the environmental fate of nanomaterials; their availability to receptor organisms, including related concepts such as uptake; and any resulting biological effects, such as toxicity. To address this need, Intelligent Automation, Inc. (IAI) is developing a Model-Driven Data Mining System for Studying the Environmental Impact of Nanomaterials (NEIMiner).
The NEIMiner system consists of three layers: 1) Data Integration Layer, 2) Model Discovery and Analysis Layer, and 3) Knowledge Management Layer. The system architecture is shown in the figure above.
The Data Integration Layer is responsible for acquiring NEI-related data for analysis. We consider two categories of NEI data. The first category comprises existing NEI-related websites and data sources; we have developed crawling algorithms to extract this data and load it into the Drupal CMS. The second category comprises new publications from researchers, which must be entered into the system manually; for this we have developed an online, extensible data capture process. All publications are stored in the NEI bibliography and can be tagged and grouped into different corpora. To analyze these publications further, we have developed an annotator that annotates NEI articles with the NanoParticle Ontology (NPO), and an information extraction system that extracts nanotoxicity-related information from unstructured text.
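One common way to annotate articles with ontology concepts is dictionary-based term matching. The sketch below illustrates that general technique only; it is not the NEIMiner annotator. The lexicon entries and concept labels (`ex:SilverNanoparticle` and so on) are hypothetical placeholders, not real NPO identifiers, and a practical annotator would load term labels and synonyms from the ontology file instead.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int    # character offset where the match begins
    end: int      # character offset just past the match
    text: str     # matched surface text as it appears in the article
    concept: str  # hypothetical ontology concept label


# Hypothetical lexicon: surface form -> placeholder concept label.
# These are NOT real NPO identifiers.
LEXICON = {
    "silver nanoparticle": "ex:SilverNanoparticle",
    "titanium dioxide": "ex:TitaniumDioxide",
    "zeta potential": "ex:ZetaPotential",
    "particle size": "ex:ParticleSize",
}


def build_pattern(lexicon):
    # Longest terms first, so a longer term wins over any shorter prefix.
    terms = sorted(lexicon, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in terms)
    # Word boundaries on both sides; the optional "s" catches simple plurals.
    return re.compile(r"\b(" + alternation + r")s?\b", re.IGNORECASE)


def annotate(text, lexicon=LEXICON):
    """Return annotations for every lexicon term found in text, in order."""
    pattern = build_pattern(lexicon)
    results = []
    for m in pattern.finditer(text):
        key = m.group(1).lower()  # normalize case to look up the concept
        results.append(Annotation(m.start(), m.end(), m.group(0), lexicon[key]))
    return results
```

For example, `annotate("Silver nanoparticles showed a zeta potential of -30 mV.")` tags the plural "Silver nanoparticles" and "zeta potential" with their placeholder concepts, keeping character offsets so the annotations can be stored alongside the article.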
The Model Discovery and Analysis Layer provides analysis capabilities for nanomaterials data. In this project, we have extended ABMiner (Agent Based text Miner), IAI's machine learning meta-optimization tool, to discover correlations between nanomaterial properties and environmental impact. ABMiner is a distributed, optimized data mining engine that can be used to discover patterns, rules, and other useful knowledge. We have built a model base to manage NEI prediction models online, and a prediction cube to analyze and evaluate large amounts of NEI data at different granularities. We have also developed information visualization tools to view NEI information and explore co-occurrence and toxicity patterns. Furthermore, we have performed information network analysis (author network, keyword network, and author-keyword network) to explore relationships between the authors and keywords of these publications.
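The three networks named above can be built from publication metadata by counting co-occurrences. The following is a minimal sketch of that idea, assuming a hypothetical record schema (a dict with `authors` and `keywords` lists) rather than NEIMiner's actual data model. Edge weights are the number of publications in which two nodes appear together.

```python
from collections import Counter
from itertools import combinations


def build_networks(publications):
    """Build weighted author, keyword, and author-keyword networks.

    Each record is assumed (hypothetically) to be a dict with
    "authors" and "keywords" lists.
    """
    author_net = Counter()      # (author, author) -> co-authored papers
    keyword_net = Counter()     # (keyword, keyword) -> papers sharing both
    author_keyword = Counter()  # (author, keyword) -> papers linking them
    for pub in publications:
        authors = sorted(set(pub["authors"]))
        # Lowercase keywords so "Silver" and "silver" are one node.
        keywords = sorted(set(k.lower() for k in pub["keywords"]))
        # Sorting first makes each undirected pair canonical: (a, b) with a < b.
        author_net.update(combinations(authors, 2))
        keyword_net.update(combinations(keywords, 2))
        # Bipartite edges between every author and every keyword of the paper.
        author_keyword.update((a, k) for a in authors for k in keywords)
    return author_net, keyword_net, author_keyword


def top_neighbors(net, node, n=3):
    """Rank the neighbors of node in an undirected edge counter by weight."""
    scores = Counter()
    for (a, b), weight in net.items():
        if a == node:
            scores[b] += weight
        elif b == node:
            scores[a] += weight
    return scores.most_common(n)
```

Storing edges in a `Counter` keyed by canonical pairs keeps the sketch dependency-free; a graph library could take over once centrality or community detection is needed.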
The Knowledge Management Layer consists of many Drupal modules. Drupal is an extensible content management platform with more than 5000 modules available for internet applications, and it can call the web services of caNanoLab to access data. We will use existing Drupal modules and also develop our own, including modules for (1) web services that automatically access and fetch data from other sources, (2) web interfaces for manual data entry, (3) management of NEI publications and characterizations, (4) multi-level user management with different permissions, (5) collaboration connecting experts in related areas, and (6) risk assessment querying. In addition, we have developed faceted search in Drupal, which gives access to information organized according to a faceted classification system and lets users explore a collection of information by applying multiple filters.
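Faceted search combines filtering with per-facet value counts. The sketch below illustrates the general concept in Python; it is not Drupal code and does not reflect how NEIMiner's faceted search is implemented. The record fields (`material`, `organism`, `endpoint`) are hypothetical. Values selected within one facet are combined with OR, and different facets are combined with AND; the counts are computed over the current result set.

```python
from collections import Counter


def faceted_search(records, selected, facets):
    """Filter records by selected facet values and count values per facet.

    records:  list of dicts (hypothetical schema)
    selected: dict mapping facet name -> set of allowed values
              (OR within a facet, AND across facets)
    facets:   facet names whose value counts should be returned
    """
    def matches(rec):
        # A record passes only if every selected facet has an allowed value.
        return all(rec.get(f) in allowed for f, allowed in selected.items())

    hits = [r for r in records if matches(r)]
    # Counts over the current hits tell the user how each refinement would narrow them.
    counts = {f: Counter(r[f] for r in hits if f in r) for f in facets}
    return hits, counts
```

With an empty `selected` dict every record matches, so the counts describe the whole collection; each additional selection narrows both the hits and the counts.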
For more information about our project, please watch the demonstration video below.