Document Search

Introduction to Document Search

Document Search allows staff to perform full-text keyword searches on NSF proposals and on annual, interim, and final reports, and it highlights the matching text within the results. The tool provides an extensive range of searchable fields so that staff can restrict a keyword to specific fields. Searching on common fields is easier in the latest version of the search engine because Faceted Searching (http://en.wikipedia.org/wiki/Faceted_search) has been implemented. The real power of Document Search lies in complex search criteria (any combination of fields, proximity searches, phrases, boosted terms, grouping, etc.), which give NSF staff a very targeted result set.

 

An additional feature available in the latest version of the search engine is the “More Like This” option, which appears alongside the search results. It lists documents in the NSF search index that are similar to a returned result. This feature is useful for identifying documents with matching content; the top five results are returned in order of similarity, each with a relative numerical “score”.

Search

The most basic way to use Document Search is to type a word in the text box and click the “Search” button (Figure 1). The real benefit of Document Search, however, comes from creating more complex queries in the Lucene query language, which allows users to narrow their searches. See below for example search queries.

 

Beginning in May 2012, for complex search queries, NSF staff can easily identify (via check boxes) the proposal section(s) they wish to search. For example, rather than include Section_Title: Proposal_Description in a query string, users can simply check the box next to that section from the list.
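To make the equivalence concrete, the following minimal Python sketch shows the kind of query string a section check box stands in for. It assumes the field syntax and the AND/OR operators shown in the sample queries below; the helper name add_section_filter is hypothetical, and this is not a description of how Document Search builds its queries internally.

```python
def add_section_filter(query, sections):
    """Restrict a query to the given proposal sections.

    Hypothetical helper for illustration only. The Section_Title field
    and the Proposal_Description value come from the text above; any
    other section name passed in is the caller's assumption.
    """
    if not sections:
        return query  # nothing checked: leave the query unchanged
    if len(sections) == 1:
        section_clause = f"Section_Title: {sections[0]}"
    else:
        # Several sections are combined with OR inside a field group.
        section_clause = "Section_Title: (" + " OR ".join(sections) + ")"
    return f"({query}) AND {section_clause}"


print(add_section_filter("gulf AND oil", ["Proposal_Description"]))
# (gulf AND oil) AND Section_Title: Proposal_Description
```

Checking the box achieves the same restriction without typing the field clause by hand.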

 

Note: For more information about the Lucene query syntax, please review the official Lucene query syntax documentation.

 

 

 

Figure 1- Document Search Page

 

 

Sample Queries

The following table provides sample queries and examples of what each query would match.

 

| Search Type | Example | Description |
|---|---|---|
| Phrases | "strange attractor" | matches documents containing the phrase “strange attractor” in the body |
| Wildcard Searches | be?t | matches belt, best, bent |
| | univer* | matches universe, universal, university |
| Fuzzy Searches | heating~0.7 | matches heating, healing, Keating, setting, seating, meeting. Note: Fuzzy searches may take longer to process as they are typically the most complex search. |
| Proximity Searches | "space weather"~5 | matches “The Space Weather Workshop is an annual meeting that brings industry, academia, and government agencies together in a lively dialog about space weather.” Note: Wildcards are also supported in proximity searches, such as "space weath*"~5, which increases the flexibility of the search. |
| Fields* | Proposal_Title: oil | matches “RAPID Chemical Analysis of Atmosphere Associated with Gulf Oil Spill” and “RAPID Impact of Gulf Oil Surface Films on Atmosphere-Ocean Exchange” |
| Boosting Terms | oil^10 water | matches documents containing oil or water and makes the word oil more important than water |
| | oil water^10 | matches documents containing oil or water and makes the word water more important than oil |
| Boolean Operators | gulf AND oil | matches documents containing both gulf and oil in the body |
| | gulf NOT oil | matches documents containing gulf but not oil in the body |
| | gulf OR oil | matches documents containing either gulf or oil or both in the body |
| Grouping | (galaxy OR universe) AND California | matches documents with either galaxy or universe and California in the body |
| Field Grouping* | Division: (AGS OR AST) | matches proposals from the AGS or AST divisions |
| | Proposal_Title: (galaxy AND universe) | matches proposals with both galaxy and universe in the title |
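The wildcard and fuzzy sample queries above can be pictured with a small Python sketch. It is only an approximation of the matching semantics, not the search engine's implementation: the wildcard check uses Python's fnmatch module, and the fuzzy check assumes similarity is computed as 1 minus the edit distance divided by the length of the shorter word, which is how older Lucene versions are commonly described. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def edit_distance(a, b):
    """Levenshtein distance: single-character inserts, deletes, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_match(term, candidate, min_similarity):
    """Assumed rule: similarity = 1 - distance / length of the shorter word."""
    term, candidate = term.lower(), candidate.lower()
    shorter = min(len(term), len(candidate))
    if shorter == 0:
        return False
    similarity = 1 - edit_distance(term, candidate) / shorter
    return similarity >= min_similarity


# Wildcards: ? stands for exactly one character, * for any run of characters.
words = ["belt", "best", "bent", "beast", "boat"]
print([w for w in words if fnmatchcase(w, "be?t")])
# ['belt', 'best', 'bent']

# Fuzzy: words within the assumed similarity threshold of "heating".
candidates = ["healing", "Keating", "setting", "housing"]
print([w for w in candidates if fuzzy_match("heating", w, 0.7)])
# ['healing', 'Keating', 'setting']
```

Under this assumed rule, "setting" differs from "heating" by two letters out of seven (similarity of about 0.71) and still matches at 0.7, while "housing" differs by three and does not.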

 

 

*Types of Fields Available to Search

 

 

Note: Some special characters are reserved by the Document Search engine. These include: %+ - ! ( ) { } [ ] ^ " ~ * ? : \. Special characters are not searchable.
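As an illustration, free text copied from elsewhere (a title, for instance) could be cleaned of reserved characters before being pasted into the search box. The Python sketch below is one possible approach, not a feature of Document Search; the helper name strip_reserved is hypothetical and the character set is taken from the list above.

```python
# Reserved characters as listed in the note above.
RESERVED_CHARACTERS = set('%+-!(){}[]^"~*?:\\')


def strip_reserved(text):
    """Replace reserved characters with spaces, then collapse extra whitespace."""
    cleaned = "".join(" " if ch in RESERVED_CHARACTERS else ch for ch in text)
    return " ".join(cleaned.split())


print(strip_reserved('oil-spill (2010): "RAPID" response?'))
# oil spill 2010 RAPID response
```

This is only suitable for literal text. Applied to a hand-written query, it would also remove intentional operators such as quotes, wildcards, and field separators.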

Document Search Results

The results display key data fields (such as title, institution name, and program) as well as relevant text with the keywords highlighted. As of May 2012, results also include a link to the proposal PDF and can be exported to Excel, CSV, and XML formats.

 

You can further filter results using the “Field Facets” feature. This feature will display the top matches for certain fields to make filtering quicker and easier. These fields include:

 

1) Institution Name (the top 10 matches will be displayed)

2) Section Title

3) Directorate (the top 10 matches will be displayed)

4) Division (the top 10 matches will be displayed)

5) NSF Program (the top 10 matches will be displayed)

6) Proposal Status (the top 10 matches will be displayed)

7) PI / Co-PI Gender

8) PI / Co-PI Ethnicity

9) PI / Co-PI Race

See Figure 2.

 

 

 

 

Figure 2- Document Search Results
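Conceptually, a field facet is a count of how often each value of a field occurs in the result set, with only the most frequent values shown. The Python sketch below illustrates that idea under stated assumptions: the function name facet_counts, the record layout, and the sample records are all hypothetical, and the search engine computes its facets internally rather than in this way.

```python
from collections import Counter


def facet_counts(results, field, limit=10):
    """Return up to `limit` most common values of `field` with their counts."""
    values = [record[field] for record in results if record.get(field)]
    return Counter(values).most_common(limit)


# Hypothetical result records, for illustration only.
results = [
    {"Division": "AGS", "Directorate": "GEO"},
    {"Division": "AST", "Directorate": "MPS"},
    {"Division": "AGS", "Directorate": "GEO"},
]
print(facet_counts(results, "Division"))
# [('AGS', 2), ('AST', 1)]
```

Selecting a facet value then narrows the results to the records that carry that value.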

 

You can also find related proposals using the “More Like This” feature. Click the “More Like This” link above each proposal to view five similar proposals. A “score” is also displayed, indicating the degree of similarity of each similar result. You can access the PDF file for each proposal by clicking the Proposal ID link, and you can view other pertinent information about the similar items, such as Institution, Program, Title, and Section.

 

Figure 3- “More Like This” Results
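The idea of ranking similar documents by a score can be sketched in Python as follows. How Document Search actually measures similarity is not described here, so this sketch substitutes a simple cosine similarity over word counts purely for illustration. The function names, the document IDs, and the third sample text are hypothetical; the first two texts reuse proposal titles quoted earlier in this guide.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[word] * counts_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def more_like_this(source_id, documents, limit=5):
    """Return up to `limit` (doc_id, score) pairs, most similar first."""
    source_text = documents[source_id]
    scored = [(cosine_similarity(source_text, text), doc_id)
              for doc_id, text in documents.items() if doc_id != source_id]
    scored.sort(reverse=True)
    return [(doc_id, round(score, 3)) for score, doc_id in scored[:limit]]


# Hypothetical IDs; P3 is an invented sample text.
documents = {
    "P1": "RAPID Chemical Analysis of Atmosphere Associated with Gulf Oil Spill",
    "P2": "RAPID Impact of Gulf Oil Surface Films on Atmosphere-Ocean Exchange",
    "P3": "Space Weather Workshop annual meeting",
}
for doc_id, score in more_like_this("P1", documents):
    print(doc_id, score)
```

In this sketch P2 ranks above P3 because it shares several words with P1, while P3 shares none and scores zero.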