505(b)(2) Drug Discovery with Dedicated Search Engines

505(b)(1) vs. 505(b)(2) New Drug Application (NDA) Routes

New drugs whose active ingredients have not previously been approved follow the 505(b)(1) route. This route costs billions of dollars, takes years to complete, and requires extensive clinical trials.

If a new drug contains active ingredients similar to those of a previously approved drug, the NDA applicant can in many cases rely on previous FDA findings through the 505(b)(2) route. This can substantially shorten the approval process and lower its cost.


The table below shows one example of a successful 505(b)(2) NDA: the drug Bendeka(R), based on the original approval of Treanda(R). Note the period from submission to approval.

(from Seven Noteworthy 505(b)(2) Submissions, Charles O. Jaap, V, MBA, RAC, Jodi Hutchins, RAC, CQA and Mikel Alberdi, MPH, RAC)

We submitted a proposal to a 505(b)(2) consultancy, and our initial two-week exploration resulted in some interesting findings.

We examined some drug information data sources, some of which are listed below:

Drugs@FDA: http://www.accessdata.fda.gov/scripts/cder/daf/ (product-specific regulatory history)
FDA Orange Book: http://www.accessdata.fda.gov/scripts/cder/ob/ (lists each product and its associated patents and exclusivity)
FDA IID: http://www.accessdata.fda.gov/scripts/cder/iig/index.Cfm (inactive ingredient search)
FDA OTC: http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfivd/Search.cfm (OTC)
RightFind: http://www.copyright.com/business/rightfind/ (copyright search)
PubMed: https://www.ncbi.nlm.nih.gov/pubmed (citations data source)
FDA Forms: https://www.fda.gov/AboutFDA/ReportsManualsForms/Forms/default.htm (application forms etc.)
Hein Online: http://home.heinonline.org/ (laws?)
21 CFR Search: http://www.accessdata.fda.gov/SCRIPTS/cdrh/cfdocs/cfCFR/CFRSearch.cfm (Code of Federal Regulations)
DailyMed: https://dailymed.nlm.nih.gov/dailymed/ (drug guidance, regulations, labeling)
Toxnet: https://toxnet.nlm.nih.gov/ (toxicity databases)
USP-NF: http://www.uspnf.com/uspnf/login (guide through the pharma process)
REMS@FDA: https://www.accessdata.fda.gov/scripts/cder/rems/ (approved risk evaluation and mitigation strategies)
Health Canada: http://www.hc-sc.gc.ca/index-eng.php
European Medicines Agency agency curl=/pages/medicines/landing/epar_search.jsp&mid=WC0b01ac058001d124
FDA Drug Label Database: http://labels.fda.gov/

Our initial plan was to use the APIs of as many of the above data sources as possible. However, licensing costs and complexity (there is no standard API!) put paid to that idea.

The examples below are imports from the DrugBank and RxNorm downloads. Several database downloads turned out to have recent schema changes and data inconsistencies. In the interest of time, we decided to scrape the individual sites instead, even though that does not make us the best netizens.



Google Custom Search

Google Custom Search (GCS) is a very useful product! We decided to employ GCS as an interim search engine to aggregate drug documents for ingestion into ElasticSearch. It does have certain drawbacks, however.
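As a rough illustration of the aggregation step, here is a minimal Python sketch of how search results could be pulled from the Custom Search JSON API and reduced to small records for indexing. It is not our production code: the function names, the record layout, and the API key and engine ID placeholders are assumptions made for this example.

```python
import json
from urllib.parse import urlencode

# Public endpoint of the Google Custom Search JSON API.
GCS_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, engine_id, start=1):
    """Return the request URL for one page of results.

    api_key and engine_id are placeholders for your own credentials;
    start is the 1-based index of the first result on the page.
    """
    params = {"key": api_key, "cx": engine_id, "q": query, "start": start}
    return GCS_ENDPOINT + "?" + urlencode(params)


def extract_documents(response_text):
    """Reduce a raw JSON response to a list of small dicts for indexing.

    The output keys (title, url, snippet) are a hypothetical record
    layout chosen for this sketch.
    """
    payload = json.loads(response_text)
    docs = []
    for item in payload.get("items", []):
        docs.append({
            "title": item.get("title", ""),
            "url": item.get("link", ""),
            "snippet": item.get("snippet", ""),
        })
    return docs
```

The URL can be fetched with any HTTP client; keeping the parsing separate from the network call makes it easy to test against a saved response.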


Watson Retrieve and Rank

We are using Watson to assist in curating the downloaded documents, which adds a further ranking to individual documents. Watson R&R definitely warrants further investigation.

Before Watson R&R becomes useful, the engine has to be trained on an uploaded document corpus. A sample of simplified training questions is listed below:

what are the active ingredients for solu-cortef?
what is the active NDA number for solu-cortef?
what is the approval date for solu-cortef?
what are the active ingredients for hydrocort?
what is the active NDA number for hydrocort?
what is the approval date for hydrocort?
when was the NDA issued for Nicorette?
what is an alternative name for Nicorette?
what are the active ingredients for Nicorette?
what are the active ingredients for Vasotec?
what is the active NDA number for Vasotec?
what is the approval date for Vasotec?
were there any incomplete responses for Vasotec?
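
Since most of these questions follow a handful of patterns, they could be generated from templates rather than typed by hand. The Python sketch below shows one way to do that. The three-column CSV layout (question, document ID, relevance label) and the document IDs are assumptions made for this example, not a description of the exact file we uploaded.

```python
import csv
import io

# Question patterns taken from the sample list above.
TEMPLATES = [
    "what are the active ingredients for {drug}?",
    "what is the active NDA number for {drug}?",
    "what is the approval date for {drug}?",
]


def make_questions(drugs, templates=TEMPLATES):
    """Expand every template for every drug name, grouped by drug."""
    return [t.format(drug=d) for d in drugs for t in templates]


def to_ground_truth_csv(rows):
    """Serialise (question, doc_id, relevance) tuples as CSV text.

    The column layout is an assumed ground-truth format; doc_id values
    are hypothetical identifiers of documents in the uploaded corpus.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for question, doc_id, relevance in rows:
        writer.writerow([question, doc_id, relevance])
    return buf.getvalue()
```

For example, `make_questions(["solu-cortef", "hydrocort"])` yields six questions, three per drug. The relevance labels still have to come from a person who has read the documents.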


We further employ manual document tagging for custom curation:


Once our documents have been ingested into ElasticSearch, we perform clustering:
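
The ingest step that precedes clustering can be sketched as follows. This is a minimal Python example, not our actual pipeline: it builds the newline-delimited JSON body that the Elasticsearch bulk API accepts, with an action line followed by a document line for each record. The index name `drug-docs`, the sequential IDs, and the function name are hypothetical.

```python
import json


def build_bulk_payload(docs, index_name="drug-docs"):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    Each document contributes two lines: an "index" action naming the
    target index and ID, then the document source itself. IDs are
    assigned sequentially here purely for illustration.
    """
    lines = []
    for doc_id, doc in enumerate(docs, start=1):
        action = {"index": {"_index": index_name, "_id": str(doc_id)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string would be sent as the request body to the `_bulk` endpoint with the content type `application/x-ndjson`.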


The pharmaceutical regulatory field is fertile ground for Data Science, Data Engineering, and especially Document Search. We hope to develop this into a commercial offering in the near future.

Please feel free to contact Steph van Schalkwyk with commercial requests. 

Steph van Schalkwyk
314 452 2896 (direct)
