Fully automated processing of 50,000 patent documents every week

The Cippix database is updated every week. No editorial system can cope with this amount of information in just hours. Instead, Cippix uses proprietary technology to extract all chemical entities from these documents in a fully automated manner.

Cippix attempts to recognise all named chemical entities in a document, including exemplified compounds, synthetic procedures, reactants, intermediates, catalysts, solvents, reference compounds, and prophetic compounds. Entities are recognised by their IUPAC and IUPAC-like names, USAN, INN, trade and trivial names, and other synonyms.

 

Perfect recognition in an imperfect world

Cippix extracts chemical entities from full-text documents issued by various patent offices. These documents pose a number of challenges that make recognition difficult. Text quality may have suffered from typesetting and reformatting. Human error leads to misspelled names and ambiguities. During conversion into plain text, subscripted and superscripted numbers, which are often needed to comply with the IUPAC naming standard, may get lost. Moreover, many official full-text documents are actually derived from hard copies that were rescanned and processed by OCR.

Cippix recognises more than 99.99% of well-formed names. The most common OCR-related errors, such as stray blanks and/or minus signs somewhere in the name, are recognised and corrected automatically in more than 85% of cases. Misspellings are widely accepted, which increases coverage but can sometimes lead to the erroneous recognition of non-chemical words as chemical entities.
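
The kind of OCR repair described above can be illustrated with a minimal sketch. It is not the Cippix implementation, which is proprietary and not documented here. The lexicon `KNOWN_NAMES` and the functions `candidate_repairs` and `repair` are hypothetical names chosen for this example, and the strategy (tighten hyphens, then try removing blanks) is an assumption.

```python
import re
from typing import List, Optional

# Hypothetical mini-lexicon of well-formed names; a real system would use a far larger index.
KNOWN_NAMES = {"2-methylpropan-1-ol", "acetylsalicylic acid", "benzene"}


def candidate_repairs(name: str) -> List[str]:
    """Return candidate spellings with stray blanks removed, in order of preference."""
    # Drop blanks around hyphens: "2- methylpropan -1-ol" -> "2-methylpropan-1-ol".
    tightened = re.sub(r"\s*-\s*", "-", name)
    candidates = [name, tightened]
    # Try removing each run of blanks on its own, since some blanks are legitimate.
    for match in re.finditer(r"\s+", tightened):
        candidates.append(tightened[:match.start()] + tightened[match.end():])
    # Last resort: remove every blank.
    candidates.append(re.sub(r"\s+", "", tightened))
    return candidates


def repair(raw: str) -> Optional[str]:
    """Return the first candidate found in the lexicon, or None if nothing matches."""
    normalised = raw.strip().lower()
    for candidate in candidate_repairs(normalised):
        if candidate in KNOWN_NAMES:
            return candidate
    return None


print(repair("2- Methylpropan -1-ol"))   # 2-methylpropan-1-ol
print(repair("acetyl salicylic acid"))   # acetylsalicylic acid
print(repair("water"))                   # None (not in the toy lexicon)
```

Checking candidates against a lexicon keeps the repair conservative: a blank is only removed when the result is a known name, so legitimate blanks such as the one in "acetylsalicylic acid" survive.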

 

Multilanguage support

Cippix Chemical Content Recognition supports multiple languages: English, German, French, and Japanese.

 

Lexicographical and structural chemical indexes

Cippix uses both lexicographical and structural chemical indexes. A number of names in the lexicographical index have no chemical structure. Reasons for this include ambiguous names, wrong names, missing brackets, severe typos and similar problems. Cippix also recognises generic names that describe classes of compounds rather than single compounds, such as amino acids. Likewise, complex names of polymers and copolymers are recognised. While such names are recognised and included in the lexicographical index, they are not included in the structural index.

With Cippix you can search both the lexicographical and the structural index with unique, specially designed methods.
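
The relationship between the two indexes can be sketched as a small data structure. This is an illustrative assumption, not the Cippix data model: the class `ChemicalIndexes`, its `add` method, the document ids, and the use of SMILES strings as structure keys are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ChemicalIndexes:
    # Every recognised name -> ids of the documents that mention it.
    lexicographical: Dict[str, Set[str]] = field(default_factory=dict)
    # Structure key (here: a SMILES string) -> ids of the documents that mention it.
    structural: Dict[str, Set[str]] = field(default_factory=dict)

    def add(self, name: str, doc_id: str, structure: Optional[str] = None) -> None:
        """Index a name; index its structure only if one could be derived."""
        self.lexicographical.setdefault(name, set()).add(doc_id)
        if structure is not None:
            self.structural.setdefault(structure, set()).add(doc_id)


indexes = ChemicalIndexes()
indexes.add("benzene", "DOC-1", structure="c1ccccc1")  # single compound with a structure
indexes.add("amino acids", "DOC-1")                    # generic class name, no structure
indexes.add("benzene", "DOC-2", structure="c1ccccc1")

print(sorted(indexes.lexicographical))  # ['amino acids', 'benzene']
print(sorted(indexes.structural))       # ['c1ccccc1']
```

Names without a derivable structure remain searchable by name, while the structural index only holds entries that can take part in structure searches.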

 

Searching the lexicographical index

Searching for an exact name may appear to be a trivial task. But the name of a compound often appears with different spellings in different documents. In fact, it is hard to cover all the different spellings of a given IUPAC name exactly. Cippix has developed a unique tool that allows searching for similar names, which solves the problem in an elegant way.

Moreover, the chemical name similarity is used to generate Tentative Structures, i.e., reasonable structures for problematic names which cannot be directly converted into chemical structures.

Different languages produce different compound names. Cippix Name Similarity identifies related compound names even across languages. An English compound name will thus retrieve alternative English spellings as well as German and French spellings.
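
The idea of searching by name similarity can be illustrated with a generic string-similarity measure. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in; the actual Cippix Name Similarity method is not described in this text and is certainly more chemistry-aware. The helper names, the sample index, and the threshold values are assumptions made for this example.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple


def normalise(name: str) -> str:
    """Lower-case the name and drop blanks, hyphens and other punctuation."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def similar_names(query: str, index: Iterable[str], threshold: float = 0.85) -> List[Tuple[str, float]]:
    """Return (name, score) pairs at or above the threshold, best match first."""
    scored = [(name, name_similarity(query, name)) for name in index]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical index entries: spelling variants, a German name, and an unrelated compound.
index = ["acetyl-salicylic acid", "acetylsalicilic acid", "Acetylsalicylsäure", "benzene"]

for name, score in similar_names("acetylsalicylic acid", index):
    print(f"{score:.2f}  {name}")
```

With the default threshold this retrieves the hyphenated and the misspelled English variants. A plain character-based measure scores the German name lower, so a lower threshold, or a method that knows about language-specific name endings, would be needed to retrieve it as well.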

 

Searching the structural index

Cippix supports classical exact, simple and advanced substructure searches. Cippix features advanced query atoms, flexible query bond types, fragment superstructures and even R-groups and Markush search. Moreover, Cippix provides special compound similarity searches, which can identify compounds related to the query.

The underlying technology has been optimised such that all kinds of medicinal chemistry alterations to a compound are covered: bioisosteric ring systems, chain length alterations, cyclisations, isomers, and more.

Further, Cippix provides a unique fuzzy substructure search. This helps identify compounds that have structural features similar but not necessarily identical to the query.
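
How a compound similarity search with an adjustable threshold and ranked results can work in principle is sketched below. The sketch uses the Tanimoto coefficient on sets of structural features, a widely used general technique; the source does not say which measure Cippix uses. The hand-labelled feature sets, the function names, and the threshold value are assumptions for illustration only.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto coefficient: shared features divided by all features of both compounds."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_similarity(
    query: FrozenSet[str],
    library: Dict[str, FrozenSet[str]],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (compound, score) pairs at or above the threshold, most similar first."""
    scored = [(name, tanimoto(query, features)) for name, features in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical, hand-labelled feature sets; real systems derive such features from the structure.
library = {
    "aspirin": frozenset({"benzene_ring", "carboxylic_acid", "ester"}),
    "salicylic acid": frozenset({"benzene_ring", "carboxylic_acid", "phenol"}),
    "ethanol": frozenset({"alcohol"}),
}

query = library["aspirin"]
print(rank_by_similarity(query, library, threshold=0.5))
# [('aspirin', 1.0), ('salicylic acid', 0.5)]
```

Lowering the threshold widens the hit list to more distantly related compounds, while raising it restricts the list to close analogues.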

 

Simple queries give rich hit lists

Cippix search methods are perfectly suited to the occasional researcher. Keep your query simple and let Cippix take care of the results. There is no need to define complex R-group and Markush queries, but you can do so in the advanced structure search.

All results are ranked by relevance, and the similarity threshold can be adjusted.