muladharma
Rising Star
Starting this thread to discuss the problem of searching for chemicals and molecules.
There are many standards for naming molecules, especially as character strings meant to be processed by computers. There seems to be no single worldwide standard, so any application that performs this task could be using a combination of several representations.
One such application is:
OPSIN: Open Parser for Systematic IUPAC Nomenclature
The OPSIN app produces a CML file, which is an XML description of the atom positions and bonds.
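To make the CML output more concrete, here is a minimal sketch of reading atoms and bonds out of such a file with Python's standard library. The sample document is hand-written for illustration and is not actual OPSIN output; the element and attribute names (`atom`, `bond`, `elementType`, `atomRefs2`, `order`) follow the CML schema as I understand it, and `read_cml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# CML elements live in this XML namespace.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Hand-written sample (heavy atoms of methanol only), NOT real OPSIN output.
SAMPLE_CML = """<cml xmlns="http://www.xml-cml.org/schema">
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="C" x2="0.0" y2="0.0"/>
      <atom id="a2" elementType="O" x2="1.4" y2="0.0"/>
    </atomArray>
    <bondArray>
      <bond id="a1_a2" atomRefs2="a1 a2" order="S"/>
    </bondArray>
  </molecule>
</cml>"""


def read_cml(text):
    """Return ({atom_id: element}, [(atom_id_1, atom_id_2, bond_order), ...])."""
    root = ET.fromstring(text)
    atoms = {
        atom.get("id"): atom.get("elementType")
        for atom in root.iterfind(".//cml:atom", CML_NS)
    }
    bonds = [
        tuple(bond.get("atomRefs2").split()) + (bond.get("order"),)
        for bond in root.iterfind(".//cml:bond", CML_NS)
    ]
    return atoms, bonds


atoms, bonds = read_cml(SAMPLE_CML)
print(atoms)
print(bonds)
```

With the atoms and bonds in plain Python structures, the result for one name can be compared directly against the result for another.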
There could be more than a single name for a molecule, and the application deals well with this and other problems by detecting ambiguity.
The result might come from a combination of operations on the n-grams of the input, which by some logic finds or constructs a match. That said, without studying the code and/or documentation one cannot know the completeness (whether every valid input produces an output) or the invertibility (whether an input can be recovered from an output) of the operation.
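For readers unfamiliar with the term, this is what character n-grams of a chemical name look like. It is only an illustration of the idea; it is not taken from the OPSIN code, and I am not claiming OPSIN works this way. The function names are made up for the example.

```python
def char_ngrams(name, n):
    """Return all overlapping character n-grams of a name, lowercased."""
    s = name.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def shared_ngrams(a, b, n=3):
    """Return the set of n-grams that two names have in common."""
    return set(char_ngrams(a, n)) & set(char_ngrams(b, n))


print(char_ngrams("amine", 3))
print(sorted(shared_ngrams("tryptamine", "amine")))
```

Overlap of this kind is one cheap way to spot that two names share a fragment such as "amine", although it says nothing about how the fragments are connected in the structure.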
It's not clear if this can be used to search for related compounds, but some ideas are: backtracking inputs, searching for names using structure and backtracking structures.
Examples:
Hydroxytryptamine returns a structure with the hydroxy group on the amine, whereas 5-hydroxytryptamine removes the ambiguity. It could be that ambiguous parts are resolved in order of construction priority.
When multiple configurations are possible, for example dimethoxybenzene, the first (ortho) form is preferred.
For some inputs no stereocenter is detected, for example L-Glycine (glycine has no stereocenter), but the output signals that the parser does look for one.
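One way to probe the ambiguity in the examples above is to generate the explicitly numbered names and compare what the parser returns for each with what it returns for the unnumbered name. Below is a rough sketch for the dimethoxybenzene case. It assumes two identical substituents on an otherwise unsubstituted, fully symmetric ring, and `ring_disubstitution_names` is a hypothetical helper, not part of OPSIN.

```python
from itertools import combinations


def ring_disubstitution_names(name, ring_size=6):
    """Return one numbered name per distinct placement of two identical
    substituents on a symmetric ring, keyed by their ring distance."""
    first_seen = {}
    for a, b in combinations(range(1, ring_size + 1), 2):
        # Distance around the ring; placements with equal distance are equivalent.
        distance = min(b - a, ring_size - (b - a))
        first_seen.setdefault(distance, f"{a},{b}-{name}")
    return list(first_seen.values())


for candidate in ring_disubstitution_names("dimethoxybenzene"):
    print(candidate)
```

Each candidate can then be submitted to the parser, and the structure that matches the result for plain "dimethoxybenzene" shows which form was chosen by default.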
Neural networks could yield further insights, given the complexity of the search space created by combining natural language with structural geometry.