Investigating the Impacts of Misspellings in Patent Search by Combining Natural Language Tools and Rule-Based Approaches

Russo, Davide; Spreafico, Christian; Avogadri, Simone; Precorvi, Andrea

2022-09-07

Abstract

Among all sources of technical information, patent information is one of the richest and most comprehensive. Knowing how to search in this mass of documents is becoming increasingly crucial. However, many users have limited knowledge of patents and search strategies, so they must use intuitive, often approximate approaches that can be time-consuming and lead to highly inaccurate searches. To address this problem, there are tools that help expand queries to increase recall so as not to miss good documents; however, handling misspellings within search strategies remains an open problem. Typically, the presence of misspellings in patent text is underestimated even by experts in the field, and the available tools, both free and paid, offer no specific functionality to handle it. The goal of the article is to raise awareness of the difficulties in building a proper patent search strategy that also takes into account the possible presence of misspellings. It is important to know where misspellings are likely to occur and how much they may affect the final result. In particular, misspellings are divided into categories, distinguishing misspellings of a generic keyword or multiword term from misspellings in acronyms, chemical formulas, names of applicants and inventors, or names of specific formulas or theorems. At least one example case is given for each category, showing when and how it may affect the result. Finally, an integrated approach is suggested for correcting the query: it combines word and contextual embedding models based on deep learning with a rule-based algorithm based on wildcards and truncation operators, and automatically suggests the most consistent misspellings, thus achieving a more accurate and reliable result.
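To make the rule-based part of the abstract more concrete, the following is a minimal Python sketch of how wildcard and truncation operators could surface misspelled variants of a keyword. It is an illustration under stated assumptions, not the authors' algorithm: the operator symbols (`?` for exactly one character, `*` for truncation), the function names, and the toy vocabulary are all hypothetical, and real patent databases use differing operator syntaxes. The embedding-based ranking of candidates mentioned in the abstract is not shown.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a query pattern into a compiled, case-insensitive regex.

    Assumed syntax (hypothetical): '?' matches exactly one word character,
    '*' matches zero or more word characters (truncation).
    """
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(r"\w")
        elif ch == "*":
            parts.append(r"\w*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def single_char_wildcards(keyword: str) -> list[str]:
    """Build one pattern per position, replacing that character with '?'.

    This only covers single-character substitution misspellings.
    """
    return [keyword[:i] + "?" + keyword[i + 1:] for i in range(len(keyword))]


def find_variants(keyword: str, vocabulary: list[str]) -> list[str]:
    """Return vocabulary words that differ from the keyword by one substitution."""
    patterns = [wildcard_to_regex(p) for p in single_char_wildcards(keyword)]
    return sorted({
        word for word in vocabulary
        if word.lower() != keyword.lower()
        and any(p.fullmatch(word) for p in patterns)
    })


# Toy vocabulary, invented for illustration only.
vocabulary = ["polymer", "polimer", "polymar", "polymers", "monomer"]
candidates = find_variants("polymer", vocabulary)
```

In this sketch, `candidates` contains the two single-substitution variants in the toy vocabulary, while a truncation pattern such as `polym*` would additionally match the plural form. Candidates found this way would still need to be filtered, for example by a similarity score, before being added to a query.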

Article type: article
Publication date: 7 Sep 2022
Journal in ANCE: KNOWLEDGE
All authors: Russo, Davide; Spreafico, Christian; Avogadri, Simone; Precorvi, Andrea
Citation: (2022). Investigating the Impacts of Misspellings in Patent Search by Combining Natural Language Tools and Rule-Based Approaches [journal article]. In KNOWLEDGE. Retrieved from http://hdl.handle.net/10446/227849
Appears in collections: 1.1.01 Journal Articles/Essays

Files attached to this record:

File	File size	Format
knowledge-02-00029.pdf (open access; version: publisher's version; license: Creative Commons)	948.8 kB	Adobe PDF

Aisberg ©2008 Library Services, Università degli studi di Bergamo | Terms of use

Use this identifier to cite or link to this document: https://hdl.handle.net/10446/227849
