Campus Manaus Centro
Permanent URI for this community: https://ri.ifam.edu.br/handle/4321/8
Data Cleaner Service - Web Service para apoiar na limpeza de dados (Data Cleaner Service: a web service to support data cleaning) (2023-06-19)
Picanço, Gabriel Rodrigues; Carminé, Rogério Luiz Araújo; http://lattes.cnpq.br/7537210268624278; Azevedo, Renildo Viana; http://lattes.cnpq.br/0601215754206476; Santos, Valclides Kid Fernandes dos; http://lattes.cnpq.br/7902062389125321
The recent sharp growth in the volume of data and information is well known. Wrong data produces wrong information, which can significantly affect the management of organizations and people's lives. Organizations need reliable and timely information to make effective business decisions, so it is important to invest in solutions that ensure data quality. Data cleaning techniques make it possible to identify and correct non-compliant values based on defined rules and actions. These techniques are widely used in the data preparation stage of BI (Business Intelligence) and Data Science processes, where they contribute to the quality of the results. Several data cleaning tools are available on the market (e.g., Excel, OpenRefine and Data Wrangler). Each has its own characteristics and functions, and most allow users to write scripts that carry out cleaning tasks. These scripts can take considerable time and effort to write and can be difficult to reuse across different tools. This work developed a solution based on a web service (Data Cleaner Service) that integrates with other tools (e.g., web applications) and cleans data by reusing scripts (e.g., Python scripts). To demonstrate the solution, cleaning scripts were written with the pandas library, together with a web application that consumes the service. The solution is expected to contribute to data cleaning tasks in several areas (e.g., finance, sales and health) by reducing the effort and time these activities require, promoting the exchange of experience between users and developers, and supporting the generation of effective information for decision-making.
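The idea of reusing cleaning scripts behind a single entry point can be illustrated with a minimal sketch. It is not the implementation described in the work: the registry `CLEANING_SCRIPTS`, the `register` decorator, the `clean` function and the two example scripts are hypothetical names chosen for illustration, and plain Python is used in place of pandas so that the example has no external dependencies.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]
Script = Callable[[List[Record], str], List[Record]]

# Hypothetical registry: maps a script name to a reusable cleaning function.
CLEANING_SCRIPTS: Dict[str, Script] = {}


def register(name: str) -> Callable[[Script], Script]:
    """Register a cleaning script under a name so callers can request it."""
    def decorator(func: Script) -> Script:
        CLEANING_SCRIPTS[name] = func
        return func
    return decorator


@register("strip_whitespace")
def strip_whitespace(records: List[Record], column: str) -> List[Record]:
    """Rule: text must have no surrounding spaces. Action: trim it."""
    cleaned = []
    for row in records:
        value = row.get(column)
        new_value = value.strip() if isinstance(value, str) else value
        cleaned.append({**row, column: new_value})
    return cleaned


@register("fill_missing")
def fill_missing(records: List[Record], column: str) -> List[Record]:
    """Rule: the value must be present. Action: substitute 'unknown'."""
    cleaned = []
    for row in records:
        value = row.get(column)
        new_value = "unknown" if value is None or value == "" else value
        cleaned.append({**row, column: new_value})
    return cleaned


def clean(records: List[Record], steps: List[Tuple[str, str]]) -> List[Record]:
    """Apply the named scripts in order; each step is (script_name, column).

    A web service endpoint could call a function like this after parsing
    the request body. The input list is never modified in place.
    """
    for name, column in steps:
        records = CLEANING_SCRIPTS[name](records, column)
    return records
```

A client would send the data together with the list of steps it wants applied, for example `clean(rows, [("strip_whitespace", "city"), ("fill_missing", "city")])`. Keeping the scripts in a named registry is one way to let different tools reuse the same cleaning logic without rewriting it.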