Identifying semantic characteristics of user interaction datasets through application of a data analysis

When evaluating a decision, the fact under analysis needs inputs from multiple data sources: structuring, integrating, storing, and processing the collected data yields an output that supports a better understanding of the fact and opens new dimensions of analysis.
The goal of this study is to identify the semantic characteristics of data attributes at the moment of collection, based on the dataset structures found in the data export interfaces of user interaction analysis tools, Internet communication channels, and web analytics tools involved in managing a scientific journal, by applying a data analysis process and data modeling techniques.
The research was limited to the exportable datasets available in the interfaces of Open Journal Systems, Google Analytics and Search Console, Twitter Analytics, and Facebook Insights.
An exploratory analysis methodology was adopted to identify how data are made available and structured in these data resources. Entity-Relationship Modeling concepts were applied to design a structure for storing the data collected about the services, resources, datasets, and attributes.
The collected data were also processed into another data structure, adopting the online analytical processing (OLAP) cube as a three-dimensional representation of elements that act as perspectives of analysis.
The data analysis identified semantic dissonances in the definitions of attributes across entities, which may interfere with the development of relationships between attributes from different datasets and decrease the potential for interoperability.
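
As a rough illustration of the cube idea described in this abstract, the sketch below indexes attribute definitions by three dimensions (service, dataset, attribute) and flags attribute names that carry more than one definition. The service names, dataset names, attribute names, and definitions are hypothetical placeholders, not data from the study, and the choice of these three dimensions is an assumption made for the example.

```python
from collections import defaultdict

# Hypothetical rows: (service, dataset, attribute, definition).
# Illustrative values only; they are not taken from the study.
ROWS = [
    ("Service A", "pages", "sessions", "visits grouped by inactivity timeout"),
    ("Service A", "pages", "users", "distinct browser cookies"),
    ("Service B", "posts", "users", "distinct logged-in accounts"),
    ("Service B", "posts", "impressions", "times a post was displayed"),
]


def build_cube(rows):
    """Index definitions by the three dimensions service, dataset, attribute."""
    cube = {}
    for service, dataset, attribute, definition in rows:
        cube[(service, dataset, attribute)] = definition
    return cube


def slice_by_attribute(cube, attribute):
    """Fix the attribute dimension and return {(service, dataset): definition}."""
    return {
        (service, dataset): definition
        for (service, dataset, name), definition in cube.items()
        if name == attribute
    }


def dissonant_attributes(cube):
    """Return attribute names that carry more than one distinct definition."""
    definitions = defaultdict(set)
    for (_, _, attribute), definition in cube.items():
        definitions[attribute].add(definition)
    return sorted(name for name, found in definitions.items() if len(found) > 1)


cube = build_cube(ROWS)
print(slice_by_attribute(cube, "users"))
print(dissonant_attributes(cube))
```

Slicing on one attribute name places its definitions from different services side by side, which is one simple way to surface the kind of semantic dissonance the abstract reports.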

Keywords: Data Analysis. Data Collecting. Data. Online Social Networks. User data.

Authors

  1. Fernando de Assis Rodrigues
  2. Pedro Henrique Santos Bisi
  3. Ricardo César Gonçalves Sant’Ana

Full text available at

  1. ISKO
  2. Research Gate

Actions to make government datasets available in Linked Open Data

The principles of Linked Open Data (LOD) establish a new way of sharing open datasets over the Internet, aiming to promote the wide distribution of structured data in languages such as eXtensible Markup Language (XML) and in compliance with the recommendations of the Resource Description Framework (RDF).

In this scenario, government datasets play a prominent role: they represent 18.58% of the total number of existing LOD datasets, and 41.54% of these government datasets have at least one relationship with ontologies or controlled vocabularies, according to the results of the mapping developed by the Linking Open Data cloud diagram.

However, at the moment of data retrieval, LOD dataset structures still show characteristics that are not considered ideal and do not follow good practices, such as the absence of metadata and license information. Actions to make public government data accessible are an integral part of discussions on trends in the modernization of public administration models, which seek to redistribute skills and resources among different intra-governmental and extra-governmental organizations, allowing greater institutional pluralism in public functions.
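
The absence of metadata and license information mentioned above can be checked mechanically at retrieval time. The following is a minimal sketch, assuming each dataset description has already been loaded into a dictionary; the field names (`title`, `description`, `license`) and the sample records are hypothetical and do not come from the paper.

```python
# Assumed field names; real catalogs may use different keys.
REQUIRED_FIELDS = ("title", "description", "license")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in a dataset record."""
    return [name for name in required if not str(record.get(name) or "").strip()]


# Hypothetical dataset descriptions, for illustration only.
datasets = [
    {"id": "ds-1", "title": "Budget", "description": "Annual budget", "license": "CC-BY-4.0"},
    {"id": "ds-2", "title": "Schools", "description": "", "license": None},
]

report = {record["id"]: missing_fields(record) for record in datasets}
print(report)
```

A report of this kind only signals that a field is empty; it says nothing about whether the metadata that is present is accurate.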

Development of strategy for quality measurement criteria in datasets retrieval available on government websites

This paper describes a study guided by Open Data for Development documents, situated in the retrieval phase and focused on data quality, which analyzes the structures found on government dataset pages in order to identify ways to measure aspects of data quality.