Identifying semantic characteristics of user interaction datasets through application of a data analysis

When evaluating a decision, the fact under analysis needs inputs from multiple data sources: the collected data must be structured, integrated, stored, and processed into an output that supports a better understanding of that fact and opens new dimensions of analysis.
The goal of this study is to identify the semantic characteristics of data attributes at the moment of collection, based on the dataset structures found in the data export interfaces of user interaction analysis tools, Internet communication channels, and web analytics tools involved in managing a scientific journal, by applying a data analysis process and data modeling techniques.
The research was limited to the exportable datasets available in the interfaces of Open Journal Systems, Google Analytics and Search Console, Twitter Analytics, and Facebook Insights.
An exploratory analysis methodology was adopted to identify how data are made available and structured in these resources. Entity-Relationship Modeling concepts were applied to design the structure used to store the data collected from the services, resources, datasets, and attributes.
The collected data were also processed into another data structure, an online analytical processing (OLAP) cube, used as a three-dimensional representation of the elements whose dimensions act as perspectives of analysis.
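To illustrate what a three-dimensional cube over the collected elements looks like, the following minimal Python sketch stores one count per cell and supports the two basic OLAP operations, slicing and rolling up. The choice of dimensions (service, dataset, attribute type), the dataset names, and the records are hypothetical placeholders, not the study's actual cube or results.

```python
from collections import Counter

# Hypothetical toy records, one per attribute: (service, dataset, attribute type).
# These are NOT results from the study; they only illustrate the cube structure.
records = [
    ("Open Journal Systems", "dataset_1", "date"),
    ("Open Journal Systems", "dataset_1", "count"),
    ("Google Analytics", "dataset_2", "date"),
    ("Google Analytics", "dataset_2", "count"),
    ("Google Analytics", "dataset_2", "count"),
    ("Twitter Analytics", "dataset_3", "text"),
]

# Each cell (service, dataset, type) holds the number of attributes it contains.
cube = Counter(records)

def slice_by(cube, axis, value):
    """Fix one dimension (0=service, 1=dataset, 2=type) and keep matching cells."""
    return {cell: n for cell, n in cube.items() if cell[axis] == value}

def roll_up(cube, keep_axis):
    """Aggregate the cube onto one dimension by summing over the other two."""
    totals = Counter()
    for cell, n in cube.items():
        totals[cell[keep_axis]] += n
    return dict(totals)

print(roll_up(cube, 2))                      # attributes per type, all services
print(slice_by(cube, 0, "Google Analytics"))  # one service's face of the cube
```

Each perspective of analysis corresponds to choosing which axis to fix or keep; a relational star schema would be the usual choice at larger scale.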
This analysis identified semantic dissonances in the definitions of entity attributes, which may interfere with establishing relationships between attributes from different datasets, reducing the potential for interoperability.

Keywords: Data Analysis. Data Collecting. Data. Online Social Networks. User data.

Authors

  1. Fernando de Assis Rodrigues
  2. Pedro Henrique Santos Bisi
  3. Ricardo César Gonçalves Sant’Ana

Full text available at

  1. ISKO
  2. Research Gate

Domain Analysis of scientific production about Data Collecting on Institute of Electrical and Electronics Engineers context

The goal of this study is to identify scientific studies on the theme of data collecting. To this end, the domain analysis method was applied to scientific papers through Citation and Co-citation Analysis.
The representatives of the data collecting theme and the dialogue among them were identified by processing author and paper metadata sets available in the IEEE Xplore® Digital Library. As a search strategy, the terms ‘Data Collecting’, ‘Data Collect’, and ‘Data Gathering’ were combined with the Boolean operator ‘OR’ in the advanced search. This process retrieved 2,278 scientific papers; the sample was restricted to papers published in scientific journals between 1954 and 2018, for a total of 281 papers.
For each paper, the reference section was collected as an HTML document. An algorithm was applied to convert the HTML documents into CSV files and to serialize the references according to the IEEE Editorial Style. The algorithm processed 5,867 references and discarded 270 that did not fit the IEEE Editorial Style patterns adopted for serialization.
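The conversion step can be sketched in Python with only the standard library. This is an assumption-laden illustration, not the authors' algorithm: it assumes each reference sits in its own `<li>` element and recognizes only a simplified IEEE journal-article pattern (`Authors, "Title," Journal, vol. V, no. N, pp. P, Year.`); anything else is discarded, mirroring how non-conforming references were set aside.

```python
import csv
import io
import re
from html.parser import HTMLParser

class ReferenceExtractor(HTMLParser):
    """Collects the text of each <li> element, assumed to hold one reference."""
    def __init__(self):
        super().__init__()
        self.refs, self._buf, self._in_li = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li, self._buf = True, []

    def handle_endtag(self, tag):
        if tag == "li" and self._in_li:
            # Join text fragments and normalize whitespace.
            self.refs.append(" ".join("".join(self._buf).split()))
            self._in_li = False

    def handle_data(self, data):
        if self._in_li:
            self._buf.append(data)

# Simplified IEEE journal-article pattern (an assumption; real styles vary more).
IEEE_JOURNAL = re.compile(
    r'^(?P<authors>.+?), "(?P<title>.+?)," (?P<journal>.+?), '
    r'vol\. (?P<volume>\w+),(?: no\. (?P<number>\w+),)? pp\. (?P<pages>[\w\u2013-]+), '
    r'(?:\w{3}\.? )?(?P<year>\d{4})\.$'
)
FIELDS = ["authors", "title", "journal", "volume", "number", "pages", "year"]

def serialize(html):
    """Return (csv_text, discarded_references) for the references in an HTML page."""
    parser = ReferenceExtractor()
    parser.feed(html)
    parser.close()
    out, discarded = io.StringIO(), []
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for ref in parser.refs:
        match = IEEE_JOURNAL.match(ref)
        if match:
            writer.writerow(match.groupdict())
        else:
            discarded.append(ref)  # does not fit the adopted pattern
    return out.getvalue(), discarded
```

A production version would need one pattern per IEEE reference type (books, conference papers, reports), which is where most discards would otherwise come from.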
From these references, a total of 8,267 authors were identified. For the Citation and Co-citation Analysis, Price's square root law was applied to delimit the group of authors to 91 participants; the group was extended to 94 because the 91st participant had the same total number of citations as the next 3.
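The cut-off arithmetic can be sketched as follows: the core size is the square root of the number of authors (√8,267 ≈ 90.92, hence 91), and authors tied with the last selected one are kept, which is how 91 becomes 94. Rounding to the nearest integer and the function name `price_core` are assumptions of this sketch.

```python
import math

def price_core(citation_counts):
    """Select core authors by Price's square root law, keeping ties at the cut.

    citation_counts: dict mapping author -> citations received.
    The core size k is sqrt(N) rounded to the nearest integer (an assumption),
    where N is the total number of authors; every author with as many citations
    as the k-th ranked author is also kept.
    """
    ranked = sorted(citation_counts.items(), key=lambda kv: kv[1], reverse=True)
    k = round(math.sqrt(len(ranked)))
    cutoff = ranked[k - 1][1]  # citations of the k-th author
    return [author for author, cites in ranked if cites >= cutoff]

print(round(math.sqrt(8267)))  # 91, the core size for 8,267 authors
```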
Next, the “Cited and Who Cited” and the “Absolute Frequency of Co-citation” matrices were generated by an algorithm. From these data, the authors' nationality and institutional affiliation were identified manually. Two social network indexes were calculated: (i) Network Density, representing the intensity of relationships between authors in the network, and (ii) Centrality Degree, representing the number of relationships an author receives.
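An absolute co-citation frequency matrix counts, for each pair of core authors, how many citing documents cite both. A minimal sketch, assuming each citing document is reduced to the set of authors it cites (the function name and input shape are hypothetical), could be:

```python
from itertools import combinations

def cocitation_matrix(citing_docs, core):
    """Absolute co-citation frequency among core authors.

    citing_docs: list of sets, each holding the authors cited by one document.
    core: the set of authors kept after the cut-off.
    Two authors are co-cited once for every document that cites both.
    """
    labels = sorted(core)
    index = {author: i for i, author in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for cited in citing_docs:
        present = sorted(a for a in cited if a in index)  # ignore non-core authors
        for a, b in combinations(present, 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1  # co-citation is symmetric
    return labels, matrix
```

Some co-citation studies also place each author's total citations on the diagonal; this sketch leaves the diagonal at zero.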
The analysis yielded a Network Density of 3.20 with a standard deviation of 3.34, meaning that each researcher has approximately 3 relationships with other nodes in the network. The Centrality Degree was 20.93%, indicating dispersion in the network, since each node has a 20.93% probability of receiving an interaction from the network.
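The abstract does not state the formulas behind these indexes. The following minimal sketch shows one common way to compute degree-based figures from a symmetric co-citation matrix, assuming a tie exists wherever the co-citation frequency is above zero: each author's number of ties, the mean and population standard deviation of those counts, and each degree normalized by n − 1. It illustrates the calculation only and is not meant to reproduce the reported values.

```python
import statistics

def degree_stats(matrix):
    """Degree-based indexes from a symmetric co-citation matrix.

    Assumes a tie between two authors whenever their co-citation frequency > 0.
    Returns (degrees, mean degree, population std of degrees, normalized degrees),
    where a normalized degree is the share of the other n - 1 authors one is tied to.
    """
    n = len(matrix)
    degrees = [sum(1 for j, w in enumerate(row) if j != i and w > 0)
               for i, row in enumerate(matrix)]
    normalized = [d / (n - 1) for d in degrees]
    return degrees, statistics.mean(degrees), statistics.pstdev(degrees), normalized
```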
This dispersion is associated with the breadth of the analyzed domain, since Data Collecting is a recurrent theme across distinct knowledge areas while still adhering to the context of IEEE scientific journals. When the Centrality Degree of each author is analyzed, a relationship with the number of citations received can be observed: the 13 authors ranked highest by Centrality Degree are also the most cited ones, accounting for 25.16% of all citations in the network. Within this group, each author received on average 7.69% of the group's citations, ranging from 6.12% to 11.76%.
It was concluded that this theme, although widely cited, has an American core, associated with the institutions UC, USC, and MIT.

Keywords: Data Collecting. Domain Analysis. IEEE.

Authors

  1. Fernando de Assis Rodrigues
  2. Fábio Mosso Moreira
  3. Ricardo César Gonçalves Sant’Ana

Full text available at

  1. X EIICA
  2. Research Gate

Expanded Publication in Thesis and Dissertations context

The objective of this research is to study the aspects involved in Enhanced Publications, especially in the case of theses and dissertations and the various documents that now make up the expanded spectrum of elements of these research outputs. The aim is to provide support for a conceptual basis that underpins proposed structures for collecting, storing, and retrieving this new set of documents, grounded in concepts already established in Information Science.

The methodology was an exploratory and descriptive methodological triangulation, composed of (i) a theoretical framework built through a bibliographic survey, in Portuguese, of studies on Enhanced Publications available in the Google Scholar and SciELO databases and of results from Google Search; (ii) a requirements analysis based on the phases and goals of the Data Life Cycle for Information Science (CVD); and (iii) a case study of the document sets of dissertations and theses from the Graduate Program in Information Science at UNESP – São Paulo State University.

The results are presented as a requirements analysis, using graphs to describe the relations between the documents of an Enhanced Publication, together with requirements for the collection, storage, and retrieval phases.
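As an illustration of representing an Enhanced Publication as a graph, the following minimal Python sketch stores a thesis and its related documents as typed, directed edges and retrieves every component reachable from the main document. The node names and relation labels are hypothetical examples, not the requirements defined in the study.

```python
from collections import deque

# Hypothetical Enhanced Publication: nodes and relation labels are illustrative only.
edges = {
    "thesis": [("hasPart", "chapter_1"),
               ("isSupplementedBy", "dataset"),
               ("isSupplementedBy", "defense_slides")],
    "dataset": [("isDescribedBy", "codebook")],
    "chapter_1": [],
    "defense_slides": [],
    "codebook": [],
}

def components(graph, root):
    """Breadth-first traversal returning every document reachable from root."""
    seen, queue, order = {root}, deque([root]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        for _relation, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order

def relations_of(graph, node):
    """Typed outgoing relations of one document, as (source, relation, target)."""
    return [(node, relation, target) for relation, target in graph.get(node, [])]
```

Typed edges let the retrieval phase answer both "what belongs to this thesis?" (traversal) and "how is this file related to it?" (relation labels).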

It is concluded that, once the requirements of the collection, storage, and retrieval phases are made explicit, further concerns emerge about how Enhanced Publications will be presented when their functionalities are implemented, such as the need for explicit privacy requirements, greater detail and explanation of actions in this area, and the delimitation of the elements required to publish Enhanced Publications.

Keywords: Enhanced Publication. Data. Thesis. Dissertation. Information Science.

Authors

  1. Fernando de Assis Rodrigues
  2. Ricardo César Gonçalves Sant’Ana

Full text available* at

* Only in Brazilian Portuguese.

  1. ITEC Journal
  2. Research Gate

Information and Technology: Thematic Course of IG 08

The thematic course of interest group IG 08 was identified through the analysis of papers presented as oral communications and posters from 2008 to 2016. The qualitative and quantitative study was based on domain analysis, aiming to identify how technologies are approached within IG 08 through seven categories of analysis structured from the IG's scope statement and from the proposal of Santos et al. (2013), identifying themes, authors, and institutions in each category. This step resulted in a mapping of IG 08 – Information and Technology, showing the main approaches to technologies within IG 08, their distribution by presentation type, and the rankings of authors and institutions.

Contextualizing theoretical concepts on Online Social Networks data collecting process

The use of Online Social Network services raises concerns about how individuals' information is shared, starting with the process of collecting user data that is stored by the institutions that own these services. The purpose of this study is to contextualize the concepts involved in collecting the data available from online social network services, based on a content analysis of technical-operational documents and Terms of Use and on an exploration of the characteristics of the data collection interfaces.