Domain Analysis of Scientific Production on Data Collecting in the Institute of Electrical and Electronics Engineers (IEEE) Context

The goal of this study is to identify scientific studies on the theme of data collecting. To this end, the domain analysis method was applied to scientific papers through Citation and Co-citation Analysis.
The representatives of the data collecting theme, and the dialogue among them, were identified by processing sets of author and paper metadata available in the IEEE Xplore® Digital Library. As the search strategy, the terms ‘Data Collecting’, ‘Data Collect’, and ‘Data Gathering’ were combined with the Boolean operator ‘OR’ in the advanced search. This search retrieved 2,278 scientific papers. The sample was then restricted to papers published in scientific journals between 1954 and 2018, for a total of 281 papers.
For each paper, the reference section was collected as an HTML document. An algorithm was applied to convert the HTML documents into CSV files and to serialize the IEEE Editorial Style found in the collected reference data. The algorithm processed 5,867 references and discarded 270 of them because they did not fit the IEEE Editorial Style patterns adopted for serialization.
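The serialization step can be illustrated with a small Python sketch. It assumes the reference strings have already been extracted from the HTML pages, and it recognizes only one simplified journal pattern from the IEEE Editorial Style (authors, quoted title, venue, other details, year). The pattern, the field names, and the `serialize` function are hypothetical and much narrower than the study's actual algorithm. References that do not match are set aside, in the same way the 270 references were discarded.

```python
import csv
import io
import re

# Hypothetical, simplified IEEE-style journal pattern:
#   Authors, "Title," Venue, vol. V, no. N, pp. P, Year.
IEEE_JOURNAL = re.compile(
    r'^(?P<authors>.+?), "(?P<title>.+?),"\s*(?P<venue>[^,]+),'
    r'.*?(?P<year>(?:19|20)\d{2})\.?$'
)
# Splits "A. Smith, B. Jones, and C. Lee" or "A. Smith and B. Jones".
AUTHOR_SEP = re.compile(r',\s*(?:and\s+)?|\s+and\s+')


def serialize(references):
    """Return (csv_text, discarded) for a list of reference strings."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["authors", "title", "venue", "year"])
    writer.writeheader()
    discarded = []
    for ref in references:
        match = IEEE_JOURNAL.match(ref.strip())
        if match is None:
            discarded.append(ref)  # does not fit the assumed IEEE pattern
            continue
        names = AUTHOR_SEP.split(match["authors"])
        writer.writerow({
            "authors": "; ".join(n.strip() for n in names if n.strip()),
            "title": match["title"],
            "venue": match["venue"].strip(),
            "year": match["year"],
        })
    return buffer.getvalue(), discarded
```

For example, `serialize(['A. Smith, B. Jones, and C. Lee, "Data gathering in sensor networks," IEEE Trans. Mobile Comput., vol. 5, no. 2, pp. 100-110, 2006.'])` yields one CSV row with the authors serialized as `A. Smith; B. Jones; C. Lee` and no discarded entries. The input in this usage example is invented for illustration.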
A total of 8,267 authors were identified from these references. For the Citation and Co-citation Analysis, Price’s square root law was applied to limit the group of authors to 91 participants (√8,267 ≈ 90.9). The group was then extended to 94 participants because the 91st participant had the same number of citations as the three authors ranked immediately after him.
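The cut described above can be sketched in Python as follows. The function name `price_elite` is hypothetical, and rounding the square root up is an assumption. Authors tied with the last author inside the cut are also kept, which reproduces the extension from 91 to 94 participants.

```python
import math


def price_elite(citations, total_authors):
    """Select the most cited authors under Price's square root law.

    citations: dict mapping author name -> number of citations received.
    total_authors: size of the author population (8,267 in this study).
    Authors tied with the last selected author are kept as well.
    """
    cutoff = math.ceil(math.sqrt(total_authors))  # assumption: round up
    ranked = sorted(citations.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) <= cutoff:
        return [author for author, _ in ranked]
    threshold = ranked[cutoff - 1][1]  # citations of the last author inside the cut
    return [author for author, count in ranked if count >= threshold]


print(math.ceil(math.sqrt(8267)))  # 91
```

For example, with a population of 9 authors the cut is √9 = 3. If the third and the next two authors all have 5 citations, all five authors are returned rather than three.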
Next, an algorithm generated the “Cited and Who Cited” and the “Absolute Frequency of Co-citation” matrices. Using these data, the nationality and institutional affiliation of each author were identified manually. Two social network indexes were calculated: i) Network Density, representing the intensity of relationships between authors in the network, and ii) Centrality Degree, representing the number of relationships an author receives.
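A minimal Python sketch of the co-citation matrix and the two indexes follows. It assumes that each citing paper has already been reduced to the set of authors it cites. Density is defined here as the mean tie weight over all ordered author pairs, a common convention for valued networks. Degree centrality is defined as the share of other authors each author is co-cited with. The function names are hypothetical. The sketch treats co-citation as undirected, so it does not model the directed “Cited and Who Cited” matrix. The study's own algorithm and formulas may differ.

```python
from itertools import combinations
from statistics import pstdev


def cocitation_matrix(papers, authors):
    """Absolute co-citation frequencies.

    papers: one set of cited author names per citing paper.
    authors: the ordered author group kept after the Price's-law cut.
    Cell [i][j] counts the papers that cite both author i and author j.
    """
    index = {name: i for i, name in enumerate(authors)}
    n = len(authors)
    matrix = [[0] * n for _ in range(n)]
    for cited in papers:
        kept = sorted(index[name] for name in set(cited) if name in index)
        for i, j in combinations(kept, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1  # co-citation is symmetric
    return matrix


def density(matrix):
    """Mean and population standard deviation of tie weights over ordered pairs.

    Requires at least two authors.
    """
    n = len(matrix)
    ties = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(ties) / len(ties), pstdev(ties)


def degree_centrality(matrix):
    """Share of the other n - 1 authors each author is co-cited with."""
    n = len(matrix)
    return [
        sum(1 for j in range(n) if j != i and matrix[i][j] > 0) / (n - 1)
        for i in range(n)
    ]
```

As an invented example, take three citing papers that reference {A, B, C}, {A, B}, and {B, D}. Then A and B are co-cited twice, and B reaches all three other authors, so B has a degree centrality of 1.0. D is co-cited only with B, so its degree centrality is 1/3.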
The analysis produced a Network Density of 3.20, with a standard deviation of 3.34. In other words, each researcher has approximately 3 relationships with other nodes in the network. The resulting Centrality Degree was 20.93%, which indicates dispersion in the network, since each node has a 20.93% probability of receiving some interaction from the network.
This dispersion is associated with the breadth of the analyzed domain: Data Collecting is a recurrent theme across distinct knowledge areas while still fitting the context of IEEE scientific journals. Analyzing each author's Centrality Degree reveals a relationship with the number of citations received. The 13 best-ranked authors by Centrality Degree are also the most cited, and together they account for 25.16% of all citations in the network. Within this group, each author accounted on average for 7.69% of the group's citations, with values ranging from 6.12% to 11.76%.
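The 7.69% average can be checked with a short calculation. This assumes that s_k denotes the share of the group's citations received by the k-th of the 13 authors, so that the shares sum to 100%:

```latex
\bar{s} = \frac{1}{13}\sum_{k=1}^{13} s_k = \frac{100\%}{13} \approx 7.69\%,
\qquad 6.12\% \le s_k \le 11.76\%
```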
The study concludes that this theme, although widely cited, has an American core related to the institutions UC, USC, and MIT.

Keywords: Data Collecting. Domain Analysis. IEEE.


  1. Fernando de Assis Rodrigues
  2. Fábio Mosso Moreira
  3. Ricardo César Gonçalves Sant’Ana

Full text available at

  1. X EIICA
  2. ResearchGate

Fernando de Assis Rodrigues, B.S., M.S., Ph.D., is a researcher at São Paulo State University (UNESP), Brazil.
