Repository for academic documents, algorithms and data: A utility in an educational context


Ivan Jaramillo, Geovanny Brito, Anthony Pachay, Duval Carvajal

Universidad Técnica Estatal de Quevedo (Ecuador)

Received May 2022

Accepted March 2023


Data repositories are now essential services within institutions. Universities in particular are among the primary institutions that promote the creation, management and safekeeping of a variety of documents, data and projects. This work responds to an institutional need and applies the knowledge of the students and professors in the software degree program at a university in Ecuador. The project centers on a web application that makes it possible to manage, upload and download resources. This initial version focuses on resources related to academic projects, data sets and algorithms. Scalability has been deliberately built in so that other types of resources and functionalities can be added progressively, in accordance with educational needs. Several aspects of the web interfaces are based on the characteristics of existing repositories, and a series of functionality tests has been conducted in order to move beyond the introductory stage of use, beginning with the university community. The project follows a contribution-based conception in which the contributors are, in principle, members of the institution. The intensive and collaborative use of this utility will contribute to the growth of a data culture within the institution. Furthermore, the collection and organization of data for study and research by the different university degree programs makes it possible to analyze and contrast them from different perspectives.


Keywords – Repository, Academic projects, University.

To cite this article:

Jaramillo, I., Brito, G., Pachay, A., & Carvajal, D. (2023). Repository for academic documents, algorithms and data: A utility in an educational context. Journal of Technology and Science Education, 13(3), 761-774.



1. Introduction

The evolution of technologies related to computer science has had a significant influence on innovation in organizations. The open access to information has been one of the main pillars; it has permitted the dissemination and sharing of research and scientific advances in different disciplines. However, this is more noticeable in developed countries, at prestigious universities around the world and in multinational industries. The invention of new data collection and organization models has allowed these countries to lead the evolutionary process.

The importance of data in our country is a source of concern, although there are strategic sectors associated with the government that already have specific data from past decades. These are not enough, and both public and private institutions must establish the concept of a future based on “data” within their vision. Universities are called upon to develop and exploit these technologies associated with the data-based revolution. It is here where this work takes on its importance. The implementation of this tool in an academic institution will allow different types of data to be collected, in addition to a collaborative focus, with which sustained growth can be achieved in terms of both budget and technology.

In addition, this work is set within the context of academic projects, with the participation of both the professor and the students in developing each of its phases. The main objective is the implementation of a digital resource repository that fulfills the requirements set out by both the institution itself and pertaining to the general aspects of this service.

Following the introduction, the second section presents the background information and concepts associated with “repositories”. These concepts are closely related to the emergence of the so-called “Open Access” movement, a topic of interest when building this type of resource. The methodology is described in the third section, based on a basic functional model of a digital repository, including its architecture and the set of software tests applied. Finally, the results and conclusions sections present the diagrams produced, as well as selected interfaces and the results of the evaluation tests. These are used to draft the conclusions, which, among other aspects, emphasize the operability of the application once the corrections suggested by an initial evaluation are applied.

2. Related Works

The administration and use of digital information is the reason why repositories exist. The concept goes beyond an infrastructure based on hardware and software. Along these lines, Abbott (2006) indicates that the policies, procedures, people and services are what help make these resources sustainable, reliable, well organized and supported; however, the evolution of communication has given rise to changes in the regulatory concepts and availability policies for scientific contents. The Budapest Open Access Initiative (BOAI, 2012) originally gathered together different contributors from academia, research, institutions and other entities in order to promote the so-called Open Access movement.

The Open Access initiative originated with the purpose of expanding and accelerating scientific progress, providing greater access to, and in some cases a venue for, the publication of research. Universities have been key agents in this trend, and around the world robust academic repositories have been modeled around a range of topics. Open Access Institutional Repositories (OAIR) (Farida, Tjakraatmadja, Firman & Basuki, 2015) is one model proposed for constructing a repository with an academic focus; its three basic pillars are cooperation, technologies and processes. There is a noticeable focus on comprehensive repositories in institutions of higher education; in fact, this has pushed academic institutions toward new strategies to preserve their heritage collections (Sabharwal & Natal, 2017).

The term “open educational resources” (OER) is a formal designation for the different free forms in which educational institutions began to apply the concept of shared resources (Stacey, 2010). Among the main characteristics cited regarding OERs are accessibility, reusability, interoperability, sustainability and support for metadata. These characteristics must be analyzed in detail in order to prevent violations of intellectual property as much as possible, particularly when the resources are characterized as being reusable. In this sense, open licenses have become a basic mechanism to ensure rights in web content-based environments (UNESCO, 2011). As an immediate effect of this, greater accessibility has given rise to the diversification of applications, primarily by educational institutions. The creation of software based on models applied to different problems, designed by the institutional agents themselves, is a visible trend, particularly in universities (Jaramillo, Pico & de la Plata, 2017).

According to data reviewed on the OpenDOAR portal (Lund University, University of Nottingham & Sherpa Project, 2005), since 2005 the number of repositories around the world has grown exponentially. The portal collects records of repositories with open access documents and updates them periodically. In this manner, it is possible to follow their evolution and their situation at the time they are consulted. The number of repositories has grown from 78 in 2005 to 5753 in October 2021. Ranking the countries by number of repositories, the USA comes first with 913, followed by Japan and the United Kingdom; from South America, Peru, Colombia, Brazil and Argentina are among the top twenty. Among the most prominent platforms are DSpace, with 39%; EPrints, with 11%; and WEKO, with 9%; the remaining 41% is divided among a variety of other platforms (see Table 1). On a similar note, the types of contents are primarily journal articles, theses and dissertations, books, conferences, patent data, software, etc.

| Software platform | Percent of worldwide usage |
| DSpace | 39% |
| EPrints | 11% |
| WEKO | 9% |
| Digital Commons | |

Table 1. Distribution of repositories by platform (OpenDOAR)

The explosion of contents in repositories complicates a taxonomic study of them from a thematic perspective. The most common division is by type, between institutional repositories (which hold the digital contents of an organization) and thematic repositories (those that focus their content on a particular subject, such as ArXiv in physics, RePec in economics, E-LIS for documentation, etc.). Other authors add a third typology: format repositories, which are limited to collecting documents in specific formats, such as doctoral theses, research data, digital images, etc. (Nicholas, Rowlands, Watkinson, Brown & Jamali, 2012). An adjusted classification considers four types of repositories: thematic, institutional, research repositories (sponsored by financing institutions, for the purpose of revealing and sharing the results of their research), and repositories from national systems, which are created to collect academic results in a more general sense, and not merely for their preservation (Jain, 2011). Another type is the orphan repository, which is used for the storage of digital objects by authors who have no other place to deposit them.

The repositories in an educational context may originate from both the different agents and the policies and regulations of the academic institutions themselves; however, it is important to stress the structural organization and the organization of contents as a common factor (Santos-Hermosa, Ferrán-Ferrer & Abadal, 2012). There is a lack of consensus on the criteria for the association or grouping of topics in the repository; in fact, the academic institutions in this sense may consider technological, economic or even operational resources.

Table 2 shows an alternative for classification and a representative association of the educational repositories, although it does not constitute a standard. However, the characteristics of the resources maintain this grouping.

The conception of shared and divided responsibilities for the uploading, maintenance and sustainability of an institutional repository is an effective strategy for educational settings that register resources other than scientific articles or theses reports (Miller, 2017). The addition of intelligent techniques to identify associations and/or interest groups is a necessary strategy. In fact, data mining mechanisms have evolved in order to adapt to the diversity of data types (Abaszade & Effati, 2019; Jaramillo, Garzás & Redchuk, 2021; Segura, Vidal-Castro, Menéndez-Domínguez, Campos & Prieto, 2011).

Type of repository

  • Associated institutions: promoting or financing institutions; educational institutions
  • Content format
  • Geo-political coverage
  • Deposited content: OCW (exclusively); OERs by topic; OER search engines


Table 2. Classification and examples of OER repositories (Keefer, 2007)

A variety of tools have been created in accordance with diverse needs in terms of shared resources (Bashir, Gul, Bashir, Nisa & Ganaie, 2021; Boyd, 2021; Fan, Xia, Lo, Hassan & Li, 2021). Some of the tools used under the concept of free distribution software are DSpace, VuFind, Evergreen ILS, Fedora Commons, Invenio, Greenstone Digital Library, Ambra, E-Prints and RODA (Bankier & Gleason, 2014; Beazley, 2011; Pyrounakis, Nikolaidou & Hatzopoulos, 2014).

3. Methodology

From the perspective of educational projects, this utility belongs to the products derived from integrative and research work carried out both in the classroom and autonomously. The work draws on established concepts of software construction (Pressman, 2009a,b), while the development stages were adjusted to the resources typically available to academic projects. The analysis, design, implementation and testing phases define the overall development; these stages are generally found within the classic software development methodologies. However, for reasons of simplicity, each is considered in a general manner, adapted to the institutional need.

3.1. Requirements Research

This section provides a summarized description of the requirement specifications for the application, also showing certain important elements of design that must be considered.

This utility consists of four domains: a resource registry, storage, validation for preservation and recovery. Figure 1 illustrates the relationship between each of these abstract representations of the problem.

Resources correspond to the elements of interest; they include academic projects, data sets and algorithms.

  • Data sets are files in “.csv” format that contain data pertaining to a field of research, and which are made available to the registry by the user. The data must be valid and have the aforementioned format. A validation process is also necessary to check for the required elements that make it possible to describe the contribution in a summarized manner. 

  • Academic projects are the products resulting from a research or technological development work that may occur in different areas, in which students, professors and other professionals take part who are directly or indirectly related to the field of study. The project may consist of a set of documents and/or software. 

  • Algorithms are small programs written in a programming language that solve a specific problem related to a mathematical model. The repository also allows the storage of algorithms belonging to other authors under the license conditions established by intellectual property rights. 
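The checks described for data sets (a ".csv" file plus the elements needed to describe the contribution) can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' implementation; the function name `check_dataset` and the field names in `REQUIRED_FIELDS` are assumptions made for this example.

```python
import csv
import io

# Hypothetical descriptors required to summarize a contribution.
REQUIRED_FIELDS = ("name", "summary", "authors")


def check_dataset(filename, content, description):
    """Return a list of problems; an empty list means the donation passes."""
    problems = []
    if not filename.lower().endswith(".csv"):
        problems.append("file must use the .csv extension")

    # Parse the text as CSV and compare every row with the header row.
    rows = list(csv.reader(io.StringIO(content)))
    if len(rows) < 2:
        problems.append("file needs a header row and at least one observation")
    elif any(len(row) != len(rows[0]) for row in rows[1:]):
        problems.append("every row must have as many values as the header")

    # Each descriptive field must be present and non-blank.
    for field in REQUIRED_FIELDS:
        if not description.get(field, "").strip():
            problems.append(f"missing required field: {field}")
    return problems
```

Returning a list of problems rather than a single boolean lets the interface report every issue to the donor at once.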


Figure 1. Architectural view of the utility

The repository is accessible to both internal and external users, with internal users understood as anyone belonging to the institution, while external users are those persons outside the institution who are interested in contributing any of the resources. The users interact with the repository through a set of web interfaces.

Storage is associated with the contributions from users who provide resources, which constitute items made up by identifying data and a characterization of the resource; metadata also play a part for organizational purposes and recovery within the architecture.

  • Metadata. The definition of “metadata” is introduced for the purpose of building a mapping structure on collections associated with each type of resource. This contributes to the characteristic of scalability that we wish to establish in the architecture. 

  • Archive. The data referring to the resources are organized according to the basic file structure, even though no verification of the formats and structure exists. This favors the classic extensions typical of common programs for projects, data and code segments alike, using programming languages. 

  • Resource. The definition of a “resource” is a logically organized structure between the file and the associated metadata. 

  • Collection. This is the organized set of resources within a database structure. 
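As an illustration of how these four definitions relate, the following sketch models them as plain data classes. It is written in Python for brevity and is not the authors' class design; all class and attribute names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    resource_type: str                           # "dataset", "project" or "algorithm"
    title: str
    authors: list
    extra: dict = field(default_factory=dict)    # type-specific descriptors


@dataclass
class Resource:
    metadata: Metadata   # descriptive data used for organization and recovery
    files: list          # stored file names; formats are not verified here


@dataclass
class Collection:
    resource_type: str
    resources: list = field(default_factory=list)

    def add(self, resource):
        # A collection only holds resources of its own type.
        if resource.metadata.resource_type != self.resource_type:
            raise ValueError("resource type does not match the collection")
        self.resources.append(resource)
```

Keeping type-specific descriptors in a separate `extra` mapping is one way to add a new resource type without changing the shared structure, in line with the scalability goal stated above.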

The preservation of information is defined as the permanent storage of the resources. In order for the elements to have this status, they must go through a process of verification and validation, which must encompass a set of actions carried out by the users with an administrative role. The verifications in this function are associated with ensuring:

  • Complete information from the resource provider 

  • Acceptance of responsibility for the data provided 

  • Consistent formats and structure of the data 

Validating users may perform three actions, depending on the information found in the registry.

  1. Acceptance for the permanent registry and authorized access 

  2. Elimination of the registry request as the result of inconsistencies in the data and resources 

  3. Feedback with recommendations for better data consistency 
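The three validation actions can be read as transitions out of a pending state. The sketch below is one possible encoding in Python, not the authors' implementation; the status names, the action keywords and the `review` function are assumptions.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"              # registered, awaiting a validating user
    ACCEPTED = "accepted"            # permanent registry, authorized access
    REJECTED = "rejected"            # registry request eliminated
    NEEDS_CHANGES = "needs_changes"  # returned with recommendations


# Hypothetical action keywords mapped to the resulting status.
OUTCOMES = {
    "accept": Status.ACCEPTED,
    "reject": Status.REJECTED,
    "feedback": Status.NEEDS_CHANGES,
}


def review(status, action, is_validator):
    """Apply one validation action to a pending registry request."""
    if not is_validator:
        raise PermissionError("only validating users may review a request")
    if status is not Status.PENDING:
        raise ValueError("only pending requests can be reviewed")
    if action not in OUTCOMES:
        raise ValueError(f"unknown action: {action}")
    return OUTCOMES[action]
```

For example, `review(Status.PENDING, "feedback", True)` returns `Status.NEEDS_CHANGES`, while the same call with `is_validator=False` raises `PermissionError`.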

Data recovery is divided for two segments of users: public users, who do not need to register on the site and who can have access to a maximum of five resources; and registered users, who have full access to all three types of resources available.

The definition of user levels is a basic requirement in the application, and some of the authority levels have already been mentioned in the preceding paragraphs. Four user levels are necessary: root, administrators, registered users and unregistered public users (see Figure 2).
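A minimal sketch of the four user levels and the public access limit follows. It is written in Python and is not the authors' implementation; the numeric ordering of the levels, the constant `PUBLIC_LIMIT` and both function names are assumptions.

```python
from enum import IntEnum


class Level(IntEnum):
    # The ordering is an assumption: a higher value implies more authority.
    PUBLIC = 0
    REGISTERED = 1
    ADMINISTRATOR = 2
    ROOT = 3


PUBLIC_LIMIT = 5  # unregistered users can reach at most five resources


def visible_resources(level, resources):
    """Return the resources a user of the given level may access."""
    items = list(resources)
    if level >= Level.REGISTERED:
        return items
    return items[:PUBLIC_LIMIT]


def can_validate(level):
    """Only users with an administrative role validate donations."""
    return level >= Level.ADMINISTRATOR
```

Using an ordered enumeration keeps permission checks to simple comparisons; a real deployment might prefer explicit per-role permissions.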

Figure 3 shows one of the use cases developed in the analysis stage of the application. This case is an example related to the donation of a resource. In the detailed study, use cases are specified for each type of resource, as well as for user registration, recovery and preservation of data.

Figure 2. Levels of application users

Figure 3. Use case related to resource donation

Note that the agents involved are the Root, Administrator and Users. The basic purpose is to register a resource. It begins when the user has the intention of providing a resource. To do this, the user must access the resource donation functions (according to the type) and enter the data on the form corresponding to the resource class (including the actions of uploading the required files). Next, the system conducts verifications of the requirements of both the form and the resource (conditions established according to the pre-defined rules that govern an initial verification phase). Finally, once this internal validation has been completed, the system proceeds to register the resource, establishing a status for verification and validation by a user with an administrator role.

Figure 4 shows the block diagram, with separate blocks for the data, algorithms and projects modules, each of which uses the connection block for database access. In addition, the server connection for the application is indicated by a block corresponding to the web server.

Figure 4. Block diagram

In accordance with the design standards, in addition to the use cases, the sequence diagrams, block diagrams, class diagrams and database diagrams are also defined.

3.2. Implementation and Testing

The Java programming language was used for implementation on both the server side and the client side, covering web services and data access, with REST services acting as an intermediate layer between the web server and the PostgreSQL database engine.

The functionality of the application is one of the basic criteria within the evaluation of a software product. In principle, in order to introduce the exploitation of the application, functionality tests are planned and the following elements are determined:

  • Any action element located in the application must comply with standard interactivity and accessibility (links and controls)   

  • User accounts (creation, management, roles, password recovery) 

  • “Data set” module (donation, status management, downloading) 

  • “Projects” module (donation, status management, downloading) 

  • “Algorithms” module (donation, status management, downloading) 

For the functionality evaluation, a sample of approximately 20 users was used, all of whom were completely unfamiliar with the application. The data collection instrument contains a set of criteria and indicators (see Table 3); each criterion groups together a set of indicators (user actions in the application). For each indicator, the users record the number of successful attempts and the number of failed attempts.


Description of the indicator

General aspects
  • Distinction among elements that execute actions and non-actions (the user is intuitively guided with regard to these elements)
  • The links provide the corresponding information (About us, Policies, Donate resource and Contact)
  • The quantitative information about the number of data sets (D), integration projects (P) and algorithms (A) is consistent.
  • It permits the free download (without registering) of a maximum of five products from each set (DPA)

User management
  • User creation (intuitive registration)
  • Secure User and Password
  • Password recovery option
  • Information about requirements in the definition of users and passwords
  • Captcha option for human validation
  • Login with registered username
  • Password change
  • Registered users have full visibility of all products (DPA)
  • Registered users are allowed to download files associated with one of the products (DPA)
  • Registered users can see their contributions

The “Data set” module is reviewed
  • Enters the donation form
  • Adds Donor and specifies the product authors
  • Defines a data set name (maximum of 30 characters)
  • Defines a Summary (maximum of 150 words)
  • Defines a reference field (research that has been done with the data set)
  • Defines properties (types of attributes, possible and applied analysis task, format, area, etc.)
  • Defines a number of observations, number of attributes, year of creation
  • Defines information on the attributes
  • The upload of the file (specify the type, one or more, extension, size) is completed
  • Required fields are correctly defined
  • Completes the donation action

“Integration projects” module reviewed
  • Enters the donation form
  • Adds donor and specifies the project authors and coordinator
  • Defines a short title (maximum of 5 words)
  • Defines a long title (maximum of 15 words)
  • Defines the module in which the project is developed
  • There is a summary field (maximum of 150 words)
  • Defines a field to specify the objective of the project
  • Allows the entry of a maximum of 3 key words
  • There is a field for the year the project was created
  • The upload of the file (specify the type, one or more, extension, size) is completed
  • Required fields are correctly defined
  • Completes the donation action

“Algorithms” module is reviewed
  • Enters the donation form
  • Adds the donor and specifies authors
  • Defines a short name for the algorithm (maximum of 3 words)
  • Defines a long name (maximum of 7 words)
  • Defines a field for the description (maximum of 150 words)
  • Defines a field for relevant references pertaining to the algorithm
  • Specifies a programming language, as well as whether it is only pseudo code
  • Permits uploading several algorithm files
  • The upload of the file (Specifies the type, one or more, extension, size) is completed
  • Required fields are correctly defined
  • Donation action is completed

Table 3. Indicators of functionalities to evaluate the application
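Several indicators in Table 3 state length limits for the donation forms (for instance, a data set name of at most 30 characters and a summary of at most 150 words). The following Python sketch shows one way such limits could be checked; it is not the authors' implementation, and the field keys and the `check_limits` function are assumptions.

```python
# Limits taken from the indicators in Table 3; the field keys are hypothetical.
DATASET_LIMITS = {"name": ("chars", 30), "summary": ("words", 150)}
PROJECT_LIMITS = {
    "short_title": ("words", 5),
    "long_title": ("words", 15),
    "summary": ("words", 150),
}
ALGORITHM_LIMITS = {
    "short_name": ("words", 3),
    "long_name": ("words", 7),
    "description": ("words", 150),
}


def check_limits(form, limits):
    """Return one message per field that exceeds its maximum length."""
    errors = []
    for name, (unit, maximum) in limits.items():
        value = form.get(name, "")
        # Count characters or whitespace-separated words, depending on the unit.
        size = len(value) if unit == "chars" else len(value.split())
        if size > maximum:
            errors.append(f"{name}: {size} {unit} exceeds the maximum of {maximum}")
    return errors
```

Declaring the limits as data keeps one checking routine for all three modules, so a new resource type only needs a new table of limits.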

With the data collected in the #attempts, #successes and #failures variables for each indicator, the success and failure rates can be calculated. Equation 1 describes the calculation used in the evaluation, where E_k and I_k are the number of successes and the number of attempts for indicator k, respectively.

$$\text{success rate}_k = \frac{E_k}{I_k} \times 100 \qquad (1)$$

Alternatively, as needed, the failure rate can be deduced either as the complement of the success rate or directly from the collected variables. The variables in the equation can be the result of the evaluation by n users, in which case E_k is obtained from Equation 2, where E_{k,j} is the number of successes recorded by user j for indicator k.

$$E_k = \sum_{j=1}^{n} E_{k,j} \qquad (2)$$

The same procedure is applied to both attempts and failures.
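The calculation described for Equations 1 and 2 can be sketched in a few lines of Python. The sketch assumes that the rate is expressed as a percentage and that per-user counts are simply summed; the function names and the sample counts are hypothetical, not data from the study.

```python
def aggregate(per_user_counts):
    """Sum the counts recorded by each of the n users for one indicator."""
    return sum(per_user_counts)


def success_rate(successes, attempts):
    """Percentage of successful attempts for one indicator."""
    if attempts == 0:
        raise ValueError("the indicator has no recorded attempts")
    return 100.0 * successes / attempts


# Hypothetical counts recorded by three users for a single indicator.
successes = aggregate([2, 1, 3])           # E_k = 6
attempts = aggregate([2, 2, 4])            # I_k = 8
rate = success_rate(successes, attempts)   # 75.0
failure_rate = 100.0 - rate                # complement of the success rate: 25.0
```

Computing the failure rate as the complement gives the same value as dividing the summed failures by the summed attempts, provided every attempt is recorded as either a success or a failure.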

4. Results

The requirements and procedural stages described in the previous section have made it possible to obtain a design that meets the needs of the users. The use cases cover the four main domains of the problem, which are user registry, sharing, preservation and recovery of resources.

The classes were organized into four layers (model, data access, controllers and services), with class names related to the following elements of the problem domain:

  • Data sets 

  • Algorithms 

  • Projects 

  • Resources (files) 

  • Connection 

  • Person 

A set of twenty-one tables is used for data storage. Entities for informational data are also added, such as institution, college, degree program and area, among others. These provide identifying information for users at the time of registration.

The physical database is in PostgreSQL, and the physical implementation on the database service was achieved using assisted generation tools.

The home interface contains three main sections (see Figure 5): one is dedicated to the link options for each of the resources (projects, algorithms and data), another informs about the latest relevant contributions and one section provides public access to a specific number of shared resources.


Figure 5. Application home interface

The operational logic covers programming-related aspects for both the web application and the repository manager.

Figure 6. Navigation diagram

Several dynamic pages form part of the application. Figure 6 illustrates a map of nodes that represent important pages on the site. A description of the functions associated with the page has been provided in the additional boxes.

The functionality tests applied according to the indicators described in the Methodology section generated a data set with 6 variables. Table 4 describes each attribute and in the numerical values the calculations of the minimum, maximum and mean are added.




| Attribute | Description | Statistics |
| | Unique identifier for the indicator; text-type | |
| Description of the indicator | Text that indicates the definition of the indicator. | |
| Number of tries | Discrete whole number variable that records the number of times the user tries the requested action | Min. 30, Max. 137, Mean 47.27 |
| Number of successes | Discrete whole number variable that records the number of times the user was successful in the requested test | Min. 11, Max. 100, Mean 38.94 |
| Number of failures | Discrete whole number variable that records the number of times the user was unsuccessful in the requested test | Min. 0, Max. 58, Mean 8.33 |
| | User comments on the indicator test | |

Table 4. Description of the variables of the data set generated for the test

Figure 7. Distribution of successful and failed attempts by evaluated module

Figure 7 shows the distribution of the success and failure rates on a bar graph. The values shown on the bars correspond to the success rate, while the darker section represents the failures (quantitatively, the difference between the total number of attempts and the total number of successes). A numerical view is shown in Table 5, where the values have been calculated in relation to failure.


| | Total attempts | Failed attempts | % failures |
| Data set | | | |


Table 5. Failed attempts calculated by evaluated module

The users selected for the evaluation show success rates of between 75% and 100% on most of the criteria. The “general aspects” criterion shows the greatest variance compared with the rest.


Figure 8. Success rate of the criteria evaluated by a sample of users

The “general aspects” criterion has the lowest success rates for the largest number of users who tried the application, and there are some less significant findings of low scores on the “User management” and “Data set module” criteria (see Figure 8).

5. Conclusions

Through their substantive roles, universities generate a variety of resources that require organization and secure storage on technologies conceived as “repositories,” which enable the cooperative participation of the entire institutional community.

This work is a repository prototype developed with a modern service-based architecture, with the use of technologies to develop web applications. The documentation and scalable characteristics will make it possible to provide continuity to the project and gradually consolidate an identity in this product.

The primary evaluations were performed to introduce the product to academic use. Five criteria were created that gave rise to 48 indicators referring to the functionality of the application. Of the approximately 2270 evaluation actions performed by the users, 82.37% were successful, with the most failures occurring with regard to the general characteristics of the application and aspects related to user registration.
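As a rough consistency check, the totals in the preceding paragraph can be approximated from the means reported in Table 4 (47.27 tries, 38.94 successes and 8.33 failures per indicator) and the 48 indicators. Because the means are rounded, the results only approximate the reported figures.

```latex
\begin{align*}
\text{total attempts} &\approx 48 \times 47.27 = 2268.96 \approx 2270 \\
\text{success rate}   &\approx \frac{38.94}{47.27} \times 100 \approx 82.4\% \\
\text{failure rate}   &\approx \frac{8.33}{47.27} \times 100 \approx 17.6\%
\end{align*}
```

These values agree with the reported 82.37% success rate to within rounding.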

Finally, the digital organization of academic documents, data sets and algorithms is achievable with the use of the application. Furthermore, the cooperative participation of students, professors, researchers and members of the university community is promoted in order to share resources, so that these can be used for research and for studies focused on a particular profession.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.


Funding

The authors received no financial support for the research, authorship, and/or publication of this article.


References

Abaszade, M., & Effati, S. (2019). A New Method for Classifying Random Variables Based on Support Vector Machine. Journal of Classification, 36(1), 152-174.

Abbott, K. (2006). A Review of Employment Relations Theories and Their Application. Problems and Perspectives in Management, 4.

Bankier, J.G., & Gleason, K. (2014). Institutional repository software comparison. Unesco.

Bashir, S., Gul, S., Bashir, S., Nisa, N.T., & Ganaie, S.A. (2021). Evolution of institutional repositories: Managing institutional research output to remove the gap of academic elitism. Journal of Librarianship and Information Science, 0(0), 09610006211009592.

Beazley, M.R. (2011). Eprints Institutional Repository Software: A Review. Partnership: The Canadian Journal of Library and Information Practice and Research, 5(2).

Boyd, C. (2021). Understanding Research Data Repositories as Infrastructures. Proceedings of the Association for Information Science and Technology, 58(1), 25-35.

Fan, Y., Xia, X., Lo, D., Hassan, A.E., & Li, S. (2021). What makes a popular academic AI repository? Empirical Software Engineering, 26(1), 2.

Farida, I., Tjakraatmadja, J.H., Firman, A., & Basuki, S. (2015). A conceptual model of Open Access Institutional Repository in Indonesia academic libraries. Library Management, 36(1/2), 168-181.

Jain, P. (2011). New trends and future applications/directions of institutional repositories in academic institutions. Library Review, 60(2), 125-141.

Jaramillo, I.F., Pico, R.B., & de la Plata, C.V.M. (2017). A model for faculty evaluation in higher education ecuadorian through multi-criteria decision Analysis. Indian Journal of Science and Technology, 10(18), 1-8.

Jaramillo, I.F., Garzás, J., & Redchuk, A. (2021). Numerical Association Rule Mining from a Defined Schema Using the VMO Algorithm. Applied Sciences, 11(13).

(2012). Budapest Open Access Initiative (2002). JLIS.It, 3(2).

Keefer, A. (2007). Los repositorios digitales universitarios y los autores. Anales de Documentación, 10, 205‑214.

Lund University, University of Nottingham, & Sherpa (Project) (2005). OpenDOAR: Directory of Open Access Repositories. OpenDOAR Statistics.

Miller, A. (2017). A case study in institutional repository content curation: A collaborative partner approach to preserving and sustaining digital scholarship. Digital Library Perspectives, 33, 63-76.

Nicholas, D., Rowlands, I., Watkinson, A., Brown, D., & Jamali, H.R. (2012). Digital repositories ten years on: What do scientific researchers think of them and how do they use them? Learned Publishing, 25, 195‑206.

Pressman, R. (2009a). Agile development. In Software Engineering: A Practitioner’s Approach (7th ed., 65-93). McGraw-Hill, Inc.

Pressman, R. (2009b). Process models. In Software Engineering: A Practitioner’s Approach (7th ed., 30-63). McGraw-Hill, Inc.

Pyrounakis, G., Nikolaidou, M., & Hatzopoulos, M. (2014). Building Digital Collections Using Open Source Digital Repository Software: A Comparative Study. International Journal of Digital Library Systems (IJDLS), 4(1), 10-24. Available at:

Sabharwal, A., & Natal, G. (2017). Integrating the IR into Strategic Goals at the University of Toledo: Case Study. Digital Library Perspectives, 33, 0.

Santos-Hermosa, G., Ferrán-Ferrer, N., & Abadal, E. (2012). Recursos educativos abiertos: repositorios y uso. Profesional de La Información, 21(2), 136-145.

Segura, A., Vidal-Castro, C., Menéndez-Domínguez, V., Campos, P.G., & Prieto, M. (2011). Using data mining techniques for exploring learning object repositories. The Electronic Library, 29(2), 162-180.

Stacey, P. (2010). Foundation funded OER vs. tax payer funded OER-A tale of two mandates.

UNESCO (2011). ICT in teacher education: Policy, open educational resources and partnership. Proceedings of the International Conference IITE-2010. St. Petersburg, Russian Federation.


This work is licensed under a Creative Commons Attribution 4.0 International License

Journal of Technology and Science Education, 2011-2024

Online ISSN: 2013-6374; Print ISSN: 2014-5349; DL: B-2000-2012

Publisher: OmniaScience