Development of Data Collection, Review and Quality Assessment Interface and Tools for LLM Studies

Yüksek İrtifa

Talent Program

Artificial Intelligence

Data used in HLT training should be reviewed efficiently rather than by hand. Training requires large volumes of data from varied sources, and the quality of that data is uneven. Collected data must be formatted and categorized to suit the training task, and quality and toxicity models are used to score it. To meet these needs, the project will develop tools and interfaces for analyzing and evaluating LLM training data, and will process new data for training. The interfaces are also expected to support LLM selection, review, and DPO data collection so that models can be tested quickly.
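
As an illustration of the kind of tooling described above, the following is a minimal sketch of a filtering step that normalizes records, scores them for quality and toxicity, and keeps only those that pass both thresholds. Everything named here is an assumption made for the example: the `Record` and `PreferencePair` types, the `filter_records` function, the threshold values, and the toy scorers, which only stand in for the trained quality and toxicity models the project would actually use. The `PreferencePair` fields follow the prompt/chosen/rejected layout commonly used for DPO data.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Record:
    """One collected text sample (hypothetical schema)."""
    text: str
    source: str
    category: str = "uncategorized"
    quality: float = 0.0
    toxicity: float = 0.0


@dataclass
class PreferencePair:
    """One DPO sample: a prompt with a preferred and a rejected response."""
    prompt: str
    chosen: str
    rejected: str


def normalize(text: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return " ".join(text.split())


def filter_records(
    records: Iterable[Record],
    quality_fn: Callable[[str], float],
    toxicity_fn: Callable[[str], float],
    min_quality: float = 0.5,   # assumed threshold
    max_toxicity: float = 0.2,  # assumed threshold
) -> List[Record]:
    """Normalize, score, and keep records that pass both thresholds."""
    kept: List[Record] = []
    for rec in records:
        rec.text = normalize(rec.text)
        if not rec.text:
            continue  # drop records that are empty after normalization
        rec.quality = quality_fn(rec.text)
        rec.toxicity = toxicity_fn(rec.text)
        if rec.quality >= min_quality and rec.toxicity <= max_toxicity:
            kept.append(rec)
    return kept


# Toy scorers: placeholders for real quality and toxicity models.
BLOCKLIST = {"idiot"}


def toy_quality(text: str) -> float:
    """Longer texts score higher, capped at 1.0."""
    return min(len(text.split()) / 10.0, 1.0)


def toy_toxicity(text: str) -> float:
    """Fraction of words that appear in a small blocklist."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in BLOCKLIST for w in words) / len(words) if words else 0.0


if __name__ == "__main__":
    samples = [
        Record("  A clean   sentence with enough words to pass the filter. ", "web"),
        Record("you idiot", "forum"),
        Record("   ", "web"),
    ]
    for rec in filter_records(samples, toy_quality, toy_toxicity):
        print(rec.source, rec.quality, rec.toxicity, rec.text)
```

Passing the scorers in as functions keeps the filter independent of any particular model, so a review interface could swap in different quality or toxicity models without changing the pipeline itself.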