[wikidocs transformers] 1. Natural Language Processing and the transformers pipeline()

[wikidocs transformers] 1. Natural Language Processing and transformers

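Natural language processing (NLP) is, as the name suggests, a field of linguistics and machine learning focused on understanding everything related to human language. NLP aims not only to understand individual words in isolation but also to understand how they fit into the context of a sentence.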

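Typical NLP tasks include classifying whole sentences (sentiment analysis, spam detection, etc.), classifying each word within a sentence, generating text (translation, summarization, filling in masked words, etc.), and extracting an answer from a text.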

NLP techniques are also not limited to text: they can tackle problems in speech recognition and computer vision (CV), such as transcribing audio or describing images.

Hugging Face's transformers library is an extremely useful tool for carrying out these NLP tasks.

What transformers can do

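Through the Hugging Face Hub, which transformers is built around, users can share the models they have built and use models shared by others. Besides models, they can also share the datasets used to train them.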

The pipeline() function

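The most basic object in transformers.

It connects a model with the preprocessing and postprocessing steps that the model needs.

As a result, you can pass raw text directly and get the model's output back. You can also pass several sentences at once.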
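To make the idea of chaining preprocessing, a model, and postprocessing concrete, here is a toy sketch in plain Python. Everything in it is hypothetical: the whitespace "tokenizer", the word-weight table standing in for a trained model, and the names `preprocess`, `model`, `postprocess` and `toy_pipeline`. It only mimics the structure described above; it is not how transformers implements pipeline().

```python
import math
from typing import Dict, List, Union

# Hypothetical toy "model": a fixed per-word weight table, NOT a trained model.
WORD_WEIGHTS: Dict[str, float] = {"good": 2.0, "great": 2.5, "love": 2.0,
                                  "bad": -2.0, "terrible": -2.5, "hate": -2.0}
LABELS = ["NEGATIVE", "POSITIVE"]

def preprocess(text: str) -> List[str]:
    # Stand-in for a tokenizer: lowercase, split on whitespace, strip punctuation.
    return [tok.strip(".,!?'\"") for tok in text.lower().split()]

def model(tokens: List[str]) -> List[float]:
    # Turns tokens into two logits, ordered as [NEGATIVE, POSITIVE].
    score = sum(WORD_WEIGHTS.get(tok, 0.0) for tok in tokens)
    return [-score, score]

def postprocess(logits: List[float]) -> Dict[str, object]:
    # Softmax over the logits, then report the most probable label and its probability.
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}

def toy_pipeline(inputs: Union[str, List[str]]) -> List[Dict[str, object]]:
    # Accept a single string or a list of strings and run every one through all three stages.
    texts = [inputs] if isinstance(inputs, str) else inputs
    return [postprocess(model(preprocess(t))) for t in texts]

print(toy_pipeline(["I love this movie", "What a terrible day"]))
```

The user only ever calls `toy_pipeline`, which is the convenience the real pipeline() offers: the tokenizer and output formatting are hidden behind one call.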

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier(["What are you doing now?", "I'm studying artificial intelligence"])

# Return value
"""
[{'label': 'NEGATIVE', 'score': 0.9444786310195923},
 {'label': 'NEGATIVE', 'score': 0.7709186673164368}]
"""

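The example above shows how easy pipeline() is to use. We passed the "sentiment-analysis" task to pipeline() and ran it on two sentences, and the model predicted NEGATIVE for both. Neither sentence is clearly negative, so the default model does not seem to perform very well here. One likely reason is that the default sentiment model only has two labels, POSITIVE and NEGATIVE, so even a neutral sentence has to be assigned one of them.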
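Why can a neutral sentence still come out as NEGATIVE with a score like 0.77? With only two labels, the reported score is the larger of two softmax probabilities, so it can never drop below 0.5. The sketch below shows that postprocessing step in isolation; the logits are made up for illustration (an assumption, not values taken from the real model), and `softmax`/`top_label` are hypothetical helpers.

```python
import math
from typing import List, Tuple

LABELS = ("NEGATIVE", "POSITIVE")  # a two-class output head, assumed for this sketch

def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_label(logits: List[float]) -> Tuple[str, float]:
    # Pick the most probable class; with two classes its probability is always >= 0.5.
    probs = softmax(logits)
    i = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[i], probs[i]

# Hypothetical logits for a sentence with no clear sentiment: a small lean
# towards one class is enough for that class to win with a fairly high score.
label, score = top_label([0.6, -0.6])
print(label, round(score, 4))
```

A model with an explicit neutral class (or a score threshold applied afterwards) would be needed to avoid forcing such sentences into one of the two labels.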

The task argument of pipeline() accepts many different values, such as:

['audio-classification', 'automatic-speech-recognition', 'conversational', 'depth-estimation', 'document-question-answering', 'feature-extraction', 'fill-mask', 'image-classification', 'image-segmentation', 'image-to-text', 'mask-generation', 'ner', 'object-detection', 'question-answering', 'sentiment-analysis', 'summarization', 'table-question-answering', 'text-classification', 'text-generation', 'text-to-audio', 'text-to-speech', 'text2text-generation', 'token-classification', 'translation', 'video-classification', 'visual-question-answering', 'vqa', 'zero-shot-audio-classification', 'zero-shot-classification', 'zero-shot-image-classification', 'zero-shot-object-detection', 'translation_XX_to_YY']
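Some names in this list are aliases rather than separate tasks. As far as I can tell from the transformers source, "sentiment-analysis" is an alias of "text-classification", "ner" of "token-classification", "vqa" of "visual-question-answering", and "text-to-speech" of "text-to-audio". "translation_XX_to_YY" is a template in which XX and YY are language codes, as in "translation_en_to_fr". The helper below is a hypothetical sketch (not part of transformers) that normalizes a task name the same way; restricting language codes to 2 or 3 lowercase letters is an assumption of this sketch.

```python
import re
from typing import Optional, Tuple

# Hypothetical alias table mirroring the aliases described above.
TASK_ALIASES = {
    "sentiment-analysis": "text-classification",
    "ner": "token-classification",
    "vqa": "visual-question-answering",
    "text-to-speech": "text-to-audio",
}

# Matches the translation_XX_to_YY template, capturing the two language codes.
TRANSLATION_RE = re.compile(r"^translation_([a-z]{2,3})_to_([a-z]{2,3})$")

def normalize_task(task: str) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Return (canonical task name, (source, target) language pair or None)."""
    match = TRANSLATION_RE.match(task)
    if match:
        return "translation", (match.group(1), match.group(2))
    # Non-aliased names are returned unchanged.
    return TASK_ALIASES.get(task, task), None

print(normalize_task("sentiment-analysis"))
print(normalize_task("translation_en_to_fr"))
print(normalize_task("summarization"))
```

This is also why pipeline("sentiment-analysis") in the earlier example behaves like a text-classification pipeline.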

Sources

NLP Frameworks: Hugging Face Basic Usage

itmaster98.tistory.com

1. Natural Language Processing

wikidocs.net
