[Deep Learning, PyTorch] Building a Custom Model

2024-09-18

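Being able to use existing models matters, but doing so well requires a solid grasp of the theory and concepts behind modeling. This post summarizes the classes and functions PyTorch provides for designing custom models, along with the techniques needed to actually build and modify one.

I plan to use what is collected here as the basis for implementing papers as I review them.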

torch.nn

torch.nn, which the previous post touched on briefly, is introduced in the official documentation as the basic building block. In other words, PyTorch ships ready-made building blocks for modeling and groups them under torch.nn.
The documentation lists layers for many purposes: linear, convolution, recurrent, transformer, normalization, dropout, and more.
Layers used in specific papers will be covered as they come up in the task-specific posts where those papers are implemented.
Here we look only at nn.Linear and nn.Identity from the Linear Layers group.
torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None) applies an affine linear transformation to its input.
It is commonly used as the final fully connected layer in simple regression or classification models.
\[y = xA^\top + b\]

import torch
import torch.nn as nn

X = torch.Tensor([[1, 2],
                  [3, 4]]) # torch.Size([2, 2])

linear = nn.Linear(2, 5)
output = linear(X)
output.size() # torch.Size([2, 5])

in_features is the size of each input sample (the number of input features), not the number of samples. out_features is the size of each output sample, which in classification is usually the number of classes.
\[\begin{aligned} \text{Input} &: \; (*, H_{\text{in}}) \; \quad \text{where} \quad \text{* means any number of dimensions including none} \\ \text{Output} &: \; (*, H_{\text{out}}) \quad \text{where} \quad \text{all but the last dimension are the same shape as the input} \end{aligned}\]
Here $H_{\text{in}}$ is in_features and $H_{\text{out}}$ is out_features. Only these two arguments are needed because the layer acts on the last dimension alone; all leading dimensions, including the batch dimension, pass through unchanged.
The weight of nn.Linear has shape $(\text{out\_features}, \text{in\_features})$ and is initialized from a uniform distribution.
\[\mathcal{U}(-\sqrt{k}, \sqrt{k}) \quad \text{where} \quad k = \frac{1}{\text{in\_features}}\]
The bias has shape $(\text{out\_features})$ and, when bias=True, is initialized from the same distribution as the weight.
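
The shape and initialization rules above can be checked directly. The following is a minimal sketch; the layer sizes and the random seed are arbitrary choices made for illustration.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for reproducibility

linear = nn.Linear(in_features=2, out_features=5)

# weight is (out_features, in_features); bias is (out_features,)
weight_shape = tuple(linear.weight.shape)   # (5, 2)
bias_shape = tuple(linear.bias.shape)       # (5,)

# Both are drawn from U(-sqrt(k), sqrt(k)) with k = 1 / in_features
bound = math.sqrt(1 / linear.in_features)
within_bound = bool(
    (linear.weight.abs() <= bound).all() and (linear.bias.abs() <= bound).all()
)

# Only the last dimension changes; leading dimensions pass through
x = torch.rand(4, 7, 2)
out_shape = tuple(linear(x).shape)          # (4, 7, 5)
```
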
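nn.Identity is an argument-insensitive placeholder layer: it accepts any constructor arguments and returns its input unchanged. It is used, for example, in skip connections.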

X = torch.Tensor([[1, 2],
                  [3, 4]])

identity = nn.Identity()
output = identity(X)
# tensor([[1., 2.],
#         [3., 4.]])
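
One way nn.Identity can appear in a skip connection is sketched below. ResidualBlock and its layer sizes are hypothetical names chosen for illustration, not part of PyTorch.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Hypothetical block: out = body(x) + shortcut(x)."""
    def __init__(self, in_f, out_f):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_f, out_f), nn.ReLU())
        # If the shapes already match, the shortcut is a no-op (nn.Identity);
        # otherwise project x so that the addition is well defined.
        self.shortcut = nn.Identity() if in_f == out_f else nn.Linear(in_f, out_f)

    def forward(self, x):
        return self.body(x) + self.shortcut(x)

same = ResidualBlock(8, 8)    # shortcut is nn.Identity
proj = ResidualBlock(8, 16)   # shortcut is nn.Linear

x = torch.rand(4, 8)
same_shape = tuple(same(x).shape)   # (4, 8)
proj_shape = tuple(proj(x).shape)   # (4, 16)
```

Because both branches are Modules, forward stays the same whether or not a projection is needed.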

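There is also nn.LazyLinear alongside nn.Linear. How do they differ?
torch.nn.Linear
- The input and output sizes must be specified up front, e.g. torch.nn.Linear(5, 10).
- Uses torch.nn.Parameter.
- The parameters (W, b) are initialized when the layer is created.
torch.nn.LazyLinear
- Only the output size is specified, e.g. torch.nn.LazyLinear(10). The input size is inferred automatically during the first forward pass.
- Uses torch.nn.UninitializedParameter.
- The parameters (W, b) are initialized during the first forward pass after the layer is created.
nn.LazyLinear therefore lets you build a model without knowing the input size in advance: the parameters (W, b) are created with the right shape from the first input. Once that first forward pass has run, the input size is fixed and the layer behaves like a regular nn.Linear. In practice it is rarely used.
Even so, knowing what Lazy means and what an Uninitialized Parameter is may come in handy someday.
For now we only know that parameters can be initialized late, during forward, which helps build models that adapt to their input. Later, while building a custom model, you may find a much more clever use for it.
Features like this go unnoticed only because they are rarely used; reading the PyTorch documentation directly turns up many of them.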
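
The deferred initialization can be observed with a short sketch like the one below. The sizes are arbitrary, and the warning filter is only there because some PyTorch versions emit a warning when a lazy module is created.

```python
import warnings
import torch
import torch.nn as nn
from torch.nn.parameter import UninitializedParameter

with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # some versions warn that lazy modules are experimental
    lazy = nn.LazyLinear(10)

# Before the first forward pass the weight has no shape yet
uninitialized_before = isinstance(lazy.weight, UninitializedParameter)

x = torch.rand(4, 5)
out = lazy(x)   # in_features is inferred as 5 here

uninitialized_after = isinstance(lazy.weight, UninitializedParameter)
weight_shape = tuple(lazy.weight.shape)   # (10, 5)
```
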

Module, Sequential, ModuleList, ModuleDict

Module, Sequential, ModuleList, and ModuleDict in torch.nn are all container classes used to stack network blocks.
With them, the layers defined in nn can be gathered in one place and abstracted into a single model.

import torch.nn as nn

# nn.Module
# nn.Sequential
# nn.ModuleList
# nn.ModuleDict

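Each container is suited to the following cases.
- Module : when a large block is composed of several smaller blocks
- Sequential : when you want to build a small block out of layers
- ModuleList : when you need to iterate over some layers or building blocks while doing some work
- ModuleDict : when part of the model needs to be parameterized (e.g. the activation function)
Let us look at each container in detail.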

Module : The main building block

Module is the most basic block unit. Every built-in PyTorch block inherits from Module, so it is always involved when building a network.
The nn.Module class acts as a box that gathers functionality in one place. An nn.Module can also contain other nn.Modules.
Depending on how it is used, an nn.Module means different things.
- An nn.Module filled with functions is a basic building block.
- An nn.Module that gathers basic building blocks is a deep learning model.
- An nn.Module that gathers deep learning models is an even larger deep learning model.
nn.Module is just an empty box; how to use it is entirely up to the designer. Functions, basic building blocks, and models can be mixed freely, or organized hierarchically with functions grouped with functions and blocks with blocks.
Let us build a simple model with nn.Module and analyze how it is put together. The model examples from here on target the MNIST digit dataset, whose images have shape (1, 28, 28).

import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNClassifier(nn.Module):
    def __init__(self, in_c, n_classes=10):
        super().__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(in_c, 32, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(64)

        # Fully connected layers
        self.fc1 = nn.Linear(64 * 28 * 28, 1024)
        self.fc2 = nn.Linear(1024, n_classes)
        
    def forward(self, x):
        # Convolutional layers
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))

        x = x.view(x.size(0), -1) # Flatten

        x = F.relu(self.fc1(x))
        x = self.fc2(x)          # No activation on the output layer (raw logits)
        return x

For why super().__init__() must be called, see the linked post.
- In short, super().__init__() runs the initializer of nn.Module, the class the custom model inherits from, so that the model gets nn.Module's functionality.
The forward of the model above chains blocks of Convolution → Batch Normalization → ReLU one after another.
Each object declared in __init__, such as nn.Conv2d and nn.BatchNorm2d, is a Module block.
However, the Convolution → BatchNorm → ReLU pattern is repeated, yet it cannot be reused like a function, which is somewhat inefficient. Sequential and ModuleList can improve this.

Sequential: stack and merge layers

Sequential bundles Modules so that they run in order. Modules run in the order they were added, and everything in the same Sequential executes as a single unit.
Grouping Modules that are always used together into a Sequential therefore makes the code more readable.
For example, in the custom model above, Convolution → Batch Normalization → ReLU is three Modules used back to back, so they can be wrapped in a Sequential and treated as one unit.

import torch.nn as nn

class CNNClassifier(nn.Module):
    def __init__(self, in_c, n_classes=10):
        super().__init__()
        self.conv_block1 = nn.Sequential(
            nn.Conv2d(in_c, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU()
        )
        
        self.conv_block2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU()
        )
        
        self.decoder = nn.Sequential(
            nn.Linear(64 * 28 * 28, 1024),
            nn.ReLU(),
            nn.Linear(1024, n_classes)
        )

    def forward(self, x):
        x = self.conv_block1(x)
        x = self.conv_block2(x)

        x = x.view(x.size(0), -1) # Flatten

        x = self.decoder(x)
        return x

In the code above, __init__ groups the layers into units with Sequential, which makes forward much shorter.
The duplicated code in conv_block1 and conv_block2 can be factored out into a function to make it shorter still.

def conv_block(in_f, out_f, *args, **kwargs):
    return nn.Sequential(
        nn.Conv2d(in_f, out_f, *args, **kwargs),
        nn.BatchNorm2d(out_f),
        nn.ReLU()
    )

class CNNClassifier(nn.Module):
    def __init__(self, in_c, n_classes=10):
        super().__init__()
        self.conv_block1 = conv_block(in_c, 32, kernel_size=3, padding=1)
        
        self.conv_block2 = conv_block(32, 64, kernel_size=3, padding=1)

        
        self.decoder = nn.Sequential(
            nn.Linear(64 * 28 * 28, 1024),
            nn.ReLU(),
            nn.Linear(1024, n_classes)
        )
        
    def forward(self, x):
        x = self.conv_block1(x)
        x = self.conv_block2(x)

        x = x.view(x.size(0), -1) # Flatten
        
        x = self.decoder(x)
        return x

Let us tidy the code further in preparation for a larger network. These techniques help considerably when stacking large networks.

def conv_block(in_f, out_f, *args, **kwargs):
    return nn.Sequential(
        nn.Conv2d(in_f, out_f, *args, **kwargs),
        nn.BatchNorm2d(out_f),
        nn.ReLU()
    )

class CNNClassifier(nn.Module):
    def __init__(self, in_c, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            conv_block(in_c, 32, kernel_size=3, padding=1),
            conv_block(32, 64, kernel_size=3, padding=1)
        )

        self.decoder = nn.Sequential(
            nn.Linear(64 * 28 * 28, 1024),
            nn.ReLU(),
            nn.Linear(1024, n_classes)
        )

        
    def forward(self, x):
        x = self.encoder(x)
        x = x.view(x.size(0), -1) # Flatten
        x = self.decoder(x)
        return x

What if the self.encoder part keeps growing? Simply listing the blocks as below is not a good approach.

self.encoder = nn.Sequential(
            conv_block(in_c, 32, kernel_size=3, padding=1),
            conv_block(32, 64, kernel_size=3, padding=1),
            conv_block(64, 128, kernel_size=3, padding=1),
            conv_block(128, 256, kernel_size=3, padding=1),
        )

In this case a loop keeps the code concise. What changes from one iteration to the next is the number of input and output channels.
A common approach is to define the channel counts in a list. The key idea is to loop, but with the channel sizes stored ahead of time.

class CNNClassifier(nn.Module):
    def __init__(self, in_c, n_classes=10):
        super().__init__()

        self.enc_sizes = [in_c, 32, 64, 128, 256] # channel sizes
        conv_blocks = [conv_block(in_f, out_f, kernel_size=3, padding=1) for in_f, out_f in zip(self.enc_sizes, self.enc_sizes[1:])]

        self.encoder = nn.Sequential(*conv_blocks)
        
        self.decoder = nn.Sequential(
            nn.Linear(self.enc_sizes[-1] * 28 * 28, 1024),
            nn.ReLU(),
            nn.Linear(1024, n_classes)
        )
        
    def forward(self, x):
        x = self.encoder(x)
        x = x.view(x.size(0), -1) # Flatten
        x = self.decoder(x)
        
        return x

In the code above, conv_blocks holds the convolution blocks built from the input / output channels defined in the self.enc_sizes list.
The output channel count of the $n$-th block is the input channel count of the $(n+1)$-th block, so the list is zipped with itself shifted by one to walk over consecutive pairs.
The * operator unpacks a sequence; used with a list as in the code above, it is convenient.

a = [1, 2, 3, 4, 5]
b = [10, *a]
print(b) # [10, 1, 2, 3, 4, 5]
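
The zip pairing described above can be seen in isolation in the following small sketch. The list values and the helper name show are illustrative only.

```python
enc_sizes = [1, 32, 64, 128]

# Pair each size with the one that follows it
pairs = list(zip(enc_sizes, enc_sizes[1:]))
print(pairs)  # [(1, 32), (32, 64), (64, 128)]

def show(*blocks):
    # * in a call spreads the list into separate positional arguments,
    # which is how nn.Sequential(*conv_blocks) receives its modules
    return len(blocks)

print(show(*pairs))  # 3
```
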

Finally, separating the encoder and decoder and using * to keep the code concise gives the following.

def conv_block(in_f, out_f, *args, **kwargs):
    return nn.Sequential(
        nn.Conv2d(in_f, out_f, *args, **kwargs),
        nn.BatchNorm2d(out_f),
        nn.ReLU()
    )

def dec_block(in_f, out_f):
    return nn.Sequential(
        nn.Linear(in_f, out_f),
        nn.ReLU()
    )

class CNNClassifier(nn.Module):
    def __init__(self, in_c, enc_sizes, dec_sizes, n_classes=10):
        super().__init__()
        self.enc_sizes = [in_c, *enc_sizes]
        self.dec_sizes = [self.enc_sizes[-1] * 28 * 28, *dec_sizes]

        conv_blocks = [conv_block(in_f, out_f, kernel_size=3, padding=1) for in_f, out_f in zip(self.enc_sizes, self.enc_sizes[1:])]
        self.encoder = nn.Sequential(*conv_blocks)
        
        dec_blocks = [dec_block(in_f, out_f) for in_f, out_f in zip(self.dec_sizes, self.dec_sizes[1:])]
        self.decoder = nn.Sequential(*dec_blocks)
        
        self.last = nn.Linear(self.dec_sizes[-1], n_classes)
        
    def forward(self, x):
        x = self.encoder(x)
        x = x.view(x.size(0), -1) # Flatten
        x = self.decoder(x)
        x = self.last(x)          # Final projection to class logits
        return x
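
A usage sketch for a classifier parameterized this way follows. It restates a compact version of the class so that it runs on its own; the deliberately small enc_sizes and dec_sizes values are assumptions chosen to keep the example light, not recommended settings.

```python
import torch
import torch.nn as nn

def conv_block(in_f, out_f, *args, **kwargs):
    return nn.Sequential(
        nn.Conv2d(in_f, out_f, *args, **kwargs),
        nn.BatchNorm2d(out_f),
        nn.ReLU(),
    )

def dec_block(in_f, out_f):
    return nn.Sequential(nn.Linear(in_f, out_f), nn.ReLU())

class CNNClassifier(nn.Module):
    def __init__(self, in_c, enc_sizes, dec_sizes, n_classes=10):
        super().__init__()
        enc = [in_c, *enc_sizes]
        dec = [enc[-1] * 28 * 28, *dec_sizes]
        self.encoder = nn.Sequential(
            *[conv_block(i, o, kernel_size=3, padding=1) for i, o in zip(enc, enc[1:])]
        )
        self.decoder = nn.Sequential(*[dec_block(i, o) for i, o in zip(dec, dec[1:])])
        self.last = nn.Linear(dec[-1], n_classes)

    def forward(self, x):
        x = self.encoder(x)
        x = x.flatten(start_dim=1)            # keep the batch dimension
        return self.last(self.decoder(x))     # raw logits

model = CNNClassifier(in_c=1, enc_sizes=[4, 8], dec_sizes=[32, 16])
x = torch.rand(2, 1, 28, 28)                  # a dummy MNIST-shaped batch
logits = model(x)
print(logits.shape)  # torch.Size([2, 10])
```

Changing the depth of the encoder or decoder now only requires changing the two lists passed to the constructor.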

ModuleList : when we need to iterate

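Sequential runs its Modules in order, so it is well suited to bundling functionality whose execution order is fixed. But what if you only want to collect Modules like a list and pick the ones you need by index?
ModuleList holds Modules in list form. You can iterate over the stored Modules in order, as with Sequential, or pull out a specific one and run it.
The difference between ModuleList and Sequential is whether the container itself performs a forward computation.
- Sequential has a forward that calls its Modules in order, so the group moves as one complete unit. Its items can still be indexed, but the container leaves no place to do extra work between the individual Modules.
- ModuleList has no forward of its own; it only stores and registers the Modules. You therefore iterate over it with a for loop inside your own forward and run each Module yourself.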

import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self, sizes):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(in_f, out_f) for in_f, out_f in zip(sizes, sizes[1:])])
        self.trace = []
        
    def forward(self,x):
        # With ModuleList, each Module can be accessed individually
        for layer in self.layers:
            x = layer(x)
            self.trace.append(x)
        return x

model = MyModule([1, 16, 32])
model(torch.rand((4,1)))

for trace in model.trace:
    print(trace.shape)
# torch.Size([4, 16])
# torch.Size([4, 32])

The code below shows why PyTorch provides ModuleList separately from the Python list.

class Add(nn.Module):
    def __init__(self, value):
        super().__init__()
        self.value = value

    def forward(self, x):
        return x + self.value


class PythonList(nn.Module):
    """Python List"""
    def __init__(self):
        super().__init__()

        # Python List
        self.add_list = [Add(2), Add(3), Add(5)]

    def forward(self, x):
        x = self.add_list[1](x)
        x = self.add_list[0](x)
        x = self.add_list[2](x)

        return x

class PyTorchList(nn.Module):
    """PyTorch List"""
    def __init__(self):
        super().__init__()

        # Pytorch ModuleList
        self.add_list = nn.ModuleList([Add(2), Add(3), Add(5)])

    def forward(self, x):
        x = self.add_list[1](x)
        x = self.add_list[0](x)
        x = self.add_list[2](x)

        return x

x = torch.tensor([1])

python_list = PythonList()
pytorch_list = PyTorchList()

print(python_list(x), pytorch_list(x)) # tensor([11]) tensor([11])

python_list # PythonList()
pytorch_list 
# PyTorchList(
#   (add_list): ModuleList(
#     (0-2): 3 x Add()
#   )
# )

As the code shows, the two behave identically in forward; the difference is whether the Modules are registered as submodules of the parent nn.Module.
- With a Python list, neither the list nor any Module stored in it is registered as a submodule.
- With a PyTorch ModuleList, both the ModuleList and the Modules inside it are registered as submodules.
So with a Python list, printing the model shows none of the Modules inside it. Why?
Inside an nn.Module, an assignment of the form "attribute = value" calls the special method __setattr__. That is, self.add_list = nn.ModuleList([Add(2), Add(3), Add(5)]) triggers it.
nn.Module's __setattr__ checks the type of the value to see whether it is a Module.
If it is, the value is registered as a submodule; otherwise it is stored as an ordinary attribute without registration. A Python list is not a Module, so it is not registered, and nn.Module never looks inside it.
When Modules sit in a Python list and are not registered, none of them are included when the model is saved. Their parameters are also missing from model.parameters(), so an optimizer built from it will not update them.
PyTorch nn.Modules are connected and managed like the nodes of a graph, so take care not to break the link between modules by using a plain list.
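
The consequence for parameters and saving can be checked with layers that actually hold weights. This is a minimal sketch; the class names are hypothetical.

```python
import torch.nn as nn

class WithPythonList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(2, 2), nn.Linear(2, 2)]  # not registered

class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(2, 2), nn.Linear(2, 2)])

plain = WithPythonList()
registered = WithModuleList()

n_plain = len(list(plain.parameters()))
n_registered = len(list(registered.parameters()))
print(n_plain, n_registered)  # 0 4

print(list(plain.state_dict().keys()))  # []
print(list(registered.state_dict().keys()))
# ['layers.0.weight', 'layers.0.bias', 'layers.1.weight', 'layers.1.bias']
```
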

ModuleDict: when we need to choose

ModuleDict lets you use Modules in dictionary form.

def conv_block(in_f, out_f, activation='relu', *args, **kwargs):
    
    activations = nn.ModuleDict([
                ['lrelu', nn.LeakyReLU()],
                ['relu', nn.ReLU()]
    ])
    
    return nn.Sequential(
        nn.Conv2d(in_f, out_f, *args, **kwargs),
        nn.BatchNorm2d(out_f),
        activations[activation]
    )

print(conv_block(1, 32,'lrelu', kernel_size=3, padding=1))
# Sequential(
#   (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#   (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#   (2): LeakyReLU(negative_slope=0.01)
# )
print(conv_block(1, 32,'relu', kernel_size=3, padding=1))
# Sequential(
#   (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#   (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#   (2): ReLU()
# )
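
A ModuleDict can also be kept as an attribute so that the choice is made at call time rather than at construction time. The following sketch assumes a hypothetical Switchable module with an act argument on forward.

```python
import torch
import torch.nn as nn

class Switchable(nn.Module):
    """Hypothetical module that picks its activation by name at call time."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        # Registered as a submodule, so every activation appears in print(model)
        self.activations = nn.ModuleDict({
            "relu": nn.ReLU(),
            "lrelu": nn.LeakyReLU(),
        })

    def forward(self, x, act="relu"):
        return self.activations[act](self.linear(x))

model = Switchable()
x = torch.rand(3, 4)
out_relu = model(x, act="relu")
out_lrelu = model(x, act="lrelu")

print(list(model.activations.keys()))  # ['relu', 'lrelu']
```
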

Conditionals

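PyTorch uses a dynamic computation graph, which makes conditionals such as if / else easy to use.
A dynamic computation graph can change from one iteration to the next, which lets a PyTorch model add or skip hidden layers during training, something that can be used to improve accuracy and generalization.
This means PyTorch rebuilds the computational graph on the fly at every forward pass (every iteration), not once per epoch.
The code below illustrates how this is useful.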

class Add(nn.Module):
    def __init__(self, value):
        super().__init__()
        self.value = value

    def forward(self, x):
        return x + self.value

class Sub(nn.Module):
    def __init__(self, value):
        super().__init__()
        self.value = value

    def forward(self, x):
        return x - self.value

class Calculator(nn.Module):
    def __init__(self, cal_type):
        super().__init__()
        self.cal_type = cal_type
        self.add = Add(3)
        self.sub = Sub(3)

    def forward(self, x):
        if self.cal_type == "add":
            x = self.add(x)
        elif self.cal_type == "sub":
            x = self.sub(x)
        else:
            raise ValueError(f"cal_type should be add or sub. entered {self.cal_type}")

        return x

x = torch.tensor([5])

calculator = Calculator("add")
add_output = calculator(x) # tensor([8])

calculator = Calculator("sub")
sub_output = calculator(x) # tensor([2])
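
Because the graph is recorded as forward runs, ordinary Python loops work as well as conditionals. The sketch below uses a hypothetical RepeatLinear module whose depth is chosen per call; the n_steps argument is an assumption made for illustration.

```python
import torch
import torch.nn as nn

class RepeatLinear(nn.Module):
    """Hypothetical module whose depth is chosen at call time."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x, n_steps):
        # An ordinary Python loop: the graph recorded by autograd contains
        # n_steps applications of self.linear for this call only.
        for _ in range(n_steps):
            x = torch.tanh(self.linear(x))
        return x

model = RepeatLinear(4)
x = torch.rand(2, 4)

out_short = model(x, n_steps=1)
out_long = model(x, n_steps=3)

# Gradients flow through however many steps were actually executed
out_long.sum().backward()
has_grad = model.linear.weight.grad is not None
```
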

Module execution flow

Placing print() calls throughout a model built with nn.Module reveals how the Modules are connected and in what order they run.

# function
class Function_A(nn.Module):
    def __init__(self):
        super().__init__()
        print(f"        Function A Initialized")

    def forward(self, x):
        print(f"        Function A started")
        print(f"        Function A done")

class Function_B(nn.Module):
    def __init__(self):
        super().__init__()
        print(f"        Function B Initialized")

    def forward(self, x):
        print(f"        Function B started")
        print(f"        Function B done")

class Function_C(nn.Module):
    def __init__(self):
        super().__init__()
        print(f"        Function C Initialized")

    def forward(self, x):
        print(f"        Function C started")
        print(f"        Function C done")

class Function_D(nn.Module):
    def __init__(self):
        super().__init__()
        print(f"        Function D Initialized")

    def forward(self, x):
        print(f"        Function D started")
        print(f"        Function D done")

# layer
class Layer_AB(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = Function_A()
        self.b = Function_B()
        print(f"    Layer AB Initialized")

    def forward(self, x):
        print(f"    Layer AB started")
        self.a(x)
        self.b(x)
        print(f"    Layer AB done")

class Layer_CD(nn.Module):
    def __init__(self):
        super().__init__()
        self.c = Function_C()
        self.d = Function_D()
        print(f"    Layer CD Initialized")

    def forward(self, x):
        print(f"    Layer CD started")
        self.c(x)
        self.d(x)
        print(f"    Layer CD done")

# Model
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.ab = Layer_AB()
        self.cd = Layer_CD()
        print(f"Model ABCD Initialized\n")

    def forward(self, x):
        print(f"Model ABCD started")
        self.ab(x)
        self.cd(x)
        print(f"Model ABCD done\n")

With function, layer, and model designed from nn.Module as above, feed an input to the model. Understanding this execution flow helps when reading complex models.

x = torch.tensor([7])

model = Model()
model(x)
#         Function A Initialized
#         Function B Initialized
#     Layer AB Initialized
#         Function C Initialized
#         Function D Initialized
#     Layer CD Initialized
# Model ABCD Initialized

# Model ABCD started
#     Layer AB started
#         Function A started
#         Function A done
#         Function B started
#         Function B done
#     Layer AB done
#     Layer CD started
#         Function C started
#         Function C done
#         Function D started
#         Function D done
#     Layer CD done
# Model ABCD done

Parameter

Deep learning aims to find the optimal parameters by means of a loss function. In what form do the parameters exist in a model?
When you define a custom model by subclassing nn.Module, PyTorch automatically creates the Parameters (weight, bias, and so on) of every PyTorch layer (nn.Linear, nn.Conv2d, etc.) declared in it. There is no need to create them separately.
The main mechanics are as follows.
- Module declaration: when a PyTorch layer such as nn.Linear or nn.Conv2d is declared in the model's __init__() method, the parameters that layer needs are initialized automatically. For example, declaring nn.Linear(10, 20) creates the weight and bias tensors internally.
- Parameter management: nn.Module tracks and manages the parameters of every layer declared inside it. Calling model.parameters() or model.named_parameters() lists all parameters of the model.

import torch
import torch.nn as nn

class CustomModel(nn.Module):
    def __init__(self):
        super(CustomModel, self).__init__()
        self.fc1 = nn.Linear(10, 20)
        self.fc2 = nn.Linear(20, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = torch.relu(x)
        x = self.fc2(x)
        return x

model = CustomModel()
for name, param in model.named_parameters():
    print(f"{name}: {param.shape}")
# fc1.weight: torch.Size([20, 10])
# fc1.bias: torch.Size([20])
# fc2.weight: torch.Size([1, 20])
# fc2.bias: torch.Size([1])

As shown, the weight matrix of each layer is created automatically. Each layer's bias is also created by default and can be disabled with bias=False.
By default PyTorch initializes weight and bias with a layer-specific scheme, such as a Kaiming-style initialization. The initialization can be customized if needed.
In short, layers declared in an nn.Module create and manage the parameters they need, so the user does not have to create them.
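
As one possible way to customize initialization, the sketch below re-initializes every nn.Linear with functions from torch.nn.init via Module.apply. The choice of Xavier uniform weights and zero biases is an arbitrary assumption for illustration.

```python
import torch
import torch.nn as nn

def init_weights(module):
    # model.apply calls this once for every submodule, and for the model itself
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 1))
model.apply(init_weights)

bias_is_zero = all(
    bool((m.bias == 0).all()) for m in model if isinstance(m, nn.Linear)
)
print(bias_is_zero)  # True
```
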
To add a parameter manually, define it directly with nn.Parameter. The example below defines $w$ and $b$, the parameters of a linear transformation.

class Linear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()

        self.W = nn.Parameter(torch.ones((out_features, in_features))) # initialized to 1
        self.b = nn.Parameter(torch.ones(out_features)) # initialized to 1

    def forward(self, x):
        output = torch.addmm(self.b, x, self.W.T)
        return output

x = torch.Tensor([[1, 2],
                  [3, 4]])

linear = Linear(2, 3)
output = linear(x)
# tensor([[4., 4., 4.],
#         [8., 8., 8.]], grad_fn=<AddmmBackward0>)

Tensor vs. Parameter vs. Buffer

In the examples above, model parameters are not plain tensors but instances of a separate class, <class 'torch.nn.parameter.Parameter'>. Why?

class Linear_Tensor(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()

        # torch.Tensor
        self.W = torch.ones((out_features, in_features))
        self.b = torch.ones(out_features)

    def forward(self, x):
        output = torch.addmm(self.b, x, self.W.T)

        return output

x = torch.Tensor([[1, 2],
                  [3, 4]])

linear_parameter = Linear(2, 3) # the example above
linear_tensor = Linear_Tensor(2, 3)

output_parameter = linear_parameter(x)
output_tensor = linear_tensor(x)

print(output_parameter)
# tensor([[4., 4., 4.],
#         [8., 8., 8.]], grad_fn=<AddmmBackward0>)
print(output_tensor)
# tensor([[4., 4., 4.],
#         [8., 8., 8.]])

The outputs have the same values, but only the version built with nn.Parameter carries grad_fn, the function used to compute gradients, on its output tensor.

linear_parameter.state_dict()
# OrderedDict([('W',
#               tensor([[1., 1.],
#                       [1., 1.],
#                       [1., 1.]])),
#              ('b', tensor([1., 1., 1.]))])

linear_tensor.state_dict()
# OrderedDict()

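In addition, $w$ and $b$ created as torch.Tensor are not saved.
So although the forward computation is identical, a tensor wrapped in nn.Parameter gets a gradient during back propagation, has its value updated, and is stored when the model is saved.
A tensor that is not wrapped in nn.Parameter is not updated and is ignored entirely when the model is saved.
They are all tensors, but a marker is attached so that some receive special treatment.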
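
The difference in updating can be traced through one optimizer step. The sketch below uses a hypothetical one-parameter module; the learning rate and input are arbitrary, and the arithmetic is written out in the comments.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    """Hypothetical module: y = w * x + c."""
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.ones(1))   # registered, requires_grad=True
        self.c = torch.ones(1)                  # plain tensor, not registered

    def forward(self, x):
        return self.w * x + self.c

model = Scale()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # sees only w

x = torch.tensor([2.0])
loss = model(x).pow(2).sum()   # (1 * 2 + 1)^2 = 9
loss.backward()                # d loss / d w = 2 * 3 * 2 = 12
optimizer.step()               # w <- 1 - 0.1 * 12 = -0.2

print(model.w.data)   # tensor([-0.2000])
print(model.c)        # tensor([1.])
```
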
As seen earlier, custom models are mostly assembled from layers already implemented in torch.nn, so nn.Parameter is rarely handled directly. Unless you are writing a new layer yourself, you will hardly ever need nn.Parameter.
To repeat: unlike nn.Parameter, a plain torch.Tensor attribute gets no gradient, is not updated, and is ignored when the model is saved. Still, there may be a tensor you want saved even though it is not updated by training.
In that case, register the tensor as a buffer. Buffers are saved together with the nn.Parameters when the model is saved.
Register a buffer with self.register_buffer(name, tensor, persistent=True). The persistent option controls whether the buffer is included in the module's state_dict.

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.parameter = nn.Parameter(torch.Tensor([7]))
        self.tensor = torch.Tensor([7])
        self.register_buffer('buffer', torch.Tensor([7]), persistent=True)

model = Model()
model.state_dict()
# OrderedDict([('parameter', tensor([7.])), ('buffer', tensor([7.]))])
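
The effect of the persistent flag, and one practical difference between a buffer and a plain tensor attribute, can be sketched as follows. The class and buffer names are hypothetical.

```python
import torch
import torch.nn as nn

class Holder(nn.Module):
    def __init__(self):
        super().__init__()
        self.tensor = torch.tensor([7.0])                    # plain attribute
        self.register_buffer("saved", torch.tensor([7.0]))   # persistent=True by default
        self.register_buffer("scratch", torch.tensor([7.0]), persistent=False)

model = Holder()
print(list(model.state_dict().keys()))              # ['saved']
print([name for name, _ in model.named_buffers()])  # ['saved', 'scratch']

# Module-wide conversions reach buffers but not plain tensor attributes
model.double()
print(model.saved.dtype, model.scratch.dtype, model.tensor.dtype)
# torch.float64 torch.float64 torch.float32
```

The same applies to device moves such as model.to(device): registered buffers follow the module, while a plain tensor attribute stays where it was created.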

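To summarize:
- Tensor
  - ( X ) gradient computed
  - ( X ) value updated
  - ( X ) value saved with the model
- Parameter
  - ( O ) gradient computed
  - ( O ) value updated
  - ( O ) value saved with the model
- Buffer
  - ( X ) gradient computed
  - ( X ) value updated
  - ( O ) value saved with the model
Buffers are not used often, but BatchNorm is one place where they appear. In the PyTorch BatchNorm source, the running mean and variance are stored as buffers. Note that "value updated" above refers to updates by the optimizer; a module may still overwrite its own buffers, as BatchNorm does with its running statistics.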
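
The BatchNorm buffers can be inspected directly. The following is a small sketch with an arbitrary feature size and a random batch.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(3)
print([name for name, _ in bn.named_buffers()])
# ['running_mean', 'running_var', 'num_batches_tracked']

before = bn.running_mean.clone()    # starts at zeros

bn.train()
x = torch.rand(8, 3) + 5.0          # batch mean is clearly non-zero
bn(x)                               # forward in train mode updates the running stats

changed = not torch.equal(before, bn.running_mean)
print(changed)                        # True
print(bn.running_mean.requires_grad)  # False
```
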

Analyzing a model

Whether we built a custom model or are using a pre-trained one, we need to be able to inspect which modules and parameters it uses.
The following sections use the example code below to show how to inspect the modules and parameters of a model.

# Function
class Function_A(nn.Module):
    def __init__(self, name):
        super().__init__()
        self.name = name

    def forward(self, x):
        x = x * 2
        return x

class Function_B(nn.Module):
    def __init__(self):
        super().__init__()
        self.W1 = nn.Parameter(torch.Tensor([10]))
        self.W2 = nn.Parameter(torch.Tensor([2]))

    def forward(self, x):
        x = x / self.W1
        x = x / self.W2
        return x

class Function_C(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer('buffer', torch.Tensor([7]), persistent=True)

    def forward(self, x):
        x = x * self.buffer
        return x

class Function_D(nn.Module):
    def __init__(self):
        super().__init__()
        self.W1 = nn.Parameter(torch.Tensor([3]))
        self.W2 = nn.Parameter(torch.Tensor([5]))
        self.c = Function_C()

    def forward(self, x):
        x = x + self.W1
        x = self.c(x)
        x = x / self.W2
        return x

# Layer
class Layer_AB(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = Function_A('func_A')
        self.b = Function_B()

    def forward(self, x):
        x = self.a(x) / 5
        x = self.b(x)
        return x

class Layer_CD(nn.Module):
    def __init__(self):
        super().__init__()
        self.c = Function_C()
        self.d = Function_D()

    def forward(self, x):
        x = self.c(x)
        x = self.d(x) + 1
        return x

# Model
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.ab = Layer_AB()
        self.cd = Layer_CD()

    def forward(self, x):
        x = self.ab(x)
        x = self.cd(x)
        return x

x = torch.tensor([7])

model = Model()
model(x) # tensor([6.5720], grad_fn=<AddBackward0>)

named_children, named_modules, get_submodule

named_children and named_modules list the modules inside a model.
children shows only the submodules one level down, while modules shows every submodule that belongs to the model at any depth, including the model itself.

for name, module in model.named_modules():
    print(f"{name}: \n{module}")
    print("-" * 30)
# : 
# Model(
#   (ab): Layer_AB(
#     (a): Function_A()
#     (b): Function_B()
#   )
#   (cd): Layer_CD(
#     (c): Function_C()
#     (d): Function_D(
#       (c): Function_C()
#     )
#   )
# )
# ------------------------------
# ab: 
# Layer_AB(
#   (a): Function_A()
#   (b): Function_B()
# )
# ------------------------------
# ab.a: 
# Function_A()
# ------------------------------
# ab.b: 
# Function_B()
# ------------------------------
# cd: 
# Layer_CD(
#   (c): Function_C()
#   (d): Function_D(
#     (c): Function_C()
#   )
# )
# ------------------------------
# cd.c: 
# Function_C()
# ------------------------------
# cd.d: 
# Function_D(
#   (c): Function_C()
# )
# ------------------------------
# cd.d.c: 
# Function_C()
# ------------------------------

named_modules and named_children also yield each module's name; when only the modules are needed, use modules and children.
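
The difference in depth between the two is easy to see on a small nested model. The sketch below uses a throwaway nn.Sequential rather than the example Model.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Sequential(nn.Linear(2, 2), nn.ReLU()),
    nn.Linear(2, 1),
)

children = [name for name, _ in model.named_children()]
modules = [name for name, _ in model.named_modules()]

print(children)  # ['0', '1']
print(modules)   # ['', '0', '0.0', '0.1', '1']
```

The empty name in the second list is the model itself, which named_modules yields first.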
get_submodule fetches one specific module by name.

submodule = model.get_submodule('ab.a')
submodule # Function_A()

named_parameters, get_parameter, named_buffers

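The parameters and buffers seen above can be looked up in the same way as modules.
named_parameters() is convenient because it gives each parameter together with its name, which shows the layer it belongs to. If only the parameters themselves are needed, use parameters().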

for name, parameter in model.named_parameters():
    print(f"{name} : \n{parameter}")
    print("-" * 30)

# ab.b.W1 : 
# Parameter containing:
# tensor([10.], requires_grad=True)
# ------------------------------
# ab.b.W2 : 
# Parameter containing:
# tensor([2.], requires_grad=True)
# ------------------------------
# cd.d.W1 : 
# Parameter containing:
# tensor([3.], requires_grad=True)
# ------------------------------
# cd.d.W2 : 
# Parameter containing:
# tensor([5.], requires_grad=True)

As with get_submodule, get_parameter takes one of the parameter names above and returns the parameter of that specific layer.

parameter = model.get_parameter('ab.b.W1')
parameter
# Parameter containing:
# tensor([10.], requires_grad=True)

Function_C in the example Model also has a buffer added via register_buffer. It can be listed with named_buffers().

for name, buffer in model.named_buffers():
    print(f"{name} :\n{buffer}")
    print("-" * 30)
# cd.c.buffer :
# tensor([7.])
# ------------------------------
# cd.d.c.buffer :
# tensor([7.])

Likewise, get_buffer returns one specific buffer, and buffers() gives access to all buffers.

buffer = model.get_buffer('cd.c.buffer')
buffer # tensor([7.])
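
A common use of these accessors is counting parameters and freezing part of a model by name. The sketch below uses a small stand-in model; the layer sizes and the choice of which layer to freeze are arbitrary.

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 1))

total = sum(p.numel() for p in model.parameters())
print(total)  # 10*20 + 20 + 20*1 + 1 = 241

# Freeze the first layer by matching the name prefix from named_parameters()
for name, p in model.named_parameters():
    if name.startswith("0."):
        p.requires_grad_(False)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # 20*1 + 1 = 21
```
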

Docstring

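All functions and classes that PyTorch provides come with docstrings. In practice the online documentation is more convenient, so the docstrings are rarely read directly.
Even so, when building a custom model, writing docstrings is essential for other developers who will use it and for your future self. A docstring is read through the __doc__ attribute.
And if you are using a model that has no documentation, treat its docstrings as the documentation and read them carefully.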

from torch import nn

layer = nn.Conv2d(3, 128, kernel_size=3, stride=1, padding=1)
print(layer.__doc__)
# Applies a 2D convolution over an input signal composed of several input 
#     planes.

#     In the simplest case, the output value of the layer with input size
#     :math:`(N, C_{\text{in}}, H, W)` and output :math:`(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})`
#     can be precisely described as:

#     .. math::
#         \text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) +
#         \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)

To add a docstring, use the standard Python docstring syntax. See the linked post for details.
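
A docstring on a custom module might look like the sketch below. The Scale module is hypothetical, and the Args / Shape layout simply imitates the style of PyTorch's own docstrings; it is not a required format.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    """Multiply the input by a learnable scalar.

    Args:
        init (float): initial value of the scale. Default: 1.0

    Shape:
        - Input: any shape
        - Output: same shape as the input
    """
    def __init__(self, init=1.0):
        super().__init__()
        self.scale = nn.Parameter(torch.tensor(float(init)))

    def forward(self, x):
        return x * self.scale

print(Scale.__doc__.splitlines()[0])  # Multiply the input by a learnable scalar.
```
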

Modifying a model

In the example custom model above, Function_D contains Function_C. Look at the code below.
- Note that a submodule of a model is reached with ., the same way any attribute is accessed.

for name, module in model.cd.named_children():
    print(f'{name} :\n{module}')
    print("-" * 30)
# c :
# Function_C()
# ------------------------------
# d :
# Function_D(
#   (c): Function_C()
# )

This is because Function_D declares Function_C in __init__ and uses it in forward. How can we remove this reference from one basic module to another? It can be rewritten as follows.

class Function_C(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer('buffer', torch.Tensor([7]), persistent=True)

    def forward(self, x):
        x = x * self.buffer
        return x

class Function_D(nn.Module):
    def __init__(self):
        super().__init__()
        self.W1 = nn.Parameter(torch.Tensor([3]))
        self.W2 = nn.Parameter(torch.Tensor([5]))
        self.c = Function_C() # before

    def forward(self, x):
        x = x + self.W1
        x = self.c(x)
        x = x / self.W2
        return x

class Function_D_modified(nn.Module):
    def __init__(self):
        super().__init__()
        self.W1 = nn.Parameter(torch.Tensor([3]))
        self.W2 = nn.Parameter(torch.Tensor([5]))
        self.register_buffer('buffer', torch.Tensor([7]), persistent=True) # after

    def forward(self, x):
        x = x + self.W1
        x = x * self.buffer
        x = x / self.W2
        return x

By registering a buffer in Function_D, just as Function_C does, and using it in the computation, the overall result stays the same while the reference between modules is removed.
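
That the two versions compute the same thing can be checked with a self-contained sketch such as the following. The classes are compact restatements for illustration, and the input value is arbitrary.

```python
import torch
import torch.nn as nn

class Function_C(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("buffer", torch.tensor([7.0]))

    def forward(self, x):
        return x * self.buffer

class Function_D(nn.Module):            # holds Function_C as a submodule
    def __init__(self):
        super().__init__()
        self.W1 = nn.Parameter(torch.tensor([3.0]))
        self.W2 = nn.Parameter(torch.tensor([5.0]))
        self.c = Function_C()

    def forward(self, x):
        return self.c(x + self.W1) / self.W2

class Function_D_modified(nn.Module):   # owns the buffer directly
    def __init__(self):
        super().__init__()
        self.W1 = nn.Parameter(torch.tensor([3.0]))
        self.W2 = nn.Parameter(torch.tensor([5.0]))
        self.register_buffer("buffer", torch.tensor([7.0]))

    def forward(self, x):
        return (x + self.W1) * self.buffer / self.W2

x = torch.tensor([7.0])
same_output = torch.allclose(Function_D()(x), Function_D_modified()(x))
print(same_output)                                             # True
print([n for n, _ in Function_D().named_children()])           # ['c']
print([n for n, _ in Function_D_modified().named_children()])  # []
print(list(Function_D().state_dict().keys()))           # ['W1', 'W2', 'c.buffer']
print(list(Function_D_modified().state_dict().keys()))  # ['W1', 'W2', 'buffer']
```

One side effect worth noting is that the state_dict key changes from c.buffer to buffer, so a checkpoint saved from one version does not load into the other without renaming the key.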
Also, in the original model Function_A takes a name when an instance is created. Yet when the model is printed, that name does not appear.

class Function_A(nn.Module):
    def __init__(self, name):
        super().__init__()
        self.name = name

    def forward(self, x):
        x = x * 2
        return x

class Layer_AB(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = Function_A('func_A')
        self.b = Function_B()

    def forward(self, x):
        x = self.a(x) / 5
        x = self.b(x)
        return x
# Model(
#   (ab): Layer_AB(
#     (a): Function_A()
#     (b): Function_B()
#   )
#   (cd): Layer_CD(
#     (c): Function_C()
#     (d): Function_D()
#   )
# )

To make the name appear, override extra_repr(), the method that nn.Module's __repr__ calls to get extra information about a module.

class Function_A(nn.Module):
    def __init__(self, name):
        super().__init__()
        self.name = name

    def forward(self, x):
        x = x * 2
        return x

    def extra_repr(self):
        return f'name={self.name}'

class Layer_AB(nn.Module):
    def __init__(self):
        super().__init__()

        self.a = Function_A('func_A')
        self.b = Function_B()

    def forward(self, x):
        x = self.a(x) / 5
        x = self.b(x)
        return x

# Model(
#   (ab): Layer_AB(
#     (a): Function_A(name=func_A)
#     (b): Function_B()
#   )
#   (cd): Layer_CD(
#     (c): Function_C()
#     (d): Function_D()
#   )
# )

Useful features

Let's look at some useful features that nn.Module provides.

hook

A hook is an interface that the authors of packaged code leave in place so that other programmers can run custom code in the middle of it.
In other words, hooks are used to analyze a program's execution logic or to add functionality to it. PyTorch provides hooks, and so do the MM family of libraries such as MMDetection and MMSegmentation for object detection and segmentation.
The code below shows how a hook works.

def program_A(x):
    print('program A processing!')
    return x + 3

def program_B(x):
    print('program B processing!')
    return x - 3

class Package(object):
    """
    Package code that bundles programs A and B
    """
    def __init__(self):
        self.programs = [program_A, program_B]
        self.hooks = []

    def __call__(self, x):
        for program in self.programs:
            x = program(x)
            # hook interface prepared in advance so that users of Package can register their own custom programs
            if self.hooks:
                for hook in self.hooks:
                    output = hook(x)
                    # update x only when the hook returns a value (a returned 0 still counts)
                    if output is not None:
                        x = output
        return x

package = Package()
input = 3
output = package(input)
# program A processing!
# program B processing!
print(f"Package Process Result! [ input {input} ] [ output {output} ]")
# Package Process Result! [ input 3 ] [ output 3 ]

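Package is just an example; it holds a variable called self.hooks. When Package is called, it runs each program in self.programs in turn and, after each one, checks whether any functions are registered in self.hooks. Registered functions are executed; if none are registered, the step is skipped.
This is an interface that the developers of Package prepared in advance so that its users can run their own custom code in the middle of Package.
It follows that many other packages may expose a similar interface, called a hook, for adding and running custom code inside the package.
In fact, hooks are used in CAM (Class Activation Mapping), a technique for visualizing CNN feature maps. The example below shows how a hook is used.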

# Hook - example: analyzing the program's execution logic
def hook_analysis(x):
    print(f'hook for analysis, current value is {x}')

# Add the hook to the package created above
package.hooks = []
package.hooks.append(hook_analysis)

input = 3
output = package(input)
# program A processing!
# hook for analysis, current value is 6
# program B processing!
# hook for analysis, current value is 3
print(f"Package Process Result! [ input {input} ] [ output {output} ]")
# Package Process Result! [ input 3 ] [ output 3 ]

The output shows that the hook runs alongside the programs in self.programs. A hook can also perform its own computation and change the result.

# Hook - example: adding functionality to the program
def hook_multiply(x):
    print('hook for multiply')
    return x * 3

# Add the hook to the package created above
package.hooks = []
package.hooks.append(hook_multiply)

input = 3
output = package(input)
# program A processing!
# hook for multiply
# program B processing!
# hook for multiply
print(f"Package Process Result! [ input {input} ] [ output {output} ]")
# Package Process Result! [ input 3 ] [ output 45 ]

Naturally, several hooks can be applied at once.

package.hooks = []
package.hooks.append(hook_multiply)
package.hooks.append(hook_analysis)

input = 3
output = package(input)
# program A processing!
# hook for multiply
# hook for analysis, current value is 18
# program B processing!
# hook for multiply
# hook for analysis, current value is 45
print(f"Package Process Result! [ input {input} ] [ output {output} ]")
# Package Process Result! [ input 3 ] [ output 45 ]

In the examples above, the custom hook could only run after a program had executed. If Package is designed with hooks both before and after each program, as below, custom functions can run before and after execution.

class Package(object):
    """
    Package code that bundles programs A and B
    """
    def __init__(self):
        self.programs = [program_A, program_B]
        # hooks
        self.pre_hooks = []
        self.hooks = []

    def __call__(self, x):
        for program in self.programs:
            # pre_hook: runs before the program
            if self.pre_hooks:
                for hook in self.pre_hooks:
                    output = hook(x)
                    if output is not None:
                        x = output
            x = program(x)
            # hook: runs after the program
            if self.hooks:
                for hook in self.hooks:
                    output = hook(x)
                    if output is not None:
                        x = output
        return x

Where to plant hooks is up to the designer. The examples above expose the hook interface in Package's __call__, but it could also be built inside each program. Each program could then use different hooks, which makes customization even more flexible.
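A minimal sketch of the per-program variant described above. The Program wrapper class and the pipeline function are hypothetical names introduced only for this illustration; they are not part of the earlier Package code.

```python
class Program:
    """Hypothetical wrapper: each program owns its own pre_hooks and hooks."""
    def __init__(self, fn):
        self.fn = fn
        self.pre_hooks = []
        self.hooks = []

    def __call__(self, x):
        for hook in self.pre_hooks:
            out = hook(x)
            if out is not None:  # only replace x when the hook returns a value
                x = out
        x = self.fn(x)
        for hook in self.hooks:
            out = hook(x)
            if out is not None:
                x = out
        return x

add_three = Program(lambda x: x + 3)
sub_three = Program(lambda x: x - 3)

# Different hooks for different programs
add_three.hooks.append(lambda x: x * 3)   # modifies the value after add_three
trace = []
sub_three.pre_hooks.append(trace.append)  # list.append returns None, so x is unchanged

def pipeline(x):
    for program in (add_three, sub_three):
        x = program(x)
    return x

result = pipeline(3)
print(result, trace)
```

Here the multiply hook affects only add_three, while sub_three merely records the value it receives.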

Pytorch hook

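So what hooks does PyTorch have? There are two broad kinds: hooks applied to a Tensor and hooks applied to a Module.
Hooks are added with register_* methods, much like buffers earlier; for a Tensor the method is register_hook().
Only a backward hook can be registered on a Tensor. The hook is called every time a gradient with respect to that Tensor is computed. See the example below.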

v = torch.tensor([0., 0., 0.], requires_grad=True)
h = v.register_hook(lambda grad: grad * 2)  # double the gradient
v.backward(torch.tensor([1., 2., 3.]))
v.grad # tensor([2., 4., 6.])

h.remove()  # removes the hook

Calling register_hook(hook) returns a handle, an instance of torch.utils.hooks.RemovableHandle. As above, the handle can be used to remove a hook registered on a tensor or module.
To check which hooks are registered on a tensor, you can inspect _backward_hooks.

tensor = torch.rand(1, requires_grad=True)

def tensor_hook(grad):
    pass

h = tensor.register_hook(tensor_hook)

print(tensor._backward_hooks) # OrderedDict([(22, <function tensor_hook at 0x11f82fac0>)])
h.remove() # removes the hook
print(tensor._backward_hooks) # OrderedDict()
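The effect of removing a hook through its handle can also be observed on the gradients themselves. The sketch below uses only public API; the tensor values are arbitrary.

```python
import torch

v = torch.tensor([1., 2., 3.], requires_grad=True)
handle = v.register_hook(lambda grad: grad * 2)  # double the gradient

v.sum().backward()
with_hook = v.grad.clone()      # gradient of sum() is all ones, doubled by the hook

handle.remove()                 # the hook no longer runs
v.grad = None                   # reset the accumulated gradient
v.sum().backward()
without_hook = v.grad.clone()

print(with_hook, without_hook)
```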

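All hooks registered on an nn.Module can be inspected at once through __dict__. __dict__ holds the important state of a module, including all of its variables, parameters, and hooks. Since this is where a module stores its information, keep it in mind and use it where appropriate.
The hook-related methods of nn.Module are listed in the official documentation.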

class Model(nn.Module):
    def __init__(self):
        super().__init__()

def module_hook(grad):
    pass

model = Model()
model.register_forward_pre_hook(module_hook)
model.register_forward_hook(module_hook)
model.register_full_backward_hook(module_hook)

model.__dict__
# {'training': True,
#  '_parameters': {},
#  '_buffers': {},
#  '_non_persistent_buffers_set': set(),
#  '_backward_pre_hooks': OrderedDict(),
#  '_backward_hooks': OrderedDict([(25, <function __main__.module_hook(grad)>)]),
#  '_is_full_backward_hook': True,
#  '_forward_hooks': OrderedDict([(24, <function __main__.module_hook(grad)>)]),
#  '_forward_hooks_with_kwargs': OrderedDict(),
#  '_forward_hooks_always_called': OrderedDict(),
#  '_forward_pre_hooks': OrderedDict([(23,
#                <function __main__.module_hook(grad)>)]),
#  '_forward_pre_hooks_with_kwargs': OrderedDict(),
#  '_state_dict_hooks': OrderedDict(),
#  '_state_dict_pre_hooks': OrderedDict(),
#  '_load_state_dict_pre_hooks': OrderedDict(),
#  '_load_state_dict_post_hooks': OrderedDict(),
#  '_modules': {}}

sorted([i for i in list(nn.Module.__dict__) if i.endswith("hook")])
# ['_maybe_warn_non_full_backward_hook',
#  '_register_load_state_dict_pre_hook',
#  '_register_state_dict_hook',
#  'register_backward_hook',
#  'register_forward_hook',
#  'register_forward_pre_hook',
#  'register_full_backward_hook',
#  'register_full_backward_pre_hook',
#  'register_load_state_dict_post_hook',
#  'register_state_dict_pre_hook']

From the __dict__ output we can identify the following kinds of hooks.
- forward_pre_hook
- forward_hook
- full_backward_pre_hook
- full_backward_hook
- backward_hook : register_backward_hook is deprecated in favor of register_full_backward_hook.
- state_dict_hook : used internally
The names indicate when each hook is called, mainly during forward and during backward. Forward has a pre_hook and a hook, and backward likewise has full_backward_pre_hook and full_backward_hook.
state_dict also has hooks, but you will rarely use them directly; they are reportedly used internally when a state dict is saved or loaded (state_dict() and load_state_dict()). The thread below discusses this.
- Invoking Time of nn.Module _register_state_dict_hook() - PyTorch Forum
Like the Package example above, nn.Module checks for registered hooks every time a module is called and runs them if present. Keep these notes at hand so that hooks can be applied in the right place when needed.
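The calling order of the module hooks listed above can be observed with a small sketch. It assumes a PyTorch version that provides register_full_backward_pre_hook (available in recent releases); the nn.Linear layer and input shape are arbitrary.

```python
import torch
from torch import nn

calls = []
layer = nn.Linear(2, 1)

# list.append returns None, so none of these hooks modify any value
layer.register_forward_pre_hook(lambda module, args: calls.append("forward_pre"))
layer.register_forward_hook(lambda module, args, output: calls.append("forward"))
layer.register_full_backward_pre_hook(lambda module, grad_output: calls.append("backward_pre"))
layer.register_full_backward_hook(lambda module, grad_input, grad_output: calls.append("backward"))

x = torch.rand(3, 2, requires_grad=True)
layer(x).sum().backward()

print(calls)
```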

forward hook

Forward hooks cannot be applied to a tensor, only to a module. There are two kinds: forward_pre_hook, which runs before forward, and forward_hook, which runs after forward.

class Add(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x1, x2):
        output = torch.add(x1, x2)
        return output

add = Add()
answer = []

def pre_hook(module, input):
    answer.extend(input)

def hook(module, input, output):
    answer.extend(output)

add.register_forward_pre_hook(pre_hook)
add.register_forward_hook(hook)

x1 = torch.rand(1)
x2 = torch.rand(1)
output = add(x1, x2)

print(answer) # [tensor([0.6022]), tensor([0.1822]), tensor(0.7844)]
print(answer == [x1, x2, output]) # True

As in the example above, hooks can store the values propagated through a model. Modifying a propagated value is just as simple, as shown below.

add = Add()

def hook(module, input, output):
    return output + 5

add.register_forward_hook(hook)

x1 = torch.rand(1)
x2 = torch.rand(1)

output = add(x1, x2)
print(output) # tensor([6.1229])
print(output == x1 + x2 + 5) # tensor([True])

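In addition, hook functions for forward_pre_hook and forward_hook must follow the signatures below. The variants that take kwargs apply when the hook is registered with with_kwargs=True.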

# forward_pre_hook
hook(module, args) -> None or modified input
hook(module, args, kwargs) -> None or a tuple of modified input and kwargs

# forward_hook
hook(module, args, output) -> None or modified output
hook(module, args, kwargs, output) -> None or modified output
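A short sketch of both forward_pre_hook signatures. The Scale module and its factor keyword are hypothetical, and the with_kwargs=True argument assumes a recent PyTorch release.

```python
import torch
from torch import nn

class Scale(nn.Module):
    """Hypothetical module with one positional and one keyword argument."""
    def forward(self, x, factor=1.0):
        return x * factor

def pre_hook(module, args):
    # positional-only signature: return the modified input as a tuple
    (x,) = args
    return (x + 1,)

def pre_hook_kwargs(module, args, kwargs):
    # kwargs signature: return a tuple of (args, kwargs)
    new_kwargs = dict(kwargs, factor=kwargs.get("factor", 1.0) * 2)
    return args, new_kwargs

scale = Scale()
scale.register_forward_pre_hook(pre_hook)
scale.register_forward_pre_hook(pre_hook_kwargs, with_kwargs=True)

out = scale(torch.tensor([1.0]), factor=3.0)  # (1 + 1) * (3 * 2)
print(out)
```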

backward hook

Forward hooks can only be applied to a module, but backward hooks can be applied to both a Tensor and a module.
Backward hooks are typically used to apply extra operations to gradient values. However, a module-level backward hook only receives the gradients with respect to the module's inputs and outputs, so it cannot reveal the gradients of tensors inside the module.
So to get the gradient of a model Parameter $W$, use a tensor-level backward hook rather than a module-level one.
This way you can obtain the gradient of any tensor in the model and also apply operations to it.

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.W = Parameter(torch.Tensor([5]))

    def forward(self, x1, x2):
        output = x1 * x2
        output = output * self.W
        return output

answer = []
model = Model()

def module_hook(module, grad_input, grad_output):
    answer.extend(grad_input)
    answer.extend(grad_output)

model.register_full_backward_hook(module_hook)

x1 = torch.rand(1, requires_grad=True)
x2 = torch.rand(1, requires_grad=True)

output = model(x1, x2)
output.retain_grad() # Enables this Tensor to have their grad populated during backward()
output.backward()
answer == [x1.grad, x2.grad, output.grad] # True

PyTorch stores grad only for leaf tensors of the computational graph (typically input tensors or parameters created with requires_grad=True). Here output is an intermediate result, so its grad attribute is not stored by default, and retain_grad() is needed.
That is, because output is a non-leaf tensor, output.grad stays None after output.backward() by default. Calling retain_grad() tells PyTorch to store the gradient for output as well.
After the backward pass, output.grad then holds the gradient with respect to the output tensor.
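The difference between a leaf tensor, a plain non-leaf tensor, and a non-leaf tensor with retain_grad() can be seen in a small sketch. The variable names plain and kept are chosen only for this illustration.

```python
import torch

x = torch.tensor([2.0], requires_grad=True)  # leaf tensor

plain = x * 3          # non-leaf: gradient is not stored
kept = x * 3           # non-leaf: gradient stored after retain_grad()
kept.retain_grad()

(plain + kept).sum().backward()

print(x.is_leaf, plain.is_leaf, plain.retains_grad, kept.retains_grad)
print(x.grad, kept.grad)  # x receives 3 + 3, kept receives 1
```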
Now let's apply a tensor-level backward hook to a parameter to get the gradient of a tensor inside the module.

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.W = Parameter(torch.Tensor([5]))

    def forward(self, x1, x2):
        output = x1 * x2
        output = output * self.W
        return output

answer = []
model = Model()

def tensor_hook(grad):
    answer.extend(grad)

model.W.register_hook(tensor_hook)

x1 = torch.rand(1, requires_grad=True)
x2 = torch.rand(1, requires_grad=True)

output = model(x1, x2)
output.backward()
answer == [model.W.grad] # True

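Finally, let's apply an operation to the gradients during back propagation.
One caveat: in a backward hook registered on a Module, modifying the hook's grad_input or grad_output arguments in place is not allowed and raises an error; return new tensors instead. See the official documentation for details.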

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.W = Parameter(torch.Tensor([5]))

    def forward(self, x1, x2):
        output = x1 * x2
        output = output * self.W
        return output

model = Model()

def module_hook(module, grad_input, grad_output):
    x1_grad, x2_grad = grad_input
    total_grad = x1_grad + x2_grad

    return x1_grad/total_grad, x2_grad/total_grad

model.register_full_backward_hook(module_hook)

x1 = torch.rand(1, requires_grad=True)
x2 = torch.rand(1, requires_grad=True)

output = model(x1, x2)
output.backward()
x1.grad + x2.grad == 1 # True

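In module_hook above, grad_input is (x1_grad, x2_grad), the derivatives of output with respect to each input. grad_output is the gradient flowing into the module's output; since output.backward() is called on a single-element output without an explicit gradient, it defaults to 1.0.
Lastly, backward hooks on a tensor and on a Module must follow the signatures below.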

# Tensor.register_hook (backward hook)
hook(grad) -> Tensor or None

# Module full_backward_pre_hook
hook(module, grad_output) -> tuple[Tensor] or None

# Module full_backward_hook
hook(module, grad_input, grad_output) -> tuple[Tensor] or None

Hook usage examples

With hooks you can now try things such as visualizing how gradient values change, raising a gradient-exploding warning when a gradient exceeds a threshold, or applying gradient clipping to a specific tensor only when its gradient becomes too large or too small.
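As one possible sketch of the threshold warning idea, the helper below registers a tensor hook on every parameter and records the ones whose gradient norm exceeds a threshold. The function name watch_gradients, the threshold, and the input values are assumptions made for illustration.

```python
import torch
from torch import nn

def watch_gradients(model: nn.Module, threshold: float):
    """Hypothetical helper: collect (name, norm) for gradients above threshold."""
    alerts = []

    def make_hook(name):
        def hook(grad):
            norm = grad.norm().item()
            if norm > threshold:
                alerts.append((name, norm))
            # returning None leaves the gradient unchanged
        return hook

    handles = [p.register_hook(make_hook(n)) for n, p in model.named_parameters()]
    return alerts, handles

model = nn.Linear(2, 1)
with torch.no_grad():
    model.weight.fill_(1.0)
    model.bias.zero_()

alerts, handles = watch_gradients(model, threshold=10.0)

x = torch.tensor([[100.0, 0.0]])  # large input -> large weight gradient
model(x).sum().backward()

print(alerts)
for handle in handles:
    handle.remove()
```

Only the weight is reported here, because its gradient equals the input while the bias gradient is 1.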
The links below explain hook use cases well.
- How to Use PyTorch Hooks - Medium
- PyTorch 101, Part 5: Understanding Hooks - Paperspace blog
First, hooks can be used for verbose model execution. We often scatter print() calls through the code to track down the cause of an error; with hooks the code stays cleaner and more readable.
The biggest advantage is that this works not only for models you design yourself but also for existing PyTorch modules.

import torch
from torch import nn, Tensor
from torchvision.models import resnet50

class VerboseExecution(nn.Module):
    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

        # Register a hook for each layer
        for name, layer in self.model.named_children():
            layer.__name__ = name
            layer.register_forward_hook(
                lambda layer, _, output: print(f"{layer.__name__}: {output.shape}")
            )

    def forward(self, x: Tensor) -> Tensor:
        return self.model(x)

verbose_resnet = VerboseExecution(resnet50())
dummy_input = torch.ones(10, 3, 224, 224)

_ = verbose_resnet(dummy_input)
# conv1: torch.Size([10, 64, 112, 112])
# bn1: torch.Size([10, 64, 112, 112])
# relu: torch.Size([10, 64, 112, 112])
# maxpool: torch.Size([10, 64, 56, 56])
# layer1: torch.Size([10, 256, 56, 56])
# layer2: torch.Size([10, 512, 28, 28])
# layer3: torch.Size([10, 1024, 14, 14])
# layer4: torch.Size([10, 2048, 7, 7])
# avgpool: torch.Size([10, 2048, 1, 1])
# fc: torch.Size([10, 1000])
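Hooks registered this way stay on the model until they are removed. A variation that records shapes for a single forward pass and then removes its hooks could look like the sketch below; trace_shapes is a hypothetical helper, and a small nn.Sequential stands in for resnet50 to keep the example light.

```python
import torch
from torch import nn

def trace_shapes(model: nn.Module, x: torch.Tensor):
    """Hypothetical helper: record (name, output shape) per child, then clean up."""
    records, handles = [], []
    for name, layer in model.named_children():
        # name=name binds the current loop value to this hook
        def hook(_module, _args, output, name=name):
            records.append((name, tuple(output.shape)))
        handles.append(layer.register_forward_hook(hook))
    try:
        with torch.no_grad():
            model(x)
    finally:
        for handle in handles:
            handle.remove()  # leave the model without leftover hooks
    return records

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
records = trace_shapes(model, torch.ones(3, 4))
print(records)
```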

Next, hooks can be used for feature extraction. A pre-trained model is commonly used as the backbone, and hooks make it possible to extract the feature maps the backbone produces. These can then be used for CAM or other visualizations.

from typing import Dict, Iterable, Callable

class FeatureExtractor(nn.Module):
    def __init__(self, model: nn.Module, layers: Iterable[str]):
        super().__init__()
        self.model = model
        self.layers = layers
        self._features = {layer: torch.empty(0) for layer in layers}

        for layer_id in layers:
            layer = dict([*self.model.named_modules()])[layer_id]
            layer.register_forward_hook(self.save_outputs_hook(layer_id))

    def save_outputs_hook(self, layer_id: str) -> Callable:
        def fn(_, __, output):
            self._features[layer_id] = output
        return fn

    def forward(self, x: Tensor) -> Dict[str, Tensor]:
        _ = self.model(x)
        return self._features

resnet_features = FeatureExtractor(resnet50(), layers=["layer4", "avgpool"])
features = resnet_features(dummy_input)

print({name: output.shape for name, output in features.items()})
# {'layer4': torch.Size([10, 2048, 7, 7]), 'avgpool': torch.Size([10, 2048, 1, 1])}

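The dict([*self.model.named_modules()])[layer_id] technique in the code above is worth remembering. The * unpacks the generator into a list of (name, module) pairs, and dict() turns it into a mapping so that a module can be looked up by its layer name.
The hook then stores each feature map in the _features variable.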
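The lookup by name can be tried on a small nested model. The sketch also uses nn.Module.get_submodule, which offers a similar lookup in recent PyTorch releases; the nested nn.Sequential is an arbitrary stand-in for a real backbone.

```python
from torch import nn

model = nn.Sequential(
    nn.Sequential(nn.Linear(2, 2), nn.ReLU()),
    nn.Linear(2, 1),
)

# named_modules() yields (name, module) pairs, starting with the model itself ("")
modules = dict(model.named_modules())
names = list(modules)

first_linear = modules["0.0"]                 # dotted names address nested modules
same_module = model.get_submodule("0.0")      # similar lookup without building a dict

print(names)
print(first_linear is same_module)
```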
Finally, hooks enable gradient clipping, a way of mitigating exploding gradients. PyTorch provides several utilities for gradient clipping, but it can also be done with hooks. See the example below.

def gradient_clipper(model: nn.Module, val: float) -> nn.Module:
    for parameter in model.parameters():
        parameter.register_hook(lambda grad: grad.clamp(-val, val))  # return a new tensor instead of modifying grad in place
    
    return model

clipped_resnet = gradient_clipper(resnet50(), 0.01)
pred = clipped_resnet(dummy_input)
loss = pred.log().mean()
loss.backward()

print(clipped_resnet.fc.bias.grad[:25])
# tensor([-0.0010, -0.0047, -0.0010, -0.0009, -0.0015,  0.0027,  0.0017, -0.0023,
#          0.0051, -0.0007, -0.0057, -0.0010, -0.0039, -0.0100, -0.0018,  0.0062,
#          0.0034, -0.0010,  0.0052,  0.0021,  0.0010,  0.0017, -0.0100,  0.0021,
#          0.0020])

This hook is registered on the parameter tensors, so it runs while loss.backward() computes each parameter's gradient; the stored gradients are clipped to the range [-0.01, 0.01].
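For comparison, the sketch below checks a hook-based clip against torch.nn.utils.clip_grad_value_, one of the built-in utilities, on a tiny model. The layer size, input values, and clip value 0.5 are arbitrary choices for illustration.

```python
import copy
import torch
from torch import nn

hooked = nn.Linear(3, 1)
builtin = copy.deepcopy(hooked)  # same weights, copied before any hook is registered

# Hook-based clipping: applied while backward() computes each gradient
for parameter in hooked.parameters():
    parameter.register_hook(lambda grad: grad.clamp(-0.5, 0.5))

x = torch.tensor([[1.0, -2.0, 0.1]])
hooked(x).sum().backward()

# Built-in clipping: applied to the stored gradients after backward()
builtin(x).sum().backward()
torch.nn.utils.clip_grad_value_(builtin.parameters(), 0.5)

print(hooked.weight.grad, builtin.weight.grad)
```

The difference is mainly one of timing: the hook clips during the backward pass, while the utility clips .grad afterwards, typically just before the optimizer step.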

apply

So far we have seen that a PyTorch nn.Module can contain other modules and can itself be contained in another module.
When one module holds all the others, this large collection of modules is called a model. A model can therefore be viewed as a tree (or graph) of many interconnected modules.
Applying something to a model should affect every module in it, not just the top-level one, and most nn.Module methods support this internally.
For example, calling .cpu() on the top-level module applies .cpu() to every module beneath it.
What if you want to apply a custom function rather than a method already implemented in nn.Module? You do not have to apply the function to each module by hand. This is what apply is for.

@torch.no_grad()
def init_weights(m):
    if type(m) == nn.Linear:
        m.weight.fill_(1.0) # inplace

net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
net.apply(init_weights)

print(net)
# Sequential(
#   (0): Linear(in_features=2, out_features=2, bias=True)
#   (1): Linear(in_features=2, out_features=2, bias=True)
# )
for name, c in net.named_children():
    print(name)
    print(c.weight)
# 0
# Parameter containing:
# tensor([[1., 1.],
#         [1., 1.]], requires_grad=True)
# 1
# Parameter containing:
# tensor([[1., 1.],
#         [1., 1.]], requires_grad=True)

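As the example shows, the function passed to apply takes a module as input. It receives each module of the model in turn and processes it.
apply is commonly used for weight initialization, that is, setting the tensors registered as Parameters to desired values.
In the example above, @torch.no_grad() is a decorator that keeps autograd from tracking init_weights while it modifies the model's weights.
- By default, autograd tracks operations on parameters with requires_grad=True to prepare for back propagation, and an in-place operation such as fill_() on such a leaf tensor raises a RuntimeError outside no_grad.
- Weight initialization, however, happens at model setup rather than during training, so no graph for back propagation is needed.
- Wrapping the function in @torch.no_grad() therefore allows the in-place update and avoids extra memory use and unnecessary graph construction.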
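The points above can be illustrated with a minimal sketch: an in-place fill on a parameter fails while autograd is tracking and succeeds under torch.no_grad(). The parameter w is an arbitrary example.

```python
import torch
from torch import nn

w = nn.Parameter(torch.zeros(2))

try:
    w.fill_(1.0)        # in-place op on a leaf tensor that requires grad
    raised = False
except RuntimeError:
    raised = True       # autograd rejects the in-place update

with torch.no_grad():
    w.fill_(1.0)        # allowed: the operation is not tracked

print(raised, w)
```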
Let's use apply on the earlier Function, Layer, Model example.

def print_module(module):
    print(module)
    print("-" * 30)

returned_module = model.apply(print_module) # apply returns the module it was called on (self)
# Function_A()
# ------------------------------
# Function_B()
# ------------------------------
# Layer_AB(
#   (a): Function_A()
#   (b): Function_B()
# )
# ------------------------------
# Function_C()
# ------------------------------
# Function_D()
# ------------------------------
# Layer_CD(
#   (c): Function_C()
#   (d): Function_D()
# )
# ------------------------------
# Model(
#   (ab): Layer_AB(
#     (a): Function_A()
#     (b): Function_B()
#   )
#   (cd): Layer_CD(
#     (c): Function_C()
#     (d): Function_D()
#   )
# )
# ------------------------------

As the output shows, apply visits modules in postorder: the function is applied to all children first and then to the module itself.
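The traversal order can be confirmed on a small nested model by recording class names. The sketch contrasts apply with modules(), which visits the parent first; the nn.Sequential layout is arbitrary.

```python
from torch import nn

model = nn.Sequential(
    nn.Sequential(nn.Linear(1, 1), nn.ReLU()),
    nn.Tanh(),
)

apply_order = []
returned = model.apply(lambda m: apply_order.append(type(m).__name__))

# modules() walks the same tree but yields each parent before its children
modules_order = [type(m).__name__ for m in model.modules()]

print(apply_order)
print(modules_order)
```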
With this we can initialize weights or modify the model's __repr__ output.

model = Model()

# weight initialization
def weight_initialization(module):
    module_name = module.__class__.__name__

    if module_name.split('_')[0] == "Function":
        module.W.data.fill_(1.)

returned_module = model.apply(weight_initialization)

x = torch.rand(1)
output = model(x)
torch.isclose(output, x) # tensor([True])

# repr modification
from functools import partial

model = Model()

def function_repr(self):
    return f'name={self.name}'

def add_repr(module):
    module_name = module.__class__.__name__
    if module_name.split('_')[0] == "Function":
        module.extra_repr = partial(function_repr, module)

returned_module = model.apply(add_repr)
model_repr = repr(model)

In the example above, partial(func, *args, **kwargs) returns a function that behaves like func but always has the given arguments pre-filled.
- See the linked blog post for a more detailed explanation.
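A brief sketch of what partial does, first on a plain function and then in the same role as above, binding a module instance as the first argument of an extra_repr replacement. The power and describe functions and the name "skip" are hypothetical.

```python
from functools import partial
from torch import nn

def power(base, exponent):
    return base ** exponent

square = partial(power, exponent=2)  # exponent is fixed to 2
squared = square(5)

def describe(self):
    return f'name={self.name}'

module = nn.Identity()
module.name = "skip"
# Pre-fill `self` so that module.extra_repr() can be called with no arguments
module.extra_repr = partial(describe, module)

module_repr = repr(module)
print(squared, module_repr)
```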
Finally, using apply and hooks, the model below can be made to perform linear transformations.

# Function
class Function_A(nn.Module):
    def __init__(self, name):
        super().__init__()
        self.name = name
        self.W = Parameter(torch.rand(2, 2))

    def forward(self, x):
        return x + self.W

class Function_B(nn.Module):
    def __init__(self, name):
        super().__init__()
        self.name = name
        self.W = Parameter(torch.rand(2, 2))

    def forward(self, x):
        return x - self.W

class Function_C(nn.Module):
    def __init__(self, name):
        super().__init__()
        self.name = name
        self.W = Parameter(torch.rand(2, 2))

    def forward(self, x):
        return x * self.W

class Function_D(nn.Module):
    def __init__(self, name):
        super().__init__()
        self.name = name
        self.W = Parameter(torch.rand(2, 2))

    def forward(self, x):
        return x / self.W

# Layer
class Layer_AB(nn.Module):
    def __init__(self):
        super().__init__()
        self.a = Function_A('plus')
        self.b = Function_B('subtract')

    def forward(self, x):
        x = self.a(x)
        x = self.b(x)
        return x

class Layer_CD(nn.Module):
    def __init__(self):
        super().__init__()
        self.c = Function_C('multiply')
        self.d = Function_D('divide')

    def forward(self, x):
        x = self.c(x)
        x = self.d(x)
        return x

# Model
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.ab = Layer_AB()
        self.cd = Layer_CD()

    def forward(self, x):
        x = self.ab(x)
        x = self.cd(x)
        return x

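To make this model perform a linear transformation at the Function level, we need to add a bias, and in forward undo the existing operation and then apply the linear transformation.
PyTorch's nn.Linear does this trivially, of course, but the exercise itself helps a great deal in understanding model flow, model design, and model modification.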

model = Model()

def add_bias(module):
    """Add a bias parameter."""
    module_name = module.__class__.__name__

    if module_name.split('_')[0] == "Function":
        module.register_parameter('b', nn.Parameter(torch.rand(2)))

def weight_initialization(module):
    """Initialize the weight and bias."""
    module_name = module.__class__.__name__
    
    if module_name.split('_')[0] == "Function":
        module.W.data.fill_(1.)
        module.b.data.fill_(1.)

def linear_transformation(module):
    module_name = module.__class__.__name__

    if module_name == "Function_A":
        def hook_A(module, input, output):
            W, b = module.W, module.b
            output = output - W # undo the original operation
            output = torch.addmm(b, output, W.T) # apply the linear transformation
            return output

        module.register_forward_hook(hook_A) # hook that runs after forward

    elif module_name == "Function_B":
        def hook_B(module, input, output):
            W, b = module.W, module.b
            output = output + W
            output = torch.addmm(b, output, W.T)
            return output

        module.register_forward_hook(hook_B)

    elif module_name == "Function_C":
        def hook_C(module, input, output):
            W, b = module.W, module.b
            output = output / W
            output = torch.addmm(b, output, W.T)
            return output

        module.register_forward_hook(hook_C)

    elif module_name == "Function_D":
        def hook_D(module, input, output):
            W, b = module.W, module.b
            output = output * W
            output = torch.addmm(b, output, W.T)
            return output

        module.register_forward_hook(hook_D)

returned_module = model.apply(add_bias)
returned_module = model.apply(weight_initialization)
returned_module = model.apply(linear_transformation)

In this way, apply lets you add whatever behavior you want to the modules you choose. Used well, it is a great help when modifying the pre-trained models you will use often.
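To check that the hooked modules really compute x W^T + b, a reduced sketch can compare the result with torch.nn.functional.linear. The Plus class is a hypothetical stand-in for Function_A, and the to_linear function merges the bias and hook steps into one for brevity.

```python
import torch
from torch import nn
import torch.nn.functional as F

class Plus(nn.Module):
    """Hypothetical stand-in for Function_A: forward adds W to the input."""
    def __init__(self):
        super().__init__()
        self.W = nn.Parameter(torch.rand(2, 2))

    def forward(self, x):
        return x + self.W

def to_linear(module):
    if isinstance(module, Plus):
        module.register_parameter('b', nn.Parameter(torch.rand(2)))

        def hook(mod, args, output):
            x = output - mod.W                     # undo the original operation
            return torch.addmm(mod.b, x, mod.W.T)  # b + x @ W.T

        module.register_forward_hook(hook)

model = nn.Sequential(Plus(), Plus()).apply(to_linear)

x = torch.rand(2, 2)
with torch.no_grad():
    actual = model(x)
    hidden = F.linear(x, model[0].W, model[0].b)
    expected = F.linear(hidden, model[1].W, model[1].b)

matches = torch.allclose(actual, expected, atol=1e-6)
print(matches)
```

A tolerance is used in the comparison because adding and then subtracting W can introduce small floating point differences.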

License and Citation

Outside of study, you will rarely design a custom model from scratch. You will use a pre-trained model or find a model on GitHub, and in both cases you need to pay attention to License and Citation.
When you find a model on GitHub, check the license first. Checking whether the author allows free use or restricts reuse of the code prevents legal problems later.
Open source licenses are described at the links below.
- Choosing the right license - GitHub Docs
- Choose an open source license - Choose a License
The license is usually shown on the right side of the GitHub repo. A model with no license shown is by no means free to use. See the link below.
- No License - Choose a License
Once you have found a reference model on GitHub, you bring in its code via pip install, copy and paste, or git clone.
People then usually just leave a link to the reference, or skip attribution altogether. But accurate citation is a courtesy owed to the original author, so let's look at how to cite properly.
Well-known official GitHub projects such as PyTorch and Hugging Face usually provide a citation. Individual GitHub pages that host a reference model, however, often do not. In that case, citing the repository that contains the model can stand in for citing the model.
Below are examples of GitHub repository citations.
- PyTorch/functorch - GitHub
- Huggingface/Transformers/citation - GitHub
As these examples show, most GitHub citations use the BibTeX format, which can be converted to plain text at the link below.
- BibTeX Online Converter
When no citation is provided, writing one yourself is tricky: citation styles vary and conforming to them is tedious, so citations are often done carelessly.
Still, knowing how to cite properly is essential. See the articles below.
Writing a citation by hand
- How to Cite a GitHub Repository - wikiHow
Auto-generated citations
- Free Harvard Citation Generator - Cite This For Me

Reference

Naver Boostcamp AI Tech

Bkkhyunn