

A page for understanding AI models through Caffe

Setup: initialize the layer and its connections once at model initialization.

Forward: given input from bottom, compute the output and send it to the top.

Backward: given the gradient w.r.t. the top output, compute the gradient w.r.t. the input and send it to the bottom. A layer with parameters computes the gradient w.r.t. its parameters and stores it internally.

More specifically, there will be two Forward and Backward functions implemented, one for CPU and one for GPU. If you do not implement a GPU version, the layer will fall back to the CPU functions as a backup option.

specifically : clearly, in particular

implement : to realize in code, to build

fall back : to retreat? -> here, to be substituted (used as a backup)

This may come in handy if you would like to do quick experiments, although it may come with an additional data transfer cost (its inputs will be copied from GPU to CPU, and its output will be copied back from CPU to GPU).

Layers have two key responsibilities for the operation of the network as a whole: a forward pass that takes the inputs and produces the outputs, and a backward pass that takes the gradient with respect to the output and computes the gradients with respect to the parameters and to the inputs, which are in turn back-propagated to earlier layers. These passes are simply the composition of each layer's forward and backward.
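As a purely illustrative sketch (not Caffe code), this composition can be pictured with a toy Python layer; the class name and numbers below are made up for illustration:

class ScaleLayer(object):
    """Toy layer y = w * x with one learnable parameter w."""
    def __init__(self, w):
        self.w = w
        self.x = None     # cache the input for the backward pass
        self.dw = 0.0     # gradient w.r.t. the parameter, stored internally

    def forward(self, x):
        self.x = x
        return self.w * x

    def backward(self, dtop):
        self.dw = dtop * self.x   # gradient w.r.t. the parameter
        return dtop * self.w      # gradient w.r.t. the input, passed to the bottom

layers = [ScaleLayer(2.0), ScaleLayer(3.0)]

# forward pass: compose every layer's forward
x = 1.5
for layer in layers:
    x = layer.forward(x)

# backward pass: compose every layer's backward in reverse order
grad = 1.0   # gradient of the loss w.r.t. the final output
for layer in reversed(layers):
    grad = layer.backward(grad)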

Developing custom layers requires minimal effort thanks to the compositionality of the network and the modularity of the code. Define the setup, forward, and backward for the layer and it is ready for inclusion in a net, as in the sketch below.
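For example, Caffe also provides a Python layer type. The following is a minimal sketch in the spirit of the Euclidean loss example shipped with Caffe, assuming Caffe was built with Python layer support (WITH_PYTHON_LAYER); the class name is illustrative and would be referenced from a prototxt layer of type "Python":

import caffe
import numpy as np

class EuclideanLossLayer(caffe.Layer):
    """Custom layer sketch following the setup / reshape / forward / backward contract."""

    def setup(self, bottom, top):
        # called once at model initialization
        if len(bottom) != 2:
            raise Exception("Need two bottom blobs to compute the loss.")

    def reshape(self, bottom, top):
        # buffer for the elementwise difference; the loss output is a scalar
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        top[0].reshape(1)

    def forward(self, bottom, top):
        # given input from the bottom, compute the output and send it to the top
        self.diff[...] = bottom[0].data - bottom[1].data
        top[0].data[...] = np.sum(self.diff ** 2) / bottom[0].num / 2.

    def backward(self, top, propagate_down, bottom):
        # given the gradient w.r.t. the top, compute the gradient w.r.t. each bottom
        for i in range(2):
            if not propagate_down[i]:
                continue
            sign = 1 if i == 0 else -1
            bottom[i].diff[...] = sign * self.diff / bottom[i].num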

Net definition and operation

The net jointly defines a function and its gradient by composition and auto-differentiation. The composition of every layer's output computes the function to do a given task, and the composition of every layer's backward computes the gradient from the loss to learn the task. Caffe models are end-to-end machine learning engines.

The net is a set of layers connected in a computation graph - a directed acyclic graph (DAG) to be exact. Caffe does all the bookkeeping for any DAG of layers to ensure correctness of the forward and backward passes. A typical net begins with a data layer that loads from disk and ends with a loss layer that computes the objective for a task such as classification or reconstruction.

acyclic : containing no cycles

The net is defined as a set of layers and their connections in a plaintext modeling language.

plaintext : plain, human-readable text

A simple logistic regression classifier is defined by:

name: "LogReg"
layer {
	name: "mnist"
    type: "Data"
    top: "data"
	top: "label"
    data_param {
    	source: "input_leveldb"
        batch_size: 64
	}
}
layer {
	name: "ip"
    type: "InnerProduct"
    bottom: "data"
    top: "ip"
    inner_product_param {
		num_output: 2
	}
}
layer {
	name: "loss"
	type: "SoftmaxWithLoss"
    bottom: "ip"
    bottom: "label"
    top: "loss"
}
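As a hedged sketch using the Python interface (pycaffe), the definition above could be saved to a file and exercised as follows; the file name logreg.prototxt and the presence of input_leveldb are assumptions for illustration:

import caffe

caffe.set_mode_cpu()
net = caffe.Net('logreg.prototxt', caffe.TRAIN)   # scaffolds the DAG (Net::Init) and sets up each layer

out = net.forward()                               # data -> ip -> loss
print('loss =', out['loss'])

net.backward()                                    # gradients from the loss back to the parameters
print('ip weight gradient shape:', net.params['ip'][0].diff.shape)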

Model initialization is handled by Net::Init(). The initialization mainly does two things: scaffolding the overall DAG by creating the blobs and layers (for C++ geeks: the network will retain ownership of the blobs and layers during its lifetime), and calling the layers' SetUp() function. It also does a set of other bookkeeping things, such as validating the correctness of the overall network architecture. Also, during initialization the Net explains its initialization by logging to INFO as it goes:

scaffolding : a builder's scaffold; here, putting up the framework of the network

ownership : possession; here, responsibility for the blobs' and layers' lifetime

 

I0902 22:52:17.931977 2079114000 net.cpp:39] Initializing net from parameters:
name: "LogReg"
[...model prototxt printout...]
# construct the network layer-by-layer
I0902 22:52:17.932152 2079114000 net.cpp:67] Creating Layer mnist
I0902 22:52:17.932165 2079114000 net.cpp:356] mnist -> data
I0902 22:52:17.932188 2079114000 net.cpp:356] mnist -> label
I0902 22:52:17.932200 2079114000 net.cpp:96] Setting up mnist
I0902 22:52:17.935807 2079114000 data_layer.cpp:135] Opening leveldb input_leveldb
I0902 22:52:17.937155 2079114000 data_layer.cpp:195] output data size: 64,1,28,28
I0902 22:52:17.938570 2079114000 net.cpp:103] Top shape: 64 1 28 28 (50176)
I0902 22:52:17.938593 2079114000 net.cpp:103] Top shape: 64 (64)
I0902 22:52:17.938611 2079114000 net.cpp:67] Creating Layer ip
I0902 22:52:17.938617 2079114000 net.cpp:394] ip <- data
I0902 22:52:17.939177 2079114000 net.cpp:356] ip -> ip
I0902 22:52:17.939196 2079114000 net.cpp:96] Setting up ip
I0902 22:52:17.940289 2079114000 net.cpp:103] Top shape: 64 2 (128)
I0902 22:52:17.941270 2079114000 net.cpp:67] Creating Layer loss
I0902 22:52:17.941305 2079114000 net.cpp:394] loss <- ip
I0902 22:52:17.941314 2079114000 net.cpp:394] loss <- label
I0902 22:52:17.941323 2079114000 net.cpp:356] loss -> loss
# set up the loss and configure the backward pass
I0902 22:52:17.941328 2079114000 net.cpp:96] Setting up loss
I0902 22:52:17.941328 2079114000 net.cpp:103] Top shape: (1)
I0902 22:52:17.941329 2079114000 net.cpp:109]     with loss weight 1
I0902 22:52:17.941779 2079114000 net.cpp:170] loss needs backward computation.
I0902 22:52:17.941787 2079114000 net.cpp:170] ip needs backward computation.
I0902 22:52:17.941794 2079114000 net.cpp:172] mnist does not need backward computation.
# determine outputs
I0902 22:52:17.941800 2079114000 net.cpp:208] This network produces output loss
# finish initialization and report memory usage
I0902 22:52:17.941810 2079114000 net.cpp:467] Collecting Learning Rate and Weight Decay.
I0902 22:52:17.941818 2079114000 net.cpp:219] Network initialization done.
I0902 22:52:17.941824 2079114000 net.cpp:220] Memory required for data: 201476

Note that the construction of the network is device agnostic - recall our earlier explanation that blobs and layers hide implementation details from the model definition. After construction, the network is run on either CPU or GPU by setting a single switch defined in Caffe::mode() and set by Caffe::set_mode(). Layers come with corresponding CPU and GPU routines that produce identical results (up to numerical errors, and with tests to guard it). The CPU/GPU switch is seamless and independent of the model definition. For research and deployment alike it is best to divide model and implementation.
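A small pycaffe sketch of that switch (the prototxt file name is again an assumption):

import caffe

caffe.set_mode_cpu()        # Caffe::set_mode(Caffe::CPU)
# caffe.set_device(0)       # or pick GPU 0 ...
# caffe.set_mode_gpu()      # ... and run the very same model on it

net = caffe.Net('logreg.prototxt', caffe.TEST)
net.forward()               # identical results on either device, up to numerical error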

seamless : smooth, without visible joins

deployment : putting a system into use (in production)

Model format

The models are defined in plaintext protocol buffer schema (prototxt) while the learned models are serialized as binary protocol buffer (binaryproto) .caffemodel files.

The model format is defined by the protobuf schema in caffe.proto. The source file is mostly self-explanatory so one is encouraged to check it out.

Caffe speaks Google Protocol Buffer for the following strengths: minimal-size binary strings when serialized, efficient serialization, human-readable text format compatible with the binary version, and efficient interface implementations in multiple languages, most notably C++ and Python. This all contributes to the flexibility and extensibility of modeling in Caffe.
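As a sketch of what this buys in practice, the same NetParameter message defined in caffe.proto can be parsed either from the human-readable prototxt or from serialized binary weights; the file names here are illustrative:

from caffe.proto import caffe_pb2
from google.protobuf import text_format

# human-readable text format: the model definition
net_param = caffe_pb2.NetParameter()
with open('logreg.prototxt') as f:
    text_format.Merge(f.read(), net_param)
print(net_param.name, [layer.name for layer in net_param.layer])

# binary format: the learned weights (.caffemodel)
weights = caffe_pb2.NetParameter()
with open('logreg.caffemodel', 'rb') as f:
    weights.ParseFromString(f.read())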

compatible : (computing) able to work together, interchangeable

notably : especially, in particular

extensibility : the capacity to be extended
