CS231n assignment2 Q5 TensorFlow on CIFAR-10

Part II: Barebone TensorFlow

First, implement a flatten function:

def flatten(x):
    """    
    Input:
    - TensorFlow Tensor of shape (N, D1, ..., DM)
    
    Output:
    - TensorFlow Tensor of shape (N, D1 * ... * DM)
    """
    N = tf.shape(x)[0]
    return tf.reshape(x, (N, -1))
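
A quick shape check, in the same spirit as the test functions used throughout this post (a minimal sketch; it assumes the notebook's usual import numpy as np and import tensorflow as tf, and flatten_test is just an illustrative name):

def flatten_test():
    tf.reset_default_graph()
    x = tf.placeholder(tf.float32)
    x_flat = flatten(x)
    with tf.Session() as sess:
        # Flattening a (10, 3, 4, 5) array should give shape (10, 60)
        x_np = np.zeros((10, 3, 4, 5))
        print(sess.run(x_flat, feed_dict={x: x_np}).shape)

flatten_test()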

Implement a two-layer fully-connected network and test it:

def two_layer_fc(x, params):
    """
    A fully-connected neural network; the architecture is:
    fully-connected layer -> ReLU -> fully connected layer.
    Note that we only need to define the forward pass here; TensorFlow will take
    care of computing the gradients for us.
    
    The input to the network will be a minibatch of data, of shape
    (N, d1, ..., dM) where d1 * ... * dM = D. The hidden layer will have H units,
    and the output layer will produce scores for C classes.

    Inputs:
    - x: A TensorFlow Tensor of shape (N, d1, ..., dM) giving a minibatch of
      input data.
    - params: A list [w1, w2] of TensorFlow Tensors giving weights for the
      network, where w1 has shape (D, H) and w2 has shape (H, C).
    
    Returns:
    - scores: A TensorFlow Tensor of shape (N, C) giving classification scores
      for the input data x.
    """
    w1, w2 = params  # Unpack the parameters
    x = flatten(x)   # Flatten the input; now x has shape (N, D)
    h = tf.nn.relu(tf.matmul(x, w1)) # Hidden layer: h has shape (N, H)
    scores = tf.matmul(h, w2)        # Compute scores of shape (N, C)
    return scores

def two_layer_fc_test():
    # TensorFlow's default computational graph is essentially a hidden global
    # variable. To avoid adding to this default graph when you rerun this cell,
    # we clear the default graph before constructing the graph we care about.
    tf.reset_default_graph()
    hidden_layer_size = 42

    # Scoping our computational graph setup code under a tf.device context
    # manager lets us tell TensorFlow where we want these Tensors to be
    # placed.
    with tf.device(device):
        # Set up a placeholder for the input of the network, and constant
        # zero Tensors for the network weights. Here we declare w1 and w2
        # using tf.zeros instead of tf.placeholder as we've seen before - this
        # means that the values of w1 and w2 will be stored in the computational
        # graph itself and will persist across multiple runs of the graph; in
        # particular this means that we don't have to pass values for w1 and w2
        # using a feed_dict when we eventually run the graph.
        # Because w1 and w2 are initialized with tf.zeros here, we do not need to feed them any data.
        x = tf.placeholder(tf.float32)
        w1 = tf.zeros((32 * 32 * 3, hidden_layer_size))
        w2 = tf.zeros((hidden_layer_size, 10))
        
        # Call our two_layer_fc function to set up the computational
        # graph for the forward pass of the network.
        scores = two_layer_fc(x, [w1, w2])
    
    # Use numpy to create some concrete data that we will pass to the
    # computational graph for the x placeholder.
    x_np = np.zeros((64, 32, 32, 3))
    with tf.Session() as sess:
        # The calls to tf.zeros above do not actually instantiate the values
        # for w1 and w2; the following line tells TensorFlow to instantiate
        # the values of all Tensors (like w1 and w2) that live in the graph.
        sess.run(tf.global_variables_initializer())
        # Only after running this line are the tf.zeros Tensors actually assigned their values.
        
        # Here we actually run the graph, using the feed_dict to pass the
        # value to bind to the placeholder for x; we ask TensorFlow to compute
        # the value of the scores Tensor, which it returns as a numpy array.
        scores_np = sess.run(scores, feed_dict={x: x_np})
        print(scores_np.shape)

two_layer_fc_test()

Implement a three-layer convolutional network and test it:

The network architecture is as follows:

  1. A convolutional layer (with bias) with channel_1 filters, each with shape KW1 x KH1, and zero-padding of two
  2. ReLU nonlinearity
  3. A convolutional layer (with bias) with channel_2 filters, each with shape KW2 x KH2, and zero-padding of one
  4. ReLU nonlinearity
  5. Fully-connected layer with bias, producing scores for C classes.

def three_layer_convnet(x, params):
    """
    A three-layer convolutional network with the architecture described above.
    
    Inputs:
    - x: A TensorFlow Tensor of shape (N, H, W, 3) giving a minibatch of images
    - params: A list of TensorFlow Tensors giving the weights and biases for the
      network; should contain the following:
      - conv_w1: TensorFlow Tensor of shape (KH1, KW1, 3, channel_1) giving
        weights for the first convolutional layer.
      - conv_b1: TensorFlow Tensor of shape (channel_1,) giving biases for the
        first convolutional layer.
      - conv_w2: TensorFlow Tensor of shape (KH2, KW2, channel_1, channel_2)
        giving weights for the second convolutional layer
      - conv_b2: TensorFlow Tensor of shape (channel_2,) giving biases for the
        second convolutional layer.
      - fc_w: TensorFlow Tensor giving weights for the fully-connected layer.
        Can you figure out what the shape should be? Since both conv layers use
        stride 1 with 'SAME' padding, the spatial size stays 32 x 32, so fc_w has
        shape (32 * 32 * channel_2, 10).
      - fc_b: TensorFlow Tensor giving biases for the fully-connected layer.
        Can you figure out what the shape should be? (10,)
    """
    conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b = params
    scores = None
    ############################################################################
    # TODO: Implement the forward pass for the three-layer ConvNet.            #
    ############################################################################
    # Conv -> ReLU -> Conv -> ReLU -> FC; 'SAME' padding with stride 1 keeps the
    # spatial size at 32 x 32, matching the zero-padding of 2 and 1 described above.
    h1 = tf.nn.conv2d(x, conv_w1, strides=[1, 1, 1, 1], padding='SAME', name='conv1') + conv_b1
    h1 = tf.nn.relu(h1)
    h2 = tf.nn.conv2d(h1, conv_w2, strides=[1, 1, 1, 1], padding='SAME', name='conv2') + conv_b2
    h2 = tf.nn.relu(h2)
    h = flatten(h2)                      # Flatten to shape (N, 32 * 32 * channel_2)
    scores = tf.matmul(h, fc_w) + fc_b   # Scores of shape (N, 10)
    ############################################################################
    #                              END OF YOUR CODE                            #
    ############################################################################
    return scores
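
One detail worth spelling out: the list above asks for zero-padding of two and one, and with stride 1 the 'SAME' padding used in the code pads a 5x5 kernel by exactly 2 pixels per side and a 3x3 kernel by 1, so the spatial size stays at 32 x 32. As a sanity check, here is a sketch of the first layer written with explicit padding instead (reusing the x, conv_w1, conv_b1 names from above; not part of the submitted code):

# Pad height and width by 2 on each side, then convolve with no padding;
# for a 5x5 kernel with stride 1 this matches padding='SAME'.
x_padded = tf.pad(x, [[0, 0], [2, 2], [2, 2], [0, 0]])
h1 = tf.nn.conv2d(x_padded, conv_w1, strides=[1, 1, 1, 1], padding='VALID') + conv_b1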

def three_layer_convnet_test():
    tf.reset_default_graph()

    with tf.device(device):
        x = tf.placeholder(tf.float32)
        conv_w1 = tf.zeros((5, 5, 3, 6))
        conv_b1 = tf.zeros((6,))
        conv_w2 = tf.zeros((3, 3, 6, 9))
        conv_b2 = tf.zeros((9,))
        fc_w = tf.zeros((32 * 32 * 9, 10))
        fc_b = tf.zeros((10,))
        params = [conv_w1, conv_b1, conv_w2, conv_b2, fc_w, fc_b]
        scores = three_layer_convnet(x, params)

    # Inputs to convolutional layers are 4-dimensional arrays with shape
    # [batch_size, height, width, channels]
    x_np = np.zeros((64, 32, 32, 3))
    
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        scores_np = sess.run(scores, feed_dict={x: x_np})
        print('scores_np has shape: ', scores_np.shape)

with tf.device('/gpu:0'):
    three_layer_convnet_test()

Implement the training step. A single step does the following:

  1. Compute the loss
  2. Compute the gradient of the loss with respect to all network weights
  3. Make a weight update step using (stochastic) gradient descent.

def training_step(scores, y, params, learning_rate):
    """
    Set up the part of the computational graph which makes a training step.

    Inputs:
    - scores: TensorFlow Tensor of shape (N, C) giving classification scores for
      the model.
    - y: TensorFlow Tensor of shape (N,) giving ground-truth labels for scores;
      y[i] == c means that c is the correct class for scores[i].
    - params: List of TensorFlow Tensors giving the weights of the model
    - learning_rate: Python scalar giving the learning rate to use for gradient
      descent step.
      
    Returns:
    - loss: A TensorFlow Tensor of shape () (scalar) giving the loss for this
      batch of data; evaluating the loss also performs a gradient descent step
      on params (see above).
    """
    # First compute the loss; the first line gives losses for each example in
    # the minibatch, and the second averages the losses across the batch
    losses = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=scores)
    loss = tf.reduce_mean(losses)  # Average the per-example losses to get the scalar loss

    # Compute the gradient of the loss with respect to each parameter of the
    # network. This is a very magical function call: TensorFlow internally
    # traverses the computational graph starting at loss backward to each element
    # of params, and uses backpropagation to figure out how to compute gradients;
    # it then adds new operations to the computational graph which compute the
    # requested gradients, and returns a list of TensorFlow Tensors that will
    # contain the requested gradients when evaluated.
    grad_params = tf.gradients(loss, params)  # Compute gradients with respect to all parameters
    
    # Make a gradient descent step on all of the model parameters.
    new_weights = []   
    for w, grad_w in zip(params, grad_params):  # Update each parameter
        new_w = tf.assign_sub(w, learning_rate * grad_w)
        new_weights.append(new_w)

    # Insert a control dependency so that evaluating the loss causes a weight
    # update to happen: tf.identity creates a new op inside the dependency scope,
    # so fetching the returned loss forces the assign ops above to run first.
    with tf.control_dependencies(new_weights):  # Tie the weight updates to the loss
        return tf.identity(loss)
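
To make the control-dependency trick concrete, here is a tiny standalone illustration (hypothetical names, not part of the assignment code). Fetching out also runs increment, even though out does not use its result, because tf.identity creates a new op inside the dependency scope:

counter = tf.Variable(0)
increment = tf.assign_add(counter, 1)       # the side-effect op we want to force
with tf.control_dependencies([increment]):
    out = tf.identity(tf.constant(1.0))     # a new op, so the dependency attaches to it
# every sess.run(out) now also increments counter by one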

Implement the training loop:

def train_part2(model_fn, init_fn, learning_rate):
    """
    Train a model on CIFAR-10.
    
    Inputs:
    - model_fn: A Python function that performs the forward pass of the model
      using TensorFlow (the network model we defined); it should have the following signature:
      scores = model_fn(x, params) where x is a TensorFlow Tensor giving a
      minibatch of image data, params is a list of TensorFlow Tensors holding
      the model weights, and scores is a TensorFlow Tensor of shape (N, C)
      giving scores for all elements of x.
    - init_fn: A Python function that initializes the parameters of the model.
      It should have the signature params = init_fn() where params is a list
      of TensorFlow Tensors holding the (randomly initialized) weights of the
      model. (This is the function that initializes the parameters.)
    - learning_rate: Python float giving the learning rate to use for SGD.
    """
    # First clear the default graph
    tf.reset_default_graph()
    is_training = tf.placeholder(tf.bool, name='is_training')
    # Set up the computational graph for performing forward and backward passes,
    # and weight updates.
    with tf.device(device):
        # Set up placeholders for the data and labels
        x = tf.placeholder(tf.float32, [None, 32, 32, 3])
        y = tf.placeholder(tf.int32, [None])
        params = init_fn()           # Initialize the model parameters
        scores = model_fn(x, params) # Forward pass of the model
        loss = training_step(scores, y, params, learning_rate)

    # Now we actually run the graph many times using the training data
    with tf.Session() as sess:
        # Initialize variables that will live in the graph
        sess.run(tf.global_variables_initializer())
        for t, (x_np, y_np) in enumerate(train_dset):
            # Run the graph on a batch of training data; recall that asking
            # TensorFlow to evaluate loss will cause an SGD step to happen.
            feed_dict = {x: x_np, y: y_np}
            loss_np = sess.run(loss, feed_dict=feed_dict)
            
            # Periodically print the loss and check accuracy on the val set
            if t % print_every == 0:
                print('Iteration %d, loss = %.4f' % (t, loss_np))
                check_accuracy(sess, val_dset, x, scores, is_training)

Kaiming (He) normal initialization:

def kaiming_normal(shape):
    if len(shape) == 2:
        fan_in, fan_out = shape[0], shape[1]
    elif len(shape) == 4:
        fan_in, fan_out = np.prod(shape[:3]), shape[3]
    return tf.random_normal(shape) * np.sqrt(2.0 / fan_in)
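
For reference, kaiming_normal draws weights from a normal distribution scaled by sqrt(2 / fan_in). This is essentially the same scheme as the tf.variance_scaling_initializer(scale=2.0) used with the Keras APIs in Parts III and IV, which (by default) samples from a truncated normal with variance 2 / fan_in. A small comparison sketch, assuming the usual np/tf imports:

fan_in = 3 * 32 * 32
w_manual = tf.random_normal((fan_in, 4000)) * np.sqrt(2.0 / fan_in)  # what kaiming_normal does
initializer = tf.variance_scaling_initializer(scale=2.0)             # variance 2 / fan_in
w_layer = tf.layers.Dense(4000, kernel_initializer=initializer)      # style used in Parts III/IV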

Train our two-layer network:

def two_layer_fc_init():
    """
    Initialize the weights of a two-layer network, for use with the
    two_layer_network function defined above.
    
    Inputs: None
    
    Returns: A list of:
    - w1: TensorFlow Variable giving the weights for the first layer
    - w2: TensorFlow Variable giving the weights for the second layer
    """
    hidden_layer_size = 4000
    w1 = tf.Variable(kaiming_normal((3 * 32 * 32, hidden_layer_size)))
    w2 = tf.Variable(kaiming_normal((hidden_layer_size, 10)))
    return [w1, w2]

learning_rate = 1e-2
train_part2(two_layer_fc, two_layer_fc_init, learning_rate)

Iteration 0, loss = 2.8053

Got 134 / 1000 correct (13.40%)

Iteration 100, loss = 1.9526

Got 383 / 1000 correct (38.30%)

Iteration 200, loss = 1.4617

Got 393 / 1000 correct (39.30%)

Iteration 300, loss = 1.7108

Got 372 / 1000 correct (37.20%)

Iteration 400, loss = 1.8420

Got 421 / 1000 correct (42.10%)

Iteration 500, loss = 1.8536

Got 429 / 1000 correct (42.90%)

Iteration 600, loss = 1.8949

Got 413 / 1000 correct (41.30%)

Iteration 700, loss = 1.9321

Got 424 / 1000 correct (42.40%)

Train our three-layer convolutional network:

def three_layer_convnet_init():
    """
    Initialize the weights of a Three-Layer ConvNet, for use with the
    three_layer_convnet function defined above.
    
    Inputs: None
    
    Returns a list containing:
    - conv_w1: TensorFlow Variable giving weights for the first conv layer
    - conv_b1: TensorFlow Variable giving biases for the first conv layer
    - conv_w2: TensorFlow Variable giving weights for the second conv layer
    - conv_b2: TensorFlow Variable giving biases for the second conv layer
    - fc_w: TensorFlow Variable giving weights for the fully-connected layer
    - fc_b: TensorFlow Variable giving biases for the fully-connected layer
    """
    params = None
    ############################################################################
    # TODO: Initialize the parameters of the three-layer network.              #
    ############################################################################
    w1 = tf.Variable(kaiming_normal((5, 5, 3, 6)))      # First conv layer: 5x5x3, 6 filters
    b1 = tf.Variable(kaiming_normal((1, 6)))
    w2 = tf.Variable(kaiming_normal((3, 3, 6, 9)))      # Second conv layer: 3x3x6, 9 filters
    b2 = tf.Variable(kaiming_normal((1, 9)))
    w = tf.Variable(kaiming_normal((32 * 32 * 9, 10)))  # FC layer: flattened 32*32*9 -> 10 classes
    b = tf.Variable(kaiming_normal((1, 10)))
    params = [w1, b1, w2, b2, w, b]
    ############################################################################
    #                             END OF YOUR CODE                             #
    ############################################################################
    return params

learning_rate = 3e-3
train_part2(three_layer_convnet, three_layer_convnet_init, learning_rate)

Iteration 0, loss = 3.4851

Got 96 / 1000 correct (9.60%)

Iteration 100, loss = 1.8512

Got 323 / 1000 correct (32.30%)

Iteration 200, loss = 1.6490

Got 372 / 1000 correct (37.20%)

Iteration 300, loss = 1.8010

Got 360 / 1000 correct (36.00%)

Iteration 400, loss = 1.8237

Got 394 / 1000 correct (39.40%)

Iteration 500, loss = 1.8371

Got 412 / 1000 correct (41.20%)

Iteration 600, loss = 1.7767

Got 428 / 1000 correct (42.80%)

Iteration 700, loss = 1.6171

Got 430 / 1000 correct (43.00%)

Part III: Keras Model API

Use the Module API to build a two-layer fully-connected network:

class TwoLayerFC(tf.keras.Model):  # The model is defined as a class
    def __init__(self, hidden_size, num_classes):  # Define the network structure here
        super().__init__()        
        initializer = tf.variance_scaling_initializer(scale=2.0)
        self.fc1 = tf.layers.Dense(hidden_size, activation=tf.nn.relu,
                                   kernel_initializer=initializer)  # Fully-connected layer with ReLU and our initializer
        # tf.layers.Dense is a class
        self.fc2 = tf.layers.Dense(num_classes,
                                   kernel_initializer=initializer)
    def call(self, x, training=None):  # The forward pass is defined here
        x = tf.layers.flatten(x)  # Flatten x
        x = self.fc1(x)
        x = self.fc2(x)
        return x


def test_TwoLayerFC():
    """ A small unit test to exercise the TwoLayerFC model above. """
    tf.reset_default_graph()
    input_size, hidden_size, num_classes = 50, 42, 10

    # As usual in TensorFlow, we first need to define our computational graph.
    # To this end we first construct a TwoLayerFC object, then use it to construct
    # the scores Tensor.
    model = TwoLayerFC(hidden_size, num_classes)
    with tf.device(device):
        x = tf.zeros((64, input_size))
        scores = model(x)

    # Now that our computational graph has been defined we can run the graph
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        scores_np = sess.run(scores)
        print(scores_np.shape)
        
test_TwoLayerFC()

Use the Functional API to build a two-layer fully-connected network:

def two_layer_fc_functional(inputs, hidden_size, num_classes):  # The model is defined as a function
    initializer = tf.variance_scaling_initializer(scale=2.0)
    flattened_inputs = tf.layers.flatten(inputs)
    fc1_output = tf.layers.dense(flattened_inputs, hidden_size, activation=tf.nn.relu,
                                 kernel_initializer=initializer)
    # tf.layers.dense is a function
    scores = tf.layers.dense(fc1_output, num_classes,
                             kernel_initializer=initializer)
    return scores

def test_two_layer_fc_functional():
    """ A small unit test to exercise the TwoLayerFC model above. """
    tf.reset_default_graph()
    input_size, hidden_size, num_classes = 50, 42, 10

    # As usual in TensorFlow, we first need to define our computational graph.
    # To this end we first construct a two layer network graph by calling the
    # two_layer_network() function. This function constructs the computation
    # graph and outputs the score tensor.
    with tf.device(device):
        x = tf.zeros((64, input_size))
        scores = two_layer_fc_functional(x, hidden_size, num_classes)

    # Now that our computational graph has been defined we can run the graph
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        scores_np = sess.run(scores)
        print(scores_np.shape)
        
test_two_layer_fc_functional()

Use the Keras Model API to build a three-layer convolutional network:

  1. Convolutional layer with 5 x 5 kernels, with zero-padding of 2
  2. ReLU nonlinearity
  3. Convolutional layer with 3 x 3 kernels, with zero-padding of 1
  4. ReLU nonlinearity
  5. Fully-connected layer to give class scores

class ThreeLayerConvNet(tf.keras.Model):
    def __init__(self, channel_1, channel_2, num_classes):
        super().__init__()
        ########################################################################
        # TODO: Implement the __init__ method for a three-layer ConvNet. You   #
        # should instantiate layer objects to be used in the forward pass.     #
        ########################################################################
        initializer = tf.variance_scaling_initializer(scale=2.0)
        self.conv1 = tf.layers.Conv2D(filters=channel_1, kernel_size=[5, 5],
                                      strides=[1, 1], padding='SAME', activation=tf.nn.relu,
                                      use_bias=True, kernel_initializer=initializer,
                                      bias_initializer=initializer, name='conv1')
        self.conv2 = tf.layers.Conv2D(filters=channel_2, kernel_size=[3, 3],
                                      strides=[1, 1], padding='SAME', activation=tf.nn.relu,
                                      use_bias=True, kernel_initializer=initializer,
                                      bias_initializer=initializer, name='conv2')
        self.fc = tf.layers.Dense(units=num_classes, use_bias=True,
                                  kernel_initializer=initializer, bias_initializer=initializer,
                                  name='fc')
        ########################################################################
        #                           END OF YOUR CODE                           #
        ########################################################################
        
    def call(self, x, training=None):
        scores = None
        ########################################################################
        # TODO: Implement the forward pass for a three-layer ConvNet. You      #
        # should use the layer objects defined in the __init__ method.         #
        ########################################################################
        x = self.conv1(x)
        x = self.conv2(x)
        x = tf.layers.flatten(x)
        scores = self.fc(x)
        ########################################################################
        #                           END OF YOUR CODE                           #
        ########################################################################        
        return scores

Keras Model API: Training Loop

def train_part34(model_init_fn, optimizer_init_fn, num_epochs=1):
    """
    Simple training loop for use with models defined using tf.keras. It trains
    a model for one epoch on the CIFAR-10 training set and periodically checks
    accuracy on the CIFAR-10 validation set.
    
    Inputs:
    - model_init_fn: A function that takes no parameters; when called it
      constructs the model we want to train: model = model_init_fn()
    - optimizer_init_fn: A function which takes no parameters; when called it
      constructs the Optimizer object we will use to optimize the model:
      optimizer = optimizer_init_fn()
    - num_epochs: The number of epochs to train for
    
    Returns: Nothing, but prints progress during training
    """
    tf.reset_default_graph()    
    with tf.device(device):
        # Construct the computational graph we will use to train the model. We
        # use the model_init_fn to construct the model, declare placeholders for
        # the data and labels
        x = tf.placeholder(tf.float32, [None, 32, 32, 3])
        y = tf.placeholder(tf.int32, [None])
        
        # We need a placeholder to explicitly specify if the model is in the training
        # phase or not. This is because a number of layers behave differently in
        # training and in testing, e.g., dropout and batch normalization.
        # We pass this variable to the computation graph through feed_dict as shown below.
        is_training = tf.placeholder(tf.bool, name='is_training')
        
        # Use the model function to build the forward pass.
        scores = model_init_fn(x, is_training)

        # Compute the loss like we did in Part II
        loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=scores)
        loss = tf.reduce_mean(loss)

        # Use the optimizer_fn to construct an Optimizer, then use the optimizer
        # to set up the training step. Asking TensorFlow to evaluate the
        # train_op returned by optimizer.minimize(loss) will cause us to make a
        # single update step using the current minibatch of data.
        
        # Note that we use tf.control_dependencies to force the model to run
        # the tf.GraphKeys.UPDATE_OPS at each training step. tf.GraphKeys.UPDATE_OPS
        # holds the operators that update the states of the network.
        # For example, the tf.layers.batch_normalization function adds the running mean
        # and variance update operators to tf.GraphKeys.UPDATE_OPS.
        optimizer = optimizer_init_fn()
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        with tf.control_dependencies(update_ops):
            train_op = optimizer.minimize(loss)

    # Now we can run the computational graph many times to train the model.
    # When we call sess.run we ask it to evaluate train_op, which causes the
    # model to update.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        t = 0
        for epoch in range(num_epochs):
            print('Starting epoch %d' % epoch)
            for x_np, y_np in train_dset:
                feed_dict = {x: x_np, y: y_np, is_training: True}
                loss_np, _ = sess.run([loss, train_op], feed_dict=feed_dict)
                if t % print_every == 0:
                    print('Iteration %d, loss = %.4f' % (t, loss_np))
                    check_accuracy(sess, val_dset, x, scores, is_training=is_training)
                    print()
                t += 1
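
The tf.GraphKeys.UPDATE_OPS / tf.control_dependencies wiring above only matters for layers that register update operations, such as batch normalization; none of the models in this post actually use it. A hypothetical model_init_fn that would rely on it might look like the sketch below (illustrative only; the is_training flag fed through feed_dict is what switches batch norm between batch statistics and running averages):

def bn_model_init_fn(inputs, is_training):
    x = tf.layers.conv2d(inputs, filters=32, kernel_size=3, padding='same',
                         activation=tf.nn.relu)
    # batch_normalization adds its running-mean/variance update ops to
    # tf.GraphKeys.UPDATE_OPS, which train_part34 forces to run each step
    x = tf.layers.batch_normalization(x, training=is_training)
    x = tf.layers.flatten(x)
    return tf.layers.dense(x, 10)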

Keras Model API: Train a Two-Layer Network

hidden_size, num_classes = 4000, 10
learning_rate = 1e-2

def model_init_fn(inputs, is_training):
    return TwoLayerFC(hidden_size, num_classes)(inputs)

def optimizer_init_fn():
    return tf.train.GradientDescentOptimizer(learning_rate)

train_part34(model_init_fn, optimizer_init_fn)

Starting epoch 0

Iteration 0, loss = 2.9554

Got 147 / 1000 correct (14.70%)

Iteration 100, loss = 1.8660

Got 374 / 1000 correct (37.40%)

Iteration 200, loss = 1.5924

Got 391 / 1000 correct (39.10%)

Iteration 300, loss = 1.8491

Got 390 / 1000 correct (39.00%)

Iteration 400, loss = 1.7189

Got 430 / 1000 correct (43.00%)

Iteration 500, loss = 1.7548

Got 432 / 1000 correct (43.20%)

Iteration 600, loss = 1.8440

Got 418 / 1000 correct (41.80%)

Iteration 700, loss = 1.9507

Got 451 / 1000 correct (45.10%)

Keras Model API: Train a Two-Layer Network (functional API)

hidden_size, num_classes = 4000, 10
learning_rate = 1e-2

def model_init_fn(inputs, is_training):
    return two_layer_fc_functional(inputs, hidden_size, num_classes)

def optimizer_init_fn():
    return tf.train.GradientDescentOptimizer(learning_rate)

train_part34(model_init_fn, optimizer_init_fn)

Starting epoch 0

Iteration 0, loss = 3.2064

Got 113 / 1000 correct (11.30%)

Iteration 100, loss = 1.8935

Got 374 / 1000 correct (37.40%)

Iteration 200, loss = 1.5011

Got 384 / 1000 correct (38.40%)

Iteration 300, loss = 1.9119

Got 359 / 1000 correct (35.90%)

Iteration 400, loss = 1.8919

Got 416 / 1000 correct (41.60%)

Iteration 500, loss = 1.7257

Got 430 / 1000 correct (43.00%)

Iteration 600, loss = 1.9092

Got 414 / 1000 correct (41.40%)

Iteration 700, loss = 2.0570

Got 449 / 1000 correct (44.90%)

Keras Model API: Train a Three-Layer ConvNet

learning_rate = 3e-3
channel_1, channel_2, num_classes = 32, 16, 10

def model_init_fn(inputs, is_training):
    model = None
    ############################################################################
    # TODO: Complete the implementation of model_fn.                           #
    ############################################################################
    model = ThreeLayerConvNet(channel_1,channel_2,num_classes)
    ############################################################################
    #                           END OF YOUR CODE                               #
    ############################################################################
    return model(inputs)

def optimizer_init_fn():
    optimizer = None
    ############################################################################
    # TODO: Complete the implementation of model_fn.                           #
    ############################################################################
    optimizer = tf.train.MomentumOptimizer(learning_rate= learning_rate,momentum = 0.9,use_nesterov = True)
    ############################################################################
    #                           END OF YOUR CODE                               #
    ############################################################################
    return optimizer

train_part34(model_init_fn, optimizer_init_fn)

Starting epoch 0

Iteration 0, loss = 3.5594

Got 81 / 1000 correct (8.10%)

Iteration 100, loss = 1.6427

Got 394 / 1000 correct (39.40%)

Iteration 200, loss = 1.4471

Got 453 / 1000 correct (45.30%)

Iteration 300, loss = 1.4377

Got 472 / 1000 correct (47.20%)

Iteration 400, loss = 1.4059

Got 489 / 1000 correct (48.90%)

Iteration 500, loss = 1.5382

Got 535 / 1000 correct (53.50%)

Iteration 600, loss = 1.3765

Got 525 / 1000 correct (52.50%)

Iteration 700, loss = 1.4015

Got 518 / 1000 correct (51.80%)

Part IV: Keras Sequential API

Keras Sequential API: Two-Layer Network

learning_rate = 1e-2

def model_init_fn(inputs, is_training): 
    input_shape = (32, 32, 3)
    hidden_layer_size, num_classes = 4000, 10
    initializer = tf.variance_scaling_initializer(scale=2.0)
    layers = [  # input_shape must be given in the first layer
        tf.layers.Flatten(input_shape=input_shape),
        tf.layers.Dense(hidden_layer_size, activation=tf.nn.relu,
                        kernel_initializer=initializer),
        tf.layers.Dense(num_classes, kernel_initializer=initializer),
    ]
    model = tf.keras.Sequential(layers)
    return model(inputs)

def optimizer_init_fn():
    return tf.train.GradientDescentOptimizer(learning_rate)

train_part34(model_init_fn, optimizer_init_fn)

Starting epoch 0

Iteration 0, loss = 3.0599

Got 138 / 1000 correct (13.80%)

Iteration 100, loss = 1.9839

Got 363 / 1000 correct (36.30%)

Iteration 200, loss = 1.4431

Got 389 / 1000 correct (38.90%)

Iteration 300, loss = 1.8575

Got 375 / 1000 correct (37.50%)

Iteration 400, loss = 1.7719

Got 413 / 1000 correct (41.30%)

Iteration 500, loss = 1.7979

Got 438 / 1000 correct (43.80%)

Iteration 600, loss = 1.8587

Got 418 / 1000 correct (41.80%)

Iteration 700, loss = 1.9053

Got 442 / 1000 correct (44.20%)

Keras Sequential API: Three-Layer ConvNet

  1. Convolutional layer with 16 5x5 kernels, using zero padding of 2
  2. ReLU nonlinearity
  3. Convolutional layer with 32 3x3 kernels, using zero padding of 1
  4. ReLU nonlinearity
  5. Fully-connected layer giving class scores

def model_init_fn(inputs, is_training):
    model = None
    ############################################################################
    # TODO: Construct a three-layer ConvNet using tf.keras.Sequential.         #
    ############################################################################
    initializer = tf.variance_scaling_initializer(scale=2.0)
    layers = [
        tf.layers.Conv2D(input_shape = (32,32,3),filters = 16,kernel_size = [5,5],
                            strides = [1,1],padding = 'SAME',activation = tf.nn.relu,
                            use_bias = True,kernel_initializer = initializer,
                            bias_initializer = initializer,name = 'conv1'),
        tf.layers.Conv2D(filters = 32,kernel_size = [3,3],
                            strides = [1,1],padding = 'SAME',activation = tf.nn.relu,
                            use_bias = True,kernel_initializer = initializer,
                            bias_initializer = initializer,name = 'conv2'),
        tf.layers.Flatten(),
        tf.layers.Dense(units = 10,use_bias = True,
                            kernel_initializer = initializer,bias_initializer = initializer,
                            name = 'fc')]
    model = tf.keras.Sequential(layers)
    ############################################################################
    #                            END OF YOUR CODE                              #
    ############################################################################
    return model(inputs)

learning_rate = 5e-4
def optimizer_init_fn():
    optimizer = None
    ############################################################################
    # TODO: Complete the implementation of model_fn.                           #
    ############################################################################
    optimizer = tf.train.MomentumOptimizer(learning_rate = learning_rate,momentum = 0.9,use_nesterov = True)
    ############################################################################
    #                           END OF YOUR CODE                               #
    ############################################################################
    return optimizer

train_part34(model_init_fn, optimizer_init_fn)

Starting epoch 0

Iteration 0, loss = 2.5582

Got 103 / 1000 correct (10.30%)

Iteration 100, loss = 1.5996

Got 403 / 1000 correct (40.30%)

Iteration 200, loss = 1.4355

Got 461 / 1000 correct (46.10%)

Iteration 300, loss = 1.5550

Got 493 / 1000 correct (49.30%)

Iteration 400, loss = 1.4755

Got 484 / 1000 correct (48.40%)

Iteration 500, loss = 1.5330

Got 505 / 1000 correct (50.50%)

Iteration 600, loss = 1.5811

Got 523 / 1000 correct (52.30%)

Iteration 700, loss = 1.3541

Got 529 / 1000 correct (52.90%)

Part V: CIFAR-10 open-ended challenge

def model_init_fn(inputs, is_training):
    model = None
    ############################################################################
    # TODO: Construct a model that performs well on CIFAR-10                   #
    ############################################################################
    initializer = tf.variance_scaling_initializer(scale=2.0)
    layers = [
        tf.layers.Conv2D(input_shape = (32,32,3),filters = 64,kernel_size = [3,3],
                            strides = [1,1],padding = 'SAME',activation = tf.nn.relu,
                            use_bias = True,kernel_initializer = initializer,
                            bias_initializer = initializer,name = 'conv1'),
        tf.layers.Conv2D(filters = 64,kernel_size = [3,3],
                            strides = [1,1],padding = 'SAME',activation = tf.nn.relu,
                            use_bias = True,kernel_initializer = initializer,
                            bias_initializer = initializer,name = 'conv2'),
        tf.layers.Conv2D(filters = 128,kernel_size = [3,3],
                            strides = [1,1],padding = 'SAME',activation = tf.nn.relu,
                            use_bias = True,kernel_initializer = initializer,
                            bias_initializer = initializer,name = 'conv3'),
        tf.layers.MaxPooling2D(pool_size = [2,2],strides = [2,2],name = 'pool1'),        
        tf.layers.Conv2D(filters = 128,kernel_size = [3,3],
                            strides = [1,1],padding = 'SAME',activation = tf.nn.relu,
                            use_bias = True,kernel_initializer = initializer,
                            bias_initializer = initializer,name = 'conv4'),
        tf.layers.Conv2D(filters = 256,kernel_size = [3,3],
                            strides = [1,1],padding = 'SAME',activation = tf.nn.relu,
                            use_bias = True,kernel_initializer = initializer,
                            bias_initializer = initializer,name = 'conv5'),
        tf.layers.Conv2D(filters = 256,kernel_size = [3,3],
                            strides = [1,1],padding = 'SAME',activation = tf.nn.relu,
                            use_bias = True,kernel_initializer = initializer,
                            bias_initializer = initializer,name = 'conv6'),
        tf.layers.MaxPooling2D(pool_size = [2,2],strides = [2,2],name = 'pool2'),  
        tf.layers.Conv2D(filters = 256,kernel_size = [3,3],
                            strides = [1,1],padding = 'SAME',activation = tf.nn.relu,
                            use_bias = True,kernel_initializer = initializer,
                            bias_initializer = initializer,name = 'conv7'),
        tf.layers.Conv2D(filters = 256,kernel_size = [3,3],
                            strides = [1,1],padding = 'SAME',activation = tf.nn.relu,
                            use_bias = True,kernel_initializer = initializer,
                            bias_initializer = initializer,name = 'conv8'),
        tf.layers.Conv2D(filters = 256,kernel_size = [3,3],
                            strides = [1,1],padding = 'SAME',activation = tf.nn.relu,
                            use_bias = True,kernel_initializer = initializer,
                            bias_initializer = initializer,name = 'conv9'),
        tf.layers.MaxPooling2D(pool_size = [2,2],strides = [2,2],name = 'pool3'),  
        tf.layers.Conv2D(filters = 256,kernel_size = [3,3],
                            strides = [1,1],padding = 'SAME',activation = tf.nn.relu,
                            use_bias = True,kernel_initializer = initializer,
                            bias_initializer = initializer,name = 'conv10'),
        tf.layers.Conv2D(filters = 256,kernel_size = [3,3],
                            strides = [1,1],padding = 'SAME',activation = tf.nn.relu,
                            use_bias = True,kernel_initializer = initializer,
                            bias_initializer = initializer,name = 'conv11'),
        tf.layers.Conv2D(filters = 256,kernel_size = [3,3],
                            strides = [1,1],padding = 'SAME',activation = tf.nn.relu,
                            use_bias = True,kernel_initializer = initializer,
                            bias_initializer = initializer,name = 'conv12'),
        tf.layers.Conv2D(filters = 256,kernel_size = [3,3],
                            strides = [1,1],padding = 'SAME',activation = tf.nn.relu,
                            use_bias = True,kernel_initializer = initializer,
                            bias_initializer = initializer,name = 'conv13'),
        tf.layers.Flatten(),
        tf.layers.Dense(units = 1024,use_bias = True,
                            kernel_initializer = initializer,bias_initializer = initializer,
                            name = 'fc1'),
        tf.layers.Dense(units = 1024,use_bias = True,
                            kernel_initializer = initializer,bias_initializer = initializer,
                            name = 'fc2'),
        tf.layers.Dense(units = 10,use_bias = True,
                            kernel_initializer = initializer,bias_initializer = initializer,
                            name = 'fc3')
    ]
    model = tf.keras.Sequential(layers)
    ############################################################################
    #                            END OF YOUR CODE                              #
    ############################################################################
    return model(inputs)

def optimizer_init_fn():
    optimizer = None
    ############################################################################
    # TODO: Construct an optimizer that performs well on CIFAR-10              #
    ############################################################################
    optimizer = tf.train.AdamOptimizer()
    ############################################################################
    #                            END OF YOUR CODE                              #
    ############################################################################
    return optimizer

device = '/gpu:0'
print_every = 700
num_epochs = 10
train_part34(model_init_fn, optimizer_init_fn, num_epochs)

Starting epoch 0

Iteration 0, loss = 3.8694

Got 79 / 1000 correct (7.90%)

Iteration 700, loss = 1.6052

Got 484 / 1000 correct (48.40%)

Starting epoch 1

Iteration 1400, loss = 1.0688

Got 616 / 1000 correct (61.60%)

Starting epoch 2

Iteration 2100, loss = 0.9978

Got 643 / 1000 correct (64.30%)

Starting epoch 3

Iteration 2800, loss = 0.8107

Got 678 / 1000 correct (67.80%)

Starting epoch 4

Iteration 3500, loss = 0.6718

Got 717 / 1000 correct (71.70%)

Starting epoch 5

Iteration 4200, loss = 0.3733

Got 750 / 1000 correct (75.00%)

Starting epoch 6

Iteration 4900, loss = 0.8152

Got 697 / 1000 correct (69.70%)

Starting epoch 7

Iteration 5600, loss = 0.3667

Got 704 / 1000 correct (70.40%)

Starting epoch 8

Iteration 6300, loss = 0.4429

Got 753 / 1000 correct (75.30%)

Starting epoch 9

Iteration 7000, loss = 0.4751

Got 761 / 1000 correct (76.10%)

A model with 16 weight layers: 13 convolutional layers and 3 fully-connected layers (plus 3 pooling layers), trained with Adam.

After 10 epochs the final accuracy is 76.10%, so there is still plenty of room for improvement.