Deep Learning with Pytorch (Example implementations)

August 20, 2020

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

This post implements the examples and exercises in the book “Deep Learning with Pytorch” by Eli Stevens, Luca Antiga, and Thomas Viehmann.

What I love most about this intro-level book is its interesting hand-drawn diagrams, which illustrate different types of neural networks and the machine learning pipeline, and its use of real-world, real-scale machine learning problems as examples.

The difference between traditional machine learning (a.k.a feature engineering) and deep learning

dl_pytorch_c2_0

– From the book “Deep Learning with Pytorch”

Chapter I: Setting up the PyTorch environment

I very much like this lovely diagram, which shows the general end-to-end data flow pipeline of deep learning applications.

c1_diagram

– From the book “Deep Learning with Pytorch”

Input code

import os
import torch
torch.cuda.is_available()

Output results

False

Chapter II: Using pretrained models

When we only use pretrained deep learning models, we deal only with the inference process. I think this is actually a better way for novices to step into deep learning, because it first draws attention to what the network can achieve, and it separates the concern of data preprocessing from the training process, both of which can be tedious and complicated.

The inference process of image classification using deep learning

dl_pytorch_c2_1

– From the book “Deep Learning with Pytorch”

Image classification with AlexNet and ResNet101

Input code

import torchvision as vis

Input code

vis_models = vis.models
dir(vis_models)

Output results

['AlexNet',
 'DenseNet',
 'GoogLeNet',
 'GoogLeNetOutputs',
 'Inception3',
 'InceptionOutputs',
 'MNASNet',
 'MobileNetV2',
 'ResNet',
 'ShuffleNetV2',
 'SqueezeNet',
 'VGG',
 '_GoogLeNetOutputs',
 '_InceptionOutputs',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '_utils',
 'alexnet',
 'densenet',
 'densenet121',
 'densenet161',
 'densenet169',
 'densenet201',
 'detection',
 'googlenet',
 'inception',
 'inception_v3',
 'mnasnet',
 'mnasnet0_5',
 'mnasnet0_75',
 'mnasnet1_0',
 'mnasnet1_3',
 'mobilenet',
 'mobilenet_v2',
 'quantization',
 'resnet',
 'resnet101',
 'resnet152',
 'resnet18',
 'resnet34',
 'resnet50',
 'resnext101_32x8d',
 'resnext50_32x4d',
 'segmentation',
 'shufflenet_v2_x0_5',
 'shufflenet_v2_x1_0',
 'shufflenet_v2_x1_5',
 'shufflenet_v2_x2_0',
 'shufflenetv2',
 'squeezenet',
 'squeezenet1_0',
 'squeezenet1_1',
 'utils',
 'vgg',
 'vgg11',
 'vgg11_bn',
 'vgg13',
 'vgg13_bn',
 'vgg16',
 'vgg16_bn',
 'vgg19',
 'vgg19_bn',
 'video',
 'wide_resnet101_2',
 'wide_resnet50_2']

Input code

# AlexNet, untrained
alex_net = vis_models.AlexNet()
# AlexNet, pretrained
alex_net = vis_models.alexnet(pretrained=True)
# ResNet101, pretrained
res_net_101 = vis_models.resnet101(pretrained=True)

Input code

# Check network structure
alex_net

Output results

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

The network structure of AlexNet

dl_pytorch_c2_2_alexnet

– From the book “Deep Learning with Pytorch”

Input code

# Define preprocess transforms for input images
preprocess = vis.transforms.Compose([
    vis.transforms.Resize(256),
    vis.transforms.CenterCrop(224),
    vis.transforms.ToTensor(),
    vis.transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

Input code

# Preprocess a test image
import PIL as pil
import requests
from io import BytesIO

img_url = 'https://raw.githubusercontent.com/qutang/jupyter_notebook_articles/main/images/dog.jpg'
response = requests.get(img_url)
img = pil.Image.open(BytesIO(response.content))

img_t = preprocess(img)
img

Output results

<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=275x183 at 0x7F17F1B02750>

Input code

# Transform it into input batches
batch_t = torch.unsqueeze(img_t, 0)
batch_t.size()

Output results

torch.Size([1, 3, 224, 224])

Input code

# Load ImageNet classes
class_url = "https://gist.githubusercontent.com/ageitgey/4e1342c10a71981d0b491e1b8227328b/raw/24d78ea09a31fdff540a8494886e0051e3ad68f8/imagenet_classes.txt"
response = requests.get(class_url)
content = BytesIO(response.content).getvalue().decode("utf-8")
lines = content.split('\n')[4:]
labels = [line.split(',')[1].strip() for line in lines]
print(labels)

Output results

['tench', 'goldfish', 'great_white_shark', 'tiger_shark', 'hammerhead', 'electric_ray', 'stingray', 'cock', 'hen', 'ostrich', 'brambling', 'goldfinch', 'house_finch', 'junco', 'indigo_bunting', 'robin', 'bulbul', 'jay', 'magpie', 'chickadee', 'water_ouzel', 'kite', 'bald_eagle', 'vulture', 'great_grey_owl', 'European_fire_salamander', 'common_newt', 'eft', 'spotted_salamander', 'axolotl', 'bullfrog', 'tree_frog', 'tailed_frog', 'loggerhead', 'leatherback_turtle', 'mud_turtle', 'terrapin', 'box_turtle', 'banded_gecko', 'common_iguana', 'American_chameleon', 'whiptail', 'agama', 'frilled_lizard', 'alligator_lizard', 'Gila_monster', 'green_lizard', 'African_chameleon', 'Komodo_dragon', 'African_crocodile', 'American_alligator', 'triceratops', 'thunder_snake', 'ringneck_snake', 'hognose_snake', 'green_snake', 'king_snake', 'garter_snake', 'water_snake', 'vine_snake', 'night_snake', 'boa_constrictor', 'rock_python', 'Indian_cobra', 'green_mamba', 'sea_snake', 'horned_viper', 'diamondback', 'sidewinder', 'trilobite', 'harvestman', 'scorpion', 'black_and_gold_garden_spider', 'barn_spider', 'garden_spider', 'black_widow', 'tarantula', 'wolf_spider', 'tick', 'centipede', 'black_grouse', 'ptarmigan', 'ruffed_grouse', 'prairie_chicken', 'peacock', 'quail', 'partridge', 'African_grey', 'macaw', 'sulphur-crested_cockatoo', 'lorikeet', 'coucal', 'bee_eater', 'hornbill', 'hummingbird', 'jacamar', 'toucan', 'drake', 'red-breasted_merganser', 'goose', 'black_swan', 'tusker', 'echidna', 'platypus', 'wallaby', 'koala', 'wombat', 'jellyfish', 'sea_anemone', 'brain_coral', 'flatworm', 'nematode', 'conch', 'snail', 'slug', 'sea_slug', 'chiton', 'chambered_nautilus', 'Dungeness_crab', 'rock_crab', 'fiddler_crab', 'king_crab', 'American_lobster', 'spiny_lobster', 'crayfish', 'hermit_crab', 'isopod', 'white_stork', 'black_stork', 'spoonbill', 'flamingo', 'little_blue_heron', 'American_egret', 'bittern', 'crane', 'limpkin', 'European_gallinule', 'American_coot', 'bustard', 'ruddy_turnstone', 'red-backed_sandpiper', 'redshank', 'dowitcher', 'oystercatcher', 'pelican', 'king_penguin', 'albatross', 'grey_whale', 'killer_whale', 'dugong', 'sea_lion', 'Chihuahua', 'Japanese_spaniel', 'Maltese_dog', 'Pekinese', 'Shih-Tzu', 'Blenheim_spaniel', 'papillon', 'toy_terrier', 'Rhodesian_ridgeback', 'Afghan_hound', 'basset', 'beagle', 'bloodhound', 'bluetick', 'black-and-tan_coonhound', 'Walker_hound', 'English_foxhound', 'redbone', 'borzoi', 'Irish_wolfhound', 'Italian_greyhound', 'whippet', 'Ibizan_hound', 'Norwegian_elkhound', 'otterhound', 'Saluki', 'Scottish_deerhound', 'Weimaraner', 'Staffordshire_bullterrier', 'American_Staffordshire_terrier', 'Bedlington_terrier', 'Border_terrier', 'Kerry_blue_terrier', 'Irish_terrier', 'Norfolk_terrier', 'Norwich_terrier', 'Yorkshire_terrier', 'wire-haired_fox_terrier', 'Lakeland_terrier', 'Sealyham_terrier', 'Airedale', 'cairn', 'Australian_terrier', 'Dandie_Dinmont', 'Boston_bull', 'miniature_schnauzer', 'giant_schnauzer', 'standard_schnauzer', 'Scotch_terrier', 'Tibetan_terrier', 'silky_terrier', 'soft-coated_wheaten_terrier', 'West_Highland_white_terrier', 'Lhasa', 'flat-coated_retriever', 'curly-coated_retriever', 'golden_retriever', 'Labrador_retriever', 'Chesapeake_Bay_retriever', 'German_short-haired_pointer', 'vizsla', 'English_setter', 'Irish_setter', 'Gordon_setter', 'Brittany_spaniel', 'clumber', 'English_springer', 'Welsh_springer_spaniel', 'cocker_spaniel', 'Sussex_spaniel', 'Irish_water_spaniel', 'kuvasz', 'schipperke', 'groenendael', 'malinois', 'briard', 'kelpie', 
'komondor', 'Old_English_sheepdog', 'Shetland_sheepdog', 'collie', 'Border_collie', 'Bouvier_des_Flandres', 'Rottweiler', 'German_shepherd', 'Doberman', 'miniature_pinscher', 'Greater_Swiss_Mountain_dog', 'Bernese_mountain_dog', 'Appenzeller', 'EntleBucher', 'boxer', 'bull_mastiff', 'Tibetan_mastiff', 'French_bulldog', 'Great_Dane', 'Saint_Bernard', 'Eskimo_dog', 'malamute', 'Siberian_husky', 'dalmatian', 'affenpinscher', 'basenji', 'pug', 'Leonberg', 'Newfoundland', 'Great_Pyrenees', 'Samoyed', 'Pomeranian', 'chow', 'keeshond', 'Brabancon_griffon', 'Pembroke', 'Cardigan', 'toy_poodle', 'miniature_poodle', 'standard_poodle', 'Mexican_hairless', 'timber_wolf', 'white_wolf', 'red_wolf', 'coyote', 'dingo', 'dhole', 'African_hunting_dog', 'hyena', 'red_fox', 'kit_fox', 'Arctic_fox', 'grey_fox', 'tabby', 'tiger_cat', 'Persian_cat', 'Siamese_cat', 'Egyptian_cat', 'cougar', 'lynx', 'leopard', 'snow_leopard', 'jaguar', 'lion', 'tiger', 'cheetah', 'brown_bear', 'American_black_bear', 'ice_bear', 'sloth_bear', 'mongoose', 'meerkat', 'tiger_beetle', 'ladybug', 'ground_beetle', 'long-horned_beetle', 'leaf_beetle', 'dung_beetle', 'rhinoceros_beetle', 'weevil', 'fly', 'bee', 'ant', 'grasshopper', 'cricket', 'walking_stick', 'cockroach', 'mantis', 'cicada', 'leafhopper', 'lacewing', 'dragonfly', 'damselfly', 'admiral', 'ringlet', 'monarch', 'cabbage_butterfly', 'sulphur_butterfly', 'lycaenid', 'starfish', 'sea_urchin', 'sea_cucumber', 'wood_rabbit', 'hare', 'Angora', 'hamster', 'porcupine', 'fox_squirrel', 'marmot', 'beaver', 'guinea_pig', 'sorrel', 'zebra', 'hog', 'wild_boar', 'warthog', 'hippopotamus', 'ox', 'water_buffalo', 'bison', 'ram', 'bighorn', 'ibex', 'hartebeest', 'impala', 'gazelle', 'Arabian_camel', 'llama', 'weasel', 'mink', 'polecat', 'black-footed_ferret', 'otter', 'skunk', 'badger', 'armadillo', 'three-toed_sloth', 'orangutan', 'gorilla', 'chimpanzee', 'gibbon', 'siamang', 'guenon', 'patas', 'baboon', 'macaque', 'langur', 'colobus', 'proboscis_monkey', 'marmoset', 'capuchin', 'howler_monkey', 'titi', 'spider_monkey', 'squirrel_monkey', 'Madagascar_cat', 'indri', 'Indian_elephant', 'African_elephant', 'lesser_panda', 'giant_panda', 'barracouta', 'eel', 'coho', 'rock_beauty', 'anemone_fish', 'sturgeon', 'gar', 'lionfish', 'puffer', 'abacus', 'abaya', 'academic_gown', 'accordion', 'acoustic_guitar', 'aircraft_carrier', 'airliner', 'airship', 'altar', 'ambulance', 'amphibian', 'analog_clock', 'apiary', 'apron', 'ashcan', 'assault_rifle', 'backpack', 'bakery', 'balance_beam', 'balloon', 'ballpoint', 'Band_Aid', 'banjo', 'bannister', 'barbell', 'barber_chair', 'barbershop', 'barn', 'barometer', 'barrel', 'barrow', 'baseball', 'basketball', 'bassinet', 'bassoon', 'bathing_cap', 'bath_towel', 'bathtub', 'beach_wagon', 'beacon', 'beaker', 'bearskin', 'beer_bottle', 'beer_glass', 'bell_cote', 'bib', 'bicycle-built-for-two', 'bikini', 'binder', 'binoculars', 'birdhouse', 'boathouse', 'bobsled', 'bolo_tie', 'bonnet', 'bookcase', 'bookshop', 'bottlecap', 'bow', 'bow_tie', 'brass', 'brassiere', 'breakwater', 'breastplate', 'broom', 'bucket', 'buckle', 'bulletproof_vest', 'bullet_train', 'butcher_shop', 'cab', 'caldron', 'candle', 'cannon', 'canoe', 'can_opener', 'cardigan', 'car_mirror', 'carousel', "carpenter's_kit", 'carton', 'car_wheel', 'cash_machine', 'cassette', 'cassette_player', 'castle', 'catamaran', 'CD_player', 'cello', 'cellular_telephone', 'chain', 'chainlink_fence', 'chain_mail', 'chain_saw', 'chest', 'chiffonier', 'chime', 'china_cabinet', 'Christmas_stocking', 'church', 'cinema', 
'cleaver', 'cliff_dwelling', 'cloak', 'clog', 'cocktail_shaker', 'coffee_mug', 'coffeepot', 'coil', 'combination_lock', 'computer_keyboard', 'confectionery', 'container_ship', 'convertible', 'corkscrew', 'cornet', 'cowboy_boot', 'cowboy_hat', 'cradle', 'crane', 'crash_helmet', 'crate', 'crib', 'Crock_Pot', 'croquet_ball', 'crutch', 'cuirass', 'dam', 'desk', 'desktop_computer', 'dial_telephone', 'diaper', 'digital_clock', 'digital_watch', 'dining_table', 'dishrag', 'dishwasher', 'disk_brake', 'dock', 'dogsled', 'dome', 'doormat', 'drilling_platform', 'drum', 'drumstick', 'dumbbell', 'Dutch_oven', 'electric_fan', 'electric_guitar', 'electric_locomotive', 'entertainment_center', 'envelope', 'espresso_maker', 'face_powder', 'feather_boa', 'file', 'fireboat', 'fire_engine', 'fire_screen', 'flagpole', 'flute', 'folding_chair', 'football_helmet', 'forklift', 'fountain', 'fountain_pen', 'four-poster', 'freight_car', 'French_horn', 'frying_pan', 'fur_coat', 'garbage_truck', 'gasmask', 'gas_pump', 'goblet', 'go-kart', 'golf_ball', 'golfcart', 'gondola', 'gong', 'gown', 'grand_piano', 'greenhouse', 'grille', 'grocery_store', 'guillotine', 'hair_slide', 'hair_spray', 'half_track', 'hammer', 'hamper', 'hand_blower', 'hand-held_computer', 'handkerchief', 'hard_disc', 'harmonica', 'harp', 'harvester', 'hatchet', 'holster', 'home_theater', 'honeycomb', 'hook', 'hoopskirt', 'horizontal_bar', 'horse_cart', 'hourglass', 'iPod', 'iron', "jack-o'-lantern", 'jean', 'jeep', 'jersey', 'jigsaw_puzzle', 'jinrikisha', 'joystick', 'kimono', 'knee_pad', 'knot', 'lab_coat', 'ladle', 'lampshade', 'laptop', 'lawn_mower', 'lens_cap', 'letter_opener', 'library', 'lifeboat', 'lighter', 'limousine', 'liner', 'lipstick', 'Loafer', 'lotion', 'loudspeaker', 'loupe', 'lumbermill', 'magnetic_compass', 'mailbag', 'mailbox', 'maillot', 'maillot', 'manhole_cover', 'maraca', 'marimba', 'mask', 'matchstick', 'maypole', 'maze', 'measuring_cup', 'medicine_chest', 'megalith', 'microphone', 'microwave', 'military_uniform', 'milk_can', 'minibus', 'miniskirt', 'minivan', 'missile', 'mitten', 'mixing_bowl', 'mobile_home', 'Model_T', 'modem', 'monastery', 'monitor', 'moped', 'mortar', 'mortarboard', 'mosque', 'mosquito_net', 'motor_scooter', 'mountain_bike', 'mountain_tent', 'mouse', 'mousetrap', 'moving_van', 'muzzle', 'nail', 'neck_brace', 'necklace', 'nipple', 'notebook', 'obelisk', 'oboe', 'ocarina', 'odometer', 'oil_filter', 'organ', 'oscilloscope', 'overskirt', 'oxcart', 'oxygen_mask', 'packet', 'paddle', 'paddlewheel', 'padlock', 'paintbrush', 'pajama', 'palace', 'panpipe', 'paper_towel', 'parachute', 'parallel_bars', 'park_bench', 'parking_meter', 'passenger_car', 'patio', 'pay-phone', 'pedestal', 'pencil_box', 'pencil_sharpener', 'perfume', 'Petri_dish', 'photocopier', 'pick', 'pickelhaube', 'picket_fence', 'pickup', 'pier', 'piggy_bank', 'pill_bottle', 'pillow', 'ping-pong_ball', 'pinwheel', 'pirate', 'pitcher', 'plane', 'planetarium', 'plastic_bag', 'plate_rack', 'plow', 'plunger', 'Polaroid_camera', 'pole', 'police_van', 'poncho', 'pool_table', 'pop_bottle', 'pot', "potter's_wheel", 'power_drill', 'prayer_rug', 'printer', 'prison', 'projectile', 'projector', 'puck', 'punching_bag', 'purse', 'quill', 'quilt', 'racer', 'racket', 'radiator', 'radio', 'radio_telescope', 'rain_barrel', 'recreational_vehicle', 'reel', 'reflex_camera', 'refrigerator', 'remote_control', 'restaurant', 'revolver', 'rifle', 'rocking_chair', 'rotisserie', 'rubber_eraser', 'rugby_ball', 'rule', 'running_shoe', 'safe', 'safety_pin', 'saltshaker', 'sandal', 
'sarong', 'sax', 'scabbard', 'scale', 'school_bus', 'schooner', 'scoreboard', 'screen', 'screw', 'screwdriver', 'seat_belt', 'sewing_machine', 'shield', 'shoe_shop', 'shoji', 'shopping_basket', 'shopping_cart', 'shovel', 'shower_cap', 'shower_curtain', 'ski', 'ski_mask', 'sleeping_bag', 'slide_rule', 'sliding_door', 'slot', 'snorkel', 'snowmobile', 'snowplow', 'soap_dispenser', 'soccer_ball', 'sock', 'solar_dish', 'sombrero', 'soup_bowl', 'space_bar', 'space_heater', 'space_shuttle', 'spatula', 'speedboat', 'spider_web', 'spindle', 'sports_car', 'spotlight', 'stage', 'steam_locomotive', 'steel_arch_bridge', 'steel_drum', 'stethoscope', 'stole', 'stone_wall', 'stopwatch', 'stove', 'strainer', 'streetcar', 'stretcher', 'studio_couch', 'stupa', 'submarine', 'suit', 'sundial', 'sunglass', 'sunglasses', 'sunscreen', 'suspension_bridge', 'swab', 'sweatshirt', 'swimming_trunks', 'swing', 'switch', 'syringe', 'table_lamp', 'tank', 'tape_player', 'teapot', 'teddy', 'television', 'tennis_ball', 'thatch', 'theater_curtain', 'thimble', 'thresher', 'throne', 'tile_roof', 'toaster', 'tobacco_shop', 'toilet_seat', 'torch', 'totem_pole', 'tow_truck', 'toyshop', 'tractor', 'trailer_truck', 'tray', 'trench_coat', 'tricycle', 'trimaran', 'tripod', 'triumphal_arch', 'trolleybus', 'trombone', 'tub', 'turnstile', 'typewriter_keyboard', 'umbrella', 'unicycle', 'upright', 'vacuum', 'vase', 'vault', 'velvet', 'vending_machine', 'vestment', 'viaduct', 'violin', 'volleyball', 'waffle_iron', 'wall_clock', 'wallet', 'wardrobe', 'warplane', 'washbasin', 'washer', 'water_bottle', 'water_jug', 'water_tower', 'whiskey_jug', 'whistle', 'wig', 'window_screen', 'window_shade', 'Windsor_tie', 'wine_bottle', 'wing', 'wok', 'wooden_spoon', 'wool', 'worm_fence', 'wreck', 'yawl', 'yurt', 'web_site', 'comic_book', 'crossword_puzzle', 'street_sign', 'traffic_light', 'book_jacket', 'menu', 'plate', 'guacamole', 'consomme', 'hot_pot', 'trifle', 'ice_cream', 'ice_lolly', 'French_loaf', 'bagel', 'pretzel', 'cheeseburger', 'hotdog', 'mashed_potato', 'head_cabbage', 'broccoli', 'cauliflower', 'zucchini', 'spaghetti_squash', 'acorn_squash', 'butternut_squash', 'cucumber', 'artichoke', 'bell_pepper', 'cardoon', 'mushroom', 'Granny_Smith', 'strawberry', 'orange', 'lemon', 'fig', 'pineapple', 'banana', 'jackfruit', 'custard_apple', 'pomegranate', 'hay', 'carbonara', 'chocolate_sauce', 'dough', 'meat_loaf', 'pizza', 'potpie', 'burrito', 'red_wine', 'espresso', 'cup', 'eggnog', 'alp', 'bubble', 'cliff', 'coral_reef', 'geyser', 'lakeside', 'promontory', 'sandbar', 'seashore', 'valley', 'volcano', 'ballplayer', 'groom', 'scuba_diver', 'rapeseed', 'daisy', "yellow_lady's_slipper", 'corn', 'acorn', 'hip', 'buckeye', 'coral_fungus', 'agaric', 'gyromitra', 'stinkhorn', 'earthstar', 'hen-of-the-woods', 'bolete', 'ear', 'toilet_tissue']

Input code

# Run pretrained AlexNet on the test image and produce top 5 predictions
alex_net.eval()
out = alex_net(batch_t)
out_perc = torch.nn.functional.softmax(out, dim=1)[0] * 100

sorted_scores, sorted_indices = torch.sort(out, descending=True)

[(labels[i], out_perc[i].item()) for i in sorted_indices[0][:5]]

Output results

[('Mexican_hairless', 52.925865173339844),
 ('Staffordshire_bullterrier', 22.006467819213867),
 ('Labrador_retriever', 3.8847944736480713),
 ('American_Staffordshire_terrier', 3.3939406871795654),
 ('German_short-haired_pointer', 3.3705554008483887)]

Input code

# Run pretrained ResNet101 on the test image and produce top 5 predictions
res_net_101.eval()
out = res_net_101(batch_t)
out_perc = torch.nn.functional.softmax(out, dim=1)[0] * 100

sorted_scores, sorted_indices = torch.sort(out, descending=True)

[(labels[i], out_perc[i].item()) for i in sorted_indices[0][:5]]

Output results

[('Staffordshire_bullterrier', 48.7598762512207),
 ('pug', 21.433822631835938),
 ('French_bulldog', 19.77810287475586),
 ('bull_mastiff', 3.6309268474578857),
 ('Brabancon_griffon', 3.0445988178253174)]

Generating realistic-looking fake images using GAN networks (CycleGAN)

A GAN game is an adversarial process in which two networks compete with each other in a cheating game: the “generator” network is the cheater, while the “discriminator” network is the wise man trying to catch it. When it succeeds, the “generator” network can produce real-looking images that the “discriminator” network cannot tell from real ones.
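
To make the game concrete, here is a minimal, hypothetical sketch (not from the book) of one adversarial training step: a toy generator and discriminator, binary cross-entropy losses, and random stand-in data. All network shapes and variable names are made up for illustration only.

import torch
import torch.nn as nn

# Toy networks: generator maps noise to a fake sample, discriminator outputs a realness logit
G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

loss_fn = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

real = torch.randn(8, 2) + 3.0   # stand-in "real" data
noise = torch.randn(8, 16)

# Discriminator step: label real samples 1 and generated samples 0
opt_d.zero_grad()
d_loss = loss_fn(D(real), torch.ones(8, 1)) + \
         loss_fn(D(G(noise).detach()), torch.zeros(8, 1))
d_loss.backward()
opt_d.step()

# Generator step: try to fool the discriminator into outputting 1
opt_g.zero_grad()
g_loss = loss_fn(D(G(noise)), torch.ones(8, 1))
g_loss.backward()
opt_g.step()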

The process of GAN game

dl_pytorch_c2_3_gan

– From the book “Deep Learning with Pytorch”

CycleGAN

Different from a normal GAN, CycleGAN can learn to generate fake images in a different domain, using training data drawn from two different domains.
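
The idea that ties the two domains together is the cycle-consistency loss: translating an image from domain A to B and back to A should reproduce the original image. Below is a minimal, hypothetical sketch of that loss; the two stand-in "generators" here are single convolution layers, whereas the real CycleGAN generators are the ResNet-based networks shown later.

import torch
import torch.nn as nn

# Stand-in generators: G_AB maps domain A -> B, G_BA maps B -> A
G_AB = nn.Conv2d(3, 3, kernel_size=3, padding=1)
G_BA = nn.Conv2d(3, 3, kernel_size=3, padding=1)

real_a = torch.randn(1, 3, 64, 64)
fake_b = G_AB(real_a)
recovered_a = G_BA(fake_b)

# Cycle-consistency loss: A -> B -> A should reproduce the original image
cycle_loss = nn.functional.l1_loss(recovered_a, real_a)
print(cycle_loss.item())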

The process of CycleGAN game

– From the book “Deep Learning with PyTorch”


Input code

# The implementation of CycleGAN generator using ResNet
# See https://github.com/deep-learning-with-pytorch/dlwpt-code/blob/master/p1ch2/3_cyclegan.ipynb
import torch.nn as nn

class ResNetBlock(nn.Module): # <1>

    def __init__(self, dim):
        super(ResNetBlock, self).__init__()
        self.conv_block = self.build_conv_block(dim)

    def build_conv_block(self, dim):
        conv_block = []

        conv_block += [nn.ReflectionPad2d(1)]

        conv_block += [nn.Conv2d(dim, dim, kernel_size=3, padding=0, bias=True),
                       nn.InstanceNorm2d(dim),
                       nn.ReLU(True)]

        conv_block += [nn.ReflectionPad2d(1)]

        conv_block += [nn.Conv2d(dim, dim, kernel_size=3, padding=0, bias=True),
                       nn.InstanceNorm2d(dim)]

        return nn.Sequential(*conv_block)

    def forward(self, x):
        out = x + self.conv_block(x) # <2>
        return out


class ResNetGenerator(nn.Module):

    def __init__(self, input_nc=3, output_nc=3, ngf=64, n_blocks=9): # <3> 

        assert(n_blocks >= 0)
        super(ResNetGenerator, self).__init__()

        self.input_nc = input_nc
        self.output_nc = output_nc
        self.ngf = ngf

        model = [nn.ReflectionPad2d(3),
                 nn.Conv2d(input_nc, ngf, kernel_size=7, padding=0, bias=True),
                 nn.InstanceNorm2d(ngf),
                 nn.ReLU(True)]

        n_downsampling = 2
        for i in range(n_downsampling):
            mult = 2**i
            model += [nn.Conv2d(ngf * mult, ngf * mult * 2, kernel_size=3,
                                stride=2, padding=1, bias=True),
                      nn.InstanceNorm2d(ngf * mult * 2),
                      nn.ReLU(True)]

        mult = 2**n_downsampling
        for i in range(n_blocks):
            model += [ResNetBlock(ngf * mult)]

        for i in range(n_downsampling):
            mult = 2**(n_downsampling - i)
            model += [nn.ConvTranspose2d(ngf * mult, int(ngf * mult / 2),
                                         kernel_size=3, stride=2,
                                         padding=1, output_padding=1,
                                         bias=True),
                      nn.InstanceNorm2d(int(ngf * mult / 2)),
                      nn.ReLU(True)]

        model += [nn.ReflectionPad2d(3)]
        model += [nn.Conv2d(ngf, output_nc, kernel_size=7, padding=0)]
        model += [nn.Tanh()]

        self.model = nn.Sequential(*model)

    def forward(self, input): # <3>
        return self.model(input)

Input code

# The structure of the generator is complicated.

netG = ResNetGenerator()
netG

Output results

ResNetGenerator(
  (model): Sequential(
    (0): ReflectionPad2d((3, 3, 3, 3))
    (1): Conv2d(3, 64, kernel_size=(7, 7), stride=(1, 1))
    (2): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
    (3): ReLU(inplace=True)
    (4): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (5): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (8): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
    (9): ReLU(inplace=True)
    (10): ResNetBlock(
      (conv_block): Sequential(
        (0): ReflectionPad2d((1, 1, 1, 1))
        (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
        (2): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (3): ReLU(inplace=True)
        (4): ReflectionPad2d((1, 1, 1, 1))
        (5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
        (6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      )
    )
    (11): ResNetBlock(
      (conv_block): Sequential(
        (0): ReflectionPad2d((1, 1, 1, 1))
        (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
        (2): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (3): ReLU(inplace=True)
        (4): ReflectionPad2d((1, 1, 1, 1))
        (5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
        (6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      )
    )
    (12): ResNetBlock(
      (conv_block): Sequential(
        (0): ReflectionPad2d((1, 1, 1, 1))
        (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
        (2): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (3): ReLU(inplace=True)
        (4): ReflectionPad2d((1, 1, 1, 1))
        (5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
        (6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      )
    )
    (13): ResNetBlock(
      (conv_block): Sequential(
        (0): ReflectionPad2d((1, 1, 1, 1))
        (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
        (2): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (3): ReLU(inplace=True)
        (4): ReflectionPad2d((1, 1, 1, 1))
        (5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
        (6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      )
    )
    (14): ResNetBlock(
      (conv_block): Sequential(
        (0): ReflectionPad2d((1, 1, 1, 1))
        (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
        (2): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (3): ReLU(inplace=True)
        (4): ReflectionPad2d((1, 1, 1, 1))
        (5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
        (6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      )
    )
    (15): ResNetBlock(
      (conv_block): Sequential(
        (0): ReflectionPad2d((1, 1, 1, 1))
        (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
        (2): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (3): ReLU(inplace=True)
        (4): ReflectionPad2d((1, 1, 1, 1))
        (5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
        (6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      )
    )
    (16): ResNetBlock(
      (conv_block): Sequential(
        (0): ReflectionPad2d((1, 1, 1, 1))
        (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
        (2): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (3): ReLU(inplace=True)
        (4): ReflectionPad2d((1, 1, 1, 1))
        (5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
        (6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      )
    )
    (17): ResNetBlock(
      (conv_block): Sequential(
        (0): ReflectionPad2d((1, 1, 1, 1))
        (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
        (2): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (3): ReLU(inplace=True)
        (4): ReflectionPad2d((1, 1, 1, 1))
        (5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
        (6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      )
    )
    (18): ResNetBlock(
      (conv_block): Sequential(
        (0): ReflectionPad2d((1, 1, 1, 1))
        (1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
        (2): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
        (3): ReLU(inplace=True)
        (4): ReflectionPad2d((1, 1, 1, 1))
        (5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1))
        (6): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
      )
    )
    (19): ConvTranspose2d(256, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
    (20): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
    (21): ReLU(inplace=True)
    (22): ConvTranspose2d(128, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), output_padding=(1, 1))
    (23): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)
    (24): ReLU(inplace=True)
    (25): ReflectionPad2d((3, 3, 3, 3))
    (26): Conv2d(64, 3, kernel_size=(7, 7), stride=(1, 1))
    (27): Tanh()
  )
)

Input code

# Load pretrained horse-to-zebra CycleGAN model parameters
# See https://github.com/deep-learning-with-pytorch/dlwpt-code/tree/master/data/p1ch2

model_url = 'https://github.com/deep-learning-with-pytorch/dlwpt-code/raw/master/data/p1ch2/horse2zebra_0.4.0.pth'
response = requests.get(model_url)
model_data = torch.load(BytesIO(response.content))
netG.load_state_dict(model_data)

Output results

<All keys matched successfully>

Input code

# Again, prepare the preprocessing transforms for input images
# A simple resize is good enough

preprocess = vis.transforms.Compose([
    vis.transforms.Resize(256),
    vis.transforms.ToTensor()
])

Input code

# Load the test image we downloaded

img_url = 'https://raw.githubusercontent.com/qutang/jupyter_notebook_articles/main/images/horse.jpg'
response = requests.get(img_url)
img = pil.Image.open(BytesIO(response.content))
img

Output results

<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1920x1120 at 0x7F17F1AB9B50>

Input code

# Prepare the image as network input
img_t = preprocess(img)
batch_t = torch.unsqueeze(img_t, 0)

Input code

# Run the pretrained generator on the image
netG.eval()
batch_out = netG(batch_t)

# Transform the output back to an image
out_t = (batch_out.data.squeeze() + 1.0) / 2.0
out_img = vis.transforms.ToPILImage()(out_t)
out_img

Output results

<PIL.Image.Image image mode=RGB size=440x256 at 0x7F17F1BA6410>

Generating scene descriptions using a captioning model

Unlike the CNN models used for image classification and the GAN models used for image generation, the captioning model uses an RNN (recurrent neural network) for text generation, preceded by a CNN that processes the input image.

“Recurrent” means that the output of the previous forward pass serves as part of the input to the current forward pass.
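
Here is a minimal sketch of that recurrence using a plain RNNCell; the sizes (4 time steps, input size 3, hidden size 5) are made up and only meant to show the hidden state being fed back into the next step.

import torch

x_seq = torch.randn(4, 3)                         # 4 time steps, input dim 3
rnn_cell = torch.nn.RNNCell(input_size=3, hidden_size=5)

h = torch.zeros(1, 5)                             # initial hidden state
for t in range(x_seq.shape[0]):
    # the hidden state produced by the previous step is fed back in
    h = rnn_cell(x_seq[t].unsqueeze(0), h)
print(h.shape)                                    # torch.Size([1, 5])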

Concept of a captioning model

dl_pytorch_c2_5_caption

– From the Book “Deep Learning with Pytorch”

The NeuralTalk2 captioning model

The model[1] was proposed by Andrej Karpathy and Li Fei-Fei. Here I use the implementation[2] provided by the book, with mostly the default settings. Please visit the repo to see the full list of available options. My forked repo[3] applies two fixes to the original code.

  1. Fixed the error when loading the infos pickle file on Windows.
  2. Support ~ in input file paths.

Input code

# Run the pretrained model; it will caption the two example images used before and also the zebra image we generated with the CycleGAN model
!cd ../ImageCaption.pytorch && python eval.py --model ./data/FC/fc-model.pth --infos_path ./data/FC/fc-infos.pkl --image_folder ../jupyter_notebook_articles/images/

Output results

/opt/venv/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
Done. Saved -1 bytes.
DataLoaderRaw loading images from folder:  ../jupyter_notebook_articles/images/
0
listing all images in directory ../jupyter_notebook_articles/images/
DataLoaderRaw found  19  images

Input code

# Display the results
import json
from IPython.display import display
with open('../ImageCaption.pytorch/vis/vis.json', mode='r') as f:
    result = json.load(f)
img_files = [os.path.join('../jupyter_notebook_articles/images/', name) for name in os.listdir('../jupyter_notebook_articles/images/')]

for caption, img_file in zip(result, img_files):
    img = pil.Image.open(img_file)
    display(img)
    print(caption['caption'])

Using models via PyTorch Hub

Here we use the RoBERTa transformer model[4] (based on BERT) as an example.


Input code

# Load the RoBERTa transformer model via PyTorch Hub
import torch
roberta = torch.hub.load('pytorch/fairseq', 'roberta.large.mnli')

Output results

Using cache found in /home/jovyan/.cache/torch/hub/pytorch_fairseq_master
/opt/venv/lib/python3.7/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0

Input code

# Use the RoBERTa model to encode a sentence into a sequence of token IDs
tokens = roberta.encode('I am running!')
print(tokens.tolist())
# The token IDs can be decoded back to the text
roberta.decode(tokens)

Output results

[0, 100, 524, 878, 328, 2]

Output results

'I am running!'

Input code

tokens2 = roberta.encode('I am walking!')
print(tokens2.tolist())
# The token IDs can be decoded back to the text
roberta.decode(tokens2)

Input code

tokens3 = roberta.encode('I run very fast!')
print(tokens3.tolist())
# The token IDs can be decoded back to the text
roberta.decode(tokens3)

Input code

# The RoBERTa model can also be used to classify if two sentences have similar or contradictory meanings
with torch.no_grad():
    inputs = ['I am running!', 'I am walking!']
    tokens = roberta.encode(*inputs)
    prediction = roberta.predict('mnli', tokens).argmax().item()
    print(prediction) # 0 means contradictory

    inputs = ['I am running!', 'I run very fast!']
    tokens = roberta.encode(*inputs)
    prediction = roberta.predict('mnli', tokens).argmax().item()
    print(prediction) # MNLI labels: 0 = contradiction, 1 = neutral, 2 = entailment

Exercises

Input code

# Run horse2zebra on the dog image, we'll have to use the preprocess transforms for CycleGAN
preprocess = vis.transforms.Compose([
    vis.transforms.Resize(256),
    vis.transforms.ToTensor()
])

img_url = 'https://raw.githubusercontent.com/qutang/jupyter_notebook_articles/main/images/dog.jpg'
response = requests.get(img_url)
img = pil.Image.open(BytesIO(response.content))
img_t = preprocess(img)
batch_t = torch.unsqueeze(img_t, 0)

batch_out = netG(batch_t)
out_t = (batch_out.data.squeeze() + 1.0) / 2.0
vis.transforms.ToPILImage()(out_t)

Output results

<PIL.Image.Image image mode=RGB size=384x256 at 0x7F17F1CA3810>

Search GitHub to see how many files contain hubconf.

As of now (2020-08-22), there are 932 files on GitHub that include the word hubconf.

Let’s look at the Google Trends data for PyTorch Hub.

Input code

# Load the Google Trends data up to today
import datetime as dt
import pandas as pd
trend = pd.read_csv('https://raw.githubusercontent.com/qutang/jupyter_notebook_articles/main/data/google_trend_pytorch_hub.csv', parse_dates=[0])
# Plot the trend
trend.plot(x='Month', y=1)

Output results

<AxesSubplot:xlabel='Month'>

<Figure size 432x288 with 1 Axes>

The drop between the end of 2019 and the first half of 2020 coincides with the outbreak of the pandemic.

Input code

# Let's see which countries top the trend
import pandas as pd
region = pd.read_csv('https://raw.githubusercontent.com/qutang/jupyter_notebook_articles/main/data/google_trend_pytorch_hub_region.csv', header=0).fillna(0)
region.head(n=5)

Output results

                Country  pytorch hub: (1/1/04 - 8/22/20)
0                 China                            100.0
1                 India                             16.0
2         United States                             15.0
3               Georgia                              0.0
4  Bosnia & Herzegovina                              0.0


Chapter III: All as tensors

Tensors are the numerical abstraction for anything with arbitrary dimensions, including the inputs, the intermediate representations in the network, and the outputs. A tensor is a multidimensional array, typically of floating-point numbers.

Deep learning may be viewed as numerical transformations of tensors.

dl_pytorch_c3_0

– From the book “Deep Learning with Pytorch”

Tensors serve as representations for data with arbitrary dimensions.

dl_pytorch_c3_1

– From the book “Deep Learning with Pytorch”

Comparison between Pytorch tensors and NumPy arrays.

  • Both can represent data with arbitrary dimensions.
  • Both support rich manipulations on the data and share almost the same APIs.
  • Pytorch supports GPU accelerated operations directly on the tensors.
  • Pytorch supports backtracing of the computational graph applied on the tensors.
  • NumPy has excellent companion extension libraries such as SciPy, Scikit-learn, and Pandas.
  • Pytorch tensors can be easily converted back and forth with NumPy arrays.

Basic operations and in-memory representation

Operations on tensors return another tensor, and in many cases the returned tensor points to the same memory as the original tensor. This avoids copying data in memory, which hugely improves computational performance. However, users have to be careful, because such operations may change the underlying values seen by other tensors sharing that memory.

Input code

import torch
# Create a 1-D tensor with values 1 to 4
a = torch.tensor(range(4)) + 1
print(a)
# Indexing
a[0]
# Change values via assignment, notice how the value of b has also been changed.
b = a[1]
a[1] = 0
print(a)
print(b)
# Check type of the data, indexing returns a 0-dimensional tensor a.k.a the scalar
print(type(a[1]))
# Convert the 0-dimensional tensor to a python number
print(float(a[1]))
# check the memory address

# create another tensor in 2D
c = a.reshape(2, 2)
print(c.shape)

# Truly copy the tensor
d = a.clone()
d[1] = 2
print(d)
print(a)

Output results

tensor([1, 2, 3, 4])
tensor([1, 0, 3, 4])
tensor(0)
<class 'torch.Tensor'>
0.0
torch.Size([2, 2])
tensor([1, 2, 3, 4])
tensor([1, 0, 3, 4])

Tensors and NumPy arrays are contiguous blocks of bytes in memory, unlike Python list objects, in which each item is stored as a separate object in memory.

dl_pytorch_c3_2

– From the book “Deep Learning with Pytorch”

Tensors are views of the underlying torch.Storage instances. Therefore, two tensors may share the same Storage instance while indexing into it in different ways.

dl_pytorch_c3_3

– From the book “Deep Learning with Pytorch”


Input code

# You may access the underlying storage instance via the storage() method
# The storage is independent of the tensor's dimensions and always stores the data as a single one-dimensional contiguous array.
print(a.storage())
print(b.storage())
print(c.storage())
print(d.storage())
# The storage is indexable
print(a.storage()[0])
print(c.storage()[0])

Output results

 1
 0
 3
 4
[torch.LongStorage of size 4]
 1
 0
 3
 4
[torch.LongStorage of size 4]
 1
 0
 3
 4
[torch.LongStorage of size 4]
 1
 2
 3
 4
[torch.LongStorage of size 4]
1
1

Tensors use size, offset, and stride to define the indexing rules to the underlying storage instance.

dl_pytorch_c3_4

– From the book “Deep Learning with Pytorch”

“The size (or shape, in NumPy parlance) is a tuple indicating how many elements across each dimension the tensor represents. The storage offset is the index in the storage corresponding to the first element in the tensor. The stride is the number of elements in the storage that need to be skipped over to obtain the next element along each dimension.”

Most tensor shape or indexing transformations do not require memory reallocation of the data, but simply a math transformation of the size, offset, and stride.

For example, when applying transpose to a tensor, there will be no memory reallocation of data. Instead, the values in the stride tuple will be swapped.
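
A quick sketch of this (using the torch import above and a made-up 2 x 3 tensor): transposing swaps the stride values, while the underlying storage stays the same.

m = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
mt = m.t()
# The strides are swapped, but both tensors point at the same storage (no data copied)
print(m.stride(), mt.stride())
print(m.storage().data_ptr() == mt.storage().data_ptr())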

dl_pytorch_c3_5

– From the book “Deep Learning with Pytorch”


Input code

# Check the different size, offset, and stride in two tensors with the same underlying storage instance
print(a.shape, a.storage_offset(), a.stride())
print(c.shape, c.storage_offset(), c.stride())

Output results

torch.Size([4]) 0 (1,)
torch.Size([2, 2]) 0 (2, 1)

Naming the tensor dimensions and dimension matching

This is a handy way for humans to keep track of the tensors as we go through a lot of transformations.

Input code

# A typical single RGB image tensor, 3 channels x 5 x 5
img_t = torch.randn(3, 5, 5)
weights = torch.tensor([0.2126, 0.7152, 0.0722], names=['channels']) # These values will be applied to each channel
print(weights)
# A typical batch of RGB image tensors, 2 images x 3 channels x 5 x 5
batch_t = torch.randn(2, 3, 5, 5)
# The index of the channel dimension is different in img_t and batch_t,
# so we had better add consistent names to img_t and batch_t
img_t = img_t.refine_names('channels', 'width', 'height')
print(img_t.names)
batch_t = batch_t.refine_names(None, 'channels', 'width', 'height')
print(batch_t.names)
# Convert the RGB image to grayscale
# First make sure the weights tensor has the same dimension as img_t
weights_aligned = weights.align_as(img_t)
print(weights_aligned.shape, weights_aligned.names)
# Apply transformation
grayscale_t = (img_t * weights_aligned).sum('channels')
print(grayscale_t.shape, grayscale_t.names)
# Convert the grayscale to have the channel dimension
grayscale_t = grayscale_t.align_as(img_t)
print(grayscale_t.shape, grayscale_t.names)

# Note that not all operations support named tensors, so you have to know how to convert it back to unnamed tensors
grayscale_t = grayscale_t.rename(None)
print(grayscale_t.shape, grayscale_t.names)

Output results

tensor([0.2126, 0.7152, 0.0722], names=('channels',))
('channels', 'width', 'height')
(None, 'channels', 'width', 'height')
torch.Size([3, 1, 1]) ('channels', 'width', 'height')
torch.Size([5, 5]) ('width', 'height')
torch.Size([1, 5, 5]) ('channels', 'width', 'height')
torch.Size([1, 5, 5]) (None, None, None)
/opt/venv/lib/python3.7/site-packages/ipykernel_launcher.py:3: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /pytorch/c10/core/TensorImpl.h:848.)
  This is separate from the ipykernel package so we can avoid doing imports until

Tensor data types and impact on model performance

Why does PyTorch need dedicated data types?

  • Numbers in Python are full objects, occupying more memory than plain floating-point numbers.
  • Lists in Python are collections of objects, not just numbers.
  • Python is interpreted and runs slowly on math operations.

PyTorch supports the floating-point and integer data types that NumPy supports. While most modern CPUs use at least 32-bit floating-point numbers, modern GPUs can also operate on 16-bit floating-point numbers. Integer tensors are created with the int64 type by default, which supports indexing into tensors with millions of elements.
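
A small sketch (not from the book) of the default dtypes and of explicit casting; the variable names are made up.

import torch

# Floating-point tensors default to float32; integer tensors default to int64
f = torch.ones(2, 2)
i = torch.tensor([1, 2, 3])
print(f.dtype, i.dtype)

# Explicit dtypes and casting
d = torch.zeros(2, 2, dtype=torch.double)  # 64-bit float
h = f.to(torch.float16)                    # 16-bit float, mainly useful on GPUs
print(d.dtype, h.dtype)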

The performance impact of data types:

  • 64-bit floating-point numbers do not really improve the performance over 32-bit floating-point numbers.
  • 16-bit floating-point numbers can cut the model size in half with a slight sacrifice in performance (see the sketch below).
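
A rough sketch of that size reduction, assuming the alex_net model and the vis_models alias from the Chapter II cells are still in scope; the printed numbers depend on the model.

# Parameter memory in bytes for the float32 model
size_fp32 = sum(p.numel() * p.element_size() for p in alex_net.parameters())
# half() converts all floating-point parameters to float16
alex_net_fp16 = vis_models.alexnet(pretrained=True).half()
size_fp16 = sum(p.numel() * p.element_size() for p in alex_net_fp16.parameters())
print(size_fp32 / 1e6, size_fp16 / 1e6)  # megabytes; roughly a 2x reduction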

Contiguous tensors

This is something that I am only now getting. Here’s some explanation from the PyTorch forum.

Contiguous is the term used to indicate that the memory layout of a tensor does not align with its advertised meta-data or shape information.

https://discuss.pytorch.org/t/contigious-vs-non-contigious-tensor/30107/7


This basically means that the order of the data in memory matches the order implied by the tensor’s metadata (size, offset, stride). So a contiguous tensor is typically a tensor that has just been created and has not had any transformation applied that changes its stride, size, or offset.

Input code

### Check contiguous status
print(a.is_contiguous())
print(b.is_contiguous())
print(c.is_contiguous())
print(d.is_contiguous())
# transpose 
e = c.transpose(0, 1)
print(c.stride())
print(e.stride())
print(e.is_contiguous())

Output results

True
True
True
True
(2, 1)
(1, 2)
False

Interoperability with GPUs and NumPy

With a GPU, all operations run on the GPU, but accessing and printing values wakes up the CPU. All APIs are available for GPU-enabled tensors, and the storage instance of a CPU tensor is copied to the GPU’s RAM when the tensor is moved there.

With NumPy, the NumPy array and the tensor share the same storage instance. Note that floating-point numbers in a NumPy array are 64-bit by default, so after converting back from NumPy, make sure to cast the tensor to torch.float32 to save computational power.

Input code

# To GPU, no GPU available so error raised
test = torch.rand(3)
try:    
    test.to(device='cuda:0')
except AssertionError as e:
    print(e)

# To NumPy
test_np = test.numpy()
print(test_np.dtype)

# From NumPy
import numpy as np
test_np = np.random.rand(3)
print(test_np.dtype)
test = torch.from_numpy(test_np)
print(test.dtype)
test = test.to(dtype=torch.float32)
print(test.dtype)

PyTorch uses a dispatching mechanism to make the operations interoperate over data stored on different backends.

dl_pytorch_c3_6

– From the book “Deep Learning with Pytorch”

Conditional indexing on tensors

We may use a bool tensor as indices to index into a tensor.

Input code

# Create a tensor 
a = torch.tensor([[1, 2, 3], [4, 5, 6]]).float()
print(a.shape, a.dtype)
# Indexing the first row using a bool array
mask = torch.tensor([1, 0]).bool()
print(mask.shape, mask.dtype)
first_row = a[mask, :]
print(first_row)

# Create bool array using comparison operators
mask2 = a < 3
a_lt_2 = a[mask2]
print(a_lt_2)

Output results

torch.Size([2, 3]) torch.float32
torch.Size([2]) torch.bool
tensor([[1., 2., 3.]])
tensor([1., 2.])

Serializing tensors

Tensors can be serialized to pickle files, to the HDF5 format, or to a readable CSV format.

HDF5 is a portable, widely supported format for representing serialized multidimensional arrays, organized in a nested key-value dictionary.

Input code

# Save as pickle
torch.save(a, 'a.t')
# Load pickle
a = torch.load('a.t')
print(a)
os.remove('a.t')

Output results

tensor([[1., 2., 3.],
        [4., 5., 6.]])

Input code

# Save as hdf5
import h5py
with h5py.File('a.hdf5', 'w') as f:
    dset = f.create_dataset('test', data=a.numpy())

# Load hdf5
with h5py.File('a.hdf5', 'r') as f:  
    test_h5 = f['test']
    print(torch.from_numpy(test_h5[:]))
os.remove('a.hdf5')

Output results

tensor([[1., 2., 3.],
        [4., 5., 6.]])

Exercise

Input code

# cos needs floating-point numbers
a_32 = a.to(torch.float32)
print(a_32.cos())
print(a_32.sqrt())
print(a_32.storage())
# inplace
torch.cos(a_32, out=a_32)
print(a_32.storage())
# or
a_32.cos_()
print(a_32.storage())

Output results

tensor([[ 0.5403, -0.4161, -0.9900],
        [-0.6536,  0.2837,  0.9602]])
tensor([[1.0000, 1.4142, 1.7321],
        [2.0000, 2.2361, 2.4495]])
 1.0
 2.0
 3.0
 4.0
 5.0
 6.0
[torch.FloatStorage of size 6]
 0.5403023362159729
 -0.416146844625473
 -0.9899924993515015
 -0.6536436080932617
 0.28366219997406006
 0.9601702690124512
[torch.FloatStorage of size 6]
 0.8575531840324402
 0.9146533012390137
 0.5486961603164673
 0.7938734292984009
 0.9600369334220886
 0.5733804702758789
[torch.FloatStorage of size 6]

Chapter IV: Turn real-world data into tensors

Dealing with images

Convert RGB images to tensors. Note that in PyTorch the dimensions are ordered as channels x height x width, while in TensorFlow they are by default ordered as height x width x channels.

Input code

# Read in an image
import imageio
import torch
img = imageio.imread('https://raw.githubusercontent.com/qutang/jupyter_notebook_articles/main/images/dog.jpg')
# shape is in height x width x channels
print(img.shape)
# pytorch tensors should have the dimensions formed as channels x height x width, so we need to rearrange the dimensions
img_t = torch.from_numpy(img)
print(img_t.shape)
# rearrange
img_t = img_t.permute(2, 0, 1)
print(img_t.shape)
# Put the image in a batch by adding a new dimension at dimension 0, for pytorch it becomes batch x channels x height x width
batch_t = img_t.unsqueeze(0)
batch_t = batch_t.expand(5, -1, -1, -1)
print(batch_t.shape)
# Or create a batch_t at first and then copy the image to the batch
batch_t = torch.zeros(5, 3, 183, 275)
batch_t[0] = img_t
print(batch_t.shape)
# Images by default are stored as unsigned 8-bit integers. 
print(batch_t[0, 0, 0, 0].dtype)
# We need to convert them into floating point numbers.
batch_t = batch_t.float()
print(batch_t[0, 0, 0, 0].dtype)
# And better normalize them to [0, 1] or [-1, 1] which is required for efficient optimization when training the neural network
# Normalize them to [0, 1]
batch_t = batch_t / 255.0
# Standardize them to have zero mean and unit standard deviation
# The mean is computed for each channel across the batch
mean_c_t = batch_t.mean([0, 2, 3]).unsqueeze(1).unsqueeze(1)
std_c_t = batch_t.std([0, 2, 3]).unsqueeze(1).unsqueeze(1)
print(mean_c_t.shape)
print(std_c_t.shape)
# Using broadcasting semantics, the trailing singleton dimensions (height, width) will be expanded automatically to do pointwise operations 
n_batch_t = (batch_t - mean_c_t) / std_c_t
print(n_batch_t.mean([0, 2, 3])) # Close to 0 but not exactly 0, due to numerical precision
print(n_batch_t.std([0, 2, 3]))

Output results

(183, 275, 3)
torch.Size([183, 275, 3])
torch.Size([3, 183, 275])
torch.Size([5, 3, 183, 275])
torch.Size([5, 3, 183, 275])
torch.float32
torch.float32
torch.Size([3, 1, 1])
torch.Size([3, 1, 1])
tensor([ 4.6572e-08, -3.1048e-08, -9.3145e-08])
tensor([1.0000, 1.0000, 1.0000])

Dealing with 3D images or videos

In addition to channels, 3D images have one more dimension for depth, and videos have one more dimension for time. Therefore, the shape of a batch of 3D images would be batch x channel x depth x height x width, while the shape of a batch of videos would be batch x channel x time x height x width.
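
A small sketch of that layout, stacking made-up 2D slices into a single-volume batch; all sizes here are hypothetical.

import torch

# 16 single-channel slices of size 128 x 128, e.g. from a CT scan
slices = [torch.randn(1, 128, 128) for _ in range(16)]
vol_t = torch.stack(slices, dim=1)   # channel x depth x height x width
batch_t = vol_t.unsqueeze(0)         # batch x channel x depth x height x width
print(batch_t.shape)                 # torch.Size([1, 1, 16, 128, 128])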

Example of 3D CT images

dl_pytorch_c4_1

– From the book “Deep Learning with Pytorch”

Related APIs and resources to load and transform videos in PyTorch.

  1. torchvision.io.video
  2. pytorch-VideoDataset
  3. torch_videovision

Dealing with tabular data

PyTorch tensors are homogeneous, but tabular data may include categorical, non-numeric columns, which need to be converted to numerical data, usually with one-hot encoding.

By convention, the dimension of tabular data in PyTorch may be represented as batch x channel.

  1. Continuous values: as real numbers, strictly ordered, strict meaning in the difference between values. Support mathematical operations on the values.
    1. Ratio scale: values have a meaningful zero, so the ratio between values is meaningful.
    2. Interval scale: only the difference (interval) between values is meaningful.
  2. Ordinal values: as real integers, strictly ordered, but no fixed meaning in the difference between values. Do not support mathematical operations on the values but just ordering operations.
  3. Categorical values: as real integers, no order, no fixed meaning in the difference between values. Do not support mathematical or ordering operations on the values. They are on a nominal scale.

Input code

# Load the wine quality tabular dataset
# !wget http://mng.bz/90Ol --no-check-certificate -O wine.csv

import pandas as pd
import numpy as np
# Note: with pd.read_csv, skiprows=1 skips the header row, so the first data row
# is taken as the header; that is why only 4897 of the 4898 rows appear below
wineq_pd = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv',
                       dtype=np.float32, delimiter=";", skiprows=1)
wineq_numpy = wineq_pd.values
wineq = torch.from_numpy(wineq_numpy)
wineq.size()

Output results

torch.Size([4897, 12])

Input code

X = wineq[:, :-1]
y = wineq[:, -1].long()
X.shape, y.shape, X.dtype, y.dtype

Output results

(torch.Size([4897, 11]), torch.Size([4897]), torch.float32, torch.int64)

Apply one-hot encoding to y if we want to use it as model input; otherwise we may use it directly as the target of the model.

Note that the size of the one-hot encoded array should be large enough to include the max of y (sometimes y does not start from 0).

Input code

encoded_y = torch.zeros(y.shape[0], y.max() + 1)
print(y.unique())
encoded_y.scatter_(1, y.unsqueeze(1), 1.0)
encoded_y.shape, encoded_y.dtype

Output results

tensor([3, 4, 5, 6, 7, 8, 9])

Output results

(torch.Size([4897, 10]), torch.float32)

Dealing with time series data

  1. Time series tabular data

By convention, the dimensions of time series tabular data would typically be batch x channel x time, where each channel represents one feature; a small sketch of this reshaping follows below.

  1. Audio or sensor data

By convention, the dimensions of audio or sensor data (represented as a spectrogram) would typically be batch x channel x freq x time, where channel is the “audio channel” (e.g., left and right) or the “axial channel” (e.g., x, y, and z for multiaxial sensors), and freq x time looks like an image. Therefore, networks used for images may be used for audio tensors as well.
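
Here is a small sketch of getting flat time-series tabular data into the batch x channel x time convention mentioned above; the data is random and the sizes (730 days, 24 hours, 3 features) are made up.

import torch

# Hourly readings flattened as (hours, features)
hourly = torch.randn(24 * 730, 3)
daily = hourly.view(-1, 24, 3)       # days x time x channel
daily = daily.transpose(1, 2)        # days x channel x time
print(daily.shape)                   # torch.Size([730, 3, 24])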

Some good tools and resources to deal with time series data in PyTorch.

  1. torchaudio
  2. pytorch-ts

Dealing with text

Text needs to be converted to numbers, using one-hot encoding or embedding.

Input code

import re
response = requests.get('https://www.gutenberg.org/files/1342/1342-0.txt')
text = BytesIO(response.content).getvalue().decode('utf-8')
text = text.lower().strip()
tokens = re.split('\n+', text)
print(tokens[1:5])
print(len(tokens), len(np.unique(tokens)))

Output results

['the project gutenberg ebook of pride and prejudice, by jane austen\r', '\r', 'this ebook is for the use of anyone anywhere at no cost and with\r', 'almost no restrictions whatsoever.  you may copy it, give it away or\r']
14592 12120

Input code

# Apply onehot encoding to each character for each line with ASCII encoding

# 128 ASCII characters
onehot_tokens = []
for token in tokens:
    token = token.lower().strip()
    onehot_token = torch.zeros(len(token), 128)
    # Use ord to get ASCII code for the character
    for i, c in enumerate(token):
        if ord(c) < 128:
            onehot_token[i][ord(c)] = 1
    onehot_tokens.append(onehot_token)

Input code

# Apply onehot encoding to each word
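
The word-level cell above was left as a stub; here is a minimal sketch of what it could look like, assuming the re module, torch, and the tokens list from the earlier cells are still in scope. For illustration, the vocabulary is built from a single line only.

line = tokens[1]
words_in_line = re.findall(r'[a-z]+', line)
vocab = sorted(set(words_in_line))
word2index = {word: i for i, word in enumerate(vocab)}

# One row per word in the line, one column per vocabulary entry
onehot_line = torch.zeros(len(words_in_line), len(vocab))
for i, word in enumerate(words_in_line):
    onehot_line[i][word2index[word]] = 1
print(onehot_line.shape)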

Encoding schemes

Encoding is needed for categorical data. Depending on whether ordering matters, the categorical data may be encoded as continuous values or as one-hot coded integers. Beyond that, variables may be encoded via embeddings, which convert categorical relational data (such as words) into numerical vectors, so that the distances between the encoded vectors measure the similarity between the categorical values.
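
A tiny sketch of the embedding idea with torch.nn.Embedding; the vocabulary and dimensions here are made up, and the vectors are random rather than learned.

import torch
import torch.nn as nn
import torch.nn.functional as F

vocab = ['cat', 'dog', 'car']
word2index = {w: i for i, w in enumerate(vocab)}
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

cat_vec = embedding(torch.tensor([word2index['cat']]))
dog_vec = embedding(torch.tensor([word2index['dog']]))
# Cosine similarity between the two embedded vectors; after training,
# similar words would end up with similar vectors
print(F.cosine_similarity(cat_vec, dog_vec))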

Here’s the workflow for encoding different types of data as tensors for PyTorch models; embedding is not included.

dl_pytorch_c4_2

– From the book “Deep Learning with Pytorch”


A curated list of PyTorch resources: https://www.ritchieng.com/the-incredible-pytorch/


  1. Andrej Karpathy and Li Fei-Fei, “Deep Visual-Semantic Alignments for Generating Image Descriptions,” https://cs.stanford.edu/people/karpathy/cvpr2015.pdf. ↩︎

  2. https://github.com/deep-learning-with-pytorch/ImageCaptioning.pytorch ↩︎

  3. https://github.com/qutang/ImageCaptioning.pytorch ↩︎

  4. https://pytorch.org/hub/pytorch_fairseq_roberta/ ↩︎