Recommendations on Apps and Services

I mainly write about apps and IT services, with the occasional casual post about side jobs and business topics.

Tricks for a gender and age estimation model from faces (age gender estimation model) [machine learning]

A memo of the techniques I used when building the popular "age gender estimation" kind of model, which predicts gender and age from a human face, so I don't forget them.

Table of contents
1. The age-gender-estimation model itself
2. Post-processing tricks
3. Prediction results


1. The age-gender-estimation model itself


I used InceptionV3. Apart from the final layers, there is nothing particularly unusual.

from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import *

EPOCHS = 5
BATCH_SIZE = 8
HEIGHT = WIDTH = 299

def load_model(gender_cls=2, generation_cls=6, identity_cls=21):
    adam = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-07, decay=0.0)
    input_shape = (HEIGHT, WIDTH, 3)
    base_model = InceptionV3(input_shape=input_shape, weights='imagenet', include_top=True)
    bottle = base_model.get_layer(index=-3).output
    bottle = Dropout(rate=0.3)(bottle)
    bottle = GlobalAveragePooling2D()(bottle)
    # gender
    gender_output = Dense(units=gender_cls, activation='softmax', name='gender_output')(bottle)
    
    # generation
    generation_output = Dense(units=generation_cls, activation='softmax', name='generation_output')(bottle)
    
    # identity age
    identity_output = Dense(units=identity_cls, activation='softmax', name='identity_output')(bottle)

    model = Model(inputs=base_model.input, outputs=[generation_output, identity_output, gender_output])
    model.compile(optimizer=adam,
                  loss={'generation_output': 'categorical_crossentropy',
                        'identity_output': 'categorical_crossentropy',
                        'gender_output': 'binary_crossentropy'},
                  #loss_weights={'realage_output': 1, 'gender_output': 10},
                  metrics={'generation_output': 'accuracy',
                           'identity_output': 'accuracy',
                           'gender_output': 'accuracy'})

    model.summary()
    return model

if __name__=='__main__':
    model = load_model()

2. Post-processing tricks

This exploits the fact that, since every face is distinctive, a given face barely changes over a span of 2-3 years.

Below is a graph of how the number of facial wrinkles increases with age.

f:id:trafalbad:20210524111813p:plain

You can see the wrinkles increasing as people age.



Predicting the exact age over the whole range is hard, so I broke the problem down into simpler steps:


1. Use generation (6 categories) and identity-age (a rough age, 21 categories) as the prediction targets
2. Predict generation and identity-age


f:id:trafalbad:20210524111852p:plain



3. Narrow the identity-age range using the predicted generation, then derive the age from it

f:id:trafalbad:20210524111940p:plain

Splitting a hard problem into easy pieces like this reduced the prediction error.


This is simpler and more accurate than predicting or classifying over 100 individual ages.
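The two-stage narrowing can be sketched in a few lines. The generation-to-identity index ranges below reuse the first three rows that appear in the post-processing code further down; the rest of the table is elided in the post, so treat the dict as a partial illustration:

```python
import numpy as np

# Partial generation -> identity-age index-range table; only the first three
# rows appear in the post, the rest of the mapping is elided there.
GEN_TO_IDENTITY = {0: (0, 3), 1: (3, 5), 2: (6, 9)}

def narrow_age_index(pred_generation, pred_identity):
    # pick the generation first, then argmax only inside its identity range
    gen = int(np.argmax(pred_generation))
    low, high = GEN_TO_IDENTITY[gen]
    return int(np.argmax(pred_identity[low:high])) + low
```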



Post-processing code

# classify generation into 6 categories: 0-7, 7-15, 15-25, 25-45, ...

def return_generation(age):
    if age < 7:
        return 0
    elif age < 15:
        return 1
    elif age < 25:
        return 2
    elif age < 45:
        return 3
    elif age < 70:
        return 4
    else:
        return 5

Generation is predicted over 6 classes, with 83% accuracy.

Identity-age is predicted over 21 classes, with 46% accuracy.



First predict the generation to narrow the identity-age range, then get a rough actual age from identity-age with np.argmax.

Splitting the hard problem into easier ones dramatically improved accuracy.

import numpy as np

class PostProcess(object):
    def __init__(self, pred_generation, pred_identity):
        self.pred_generation = pred_generation
        self.pred_identity = pred_identity

    def generation2identity(self, generation):
        if generation == 0:
            return 0, 3
        elif generation == 1:
            return 3, 5
        elif generation == 2:
            return 6, 9
# ... (omitted) ...

    def post_age_process(self):
        # get the identity-age index range from the predicted generation
        lowidx, largeidx = self.generation2identity(np.argmax(self.pred_generation))
        print("lowidx, largeidx from generation", lowidx, largeidx, np.argmax(self.pred_generation))

        # narrow the identity-age range
        slice_pred_identity = self.pred_identity[0][lowidx:largeidx]
        print('pred_identity', self.pred_identity)
        print('list', slice_pred_identity)
        print("pred identity idx", np.argmax(slice_pred_identity) + lowidx)

        # get the identity-age and convert it to an actual age
        a = np.argmax(slice_pred_identity) + lowidx
        if a == 0:
            return 2
        elif a == 1:
            return 4
        elif a == 2:
            return 6
        elif a == 3:
            return 9
# ... (omitted) ...

3. Prediction results

Face 1

f:id:trafalbad:20210524164046j:plain

Predicted age: 21, Female
f:id:trafalbad:20210524164130p:plain

Face 2

f:id:trafalbad:20210524164118j:plain

Predicted age: 2, Female

f:id:trafalbad:20210524164205p:plain





The predictions look pretty good.


Full code

import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'

import sys
import time
import cv2
import numpy as np

from models import model, load, identity_age
from models.load import to_mean_pixel, MEAN_AVG

x = 100
y = 250
savepath='/Users/~/desktop/'

age_estimate_model = model.load_model()
age_estimate_model.load_weights('weights/best.hdf5')

def draw_label(image, point, label, font=cv2.FONT_HERSHEY_PLAIN,
               font_scale=1.5, thickness=2):
    text_color = (255, 255, 255)
    cv2.putText(image, label, point, font, font_scale, text_color, thickness, lineType=cv2.LINE_AA)


img_path='image1.jpg'
img = cv2.imread(img_path)
img = cv2.resize(img, (299, 299))
cimg = img.copy()
img = to_mean_pixel(img, MEAN_AVG)
img = img.astype(np.float32)/255
img = np.reshape(img, (1, 299, 299, 3))

generation, iden, gender = age_estimate_model.predict(img)
pred_identity = PostProcess(generation, iden).post_age_process()

pred_gender = "Male" if np.argmax(gender) == 0 else "Female"
print('gender is {0} and predict age is {1}'.format(pred_gender, pred_identity))
label = "{0} age  {1}".format(int(pred_identity), pred_gender)

draw_label(cimg, (x, y), label)
cv2.imwrite('prediction.png', cimg)


By engineering away the error like this, a fairly simple model can make quite good predictions without resorting to heavyweight benchmark models.

Like the human brain, when the problem is hard, drop the perfectionism: simplify it to a level a kindergartner could solve and you can still get a model that performs plenty well.

Downloading a yolov4 dataset from the Google Open Images Dataset & creating the annotation files

This time: how to download yolov4 data from the Google Open Images dataset.

The Google Open Images dataset provides high-quality data for everything from object detection to segmentation, in versions v1 through v6.

Downloading it directly is rather tedious (and looking up how is a pain).

So this is a memo on how to download object-detection data for a specific class.

Downloading the data

Use OIDv4_ToolKit.

  • It can download the images and annotation data for just the classes you want from the Open Images Dataset V4.
  • Annotation data covers object detection only; segmentation is not supported.
  • Bounding boxes come in .txt format as [name_of_the_class, left, top, right, bottom], so conversion may be needed depending on your use case.
$ git clone https://github.com/EscVM/OIDv4_ToolKit.git
$ cd OIDv4_ToolKit
$ pip3 install -r requirements.txt

This time we download class == Knife.
Classes can be browsed in the search box of the Google Open Images Dataset.


--type_csv accepts [train/validation/test/all].
We want everything, so specify all.

The arguments can be specified exactly as documented on GitHub.


IsOccluded: Indicates that the object is occluded by another object in the image.

IsTruncated: Indicates that the object extends beyond the boundary of the image.

IsGroupOf: Indicates that the box spans a group of objects (e.g., a bed of flowers or a crowd of people). We asked annotators to use this tag for cases with more than 5 instances which are heavily occluding each other and are physically touching.

IsDepiction: Indicates that the object is a depiction (e.g., a cartoon or drawing of the object, not a real physical instance).

IsInside: Indicates a picture taken from the inside of the object (e.g., a car interior or inside of a building).

n_threads: Select how many threads you want to use. The ToolKit will take care for you to download multiple images in parallel, considerably speeding up the downloading process.

limit: Limit the number of images being downloaded. Useful if you want to restrict the size of your dataset.

y: Answer yes when have to download missing csv files.

$ python3 main.py downloader --classes Knife --type_csv all

Answer [Y] to every prompt and download.





Folder structure

$ tree OID
>>>>

OID
|-- Dataset
|   |-- test
|   |   `-- Knife
|   |       |-- ~.jpg (Knife images)
|   |       `-- Label
|   |           `-- ~.txt (box label text)
|   |-- train
|   |   `-- Knife
|   |       |-- ~.jpg (Knife images)
|   |       `-- Label
|   |           `-- ~.txt (box label text)
|   `-- validation
|       `-- Knife
|           |-- ~.jpg (Knife images)
|           `-- Label
|               `-- ~.txt (box label text)
`-- csv_folder
    |-- class-descriptions-boxable.csv
    |-- test-annotations-bbox.csv
    |-- train-annotations-bbox.csv
    `-- validation-annotations-bbox.csv


Data details and contents

# contents of a txt file in OIDv4_ToolKit/OID/Dataset/train/Knife/Label
# validation and test are the same

$ cat 870eb1cdddbcce5a.txt
Knife 24.320256 23.04 767.360256 849.92

# number of label files in OIDv4_ToolKit/OID/Dataset/train/Knife/Label
$ ls -l | wc -l
611

# number of images in OIDv4_ToolKit/OID/Dataset/train/Knife
$ ls |wc -l 
611

# number of images and labels in OIDv4_ToolKit/OID/Dataset/test/Knife
161
# number of images and labels in OIDv4_ToolKit/OID/Dataset/validation/Knife
56
# contents of the files in csv_folder
$ cat class-descriptions-boxable.csv

~~
/m/0pcr,Alpaca
/m/0pg52,Taxi
/m/0ph39,Canoe
/m/0qjjc,Remote control
/m/0qmmr,Wheelchair
/m/0wdt60w,Rugby ball
/m/0xfy,Armadillo
/m/0xzly,Maracas
/m/0zvk5,Helmet

$ cat test-annotations-bbox.csv
>>>
fffc6543b32da1dd,freeform,/m/0jbk,1,0.013794,0.999996,0.388438,0.727906,0,0,1,0,0
fffd0258c243bbea,freeform,/m/01g317,1,0.000120,0.999896,0.000000,1.000000,1,0,1,0,0

$ cat validation-annotations-bbox.csv

>>>
ffff21932da3ed01,freeform,/m/0c9ph5,1,0.540223,0.624863,0.493633,0.577892,1,0,1,0,0
ffff21932da3ed01,freeform,/m/0cgh4,1,0.002521,1.000000,0.000000,0.998685,0,0,0,0,1

A Knife image

f:id:trafalbad:20210510112838j:plain
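Each line in a Label file is the class name followed by absolute pixel coordinates (left, top, right, bottom). Parsing the example line shown earlier is straightforward:

```python
# one line from OIDv4_ToolKit/OID/Dataset/train/Knife/Label/870eb1cdddbcce5a.txt
line = "Knife 24.320256 23.04 767.360256 849.92"
cls, *coords = line.split()
# coordinates are floats in the file; truncate to int pixels
left, top, right, bottom = (int(float(c)) for c in coords)
print(cls, left, top, right, bottom)
```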


Loading the data into yolov4

Put the Knife folder into the data folder.

Then create the text file for yolov4.

import os

classes = ['Knife']
classes_dicts = {key: idx for idx, key in enumerate(classes)}

def main(label_path, jpg_path_name, save_filetxt_name):
    with open(save_filetxt_name, 'w') as f:
        for path in os.listdir(label_path):
            filename = path.replace('txt', 'jpg')
            f.write(os.path.join(jpg_path_name, filename))

            loadf = open(os.path.join(label_path, path), 'r', encoding='utf-8')
            for line in loadf.readlines():
                cls, x_min, y_min, x_max, y_max = line.split(" ")
                # strip the trailing newline and cast the coordinates to int
                y_max = y_max.rstrip('\n')
                x_min, y_min, x_max, y_max = int(float(x_min)), int(float(y_min)), int(float(x_max)), int(float(y_max))
                cls = classes_dicts[cls]
                box_info = " %d,%d,%d,%d,%d" % (x_min, y_min, x_max, y_max, int(cls))
                f.write(box_info)
            f.write('\n')

if __name__ == '__main__':
    data_type = 'test'
    assert data_type in ['train', 'validation', 'test'], 'correct word from [train, validation, test]'
    jpg_path_name = 'data/Knife/Dataset/{}/Knife'.format(data_type)
    save_filetxt_name = 'data/pytorch_yolov4_{}.txt'.format(data_type)
    label_path = 'data/Knife/Dataset/{}/Knife/Label'.format(data_type)
    main(label_path, jpg_path_name, save_filetxt_name)


Open the generated pytorch_yolov4_validation.txt.

# loader function
data_type = 'validation'
label_path = 'data/pytorch_yolov4_{}.txt'.format(data_type)

def open_txtfile(label_path):
    truth = {}
    f = open(label_path, 'r', encoding='utf-8')
    for line in f.readlines():
        data = line.split(" ")
        truth[data[0]] = []
        for i in data[1:]:
            truth[data[0]].append([int(float(j)) for j in i.split(',')])
    print(truth)

open_txtfile(label_path)

>>>

data/Knife/Dataset/validation/Knife/2497ac78d31d89d5.jpg 15,166,942,489,0
data/Knife/Dataset/validation/Knife/09a9a9d1fe0a592a.jpg 55,313,333,1024,0
data/Knife/Dataset/validation/Knife/f2a2a1a0095f5d79.jpg 108,481,1024,636,0
data/Knife/Dataset/validation/Knife/4b6d3c391753e5ce.jpg 225,59,372,219,0 539,242,1024,292,0 611,478,1024,720,0 776,179,1024,244,0
data/Knife/Dataset/validation/Knife/4b1fc77d58646a7e.jpg 65,66,983,744,0
〜〜〜

It matches the example in the GitHub README.md, so the yolov4 annotation txt file is done.





Changes to the yolov4 files

To load the data, I changed the following points.

  • Removed the os.path.join used when loading images in dataset.py's Yolo_dataset class.
# dataset.py
class Yolo_dataset(Dataset):
    # ...
    def __getitem__(self, index):
        if not self.train:
            return self._get_val_item(index)
        img_path = self.imgs[index]
        bboxes = np.array(self.truth.get(img_path), dtype=np.float)
        img_path = img_path  # os.path.join removed here
        use_mixup = self.cfg.mixup
        if random.randint(0, 1):
            use_mixup = 0

        for i in range(use_mixup + 1):
            if i != 0:
                img_path = random.choice(list(self.truth.keys()))
                bboxes = np.array(self.truth.get(img_path), dtype=np.float)
                img_path = img_path  # os.path.join removed here
            img = cv2.imread(img_path)
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
  • Added a num_worker argument to train.py to adjust memory usage.
def train(model, device, config, epochs=5, batch_size=1, save_cp=True, num_worker=0, log_step=20, img_scale=0.5):
    train_dataset = Yolo_dataset(config.train_label, config, train=True)
    val_dataset = Yolo_dataset(config.val_label, config, train=False)
    # ...
    writer.close()
# run train.py
# adjust memory usage with num_worker

try:
    train(model=model,
          device=device,
          config=cfg,
          epochs=cfg.TRAIN_EPOCHS,
          num_worker=0)
except KeyboardInterrupt:
    torch.save(model.state_dict(), 'INTERRUPTED.pth')
    logging.info('Saved interrupt')
    try:
        sys.exit(0)
    except SystemExit:
        os._exit(0)

>>>>>

2021-05-11 07:57:09,935 <ipython-input-3-6cfc1c1d5a28>[line:36] INFO: Starting training:
        Epochs:          300
        Batch size:      64
        Subdivisions:    16
        Learning rate:   0.001
        Training size:   610
        Validation size: 56
        Checkpoints:     True
        Device:          cpu
        Images size:     608
        Optimizer:       adam
        Dataset classes: 1
        Train label path:data/pytorch_yolov4_train.txt
        Pretrained:
    
Epoch 1/300:   0%|       | 0/610 [00:07<?, ?img/s]


It ran without problems.

Reference sites

Getting started with Google Open Images Dataset V6

OIDv4_ToolKit

A record of setting up an M1 MacBook Air

On the M1 Mac, as of 2021/04, brew is broken and tensorflow won't install, so it's just not usable normally.
These are notes from my trial and error.


A chronological record of what I ran.


f:id:trafalbad:20210406003747j:plain

Installing python3 and pip3

tensorflow doesn't support python3.9, so downgrade to python3.8.

First, remove brew's python3 to avoid mixing it with the Intel one:

$ brew uninstall --ignore-dependencies python3
$ python3 --version
>>>
Python 3.8.2

$ python3 -m pip install --upgrade pip --user
$ pip3 --version   
>>>>   
pip 21.0.1 from /Users/ha~/Library/Python/3.8/lib/python/site-packages/pip (python 3.8)


$ which python3
>>>>>
/usr/bin/python3


****** Pay attention
On M1 Macs (as of 2021/04), anaconda and brew are installed under /opt. For Homebrew on M1 Macs, the official documentation recommends installing to /opt/homebrew (so it can coexist with the Intel version without conflicts).



Installing conda

First, download anaconda normally.

$ conda
>>>>
zsh: command not found: conda

It errors out, so check whether the install actually succeeded:

$ /opt/anaconda3/bin/conda init zsh

If no error appears, it's fine:

$ /opt/anaconda3/bin/conda --version
>>>
conda 4.9.2


Of course it's this kind of error again.



It's tedious, so create a shortcut for conda

I made conda usable via a custom command, conde.

# add the command
$ sudo vi ~/.bashrc
$ sudo vi ~/.zshenv

>>>>>
alias conde='/opt/anaconda3/bin/conda'
# set up the path
$ sudo vi ~/.bash_profile
>>>
source ~/.bashrc
source ~/.zshenv


$ source ~/.bash_profile
# check
$ conde --version
>>>>
conda 4.9.2


Done.


Checking and switching the architecture

# the mode as purchased
$ uname -m
>>>> arm64
$ arch
>>>> arm64


Switch architectures with the arch command:

$ arch -x86_64 bash

# now (presumably) running under Rosetta 2 (Intel)
$ arch
i386

Switch back:
$ arch -arm64 bash
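To confirm from Python which architecture the interpreter itself runs under (handy when juggling Rosetta and native shells), the stdlib platform module works:

```python
import platform

# 'arm64' when running natively on Apple Silicon, 'x86_64' under Rosetta 2
print(platform.machine())
```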


Installing tensorflow

Running tensorflow installed with pip3 throws a zsh: illegal hardware instruction error.
The only way I could find was to install it with conda.

# uninstall the pip3-installed tensorflow
$ pip3 uninstall tensorflow

# install tensorflow with conda
$ (/opt/anaconda3/bin/conda) conde install tensorflow

>>>>

Specifications:

  - tensorflow -> python[version='2.7.*|3.7.*|3.6.*|3.5.*']

Your python: python=3.8


I was told to use python 3.6 or 3.5.

$ (/opt/anaconda3/bin/conda) conde install python=3.6

$ (/opt/anaconda3/bin/conda) conde install tensorflow


Check that it runs:

# tf.py
import tensorflow
from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras.models import Model
from tensorflow.keras.layers import *
from tensorflow.keras.optimizers import RMSprop, Adam, SGD
$ python3 tf.py


No errors, so tensorflow finally runs.

**** Pay attention
The M1 MacBook Air came with python3.9 at first, so tensorflow, which doesn't support python3.9, couldn't be installed with pip3.
As a last resort, the only option was installing it with conda.



Install the related libraries with conda too (almost never pip3)

Since I downgraded to python3.6, the libraries had to be reinstalled.
To avoid mixing with pip3, install everything with conda except the libraries that can only be installed with pip3.

# opencv example
$ (/opt/anaconda3/bin/conda) conde install -c conda-forge opencv


References

Deep learning on an M1 Mac with tensorflow-macos

The setup I did after buying an M1 Mac

Running an AI Platform training job in a GCR Docker container using images on GCS [MLops building 2]

This time, we load images uploaded to GCS, Dockerize our own script, and push it to GCR.
Then we run a training job on AI Platform.

Roughly following the tutorial here, I pushed the Docker image to GCR and then ran a training job.

First enable the "AI Platform Training & Prediction, Compute Engine and Container Registry API", then run the job on a GCP VM instance.


Table of contents
1. Upload images to GCS
2. Set the environment variables
3. Python training scripts
4. Build the Docker container and push it to GCR
5. Run the job on AI Platform


1. Upload images to GCS

Create a notebook instance on AI Platform; the region is us-central1.

Log in and create the bucket "mlops-test-bakura" for the image uploads:

$ gsutil mb gs://mlops-test-bakura/
>>>
Creating gs://mlops-test-bakura/...


Upload the image folder "right" via the GUI.

# confirm the images are inside the "right" folder on GCS
$  gsutil ls gs://mlops-test-bakura/right/*.jpg
>>>
〜
gs://mlops-test-bakura/right/ml_670008765.jpg
gs://mlops-test-bakura/right/nm_78009843.jpg
gs://mlops-test-bakura/right/kj_78009847.jpg

2. Set the environment variables

# create the output bucket and set the environment variables
export BUCKET_ID=output-aiplatform
gsutil mb gs://$BUCKET_ID/
export PROJECT_ID=$(gcloud config list project --format "value(core.project)")



3. Python training scripts

# git clone the code
$ git clone https://github.com/GoogleCloudPlatform/cloudml-samples
$ cd cloudml-samples/tensorflow/con*/un*/ 

# training scripts
$ tree
>>>
├── Dockerfile
├── data_utils.py
├── model.py
└── task.py

Now rewrite the training scripts for our own use.

model.py
Our own network:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import *
from tensorflow.keras.applications.vgg16 import VGG16
from tensorflow.keras.models import Model

def sonar_model():
    base_model = VGG16(weights='imagenet', include_top=True, input_tensor=Input(shape=(224,224,3)))
    x = base_model.get_layer(index=-5).output
    x = Dropout(rate=0.3)(x)
    x = GlobalAveragePooling2D()(x)
    o = Dense(3, activation='softmax')(x)
    model = Model(inputs=base_model.input, outputs=o)
    model.compile(loss='categorical_crossentropy', optimizer='sgd',
                  metrics=['accuracy'])
    return model



data_utils.py
Loads the images from the GCS bucket created earlier:

import datetime
from google.cloud import storage
import tempfile
import os, cv2
import numpy as np
from sklearn.model_selection import train_test_split
import tensorflow
from tensorflow.keras.utils import to_categorical


client = storage.Client()
BUCKET_NAME = "mlops-test-bakura"
FOLDER_NAME = "right"
NUM_CLS = 2+1


def load_label(y, num_classes=3):
    return to_categorical(y, num_classes=num_classes)
    
    
def download(bucket_name, folder_name):
    images = []
    labels = []
    c = 0
    for blob in client.list_blobs(bucket_name, prefix=folder_name):
        _, _ext = os.path.splitext(blob.name)
        _, temp_local_filename = tempfile.mkstemp(suffix=_ext)
        blob.download_to_filename(temp_local_filename)
        img = cv2.imread(temp_local_filename)
        images.append(cv2.resize(img, (224, 224)))
        if len(images)==200:
            c += 1
        elif len(images)==400:
            c += 1
        labels.append(c)
        #print(f"Blob {blob_name} downloaded to {temp_local_filename}.")
    return np.array(images)/255, np.array(labels)
    
    
def load_data(args):
    imgs, labels = download(BUCKET_NAME, FOLDER_NAME)
    labels = load_label(labels, num_classes=NUM_CLS)
    print(imgs.shape, labels.shape)
    train_f, test_f, train_l, test_l = train_test_split(
            imgs, labels, test_size=args.test_split, random_state=args.seed)
    return train_f, test_f, train_l, test_l
    
    
def save_model(model_dir, model_name):
    """Saves the model to Google Cloud Storage"""
    bucket = storage.Client().bucket(model_dir)
    blob = bucket.blob('{}/{}'.format(
        datetime.datetime.now().strftime('sonar_%Y%m%d_%H%M%S'),
        model_name))
    blob.upload_from_filename(model_name)


task.py

import argparse
import data_utils
import model


def train_model(args):
    train_features, test_features, train_labels, test_labels = \
        data_utils.load_data(args)

    sonar_model = model.sonar_model()

    sonar_model.fit(train_features, train_labels, epochs=args.epochs,
                    batch_size=args.batch_size)

    score = sonar_model.evaluate(test_features, test_labels,
                                 batch_size=args.batch_size)
    print(score)

    # Export the trained model
    sonar_model.save(args.model_name)

    if args.model_dir:
        # Save the model to GCS
        data_utils.save_model(args.model_dir, args.model_name)


def get_args():
    parser = argparse.ArgumentParser(description='Keras Sonar Example')
    parser.add_argument('--model-dir',
                        type=str,
                        help='Where to save the model')
    parser.add_argument('--model-name',
                        type=str,
                        default='sonar_model.h5',
                        help='What to name the saved model file')
    parser.add_argument('--batch-size',
                        type=int,
                        default=4,
                        help='input batch size for training (default: 4)')
    parser.add_argument('--test-split',
                        type=float,
                        default=0.2,
                        help='split size for training / testing dataset')
    parser.add_argument('--epochs',
                        type=int,
                        default=1,
                        help='number of epochs to train (default: 1)')
    parser.add_argument('--seed',
                        type=int,
                        default=42,
                        help='random seed (default: 42)')
    args = parser.parse_args()
    return args

def main():
    args = get_args()
    train_model(args)

if __name__ == '__main__':
    main()


Dockerfile

FROM tensorflow/tensorflow:nightly
WORKDIR /root
ENV DEBIAN_FRONTEND=noninteractive

# Installs pandas, google-cloud-storage, and scikit-learn
# scikit-learn is used when loading the data

RUN pip install pandas google-cloud-storage scikit-learn
RUN apt-get update && apt-get install -y python-opencv python3-opencv

# Install curl
RUN apt-get install curl -y
# The data for this sample has been publicly hosted on a GCS bucket.
# Download the data from the public Google Cloud Storage bucket for this sample
RUN curl https://storage.googleapis.com/cloud-samples-data/ml-engine/sonar/sonar.all-data --output ./sonar.all-data
# Copies the trainer code to the docker image.
COPY model.py ./model.py
COPY data_utils.py ./data_utils.py
COPY task.py ./task.py
# Set up the entry point to invoke the trainer.
ENTRYPOINT ["python", "task.py"]

4. Build the Docker container and push it to GCR

Next, build the custom Docker container for GCR.

# authenticate docker with gcloud
sudo docker run busybox date
gcloud auth configure-docker
# build the Docker image
REGION=us-central1
export IMAGE_REPO_NAME=sonar_tf_nightly_container
export IMAGE_TAG=sonar_tf

# IMAGE_URI: the complete URI location for Cloud Container Registry
export IMAGE_URI=gcr.io/$PROJECT_ID/$IMAGE_REPO_NAME:$IMAGE_TAG
export JOB_NAME=custom_container_tf_nightly_job_$(date +%Y%m%d_%H%M%S)



# docker build (docker build -t gcr.io/[project id]/[app]:latest .)
sudo docker build -f Dockerfile -t $IMAGE_URI ./

# check that it works correctly
sudo docker run $IMAGE_URI --epochs 1
>>>>
〜〜〜〜
553467904/553467096 [==============================] - 3s 0us/step
2021-03-13 03:46:29.032178: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimi
zation Passes are enabled (registered 2)
2021-03-13 03:46:29.032776: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2299995000 Hz
30/30 [==============================] - 103s 3s/step - loss: 0.2690 - accuracy: 0.8701 
8/8 [==============================] - 7s 847ms/step - loss: 4.4660e-04 - accuracy: 1.0000
[0.0004466005484573543, 1.0]

# push the image to GCR
# docker push gcr.io/[project id]/[app]:latest
sudo docker push $IMAGE_URI

The docker image has been pushed to GCR:
f:id:trafalbad:20210303165654p:plain




5. Run the job on AI Platform

# run the job
$ gcloud components install beta
$ gcloud beta ai-platform jobs submit training $JOB_NAME --region $REGION --master-image-uri $IMAGE_URI --scale-tier BASIC -- --model-dir=$BUCKET_ID --epochs=1


# monitor the job status and stream the logs
gcloud ai-platform jobs describe $JOB_NAME
gcloud ai-platform jobs stream-logs $JOB_NAME

>>>>
〜〜〜〜〜
INFO    2021-03-03 07:28:04 +0000       master-replica-0                Test set: Average loss: 0.0516, Accuracy: 9
839/10000 (98%)
INFO    2021-03-03 07:28:04 +0000       master-replica-0
INFO    2021-03-03 07:30:30 +0000       service         Job completed successfully.
# confirm the model is saved in GCS
$ gsutil ls gs://$BUCKET_ID/sonar_*
>>>
gs://output-aiplatform/sonar_20210313_055918/sonar_model.h5

The job succeeded and the h5 weights were saved to GCS.




Reference sites

Getting started: training with custom containers
Using 100 GPUs at once on AI Platform (GCP) was a thrill
github: cloudML-sample

Running a training job on GCP's AI Platform (MLops building 1)

Notes from playing with GCP's AI Platform to build MLops.
Instead of a textbook, I decided to get used to it by running good articles from Qiita and the like myself.

This time, by the way, the training job is run from a "package executable as a python command".

Table of contents
1. Unify the execution region
2. Install the required libraries
3. Download the training files from GCS
4. Run the training job
5. Check the files after training



Versions

python==3.7.9
tensorflow==2.4.0
keras==2.3.2


1. Unify the execution region

Unify the region across the connected resources (GCS, AI Platform, etc.). This time it's "us-central1".


Create a data bucket (mlops-test-bakura) on GCS (region = us-central1).



Upload the numpy data files X.npy and y.npy (not used this time).


Create a notebook instance for training on AI Platform (region = us-central1).




The console can't be accessed from AI Platform directly, so access it from the VM instance.

f:id:trafalbad:20210302183115p:plain


The instance created from AI Platform appears to be linked to a VM.

Log in to the console from "Open in browser window" in the SSH menu on the right.

f:id:trafalbad:20210302183019p:plain



2. Install the required libraries

# build a virtualenv & enter it
$ pip install virtualenv
$ virtualenv mlops (your-env-name)
$ cd mlops && source bin/activate

# google-cloud-storage is needed to operate GCS from python
$ pip install google-cloud-storage 
$ pip install tensorflow==2.4.0
$ pip install keras==2.3.2
# check the configuration
$ gcloud config list

# set the project ID (replace with your own)
gcloud config set project <project-id>

# update the Google Cloud SDK to the latest
$ sudo apt-get update && sudo apt-get --only-upgrade install kubectl google-cloud-sdk google-cloud-sdk-app-engine-grpc google-cloud-sdk-pubsub-emulator google-cloud-sdk-app-engine-go google-cloud-sdk-cloud-build-local google-cloud-sdk-datastore-emulator google-cloud-sdk-app-engine-python google-cloud-sdk-cbt google-cloud-sdk-bigtable-emulator google-cloud-sdk-app-engine-python-extras google-cloud-sdk-datalab google-cloud-sdk-app-engine-java


git clone the training files:

$ git clone https://github.com/YoheiFukuhara/gcp-aip-test02
# file structure
$ tree
├── README.md
└── trainer
    ├── __init__.py
    ├── model.py
    ├── task.py
    └── util.py



3. Download the training files from GCS

# confirm the training data is on GCS
$ gsutil ls gs://mlops-test-bakura/*

The uploaded files:
f:id:trafalbad:20210302183304p:plain



A script that downloads the files from GCS.
This time we don't use this data because of permission-related errors.

gcs.py

from google.cloud import storage
client = storage.Client()
bucket_name = "mlops-test-bakura"
bucket = client.get_bucket(bucket_name)
def downloads(filenames):
    blob = bucket.get_blob(filenames)
    blob.download_to_filename(filenames)

if __name__=='__main__':
    downloads('X.npy')
    downloads('y.npy')

util.py

import numpy as np
NUM_TRAIN = 128

def load_data():
    data = np.random.rand(NUM_TRAIN, 2)
    labels = (np.sum(data, axis=1) > 1.0) * 1
    labels = labels.reshape(NUM_TRAIN, 1)
    return data, labels
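util.load_data labels each random 2-vector by whether its components sum past 1.0; a tiny worked example of that rule:

```python
import numpy as np

# same labeling rule as util.load_data: 1 if the two components sum past 1.0
data = np.array([[0.9, 0.4], [0.1, 0.2]])
labels = (np.sum(data, axis=1) > 1.0) * 1
labels = labels.reshape(len(data), 1)
```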

task.py

import tensorflow as tf
import model
import util


def train_and_evaluate():

    train_x, train_y = util.load_data()


    # Create the Keras Model
    keras_model = model.create_keras_model()

    keras_model.fit(train_x, train_y, epochs=300, validation_split=0.2)


if __name__ == '__main__':
    train_and_evaluate()

model.py

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense


def create_keras_model():
    # use a Sequential model (layers stacked in order)
    model = Sequential()

    # dense layer (2 -> 4 units)
    model.add(Dense(4, input_dim=2, activation="tanh"))

    # dense layer (4 -> 1 unit): omitting the input dim makes it inherit the previous layer's output dim
    model.add(Dense(1, activation="sigmoid"))

    # compile the model
    model.compile(loss="binary_crossentropy", optimizer="sgd", metrics=["accuracy"])

    model.summary()

    return model


4. Run the training job

Run the training job in the GCP VM instance console.

# create the output bucket
BUCKET_NAME="output-mlops134"
gsutil mb gs://$BUCKET_NAME/
>>> Creating gs://output-mlops1/...
now=$(date +"%Y%m%d_%H%M%S")
JOB_NAME="output_mlops_$now"$(date +"%Y%m%d_%H%M%S")
OUTPUT_PATH=gs://$BUCKET_NAME/$JOB_NAME
# run the training job
$ gcloud ai-platform jobs submit training $JOB_NAME --job-dir $OUTPUT_PATH --module-name trainer.task --package-path trainer/ --region us-central1 --python-version 3.5 --runtime-version 1.14
>>>>
Job [output_mlops_20210302_02341620210302_023430] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe output_mlops_20210302_02341620210302_023430

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs output_mlops_20210302_02341620210302_023430
jobId: output_mlops_20210302_02341620210302_023430
state: QUEUED
$ gcloud ai-platform jobs describe output_mlops_20210302_02341620210302_023430

>>>>
snap/google-cloud-sdk/170/lib/third_party/requests/__init__.py:83: RequestsDependencyWarning: Old version of crypt
ography ([1, 2, 3]) may cause slowdown.
  warnings.warn(warning, RequestsDependencyWarning)
createTime: '2021-03-02T06:03:08Z'
etag: uTM4hhyiW_o=
jobId: output_mlops_20210302_06023220210302_060238
startTime: '2021-03-02T06:03:41Z'
state: RUNNING
trainingInput:
  jobDir: gs://output-mlops134/
  packageUris:
  - gs://output-mlops134/packages/453debf47cd4ec4a32f47f04f3a18dcb2686609e5fb812662b28044f1b8c6aeb/trainer-0.0.0.tar.
gz
  pythonModule: trainer.task
  pythonVersion: '3.5'
  region: us-central1
  runtimeVersion: '1.14'
trainingOutput: {}
View job in the Cloud Console at:
https://console.cloud.google.com/mlengine/jobs/output_mlops_20210302_06023220210302_060238?project=sturdy-willow-16
7902
View logs at:
https://console.cloud.google.com/logs?resource=ml_job%2Fjob_id%2Foutput_mlops_20210302_06023220210302_060238&projec
t=sturdy-
# view the job's progress
$ gcloud ai-platform jobs stream-logs output_mlops_20210302_02341620210302_023430

>>>
〜〜〜〜
service     Job completed successfully.

5. Check the files after training

# check the files saved in GCS
$ gsutil ls gs://output-mlops134/*

gs://aip-test01/〜〜〜〜〜〜/trainer-0.0.0.tar.gz

# contents
└── trainer-0.0.0
    ├── PKG-INFO
    ├── setup.py
    └── trainer
        ├── __init__.py
        └── task.py

Reference sites

github: googleapis/python-storage
Downloading a file from google cloud storage inside a folder
Google AI Platform - understanding Cloud ML Engine by running it as a beginner (part 2)

Coding tests: applied problems for training problem-solving skills, from Codility at 2021/01

Codility problems

A record from January 2021.

Codility's difficulty increases in the order
"PAINLESS" < "RESPECTABLE" < "AMBITIOUS".

First solve the easy ones with divide and conquer, test, then write generic code.
For errors, search in python, java, or c++ and adapt the answers. Java answers in particular are plentiful, so adapting from java is recommended.

f:id:trafalbad:20210128211805p:plain


Iteration:BinaryGap(PAINLESS)

my solution

def reset(ones, zeros):
    ones = 1
    zeros = 0
    return ones, zeros

def solution(N):
    binary = bin(N)[2:]
    ones, zeros = 0, 0
    lengths = []
    for i, val in enumerate(binary):
        if val == '1':
            ones += 1
        else:
            zeros += 1
        if ones == 2:
            lengths.append(zeros)
            ones, zeros = reset(ones, zeros)
    return max(lengths) if lengths else 0


smart code solution

def solution(N):
    N = str(bin(N)[2:])
    count = False
    gap = 0
    max_gap = 0
    for i in N:
        if i == '1' and count==False:
            count = True
        if i == '0' and count == True:
            gap += 1
        if i == '1' and count == True:
            max_gap = max(max_gap, gap)
            gap = 0
    return max_gap
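
As an aside (my own variant, not from the solutions above), the same gap can be computed with string operations alone, which makes the intent very explicit:

```python
def binary_gap(N):
    # strip trailing zeros (they are not bounded by a 1 on the right),
    # then the gaps are exactly the runs of zeros between consecutive 1s
    parts = bin(N)[2:].strip('0').split('1')
    return max(len(p) for p in parts)
```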

Array: CyclicRotation (PAINLESS)

My solution

def solution(A, K):
    if not A:          # guard: A[-1] would crash on an empty array
        return A
    K %= len(A)        # rotating len(A) times is a no-op
    for _ in range(K):
        A = [A[-1]] + A[:-1]
    return A


smart solution

def solution(A, K):
    # write your code in Python 2.7
    l = len(A)
    if l < 2:
        return A
    elif l == K:
        return A
    else:
        B = [0]*l
        for i in range(l):
            B[(i+K)%l] = A[i]
        return B
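
For reference, the standard library also covers this rotation directly; a sketch using collections.deque (deque.rotate(K) shifts to the right by K steps):

```python
from collections import deque

def solution(A, K):
    d = deque(A)
    d.rotate(K)  # right-rotate by K; handles K larger than len(A) internally
    return list(d)
```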

Time Complexity: TapeEquilibrium (PAINLESS)

My solution

def solution(A):
    diff = float('inf')
    for i in range(1, len(A)-1):
        s1 = sum(A[:i])
        s2 = sum(A[i:])
        diff = min(diff, abs(s1-s2))
    return diff


smart solution

def solution(A):
    total, minimum, left = sum(A), float('inf'), 0
    for a in A[:-1]:
        left += a
        minimum = min(abs(total - left - left), minimum)
    return minimum
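
The running-left-sum idea above can also be written with itertools.accumulate (my variant):

```python
from itertools import accumulate

def solution(A):
    total = sum(A)
    # every prefix sum of A[:-1] is a possible left part of the tape;
    # the difference of the two parts is |total - 2 * left|
    return min(abs(total - 2 * left) for left in accumulate(A[:-1]))
```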


Counting Elements: MaxCounters (RESPECTABLE)

My solution

def solution(N, A):
    arr = [0] * N
    for val in A:
        if val > N:  # val == N + 1 means the max-counter operation
            arr = [max(arr)] * N
        else:
            arr[val - 1] += 1
    return arr

smart solution

def solution2(N, A):
    counters = [0] * N
    for el in A:
        if el <= N:
            counters[el - 1] += 1
        else:
            counters = [max(counters)] * N
    return counters
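
Both versions above rescan all N counters on every max-counter operation, which is O(N*M) in the worst case. A standard lazy-max optimization (not from the post) brings it down to O(N+M):

```python
def solution(N, A):
    counters = [0] * N
    current_max = 0   # highest counter value seen so far
    floor = 0         # value every counter was raised to by the last max op
    for op in A:
        if op <= N:
            # lazily apply the pending max before incrementing
            counters[op - 1] = max(counters[op - 1], floor) + 1
            current_max = max(current_max, counters[op - 1])
        else:
            floor = current_max
    # counters never touched since the last max op still need lifting
    return [max(c, floor) for c in counters]
```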

CoderByte
CoderByte Challenge Library


Easy & Algorithm

Find Intersection

FindIntersection(strArr) read the array of strings stored in strArr which will contain 2 elements: the first element will represent a list of comma-separated numbers sorted in ascending order, the second element will represent a second list of comma-separated numbers (also sorted). Your goal is to return a comma-separated string containing the numbers that occur in elements of strArr in sorted order. If there is no intersection, return the string false.

Input: ["1, 3, 4, 7, 13", "1, 2, 4, 13, 15"] 
Output: 1,4,13
def FindIntersection(strArr):
    st1 = list(map(int, strArr[0].split(', ')))
    st2 = list(map(int, strArr[1].split(', ')))
    string = []
    for s in st1:
        if s in st2:
            string.append(str(s))
    return ','.join(string) if string else 'false'  # the string, not a boolean
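
Since both lists contain numbers, Python sets make this a near one-liner (my variant):

```python
def FindIntersection(strArr):
    a = set(map(int, strArr[0].split(', ')))
    b = set(map(int, strArr[1].split(', ')))
    common = sorted(a & b)  # set intersection, re-sorted numerically
    return ','.join(map(str, common)) if common else 'false'
```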

Codeland Username Validation

Have the function CodelandUsernameValidation(str) take the str parameter being passed and determine if the string is a valid username according to the following rules:

1. The username is between 4 and 25 characters.
2. It must start with a letter.
3. It can only contain letters, numbers, and the underscore character.
4. It cannot end with an underscore character.

If the username is valid then your program should return the string true, otherwise return the string false.

# sample1
input: "aa_" 
Output: false

#sample2
Input: "u__hello_world123" 
Output: true
def CodelandUsernameValidation(strParam):
    if len(strParam) < 4 or len(strParam) > 25:
        return 'false'
    if not strParam[0].isalpha() or strParam[-1] == '_':
        return 'false'
    for x in strParam:
        # reject any character that is not a letter, digit, or underscore
        if not (x.isalpha() or x.isdigit() or x == '_'):
            return 'false'
    return 'true'
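
Alternatively (my own variant), the four rules compress into a single regular expression: a leading letter, 2 to 23 word characters in the middle, and a non-underscore word character at the end.

```python
import re

def CodelandUsernameValidation(s):
    # 1 letter + 2..23 word chars + 1 letter/digit = 4..25 chars total
    return 'true' if re.match(r'^[A-Za-z]\w{2,23}[A-Za-z0-9]$', s) else 'false'
```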

Questions Marks

Have the function QuestionsMarks(str) take the str string parameter, which will contain single digit numbers, letters, and question marks, and check if there are exactly 3 question marks between every pair of two numbers that add up to 10. If so, then your program should return the string true, otherwise it should return the string false. If there aren't any two numbers that add up to 10 in the string, then your program should return false as well.

For example: if str is "arrb6???4xxbl5???eee5" then your program should return true because there are exactly 3 question marks between 6 and 4, and 3 question marks between 5 and 5 at the end of the string.

# sample1
Input: "aa6?9" 
Output: false

#sample2
Input: "acc?7??sss?3rr1??????5" 
Output: true
def QuestionsMarks(strParam):
    prev_digit = None   # last digit seen
    q_count = 0         # question marks since the last digit
    found = False       # at least one pair summing to 10 was seen
    for ch in strParam:
        if ch.isdigit():
            d = int(ch)
            if prev_digit is not None and prev_digit + d == 10:
                if q_count != 3:
                    return 'false'
                found = True
            prev_digit = d
            q_count = 0
        elif ch == '?':
            q_count += 1
    return 'true' if found else 'false'


Longest Word

Have the function LongestWord(sen) take the sen parameter being passed and return the largest word in the string. If there are two or more words that are the same length, return the first word from the string with that length. Ignore punctuation and assume sen will not be empty.

# sample1
Input: "fun&!! time" 
Output: time

#sample2
Input: "I love dogs" 
Output: love
def LongestWord(sen):
    stack = {}
    string=''
    for s in list(sen):
        if s.isalpha():
            string +=s
        else:
            stack[string]=len(string)
            string=''
    stack[string]=len(string)
    return max(stack, key=stack.get)
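
A shorter sketch with re (my variant; treats words as runs of letters and digits, which handles the punctuation-stripping requirement):

```python
import re

def LongestWord(sen):
    words = re.findall(r'[A-Za-z0-9]+', sen)
    # max with key=len keeps the first word among equal lengths
    return max(words, key=len)
```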


First Factorial

Have the function FirstFactorial(num) take the num parameter being passed and return the factorial of it. For example: if num = 4, then your program should return (4 * 3 * 2 * 1) = 24. For the test cases, the range will be between 1 and 18 and the input will always be an integer.

# sample1
Input: 4 
Output: 24

#sample2
Input: 8 
Output: 40320
def FirstFactorial(num):
  factorial = 1
  for i in range(num, 0, -1):
    factorial *= i
  return factorial
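
For reference, the standard library provides this directly, though a coding test may expect the explicit loop above:

```python
import math

def FirstFactorial(num):
    return math.factorial(num)
```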

Min Window Substring (Medium)

#algorithm #Facebook

MinWindowSubstring(strArr) take the array of strings stored in strArr, which will contain only two strings, the first parameter being the string N and the second parameter being a string K of some characters, and your goal is to determine the smallest substring of N that contains all the characters in K. For example: if strArr is ["aaabaaddae", "aed"] then the smallest substring of N that contains the characters a, e, and d is "dae" located at the end of the string. So for this example your program should return the string dae.

Another example: if strArr is ["aabdccdbcacd", "aad"] then the smallest substring of N that contains all of the characters in K is "aabd" which is located at the beginning of the string. Both parameters will be strings ranging in length from 1 to 50 characters and all of K's characters will exist somewhere in the string N. Both strings will only contains lowercase alphabetic characters.

Input: ["ahffaksfajeeubsne", "jefaa"] 
Output: aksfaje
from collections import Counter

def MinWindowSubstring(strArr):
    # sliding window: the greedy "first window, then trim" approach is not
    # guaranteed to find the smallest window, so track every valid window
    N, K = strArr[0], strArr[1]
    need = Counter(K)          # chars still required in the window
    missing = len(K)           # how many required chars are uncovered
    best = N
    left = 0
    for right, ch in enumerate(N):
        if need[ch] > 0:
            missing -= 1
        need[ch] -= 1
        if missing == 0:
            # shrink from the left while the window still covers K
            while need[N[left]] < 0:
                need[N[left]] += 1
                left += 1
            if right - left + 1 < len(best):
                best = N[left:right + 1]
            # release the leftmost char and keep scanning
            need[N[left]] += 1
            missing += 1
            left += 1
    return best

Basic problems for coding tests, from LeetCode

LeetCode problems



1. Two Sum

# exactly one solution
Input: nums = [2,7,11,15], target = 9
Output: [0,1]
Explanation: Because nums[0] + nums[1] == 9, we return [0, 1].
class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        # stack = {}
        stack = []
        for idx, p in enumerate(nums):
            if p in stack:
                # idx2 = stack[p]
                idx2 = stack.index(p)
                return [idx2, idx]
            else:
                # stack[target-p]=idx
                stack.append(target-p)
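
The list version above is O(n^2) because `in` and `index` each scan the list. A dict lookup makes it one pass, O(n) (a standard variant; the function name is mine):

```python
def two_sum(nums, target):
    seen = {}  # value -> index of that value in nums
    for i, n in enumerate(nums):
        if target - n in seen:
            return [seen[target - n], i]
        seen[n] = i
```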


7. Reverse Integer

Given a signed 32-bit integer x, return x with its digits reversed. If reversing x causes the value to go outside the signed 32-bit integer range [-231, 231 - 1], then return 0.

Assume the environment does not allow you to store 64-bit integers (signed or unsigned).

Input: x = 123
Output: 321

Input: x = 120
Output: 21
class Solution:
    def reverse(self, x: int) -> int:
        submit = int(str(abs(x))[::-1])
        if x < 0:
            submit = -submit
        # check the signed 32-bit range [-2**31, 2**31 - 1];
        # note 2**31 - 1 itself is still a valid value
        if submit < -2**31 or submit > 2**31 - 1:
            return 0
        return submit

9. Palindrome Number

Given an integer x, return true if x is palindrome integer.

An integer is a palindrome when it reads the same backward as forward. For example, 121 is palindrome while 123 is not.

Input: x = 121
Output: true

Input: x = -121
Output: false
class Solution:
    def isPalindrome(self, x: int) -> bool:
        return str(x)==str(x)[::-1]

13. Roman to Integer

Input: s = "MCMXCIV"
Output: 1994
Explanation: M = 1000, CM = 900, XC = 90 and IV = 4.
class Solution:
    def romanToInt(self, s: str) -> int:
        d = {'M': 1000,'D': 500 ,'C': 100,'L': 50,'X': 10,'V': 5,'I': 1}
        total = 0
        for i in range(0, len(s)-1):
            if d[s[i]]>=d[s[i+1]]:
                total += d[s[i]]
            else:
                total -= d[s[i]]
        # the last character is not handled by the loop above, so add it here
        total += d[s[-1]]
        return total


20. Valid Parentheses

Given a string s containing just the characters '(', ')', '{', '}', '[', ']', determine if the input string is valid.
An input string is valid if:

・Open brackets must be closed by the same type of brackets.
・Open brackets must be closed in the correct order.

# sample1
Input: s = "{[]}"
Output: true

# sample2
Input: s = "()[]{}"
Output: true


# sample3
Input: s = "([)]"
Output: false
class Solution(object):
    def isValid(self, s):
        stack = []
        mapping = {")": "(", "}": "{", "]": "["}
        for char in s:
            if char in mapping.keys():
                # when else, stack is empty
                c = stack.pop() if stack else '#'
                if mapping[char] != c:
                    return False
            else:
                stack.append(char)
        # for cases like '['
        return not stack

26. Remove Duplicates from Sorted Array

Given a sorted array nums, remove the duplicates in-place such that each element appears only once and returns the new length.

Do not allocate extra space for another array, you must do this by modifying the input array in-place with O(1) extra memory.

# In short: remove the duplicates without allocating a new list, then return the length
Input: nums = [0,0,1,1,1,2,2,3,3,4]
Output: 5, nums = [0,1,2,3,4]
Explanation: Your function should return length = 5, with the first five elements of nums being modified to 0, 1, 2, 3, and 4 respectively. It doesn't matter what values are set beyond the returned length.
class Solution:
    def removeDuplicates(self, nums: List[int]) -> int:
        i=0
        n = len(nums)
        for _ in range(n-1):
            if nums[i]==nums[i+1]:
                del nums[i]
            else:
                i +=1
        return len(nums)
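
The del-based version above is O(n^2), since each del shifts the tail of the list. The standard two-pointer overwrite does it in O(n) with O(1) extra memory (a sketch; the function name is mine):

```python
def remove_duplicates(nums):
    if not nums:
        return 0
    write = 1  # position where the next unique element goes
    for read in range(1, len(nums)):
        if nums[read] != nums[read - 1]:
            nums[write] = nums[read]
            write += 1
    return write
```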

35. Search Insert Position

Given a sorted array of distinct integers and a target value, return the index if the target is found. If not, return the index where it would be if it were inserted in order.

# sample1
Input: nums = [1,3,5,6], target = 5
Output: 2
# sample2
Input: nums = [1,3,5,6], target = 2
Output: 1
class Solution:
    def searchInsert(self, nums: List[int], target: int) -> int:
        for i,n in enumerate(nums):
            if nums[i] >= target:
                return i
            elif i == len(nums) - 1:
                return len(nums)
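
Since the array is sorted, the standard library's bisect gives exactly this insertion index in O(log n):

```python
import bisect

def search_insert(nums, target):
    # bisect_left returns the leftmost index where target could be inserted
    return bisect.bisect_left(nums, target)
```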

53. Maximum Subarray

Given an integer array nums, find the contiguous subarray (containing at least one number) which has the largest sum and return its sum.

Input: nums = [-2,1,-3,4,-1,2,1,-5,4]
Output: 6
Explanation: [4,-1,2,1] has the largest sum = 6.
class Solution:
    def maxSubArray(self, nums: List[int]) -> int:
        if not nums:
            return 0
        # curSum: max sum of a subarray ending at the current element
        # maxSum: best subarray sum seen so far
        curSum = maxSum = nums[0]
        for num in nums[1:]:
            curSum = max(num, curSum + num)
            maxSum = max(maxSum, curSum)

        return maxSum

100. Same Tree

Given the roots of two binary trees p and q, write a function to check if they are the same or not.
Two binary trees are considered the same if they are structurally identical, and the nodes have the same value.

# sample 1
Input: p = [1,2,3], q = [1,2,3]
Output: true
# sample2
Input: p = [1,2], q = [1,null,2]
Output: false
# Definition for a binary tree node.
# class TreeNode:
#     def __init__(self, val=0, left=None, right=None):
#         self.val = val
#         self.left = left
#         self.right = right
# If both p and q are None return True; if only one is None return False; if their vals differ return False
class Solution:
    def isSameTree(self, p: TreeNode, q: TreeNode) -> bool:
        if not p and not q:
            return True
        if not p or not q:
            return False
        if p.val!=q.val:
            return False
        # recurse into the left and right subtrees
        return self.isSameTree(p.left, q.left) and self.isSameTree(p.right, q.right)           


104. Maximum Depth of Binary Tree

# return the depth of the deepest path in the tree
Given the root of a binary tree, return its maximum depth.

A binary tree's maximum depth is the number of nodes along the longest path from the root node down to the farthest leaf node.

Input: root = [3,9,20,null,null,15,7]
    3
   / \
  9  20
     / \
    15  7

Output: 3
# Recursion: if the node exists, return 1 + (the depth of the deeper of its two subtrees); otherwise return 0

class Solution:
    def maxDepth(self, root: TreeNode) -> int:
        return 1 + max(self.maxDepth(root.left), self.maxDepth(root.right)) if root else 0
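
An iterative level-order (BFS) version is also worth knowing when recursion depth is a concern (my sketch; TreeNode is redefined so the snippet is self-contained):

```python
from collections import deque

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val, self.left, self.right = val, left, right

def max_depth(root):
    # count levels with a BFS instead of recursion
    if not root:
        return 0
    depth, queue = 0, deque([root])
    while queue:
        depth += 1
        for _ in range(len(queue)):  # consume exactly one level
            node = queue.popleft()
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
    return depth
```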

169. Majority Element

Given an array nums of size n, return the majority element.

The majority element is the element that appears more than ⌊n / 2⌋ times. You may assume that the majority element always exists in the array.

# sample1
Input: nums = [3,2,3]
Output: 3
# sample2
Input: nums = [2,2,1,1,1,2,2]
Output: 2
class Solution:
    def majorityElement(self, nums: List[int]) -> int:
        d={}
        for val in nums:
            if val in d:
                d[val] +=1
            else:
                d[val] =1
        return max(d, key=d.get)  # gets the key with the largest value in the dict
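
A well-known O(1)-space alternative is the Boyer-Moore voting algorithm (not in the original post): keep one candidate and a counter, and since the majority element appears more than n/2 times, it always survives the cancellation.

```python
def majority_element(nums):
    count, candidate = 0, None
    for n in nums:
        if count == 0:       # previous candidate fully cancelled out
            candidate = n
        count += 1 if n == candidate else -1
    return candidate
```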


171. Excel Sheet Column Number (solved by googling)

Given a column title as appear in an Excel sheet, return its corresponding column number.

For example:

A -> 1
B -> 2
C -> 3
...
Z -> 26
AA -> 27
AB -> 28
...
# sample1
Input: "A"   Output: 1
# sample 2
Input: "ZY"  Output: 701

Solution: I googled "アルファベット 数字 python" (alphabet to number in Python).

class Solution:
    def titleToNumber(self, alpha: str) -> int:
        num=0
        for index, item in enumerate(list(alpha)):
            num += pow(26,len(alpha)-index-1)*(ord(item)-ord('A')+1)
        return num
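
The same base-26 conversion reads a bit cleaner Horner-style, without computing powers explicitly (my variant):

```python
def title_to_number(alpha):
    num = 0
    for ch in alpha:
        # shift the accumulated value one "digit" left, then add this letter
        num = num * 26 + (ord(ch) - ord('A') + 1)
    return num
```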