Document name: SpamHamEmailClassification垃圾邮件分类RNNG.pdf

Format: PDF   Size: 198 KB   Pages: 3

Uploaded by: 鼠标, 2023/6/8   File size: 198 KB

The text below is an excerpt from the document; an ellipsis (…) marks text that is missing from the excerpt.

[Kaggle] Spam/Ham Email Classification (RNN/G…)

1. Reading the data

Read the data. The test set has no labels.

    import pandas as pd
    import numpy as np
    train = pd.read_csv("…")
    test = pd.read_csv("…")

Check whether the data contains invalid cells:

    print(np.sum(np.array(train.isnull() == True), axis=0))
    print(np.sum(np.array(test.isnull() == True), axis=0))

Some cells are NaN:

    [0 6 0 0]
    [0 1 0]

Fill them with fillna:

    train = train.fillna(" ")
    test = test.fillna(" ")
    print(np.sum(np.array(train.isnull() == True), axis=0))
    print(np.sum(np.array(test.isnull() == True), axis=0))

After filling, every column sum is 0:

    [0 0 0 0]
    [0 0 0]

The label y takes only two values: 0 (not spam) and 1 (spam).

    print(train['spam'].unique())

    [0 1]

2. Text processing

Merge the email body and the subject into a single feature:

    X_train = train['subject'] + ' ' + train['email']
    y_train = train['spam']
    X_test = test['subject'] + ' ' + test['email']

Convert the text into sequences of token ids:

    from … import Tokenizer
    max_words = 300
    tokenizer = Tokenizer(num_words=max_words, lower=True, split=' ')
    # assign ids only to the 300 most frequent words
    tokenizer.fit_on_texts(list(X_train) + list(X_test))  # fit the tokenizer
    X_train_tokens = tokenizer.texts_to_sequences(X_train)
    X_test_tokens = tokenizer.texts_to_sequences(X_test)

Pad the id sequences so that they all have the same length:

    # samples have different numbers of tokens, so pad them
    maxlen = 100
    from … import sequence
    X_train_tokens_pad = sequence.pad_sequences(X_train_tokens, maxlen=maxlen, padding='post')
    X_test_tokens_pad = sequence.pad_sequences(X_test_tokens, maxlen=maxlen, padding='post')

3. Building the model

    embeddings_dim = 30  # dimension of the word-embedding vectors
    from … import Model, Sequential
    from … import Embedding, LSTM, GRU, SimpleRNN, Dense
    model = Sequential()
    model.add(Embedding(input_dim=max_words,        # size of the vocabulary
                        output_dim=embeddings_dim,  # dimension of the word embedding
                        input_length=maxlen))
    model.add(GRU(units=64))  # can be replaced with SimpleRNN or LSTM
    model.add(Dense(units=1, activation='sigmoid'))
    model.summary()

Model structure:

    Model: "sequential_5"
    _________________________________________________________________
    Layer (type)                 Output Shape              Param #
    =================================================================
    embedding_2 (Embedding)      (None, 100, 30)           9000
    _________________________________________________________________
    gru (GRU)                    (None, 64)                18432
    _________________________________________________________________
    dense_2 (Dense)              (None, 1)                 65
    =================================================================
    Total params: 27,497
    Trainable params: 27,497
    Non-trainable params: 0
    _________________________________________________________________

4. Training

    model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['accuracy'])  # configure the model
    history = model.fit(X_train_tokens_pad, y_train,
                        batch_size=128, epochs=10, validation_split=…)
    model.save("…")  # save the trained model

Plot the training curves:

    from matplotlib import pyplot as plt
    pd.DataFrame(history.history).plot(figsize=(8, 5))
    plt.grid(True)
    plt.show()

5. Testing

    pred_prob = model.predict(X_test_tokens_pad).squeeze()
    pred_class = (pred_prob > …).astype(…)
    id = test['id']
    output = pd.DataFrame({'id': id, 'Class': pred_class})
    output.to_csv("…", index=False)

Comparison of the three RNN models:
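One part of that comparison that needs no training is model size. The sketch below is a rough, hand-written calculation, not taken from the source document. It reproduces the parameter counts in the GRU model summary (9000, 18432, 65, total 27,497) and applies the same arithmetic to SimpleRNN and LSTM. It assumes default Keras/TF2 layer settings, in particular a GRU with `reset_after=True`, which keeps two bias vectors per gate. The helper function names are hypothetical, and the SimpleRNN and LSTM totals are computed here rather than reported by the source.

```python
# Hand-computed parameter counts for the layers used in the model.
# Assumption: default Keras/TF2 layer settings (GRU with reset_after=True).

def embedding_params(vocab_size: int, dim: int) -> int:
    # One dim-sized vector per vocabulary id.
    return vocab_size * dim

def simple_rnn_params(input_dim: int, units: int) -> int:
    # Input kernel + recurrent kernel + one bias vector.
    return units * (input_dim + units) + units

def gru_params(input_dim: int, units: int, reset_after: bool = True) -> int:
    # Three gates; reset_after=True uses two bias vectors per gate.
    biases = 2 if reset_after else 1
    return 3 * (units * (input_dim + units) + biases * units)

def lstm_params(input_dim: int, units: int) -> int:
    # Four gates, each with input kernel, recurrent kernel and bias.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim: int, units: int) -> int:
    # Weight matrix + bias.
    return input_dim * units + units

# Values taken from the document's code.
max_words, embeddings_dim, units = 300, 30, 64

layer_sizes = {
    "SimpleRNN": simple_rnn_params(embeddings_dim, units),
    "GRU": gru_params(embeddings_dim, units),
    "LSTM": lstm_params(embeddings_dim, units),
}

# Total = embedding + recurrent layer + sigmoid output layer.
fixed = embedding_params(max_words, embeddings_dim) + dense_params(units, 1)
totals = {name: fixed + size for name, size in layer_sizes.items()}

for name in layer_sizes:
    print(f"{name:<10} layer={layer_sizes[name]:>6}  total={totals[name]:>6}")
# SimpleRNN  layer=  6080  total= 15145
# GRU        layer= 18432  total= 27497
# LSTM       layer= 24320  total= 33385
```

Under these assumptions the GRU row matches the summary exactly, SimpleRNN is the smallest model, and LSTM is the largest, because they use one, three and four sets of gate weights respectively.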