Non-coding RNA

Review

Common Features in lncRNA Annotation and Classification: A Survey

Christopher Klapproth 1, Rituparno Sen 2, Peter F. Stadler 1,3,4,5,6,7, Sven Findeiß 1 and Jörg Fallmann 1,*

1 Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
2 Helmholtz Institute for RNA-Based Infection Research (HIRI), Helmholtz-Center for Infection Research (HZI), D-97080 Würzburg, Germany
3 […] (iDiv) Halle-Jena-[…], Competence Center for Scalable Data Services and Solutions, and Leipzig Research Center for Civilization Diseases, University Leipzig, D-04103 Leipzig, Germany
4 Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany
5 Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Vienna, Austria
6 Facultad de Ciencias, Universidad Nacional de Colombia, Bogotá CO-111321, Colombia
7 Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
* Correspondence: […]

Abstract: Long non-coding RNAs (lncRNAs) […]-associated mechanisms, leading to effects in disease progression and establishing them as diagnostic and therapeutic […], only a few representatives of this diverse class of RNAs are well studied, while the vast majority is poorly described beyond the existence of their transcripts. […]-established sets of features […] and discuss their speci[…] […] tools perform very well for the task of distinguishing coding sequence from other RNAs, we […]-protein-[…]

Keywords: lncRNA; feature extraction; machine learning; coding sequence; classification problems

Citation: Klapproth, C.; Sen, R.; Stadler, P.F.; Findeiß, S.; Fallmann, J. Common Features in lncRNA Annotation and Classification: A Survey. Non-coding RNA 2021, 7, 77.

Academic Editor: Panagiotis Alexiou

Received: 12 November 2021
Accepted: 6 December 2021
Published: […]

Licensee MDPI, Basel, […] (CC BY) license.

[…] decade, […] evident that many of these non-coding transcripts play an important role as regulators of gene expression [1–4]. The subclass of long non-coding RNAs (lncRNAs), in particular, is also associated with a wide array of diseases, most notably cancer [5–8]. The identi[…] (ML) […] the de facto standard for automated annotation […], enabling significant breakthroughs in the field [9,10]. Still, distinguishing protein-coding transcripts from lncRNAs has remained a non-trivial problem, in particular since lncRNAs and protein-[…] organization into introns and exons [11].

[…] computational tools for lncRNA annotation and function prediction, with a focus on the features that are used in the de[…]-based parameters (such as nucleotide or k-mer frequencies) […] complex physico-[…], […] compute parameters that quantify coding potential [12], […]

[…] Non-Coding Transcript Annotation

Most classical machine learning approaches that address lncRNAs are built around limited sets of features, owing to the limited catalogues of well-[…], these features are designed to capture the "coding potential" […]-cut and measurable distinction between coding and non-[…] (ORFs) [13–15] […]ic regions that produce isoforms with both a coding and a non-coding mode of action [16,17]; however, […]

[Figure: […]-established features and algorithms for lncRNA classification […] (blue) and algorithms (orange) […] (ORF) […] (k-mer)]

[…], as they are by far two of the most flexible approaches for nonlinear classification.

• k-mers are k-[…]-mer, whereas codons are base triplets and thus 3-[…]-mers are more abundant and their relative frequencies are more strongly cross-correlated than for longer k-[…]-mer would encode more information than a 3-mer, for example, as the occurrence of a particular 7-mer is much lower than that of a 3-[…], frequencies of longer k-[…] computationally more expensive to calculate [18,19]: for k = 7 there are already 4^7 = 16,384 distinct features, […], sufficiently large training and test sets are required.
• Euclidean and logarithmic distances of the frequency vectors of certain features, such as k-mer frequencies, relative to the expected values of these frequencies in a reference set of lncRNAs and protein-coding sequences, respectively, are utilized in, e.g., LncFinder [20].
• GC content is the proportion of bases in a sequence that are either guanine (G, a purine) or cytosine (C, a pyrimidine) [21]. […] content can therefore serve as an indicator for coding potential.
• Fickett TESTCODE was the first method proposed to find a distinguishing factor between the two classes of RNA [22]. To circumvent the hard problem of identifying initiation signals in a sequence, the authors devised a test to identify whether a given piece of DNA or RNA is coding or non-[…], the first four being measures of the bases A, T, G and C, […] computed, […]:

  $$\text{Fickett score} = \sum_{i=1}^{8} p_i w_i \tag{1}$$

  The detection tool CPAT [23] derives, for each base B ∈ {A, C, G, T}, a position value that measures how strongly B is favored at one of the three codon positions:

  B1 = number of occurrences of base B at positions 0, 3, 6, ...
  B2 = number of occurrences of base B at positions 1, 4, 7, ...
  B3 = number of occurrences of base B at positions 2, 5, 8, ...

  $$B_{\mathrm{pos}} = \frac{\max(B_1, B_2, B_3)}{\min(B_1, B_2, B_3) + 1}$$

  The derived values are then converted into probabilities (p) using the lookup table provided by Fickett [22] […]ount newly annotated transcripts, as provided, for example, by Wang et al. [23]. In principle, this gives a measure of how non-randomly a nucleotide is distributed across 3-mers of a given se[…], w in Equation (1) […]% and 97% sensitivity and specificity, respectively, on lncRNA sequences, being inconclusive for 18% of the sequences [23].
• […] [23], […], non-coding sequences a negative score [24]. There are several ways of defi[…] […] computes the log-likelihood ratio between coding and non-[…] $= H_1, H_2, \ldots, H_m$ with m hexamers is derived as:

  $$\text{hexamer\_score} = \frac{1}{m} \sum_{i=1}^{m} \log \frac{F(H_i)}{F_0(H_i)}$$

  where $F(H_i)$ and $F_0(H_i)$ represent the probability of each hexamer being part of a protein-coding and a non-coding sequence, respectively, with 4^6 = 4096 possible hexamers in total.
• […] (AUG) and ends at one of the stop codons (UAA/UGA/UAG), […] [12]. It should be noted that not every ORF that is translatable at the sequence level is in fact translated into a peptide chain.
• […], a low coverage indicates a non-coding sequence [25].
• […] (AA) sequence of a given ORF can be analyzed for physico-[…].
• Hydropathy is a measure of hydrophilic or hydrophobic interactions of a potential peptide sequence with a water-[…], hydrophilic or neutral, […]; see [26] for a review of amino acid hydrophobicity scales.
• Isoelectric point: The isoelectric point (pI), […]; see, e.g., [27].
• PolyA abundance: […] (5'-AAUAAA-3') as a fraction of sequence length can be used as a measure of coding potential.
• RNA minimum free energy (MFE) is often used as a metric for the inherent stabil[…]-coding RNAs tend to have […] complexity, […]

[…]-coding and non-coding sequences developed over the last two decades defined the problem as binary classification […]nation of features as described in Section 2 are applied to separate these two classes based on their coding potential [25]. In these efforts, ORF-related features in particular are often utilized, […]-coding function [28,29]. […] common strategy involves the decomposition of a candidate sequence into k-mer patterns and relies on an inherent k-[…]. Furthermore, we limited this list to tools that perform the classi[…]-free classifiers have the advantage of being applicable also to species with few or no closely related neighbors with which reliable a[…]
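The k-mer frequency feature from the list above can be illustrated with a minimal sketch. It is not the implementation of any tool surveyed here: the function name `kmer_frequencies` is hypothetical, and mapping T to U and skipping windows that contain symbols other than A, C, G, U are assumptions of the sketch. It also makes the growth of the feature space concrete, since the vector always has 4^k entries.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int) -> dict[str, float]:
    """Hypothetical helper: relative frequencies of all 4**k overlapping k-mers over {A, C, G, U}."""
    seq = seq.upper().replace("T", "U")  # assumption: DNA input is treated as RNA
    vocab = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(vocab, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


# The feature vector length grows as 4**k.
print(len(kmer_frequencies("AUGGCC", 3)))  # 64
print(len(kmer_frequencies("AUGGCC", 7)))  # 16384
```

Using one fixed vocabulary per k keeps the vectors of different sequences aligned, which is what a classifier expects as input.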
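Distance-based features of the kind used in LncFinder [20] compare a sequence's frequency vector with reference profiles. The sketch below computes only plain Euclidean distances to a coding and a non-coding reference vector. Both references are hypothetical inputs (e.g., mean k-mer frequencies of each training class), the helper names are illustrative, and LncFinder's exact definitions, including its logarithmic distance, are not reproduced.

```python
import math


def euclidean_distance(query: dict[str, float], reference: dict[str, float]) -> float:
    """Euclidean distance between two sparse frequency vectors (missing keys count as 0)."""
    keys = set(query) | set(reference)
    return math.sqrt(sum((query.get(k, 0.0) - reference.get(k, 0.0)) ** 2 for k in keys))


def reference_distances(query, coding_ref, noncoding_ref):
    """Hypothetical helper: distances of one profile to the coding and non-coding references."""
    return euclidean_distance(query, coding_ref), euclidean_distance(query, noncoding_ref)


# Toy profiles, invented for illustration only.
query = {"AUG": 0.5, "UGA": 0.5}
coding_ref = {"AUG": 0.6, "UGA": 0.4}
noncoding_ref = {"AUG": 0.1, "UGA": 0.9}
print(reference_distances(query, coding_ref, noncoding_ref))
```

A sequence closer to the coding reference than to the non-coding one would, under this sketch, look more coding-like; how the two distances are combined into a final feature is left open.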
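GC content as defined in the list above reduces to a simple ratio. In this minimal sketch the function name is hypothetical, and counting every character (including ambiguous symbols such as N) toward the sequence length is an assumption.

```python
def gc_content(seq: str) -> float:
    """Fraction of characters in seq that are G or C; every character counts toward the length."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


print(gc_content("GGCCAAUU"))  # 0.5
```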
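The CPAT position value B_pos and the weighted sum of Equation (1) can be sketched as below. The function names are hypothetical. Fickett's lookup tables, which map the raw parameter values to probabilities p_i and supply the weights w_i [22], are not reproduced here, so `fickett_score` simply takes both lists as inputs.

```python
def position_parameters(seq: str) -> dict[str, float]:
    """B_pos = max(B1, B2, B3) / (min(B1, B2, B3) + 1) for each base, as defined for CPAT above."""
    seq = seq.upper().replace("U", "T")
    params = {}
    for base in "ACGT":
        # B1, B2, B3: counts of `base` at positions 0,3,6,... / 1,4,7,... / 2,5,8,...
        b = [seq[offset::3].count(base) for offset in range(3)]
        params[base] = max(b) / (min(b) + 1)
    return params


def fickett_score(probabilities: list[float], weights: list[float]) -> float:
    """Equation (1): sum of p_i * w_i; p_i and w_i must come from Fickett's tables."""
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per probability")
    return sum(p * w for p, w in zip(probabilities, weights))


print(position_parameters("ATGATGATG"))  # {'A': 3.0, 'C': 0.0, 'G': 3.0, 'T': 3.0}
```

In the example every A, T and G sits in a single codon position, so each reaches the maximum value for this length, while the absent C scores 0.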
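The hexamer score formula above translates directly into a few lines of code. In this sketch the function name, the pseudocount that avoids log(0) for unseen hexamers, and reading hexamers with a step of 3 (in frame, starting at position 0) are all assumptions; the frequency tables `coding_freq` and `noncoding_freq` would have to be estimated from training data.

```python
import math


def hexamer_score(seq: str, coding_freq: dict[str, float],
                  noncoding_freq: dict[str, float],
                  step: int = 3, pseudo: float = 1e-6) -> float:
    """Mean log-likelihood ratio log(F(H_i) / F0(H_i)) over the m hexamers of seq."""
    seq = seq.upper().replace("U", "T")
    hexamers = [seq[i:i + 6] for i in range(0, len(seq) - 5, step)]
    if not hexamers:
        return 0.0  # no complete hexamer: treat as uninformative
    total = 0.0
    for h in hexamers:
        f = coding_freq.get(h, 0.0) + pseudo      # F(H_i), smoothed (assumption)
        f0 = noncoding_freq.get(h, 0.0) + pseudo  # F0(H_i), smoothed (assumption)
        total += math.log(f / f0)
    return total / len(hexamers)
```

A positive score means the hexamers are, on average, more typical of the coding training set, matching the sign convention described above.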
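The ORF definition (AUG start, UAA/UGA/UAG stop) and the coverage feature can be sketched as follows. The helper names are hypothetical, and the sketch makes simplifying assumptions: only the forward strand is scanned, ORFs that run off the end without a stop codon are ignored, and coverage is taken as ORF length divided by transcript length. As noted above, finding an ORF says nothing about whether it is actually translated.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple[int, int]:
    """Return (start, end) of the longest ATG...stop ORF on the forward strand; (0, 0) if none."""
    seq = seq.upper().replace("U", "T")
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon
                if end - start > best[1] - best[0]:
                    best = (start, end)
                start = None
    return best


def orf_coverage(seq: str) -> float:
    """Fraction of the transcript covered by its longest ORF."""
    start, end = longest_orf(seq)
    return (end - start) / len(seq) if seq else 0.0


print(longest_orf("CCAUGAAAUAGCC"))  # (2, 11)
```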
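The hydropathy feature depends on the chosen scale, and [26] reviews many of them. The sketch below assumes the Kyte-Doolittle scale and averages it over a peptide, which gives the widely used GRAVY value; the function name is hypothetical, and translating an ORF into its peptide is not shown.

```python
# Assumption: Kyte-Doolittle hydropathy scale; other scales are equally possible.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide: str) -> float:
    """Mean hydropathy of a peptide; positive values indicate a hydrophobic sequence."""
    # Characters outside the 20 standard amino acids (e.g. '*' for stop) are ignored.
    values = [KYTE_DOOLITTLE[aa] for aa in peptide.upper() if aa in KYTE_DOOLITTLE]
    return sum(values) / len(values) if values else 0.0
```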
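The PolyA abundance feature, the occurrences of the 5'-AAUAAA-3' motif as a fraction of sequence length, can be sketched as below. The function name is hypothetical, and counting overlapping occurrences on the given strand only is an assumption.

```python
def polya_signal_density(seq: str, motif: str = "AAUAAA") -> float:
    """Overlapping occurrences of the poly(A) signal motif per nucleotide of sequence."""
    seq = seq.upper().replace("T", "U")
    motif = motif.upper().replace("T", "U")
    if not seq:
        return 0.0
    hits = sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))
    return hits / len(seq)
```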